227 changes: 88 additions & 139 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,13 +8,13 @@
- [Setup](#setup)
- [Usage](#usage)
- [As a CLI Tool](#as-a-cli-tool)
- [Generate Dictionary for Arguments](#generate-dictionary-for-arguments)
- [Input Streams Detection](#detect-input-streams)
- [Arguments Fuzzing](#fuzz-arguments)
- [Get Help](#get-help)
- [Arguments Dictionary Generation](#arguments-dictionary-generation)
- [Input Streams Detection](#input-streams-detection)
- [Arguments Fuzzing](#arguments-fuzzing)
- [Help](#help)
- [As a Python Module](#as-a-python-module)
- [Input Streams Detection](#detect-input-streams-1)
- [Arguments Fuzzing](#fuzz-arguments-1)
- [Input Streams Detection](#input-streams-detection-1)
- [Arguments Fuzzing](#arguments-fuzzing-1)

---

Expand All @@ -23,179 +23,128 @@
`attack_surface_approximation` is the CRS module that deals with the approximation of the attack surface in a vulnerable program.

Some input mechanisms are omitted: elements of the user interface, signals, devices and interrupts. At the moment, the supported mechanisms are the following:
- Files;
- Arguments;
- Standard input;
- Networking; and
- Environment variables.

- files
- command-line arguments
- standard input
- networking
- environment variables

In addition, a custom fuzzer discovers arguments that trigger different code coverage.
It takes its arguments from a dictionary, which can be handcrafted or generated by an exposed command that implements a heuristic.
In addition, a custom fuzzer discovers arguments that trigger different code coverage. It takes its arguments from a dictionary, which can be handcrafted or generated by an exposed command that implements a heuristic.

Examples of arguments dictionaries can be found in `examples/dictionaries`:

- `man.txt`: generated with the `man_parsing` heuristic and having 6605 entries
- `generation.txt`: generated with the `generation` heuristic and having 62 entries
- `man.txt`, generated with the `man_parsing` heuristic and having 6605 entries; and
- `common.txt`, generated with the `generation` heuristic and having 62 entries.

### Limitations

- ELF format
- x86 architecture
- dynamic binaries (static binaries are not supported)
- symbols present (namely, no stripping is involved)
- no obfuscation technique involved
- x86 architecture (32-bit)
- Non-static binaries
- Symbols present (namely, no stripping is involved); binaries compiled without debug symbols (`-g`) may cause Ghidra to fail to resolve function calls, leading to incomplete detection results
- No obfuscation technique involved
- **Binary compatibility for fuzzing**: the argument fuzzer runs inside a Docker container based on Ubuntu 18.04 (GLIBC 2.27). Binaries compiled on modern systems that require a newer GLIBC version will fail to execute inside the container. To work around this, compile the target binary inside the QBDI Docker container itself before fuzzing.
- **Incomplete argument detection**: flags that trigger identical QBDI basic block paths (e.g., multiple simple flags that all resolve to a `break` in a switch statement) will share the same hash. Only the first occurrence is reported; subsequent flags with the same hash are suppressed by the deduplication mechanism.
- **False positive filtering relies on `getopt` stderr reporting**: the module filters out invalid options by checking whether the binary writes to stderr when run with that argument — standard `getopt` behavior. Programs that use custom option parsers and suppress error output may still produce false positives.
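
The deduplication described in the limitations above can be illustrated with a minimal sketch. It assumes that each run is summarized by hashing the ordered list of executed basic block addresses; the names `hash_basic_blocks` and `deduplicate`, the SHA-256 choice, and the sample addresses are hypothetical and are not taken from the module itself.

```python
import hashlib
from typing import Iterable, List, Tuple


def hash_basic_blocks(addresses: Iterable[int]) -> str:
    """Hash the ordered sequence of executed basic block addresses.

    Assumption: the real module may compute its hash differently.
    """
    digest = hashlib.sha256()
    for address in addresses:
        digest.update(address.to_bytes(8, "little"))
    return digest.hexdigest()


def deduplicate(traces: List[Tuple[str, List[int]]]) -> List[str]:
    """Report only the first argument observed for each coverage hash."""
    seen = set()
    reported = []
    for argument, addresses in traces:
        coverage_hash = hash_basic_blocks(addresses)
        if coverage_hash in seen:
            # Same path as an earlier argument: suppressed.
            continue
        seen.add(coverage_hash)
        reported.append(argument)
    return reported


# Hypothetical traces: -a and -b take the same path through a switch.
traces = [
    ("-a", [0x1000, 0x1040, 0x10F0]),
    ("-b", [0x1000, 0x1040, 0x10F0]),
    ("-f", [0x1000, 0x1080, 0x10F0]),
]
print(deduplicate(traces))  # ['-a', '-f']
```

In this sketch `-b` is a valid flag, yet it is never reported because `-a` already claimed the same hash, which is the incomplete detection the limitation warns about.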

## How It Works

The module works by automating [Ghidra](https://ghidra-sre.org/) for static binary analysis.
It extracts information and applies heuristics to determine if a given input stream is present.
The module works by automating Ghidra for static binary analysis. It extracts information and applies heuristics to determine if a given input stream is present.

Examples of such heuristics are:
- For standard input, calls to `getc()` and `gets()`
- For networking, calls to `recv()` and `recvfrom()`
- For arguments, occurrences of `argc` and `argv` in the decompilation of `main()`.

- for standard input: calls to `getc()` and `gets()`
- for networking: calls to `recv()` and `recvfrom()`
- for command-line arguments: occurrences of `argc` and `argv` in `main()`
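
The heuristics listed above can be sketched as a simple lookup. This is only an illustration under stated assumptions: the function name `detect_streams`, the `STREAM_INDICATORS` table, and the inputs (a list of called function names and the decompiled text of `main()`) are hypothetical stand-ins for what the Ghidra automation extracts, and the real module may check more functions than the ones named here.

```python
import re
from typing import Dict, Iterable, Set

# Indicator functions copied from the heuristics listed above.
# Assumption: the actual module may use a longer list.
STREAM_INDICATORS: Dict[str, Set[str]] = {
    "stdin": {"getc", "gets"},
    "networking": {"recv", "recvfrom"},
}


def detect_streams(
    called_functions: Iterable[str], main_decompilation: str
) -> Dict[str, bool]:
    """Mark a stream as present when one of its indicators is found."""
    called = set(called_functions)
    detected = {
        stream: not called.isdisjoint(indicators)
        for stream, indicators in STREAM_INDICATORS.items()
    }
    # Arguments: look for argc or argv as whole words in main()'s decompilation.
    detected["arguments"] = (
        re.search(r"\b(argc|argv)\b", main_decompilation) is not None
    )
    return detected


example = detect_streams(
    ["puts", "gets", "strcmp"],
    "int main(void) { char buf[32]; gets(buf); return 0; }",
)
print(example)  # {'stdin': True, 'networking': False, 'arguments': False}
```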

The argument fuzzer uses [Docker](https://www.docker.com/) for running and [QBDI](https://qbdi.quarkslab.com/) to detect basic-block coverage.
The argument fuzzer uses Docker and QBDI to detect basic block coverage.

## Setup

1. Make sure you have set up the repositories and Python environment according to the [top-level instructions](https://github.com/open-crs#requirements).
That is:

- Docker is installed and is properly running.
Check using:

```console
docker version
docker ps -a
docker run --rm hello-world
```

These commands should run without errors.

- The current module repository and all other module repositories (particularly the [`dataset` repository](https://github.com/open-crs/dataset) and the [`commons` repository](https://github.com/open-crs/commons)) are cloned in the same directory.

- You are running all commands inside a Python virtual environment.
There should be a `(.venv)` prefix in your prompt.

- You have installed Poetry in the virtual environment.
If you run:

```console
which poetry
```

you should get a path ending with `.venv/bin/poetry`.

1. Disable the Python Keyring:

```console
export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring
1. Ensure you have Docker installed.
2. Install the required Python 3 packages via `poetry install`.
3. Build the QBDI Docker image:
```

This works around a keyring problem that, in some setups, prevents Poetry from fetching packages.

1. Install the required packages with Poetry (based on `pyproject.toml`):

```console
poetry install --only main
cd commons/commons/qbdi/docker
docker build -t opencrs/qbdi .
```

1. Create the `ghidra` and `qbdi_args_fuzzing` Docker images by using the [instructions in the `commons` repository](https://github.com/open-crs/commons?tab=readme-ov-file#setup).

1. Optionally, generate executables by using the [instructions in the `dataset` repository](https://github.com/open-crs/dataset).
4. Ensure the Docker API is accessible by:
- Running the module as `root`; or
- Changing the Docker socket permissions (insecure approach) via `chmod 777 /var/run/docker.sock`.
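
Before running the module, it may help to check whether the current user can reach the Docker socket at all. The following is a minimal sketch, assuming the default socket path `/var/run/docker.sock` mentioned above; the helper name `can_access_docker_socket` is hypothetical and not part of the module.

```python
import os


def can_access_docker_socket(socket_path: str = "/var/run/docker.sock") -> bool:
    """Return True if the current user can read and write the given path."""
    return os.access(socket_path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    if can_access_docker_socket():
        print("Docker socket is accessible")
    else:
        print("Docker socket is not accessible: run as root or adjust permissions")
```

This only checks file permissions on the socket; it does not confirm that the Docker daemon is actually running.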

## Usage

You can use the `attack_surface_approximation` module either standalone, as a CLI tool, or integrated into Python applications, as a Python module.

### As a CLI Tool

As a CLI tool, you can either use the `cli.py` module:

```console
python attack_surface_approximation/cli.py
```

or the Poetry interface:
#### Arguments Dictionary Generation

```console
poetry run attack_surface_approximation
```

#### Generate Dictionary for Arguments

```console
$ poetry run attack_surface_approximation generate --heuristic man_parsing --output args.txt --top 100
➜ poetry run attack_surface_approximation generate --heuristic man --output args.txt --top 10
Successfully generated dictionary with 10 arguments

$ head args.txt
--allow-unrelated-histories
--analysis-display-unstable-clusters
--auto-area-segmentation
--backup-dir
--callstack-filter
--cidfile
--class
--codename
--column
--contained
➜ cat args.txt
--and
--get
--get-feedbacks
--no-progress-meter
--print-name
-input
-lmydep2
-miniswhite
-nM
-prune
```

#### Detect Input Streams
#### Input Streams Detection

Use an ELF i386 (32-bit) executable as the target for detecting input streams.

For example, you can use one of the executables generated in the [`dataset` repository](https://github.com/open-crs/dataset):
```
➜ ./crackme
Enter the password: pass
Wrong password!
➜ poetry run attack_surface_approximation detect --elf crackme
Several input mechanisms were detected for the given program:

```console
$ ../dataset/executables/toy_test_suite_1.elf
Gimme two lines of input:
aaa
bbb
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Stream ┃ Present ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ files │ No │
│ arguments │ No │
│ stdin │ Yes │
│ networking │ No │
│ environment_variables │ No │
└───────────────────────┴─────────┘
```

Now, do the attack surface approximation:
#### Arguments Fuzzing

```console
$ poetry run attack_surface_approximation detect --elf $(pwd)/../dataset/executables/toy_test_suite_1.elf
Several input mechanisms were detected for the given program:
The target binary must be a 32-bit ELF dynamically linked against GLIBC 2.27 or earlier. If your binary was compiled on a modern system, compile it inside the QBDI container first:

┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Stream ┃ Present ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ STDIN │ Yes │
│ ARGUMENTS │ Yes │
│ FILES │ Yes │
│ ENVIRONMENT_VARIABLE │ Yes │
│ NETWORKING │ Yes │
└──────────────────────┴─────────┘
```
➜ docker run --rm --user root \
-v $(pwd)/examples:/examples \
opencrs/qbdi \
bash -c "gcc -m32 /examples/target.c -o /examples/target"
```

This executable uses all of the supported input streams.

#### Fuzz Arguments
Then run the fuzzer:

```console
$ poetry run attack_surface_approximation fuzz --elf $(pwd)/../dataset/executables/toy_test_suite_1.elf --dictionary args.txt
```
poetry run attack_surface_approximation fuzz --elf examples/target --dictionary args.txt
Several arguments were detected for the given program:

┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Argument ┃ Role ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ - │ FLAG │
│ --re │ FLAG │
│ --re string │ STRING_ENABLER │
│ -mmusl │ FLAG │
└─────────────┴────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Argument ┃ Role ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ -d │ FLAG │
│ -f string │ STRING_ENABLER │
│ -r │ FLAG │
│ -s │ FLAG │
│ -v │ FLAG │
│ -f /tmp/canary.opencrs │ FILE_ENABLER │
└────────────────────────┴────────────────┘
```

#### Get Help
#### Help

```console
$ poetry run attack_surface_approximation
```
poetry run attack_surface_approximation
Usage: attack_surface_approximation [OPTIONS] COMMAND [ARGS]...

Discovers the attack surface of vulnerable programs.
Expand All @@ -212,7 +161,7 @@ Commands:

### As a Python Module

#### Detect Input Streams
#### Input Streams Detection

```python
from attack_surface_approximation.static_input_streams_detection import \
Expand All @@ -222,7 +171,7 @@ detector = InputStreamsDetector(elf_filename)
streams_list = detector.detect_all()
```

#### Fuzz Arguments
#### Arguments Fuzzing

```python
from attack_surface_approximation.arguments_fuzzing import ArgumentsFuzzer
Expand Down
20 changes: 16 additions & 4 deletions attack_surface_approximation/arguments_fuzzing/fuzzer.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
from attack_surface_approximation.arguments_fuzzing.arguments_types import (
ArgumentsPair,
)
from commons.arguments import ArgumentRole
from attack_surface_approximation.arguments_fuzzing.fuzzing_sequence_generator import (
FuzzingSequenceGenerator,
)
Expand Down Expand Up @@ -56,11 +57,15 @@ def __generate_baseline_hashes(self) -> typing.Generator[str, None, None]:
for argument in arguments:
analysis_result = self.analysis.analyze(argument)

yield analysis_result.bbs_hash
if analysis_result.bbs_hash is not None:
yield analysis_result.bbs_hash

def __check_if_argument_is_valid(
self, argument: ArgumentsPair, result: QBDIAnalysis
) -> None:
) -> bool:
if result.bbs_hash is None:
return False

if (
argument.get_roles_based_on_analysis(result, self.baseline_hashes)
and result.bbs_hash not in self.old_hashes # noqa: W503
Expand Down Expand Up @@ -91,9 +96,16 @@ def get_valid_argument(
# generates a different hash than the baseline ones, it will be detected
# as a false flag because of the sequence generation: --flag first, --flag
# <string> afterwards.
self.old_hashes.append(result.bbs_hash)
if result.bbs_hash is not None:
self.old_hashes.append(result.bbs_hash)

self.arguments_generator.update_last_analysis_result(result)

def __is_false_positive(self, argument: ArgumentsPair) -> bool:
if ArgumentRole.FLAG not in argument.valid_roles and ArgumentRole.STRING_ENABLER not in argument.valid_roles:
return False
return self.analysis.produces_stderr(argument)

def get_all_valid_arguments(self) -> typing.List[ArgumentsPair]:
return list(self.get_valid_argument())
candidates = list(self.get_valid_argument())
return [a for a in candidates if not self.__is_false_positive(a)]
Original file line number Diff line number Diff line change
Expand Up @@ -78,14 +78,15 @@ def generate_fuzzing_arguments(
) -> ArgumentsGenerator:
arg = FileArgument(self.canary_filename)
yield arg
if ArgumentRole.FILE_ENABLER not in arg.get_roles_based_on_analysis(
self.last_analysis_result, bbs_hashes_baseline
):
for argument in self.arguments:
yield ArgumentPlusFileArgument(argument, self.canary_filename)

yield ArgumentArgument("-")

for argument in self.arguments:
yield ArgumentArgument(argument)
yield ArgumentStringArgument(argument, self.canary_string)

if ArgumentRole.FILE_ENABLER not in arg.get_roles_based_on_analysis(
self.last_analysis_result, bbs_hashes_baseline
):
for argument in self.arguments:
yield ArgumentPlusFileArgument(argument, self.canary_filename)