open-crs · krpandrei05 · May 9, 2026
diff --git a/README.md b/README.md
@@ -8,13 +8,13 @@
 - [Setup](#setup)
 - [Usage](#usage)
   - [As a CLI Tool](#as-a-cli-tool)
-    - [Generate Dictionary for Arguments](#generate-dictionary-for-arguments)
-    - [Input Streams Detection](#detect-input-streams)
-    - [Arguments Fuzzing](#fuzz-arguments)
-    - [Get Help](#get-help)
+    - [Arguments Dictionary Generation](#arguments-dictionary-generation)
+    - [Input Streams Detection](#input-streams-detection)
+    - [Arguments Fuzzing](#arguments-fuzzing)
+    - [Help](#help)
   - [As a Python Module](#as-a-python-module)
-    - [Input Streams Detection](#detect-input-streams-1)
-    - [Arguments Fuzzing](#fuzz-arguments-1)
+    - [Input Streams Detection](#input-streams-detection-1)
+    - [Arguments Fuzzing](#arguments-fuzzing-1)
 
 ---
 
@@ -23,179 +23,128 @@
 `attack_surface_approximation` is the CRS module that deals with the approximation of the attack surface in a vulnerable program.
 
 Some input mechanisms are omitted: elements of the user interface, signals, devices and interrupts. At the moment, the supported mechanisms are the following:
+- Files;
+- Arguments;
+- Standard input;
+- Networking; and
+- Environment variables.
 
-- files
-- command-line arguments
-- standard input
-- networking
-- environment variables
-
-In addition, a custom fuzzer is implemented to discover arguments that trigger different code coverage.
-It takes arguments from a dictionary which can be handcrafted or generated with an exposed command, with an implemented heuristic.
+In addition, a custom fuzzer is implemented to discover arguments that trigger different code coverage. It takes arguments from a dictionary, which can be handcrafted or generated heuristically with an exposed command.
 
 Examples of arguments dictionaries can be found in `examples/dictionaries`:
-
-- `man.txt`: generated with the `man_parsing` heuristic and having 6605 entries
-- `generation.txt`: generated with the `generation` heuristic and having 62 entries
+- `man.txt`, generated with the `man_parsing` heuristic and having 6605 entries; and
+- `common.txt`, generated with the `generation` heuristic and having 62 entries.
 
 ### Limitations
 
 - ELF format
-- x86 architecture
-- dynamic binaries (static binaries are not supported)
-- symbols present (namely, no stripping is involved)
-- no obfuscation technique involved
+- x86 architecture (32-bit)
+- Non-static binaries
+- Symbols present (namely, no stripping is involved); binaries compiled without debug symbols (`-g`) may cause Ghidra to fail to resolve function calls, leading to incomplete detection results
+- No obfuscation technique involved
+- **Binary compatibility for fuzzing**: the argument fuzzer runs inside a Docker container based on Ubuntu 18.04 (GLIBC 2.27). Binaries compiled on modern systems that require a newer GLIBC version will fail to execute inside the container. To work around this, compile the target binary inside the QBDI Docker container itself before fuzzing.
+- **Incomplete argument detection**: flags that trigger identical QBDI basic block paths (e.g., multiple simple flags that all resolve to a `break` in a switch statement) will share the same hash. Only the first occurrence is reported; subsequent flags with the same hash are suppressed by the deduplication mechanism.
+- **False positive filtering relies on `getopt` stderr reporting**: the module filters out invalid options by checking whether the binary writes to stderr when run with that argument, which is the standard `getopt` behavior for unrecognized options. Programs that use custom option parsers or suppress error output may still produce false positives.
 
 ## How It Works
 
-The module works by automating [Ghidra](https://ghidra-sre.org/) for static binary analysis.
-It extracts information and applies heuristics to determine if a given input stream is present.
+The module works by automating Ghidra for static binary analysis. It extracts information and applies heuristics to determine if a given input stream is present.
 
 Examples of such heuristics are:
+- For standard input, calls to `getc()` and `gets()`;
+- For networking, calls to `recv()` and `recvfrom()`; and
+- For arguments, occurrences of `argc` and `argv` in the decompilation of `main()`.
 
-- for standard input: calls to `getc()` and `gets()`
-- for networking: calls to `recv()` and `recvfrom()`
-- for command-line arguments: occurrences of `argc` and `argv` in `main()`
-
-The argument fuzzer uses [Docker](https://www.docker.com/) for running and [QBDI](https://qbdi.quarkslab.com/) to detect basic-block coverage.
+The argument fuzzer uses Docker and QBDI to detect basic block coverage.
 
 ## Setup
 
-1. Make sure you have set up the repositories and Python environment according to the [top-level instructions](https://github.com/open-crs#requirements).
-   That is:
-
-   - Docker is installed and is properly running.
-     Check using:
-
-     ```console
-     docker version
-     docker ps -a
-     docker run --rm hello-world
-     ```
-
-     These commands should run without errors.
-
-   - The current module repository and all other module repositories (particularly the [`dataset` repository](https://github.com/open-crs/dataset) and the [`commons` repository](https://github.com/open-crs/commons)) are cloned in the same directory.
-
-   - You are running all commands inside a Python virtual environment.
-     There should be `(.venv)` prefix to your prompt.
-
-   - You have installed Poetry in the virtual environment.
-     If you run:
-
-     ```console
-     which poetry
-     ```
-
-     you should get a path ending with `.venv/bin/poetry`.
-
-1. Disable the Python Keyring:
-
-   ```console
-   export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring
+1. Ensure you have Docker installed.
+2. Install the required Python 3 packages via `poetry install`.
+3. Build the QBDI Docker image:
    ```
-
-   This is a problem that may occur in certain situations, preventing Poetry from getting packages.
-
-1. Install the required packages with Poetry (based on `pyprojects.toml`):
-
-   ```console
-   poetry install --only main
+   cd commons/commons/qbdi/docker
+   docker build -t opencrs/qbdi .
    ```
-
-1. Create the `ghidra` and `qbdi_args_fuzzing` Docker images by using the [instructions in the `commons` repository](https://github.com/open-crs/commons?tab=readme-ov-file#setup).
-
-1. Optionally, generate executables by using the [instructions in the `dataset` repository](https://github.com/open-crs/dataset).
+4. Ensure the Docker API is accessible by:
+   - Running the module as `root`; or
+   - Changing the Docker socket permissions (insecure approach) via `chmod 777 /var/run/docker.sock`.
 
 ## Usage
 
-You can use the `attack_surface_approximation` module either standalone, as a CLI tool, or integrated into Python applications, as a Python module.
-
 ### As a CLI Tool
 
-As a CLI tool, you can either use the `cli.py` module:
-
-```console
-python attack_surface_approximation/cli.py
-```
-
-or the Poetry interface:
+#### Arguments Dictionary Generation
 
-```console
-poetry run attack_surface_approximation
 ```
-
-#### Generate Dictionary for Arguments
-
-```console
-$ poetry run attack_surface_approximation generate --heuristic man_parsing --output args.txt --top 100
+➜ poetry run attack_surface_approximation generate --heuristic man --output args.txt --top 10
 Successfully generated dictionary with 10 arguments
-
-$ head args.txt
---allow-unrelated-histories
---analysis-display-unstable-clusters
---auto-area-segmentation
---backup-dir
---callstack-filter
---cidfile
---class
---codename
---column
---contained
+➜ cat args.txt
+--and
+--get
+--get-feedbacks
+--no-progress-meter
+--print-name
+-input
+-lmydep2
+-miniswhite
+-nM
+-prune
 ```
 
-#### Detect Input Streams
+#### Input Streams Detection
 
-Use an ELF i386 (32 bit) executable as target for detecting input streams.
-
-For example, you can use one of the executables generated in the [`dataset` repository](https://github.com/open-crs/dataset):
+```
+➜ ./crackme
+Enter the password: pass
+Wrong password!
+➜ poetry run attack_surface_approximation detect --elf crackme
+Several input mechanisms were detected for the given program:
 
-```console
-$ ../dataset/executables/toy_test_suite_1.elf
-Gimme two lines of input:
-aaa
-bbb
+┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
+┃ Stream                ┃ Present ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
+│ files                 │   No    │
+│ arguments             │   No    │
+│ stdin                 │   Yes   │
+│ networking            │   No    │
+│ environment_variables │   No    │
+└───────────────────────┴─────────┘
 ```
 
-Now, do the attack surface approximation:
+#### Arguments Fuzzing
 
-```console
-$ poetry run attack_surface_approximation detect --elf $(pwd)/../dataset/executables/toy_test_suite_1.elf
-Several input mechanisms were detected for the given program:
+The target binary must be a 32-bit ELF dynamically linked against GLIBC 2.27 or earlier. If your binary was compiled on a modern system, compile it inside the QBDI container first:
 
-┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
-┃ Stream               ┃ Present ┃
-┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
-│ STDIN                │   Yes   │
-│ ARGUMENTS            │   Yes   │
-│ FILES                │   Yes   │
-│ ENVIRONMENT_VARIABLE │   Yes   │
-│ NETWORKING           │   Yes   │
-└──────────────────────┴─────────┘
+```
+➜ docker run --rm --user root \
+    -v $(pwd)/examples:/examples \
+    opencrs/qbdi \
+    bash -c "gcc -m32 /examples/target.c -o /examples/target"
 ```
 
-The executable used uses all potential input streams.
-
-#### Fuzz Arguments
+Then run the fuzzer:
 
-```console
-$ poetry run attack_surface_approximation fuzz --elf $(pwd)/../dataset/executables/toy_test_suite_1.elf --dictionary args.txt
+```
+➜ poetry run attack_surface_approximation fuzz --elf examples/target --dictionary args.txt
 Several arguments were detected for the given program:
 
-┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
-┃ Argument    ┃      Role      ┃
-┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
-│ -           │      FLAG      │
-│ --re        │      FLAG      │
-│ --re string │ STRING_ENABLER │
-│ -mmusl      │      FLAG      │
-└─────────────┴────────────────┘
+┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
+┃ Argument               ┃      Role      ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
+│ -d                     │      FLAG      │
+│ -f string              │ STRING_ENABLER │
+│ -r                     │      FLAG      │
+│ -s                     │      FLAG      │
+│ -v                     │      FLAG      │
+│ -f /tmp/canary.opencrs │  FILE_ENABLER  │
+└────────────────────────┴────────────────┘
 ```
 
-#### Get Help
+#### Help
 
-```console
-$ poetry run attack_surface_approximation
+```
+➜ poetry run attack_surface_approximation
 Usage: attack_surface_approximation [OPTIONS] COMMAND [ARGS]...
 
   Discovers the attack surface of vulnerable programs.
@@ -212,7 +161,7 @@ Commands:
 
 ### As a Python Module
 
-#### Detect Input Streams
+#### Input Streams Detection
 
 ```python
 from attack_surface_approximation.static_input_streams_detection import \
@@ -222,7 +171,7 @@ detector = InputStreamsDetector(elf_filename)
 streams_list = detector.detect_all()
 ```
 
-#### Fuzz Arguments
+#### Arguments Fuzzing
 
 ```python
 from attack_surface_approximation.arguments_fuzzing import ArgumentsFuzzer
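
Note on the "Incomplete argument detection" limitation added to the README: the following is a minimal sketch of hash-based deduplication, not the module's actual code. The `hash_trace` and `deduplicate` helpers and the sample addresses are hypothetical; the real hashes come from QBDI, and how they are computed is not shown in this diff.

```python
import hashlib
from typing import Iterable, List, Tuple


def hash_trace(basic_blocks: Iterable[int]) -> str:
    """Hash an ordered sequence of basic block addresses (illustrative only)."""
    digest = hashlib.sha256()
    for address in basic_blocks:
        digest.update(address.to_bytes(8, "little"))
    return digest.hexdigest()


def deduplicate(traces: List[Tuple[str, List[int]]]) -> List[str]:
    """Keep only the first flag seen for each distinct trace hash."""
    seen = set()
    reported = []
    for flag, blocks in traces:
        trace_hash = hash_trace(blocks)
        if trace_hash in seen:
            continue  # same path as an earlier flag, so it is suppressed
        seen.add(trace_hash)
        reported.append(flag)
    return reported


# Hypothetical traces: -a and -b follow the same basic blocks.
traces = [
    ("-a", [0x1000, 0x1010, 0x1040]),
    ("-b", [0x1000, 0x1010, 0x1040]),
    ("-c", [0x1000, 0x1020, 0x1040]),
]
print(deduplicate(traces))  # ['-a', '-c']
```

Under these assumptions `-b` is dropped even though it may be a valid flag, which is the behavior the limitation describes.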

diff --git a/attack_surface_approximation/arguments_fuzzing/fuzzer.py b/attack_surface_approximation/arguments_fuzzing/fuzzer.py
@@ -3,6 +3,7 @@
 from attack_surface_approximation.arguments_fuzzing.arguments_types import (
     ArgumentsPair,
 )
+from commons.arguments import ArgumentRole
 from attack_surface_approximation.arguments_fuzzing.fuzzing_sequence_generator import (
     FuzzingSequenceGenerator,
 )
@@ -56,11 +57,15 @@ def __generate_baseline_hashes(self) -> typing.Generator[str, None, None]:
         for argument in arguments:
             analysis_result = self.analysis.analyze(argument)
 
-            yield analysis_result.bbs_hash
+            if analysis_result.bbs_hash is not None:
+                yield analysis_result.bbs_hash
 
     def __check_if_argument_is_valid(
         self, argument: ArgumentsPair, result: QBDIAnalysis
-    ) -> None:
+    ) -> bool:
+        if result.bbs_hash is None:
+            return False
+
         if (
             argument.get_roles_based_on_analysis(result, self.baseline_hashes)
             and result.bbs_hash not in self.old_hashes  # noqa: W503
@@ -91,9 +96,17 @@ def get_valid_argument(
             # generates a different hash than the baseline ones, it will be detected
             # as a false flag because of the sequence generation: --flag first, --flag
             # <string> afterwards.
-            self.old_hashes.append(result.bbs_hash)
+            if result.bbs_hash is not None:
+                self.old_hashes.append(result.bbs_hash)
 
             self.arguments_generator.update_last_analysis_result(result)
 
+    def __is_false_positive(self, argument: ArgumentsPair) -> bool:
+        checked_roles = (ArgumentRole.FLAG, ArgumentRole.STRING_ENABLER)
+        if not any(role in argument.valid_roles for role in checked_roles):
+            return False
+        return self.analysis.produces_stderr(argument)
+
     def get_all_valid_arguments(self) -> typing.List[ArgumentsPair]:
-        return list(self.get_valid_argument())
+        candidates = list(self.get_valid_argument())
+        return [a for a in candidates if not self.__is_false_positive(a)]
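
Note on `__is_false_positive`: the implementation of `self.analysis.produces_stderr` is not part of this diff. As a rough illustration of the idea only, a stderr check could look like the sketch below. It runs the target directly with `subprocess` instead of inside the QBDI container, and the `produces_stderr` and `filter_false_positives` functions, as well as the Python one-liner standing in for a target binary, are assumptions made for the example.

```python
import subprocess
import sys
from typing import List


def produces_stderr(command: List[str], timeout: float = 5.0) -> bool:
    """Return True if running the command writes anything to stderr."""
    completed = subprocess.run(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=False,  # a non-zero exit code is not an error for this check
    )
    return len(completed.stderr) > 0


def filter_false_positives(target: List[str], candidates: List[str]) -> List[str]:
    """Drop candidate flags that make the target complain on stderr."""
    return [flag for flag in candidates if not produces_stderr(target + [flag])]


# Stand-in for a getopt-style program: only -d and -v are accepted,
# anything else triggers a message on stderr.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; sys.argv[1] in ('-d', '-v') or sys.stderr.write('invalid option')",
]

print(filter_false_positives(STAND_IN, ["-d", "-x", "-v"]))  # ['-d', '-v']
```

A program that reports invalid options on stdout, or not at all, would pass this check unchanged, which matches the limitation documented in the README.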
diff --git a/attack_surface_approximation/arguments_fuzzing/fuzzing_sequence_generator.py b/attack_surface_approximation/arguments_fuzzing/fuzzing_sequence_generator.py
@@ -78,14 +78,15 @@ def generate_fuzzing_arguments(
     ) -> ArgumentsGenerator:
         arg = FileArgument(self.canary_filename)
         yield arg
-        if ArgumentRole.FILE_ENABLER not in arg.get_roles_based_on_analysis(
-            self.last_analysis_result, bbs_hashes_baseline
-        ):
-            for argument in self.arguments:
-                yield ArgumentPlusFileArgument(argument, self.canary_filename)
 
         yield ArgumentArgument("-")
 
         for argument in self.arguments:
             yield ArgumentArgument(argument)
             yield ArgumentStringArgument(argument, self.canary_string)
+
+        if ArgumentRole.FILE_ENABLER not in arg.get_roles_based_on_analysis(
+            self.last_analysis_result, bbs_hashes_baseline
+        ):
+            for argument in self.arguments:
+                yield ArgumentPlusFileArgument(argument, self.canary_filename)
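
Note on the reordering in `generate_fuzzing_arguments`: the sketch below models the new yield order with plain tuples, so that the effect of moving the flag-plus-file pairs to the end is easier to see. The `generate_sequence` function and its `file_alone_is_enabler` parameter are hypothetical simplifications. In the real generator the decision is taken from `self.last_analysis_result`, which, as far as can be inferred from this diff, would by then hold the result of the last flag-plus-string pair rather than that of the bare file argument; whether that is intended is not clear from the diff.

```python
from typing import Iterator, List, Tuple


def generate_sequence(
    arguments: List[str],
    canary_file: str,
    canary_string: str,
    file_alone_is_enabler: bool,
) -> Iterator[Tuple[str, ...]]:
    """Yield fuzzing candidates in the order introduced by this diff."""
    yield (canary_file,)  # bare file argument first
    yield ("-",)  # lone dash

    for argument in arguments:
        yield (argument,)  # flag alone
        yield (argument, canary_string)  # flag followed by a string

    # Flag-plus-file pairs now come last, and only if the bare file
    # argument was not already recognized as a file enabler.
    if not file_alone_is_enabler:
        for argument in arguments:
            yield (argument, canary_file)


sequence = list(
    generate_sequence(["-f", "-v"], "/tmp/canary.opencrs", "string", False)
)
for candidate in sequence:
    print(" ".join(candidate))
```

With this order every flag has been tried alone and with a string before any file pairing, which is consistent with `-f /tmp/canary.opencrs` appearing as the last row in the README example output.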