Add recursive directory and glob scanning to the CLI for bulk pickle triage by DevamShah · Pull Request #283 · trailofbits/fickling

DevamShah · 2026-06-23T01:36:30Z

Add recursive directory and glob scanning to the CLI for bulk pickle triage

Summary

Teach the fickling CLI to accept a directory or glob for PICKLE_FILE and add a -R/--recursive flag, so a whole tree of pickle files can be safety-scanned in one command instead of one invocation per file.

Problem / motivation

Today the CLI scans a single file at a time. Triaging anything larger — a downloaded model cache, an unpacked archive, or a set of artifacts pulled off a host during an engagement — means shelling out a find ... -exec fickling --check-safety loop and hand-rolling the exit-code aggregation. The HuggingFace path (--huggingface) already proves the bulk-scan pattern end to end; local directories had no equivalent. This closes that gap with the analysis primitives the codebase already exposes.

Change

PICKLE_FILE now accepts a directory or a glob pattern. When it resolves to either (or --recursive is set), the CLI routes to a new bulk-scan path instead of opening a single file. Single-file behavior, STDIN (-), and the --inject / --create / --trace modes are untouched — bulk scanning only triggers when none of those single-file operations were requested.
Added -R/--recursive to walk a directory tree depth-first. -r was already taken by --replace-result, so the recursive flag uses the conventional -R (as in grep -R).
File classification reuses the existing HF_PICKLE_EXTENSIONS sets already defined in cli.py: raw pickle extensions (.bin, .pkl, .pickle) go through scan_file, and the zip-based .pt/.pth archives go through scan_zip_archive — the same loader.py functions the HuggingFace scanner uses. No new analysis logic is introduced.
Per-file verdicts aggregate into a single ClamAV-compatible exit code, matching constants.py semantics: any unsafe file → EXIT_UNSAFE (1); otherwise any scan error → EXIT_ERROR (2); otherwise EXIT_CLEAN (0). Unsafe takes precedence over errors so a malicious file is never masked by an unrelated read failure. A directory/glob that resolves to nothing returns EXIT_ERROR rather than a misleading "clean".
Scanning is graceful=True, so a corrupt or unparseable member never aborts the run — it is reported and the walk continues. -p/--print-results prints a per-file breakdown plus a final verdict line, mirroring the HuggingFace output format.
All path handling uses pathlib to satisfy the repo's flake8-use-pathlib lint rules; no os.path/glob module use was added.
README documents the new usage under "Scanning a directory or glob".

Security rationale

Bulk pickle triage is a recurring need for AI/ML supply-chain defenders and incident responders. A single malicious __reduce__ in one file out of hundreds in a model cache is enough for arbitrary code execution on load — this is the deserialization risk tracked as CWE-502 (Deserialization of Untrusted Data) and called out in the OWASP Machine Learning Security Top 10 (ML06: ML Supply Chain Attacks) and OWASP LLM Top 10 (LLM03: Supply Chain). Maps to MITRE ATT&CK T1195.002 (Supply Chain Compromise: Compromise Software Supply Chain).

The exit-code precedence is deliberately fail-loud: an unsafe verdict dominates errors so CI gates and scan scripts cannot be tricked into a passing status by a single unreadable file. Because the feature only composes existing, already-reviewed scan_file/scan_zip_archive analysis, it introduces no new parsing or deserialization surface — detection fidelity (and false-positive rate) is identical to the per-file CLI; only the iteration and aggregation are new.

Testing / validation

Added test/test_cli_recursive.py (15 unittest.TestCase cases) covering: mixed safe/unsafe directories, all-safe and no-pickle directories, recursive vs. non-recursive descent (a nested unsafe file is correctly ignored without -R and caught with it), -R short flag, nonexistent target → EXIT_ERROR, glob matching (including a glob that deliberately excludes an unsafe .bin to prove no over-matching), recursive ** globs, .pt/.pth zip-member scanning, single-file pass-through, and --print-results summary output. Payloads are built with the __reduce__/os.system pattern already used in test/test_scan_api.py — no hand-assembled pickle bytes.

Run against the repo toolchain (uv sync --all-extras):

uv run pytest -q test/test_cli_recursive.py   # 15 passed
uv run pytest -q test/                         # 130 passed, 5 skipped, 29 subtests passed
uv run ruff format --check fickling/cli.py test/test_cli_recursive.py   # clean
uv run ruff check  fickling/cli.py test/test_cli_recursive.py           # All checks passed
uv run ty check fickling/                      # 80 diagnostics (unchanged baseline; 0 in new code)

Manual end-to-end smoke test confirmed exit codes on a real tree: non-recursive scan of a dir whose only unsafe file is nested returns 0; --recursive returns 1; a nonexistent target returns 2.

The CLI scans one file per invocation, so triaging a model cache or an unpacked archive means hand-rolling a find -exec loop and aggregating exit codes manually. The --huggingface path already proves the bulk-scan pattern; local directories had no equivalent. PICKLE_FILE now accepts a directory or a glob, and a new -R/--recursive flag walks a tree depth-first (-r was already --replace-result). File classification reuses the existing HF_PICKLE_EXTENSIONS sets and the same loader.scan_file / scan_zip_archive functions, so no new analysis or deserialization surface is added. Per-file verdicts aggregate into one ClamAV-compatible exit code: any unsafe file -> 1, else any error -> 2, else 0, so a malicious file is never masked by an unrelated read failure. Signed-off-by: Devam Shah <devamshah91@gmail.com>

CLAassistant · 2026-06-23T01:36:36Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

DevamShah requested a review from ESultanik as a code owner June 23, 2026 01:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add recursive directory and glob scanning to the CLI for bulk pickle triage#283

Add recursive directory and glob scanning to the CLI for bulk pickle triage#283
DevamShah wants to merge 1 commit into
trailofbits:masterfrom
DevamShah:add-recursive-directory-glob-scanning

DevamShah commented Jun 23, 2026

Uh oh!

CLAassistant commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

DevamShah commented Jun 23, 2026

Add recursive directory and glob scanning to the CLI for bulk pickle triage

Summary

Problem / motivation

Change

Security rationale

Testing / validation

Uh oh!

CLAassistant commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants