Add BYOD (Bring-Your-Own-Driver) detection module and signatures by wmetcalf · Pull Request #568 · CAPESandbox/community

wmetcalf · 2026-04-29T19:31:05Z

Overview

Adds detection for Bring-Your-Own-Driver (BYOD / BYOVD) attacks where a sample loads a kernel driver — typically a known-vulnerable signed driver from the LOLDrivers catalog — to bypass EDR/AV protection.

Two parts:

Processing module (modules/processing/loldrivers.py) walks the per-task Sysmon EVTX (EID 1/5/6/11) and System log (EID 7045) extracted from evtx/evtx.zip, classifies driver loads against the LOLDrivers feed, and correlates post-load exploitation activity against a security-tools catalog.

Four signatures (modules/signatures/all/byod.py) consuming that output:

| Signature | Fires on |
|---|---|
| `byod_loldrivers_match` | Tiered LOLDrivers hit: SHA256 (critical) → signer+filename (high) → filename (medium). Capped at severity 3 when the driver IS the analyst-submitted sample. |
| `byod_novel_driver` | Unsigned/non-system driver heuristics (suspicious path, missing/invalid signature, dropper lineage, matching service install) |
| `byod_post_load_exploitation` | Driver load followed by termination of EDR/AV/sandbox tooling: Sysmon EID 5 within a 120s window OR executed-command kill verbs (`taskkill`/`Stop-Process`/`wmic delete`) attributed against a 406-entry security-tools catalog |
| `byod_driver_service_install` | Sample-attributed kernel-mode driver service install (filters legit Windows driver installs) |
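
The tiered match behind `byod_loldrivers_match` could be sketched roughly as below. The index keys `by_sha256`, `by_signer_name`, and `by_name` mirror the cache dictionary visible in the diff; the function name, the argument names, and the numeric severities are assumptions for illustration, not the module's actual code.

```python
# Placeholder severities: the PR names the tiers (critical/high/medium) but
# does not show the numeric scale, so these numbers are assumptions.
TIER_SEVERITY = {"sha256": 5, "signer_name": 4, "name": 3}
SAMPLE_UNDER_TEST_CAP = 3


def match_driver(index, sha256=None, signer=None, filename=None, sample_under_test=False):
    """Return (tier, entry, severity) for the strongest match, or None."""
    name = (filename or "").lower()
    # Ordered from most to least confident; the first hit wins.
    candidates = (
        ("sha256", index["by_sha256"].get((sha256 or "").lower())),
        ("signer_name", index["by_signer_name"].get(((signer or "").lower(), name))),
        ("name", index["by_name"].get(name)),
    )
    for tier, entry in candidates:
        if entry is not None:
            severity = TIER_SEVERITY[tier]
            if sample_under_test:
                # The driver IS the submitted sample, so cap the severity.
                severity = min(severity, SAMPLE_UNDER_TEST_CAP)
            return tier, entry, severity
    return None
```

Trying the tiers in order means a SHA256 hit is never downgraded by a weaker filename-only match on the same driver.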

Files

| File | Purpose |
|---|---|
| `modules/processing/loldrivers.py` | EVTX walker + tiered match + novel-driver heuristics + exploitation correlation |
| `modules/signatures/all/byod.py` | 4 signatures consuming processing output |
| `data/security_tools.json` | 406-entry exe → {tool, vendor, category} catalog (12 categories: EDR, AV, NetworkAnalysis, Sysinternals, ReverseEngineering, etc.) |
| `utils/fetch_loldrivers.py` | Admin-run fetcher for the LOLDrivers feed (chunked stream, atomic replace, schema check) |
| `.gitignore` | Excludes the fetched `data/loldrivers.json` (~30 MB, too large for git) |

False-positive resistance

byod_driver_service_install gates on three sample-attributable signals (any one is sufficient):

sample_under_test — driver basename equals the submitted sample's basename
created_by_sample — Sysmon EID 11 ties the .sys creation to a monitored process
service_invoked_by_sample — executed_commands contains sc create/sc start against the driver's path or basename

Path-based gating (Temp/AppData/etc.) was deliberately rejected — the analyzer drops user-uploaded samples into the same paths it would match, so an analyst submitting a raw .sys to scan would FP. The cmdline-attribution branch catches the loader/dropper pattern (where the .sys is extracted by the analyzer rather than written by a monitored process, and the basename differs from the submission name) without firing on raw .sys submissions.
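
As a rough illustration of the three-signal gate described above, here is a minimal sketch. The signal names come from the PR description; the function names, the `install` dict shape, and the `created_paths` set (lowercased paths taken from Sysmon EID 11 events of monitored processes) are hypothetical.

```python
import ntpath
import re

# Matches "sc create" / "sc start", with or without the .exe suffix.
_SC_VERB = re.compile(r"\bsc(?:\.exe)?\s+(?:create|start)\b", re.IGNORECASE)


def service_invoked_by_sample(executed_commands, driver_path):
    """True if an sc create/start command mentions the driver path or basename."""
    needles = (driver_path.lower(), ntpath.basename(driver_path).lower())
    for cmd in executed_commands:
        lowered = cmd.lower()
        if _SC_VERB.search(lowered) and any(n in lowered for n in needles):
            return True
    return False


def is_sample_attributed(install, sample_name, created_paths, executed_commands):
    """Return (fires, signals); any single signal is sufficient."""
    driver_path = install["image_path"]
    basename = ntpath.basename(driver_path).lower()
    signals = {
        "sample_under_test": bool(sample_name) and basename == sample_name.lower(),
        "created_by_sample": driver_path.lower() in created_paths,
        "service_invoked_by_sample": service_invoked_by_sample(executed_commands, driver_path),
    }
    return any(signals.values()), signals
```

Note that nothing here looks at the directory the driver lives in, which is the point of rejecting path-based gating.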

The post-load cmdline kill correlation runs once per analysis (not per driver-load) and only emits when at least one driver actually loaded — so a non-BYOD sample running taskkill in isolation won't trigger byod_post_load_exploitation.
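
The two correlation paths might look something like the following sketch. The 120-second window, the kill verbs, the `scope=analysis` batch, and the exe → {tool, vendor, category} catalog shape come from the PR text; the function names, the record dict shapes, and the regexes are assumptions. This simplified version only attributes targets that appear with an `.exe` suffix on the command line.

```python
import re
from datetime import timedelta

WINDOW = timedelta(seconds=120)
_KILL_RE = re.compile(r"\b(?:taskkill|stop-process)\b|\bwmic\b.*\bdelete\b", re.IGNORECASE)
_EXE_RE = re.compile(r"[\w.-]+\.exe", re.IGNORECASE)


def eid5_hits(driver_loads, terminations, tools):
    """Per driver load: catalogued tools terminated within WINDOW after the load."""
    hits = []
    for load in driver_loads:
        for term in terminations:
            delta = term["time"] - load["time"]
            exe = term["image"].rsplit("\\", 1)[-1].lower()
            if timedelta(0) <= delta <= WINDOW and exe in tools:
                hits.append({"scope": "driver", "driver": load["path"], "target": exe, **tools[exe]})
    return hits


def cmdline_hits(driver_loads, executed_commands, tools):
    """Once per analysis: kill attempts, reported only if a driver actually loaded."""
    if not driver_loads:
        return []  # a lone taskkill on a non-BYOD sample must not fire
    hits = []
    for cmd in executed_commands:
        if not _KILL_RE.search(cmd):
            continue
        for exe in _EXE_RE.findall(cmd):
            key = exe.lower()
            if key in tools:
                hits.append({"scope": "analysis", "target": key, **tools[key]})
    return hits
```

Keeping the command-line scan separate from the per-driver loop is what prevents duplicate findings when several drivers load in one analysis.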

Installation

The LOLDrivers feed (~30 MB, ~620 driver entries, ~2000 sample hashes) is too large to commit. Fetch on demand:

```
poetry run python utils/fetch_loldrivers.py
```

The fetcher streams in 1 MiB chunks to <dest>.tmp, validates the JSON shape and presence of KnownVulnerableSamples, and atomically swaps in via os.replace. The data/security_tools.json catalog ships in the PR.
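
A minimal sketch of the stream, validate, and swap sequence, assuming the source is any binary file-like object (for example an HTTP response). The function name `install_feed` and its error handling are illustrative; only the 1 MiB chunk size, the `.tmp` suffix, the `KnownVulnerableSamples` check, and `os.replace` come from the description above.

```python
import json
import os

CHUNK = 1024 * 1024  # 1 MiB


def install_feed(stream, dest):
    """Copy stream to dest.tmp, validate it, then atomically replace dest."""
    tmp = dest + ".tmp"
    with open(tmp, "wb") as out:
        while True:
            chunk = stream.read(CHUNK)
            if not chunk:
                break
            out.write(chunk)
    try:
        with open(tmp, encoding="utf-8") as f:
            parsed = json.load(f)
        if not isinstance(parsed, list):
            raise ValueError("feed is not a JSON list")
        # Count samples only on dict entries so junk entries cannot crash the tally.
        samples = sum(
            len(e.get("KnownVulnerableSamples") or [])
            for e in parsed
            if isinstance(e, dict)
        )
        if samples == 0:
            raise ValueError("feed has no KnownVulnerableSamples entries")
    except (OSError, ValueError):
        os.unlink(tmp)  # never leave a bad temp file behind
        raise
    os.replace(tmp, dest)  # atomic when tmp and dest share a filesystem
    return samples
```

Because validation happens on the temp file, a failed download leaves any previously fetched feed untouched.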

If the feed is missing or malformed at processing time, the module logs a WARNING/ERROR and disables itself gracefully — the other three signatures continue to work via heuristics.
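
The graceful-degradation behaviour could be approximated as below. The index keys match the cache dictionary shown in the diff; the function name `load_index` and the per-sample field names `SHA256` and `Filename` are assumptions about the feed layout, and the `by_signer_name` table is left unpopulated because the signer fields are not shown in the PR.

```python
import json
import logging
import os

log = logging.getLogger(__name__)


def _empty_index():
    return {"by_sha256": {}, "by_signer_name": {}, "by_name": {}, "entries": 0}


def load_index(path):
    """Build lookup tables from the feed; return an empty index on any problem."""
    index = _empty_index()
    if not os.path.exists(path):
        log.warning("LOLDrivers feed not found at %s", path)
        return index
    try:
        with open(path, encoding="utf-8") as f:
            feed = json.load(f)
    except (OSError, ValueError) as exc:  # JSONDecodeError subclasses ValueError
        log.error("LOLDrivers feed unreadable: %s", exc)
        return index
    if not isinstance(feed, list):
        log.error("LOLDrivers feed must be a list, got %s", type(feed).__name__)
        return index
    for entry in feed:
        if not isinstance(entry, dict):
            continue  # skip junk entries instead of raising AttributeError
        samples = entry.get("KnownVulnerableSamples")
        if not isinstance(samples, list):
            continue
        for sample in samples:
            if not isinstance(sample, dict):
                continue
            sha256 = str(sample.get("SHA256") or "").lower()  # assumed field name
            name = str(sample.get("Filename") or "").lower()  # assumed field name
            if sha256:
                index["by_sha256"][sha256] = entry
            if name:
                index["by_name"].setdefault(name, entry)
        index["entries"] += 1
    return index
```

An empty index simply means the tiered match never hits, so the heuristic signatures keep working without special-casing.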

Enable in processing.conf:
```
[loldrivers]
enabled = yes
```

Test plan

End-to-end validated on a live CAPE deployment:

CAPE detonation of RTCore64.sys (a well-known LOLDriver, signed by Micro-Star International) wrapped in a .bat loader doing sc create RTCore64Test binPath= "...\Temp\RTCore64.sys" type= kernel + sc start + taskkill /F /IM MsMpEng.exe — all 4 BYOD signatures fired with correct attribution and severities
Raw .sys analyst submission (no loader) — driver never loads, 0 BYOD signals (correct, no FP)
Missing data/loldrivers.json — graceful WARNING, 3 of 4 sigs still fire via heuristics
Malformed feed (JSON dict instead of list) — graceful ERROR, 3 of 4 sigs still fire
Malformed feed (list with junk entries mixed in) — junk entries skipped, valid entries parsed, all 4 sigs fire
FP corpus check on 20 prior tasks (no driver loads) — 0 false-positive fires

Notes

Requires python-evtx (already a CAPE dependency).
Processing module is registered with order = 11 to run after sysmon (10) and before signatures.

…le-attributed sc create, harden feed loaders

Restore the executed_commands kill-cmdline scan in exploitation correlation. The Sysmon EID 5 path only catches kills the kernel actually completed; many sandbox VMs ship without Defender/EDR running, so taskkill / Stop-Process / wmic-delete attempts against those tools never produce EID 5. The cmdline scan covers the *attempt*, which is the BYOD signal we want regardless of whether the target was running.

Split the helper in two so the cmdline scan runs once per analysis (not per driver) and emits as a single scope=analysis batch, which avoids duplicating findings when multiple drivers load. The cmdline batch is only attached when at least one driver actually loaded, so it can't fire on non-BYOD samples that happen to run taskkill.

Replace the path-suspicious branch on byod_driver_service_install with a cmdline-attributable branch. The original sample_under_test/created_by_sample gate missed the most common BYOD pattern: a packed loader extracts the .sys before any monitored process can be attributed (so created_by_sample is False) and the .sys basename differs from the submission name (so sample_under_test is False). A path-based branch (Temp/AppData/etc.) would have FP'd because the analyzer drops user-uploaded samples into the same locations: an analyst submitting a raw .sys to scan would have triggered byod_driver_service_install. Instead match on whether the sample's executed commands invoked sc create / sc start against the driver's path or basename, which catches the loader/dropper case without firing on raw .sys submissions (those never invoke sc create).

Harden _load_loldrivers and _load_tools against malformed feeds: validate top-level type after json.load and skip non-dict entries inside the parse loop. Previously a corrupted or hostile feed would propagate AttributeError out of the parse loop and crash the processing module.

Verified end-to-end against:
- bat-loader detonation (RTCore64.sys + sc create): all 4 BYOD sigs fire, single-batch cmdline kill correlation
- raw .sys analyst submission: driver never loads, zero sigs fire
- missing data/loldrivers.json: graceful WARNING, 3 of 4 sigs still fire via heuristics
- malformed feed (dict instead of list): graceful ERROR, 3 of 4 sigs still fire
- malformed feed (list with junk mixed in): junk entries skipped, valid entries parsed, all 4 sigs fire

…d feed fetcher

Adds four BYOD (Bring-Your-Own-Driver) detection signatures that consume the loldrivers processing module's output:

byod_loldrivers_match: Sample loaded a known vulnerable or malicious driver from the LOLDrivers catalog. Severity scales with match confidence: critical on SHA256, high on (signer, filename), medium on filename only. Capped at 3 (informational) when the driver IS the analyst-submitted sample (sample_under_test).

byod_novel_driver: Sample loaded an unsigned/non-system driver, a possible novel BYOD not yet in the catalog. Fires on a combination of: non-system user-writable path, missing/invalid signature, dropper lineage from a monitored process, and matching kernel-driver service install.

byod_post_load_exploitation: A driver load was followed by termination of EDR/AV/sandbox tooling via either (a) Sysmon EID 5 within a 120s window of the driver load, or (b) a kill cmdline (taskkill / Stop-Process / wmic delete) captured during the analysis when at least one driver loaded.

byod_driver_service_install: Sample-attributed kernel-mode driver service install. Gates on sample_under_test, created_by_sample (Sysmon EID 11 from monitored process), or service_invoked_by_sample (executed_commands contains sc create / sc start matching the driver). Legitimate Windows kernel-driver installs during analysis are filtered.

data/security_tools.json: 406-entry catalog of EDR/AV/Network/Forensics/Sysinternals/RE/Sandbox tools mapping exe basename → {tool, vendor, category}. Used by the post-load exploitation correlation to attribute kill targets to a known security category.

utils/fetch_loldrivers.py: admin-run fetcher for the LOLDrivers community feed at https://www.loldrivers.io/api/drivers.json. Streams in 1 MiB chunks to a temp file, validates JSON shape and presence of at least one KnownVulnerableSamples entry, then atomically swaps in via os.replace. The feed (~30 MB) is too large to commit to git, so the gitignore excludes data/loldrivers.json.

End-to-end validated on a CAPE detonation of RTCore64.sys (a well-known LOLDriver) wrapped in a batch-file loader that did sc create / sc start: all four signatures fired with the expected severities and attribution data, while a control submission of the same .sys without the loader produced zero BYOD signals (driver never loads, no signal).

gemini-code-assist

Code Review

This pull request introduces a BYOD (Bring-Your-Own-Driver) detection system, including a processing module for Sysmon/System log analysis, a suite of signatures, and a utility to fetch the LOLDrivers catalog. Feedback addresses a potential parsing failure with high-precision Sysmon timestamps, recommends explicit UTF-8 encoding for file I/O, and suggests sanitizing ZIP entry paths to prevent traversal vulnerabilities.

gemini-code-assist · 2026-04-29T19:33:16Z

+    if not s:
+        return None
+    s = s.strip().rstrip("Z")
+    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):


Sysmon timestamps often include more than 6 digits of fractional seconds (e.g., 7 digits). datetime.strptime with %f only supports up to 6 digits, and fromisoformat (in Python versions prior to 3.11) also has strict limits. This will cause a ValueError when parsing such records. Truncating the fractional seconds to 6 digits before parsing is a safer approach.

Suggested change

-    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):
+    if "." in s:
+        base, frac = s.split(".", 1)
+        s = f"{base}.{frac[:6]}"
+    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):

gemini-code-assist · 2026-04-29T19:33:16Z

+        _LOLD_CACHE = {"by_sha256": {}, "by_signer_name": {}, "by_name": {}, "entries": 0}
+        return _LOLD_CACHE
+    try:
+        with open(LOLDRIVERS_PATH) as f:


It is recommended to specify the encoding (e.g., utf-8) when opening the JSON feed to ensure consistent behavior across different platforms and locales.

Suggested change

-        with open(LOLDRIVERS_PATH) as f:
+        with open(LOLDRIVERS_PATH, encoding="utf-8") as f:

gemini-code-assist · 2026-04-29T19:33:16Z

+        _TOOLS_CACHE = {}
+        return _TOOLS_CACHE
+    try:
+        with open(SECURITY_TOOLS_PATH) as f:


It is recommended to specify the encoding (e.g., utf-8) when opening the JSON feed to ensure consistent behavior across different platforms and locales.

Suggested change

-        with open(SECURITY_TOOLS_PATH) as f:
+        with open(SECURITY_TOOLS_PATH, encoding="utf-8") as f:

gemini-code-assist · 2026-04-29T19:33:16Z

+                    if total_extracted > max_size:
+                        log.warning("evtx zip extraction exceeded %d bytes, aborting", max_size)
+                        break
+                    extracted_path = zf.extract(info, target_dir)


zipfile.extract() does not inherently prevent path traversal if the ZIP file contains absolute paths or paths with `..` components. While the source of the ZIP is internal to the sandbox, it is a security best practice to sanitize the filename using os.path.basename to ensure files are extracted only into the target directory.

Suggested change

-                    extracted_path = zf.extract(info, target_dir)
+                    info.filename = os.path.basename(info.filename)
+                    extracted_path = zf.extract(info, target_dir)

Copilot

Pull request overview

Adds BYOD (Bring-Your-Own-Driver / BYOVD) detection to the analysis pipeline by introducing a processing module that correlates EVTX driver-load telemetry against the LOLDrivers catalog and post-load “security tool kill” activity, plus signatures that surface these findings.

Changes:

New processing module (loldrivers) that parses evtx/evtx.zip, matches loaded drivers against a LOLDrivers feed, and correlates post-load exploitation activity using a shipped security-tools catalog.
New BYOD signature pack (4 signatures) consuming results["loldrivers"].
New admin fetcher for the LOLDrivers feed + .gitignore update; adds data/security_tools.json catalog.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file

| File | Description |
|---|---|
| utils/fetch_loldrivers.py | Adds a downloader/validator for the external LOLDrivers JSON feed. |
| modules/processing/loldrivers.py | Implements EVTX parsing, tiered matching, heuristics, and exploitation correlation for BYOD. |
| modules/signatures/all/byod.py | Adds 4 signatures to emit detections based on `results["loldrivers"]`. |
| data/security_tools.json | Adds a shipped exe→tool/vendor/category catalog for post-load correlation. |
| .gitignore | Ignores the large fetched `data/loldrivers.json` feed file. |

Copilot · 2026-04-29T19:36:56Z

+    samples = sum(len(e.get("KnownVulnerableSamples") or []) for e in parsed)
+    if not parsed or samples == 0:
+        os.unlink(tmp)
+        print("error: feed contained no entries with KnownVulnerableSamples", file=sys.stderr)
+        return 5


samples = sum(len(e.get(...)) for e in parsed) assumes every element is a dict; if the feed contains non-dict entries this raises AttributeError and the fetcher crashes even though you handle malformed entries elsewhere. Filter to isinstance(e, dict) (or default {}) when counting samples so the script can still reject/accept based on actual usable entries.

Copilot · 2026-04-29T19:36:57Z

+def _filetime_to_dt(s):
+    """Parse Sysmon UtcTime (e.g. '2026-04-28 16:57:01.123') → aware UTC datetime."""
+    if not s:
+        return None
+    s = s.strip().rstrip("Z")
+    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):
+        try:
+            return datetime.strptime(s, fmt).replace(tzinfo=timezone.utc)
+        except ValueError:
+            pass
+    try:
+        return datetime.fromisoformat(s).replace(tzinfo=timezone.utc)
+    except Exception:
+        return None


_filetime_to_dt() won’t parse the 7-digit fractional seconds commonly found in EVTX SystemTime values (e.g. ...16:57:01.1234567Z), because %f only supports up to 6 digits and fromisoformat() has the same limitation. This will cause time-based correlation (EID 5 within 120s) to silently fail. Consider normalizing timestamps by truncating fractional seconds to 6 digits (or using a parser that supports 100ns precision) before calling strptime/fromisoformat.
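
One way to normalize such timestamps before parsing is sketched below. The helper name `parse_event_time` is hypothetical and this is not the code that was eventually merged; it pads or truncates the fraction to exactly six digits so that `datetime.fromisoformat` accepts it on Python versions before 3.11 as well.

```python
import re
from datetime import datetime, timezone

_FRACTION = re.compile(r"\.(\d+)")


def parse_event_time(value):
    """Parse an EVTX/Sysmon timestamp into an aware UTC datetime, or None."""
    if not value:
        return None
    s = value.strip()
    if s.endswith("Z"):
        s = s[:-1] + "+00:00"
    # 100ns precision gives 7 digits; normalize to the 6 that datetime supports.
    s = _FRACTION.sub(lambda m: "." + m.group(1)[:6].ljust(6, "0"), s, count=1)
    try:
        dt = datetime.fromisoformat(s)
    except ValueError:
        return None
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # Sysmon UtcTime carries no offset
    return dt.astimezone(timezone.utc)
```

Returning `None` instead of raising keeps a single unparseable record from breaking the 120-second correlation for the rest of the log.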

Copilot · 2026-04-29T19:36:57Z

+    try:
+        with zipfile.ZipFile(zip_path) as zf:
+            for info in zf.infolist():
+                if any(f.lower() in info.filename.lower() for f in name_filters):
+                    total_extracted += info.file_size
+                    if total_extracted > max_size:
+                        log.warning("evtx zip extraction exceeded %d bytes, aborting", max_size)
+                        break
+                    extracted_path = zf.extract(info, target_dir)
+                    paths.append(extracted_path)


_extract_evtx() uses ZipFile.extract() on archive members without validating the member path or rejecting symlinks. Because evtx.zip originates from the guest, this can enable zip-slip (write outside the temp dir) and symlink attacks. modules/processing/sigma.py already implements safe extraction checks for this same archive; it’d be safer to reuse that approach here (realpath prefix check, reject symlinks, and total uncompressed-size guard).

Suggested change

-    try:
-        with zipfile.ZipFile(zip_path) as zf:
-            for info in zf.infolist():
-                if any(f.lower() in info.filename.lower() for f in name_filters):
-                    total_extracted += info.file_size
-                    if total_extracted > max_size:
-                        log.warning("evtx zip extraction exceeded %d bytes, aborting", max_size)
-                        break
-                    extracted_path = zf.extract(info, target_dir)
-                    paths.append(extracted_path)
+    target_dir_real = os.path.realpath(target_dir)
+    try:
+        with zipfile.ZipFile(zip_path) as zf:
+            for info in zf.infolist():
+                if not any(f.lower() in info.filename.lower() for f in name_filters):
+                    continue
+                total_extracted += info.file_size
+                if total_extracted > max_size:
+                    log.warning("evtx zip extraction exceeded %d bytes, aborting", max_size)
+                    break
+                # Reject symlink entries from untrusted archives.
+                mode = (info.external_attr >> 16) & 0o177777
+                if (mode & 0o170000) == 0o120000:
+                    log.warning("skipping symlink in evtx zip: %s", info.filename)
+                    continue
+                extracted_path = os.path.realpath(os.path.join(target_dir_real, info.filename))
+                if extracted_path != target_dir_real and not extracted_path.startswith(target_dir_real + os.sep):
+                    log.warning("skipping unsafe evtx zip member path: %s", info.filename)
+                    continue
+                if info.is_dir():
+                    os.makedirs(extracted_path, exist_ok=True)
+                    continue
+                os.makedirs(os.path.dirname(extracted_path), exist_ok=True)
+                with zf.open(info) as src, open(extracted_path, "wb") as dst:
+                    while True:
+                        chunk = src.read(1024 * 1024)
+                        if not chunk:
+                            break
+                        dst.write(chunk)
+                paths.append(extracted_path)

Copilot · 2026-04-29T19:36:57Z

+    def _is_sample_being_analyzed(self, driver_path):
+        """Return True if the driver file IS the sample under test."""
+        if not driver_path:
+            return False
+        target = (self.results.get("target") or {}).get("file") or {}
+        sample_name = (target.get("name") or "").lower()
+        bn = _basename(driver_path)
+        if sample_name and bn == sample_name:
+            return True
+        return False


_is_sample_being_analyzed() only compares the driver basename to results.target.file.name. If the submitted sample is foo.sys and the analysis also loads a different foo.sys from elsewhere, this will incorrectly mark it as sample_under_test and cap severity. Since the driver SHA256 is already extracted (and results.target.file.sha256 exists in other modules), compare hashes when available and only fall back to basename when hashes are missing.
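
A hash-first comparison along these lines might look like the sketch below. It assumes the driver hash arrives as a Sysmon-style `Hashes` string (`ALGO=value` pairs separated by commas) and that the target metadata is a dict with `name` and `sha256` keys; both helper names are hypothetical.

```python
import ntpath


def parse_sysmon_hashes(hashes):
    """'SHA256=AB..,MD5=..' -> {'sha256': 'ab..', 'md5': '..'}"""
    out = {}
    for part in (hashes or "").split(","):
        algo, sep, value = part.partition("=")
        if sep and value:
            out[algo.strip().lower()] = value.strip().lower()
    return out


def is_sample_under_test(driver_path, driver_hashes, target_file):
    """Prefer SHA256 equality; fall back to basename only when a hash is missing."""
    target_sha = (target_file.get("sha256") or "").lower()
    driver_sha = parse_sysmon_hashes(driver_hashes).get("sha256")
    if target_sha and driver_sha:
        return target_sha == driver_sha
    sample_name = (target_file.get("name") or "").lower()
    return bool(sample_name) and ntpath.basename(driver_path or "").lower() == sample_name
```

With both hashes present, a same-named but different `foo.sys` no longer gets its severity capped.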

Copilot · 2026-04-29T19:36:58Z

+        evtx_zip = os.path.join(self.analysis_path, "evtx", "evtx.zip")
+        if not os.path.exists(evtx_zip):
+            return result
+
+        with tempfile.TemporaryDirectory() as td:
+            sysmon_paths = _extract_evtx(evtx_zip, ["Sysmon"], td)
+            system_paths = _extract_evtx(evtx_zip, ["_System.evtx"], td)
+
+            sysmon_records = []
+            for p in sysmon_paths:
+                sysmon_records.extend(_parse_evtx_records(p, {"1", "5", "6", "11"}))
+
+            system_records = []
+            for p in system_paths:
+                system_records.extend(_parse_evtx_records(p, {"7045"}))
+


The module attempts to extract System log events (EID 7045) from evtx/evtx.zip via name_filters=["_System.evtx"], but the analyzer-side EVTX dump currently only includes Microsoft-Windows-Sysmon%4Operational.evtx (see analyzer/windows/modules/auxiliary/sysmon.py). As a result, system_records/service_installs will always be empty in the default setup, and the byod_driver_service_install signature won’t fire. Consider either (a) updating the EVTX collection to include System.evtx, or (b) adjusting the processing module to detect installs via available telemetry when System.evtx isn’t present (and/or fix the filename filter to match System.evtx).

…inator, and System.evtx fallback

Address review feedback from gemini-code-assist and Copilot on PR CAPESandbox#568.

_filetime_to_dt: parse Sysmon SystemTime values with 7-digit fractional seconds (100ns precision) and ISO timezone offsets. Replace a trailing Z with "+00:00", truncate fractional seconds to 6 digits, try fromisoformat first for native offset handling, fall back to strptime.

_extract_evtx: full defense-in-depth on the analyzer-produced zip. Reject symlink entries via Unix mode bits, sanitize each entry's filename to its basename to drop absolute paths and `..` traversal, realpath-check the destination stays inside target_dir, and stream via zf.open + chunked write rather than zf.extract (avoids zf.extract's reliance on the zip's filename metadata).

_is_sample_being_analyzed: prefer SHA256 comparison when both the driver hash (Sysmon EID 6 Hashes field) and target.file.sha256 are available, which eliminates basename collisions where the sample and an unrelated driver happen to share a filename. Falls back to basename when hashes are missing.

_load_loldrivers / _load_tools: add encoding="utf-8" on open() for consistent behavior across platforms and locales.

_extract_evtx system filter widened from "_System.evtx" to "System.evtx", which matches both `System.evtx` and `1_System.evtx` / `2_System.evtx` periodic snapshots.

Synthesize service-install entries from executed_commands when the analyzer doesn't dump System.evtx. CAPE deployments that only collect Sysmon would never produce EID 7045, so byod_driver_service_install would never fire on real BYOD chains in default upstream config. The synthesis path parses `sc(.exe) create <name> binPath=...sys` patterns from the cape-monitor-captured executed_commands stream and emits synthetic kernel-mode-driver service-install entries. EID 7045 entries are preferred when present (real timestamps); synthesized entries are deduped by (service_name, .sys basename) so a deployment with both sources doesn't double-emit.

fetch_loldrivers.py: filter the sample-count tally to isinstance(e, dict) so a feed with mixed non-dict entries doesn't AttributeError during the success-path printf.

Re-verified end-to-end:
- bat-loader detonation with System.evtx present (real EID 7045): install entry source = eid7045, cmdline synthesis correctly deduped
- same task with System.evtx files stripped from evtx.zip (simulating Sysmon-only deployment): install entry source = cmdline, byod_driver_service_install still fires
- raw .sys analyst submission: driver never loads, 0 BYOD signals
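
The synthesis and dedup step described in this commit could be sketched as follows. The `source` values (`eid7045`, `cmdline`) and the dedup key come from the commit message; the regex, function names, and entry dict shape are assumptions, and real `sc` command lines have more variants than this pattern covers.

```python
import ntpath
import re

_SC_CREATE = re.compile(
    r'\bsc(?:\.exe)?\s+create\s+(?P<name>"[^"]+"|\S+)'
    r'.*?\bbinpath=\s*(?P<path>"[^"]+\.sys"|\S+\.sys)',
    re.IGNORECASE,
)


def synthesize_installs(executed_commands):
    """Turn `sc create <name> binPath= ...sys` commands into install entries."""
    installs = []
    for cmd in executed_commands:
        m = _SC_CREATE.search(cmd)
        if m:
            installs.append({
                "service_name": m.group("name").strip('"'),
                "image_path": m.group("path").strip('"'),
                "source": "cmdline",
            })
    return installs


def _key(install):
    return (install["service_name"].lower(), ntpath.basename(install["image_path"]).lower())


def merge_installs(eid7045, synthesized):
    """Keep real EID 7045 entries; add synthesized ones only when not already seen."""
    merged = list(eid7045)
    seen = {_key(i) for i in eid7045}
    for inst in synthesized:
        if _key(inst) not in seen:
            seen.add(_key(inst))
            merged.append(inst)
    return merged
```

Listing the EID 7045 entries first is what makes them win the dedup, so their real timestamps are preserved.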

…-evtx

`_parse_evtx_records` previously called python-evtx and serialized every record to XML via `record.xml()` before checking the EventID, then ran a regex over the rendered XML to extract `<Data Name="X">value</Data>` pairs. On a typical sandbox sysmon snapshot of ~7000 records that costs ~50 seconds even when ~99% of records get filtered out by EID, which is pure serialization overhead.

The Rust-backed evtx-rs library (PyPI: `evtx`, package: `from evtx import PyEvtxParser`) is also commonly present on CAPE deployments (it's what `evtx_dump` ships) and parses the same EVTX into already-structured JSON ~150x faster: sub-second on the same 7000-record input. Each record's EventData is a dict keyed by Data-name, so we also drop the regex pass and just normalize values to strings.

Try evtx-rs first; if `from evtx import PyEvtxParser` raises ImportError, fall back to the existing python-evtx + regex path unchanged. The yielded record shape (`{eid, time, data: {...}}`) is identical for both backends, so callers don't change.

Verified output equivalence on a real EVTX: 759/759 matching records, zero field-set differences, zero value differences for EID 6 driver-load events.
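
For orientation, the evtx-rs path might be shaped like the sketch below. The `{eid, time, data}` output shape is from the commit message; the JSON layout assumed here (`Event` → `System` / `EventData`, with `#attributes` and `#text` keys) and the `records_json()` record key `data` reflect how evtx-rs output commonly looks and should be treated as assumptions. The python-evtx fallback is omitted.

```python
import json


def normalize_record(raw_json, wanted_eids):
    """Map one evtx-rs JSON record to {eid, time, data}, or None if filtered out."""
    event = json.loads(raw_json).get("Event") or {}
    system = event.get("System") or {}
    eid = system.get("EventID")
    if isinstance(eid, dict):  # EventID may carry attributes such as Qualifiers
        eid = eid.get("#text")
    eid = str(eid)
    if eid not in wanted_eids:
        return None
    attrs = (system.get("TimeCreated") or {}).get("#attributes") or {}
    event_data = event.get("EventData")
    if not isinstance(event_data, dict):
        event_data = {}
    # Normalize every value to a string so both backends yield the same shape.
    data = {k: "" if v is None else str(v) for k, v in event_data.items()}
    return {"eid": eid, "time": attrs.get("SystemTime"), "data": data}


def iter_records(path, wanted_eids):
    """Yield normalized records via evtx-rs when it is installed."""
    try:
        from evtx import PyEvtxParser  # Rust-backed bindings from the `evtx` package
    except ImportError:
        return  # the python-evtx + regex fallback would go here
    for rec in PyEvtxParser(path).records_json():
        out = normalize_record(rec["data"], wanted_eids)
        if out is not None:
            yield out
```

Filtering on the EventID before touching EventData is where the speedup over the render-then-regex approach would come from.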

wmetcalf added 3 commits April 29, 2026 14:26

Fix BYOVD processing security vulnerabilities

f105cab

Copilot AI review requested due to automatic review settings April 29, 2026 19:31

Copilot started reviewing on behalf of wmetcalf April 29, 2026 19:31

gemini-code-assist Bot reviewed Apr 29, 2026

Copilot AI reviewed Apr 29, 2026

wmetcalf added 2 commits April 29, 2026 15:05

kevoreilly merged commit cc44525 into CAPESandbox:master May 1, 2026
2 checks passed

kevoreilly merged 5 commits into CAPESandbox:master from wmetcalf:byod

3 participants
