Add BYOD (Bring-Your-Own-Driver) detection module and signatures #568
…le-attributed sc create, harden feed loaders
Restore the executed_commands kill-cmdline scan in exploitation correlation.
The Sysmon EID 5 path only catches kills the kernel actually completed; many
sandbox VMs ship without Defender/EDR running, so taskkill / Stop-Process /
wmic-delete attempts against those tools never produce EID 5. The cmdline
scan covers the *attempt*, which is the BYOD signal we want regardless of
whether the target was running. Split the helper in two so the cmdline scan
runs once per analysis (not per driver) and emits as a single
scope=analysis batch — avoids duplicating findings when multiple drivers
load. The cmdline batch is only attached when at least one driver actually
loaded, so it can't fire on non-BYOD samples that happen to run taskkill.
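A minimal sketch of the once-per-analysis cmdline scan described above. The catalog subset, the regex patterns, and the function name `scan_kill_cmdlines` are illustrative assumptions, not the module's actual code.

```python
import re

# Hypothetical two-entry subset of the security-tools catalog
# (exe basename -> metadata); the real catalog is data/security_tools.json.
SECURITY_TOOLS = {
    "msmpeng.exe": {"tool": "Microsoft Defender", "vendor": "Microsoft", "category": "AV"},
    "procmon.exe": {"tool": "Process Monitor", "vendor": "Microsoft", "category": "Sysinternals"},
}

# Kill verbs named in the commit message; the exact patterns are an assumption.
KILL_RE = re.compile(r"\b(taskkill|stop-process)\b|\bwmic\b.*\bdelete\b", re.IGNORECASE)


def scan_kill_cmdlines(executed_commands, drivers_loaded):
    """Return one analysis-scoped batch of kill attempts against known tools."""
    if not drivers_loaded:
        # Gate: without a driver load this is not a BYOD signal.
        return []
    hits = []
    for cmd in executed_commands:
        if not KILL_RE.search(cmd):
            continue
        lowered = cmd.lower()
        for exe, meta in SECURITY_TOOLS.items():
            # Match with or without the .exe suffix (Stop-Process -Name omits it).
            stem = exe[:-4] if exe.endswith(".exe") else exe
            pattern = r"(?<![\w.-])" + re.escape(stem) + r"(\.exe)?(?![\w-])"
            if re.search(pattern, lowered):
                hits.append({"scope": "analysis", "cmdline": cmd, "target": exe, **meta})
    return hits
```

Because the scan takes the whole command list and a single boolean, it runs once regardless of how many drivers loaded, which is what keeps the findings from being duplicated per driver.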
Replace the path-suspicious branch on byod_driver_service_install with a
cmdline-attributable branch. The original sample_under_test/created_by_sample
gate missed the most common BYOD pattern: a packed loader extracts the .sys
before any monitored process can be attributed (so created_by_sample is
False) and the .sys basename differs from the submission name (so
sample_under_test is False). A path-based branch (Temp/AppData/etc.) would
have FP'd because the analyzer drops user-uploaded samples into the same
locations — an analyst submitting a raw .sys to scan would have triggered
byod_driver_service_install. Instead match on whether the sample's executed
commands invoked sc create / sc start against the driver's path or
basename, which catches the loader/dropper case without firing on raw
.sys submissions (those never invoke sc create).
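The cmdline-attributable branch can be sketched as a simple predicate over the executed commands. The function name mirrors the `service_invoked_by_sample` flag, but the signature and regex here are assumptions for illustration only.

```python
import ntpath
import re

# Matches "sc create" / "sc start", with or without the .exe suffix.
SC_RE = re.compile(r"\bsc(?:\.exe)?\s+(create|start)\b", re.IGNORECASE)


def service_invoked_by_sample(executed_commands, driver_path):
    """True if any executed command runs sc create/start against the driver."""
    path = (driver_path or "").lower()
    base = ntpath.basename(path)  # ntpath handles Windows paths on any host OS
    if not base:
        return False
    for cmd in executed_commands:
        lowered = cmd.lower()
        if SC_RE.search(lowered) and (path in lowered or base in lowered):
            return True
    return False
```

A raw `.sys` submission produces no `sc create` command, so the predicate stays False even when the file sits in Temp or AppData.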
Harden _load_loldrivers and _load_tools against malformed feeds: validate
top-level type after json.load and skip non-dict entries inside the parse
loop. Previously a corrupted or hostile feed would propagate
AttributeError out of the parse loop and crash the processing module.
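The hardening pattern can be sketched as follows. `load_feed` is a hypothetical stand-in for `_load_loldrivers` / `_load_tools`; the log levels follow the WARNING-for-missing, ERROR-for-malformed behavior listed in the verification notes.

```python
import json
import logging

log = logging.getLogger(__name__)


def load_feed(path):
    """Load a JSON feed expected to be a list of dicts; never raise."""
    try:
        with open(path, encoding="utf-8") as f:
            parsed = json.load(f)
    except FileNotFoundError:
        log.warning("feed missing: %s", path)
        return []
    except (OSError, ValueError) as exc:
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError.
        log.error("feed unreadable: %s (%s)", path, exc)
        return []
    if not isinstance(parsed, list):
        log.error("feed top-level type is %s, expected list", type(parsed).__name__)
        return []
    # Skip junk entries instead of letting .get() raise AttributeError later.
    return [entry for entry in parsed if isinstance(entry, dict)]
```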
Verified end-to-end against:
- bat-loader detonation (RTCore64.sys + sc create) — all 4 BYOD sigs
fire, single-batch cmdline kill correlation
- raw .sys analyst submission — driver never loads, zero sigs fire
- missing data/loldrivers.json — graceful WARNING, 3 of 4 sigs still
fire via heuristics
- malformed feed (dict instead of list) — graceful ERROR, 3 of 4 sigs
still fire
- malformed feed (list with junk mixed in) — junk entries skipped,
valid entries parsed, all 4 sigs fire
…d feed fetcher
Adds four BYOD (Bring-Your-Own-Driver) detection signatures that consume
the loldrivers processing module's output:
byod_loldrivers_match
Sample loaded a known vulnerable or malicious driver from the
LOLDrivers catalog. Severity scales with match confidence: critical
on SHA256, high on (signer, filename), medium on filename only.
Capped at 3 (informational) when the driver IS the analyst-submitted
sample (sample_under_test).
byod_novel_driver
Sample loaded an unsigned/non-system driver — possible novel BYOD
not yet in the catalog. Fires on a combination of: non-system
user-writable path, missing/invalid signature, dropper lineage from
a monitored process, and matching kernel-driver service install.
byod_post_load_exploitation
A driver load was followed by termination of EDR/AV/sandbox tooling
via either (a) Sysmon EID 5 within a 120s window of the driver
load, or (b) a kill cmdline (taskkill / Stop-Process / wmic delete)
captured during the analysis when at least one driver loaded.
byod_driver_service_install
Sample-attributed kernel-mode driver service install. Gates on
sample_under_test, created_by_sample (Sysmon EID 11 from monitored
process), or service_invoked_by_sample (executed_commands contains
sc create / sc start matching the driver). Legitimate Windows
kernel-driver installs during analysis are filtered.
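The tiered confidence used by byod_loldrivers_match can be illustrated with a small lookup. The index keys follow the `by_sha256` / `by_signer_name` / `by_name` cache layout visible in the review snippets, but the key shapes (lowercase strings, a `(signer, filename)` tuple) and the function name are assumptions. The sample_under_test cap is not modeled here.

```python
def match_driver(index, sha256=None, signer=None, filename=None):
    """Return (tier, severity) for the strongest catalog match, or (None, None)."""
    sha = (sha256 or "").lower()
    sig = (signer or "").lower()
    name = (filename or "").lower()
    if sha and sha in index["by_sha256"]:
        return "sha256", "critical"
    if sig and name and (sig, name) in index["by_signer_name"]:
        return "signer_name", "high"
    if name and name in index["by_name"]:
        return "name", "medium"
    return None, None
```

Checking the tiers from strongest to weakest means a hash hit is never downgraded by a weaker filename-only match on the same driver.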
data/security_tools.json — 406-entry catalog of EDR/AV/Network/Forensics
/Sysinternals/RE/Sandbox tools mapping exe basename → {tool, vendor,
category}. Used by the post-load exploitation correlation to attribute
kill targets to a known security category.
utils/fetch_loldrivers.py — admin-run fetcher for the LOLDrivers
community feed at https://www.loldrivers.io/api/drivers.json. Streams in
1 MiB chunks to a temp file, validates JSON shape and presence of at
least one KnownVulnerableSamples entry, then atomically swaps in via
os.replace. The feed (~30 MB) is too large to commit to git, so the
gitignore excludes data/loldrivers.json.
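A rough sketch of the write-validate-swap sequence, with the network part left out so it can run offline. `install_feed` and its error handling are assumptions; only the temp-file, shape check, and `os.replace` steps come from the description above.

```python
import json
import os


def _sample_count(entry):
    """Count KnownVulnerableSamples on one entry, tolerating odd shapes."""
    samples = entry.get("KnownVulnerableSamples")
    return len(samples) if isinstance(samples, list) else 0


def install_feed(chunks, dest):
    """Write byte chunks to <dest>.tmp, validate the JSON, then swap it in."""
    tmp = dest + ".tmp"
    with open(tmp, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
    try:
        with open(tmp, encoding="utf-8") as f:
            parsed = json.load(f)
        if not isinstance(parsed, list):
            raise ValueError("top-level JSON value is not a list")
        samples = sum(_sample_count(e) for e in parsed if isinstance(e, dict))
        if samples == 0:
            raise ValueError("feed contained no entries with KnownVulnerableSamples")
    except ValueError:
        os.unlink(tmp)  # never leave a rejected download behind
        raise
    os.replace(tmp, dest)  # atomic when tmp and dest share a filesystem
    return samples
```

In a real fetcher the `chunks` iterable would come from the HTTP response, read 1 MiB at a time. Because validation happens before `os.replace`, a bad download cannot clobber a previously installed feed.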
End-to-end validated on a CAPE detonation of RTCore64.sys (a well-known
LOLDriver) wrapped in a batch-file loader that did sc create /
sc start: all four signatures fired with the expected severities and
attribution data, while a control submission of the same .sys without
the loader produced zero BYOD signals (driver never loads, no signal).
Code Review
This pull request introduces a BYOD (Bring-Your-Own-Driver) detection system, including a processing module for Sysmon/System log analysis, a suite of signatures, and a utility to fetch the LOLDrivers catalog. Feedback addresses a potential parsing failure with high-precision Sysmon timestamps, recommends explicit UTF-8 encoding for file I/O, and suggests sanitizing ZIP entry paths to prevent traversal vulnerabilities.
```
    if not s:
        return None
    s = s.strip().rstrip("Z")
    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):
```
Sysmon timestamps often include more than 6 digits of fractional seconds (e.g., 7 digits). datetime.strptime with %f only supports up to 6 digits, and fromisoformat (in Python versions prior to 3.11) also has strict limits. This will cause a ValueError when parsing such records. Truncating the fractional seconds to 6 digits before parsing is a safer approach.
```
-    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):
+    if "." in s:
+        base, frac = s.split(".", 1)
+        s = f"{base}.{frac[:6]}"
+    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):
```
```
        _LOLD_CACHE = {"by_sha256": {}, "by_signer_name": {}, "by_name": {}, "entries": 0}
        return _LOLD_CACHE
    try:
        with open(LOLDRIVERS_PATH) as f:
```
```
        _TOOLS_CACHE = {}
        return _TOOLS_CACHE
    try:
        with open(SECURITY_TOOLS_PATH) as f:
```
```
    if total_extracted > max_size:
        log.warning("evtx zip extraction exceeded %d bytes, aborting", max_size)
        break
    extracted_path = zf.extract(info, target_dir)
```
zipfile.extract() does not inherently prevent path traversal if the ZIP file contains absolute paths or paths with `..`. While the source of the ZIP is internal to the sandbox, it is a security best practice to sanitize the filename using os.path.basename to ensure files are extracted only into the target directory.
```
-    extracted_path = zf.extract(info, target_dir)
+    info.filename = os.path.basename(info.filename)
+    extracted_path = zf.extract(info, target_dir)
```
Pull request overview
Adds BYOD (Bring-Your-Own-Driver / BYOVD) detection to the analysis pipeline by introducing a processing module that correlates EVTX driver-load telemetry against the LOLDrivers catalog and post-load “security tool kill” activity, plus signatures that surface these findings.
Changes:
- New processing module (`loldrivers`) that parses `evtx/evtx.zip`, matches loaded drivers against a LOLDrivers feed, and correlates post-load exploitation activity using a shipped security-tools catalog.
- New BYOD signature pack (4 signatures) consuming `results["loldrivers"]`.
- New admin fetcher for the LOLDrivers feed + `.gitignore` update; adds `data/security_tools.json` catalog.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| utils/fetch_loldrivers.py | Adds a downloader/validator for the external LOLDrivers JSON feed. |
| modules/processing/loldrivers.py | Implements EVTX parsing, tiered matching, heuristics, and exploitation correlation for BYOD. |
| modules/signatures/all/byod.py | Adds 4 signatures to emit detections based on results["loldrivers"]. |
| data/security_tools.json | Adds a shipped exe→tool/vendor/category catalog for post-load correlation. |
| .gitignore | Ignores the large fetched data/loldrivers.json feed file. |
```
    samples = sum(len(e.get("KnownVulnerableSamples") or []) for e in parsed)
    if not parsed or samples == 0:
        os.unlink(tmp)
        print("error: feed contained no entries with KnownVulnerableSamples", file=sys.stderr)
        return 5
```
samples = sum(len(e.get(...)) for e in parsed) assumes every element is a dict; if the feed contains non-dict entries this raises AttributeError and the fetcher crashes even though you handle malformed entries elsewhere. Filter to isinstance(e, dict) (or default {}) when counting samples so the script can still reject/accept based on actual usable entries.
```
def _filetime_to_dt(s):
    """Parse Sysmon UtcTime (e.g. '2026-04-28 16:57:01.123') → aware UTC datetime."""
    if not s:
        return None
    s = s.strip().rstrip("Z")
    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):
        try:
            return datetime.strptime(s, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            pass
    try:
        return datetime.fromisoformat(s).replace(tzinfo=timezone.utc)
    except Exception:
        return None
```
_filetime_to_dt() won’t parse the 7-digit fractional seconds commonly found in EVTX SystemTime values (e.g. ...16:57:01.1234567Z), because %f only supports up to 6 digits and fromisoformat() has the same limitation. This will cause time-based correlation (EID 5 within 120s) to silently fail. Consider normalizing timestamps by truncating fractional seconds to 6 digits (or using a parser that supports 100ns precision) before calling strptime/fromisoformat.
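One possible normalization along the lines suggested here, written as a standalone sketch. `parse_utc` is a hypothetical name; padding short fractions to six digits is an extra assumption so that `fromisoformat` also accepts them on Python versions before 3.11.

```python
import re
from datetime import datetime, timezone

_FRAC_RE = re.compile(r"\.(\d+)")


def parse_utc(s):
    """Parse a Sysmon/EVTX timestamp into an aware UTC datetime, or None."""
    if not s:
        return None
    s = s.strip()
    if s.endswith(("Z", "z")):
        s = s[:-1] + "+00:00"
    # Clamp 100 ns precision (7 digits) to microseconds and pad short fractions.
    s = _FRAC_RE.sub(lambda m: "." + m.group(1)[:6].ljust(6, "0"), s, count=1)
    try:
        dt = datetime.fromisoformat(s)
    except ValueError:
        return None
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)  # Sysmon UtcTime carries no offset
    return dt.astimezone(timezone.utc)
```

Returning None on failure keeps the caller's behavior unchanged, but the time-window correlation only fails for genuinely unparseable values rather than for every 7-digit timestamp.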
```
    try:
        with zipfile.ZipFile(zip_path) as zf:
            for info in zf.infolist():
                if any(f.lower() in info.filename.lower() for f in name_filters):
                    total_extracted += info.file_size
                    if total_extracted > max_size:
                        log.warning("evtx zip extraction exceeded %d bytes, aborting", max_size)
                        break
                    extracted_path = zf.extract(info, target_dir)
                    paths.append(extracted_path)
```
_extract_evtx() uses ZipFile.extract() on archive members without validating the member path or rejecting symlinks. Because evtx.zip originates from the guest, this can enable zip-slip (write outside the temp dir) and symlink attacks. modules/processing/sigma.py already implements safe extraction checks for this same archive; it’d be safer to reuse that approach here (realpath prefix check, reject symlinks, and total uncompressed-size guard).
```
-    try:
-        with zipfile.ZipFile(zip_path) as zf:
-            for info in zf.infolist():
-                if any(f.lower() in info.filename.lower() for f in name_filters):
-                    total_extracted += info.file_size
-                    if total_extracted > max_size:
-                        log.warning("evtx zip extraction exceeded %d bytes, aborting", max_size)
-                        break
-                    extracted_path = zf.extract(info, target_dir)
-                    paths.append(extracted_path)
+    target_dir_real = os.path.realpath(target_dir)
+    try:
+        with zipfile.ZipFile(zip_path) as zf:
+            for info in zf.infolist():
+                if not any(f.lower() in info.filename.lower() for f in name_filters):
+                    continue
+                total_extracted += info.file_size
+                if total_extracted > max_size:
+                    log.warning("evtx zip extraction exceeded %d bytes, aborting", max_size)
+                    break
+                # Reject symlink entries from untrusted archives.
+                mode = (info.external_attr >> 16) & 0o177777
+                if (mode & 0o170000) == 0o120000:
+                    log.warning("skipping symlink in evtx zip: %s", info.filename)
+                    continue
+                extracted_path = os.path.realpath(os.path.join(target_dir_real, info.filename))
+                if extracted_path != target_dir_real and not extracted_path.startswith(target_dir_real + os.sep):
+                    log.warning("skipping unsafe evtx zip member path: %s", info.filename)
+                    continue
+                if info.is_dir():
+                    os.makedirs(extracted_path, exist_ok=True)
+                    continue
+                os.makedirs(os.path.dirname(extracted_path), exist_ok=True)
+                with zf.open(info) as src, open(extracted_path, "wb") as dst:
+                    while True:
+                        chunk = src.read(1024 * 1024)
+                        if not chunk:
+                            break
+                        dst.write(chunk)
+                paths.append(extracted_path)
```
```
    def _is_sample_being_analyzed(self, driver_path):
        """Return True if the driver file IS the sample under test."""
        if not driver_path:
            return False
        target = (self.results.get("target") or {}).get("file") or {}
        sample_name = (target.get("name") or "").lower()
        bn = _basename(driver_path)
        if sample_name and bn == sample_name:
            return True
        return False
```
_is_sample_being_analyzed() only compares the driver basename to results.target.file.name. If the submitted sample is foo.sys and the analysis also loads a different foo.sys from elsewhere, this will incorrectly mark it as sample_under_test and cap severity. Since the driver SHA256 is already extracted (and results.target.file.sha256 exists in other modules), compare hashes when available and only fall back to basename when hashes are missing.
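A hash-first comparison along these lines might look like the sketch below. It is written as a free function with hypothetical parameter names; `target_file` stands in for `results["target"]["file"]`.

```python
import ntpath


def is_sample_under_test(driver_path, driver_sha256, target_file):
    """Prefer SHA256 equality; fall back to basename only when a hash is missing."""
    sample_sha = (target_file.get("sha256") or "").lower()
    driver_sha = (driver_sha256 or "").lower()
    if sample_sha and driver_sha:
        # Both hashes known: the basename is irrelevant, so collisions cannot match.
        return sample_sha == driver_sha
    sample_name = (target_file.get("name") or "").lower()
    driver_name = ntpath.basename(driver_path or "").lower()
    return bool(sample_name) and driver_name == sample_name
```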
```
        evtx_zip = os.path.join(self.analysis_path, "evtx", "evtx.zip")
        if not os.path.exists(evtx_zip):
            return result

        with tempfile.TemporaryDirectory() as td:
            sysmon_paths = _extract_evtx(evtx_zip, ["Sysmon"], td)
            system_paths = _extract_evtx(evtx_zip, ["_System.evtx"], td)

            sysmon_records = []
            for p in sysmon_paths:
                sysmon_records.extend(_parse_evtx_records(p, {"1", "5", "6", "11"}))

            system_records = []
            for p in system_paths:
                system_records.extend(_parse_evtx_records(p, {"7045"}))
```
The module attempts to extract System log events (EID 7045) from evtx/evtx.zip via name_filters=["_System.evtx"], but the analyzer-side EVTX dump currently only includes Microsoft-Windows-Sysmon%4Operational.evtx (see analyzer/windows/modules/auxiliary/sysmon.py). As a result, system_records/service_installs will always be empty in the default setup, and the byod_driver_service_install signature won’t fire. Consider either (a) updating the EVTX collection to include System.evtx, or (b) adjusting the processing module to detect installs via available telemetry when System.evtx isn’t present (and/or fix the filename filter to match System.evtx).
…inator, and System.evtx fallback
Address review feedback from gemini-code-assist and Copilot on PR CAPESandbox#568.
_filetime_to_dt: parse Sysmon SystemTime values with 7-digit fractional seconds (100ns precision) and ISO timezone offsets. Convert a trailing Z to "+00:00", truncate fractional seconds to 6 digits, try fromisoformat first for native offset handling, and fall back to strptime.
_extract_evtx: full defense-in-depth on the analyzer-produced zip. Reject symlink entries via Unix mode bits, sanitize each entry's filename to its basename to drop absolute paths and `..` traversal, realpath-check that the destination stays inside target_dir, and stream via zf.open + chunked write rather than zf.extract (avoids zf.extract's reliance on the zip's filename metadata).
_is_sample_being_analyzed: prefer SHA256 comparison when both the driver hash (Sysmon EID 6 Hashes field) and target.file.sha256 are available. This eliminates basename collisions where the sample and an unrelated driver happen to share a filename. Falls back to basename when hashes are missing.
_load_loldrivers / _load_tools: add encoding="utf-8" on open() for consistent behavior across platforms and locales.
_extract_evtx system filter widened from "_System.evtx" to "System.evtx", which matches both `System.evtx` and the `1_System.evtx` / `2_System.evtx` periodic snapshots.
Synthesize service-install entries from executed_commands when the analyzer doesn't dump System.evtx. CAPE deployments that only collect Sysmon would never produce EID 7045, so byod_driver_service_install would never fire on real BYOD chains in default upstream config. The synthesis path parses `sc(.exe) create <name> binPath=...sys` patterns from the cape-monitor-captured executed_commands stream and emits synthetic kernel-mode-driver service-install entries. EID 7045 entries are preferred when present (real timestamps); synthesized entries are deduped by (service_name, .sys basename) so a deployment with both sources doesn't double-emit.
fetch_loldrivers.py: filter the sample-count tally to isinstance(e, dict) so a feed with mixed non-dict entries doesn't raise AttributeError during the success-path print.
Re-verified end-to-end:
- bat-loader detonation with System.evtx present (real EID 7045): install entry source = eid7045, cmdline synthesis correctly deduped
- same task with System.evtx files stripped from evtx.zip (simulating a Sysmon-only deployment): install entry source = cmdline, byod_driver_service_install still fires
- raw .sys analyst submission: driver never loads, 0 BYOD signals
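The synthesis and dedupe step can be sketched as below. The regex, the entry fields (`service_name`, `image_path`, `source`), and the function name are illustrative assumptions; only the `sc create ... binPath=...sys` pattern and the (service_name, .sys basename) dedupe key come from the commit message.

```python
import ntpath
import re

# sc(.exe) create <name> ... binPath= <path ending in .sys>, quoted or bare.
SC_CREATE_RE = re.compile(
    r'\bsc(?:\.exe)?\s+create\s+(?P<name>"[^"]+"|\S+)'
    r'.*?\bbinpath=\s*(?P<path>"[^"]+\.sys"|\S+\.sys)',
    re.IGNORECASE,
)


def _key(service_name, image_path):
    return service_name.lower(), ntpath.basename(image_path).lower()


def synthesize_installs(executed_commands, eid7045_installs):
    """Keep real EID 7045 entries and add cmdline-derived ones not already seen."""
    seen = {_key(i["service_name"], i["image_path"]) for i in eid7045_installs}
    out = list(eid7045_installs)
    for cmd in executed_commands:
        m = SC_CREATE_RE.search(cmd)
        if not m:
            continue
        name = m.group("name").strip('"')
        path = m.group("path").strip('"')
        key = _key(name, path)
        if key in seen:
            continue  # already covered by EID 7045 or an earlier command
        seen.add(key)
        out.append({"service_name": name, "image_path": path, "source": "cmdline"})
    return out
```

Seeding `seen` from the EID 7045 entries is what makes the real events win when both sources are available.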
…-evtx
`_parse_evtx_records` previously called python-evtx and serialized every
record to XML via `record.xml()` before checking the EventID, then ran a
regex over the rendered XML to extract `<Data Name="X">value</Data>`
pairs. On a typical sandbox sysmon snapshot of ~7000 records that costs
~50 seconds even when ~99% of records get filtered out by EID — pure
serialization overhead.
The Rust-backed evtx-rs library (PyPI: `evtx`, imported via `from evtx
import PyEvtxParser`) is also commonly present on CAPE deployments
(it's what `evtx_dump` ships) and parses the same EVTX into already-
structured JSON ~150x faster: sub-second on the same 7000-record
input. Each record's EventData is a dict keyed by Data-name, so we
also drop the regex pass and just normalize values to strings.
Try evtx-rs first; if `from evtx import PyEvtxParser` raises
ImportError, fall back to the existing python-evtx + regex path
unchanged. The yielded record shape (`{eid, time, data: {...}}`) is
identical for both backends, so callers don't change. Verified
output equivalence on a real EVTX: 759/759 matching records, zero
field-set differences, zero value differences for EID 6 driver-load
events.
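A sketch of the evtx-rs path under stated assumptions: `PyEvtxParser(path).records_json()` is assumed to yield dicts whose `data` field is a JSON string shaped like `{"Event": {"System": ..., "EventData": ...}}`, and `normalize_record` / `iter_records` are hypothetical names. The python-evtx fallback is only indicated, not reproduced.

```python
import json

try:
    from evtx import PyEvtxParser  # evtx-rs bindings, if installed
    HAVE_EVTX_RS = True
except ImportError:
    PyEvtxParser = None
    HAVE_EVTX_RS = False


def normalize_record(record_json, wanted_eids):
    """Turn one JSON record into {eid, time, data}, or None if filtered out."""
    event = json.loads(record_json).get("Event") or {}
    system = event.get("System") or {}
    eid = system.get("EventID")
    if isinstance(eid, dict):  # EventID may be wrapped when it has attributes
        eid = eid.get("#text")
    eid = str(eid)
    if eid not in wanted_eids:
        return None
    created = (system.get("TimeCreated") or {}).get("#attributes") or {}
    data = event.get("EventData")
    if not isinstance(data, dict):
        data = {}
    # Normalize every value to a string so both backends yield the same shape.
    fields = {k: "" if v is None else str(v) for k, v in data.items()}
    return {"eid": eid, "time": created.get("SystemTime"), "data": fields}


def iter_records(path, wanted_eids):
    """Yield normalized records via evtx-rs; the slow path would be the fallback."""
    if not HAVE_EVTX_RS:
        raise RuntimeError("evtx-rs not installed; fall back to python-evtx here")
    for rec in PyEvtxParser(path).records_json():
        out = normalize_record(rec["data"], wanted_eids)
        if out is not None:
            yield out
```

Filtering on the EventID before building the field dict keeps the per-record cost low for the large share of records that are discarded.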
Overview
Adds detection for Bring-Your-Own-Driver (BYOD / BYOVD) attacks where a sample loads a kernel driver — typically a known-vulnerable signed driver from the LOLDrivers catalog — to bypass EDR/AV protection.
Two parts:
- Processing module (`modules/processing/loldrivers.py`) walks the per-task Sysmon EVTX (EID 1/5/6/11) and System log (EID 7045) extracted from `evtx/evtx.zip`, classifies driver loads against the LOLDrivers feed, and correlates post-load exploitation activity against a security-tools catalog.
- Four signatures (`modules/signatures/all/byod.py`) consuming that output:
  - `byod_loldrivers_match`
  - `byod_novel_driver`
  - `byod_post_load_exploitation`: kill cmdlines (`taskkill` / `Stop-Process` / `wmic delete`) attributed against a 406-entry security-tools catalog
  - `byod_driver_service_install`

Files
- `modules/processing/loldrivers.py`
- `modules/signatures/all/byod.py`
- `data/security_tools.json`
- `utils/fetch_loldrivers.py`
- `.gitignore`: excludes `data/loldrivers.json` (~30 MB, too large for git)

False-positive resistance
`byod_driver_service_install` gates on three sample-attributable signals (any one is sufficient):
- `sample_under_test`: driver basename equals the submitted sample's basename
- `created_by_sample`: Sysmon EID 11 ties the `.sys` creation to a monitored process
- `service_invoked_by_sample`: `executed_commands` contains `sc create` / `sc start` against the driver's path or basename

Path-based gating (Temp/AppData/etc.) was deliberately rejected: the analyzer drops user-uploaded samples into the same paths it would match, so an analyst submitting a raw `.sys` to scan would FP. The cmdline-attribution branch catches the loader/dropper pattern (where the `.sys` is extracted by the analyzer rather than written by a monitored process, and the basename differs from the submission name) without firing on raw `.sys` submissions.
The post-load cmdline kill correlation runs once per analysis (not per driver-load) and only emits when at least one driver actually loaded, so a non-BYOD sample running `taskkill` in isolation won't trigger `byod_post_load_exploitation`.

Installation
The LOLDrivers feed (~30 MB, ~620 driver entries, ~2000 sample hashes) is too large to commit. Fetch on demand:
```
poetry run python utils/fetch_loldrivers.py
```
The fetcher streams in 1 MiB chunks to `<dest>.tmp`, validates the JSON shape and presence of `KnownVulnerableSamples`, and atomically swaps in via `os.replace`. The `data/security_tools.json` catalog ships in the PR.
If the feed is missing or malformed at processing time, the module logs a `WARNING`/`ERROR` and disables itself gracefully; the other three signatures continue to work via heuristics.
Enable in `processing.conf`:
```
[loldrivers]
enabled = yes
```
Test plan
End-to-end validated on a live CAPE deployment:
- `RTCore64.sys` (a well-known LOLDriver, signed by Micro-Star International) wrapped in a `.bat` loader doing `sc create RTCore64Test binPath= "...\Temp\RTCore64.sys" type= kernel` + `sc start` + `taskkill /F /IM MsMpEng.exe`: all 4 BYOD signatures fired with correct attribution and severities
- Raw `.sys` analyst submission (no loader): driver never loads, 0 BYOD signals (correct, no FP)
- Missing `data/loldrivers.json`: graceful `WARNING`, 3 of 4 sigs still fire via heuristics
- Malformed feed: graceful `ERROR`, 3 of 4 sigs still fire

Notes
- Uses `python-evtx` (already a CAPE dependency).
- `order = 11` to run after `sysmon` (10) and before signatures.