Generate the VDR from per-CVE source files #26
Open
ppkarwasz wants to merge 5 commits into
Adds a minimal `uv` project at the repo root so Python tooling scripts can run against a locked, reproducible environment alongside the existing Maven-based site build.
The layout consists of:
- `pyproject.toml`: declares `requires-python >= 3.11` and the tooling dependencies (currently just `lxml`). `tool.uv.package = false` since this repo is not itself an installable Python package.
- `.python-version`: pins the interpreter version uv picks up.
- `uv.lock`: committed so dependency versions do not drift between contributors or CI runs.
Scripts placed under `scripts/` may also carry PEP 723 inline metadata for standalone invocation via `uv run <script>`.
To bootstrap the environment, download `uv` and run:
uv sync
The `vdr_split` script deterministically splits our monolithic VDR into one file per vulnerability. The files are stored at `src/vulnerabilities/CVE-XXXX-YYYY/<bom-ref>.cdx.xml`, where `bom-ref` is the slug we used as a BOM reference.
The Conan package for Log4cxx has no identity of its own, so its vulnerabilities are recorded in the same file as the corresponding Log4cxx ones.
A template for new CycloneDX files is stored in `src/vulnerabilities/template.cdx.xml`.
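The path convention above can be sketched as a small helper. This is only an illustration, not the code in `vdr_split`: the function name `source_path` and the `FOLDED_REFS` table are hypothetical, and the `log4cxx-conan` and `log4cxx` slugs are taken from the PR description.

```python
from pathlib import PurePosixPath

# Hypothetical table: components without an identity of their own are
# recorded in the file of the component they belong to.
FOLDED_REFS = {"log4cxx-conan": "log4cxx"}


def source_path(cve_id: str, bom_ref: str) -> PurePosixPath:
    """Return the per-CVE source file for a (CVE, component) pair."""
    owner = FOLDED_REFS.get(bom_ref, bom_ref)
    return PurePosixPath("src/vulnerabilities") / cve_id / f"{owner}.cdx.xml"
```

With this mapping, a `log4cxx-conan` vulnerability resolves to the same path as the `log4cxx` one for the same CVE, so both end up in one file.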
The `vdr_aggregate` script performs the reverse operation of `vdr_split`:
- It merges all the CycloneDX documents in `src/vulnerabilities`.
- If the result differs from the committed one, it bumps the version. The comparison ignores whitespace.
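A minimal sketch of the merge step, under several assumptions: it uses the standard library's `xml.etree.ElementTree` rather than `lxml`, the namespace URI is assumed to follow the usual CycloneDX pattern for 1.7, and the names `aggregate`, `cve_key` and `q` are hypothetical. It keeps the first occurrence of each component (by `bom-ref`) and each vulnerability (by CVE id); the real script may merge duplicate entries instead of dropping them. Timestamps are compared as strings, which is only correct if they all use the same ISO 8601 UTC format.

```python
import xml.etree.ElementTree as ET

NS = "http://cyclonedx.org/schema/bom/1.7"  # assumed namespace URI


def q(tag: str) -> str:
    """Qualify a tag name with the CycloneDX namespace."""
    return f"{{{NS}}}{tag}"


def cve_key(cve_id: str):
    """Sort key giving (year DESC, number DESC) for ids like CVE-2021-44228."""
    _, year, number = cve_id.split("-")
    return (-int(year), -int(number))


def aggregate(documents):
    """Merge per-CVE documents (XML strings) into one <bom> element."""
    components = {}
    vulnerabilities = {}
    for text in documents:
        root = ET.fromstring(text)
        # Covers metadata.component and any extra <components> entries.
        for comp in root.iter(q("component")):
            ref = comp.get("bom-ref")
            if ref:
                components.setdefault(ref, comp)
        for vuln in root.iter(q("vulnerability")):
            cve = vuln.findtext(q("id"))
            if cve:
                vulnerabilities.setdefault(cve, vuln)

    bom = ET.Element(q("bom"))
    metadata = ET.SubElement(bom, q("metadata"))
    updated = [v.findtext(q("updated")) for v in vulnerabilities.values()]
    latest = max((u for u in updated if u), default=None)
    if latest is not None:
        # metadata.timestamp is the max of every vulnerability.updated.
        ET.SubElement(metadata, q("timestamp")).text = latest

    comps = ET.SubElement(bom, q("components"))
    comps.extend([components[ref] for ref in sorted(components)])
    vulns = ET.SubElement(bom, q("vulnerabilities"))
    vulns.extend([vulnerabilities[c] for c in sorted(vulnerabilities, key=cve_key)])
    return bom
```

Sorting components with plain string comparison places `log4j-1.2-api` before `log4j-core`, which matches the ordering change mentioned in the reviewer notes.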
This change replaces our hand-maintained `src/site/static/cyclonedx/vdr.xml` with a generated artifact assembled from one source file per `(CVE, component)` pair under `src/vulnerabilities/`.
To regenerate the VDR after editing any per-CVE file, run `uv run scripts/vdr_aggregate.py`.
To split an existing monolithic VDR back into per-CVE files (one-time migration, or recovery):
Why
The current hand-edited VDR is becoming hard to maintain reliably:
- […] `metadata.timestamp` to the max of every `vulnerability.updated`. The aggregator now computes this automatically.
- […] `(year DESC, number DESC)`, components alphabetically by `bom-ref`.
How it works
Each vulnerability lives in its own file at `src/vulnerabilities/<CVE-id>/<component>.cdx.xml`: a self-contained CycloneDX 1.7 BOM with the affected component as `metadata.component` and a single `<vulnerability>` element. `log4cxx-conan` never gets its own file; its vulnerabilities ride along in the corresponding `log4cxx` file via a `<components>` entry plus a `<dependencies>` edge.
`vdr_aggregate.py` walks every per-CVE file, dedupes components by `bom-ref`, dedupes vulnerabilities by CVE id, and emits the monolithic `vdr.xml`. `vdr_split.py` performs the inverse for migration. Both scripts share `vdr_common.py` (constants, namespace handling, comparison, write-if-changed orchestration).
Idempotent writes
Both scripts read the existing output's `serialNumber` and `version`, build a candidate at the existing version, and compare it to the file on disk via a structural comparison that ignores comments, inter-element whitespace, and namespace prefixes. If the candidate is equivalent, the file is left untouched: no diff, no version churn. If it differs, the version is bumped by one and the file is rewritten.
This means re-running either script in a clean tree is a no-op, and a content edit produces exactly one version bump per affected file.
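The write-if-changed logic can be sketched roughly as follows. This is not the code in `vdr_common.py`: the names `canonical` and `next_output` are hypothetical, and the sketch uses the standard library's `xml.etree.ElementTree`, whose default parser already drops comments and expands namespace prefixes into URIs. Whitespace is handled by stripping element text and ignoring tails, which is a simplification.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def canonical(elem):
    """Reduce an element to nested tuples that ignore formatting details."""
    text = (elem.text or "").strip()
    attrs = tuple(sorted(elem.attrib.items()))
    return (elem.tag, attrs, text, tuple(canonical(child) for child in elem))


def next_output(existing_xml: str, candidate: ET.Element) -> Optional[str]:
    """Return the new file content, or None if the file can stay untouched."""
    existing = ET.fromstring(existing_xml)
    version = int(existing.get("version", "1"))
    serial = existing.get("serialNumber")
    if serial is not None:
        candidate.set("serialNumber", serial)
    # Build the candidate at the existing version before comparing.
    candidate.set("version", str(version))
    if canonical(candidate) == canonical(existing):
        return None  # equivalent: no diff, no version churn
    # Content changed: bump the version exactly once and rewrite.
    candidate.set("version", str(version + 1))
    return ET.tostring(candidate, encoding="unicode")
```

A caller would write the returned string to disk only when it is not `None`, which is what makes a second run on a clean tree a no-op.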
Why split per (CVE, component), beyond automation
- […] `metadata.component`, since it covers many subjects. Per-component files let `metadata.component` name the analyzing project (e.g. `log4j-core`), with the vulnerable dependency in `vulnerability.affects` and the dependency path in `<dependencies>`. That's the shape required for VEX, CSAF, and OpenVEX, so we can grow into those formats without restructuring our source of truth.
- […] `_vulnerabilities.adoc` be assembled from one generated partial per CVE, instead of a single monolithic AsciiDoc. We can also later decide to have a separate page per CVE.
Repository layout
Notes for reviewers
- `vdr.xml` is now CycloneDX 1.7 (was 1.6). The bump is intentional: 1.x is semantically versioned, and the structural change is just a namespace rename.
- […] `bom-ref`, so `log4j-1.2-api` now precedes `log4j-core`.
- […] `uv run scripts/vdr_aggregate.py` for updates.