Generate the VDR from per-CVE source files by ppkarwasz · Pull Request #26 · apache/logging-site

ppkarwasz · 2026-04-24T12:54:09Z

This change replaces our hand-maintained src/site/static/cyclonedx/vdr.xml with a generated artifact assembled from one source file per (CVE, component) pair under src/vulnerabilities/.

To regenerate the VDR after editing any per-CVE file:

uv run scripts/vdr_aggregate.py

To split an existing monolithic VDR back into per-CVE files (one-time migration, or recovery):

uv run scripts/vdr_split.py

Why

The current hand-edited VDR is becoming hard to maintain reliably:

Timestamps drift. In the latest release we forgot to bump metadata.timestamp to the max of every vulnerability.updated. The aggregator now computes this automatically.
Ordering is hard to keep straight. Vulnerabilities in the file are not strictly sorted, and components are listed in an ad-hoc order. The aggregator enforces deterministic order: vulnerabilities by (year DESC, number DESC), components alphabetically by bom-ref.
Merge conflicts on simultaneous additions. Adding seven vulnerabilities in a single batch (as in the most recent disclosure) is error-prone. Per-CVE files let contributors add or edit vulnerabilities independently.
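
The ordering and timestamp rules above can be sketched in a few lines of Python. This is an illustration only, not the code in vdr_aggregate.py: the names `cve_sort_key` and `latest_timestamp` are hypothetical, and the sketch assumes CVE ids of the form CVE-YYYY-NNNN and ISO 8601 timestamps with an explicit UTC offset.

```python
import re
from datetime import datetime

# Assumed id shape: "CVE-<4-digit year>-<sequence number>".
CVE_ID = re.compile(r"^CVE-(\d{4})-(\d+)$")


def cve_sort_key(cve_id: str) -> tuple[int, int]:
    """Key that yields (year DESC, number DESC) under an ascending sort."""
    match = CVE_ID.match(cve_id)
    if match is None:
        raise ValueError(f"not a CVE id: {cve_id!r}")
    year, number = (int(group) for group in match.groups())
    # Negating both parts reverses the order; comparing integers avoids
    # the lexicographic trap where "4104" would sort after "44228".
    return (-year, -number)


def latest_timestamp(updated: list[str]) -> str:
    """Return the most recent of the given ISO 8601 timestamps."""
    return max(updated, key=datetime.fromisoformat)


if __name__ == "__main__":
    ids = ["CVE-2017-5645", "CVE-2021-44228", "CVE-2021-4104", "CVE-2018-1285"]
    print(sorted(ids, key=cve_sort_key))
    # Components sort alphabetically by bom-ref with a plain sorted().
    print(sorted(["log4j-core", "log4j-1.2-api"]))
```

Sorting on a negated numeric key keeps the sort stable and single-pass, which is one simple way to make the output order independent of the order in which files are read.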

How it works

Each vulnerability lives in its own file at src/vulnerabilities/<CVE-id>/<component>.cdx.xml: a self-contained CycloneDX 1.7 BOM with the affected component as metadata.component and a single <vulnerability> element. log4cxx-conan never gets its own file; its vulnerabilities ride along in the corresponding log4cxx file via a <components> entry plus a <dependencies> edge.

vdr_aggregate.py walks every per-CVE file, dedupes components by bom-ref, dedupes vulnerabilities by CVE id, and emits the monolithic vdr.xml. vdr_split.py performs the inverse for migration. Both scripts share vdr_common.py (constants, namespace handling, comparison, write-if-changed orchestration).
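
As a rough sketch of the walk-and-dedupe step described above: the function below is hypothetical and uses the standard library's xml.etree.ElementTree rather than lxml. It also assumes a simple keep-the-first-record policy for duplicates; how the real script reconciles duplicate vulnerability records is not described here.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def collect(root: Path) -> tuple[dict[str, ET.Element], dict[str, ET.Element]]:
    """Gather components (keyed by bom-ref) and vulnerabilities (keyed by id).

    Hypothetical helper: the first occurrence of each key wins.
    """
    components: dict[str, ET.Element] = {}
    vulnerabilities: dict[str, ET.Element] = {}
    # Sorted traversal keeps the result independent of filesystem order.
    # The glob only matches files one directory deep, so a top-level
    # template.cdx.xml is skipped.
    for path in sorted(root.glob("CVE-*/*.cdx.xml")):
        bom = ET.parse(path).getroot()
        # "{*}" matches any namespace, so the sketch does not depend on
        # a particular CycloneDX schema version.
        found = bom.findall("{*}metadata/{*}component")
        found += bom.findall("{*}components/{*}component")
        for component in found:
            ref = component.get("bom-ref")
            if ref:
                components.setdefault(ref, component)
        for vulnerability in bom.findall("{*}vulnerabilities/{*}vulnerability"):
            cve_id = vulnerability.findtext("{*}id")
            if cve_id:
                vulnerabilities.setdefault(cve_id, vulnerability)
    return components, vulnerabilities
```

A caller could then emit `sorted(components)` and the vulnerabilities in CVE order to obtain deterministic output.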

Idempotent writes

Both scripts read the existing output's serialNumber and version, build a candidate at the existing version, and compare it to the file on disk via a structural comparison that ignores comments, inter-element whitespace, and namespace prefixes. If the candidate is equivalent, the file is left untouched: no diff, no version churn. If it differs, the version is bumped by one and the file is rewritten.

This means re-running either script in a clean tree is a no-op, and a content edit produces exactly one version bump per affected file.
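
A minimal sketch of the write-if-changed idea, under stated assumptions: it uses the standard library's ElementTree (whose default parser already discards comments and exposes tags in `{namespace}local` form, so prefixes do not matter), and the names `canonical` and `write_if_changed` are illustrative rather than the actual helpers in vdr_common.py.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def canonical(element: ET.Element) -> tuple:
    """Reduce an element to nested tuples for structural comparison."""
    text = element.text or ""
    if len(element):
        # Whitespace around child elements is layout only.
        text = text.strip()
    tail = (element.tail or "").strip()
    attributes = tuple(sorted(element.attrib.items()))
    children = tuple(canonical(child) for child in element)
    return (element.tag, attributes, text, tail, children)


def write_if_changed(candidate: ET.Element, path: Path) -> bool:
    """Write candidate to path only if it differs structurally.

    Returns True when the file was (re)written, False when left untouched.
    """
    if path.exists():
        existing = ET.parse(path).getroot()
        # Compare at the existing identity and version so that these
        # attributes alone never cause a rewrite.
        for name in ("serialNumber", "version"):
            if name in existing.attrib:
                candidate.set(name, existing.attrib[name])
        if canonical(candidate) == canonical(existing):
            return False
        # Real change: bump the version by exactly one.
        candidate.set("version", str(int(existing.get("version", "1")) + 1))
    ET.ElementTree(candidate).write(path, encoding="utf-8", xml_declaration=True)
    return True
```

Because the comparison happens before the version is bumped, a second run on unchanged input returns False and leaves the file byte-for-byte identical.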

Why split per (CVE, component), beyond automation

Path to VEX. A monolithic VDR has no meaningful metadata.component, since it covers many subjects. Per-component files let metadata.component name the analyzing project (e.g. log4j-core), with the vulnerable dependency in vulnerability.affects and the dependency path in <dependencies>. That's the shape required for VEX, CSAF, and OpenVEX, so we can grow into those formats without restructuring our source of truth.
Easier AsciiDoc generation. Per-CVE files let _vulnerabilities.adoc be assembled from one generated partial per CVE instead of a single monolithic AsciiDoc file. They also leave the door open to a separate page per CVE later.

Repository layout

scripts/
  vdr_common.py        # shared helpers (constants, clone, serialize, equivalent, write_bom_if_changed)
  vdr_aggregate.py     # per-CVE files -> vdr.xml
  vdr_split.py         # vdr.xml -> per-CVE files
src/vulnerabilities/
  CVE-2017-5645/log4j-core.cdx.xml
  CVE-2018-1285/log4net.cdx.xml
  ...
  template.cdx.xml     # editable template for new CVEs
src/site/static/cyclonedx/vdr.xml   # generated output

Notes for reviewers

The aggregated vdr.xml is now CycloneDX 1.7 (was 1.6). The bump is intentional: 1.x is semantically versioned, and the structural change is just a namespace rename.
Component ordering changed: alphabetical by bom-ref, so log4j-1.2-api now precedes log4j-core.
The aggregated header comment now warns it's generated and points at uv run scripts/vdr_aggregate.py for updates.

Adds a minimal `uv` project at the repo root so Python tooling scripts can run against a locked, reproducible environment alongside the existing Maven-based site build.

The layout consists of:

- `pyproject.toml`: declares `requires-python >= 3.11` and the tooling dependencies (currently just `lxml`). `tool.uv.package = false` since this repo is not itself an installable Python package.
- `.python-version`: pins the interpreter version uv picks up.
- `uv.lock`: committed so dependency versions do not drift between contributors or CI runs.

Scripts placed under `scripts/` may also carry PEP 723 inline metadata for standalone invocation via `uv run <script>`.

To bootstrap the environment download `uv` and run:

uv sync

The `vdr_split` script deterministically and reproducibly splits our monolithic VDR into one file per vulnerability. The files are stored at:

src/vulnerabilities/CVE-XXXX-YYYY/<bom-ref>.cdx.xml

where `bom-ref` is the slug we used as a BOM reference.

The Conan package for Log4cxx has no identity of its own, so its vulnerabilities are recorded in the same file as the corresponding Log4cxx ones.

A template for new CycloneDX files is stored in `src/vulnerabilities/template.cdx.xml`.

The `vdr_aggregate` script performs the reverse operation of `vdr_split`:

- It merges all the CycloneDX documents in `src/vulnerabilities`.
- If the result differs from the committed one, it bumps the version. The comparison does not take whitespace into consideration.

ppkarwasz added 5 commits April 24, 2026 12:37

feat: splits VDR into per-CVE documents

dc7f5f4

feat: replace manually maintained with generated VDR

0c9774c

ppkarwasz mentioned this pull request Apr 24, 2026

Add VEX generation to Logging Services #27


ppkarwasz wants to merge 5 commits into main from feat/vdr-generation
