ci: add bot to refresh embedded DWARF for new Ceph releases (phase 1) #108

Open
taodd wants to merge 1 commit into main from chore/embedded-dwarf-refresh-bot

taodd (Owner) commented May 23, 2026

Summary

Adds a weekly scheduled GHA workflow that keeps the embedded DWARF JSONs under files/centos-stream/ in sync with newly-published Ceph point releases on download.ceph.com. Three new files, no behavioural changes to the runtime tools.

What it does, end-to-end

  1. Detect: tools/detect_missing_dwarf.py HTTP-HEADs ceph-osd RPM URLs across every (X.2.Y) candidate version in quincy / reef / squid / tentacle, diffs against the JSONs already in files/centos-stream/{osdtrace,radostrace}/, and prints TSV rows for the missing ones.
  2. Generate: for each missing row, tools/gen_dwarf_for_version.sh spins up a disposable centos:stream9 podman container, installs the matching ceph-osd / ceph-common / librbd1 / librados2 plus every -debuginfo + -debugsource at that version, builds cephtrace inside the container, holds a ceph-osd process stopped at its entry point via gdb's starti (so /proc/<pid>/exe is valid for the DWARF parser to attach), and runs ./osdtrace -j / ./radostrace -j against that holder PID.
  3. Verify: make clean && make rebuilds with the new JSONs in the embedded header to prove they parse and link cleanly.
  4. PR: opens a follow-up PR via peter-evans/create-pull-request@v6 with a markdown table of additions and a "failed to generate" section for the next run to retry.
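The detect step can be sketched as follows. This is a minimal illustration, not the contents of tools/detect_missing_dwarf.py: the URL pattern and the TSV column layout are taken from the sample output in this PR, while `candidate_url`, `missing_rows`, and the shape of the `have` mapping are hypothetical names chosen for the example. The network probe is left out so that the diffing logic stands on its own.

```python
from typing import Dict, Iterable, List, Set

# URL pattern as it appears in the detector's sample output in this PR.
RPM_URL = ("https://download.ceph.com/rpm-{v}/el9/x86_64/"
           "ceph-osd-{v}-0.el9.x86_64.rpm")
TOOLS = ("osdtrace", "radostrace")


def candidate_url(version: str) -> str:
    """Build the ceph-osd RPM URL that would be HEAD-probed for a version."""
    return RPM_URL.format(v=version)


def version_key(version: str):
    """Sort key so that 17.2.10 orders after 17.2.9."""
    return tuple(int(part) for part in version.split("."))


def missing_rows(published: Iterable[str],
                 have: Dict[str, Set[str]]) -> List[str]:
    """Return one TSV row per published version lacking a JSON for some tool.

    `have` maps a tool name to the set of versions already committed
    (assumed input; how the real script derives it is not shown in the PR).
    """
    rows = []
    for v in sorted(published, key=version_key):
        missing = [t for t in TOOLS if v not in have.get(t, set())]
        if missing:
            # Epoch "2:" and release "-0.el9" follow the rows shown in this PR.
            rows.append("\t".join(
                ["centos-stream", ",".join(missing), v, f"2:{v}-0.el9"]))
    return rows
```

Passing the probe results in as `published` keeps the comparison testable without network access.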

What's missing today, per the detector

Running python3 tools/detect_missing_dwarf.py locally surfaced 15 (version, tool-set) gaps that the first scheduled run would fill (quincy 17.2.4–.7; reef 18.2.0–.2, 18.2.4–.6, and 18.2.8; squid 19.2.0–.2; tentacle 20.2.0):

centos-stream	osdtrace,radostrace	17.2.4	2:17.2.4-0.el9
centos-stream	osdtrace,radostrace	17.2.5	2:17.2.5-0.el9
centos-stream	osdtrace,radostrace	17.2.6	2:17.2.6-0.el9
centos-stream	osdtrace,radostrace	17.2.7	2:17.2.7-0.el9
centos-stream	osdtrace,radostrace	18.2.0	2:18.2.0-0.el9
centos-stream	osdtrace,radostrace	18.2.1	2:18.2.1-0.el9
centos-stream	osdtrace,radostrace	18.2.2	2:18.2.2-0.el9
centos-stream	osdtrace,radostrace	18.2.4	2:18.2.4-0.el9
centos-stream	osdtrace,radostrace	18.2.5	2:18.2.5-0.el9
centos-stream	osdtrace,radostrace	18.2.6	2:18.2.6-0.el9
centos-stream	osdtrace,radostrace	18.2.8	2:18.2.8-0.el9
centos-stream	osdtrace,radostrace	19.2.0	2:19.2.0-0.el9
centos-stream	osdtrace,radostrace	19.2.1	2:19.2.1-0.el9
centos-stream	osdtrace,radostrace	19.2.2	2:19.2.2-0.el9
centos-stream	osdtrace,radostrace	20.2.0	2:20.2.0-0.el9
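A quick way to sanity-check rows like these is to parse them and count per release series. The sketch below assumes the four tab-separated columns shown above (distro, tool list, version, epoch:version-release) and tolerates extra trailing columns; `parse_row` and `count_by_series` are hypothetical helper names, not part of the PR.

```python
from collections import Counter

# Ceph major version -> release codename.
CODENAMES = {17: "quincy", 18: "reef", 19: "squid", 20: "tentacle"}


def parse_row(line: str) -> dict:
    """Split one detector TSV row into its fields."""
    distro, tools, version, evr = line.rstrip("\n").split("\t")[:4]
    # The fourth column is "epoch:version-release"; treat a missing epoch as 0.
    epoch, sep, version_release = evr.partition(":")
    if not sep:
        epoch, version_release = "0", evr
    return {
        "distro": distro,
        "tools": tools.split(","),
        "version": version,
        "epoch": int(epoch),
        "version_release": version_release,
    }


def count_by_series(lines) -> Counter:
    """Count rows per codename, skipping blank lines."""
    counts = Counter()
    for line in lines:
        if line.strip():
            major = int(parse_row(line)["version"].split(".")[0])
            counts[CODENAMES[major]] += 1
    return counts
```

Applied to the 15 rows above, the per-series counts should be 4 for quincy, 7 for reef, 3 for squid, and 1 for tentacle.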

Test plan

  • python3 tools/detect_missing_dwarf.py exits 0 with the expected 15 rows on the current main.
  • bash -n tools/gen_dwarf_for_version.sh passes; YAML linted with python -c "yaml.safe_load(...)".
  • Manual workflow_dispatch run after merge to validate the full path end-to-end before relying on the schedule (the first scheduled run will fire ~7 days later).
  • First scheduled bot PR closes out the 15 gaps above.

Out of scope (deferred to phases 2/3)

  • quay.io container-image build-id coverage — image build-ids drift from RPM build-ids; needs a separate detector + generator that pulls the image and extracts /usr/bin/ceph-osd directly.
  • Ubuntu / Cloud Archive / Debian respin coverage — Launchpad and snapshot.debian.org querying, plus a matching host build environment.

Adds a weekly scheduled GHA workflow that detects newly-published Ceph
point releases on download.ceph.com, generates the corresponding
osdtrace + radostrace embedded DWARF JSONs inside a disposable
centos:stream9 container, re-aggregates the header, relinks both tools
to prove the new data is well-formed, and opens a follow-up PR with the
added files.

Phase 1 scope: centos-stream / el9 only.  This is the easiest lane --
upstream maintains a stable RPM URL pattern at download.ceph.com and
ships matching debuginfo packages, so the bot does not need any
distro-specific build infrastructure (no Launchpad / cloud-archive
mirroring, no Debian snapshot proxying).  The same shape will work for
quay.io container images (phase 2) and Ubuntu / Cloud Archive / Debian
respins (phase 3), each as a sibling workflow.

Pieces:

* tools/detect_missing_dwarf.py
    HTTP-HEADs ceph-osd RPM URLs across all candidate (X.2.Y) point
    releases and diffs against the JSONs already in
    files/centos-stream/.  Outputs one TSV row per missing
    (version, tool-set) pair.  Self-tested today: identifies 15 missing
    (X.Y.Z, [osdtrace, radostrace]) rows across quincy / reef / squid /
    tentacle that are publicly available on download.ceph.com but not
    yet covered by our committed JSONs.

* tools/gen_dwarf_for_version.sh
    Spins up a disposable centos:stream9 podman container, installs
    ceph-osd + ceph-common + librados2 + librbd1 and every matching
    -debuginfo + -debugsource at the requested version, builds cephtrace
    inside the container (matched glibc), holds a ceph-osd process at
    its entry point via gdb's `starti` command (no Ceph init code runs,
    but /proc/<pid>/exe is valid for osdtrace's DWARF parser to attach),
    and runs `./osdtrace -j` and/or `./radostrace -j` against that
    holder PID.  Writes the JSON(s) directly to files/<distro>/<tool>/
    via a bind-mounted repo root.

* .github/workflows/refresh-embedded-dwarf.yaml
    Weekly cron (Monday 06:00 UTC) + workflow_dispatch trigger.  Runs
    the detector, generator, and rebuild gate; on success opens a PR
    via peter-evans/create-pull-request@v6 listing the additions in a
    markdown table.  Failures (e.g. a missing debuginfo subpackage in
    one release) are non-fatal -- they are reported in the PR body so
    the next scheduled run can retry, and the rest of the run still
    PRs the successful ones.

The bot requires only the default GITHUB_TOKEN; the contents:write +
pull-requests:write permissions are scoped in the workflow YAML.
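
For illustration, the PR body described under the workflow bullet (a markdown table of additions plus a section listing failures for the next run to retry) could be assembled roughly as below. This is a sketch under stated assumptions, not the workflow's actual code: `render_pr_body`, the column headings, and the wording of the failure section are all hypothetical.

```python
from typing import List, Sequence, Tuple


def render_pr_body(added: Sequence[Tuple[str, List[str]]],
                   failed: Sequence[Tuple[str, str]]) -> str:
    """Render a markdown PR body.

    added  -- (version, [tool, ...]) pairs whose JSONs were generated
    failed -- (version, reason) pairs that the next run should retry
    """
    lines = ["| Version | Tools |", "| --- | --- |"]
    for version, tools in added:
        lines.append(f"| {version} | {', '.join(tools)} |")
    if failed:
        # Failures are reported rather than aborting the run (assumed wording).
        lines += ["", "Failed to generate (will be retried on the next run):"]
        lines += [f"- {version}: {reason}" for version, reason in failed]
    return "\n".join(lines)
```

Keeping the failure list in the same body means a partially successful run still produces a single reviewable PR.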

Copilot AI left a comment

Pull request overview

Adds automation to keep the repository’s embedded DWARF JSON snapshots (used by osdtrace/radostrace) up-to-date with newly published Ceph point releases by detecting missing versions, generating JSONs in a disposable CentOS Stream 9 container, and opening a follow-up PR.

Changes:

  • Add a scheduled + manual GitHub Actions workflow to detect/generate new embedded DWARF JSONs and open an automated PR.
  • Add a detector script that probes download.ceph.com for published CentOS Stream 9 Ceph RPM versions and diffs against existing JSONs in-repo.
  • Add a generator script that builds cephtrace in a podman container, synthesizes a “holder” ceph-osd PID, and runs osdtrace -j / radostrace -j to write new JSONs into files/centos-stream/....

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tools/gen_dwarf_for_version.sh | Containerized generator to install Ceph + debuginfo, build tools, create a holder PID, and emit DWARF JSON files into the repo. |
| tools/detect_missing_dwarf.py | Probes upstream for published ceph-osd el9 RPMs and prints TSV rows for missing embedded-DWARF coverage. |
| .github/workflows/refresh-embedded-dwarf.yaml | Weekly + on-demand workflow that runs detection, generation, rebuild verification, and opens an automated PR. |

Comment on lines +75 to +85
# resolving is in a per-binary -debuginfo package.
podman exec "$CTR" bash -ec "
    cd /tmp
    pkgs='ceph-osd ceph-common librbd1 librados2
          ceph-osd-debuginfo ceph-common-debuginfo
          librbd1-debuginfo librados2-debuginfo
          ceph-debuginfo ceph-debugsource'
    for p in \$pkgs; do
        curl -sfLO https://download.ceph.com/rpm-${VERSION}/el9/x86_64/\${p}-${VERSION}-0.el9.x86_64.rpm
    done
    rpm -ivh --force /tmp/*.rpm >/dev/null

Comment on lines +107 to +123
    nohup gdb -nx -batch-silent \
        -ex "set follow-fork-mode parent" \
        -ex "set pagination off" \
        -ex "starti" \
        -ex "shell echo \$\$ > /tmp/osd_holder.pid; while true; do sleep 60; done" \
        --args /usr/bin/ceph-osd --version >/tmp/gdb.log 2>&1 &
    for i in $(seq 1 60); do
        [ -s /tmp/osd_holder.pid ] && break
        sleep 0.5
    done
'

OSD_PID=$(podman exec "$CTR" bash -ec '
    HOLDER=$(cat /tmp/osd_holder.pid 2>/dev/null || true)
    [ -n "$HOLDER" ] || { echo "gdb holder did not start" >&2; cat /tmp/gdb.log >&2; exit 1; }
    OSD=$(pgrep -P "$HOLDER" -x ceph-osd || true)
    [ -n "$OSD" ] || { echo "ceph-osd subprocess not found" >&2; ps -ef >&2; exit 1; }

echo "Generation summary: $S succeeded, $F failed."
echo "succeeded=$S" >> "$GITHUB_OUTPUT"
echo "failed=$F" >> "$GITHUB_OUTPUT"


Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

-ex "set follow-fork-mode parent" \
-ex "set pagination off" \
-ex "starti" \
-ex "shell echo \$\$ > /tmp/osd_holder.pid; while true; do sleep 60; done" \
Output (one row per (version, missing-tool-list)) is TSV on stdout so the
companion shell driver can read it line-by-line:

centos-stream osdtrace,radostrace 17.2.4 2:17.2.4-0.el9 https://download.ceph.com/rpm-17.2.4/el9/x86_64/ceph-osd-17.2.4-0.el9.x86_64.rpm
Comment on lines +50 to +55
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as r:
            return r.status
    except Exception:
        return 0
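
The excerpt above folds every failure into a return value of 0, so a 404 (RPM not published) and an unreachable host look the same to the caller. A minimal sketch of one alternative, assuming the caller can handle `None` for "could not reach the server": `head_status` and its injectable `opener` parameter are hypothetical and exist only to keep the example testable offline; they are not part of the PR.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return the HTTP status of a HEAD request, or None on network failure.

    A server-side answer such as 404 is returned as its status code, so
    "not published" can be told apart from "could not reach the server".
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with opener(req, timeout=timeout) as r:
            return r.status
    except urllib.error.HTTPError as e:
        # The server responded with an error status (4xx/5xx).
        return e.code
    except OSError:
        # URLError, timeouts, and socket errors are all OSError subclasses.
        return None
```

`HTTPError` has to be caught before `OSError` because it is itself a subclass of `URLError`, and therefore of `OSError`.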

Comment on lines +93 to +101
def main() -> None:
    upstream = upstream_el9_versions()
    if not upstream:
        # Treat a fully-empty probe set as a hard error: it almost always
        # means download.ceph.com is unreachable from the runner, and
        # opening a PR that deletes nothing is harmless but auto-merging
        # against an empty diff would be misleading.
        print("ERROR: no upstream RPMs detected; aborting", file=sys.stderr)
        sys.exit(1)

Comment on lines +76 to +86
podman exec "$CTR" bash -ec "
    cd /tmp
    pkgs='ceph-osd ceph-common librbd1 librados2
          ceph-osd-debuginfo ceph-common-debuginfo
          librbd1-debuginfo librados2-debuginfo
          ceph-debuginfo ceph-debugsource'
    for p in \$pkgs; do
        curl -sfLO https://download.ceph.com/rpm-${VERSION}/el9/x86_64/\${p}-${VERSION}-0.el9.x86_64.rpm
    done
    rpm -ivh --force /tmp/*.rpm >/dev/null
"

@taodd taodd force-pushed the chore/embedded-dwarf-refresh-bot branch from 3f4d73e to 8197bb9 Compare May 23, 2026 12:56