ci: add bot to refresh embedded DWARF for new Ceph releases (phase 1) #108
Open
taodd wants to merge 1 commit into
Adds a weekly scheduled GHA workflow that detects newly-published Ceph
point releases on download.ceph.com, generates the corresponding
osdtrace + radostrace embedded DWARF JSONs inside a disposable
centos:stream9 container, re-aggregates the header, relinks both tools
to prove the new data is well-formed, and opens a follow-up PR with the
added files.
Phase 1 scope: centos-stream / el9 only. This is the easiest lane --
upstream maintains a stable RPM URL pattern at download.ceph.com and
ships matching debuginfo packages, so the bot does not need any
distro-specific build infrastructure (no Launchpad / cloud-archive
mirroring, no Debian snapshot proxying). The same shape will work for
quay.io container images (phase 2) and Ubuntu / Cloud Archive / Debian
respins (phase 3), each as a sibling workflow.
Pieces:
* tools/detect_missing_dwarf.py
HTTP-HEADs ceph-osd RPM URLs across all candidate (X.2.Y) point
releases and diffs against the JSONs already in
files/centos-stream/. Outputs one TSV row per missing
(version, tool-set) pair. Self-tested today: identifies 15 missing
(X.Y.Z, [osdtrace, radostrace]) rows across quincy / reef / squid /
tentacle that are publicly available on download.ceph.com but not
yet covered by our committed JSONs.
* tools/gen_dwarf_for_version.sh
Spins up a disposable centos:stream9 podman container, installs
ceph-osd + ceph-common + librados2 + librbd1 and every matching
-debuginfo + -debugsource at the requested version, builds cephtrace
inside the container (matched glibc), holds a ceph-osd process at
its entry point via gdb's `starti` command (no Ceph init code runs,
but /proc/<pid>/exe is valid for osdtrace's DWARF parser to attach),
and runs `./osdtrace -j` and/or `./radostrace -j` against that
holder PID. Writes the JSON(s) directly to files/<distro>/<tool>/
via a bind-mounted repo root.
* .github/workflows/refresh-embedded-dwarf.yaml
Weekly cron (Monday 06:00 UTC) + workflow_dispatch trigger. Runs
the detector, generator, and rebuild gate; on success opens a PR
via peter-evans/create-pull-request@v6 listing the additions in a
markdown table. Failures (e.g. a missing debuginfo subpackage in
one release) are non-fatal -- they are reported in the PR body so
the next scheduled run can retry, and the rest of the run still
PRs the successful ones.
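The detector's core step, diffing published versions against committed JSONs, can be sketched as follows. This is a minimal illustration and not the script from this PR: the major numbers 17 to 20 stand for quincy / reef / squid / tentacle, while `MAX_PATCH`, `candidate_versions`, and `missing_tools` are hypothetical names, and how the covered versions are read from files/centos-stream/ is left out.

```python
from itertools import product

# Assumptions for illustration only: majors map to quincy/reef/squid/tentacle,
# and the upper patch bound is arbitrary.
MAJORS = (17, 18, 19, 20)
MAX_PATCH = 9
TOOLS = ("osdtrace", "radostrace")


def candidate_versions(majors=MAJORS, max_patch=MAX_PATCH):
    """Enumerate X.2.Y candidates, e.g. 17.2.0 through 17.2.9."""
    return [f"{major}.2.{patch}"
            for major, patch in product(majors, range(max_patch + 1))]


def missing_tools(published, covered):
    """Map each published version to the tools lacking a committed JSON.

    published: iterable of version strings found upstream.
    covered:   dict of tool name -> set of versions already in the repo.
    """
    gaps = {}
    for version in published:
        absent = [t for t in TOOLS if version not in covered.get(t, set())]
        if absent:
            gaps[version] = absent
    return gaps
```

Each entry of the returned dict would correspond to one TSV row of the detector's output.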
The bot requires only the default GITHUB_TOKEN; the contents:write +
pull-requests:write permissions are scoped in the workflow YAML.
Pull request overview
Adds automation to keep the repository’s embedded DWARF JSON snapshots (used by osdtrace/radostrace) up-to-date with newly published Ceph point releases by detecting missing versions, generating JSONs in a disposable CentOS Stream 9 container, and opening a follow-up PR.
Changes:
- Add a scheduled + manual GitHub Actions workflow to detect/generate new embedded DWARF JSONs and open an automated PR.
- Add a detector script that probes `download.ceph.com` for published CentOS Stream 9 Ceph RPM versions and diffs against existing JSONs in-repo.
- Add a generator script that builds cephtrace in a podman container, synthesizes a “holder” `ceph-osd` PID, and runs `osdtrace -j` / `radostrace -j` to write new JSONs into `files/centos-stream/...`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tools/gen_dwarf_for_version.sh | Containerized generator to install Ceph + debuginfo, build tools, create a holder PID, and emit DWARF JSON files into the repo. |
| tools/detect_missing_dwarf.py | Probes upstream for published ceph-osd el9 RPMs and prints TSV rows for missing embedded-DWARF coverage. |
| .github/workflows/refresh-embedded-dwarf.yaml | Weekly + on-demand workflow that runs detection, generation, rebuild verification, and opens an automated PR. |
Comment on lines +75 to +85
# resolving is in a per-binary -debuginfo package.
podman exec "$CTR" bash -ec "
  cd /tmp
  pkgs='ceph-osd ceph-common librbd1 librados2
        ceph-osd-debuginfo ceph-common-debuginfo
        librbd1-debuginfo librados2-debuginfo
        ceph-debuginfo ceph-debugsource'
  for p in \$pkgs; do
    curl -sfLO https://download.ceph.com/rpm-${VERSION}/el9/x86_64/\${p}-${VERSION}-0.el9.x86_64.rpm
  done
  rpm -ivh --force /tmp/*.rpm >/dev/null
Comment on lines +107 to +123
  nohup gdb -nx -batch-silent \
    -ex "set follow-fork-mode parent" \
    -ex "set pagination off" \
    -ex "starti" \
    -ex "shell echo \$\$ > /tmp/osd_holder.pid; while true; do sleep 60; done" \
    --args /usr/bin/ceph-osd --version >/tmp/gdb.log 2>&1 &
  for i in $(seq 1 60); do
    [ -s /tmp/osd_holder.pid ] && break
    sleep 0.5
  done
'

OSD_PID=$(podman exec "$CTR" bash -ec '
  HOLDER=$(cat /tmp/osd_holder.pid 2>/dev/null || true)
  [ -n "$HOLDER" ] || { echo "gdb holder did not start" >&2; cat /tmp/gdb.log >&2; exit 1; }
  OSD=$(pgrep -P "$HOLDER" -x ceph-osd || true)
  [ -n "$OSD" ] || { echo "ceph-osd subprocess not found" >&2; ps -ef >&2; exit 1; }
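The wait loop in the snippet above polls up to 60 times at 0.5 s intervals (about 30 seconds) for a non-empty PID file. The same pattern, sketched in Python with an explicit timeout error, might look like this; `wait_for_pid_file` is a hypothetical helper and not part of the PR.

```python
import os
import time


def wait_for_pid_file(path, attempts=60, interval=0.5):
    """Poll until `path` exists and is non-empty, then return the PID in it.

    Mirrors the shell loop's `[ -s file ] && break` test; raises TimeoutError
    instead of falling through silently when the file never appears.
    """
    for _ in range(attempts):
        if os.path.exists(path) and os.path.getsize(path) > 0:
            with open(path, encoding="ascii") as fh:
                return int(fh.read().strip())
        time.sleep(interval)
    raise TimeoutError(f"{path} not written after {attempts * interval:.1f}s")
```

Raising on timeout makes the "holder did not start" case explicit at the point of waiting rather than in a later check.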
echo "Generation summary: $S succeeded, $F failed."
echo "succeeded=$S" >> "$GITHUB_OUTPUT"
echo "failed=$F" >> "$GITHUB_OUTPUT"

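For readers who prefer to see the reporting step in one place, here is a hedged Python sketch of what the summary above feeds into: the step outputs and a PR body with a table of additions plus a section for failures. `render_pr_body` and `write_outputs` are hypothetical names, and the table columns and section wording are assumptions rather than the workflow's actual text.

```python
import os


def render_pr_body(succeeded, failed):
    """Build a markdown PR body: a table of additions plus a retry section.

    succeeded: list of (version, tools) tuples, tools being a list of names.
    failed:    list of (version, reason) tuples.
    """
    lines = ["| Version | Tools |", "|---|---|"]
    lines += [f"| {version} | {', '.join(tools)} |" for version, tools in succeeded]
    if failed:
        lines += ["", "Failed to generate (will be retried on the next run):"]
        lines += [f"- {version}: {reason}" for version, reason in failed]
    return "\n".join(lines)


def write_outputs(succeeded, failed, path=None):
    """Append succeeded/failed counts to the file named by GITHUB_OUTPUT."""
    path = path or os.environ["GITHUB_OUTPUT"]
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"succeeded={len(succeeded)}\n")
        fh.write(f"failed={len(failed)}\n")
```

Appending `key=value` lines to the file named by `GITHUB_OUTPUT` is the same mechanism the shell lines above use.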
Output (one row per (version, missing-tool-list)) is TSV on stdout so the
companion shell driver can read it line-by-line:

centos-stream osdtrace,radostrace 17.2.4 2:17.2.4-0.el9 https://download.ceph.com/rpm-17.2.4/el9/x86_64/ceph-osd-17.2.4-0.el9.x86_64.rpm
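A consumer of this output could parse each row as sketched below. The five field names (`distro`, `tools`, `version`, `rpm_evr`, `url`) are inferred from the sample row and are assumptions, as is the premise that the columns are separated by single tab characters.

```python
from typing import NamedTuple


class MissingRow(NamedTuple):
    distro: str
    tools: list
    version: str
    rpm_evr: str  # epoch:version-release as shown in the sample row
    url: str


def parse_row(line):
    """Split one TSV row into its fields; the tool list is comma-separated."""
    distro, tools, version, rpm_evr, url = line.rstrip("\n").split("\t")
    return MissingRow(distro, tools.split(","), version, rpm_evr, url)
```

A row with a different number of columns raises `ValueError` on unpacking, which surfaces format drift early.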
Comment on lines +50 to +55
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as r:
            return r.status
    except Exception:
        return 0
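The probe above maps every exception to 0, so a 404 (release not published) is indistinguishable from an unreachable host. One possible variant, shown only as a sketch, returns the HTTP error code where there is one and `None` for transport failures. `head_status` and its injectable `opener` parameter are hypothetical and exist here mainly to make the function testable offline.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return the HTTP status of a HEAD request, or None on transport failure."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with opener(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with 2xx (e.g. 404 for an absent RPM).
        return err.code
    except OSError:
        # URLError, timeouts and socket errors all derive from OSError.
        return None
```

With this split, a caller can treat 404 as "not published" and `None` as "could not tell, retry later".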
Comment on lines +93 to +101
def main() -> None:
    upstream = upstream_el9_versions()
    if not upstream:
        # Treat a fully-empty probe set as a hard error: it almost always
        # means download.ceph.com is unreachable from the runner, and
        # opening a PR that deletes nothing is harmless but auto-merging
        # against an empty diff would be misleading.
        print("ERROR: no upstream RPMs detected; aborting", file=sys.stderr)
        sys.exit(1)
Comment on lines +76 to +86
podman exec "$CTR" bash -ec "
  cd /tmp
  pkgs='ceph-osd ceph-common librbd1 librados2
        ceph-osd-debuginfo ceph-common-debuginfo
        librbd1-debuginfo librados2-debuginfo
        ceph-debuginfo ceph-debugsource'
  for p in \$pkgs; do
    curl -sfLO https://download.ceph.com/rpm-${VERSION}/el9/x86_64/\${p}-${VERSION}-0.el9.x86_64.rpm
  done
  rpm -ivh --force /tmp/*.rpm >/dev/null
"
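The URL pattern used by the download loop can be expressed as a small helper, which makes it easy to probe every package before starting a container. This is a sketch under the assumption that all ten packages share the `-0.el9` release suffix seen in the snippet; `rpm_urls` is a hypothetical name.

```python
BASE = "https://download.ceph.com"

# Package list copied from the generator snippet above.
PACKAGES = (
    "ceph-osd", "ceph-common", "librbd1", "librados2",
    "ceph-osd-debuginfo", "ceph-common-debuginfo",
    "librbd1-debuginfo", "librados2-debuginfo",
    "ceph-debuginfo", "ceph-debugsource",
)


def rpm_urls(version, release="0.el9", arch="x86_64"):
    """Return the download URL of every required RPM for one Ceph version."""
    return [
        f"{BASE}/rpm-{version}/el9/{arch}/{pkg}-{version}-{release}.{arch}.rpm"
        for pkg in PACKAGES
    ]
```

For example, `rpm_urls("17.2.4")[0]` yields the same ceph-osd URL that appears in the detector's sample TSV row.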
3f4d73e to 8197bb9
Summary
Adds a weekly scheduled GHA workflow that keeps the embedded DWARF JSONs under `files/centos-stream/` in sync with newly published Ceph point releases on `download.ceph.com`. Three new files, no behavioural changes to the runtime tools.
What it does, end-to-end
- `tools/detect_missing_dwarf.py` HTTP-HEADs `ceph-osd` RPM URLs across every (X.2.Y) candidate version in quincy / reef / squid / tentacle, diffs against the JSONs already in `files/centos-stream/{osdtrace,radostrace}/`, and prints TSV rows for the missing ones.
- `tools/gen_dwarf_for_version.sh` spins up a disposable `centos:stream9` podman container, installs the matching `ceph-osd` / `ceph-common` / `librbd1` / `librados2` plus every `-debuginfo` + `-debugsource` at that version, builds cephtrace inside the container, holds a `ceph-osd` process stopped at its entry point via gdb's `starti` (so `/proc/<pid>/exe` is valid for the DWARF parser to attach), and runs `./osdtrace -j` / `./radostrace -j` against that holder PID.
- `make clean && make` rebuilds with the new JSONs in the embedded header to prove they parse and link cleanly.
- `peter-evans/create-pull-request@v6` opens a PR with a markdown table of additions and a "failed to generate" section for the next run to retry.
What's missing today, per the detector
Running `python3 tools/detect_missing_dwarf.py` locally surfaced 15 (version, tool-set) gaps that the first scheduled run would fill (quincy 17.2.4–.7, reef 18.2.0–.6 and 18.2.8, squid 19.2.0–.2, tentacle 20.2.0).
Test plan
- `python3 tools/detect_missing_dwarf.py` exits 0 with the expected 15 rows on the current `main`.
- `bash -n tools/gen_dwarf_for_version.sh` passes; YAML linted with `python -c "yaml.safe_load(...)"`.
- `workflow_dispatch` run after merge to validate the full path end-to-end before relying on the schedule (the first scheduled run will fire ~7 days later).
Out of scope (deferred to phases 2/3)
`/usr/bin/ceph-osd` directly.