Fix uninitialized / NULL pointer use in dwarf_parser#103
Open
xtrusia wants to merge 1 commit into
Open
Conversation
translate_param_location and translate_expr could dereference uninitialized or null pointers when elfutils location lookups fail. Make find_param, translate_param_location and translate_expr report failure, and drop the function probe when a parameter is unresolved. Signed-off-by: Seyeong Kim <seyeong.kim@canonical.com>
There was a problem hiding this comment.
Pull request overview
This PR hardens the DWARF parsing pipeline by adding explicit failure propagation for previously unchecked lookup / location-expression / frame-base / CFA paths, preventing uninitialized or NULL-pointer usage during probe generation. It also changes probe handling so that unresolved parameters cause the affected function probe to be dropped from the exported maps.
Changes:
- Change
find_param,translate_param_location, andtranslate_exprto returnbooland propagate failures up the call chain. - Add defensive checks for missing
DW_AT_location, empty/invalid location expressions, missing frame base forDW_OP_fbreg, and unresolved CFA forDW_OP_call_frame_cfa. - In
handle_function, skip the entire function probe (erase fromfunc2pc/func2vf) when a parameter cannot be resolved.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/dwarf_parser.h | Updates method signatures to return bool and add out-params for safe failure propagation. |
| src/dwarf_parser.cc | Implements the new bool-returning APIs, adds NULL/uninitialized guards, and erases probes when parameter resolution fails. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
418
to
+422
| VarLocation varloc; | ||
| translate_expr(NULL, expr, pc, varloc); | ||
| if (!translate_expr(NULL, expr, pc, varloc)) { | ||
| clog << "failed to translate location of " << fields[i] << endl; | ||
| return; | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Reproduction and verification: dwarf_parser uninitialized / NULL deref fixes
Summary
find_param,translate_param_location, andtranslate_exprnow returnbool; failures on the newly checkedNULL / lookup / frame-base / CFA paths propagate up.
handle_functiondrops the whole function probe (erases fromfunc2pc/func2vf) when a parameter is unresolved.unguarded failure branches; the success-path control flow is
unchanged. BPF consumers already treat a missing
func2pcentryas "skip uprobe", so the erase composes with downstream code.
and both baseline and patched builds were run on the natural happy
path with byte-identical JSON output. Issue 2 and Issue 3 are
confirmed by inspection only — the test environment (Ceph 19.2.3 +
full debug info) does not exercise their failure branches.
Verification of the
fix-dwarf-parser-uninit-derefbranch againstmain(baseline
a8793a9). The patch addresses three latent failure-propagationgaps that all sit on the same live-DWARF parsing path; this note records
how the patched path was exercised end-to-end against the baseline.
The patched call path
All three issues live on the same chain, which only runs when cephtrace
is invoked with
-j <out.json>or when the embedded / JSON cache misses:The patch is purely defensive: it makes
find_param,translate_param_location, andtranslate_exprall return a boolean,propagates the newly checked NULL / lookup / frame-base / CFA
failures through their call sites, and has
handle_functiondropthe whole function probe (erasing it from
func2pc/func2vf) whena parameter cannot be resolved. (The
defaultand no-argument-atomarms of
translate_expr's switch still returntrue; the patchonly newly fails on the four conditions named above.) BPF consumers already treat a
missing
func2pcentry as "skip uprobe" (seeattach_uprobe/fill_map_hprobesin src/osdtrace.cc andsrc/radostrace.cc), so the erase composes safely
with downstream code.
Issue 1 — uninitialized
exprintranslate_param_locationsrc/dwarf_parser.cc#L220-L241
find_paramupstream has the same shape:So when the symbol is not found, the chain hands an uninitialized
vardie(and downstream an uninitializedloc_attr/expr) intotranslate_exprand the rest ofhandle_function.Issue 2 — NULL
cfa_opsintranslate_exprforDW_OP_call_frame_cfasrc/dwarf_parser.cc#L623-L645
If both
cfi_debugandcfi_ehfail to produce a frame forpc,cfa_opsstaysNULLbut the recursive call still dereferences it:Issue 3 — uninitialized
fb_exprintranslate_exprforDW_OP_fbregsrc/dwarf_parser.cc#L610-L621
fb_attritself can beNULLhere (translate_fieldspassesNULLthrough), which the baseline also does not check.
Reproduction environment
maas-default:embedded-dwarf-test, machine 0(node-12, 10.0.0.112, Noble 24.04)
ceph-osd-dbgsym,librados2-dbgsym,librbd1-dbgsymg++,clang,libelf-dev,libc6-dev,libc6-dev-i386,libdw-dev,libssl-dev,makea8793a9(main HEAD); the patchedvariant is fetched as
origin/fix-dwarf-parser-uninit-derefStep 1 — Natural reproduction (happy path, baseline vs patched)
Both baseline and patched builds were run with the real probe
definitions against Ceph 19.2.3 + matching dbgsym:
00Skip probemessagesprocess functions takeNeither build emits the bug-branch
cerrmessages(
Get param location expr failed,translate_expr get fb_expr failed,dwarf_frame_cfa add ... failed); the failure-only branches the patchnewly guards are never entered. The two JSONs are equal in size,
confirming the patch does not regress the happy path.
radostrace.baselinealso exits
0cleanly; the patchedradostracewas not separatelyrebuilt because it shares the same
dwarf_parsercode path and theosdtrace comparison already covers it.
Step 2 — Deterministic reproduction (probe mismatch)
The realistic trigger for Issue 1 is "cephtrace's probe spec does not
match the target binary" — exactly what happens when cephtrace is
pointed at an unsupported Ceph version (renamed parameter, changed
signature, etc.). This is modeled by swapping one probe variable name
in
osd_probes:The rest of
src/osdtrace.ccis untouched. The stringZZNONEXISTENTis not a real parameter of
OSD::enqueue_op/OSD::dequeue_op, sodwarf_getscopevarwill return a "not found" sentinel for those twoprobes. Both variants are then rebuilt and run with identical CLI
invocation:
Baseline result — infinite loop
Four invocations were live, each spinning between 24 % and 67 % CPU and
having never produced an output JSON. stderr was 28 GB of the single
line
unexpected type pxrepeated forever. The processes had to bekilled with
pkill -9to recover the disk.The exact code path:
find_paramcallsdwarf_getscopevar("ZZNONEXISTENT", ...)whichreturns the not-found sentinel and leaves
vardieuninitialized.Baseline
find_paramignores the return value and hands theuninitialized
vardieback totranslate_param_location.dwarf_attr_integrate(&vardie, DW_AT_location, &loc_attr)readsgarbage from
vardiebut happens to survive — the stack slot helda value that does not segfault on this run.
dwarf_getlocation_addrreturnsr != 1; baseline prints thewarning and falls through to
translate_exprwith anuninitialized
exprpointer. Again the run survives by luck.Back in
handle_function,dwarf_die_type(&vardie, &typedie)walks the still-garbage
vardieand populatestypediewith moregarbage.
translate_fieldsis called with that garbagetypedie.dwarf_tag(typedie)returns a value that matches none of theswitch cases, so control reaches the
default:arm intranslate_fields:
Neither
inortypedieis updated, so the outerwhile (i < fields.size())never makes progress —fields[1] == "px"is reprinted forever.
So Issue 1's missing failure propagation handed garbage data to
translate_fields, which then tripped a separate, latentinfinite-loop hazard in
translate_fields(the missingreturninthat switch's non-matching arms). The patch did not touch that
switch, but it prevents the upstream garbage from ever reaching it.
Fixing the
translate_fieldsswitch itself (so it cannot loop onan unknown
dwarf_tagfrom any source) is a separate follow-up.Patched result — graceful skip
Same probe tampering, same command, only
dwarf_parser.h/.ccswapped for the patched version:
find_paramnow reports failure;translate_param_locationreturnsfalse;handle_functionerases the unresolved function from bothfunc2pcandfunc2vfand returns 0 for that function. The otherprobes finish normally and the JSON is written.
The reported time is from
process functions take 937.834indwarf_parser.cc#L763-L765, which
computes
(clock() diff) / CLOCKS_PER_SEC * 1000, i.e. milliseconds.So the parsing phase finished in roughly 0.94 s on this run.
Side-by-side
a8793a9)fix-dwarf-parser-uninit-deref)pkill -9)0, clean exitprocess functions take 937.834ms)unexpected type pxConclusion
find_paramandtranslate_param_locationthroughhandle_function, which erases the unresolved function probe.vardie/typediechainfrom ever reaching
translate_fields, the patch also incidentallyneutralizes the previously unpatched
translate_fieldsinfinite-loop hazard for this failure mode.
produce equal-size JSONs (30 253 bytes) and neither hits the
failure-only branches.
DW_OP_call_frame_cfawith no usable CFI) and Issue 3(
DW_OP_fbregwith a failingdwarf_getlocation_addr) were notobserved in this environment, because Ceph 19.2.3's debug info
provides usable
.eh_frameand frame_base location lists. Thepatch's NULL / uninitialized guards on those code paths are
reviewed by inspection only; no regression observed.
translate_fieldsinfinite-loop hazard observed duringbaseline repro is not fixed by this patch; this patch only
prevents the upstream garbage from ever reaching that switch. A
dedicated fix to the
translate_fieldsswitch is left for aseparate follow-up.