Skip to content

feat: Support GDB index#1990

Draft
lapla-cogito wants to merge 20 commits into
wild-linker:mainfrom
lapla-cogito:gdb_idx
Draft

feat: Support GDB index#1990
lapla-cogito wants to merge 20 commits into
wild-linker:mainfrom
lapla-cogito:gdb_idx

Conversation

@lapla-cogito
Copy link
Copy Markdown
Member

This patch adds support for --gdb-index and --no-gdb-index. Although we support version 9 here, the test environment may produce a version 7 GDB index when using lld. This patch also adds a test that checks whether specific symbols are included in the symbol table of the .gdb_index section. Because of this requirement, the integration test includes logic to support versions earlier than version 9 as well.

Closes #811

Comment thread libwild/src/gdb_index.rs
fallback_cu: u32,
mut on_entry: impl FnMut(&'data [u8], u32),
) {
for section_name in [".debug_gnu_pubnames", ".debug_gnu_pubtypes"] {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The symbol table construction assumes the presence of these two sections by adding the compiler flag -ggnu-pubnames (see the added integration test). Without these sections, the symbol table will not be built and only the CU list and address table will be constructed. This effectively means you won't benefit from GDB index, but since lld and mold behave this way, I've aligned the implementation.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we emit a warning to the user telling the option is leading to a GDB index result?

@lapla-cogito
Copy link
Copy Markdown
Member Author

I did git commit --amend --allow-empty because the release workflow was triggered by the previous commit (which I canceled)...
https://github.com/wild-linker/wild/actions/runs/26509302406

I have no idea what happened... 🤷

Copy link
Copy Markdown
Collaborator

@marxin marxin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the numerous findings I created - can you explain what was the genesis of the PR and how much was a LLM involved? I know we had a PR for the very same functionality that was closed right after it was opened. Is it an incarnation of the changes, or was it created completely independently?

Comment thread libwild/src/gdb_index.rs Outdated
Comment thread libwild/src/gdb_index.rs
fn parse_cu_boundaries(data: &[u8]) -> Vec<CuBoundary> {
let mut cus = Vec::new();
let mut offset = 0usize;
while offset + 4 <= data.len() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good linking a place where the CU encoding is actually defined.

Comment thread libwild/src/gdb_index.rs
4 + init_len as usize
};
if total == 0 || offset + total > data.len() {
break;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we rather report a warning as the emitted GDB index will be likely incomeplete/corrupted?

Comment thread libwild/src/gdb_index.rs
}

/// Walk `.debug_info` bytes and return `(offset, total_length)` for each CU.
fn parse_cu_boundaries(data: &[u8]) -> Vec<CuBoundary> {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I am more thinking about it - in general, can't we just the gimli crate for the parsing of the various DWARF-related data structures? Like here, using https://docs.rs/gimli/latest/gimli/read/struct.DebugInfo.html#method.units. I know there might be some performance penalty, but we should definitely measure it.

Copy link
Copy Markdown
Member Author

@lapla-cogito lapla-cogito May 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From this, the only function that can currently be written using gimli is this one. It feels somewhat excessive to bring in gimli just for this purpose (since constructing indices requires only offset and length)... 🤔

Comment thread libwild/src/gdb_index.rs
///
/// Each set has a header pointing to a CU in `.debug_info`, followed by
/// (die_offset, attrs_byte, NUL-terminated name) entries.
fn parse_pubnames_sets(data: &[u8]) -> Vec<PubnamesSet<'_>> {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gimli doesn't appear to support GNU attributes bytes, so I think we can't use them as-is.
https://github.com/gimli-rs/gimli/blob/45dde2309bda8f385acdf0c945ebc4f275ffe124/src/read/lookup.rs#L186-L199

Comment thread libwild/src/gdb_index.rs Outdated
Comment thread libwild/src/gdb_index.rs Outdated

// Emit into the output buffer.
let total = cp_off as usize + cv_data.len() + str_data.len();
let len = buf.len().min(total);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a hard error if the allocated buffer does not match the expected written data.

Comment thread libwild/src/gdb_index.rs Outdated
Comment thread libwild/src/gdb_index.rs Outdated
Comment thread libwild/src/gdb_index.rs Outdated
@lapla-cogito
Copy link
Copy Markdown
Member Author

@marxin (I haven't seen the review details yet)

we had a PR for the very same functionality that was closed right after it was opened

I think you mean #1605. I used that as a reference when implementing this, since I had also reviewed it at the time and therefore already had some familiarity with the surrounding context and implementation details. However, I did not reuse it directly for licensing concerns (and in any case, it does not support version 9 as-is). I mainly used LLMs to help refine the implementation through review and iterative revisions.

@lapla-cogito lapla-cogito marked this pull request as draft May 27, 2026 21:30
@marxin
Copy link
Copy Markdown
Collaborator

marxin commented May 28, 2026

I think you mean #1605. I used that as a reference when implementing this, since I had also reviewed it at the time and therefore already had some familiarity with the surrounding context and implementation details.

Yes - fine, that's a good approach you chose ;)

However, I did not reuse it directly for licensing concerns (and in any case, it does not support version 9 as-is). I mainly used LLMs to help refine the implementation through review and iterative revisions.

All good! Let's iterate on the PR, given you knowledge, I guess you can address the findings pretty fast. Happy to review and help you with the feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for --gdb-index

2 participants