Skip to content

Hublabel#4870

Open
electricEpilith wants to merge 95 commits into
masterfrom
hublabel
Open

Hublabel#4870
electricEpilith wants to merge 95 commits into
masterfrom
hublabel

Conversation

@electricEpilith
Copy link
Copy Markdown

Changelog Entry

To be copied to the draft changelog by merger:

  • Add hub labeling to distance index, which allows efficient exact shortest distance queries even in "oversized" snarls
  • Bug fix for minimizer, significant speed improvement

Description

Adds hub labeling functionality to the snarl distance index.

electricEpilith and others added 30 commits November 12, 2025 14:32
@adamnovak
Copy link
Copy Markdown
Member

This goes with vgteam/libbdsg#239, for reference.

Copy link
Copy Markdown
Member

@adamnovak adamnovak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of breaking up all the stuff in snarl_distance_index.cpp into more distinct units. But I think the design and especially documentation of the units needs work, and we also need some overarching organization of those units that isn't just several cpp files that all belong to one header.

I think some of these pieces might make sense as a few different algorithms in vg::algorithms. There each algorithm could have its own header defining the interface, and then all the helper stuff to power each could be nicely tucked away in the cpp file for just that algorithm.

If instead we want to keep them organized under the theme of building a snarl distance index, we might give them a folder and a namespace that reflects that, and then again give each piece its own header. Then that whole namespace would have something that sort of constitutes its external interface (populate_snarl_index()?) and around there we would have some documentation on how the different responsibilities are spread across these modules.

Comment thread src/snarl_distance_index_child_graph.hpp Outdated
Comment thread src/snarl_distance_index_child_graph.hpp Outdated
Comment thread src/snarl_distance_index_child_graph.hpp Outdated
Comment thread src/snarl_distance_index_child_graph.hpp Outdated
Comment thread src/snarl_distance_index_child_graph.hpp Outdated
Comment thread src/snarl_distance_index_build.cpp Outdated
Comment thread src/snarl_distance_index_build.cpp Outdated
Comment thread src/snarl_distance_index_build.cpp Outdated
Comment thread src/snarl_distance_index_build.cpp Outdated
Comment thread src/snarl_distance_index_build.cpp Outdated
@adamnovak
Copy link
Copy Markdown
Member

I think there's also still outstanding stuff to do in vgteam/libbdsg#239 around dropping code from the oracle algorithms that aren't being used, and making sure we have a fresh file format version number, which I think is covered by the current review there.

It looks like CI is failing here because the hash-constraining tests weren't actually dropped yet.

@electricEpilith
Copy link
Copy Markdown
Author

electricEpilith commented May 8, 2026

Dropped the hash tests. Moved refactor commits to hublabel-refactor.

@electricEpilith electricEpilith marked this pull request as draft May 8, 2026 21:37
@adamnovak adamnovak marked this pull request as ready for review May 15, 2026 19:26
Copy link
Copy Markdown
Member

@adamnovak adamnovak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good. I found a couple places where it's not quite done and made suggestions.

I think the libbdsg changes also need to be finished, and then this needs to be made to point at the final libbdsg commit. Right now it's behind vgteam/libbdsg#239 by a few commits.

Comment thread src/subcommand/minimizer_main.cpp Outdated
Comment thread src/gbwtgraph_helper.cpp Outdated
Comment thread src/snarl_distance_index.cpp Outdated
electricEpilith and others added 11 commits May 23, 2026 15:11
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Adam Novak <anovak@soe.ucsc.edu>
Copy link
Copy Markdown
Member

@adamnovak adamnovak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good now. The only thing I think we really need to change is making sure we point at the actual libbdsg commit we want.

Comment thread src/subcommand/bench_dist_query_main.cpp Outdated
Comment thread deps/libbdsg
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be pulling in vgteam/libbdsg@e0adffb which makes a breaking change to the distance index format. So we definitely don't want to merge this without that change.

Co-authored-by: Adam Novak <anovak@soe.ucsc.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants