refactor: replace deepsize crate with custom DeepSizeOf trait for Arrow-aware memory accounting#6229
Conversation
ReviewGood motivation — the external Issues to addressP0:
P1:
P1: In Minor
|
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
8566f00 to
1c70985
Compare
1c70985 to
23df3cb
Compare
westonpace
left a comment
There was a problem hiding this comment.
This is nice, I didn't look too closely at all the changes as I think most were find/replace.
Should we do some blanket impl like?
impl<T: lance_core::deepsize::DeepSizeOf> deepsize::DeepSizeOf for T {
...
}
Although maybe better to keep the dependency away and let callers do that themselves if they are using deepsize I suppose.
| /// | ||
| /// Generates an implementation that sums the `deep_size_of_children` of all | ||
| /// fields (for structs) or the active variant's fields (for enums). | ||
| #[proc_macro_derive(DeepSizeOf)] |
There was a problem hiding this comment.
Are there tests for this macro somewhere? I guess we get some testing via usage? You can, in theory, do integration tests with proc-macros if I recall (unit tests don't make a ton of sense)
Maybe not important for an internal utility.
There was a problem hiding this comment.
You are right, I should add some more tests.
The critical change from pub struct Context {
/// A set of all [`Arc`](std::sync::Arc)s that have already been counted
arcs: GenericSet<usize>,
/// A set of all [`Rc`](std::sync::Arc)s that have already been counted
rcs: GenericSet<usize>,
}We instead just track one list of pointers: pub struct Context {
seen: HashSet<usize>,
}This is important because Arrow doesn't give you access to the
I think this might be difficult, given I would need to create some sort of adapter between our Footnotes |
…ow-aware memory accounting The external `deepsize` crate does not correctly account for Arrow buffers that are shared across `Arc` references, causing double-counting in cache size calculations. This PR introduces a custom `DeepSizeOf` trait in `lance-core::deepsize` with a `Context` that tracks both `Arc` and raw buffer pointers in a unified `HashSet`. It also adds a `lance-derive` proc-macro crate for `#[derive(DeepSizeOf)]` and removes the dependency on the external `deepsize` crate. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
56744bc to
0445d04
Compare
westonpace
left a comment
There was a problem hiding this comment.
I suppose this is one way to work around the orphan rule 😆
DeepSizeOf is basically unmaintained these days. My PR to upgrade dependencies has been sitting for months with no response. Aeledfyr/deepsize#42 |
## Summary - Adds a `[[tool.bumpversion.files]]` entry for `lance-derive` in `.bumpversion.toml` so its workspace dependency line in `Cargo.toml` is bumped alongside the other lance crates during beta releases. ## Why The `lance-derive` crate (added in #6229) was missing from `.bumpversion.toml`. As a result, during the most recent `publish-beta` run, `[workspace.package].version` was bumped from `8.0.0-beta.7` → `8.0.0-beta.8` (which bumped `rust/lance-derive/Cargo.toml`), but the workspace dependency line in the root `Cargo.toml` stayed at `lance-derive = { version = "=8.0.0-beta.7", ... }`. Cargo then failed: ``` error: failed to select a version for the requirement `lance-derive = "=8.0.0-beta.7"` ``` Failing run: https://github.com/lance-format/lance/actions/runs/27225140622/job/80390190228 Same fix pattern as #6526 (lance-tokenizer) and #4572 (fsst, bitpacking). ## Test plan - [ ] Next `publish-beta` run succeeds and bumps `lance-derive` in lockstep with the other crates. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
The external
deepsizecrate does not correctly account for Arrow buffersthat are shared across
Arcreferences, causing double-counting in cachesize calculations.
This PR introduces a custom
DeepSizeOftrait inlance-core::deepsizewith a
Contextthat tracks bothArcand raw buffer pointers in aunified
HashSet. It also adds alance-deriveproc-macro crate for#[derive(DeepSizeOf)]and removes the dependency on the externaldeepsizecrate.Changes:
lance-core::deepsizemodule with customDeepSizeOftrait and pointer-trackingContextlance-deriveproc-macro crate with#[derive(DeepSizeOf)]deepsizecrate from all crates; update all impls to uselance_core::deepsizedeep_size_of_childrenimplementations to threadcontextthrough (previously some ignored the parameter)DeepSizeOfforarrow_buffer::ScalarBuffer<T>with proper pointer trackingPlainPostingListandPositionsto usedeep_size_of_childrenwith context