refactor entry#364
Conversation
The original implementation unnecessarily created a new vector of strings and performed an extra hash lookup for each entry. The new version just sorts entry iterators directly, avoiding both the string allocation and the extra lookup.
The original implementation was unnecessarily complex. Instead of using two vectors and sorting one of them, we can simply scatter the values directly into the output vector. In debug mode, a second bit-vector ensures that we're not passing in a broken index.
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Refactors writer “entry/inode” plumbing toward handle/view-based APIs, modernizes algorithms to C++20 ranges, and introduces new packed/container utilities.
Changes:
- Replaces legacy writer
entry/inodetypes withentry_id/inode_id+ handle/view abstractions and adjusts dependent writer components. - Migrates many algorithms/tests from
<algorithm>iterator pairs tostd::ranges::*. - Adds new internal utilities (packed/container types, event tracer, map utilities) and updates build/test targets accordingly.
Reviewed changes
Copilot reviewed 128 out of 172 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
src/writer/internal/global_entry_data.cpp |
Reworks global entry indexing and adds memory-usage dumping utilities. |
include/dwarfs/writer/internal/global_entry_data.h |
Changes global entry data maps to indexed maps and introduces string_view-keyed tables. |
include/dwarfs/internal/synchronized.h |
Extends synchronized wrapper with load/release/store and a no_mutex policy. |
include/dwarfs/writer/entry_interface.h |
Shrinks the public entry_interface API surface. |
src/malloc_byte_buffer.cpp |
Changes frozen-buffer semantics for clear() and updates comments. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| void sort_and_index_map(T& map) { | ||
| using index_type = T::mapped_type; | ||
| using iterator = T::iterator; | ||
|
|
||
| std::vector<iterator> order; | ||
| order.reserve(map.size()); | ||
|
|
||
| for (auto it = map.begin(); it != map.end(); ++it) { | ||
| order.push_back(it); | ||
| } | ||
|
|
||
| std::ranges::sort(order, std::ranges::less{}, &T::value_type::first); | ||
|
|
||
| index_type ix{0}; | ||
|
|
||
| for (iterator it : order) { | ||
| it->second = ix++; | ||
| } | ||
| } |
There was a problem hiding this comment.
order stores map iterators, but the std::ranges::sort projection &T::value_type::first is applied to the iterator itself, not to *it. This will not compile (or will be ill-formed) because the iterator type has no first member. Use a projection/lambda that returns the key from the iterator (e.g., [](iterator it){ return it->first; }) or store pointers/refs to value_type instead of iterators.
| using string_view_keyed_map_type = | ||
| map_type<std::string_view, T, string_like_hash, std::equal_to<>>; | ||
|
|
||
| using string_view_index_map_type = string_view_keyed_map_type<index_type>; | ||
|
|
||
| index_map_type<uid_type> uids_; | ||
| index_map_type<gid_type> gids_; | ||
| index_map_type<mode_type> modes_; | ||
| string_view_index_map_type names_; | ||
| string_view_index_map_type symlinks_; |
There was a problem hiding this comment.
names_/symlinks_ now store keys as std::string_view, but this class does not own backing storage for those views. If callers pass views into temporary or short-lived strings, the map will retain dangling references and exhibit use-after-free bugs. Prefer owning keys (std::string with transparent hash/equal) or explicitly intern/store the strings in a stable arena owned by global_entry_data and store views into that arena.
| using string_view_keyed_map_type = | |
| map_type<std::string_view, T, string_like_hash, std::equal_to<>>; | |
| using string_view_index_map_type = string_view_keyed_map_type<index_type>; | |
| index_map_type<uid_type> uids_; | |
| index_map_type<gid_type> gids_; | |
| index_map_type<mode_type> modes_; | |
| string_view_index_map_type names_; | |
| string_view_index_map_type symlinks_; | |
| using string_keyed_map_type = | |
| map_type<std::string, T, string_like_hash, std::equal_to<>>; | |
| using string_index_map_type = string_keyed_map_type<index_type>; | |
| index_map_type<uid_type> uids_; | |
| index_map_type<gid_type> gids_; | |
| index_map_type<mode_type> modes_; | |
| string_index_map_type names_; | |
| string_index_map_type symlinks_; |
| value_type value; | ||
| { | ||
| write_lock_type lock(mx_); | ||
| value.swap(value_); |
There was a problem hiding this comment.
release() assumes value_type has a member swap(); that is not guaranteed for arbitrary T and will fail to compile for types that only support ADL swap or are merely movable. Use using std::swap; swap(value, value_); (or std::exchange(value_, value_type{}) if default-constructible) to make this generic.
| value.swap(value_); | |
| using std::swap; | |
| swap(value, value_); |
| class entry_interface { | ||
| public: | ||
| virtual std::string path_as_string() const = 0; | ||
| virtual std::string unix_dpath() const = 0; | ||
| virtual std::string const& name() const = 0; | ||
| virtual file_size_t size() const = 0; | ||
| virtual file_size_t allocated_size() const = 0; | ||
| virtual ~entry_interface() = default; | ||
|
|
||
| virtual bool is_directory() const = 0; | ||
| virtual std::string unix_dpath() const = 0; | ||
| }; |
There was a problem hiding this comment.
This change removes several previously-declared virtual methods (e.g., path_as_string(), name(), size(), allocated_size()) from a public header, which is an ABI/API breaking change for any downstream implementations/users of entry_interface. If the intent is to narrow the interface, consider either (1) introducing a new interface type and deprecating the old one, or (2) keeping the old methods with defaulted/redirected implementations until a major-version bump.
|
|
||
| void clear() override { | ||
| assert_not_frozen("clear"); | ||
| // clear *only* resets the size and does affect the allocated memory |
There was a problem hiding this comment.
The comment is missing a negation and currently reads as the opposite of the likely intent. It should say that clear does not affect the allocated memory.
| // clear *only* resets the size and does affect the allocated memory | |
| // clear *only* resets the size and does not affect the allocated memory |
|

No description provided.