Add btas::zb::RangeNd for slimmer zero-based ranges#187
Open
evaleev wants to merge 6 commits into
Open
Conversation
Tensor-of-Tensor applications use inner tiles whose Range is structurally degenerate (lobound is always 0, upbound equals extent, strides are derivable, offset is 0, always contiguous) and small (10-100 elements, rank <= 6). The general-purpose RangeNd pays ~304 B for capabilities the inner case never uses, dominating per-tile memory. btas::zb::RangeNd<MaxRank=6, Ext=int16_t, Ord=int32_t> stores only the extents (std::array<Ext, MaxRank>) plus a 1 B rank, fitting in 14 B (align 2). Strides are synthesized on demand via ordinal_view returned by value; lobound() materializes zeros from a static buffer. Models TWG.BoxRange via range_traits, is_index, and boxrange_iteration_order specializations, and provides MADNESS archive load/store that serializes rank + extents only (ordinal is derivable). Unit tests in unittest/zb_range_test.cc cover sizeof/alignof contracts, concept membership, construction, ordinal mapping, iteration, equality, swap, and non-default template parameters (6 cases, 60 assertions).
Two follow-ups discovered while wiring btas::Tensor with zb::RangeNd as the inner tile in TiledArray ToT tests: * btas::Tensor's (range, storage) ctor instantiates range_type(range.lobound(), range.upbound()) even when the runtime branch wouldn't take it, so zb::RangeNd needs that constructor to satisfy template instantiation. The ctor asserts that lobound is all zeros (zero-based ranges) and takes upbound as the extent. * Replace plain assert() with BTAS_ASSERT per BTAS convention, and include <btas/error.h> for it. All 60 zb::RangeNd unit-test assertions still pass.
Mirrors TA::Tensor's range+lambda ctor: builds each element from `gen` called on the element's multi-index. Useful when a btas::Tensor is the inner tile in a TiledArray ToT and generic code (e.g. MPQC's jacobi_update) wants tile-type-agnostic per-index construction. SFINAE keeps the new ctor distinct from existing (Range, value), (Range, iterator), and (Range, Storage) overloads via is_invocable_r_v<value_type, F, *Range::begin()>.
There was a problem hiding this comment.
Pull request overview
This PR introduces a compact, zero-based range type (btas::zb::RangeNd) intended to dramatically reduce per-tile range overhead for small-rank inner tiles (e.g., Tensor-of-Tensor use cases), and adds a new btas::Tensor constructor that can populate values via a generator called on each multi-index.
Changes:
- Added
btas::zb::RangeNd(and supportingbtas::zb::index/ordinal_view) plus BTAS/MADNESS trait & archive support. - Added unit tests for
zb::RangeNd, and extended existing tensor tests to cover a generator-based constructor. - Wired the new unit test into the unittest CMake target and added a generator-based
Tensor(range, gen)constructor.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
btas/zb/range.h |
Adds packed zero-based RangeNd, iteration/ordinal synthesis, traits, and MADNESS archive load/store. |
btas/tensor.h |
Adds Tensor(const Range&, F&& gen) constructor to populate storage from a per-index generator. |
unittest/zb_range_test.cc |
New unit test suite validating size/alignment, concept membership, accessors, ordinal mapping, iteration, and swap/equality. |
unittest/tensor_test.cc |
Adds coverage for the new generator-based tensor constructor. |
unittest/CMakeLists.txt |
Adds zb_range_test.cc to the unit test build. |
Comments suppressed due to low confidence (4)
btas/zb/range.h:38
- btas::zb::index stores size_ in a std::uint8_t, but there is no constraint preventing MaxRank > 255. Instantiating index<MaxRank,...> with MaxRank >= 256 will truncate sizes and can lead to out-of-bounds access. Add a static_assert(MaxRank < 256) (or use a wider size_ type) to enforce the invariant.
template <std::size_t MaxRank, typename Int>
class index {
btas/zb/range.h:403
- ArchiveLoadImpl reads rank from the archive and then constructs extent_type ext(rank). If asserts are compiled out, a malformed archive with rank > MaxRank can lead to size truncation and out-of-bounds writes/reads. Please validate rank <= MaxRank and fail deterministically (e.g., throw via BTAS_EXCEPTION) before allocating/filling ext.
std::uint8_t rank{};
ar& rank;
typename btas::zb::RangeNd<MaxRank, Ext, Ord>::extent_type ext(
static_cast<std::size_t>(rank));
for (std::uint8_t i = 0; i < rank; ++i) ar& ext[i];
r = btas::zb::RangeNd<MaxRank, Ext, Ord>(ext);
}
btas/zb/range.h:216
- The comment says this constructor accepts an "extent container", but the SFINAE requires btas::is_index::value, which excludes common containers like std::vector. Either widen the constraint to accept general containers (consistent with btas::zb::index's container ctor) or update the comment/API docs to reflect the actual requirement.
/// Construct from an extent container (rank inferred from size).
template <typename C,
typename = std::enable_if_t<is_index<std::decay_t<C>>::value &&
!std::is_same_v<std::decay_t<C>, RangeNd>>>
RangeNd(const C& ext) : extent_(ext) {}
btas/tensor.h:163
- The enable_if checks that gen is invocable with the input Range's iterator value type, but the implementation actually calls gen(idx) where idx comes from iterating over range_ (i.e., Tensor::range_type). If Range != range_type, this can pass the constraint but still fail to compile inside the body. Consider constraining gen against Tensor::index_type / range_type's iterator value type (or adjusting the loop to invoke gen with the same index type used in the constraint).
template <typename Range, typename F,
typename = std::enable_if_t<
btas::is_boxrange<Range>::value &&
std::is_invocable_r_v<
value_type, F,
decltype(*std::begin(std::declval<const Range&>()))>>>
Tensor(const Range& range, F&& gen)
: range_(range.lobound(), range.upbound()) {
array_adaptor<storage_type>::resize(storage_, range_.area());
auto out_it = begin();
for (auto&& idx : range_) {
*out_it++ = gen(idx);
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
zb::RangeNd intentionally stores no strides — its 14 B packed representation is the reason it exists, and contiguous row-major strides are derivable from extent alone. But BTAS's generic permute(X, p, Y), TA's tile ops, and other consumers expect r.stride() to be callable. Add a stride() member that synthesizes row-major strides on demand and returns them by value — nothing is stored, the packed footprint is preserved. Callers that need a pointer (e.g. btas::permute forwarding into btas::Range(lo, up, stride)) bind the temporary as a const& and copy. Also turn the if/else in btas::permute(X, p, Y) into if constexpr so the strided-fallback branch is discarded for ranges whose is_general_layout is true — keeps permute compilable against ranges that genuinely lack stride storage (none today, but a defensible cleanup).
Bring zb::RangeNd's template signature in line with btas::RangeNd: template <::blas::Layout _Order, typename Ext, typename Ord, std::size_t MaxRank> (layout first, MaxRank last so the common zb::RangeNd<> form stays a single token). The ordinal_view, ordinal(idx), and increment() paths became layout-aware via if constexpr — row-major iterates last dim fastest, column-major first dim fastest. range_traits and boxrange_iteration_order specializations now reflect the runtime layout. Also address review feedback / harden the zero-based path: - Add missing <algorithm> and <utility> headers in zb/range.h, and <utility> in tensor.h (no longer rely on transitive includes). - static_assert(MaxRank < 256) on btas::zb::index, mirroring the same check on RangeNd, since the size_ field is uint8_t. - BTAS_ASSERT rank <= MaxRank in the MADNESS archive load to fail deterministically on malformed archives instead of silently truncating into the uint8_t size_. - Add a column-major test case to zb_range_test.cc and update the non-default-params test to use the new template ordering.
- tensor.h (range, generator) ctor: dispatch via std::invoke, and constrain the generator against range_type's iteration value (not the input Range's), so the SFINAE-accepted set matches what compiles in the body. - zb::RangeNd (lobound, upbound) ctor: assert rank-equality before checking the all-zero lobound condition, so a size mismatch fails loudly instead of silently extending extent_ from upbound alone. - zb::RangeNd::zero_buffer comment: "first rank() bytes" -> "elements". - zb_range_test.cc: drop the unused `prev` index in the column-major test and instead validate that traversal is in column-major order by checking r.ordinal(idx) == iteration count for each visited index.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
btas::zb::RangeNd<MaxRank=6, Ext=int16_t, Ord=int32_t>— a zero-based, packedRangeNdfor Tensor-of-Tensor inner tiles wherelobound ≡ 0,upbound ≡ extent, strides are derivable, offset is 0, and the rank is small (≤ 6).std::array<Ext, MaxRank> extent_+uint8_t rank_.sizeof = 14 B,align = 2, vs.~304 Bfor the general-purposebtas::Range.ordinal_viewvalue returned fromRangeNd::ordinal();lobound()materializes zeros from a static buffer.TWG.BoxRange(range_traits,is_index,boxrange_iteration_orderspecializations) and provides MADNESSArchiveLoadImpl/ArchiveStoreImplthat serialize rank + extents only (ordinal is derivable).TA::Tensor<btas::Tensor<T, btas::zb::RangeNd<>, …>>) where per-inner-tile range overhead dominates.Test plan
unittest/zb_range_test.cc— 6 test cases / 60 assertions covering sizeof/alignof contracts, concept membership, construction (default → rank-0 +area()==0, variadic, initializer-list, from container), lobound/upbound/extent accessors, row-major ordinal formula andordinal_viewstrides, index iteration order, equality, swap, and non-default template parameters.btas_testbuilds clean on macOS (arm64, clang + C++20).All tests passed (60 assertions in 6 test cases).