
Add btas::zb::RangeNd for slimmer zero-based ranges #187

Open
evaleev wants to merge 6 commits into master from ev/zb-range

Conversation


evaleev (Member) commented May 12, 2026

Summary

  • Adds btas::zb::RangeNd<MaxRank=6, Ext=int16_t, Ord=int32_t> — a zero-based, packed RangeNd for Tensor-of-Tensor inner tiles where lobound ≡ 0, upbound ≡ extent, strides are derivable, offset is 0, and the rank is small (≤ 6).
  • State is just std::array<Ext, MaxRank> extent_ + uint8_t rank_. sizeof = 14 B, align = 2, vs. ~304 B for the general-purpose btas::Range.
  • Strides are synthesized on demand via an ordinal_view value returned from RangeNd::ordinal(); lobound() materializes zeros from a static buffer.
  • Models TWG.BoxRange (range_traits, is_index, boxrange_iteration_order specializations) and provides MADNESS ArchiveLoadImpl / ArchiveStoreImpl that serialize rank + extents only (ordinal is derivable).
  • Motivated by TiledArray ToT (TA::Tensor<btas::Tensor<T, btas::zb::RangeNd<>, …>>) where per-inner-tile range overhead dominates.

Test plan

  • New unittest/zb_range_test.cc — 6 test cases / 60 assertions covering sizeof/alignof contracts, concept membership, construction (default → rank-0 + area()==0, variadic, initializer-list, from container), lobound/upbound/extent accessors, row-major ordinal formula and ordinal_view strides, index iteration order, equality, swap, and non-default template parameters.
  • btas_test builds clean on macOS (arm64, clang + C++20).
  • All zb test cases pass locally: All tests passed (60 assertions in 6 test cases).

evaleev added 2 commits May 12, 2026 11:12
Tensor-of-Tensor applications use inner tiles whose Range is structurally
degenerate (lobound is always 0, upbound equals extent, strides are
derivable, offset is 0, always contiguous) and small (10-100 elements,
rank <= 6). The general-purpose RangeNd pays ~304 B for capabilities the
inner case never uses, dominating per-tile memory.

btas::zb::RangeNd<MaxRank=6, Ext=int16_t, Ord=int32_t> stores only the
extents (std::array<Ext, MaxRank>) plus a 1 B rank, fitting in 14 B
(align 2). Strides are synthesized on demand via ordinal_view returned
by value; lobound() materializes zeros from a static buffer. Models
TWG.BoxRange via range_traits, is_index, and boxrange_iteration_order
specializations, and provides MADNESS archive load/store that
serializes rank + extents only (ordinal is derivable).

Unit tests in unittest/zb_range_test.cc cover sizeof/alignof contracts,
concept membership, construction, ordinal mapping, iteration, equality,
swap, and non-default template parameters (6 cases, 60 assertions).

Two follow-ups were discovered while wiring btas::Tensor with
zb::RangeNd as the inner tile in TiledArray ToT tests:

* btas::Tensor's (range, storage) ctor instantiates
  range_type(range.lobound(), range.upbound()) even when the runtime
  branch wouldn't take it, so zb::RangeNd needs that constructor to
  satisfy template instantiation. The ctor asserts that lobound is all
  zeros (zero-based ranges) and takes upbound as the extent.

* Replace plain assert() with BTAS_ASSERT per BTAS convention, and
  include <btas/error.h> for it.

All 60 zb::RangeNd unit-test assertions still pass.
evaleev changed the title from "Add btas::zb::RangeNd for packed zero-based inner-tile ranges" to "Add btas::zb::RangeNd for slimmer zero-based ranges" on May 12, 2026
Mirrors TA::Tensor's range+lambda ctor: builds each element from `gen`
called on the element's multi-index. Useful when a btas::Tensor is the
inner tile in a TiledArray ToT and generic code (e.g. MPQC's
jacobi_update) wants tile-type-agnostic per-index construction.

SFINAE keeps the new ctor distinct from existing (Range, value),
(Range, iterator), and (Range, Storage) overloads via
is_invocable_r_v<value_type, F, *Range::begin()>.

Copilot AI left a comment


Pull request overview

This PR introduces a compact, zero-based range type (btas::zb::RangeNd) intended to dramatically reduce per-tile range overhead for small-rank inner tiles (e.g., Tensor-of-Tensor use cases), and adds a new btas::Tensor constructor that can populate values via a generator called on each multi-index.

Changes:

  • Added btas::zb::RangeNd (and supporting btas::zb::index / ordinal_view) plus BTAS/MADNESS trait & archive support.
  • Added unit tests for zb::RangeNd, and extended existing tensor tests to cover a generator-based constructor.
  • Wired the new unit test into the unittest CMake target and added a generator-based Tensor(range, gen) constructor.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Summary per file:

  • btas/zb/range.h: adds packed zero-based RangeNd, iteration/ordinal synthesis, traits, and MADNESS archive load/store.
  • btas/tensor.h: adds Tensor(const Range&, F&& gen) constructor to populate storage from a per-index generator.
  • unittest/zb_range_test.cc: new unit test suite validating size/alignment, concept membership, accessors, ordinal mapping, iteration, and swap/equality.
  • unittest/tensor_test.cc: adds coverage for the new generator-based tensor constructor.
  • unittest/CMakeLists.txt: adds zb_range_test.cc to the unit test build.
Comments suppressed due to low confidence (4)

btas/zb/range.h:38

  • btas::zb::index stores size_ in a std::uint8_t, but there is no constraint preventing MaxRank > 255. Instantiating index<MaxRank,...> with MaxRank >= 256 will truncate sizes and can lead to out-of-bounds access. Add a static_assert(MaxRank < 256) (or use a wider size_ type) to enforce the invariant.
template <std::size_t MaxRank, typename Int>
class index {

btas/zb/range.h:403

  • ArchiveLoadImpl reads rank from the archive and then constructs extent_type ext(rank). If asserts are compiled out, a malformed archive with rank > MaxRank can lead to size truncation and out-of-bounds writes/reads. Please validate rank <= MaxRank and fail deterministically (e.g., throw via BTAS_EXCEPTION) before allocating/filling ext.
    std::uint8_t rank{};
    ar& rank;
    typename btas::zb::RangeNd<MaxRank, Ext, Ord>::extent_type ext(
        static_cast<std::size_t>(rank));
    for (std::uint8_t i = 0; i < rank; ++i) ar& ext[i];
    r = btas::zb::RangeNd<MaxRank, Ext, Ord>(ext);
  }

btas/zb/range.h:216

  • The comment says this constructor accepts an "extent container", but the SFINAE requires btas::is_index::value, which excludes common containers like std::vector. Either widen the constraint to accept general containers (consistent with btas::zb::index's container ctor) or update the comment/API docs to reflect the actual requirement.
  /// Construct from an extent container (rank inferred from size).
  template <typename C,
            typename = std::enable_if_t<is_index<std::decay_t<C>>::value &&
                                        !std::is_same_v<std::decay_t<C>, RangeNd>>>
  RangeNd(const C& ext) : extent_(ext) {}

btas/tensor.h:163

  • The enable_if checks that gen is invocable with the input Range's iterator value type, but the implementation actually calls gen(idx) where idx comes from iterating over range_ (i.e., Tensor::range_type). If Range != range_type, this can pass the constraint but still fail to compile inside the body. Consider constraining gen against Tensor::index_type / range_type's iterator value type (or adjusting the loop to invoke gen with the same index type used in the constraint).
    template <typename Range, typename F,
              typename = std::enable_if_t<
                  btas::is_boxrange<Range>::value &&
                  std::is_invocable_r_v<
                      value_type, F,
                      decltype(*std::begin(std::declval<const Range&>()))>>>
    Tensor(const Range& range, F&& gen)
        : range_(range.lobound(), range.upbound()) {
      array_adaptor<storage_type>::resize(storage_, range_.area());
      auto out_it = begin();
      for (auto&& idx : range_) {
        *out_it++ = gen(idx);
      }
    }


evaleev added 2 commits May 12, 2026 22:31
zb::RangeNd intentionally stores no strides — its 14 B packed
representation is the reason it exists, and contiguous row-major strides
are derivable from extent alone. But BTAS's generic permute(X, p, Y),
TA's tile ops, and other consumers expect r.stride() to be callable.

Add a stride() member that synthesizes row-major strides on demand and
returns them by value — nothing is stored, the packed footprint is
preserved. Callers that need a pointer (e.g. btas::permute forwarding
into btas::Range(lo, up, stride)) bind the temporary as a const& and
copy.

Also turn the if/else in btas::permute(X, p, Y) into if constexpr so the
strided-fallback branch is discarded for ranges whose is_general_layout
is true — keeps permute compilable against ranges that genuinely lack
stride storage (none today, but a defensible cleanup).
Bring zb::RangeNd's template signature in line with btas::RangeNd:
  template <::blas::Layout _Order, typename Ext, typename Ord, std::size_t MaxRank>
(layout first, MaxRank last so the common zb::RangeNd<> form stays a
single token). The ordinal_view, ordinal(idx), and increment() paths
became layout-aware via if constexpr — row-major iterates last dim
fastest, column-major first dim fastest. range_traits and
boxrange_iteration_order specializations now reflect the runtime
layout.

Also address review feedback / harden the zero-based path:
- Add missing <algorithm> and <utility> headers in zb/range.h, and
  <utility> in tensor.h (no longer rely on transitive includes).
- static_assert(MaxRank < 256) on btas::zb::index, mirroring the
  same check on RangeNd, since the size_ field is uint8_t.
- BTAS_ASSERT rank <= MaxRank in the MADNESS archive load to fail
  deterministically on malformed archives instead of silently
  truncating into the uint8_t size_.
- Add a column-major test case to zb_range_test.cc and update the
  non-default-params test to use the new template ordering.

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

- tensor.h (range, generator) ctor: dispatch via std::invoke, and
  constrain the generator against range_type's iteration value (not the
  input Range's), so the SFINAE-accepted set matches what compiles in
  the body.
- zb::RangeNd (lobound, upbound) ctor: assert rank-equality before
  checking the all-zero lobound condition, so a size mismatch fails
  loudly instead of silently extending extent_ from upbound alone.
- zb::RangeNd::zero_buffer comment: "first rank() bytes" -> "elements".
- zb_range_test.cc: drop the unused `prev` index in the column-major
  test and instead validate that traversal is in column-major order
  by checking r.ordinal(idx) == iteration count for each visited index.