
enable btas::Tensor as inner tile of TA::Tensor-of-Tensor + bump BTAS pin#546

Open
evaleev wants to merge 3 commits into master from evaleev/build/bump-btas-pin

Conversation

Member

@evaleev evaleev commented May 12, 2026

Summary

  • Bumps BTAS pin to 7e64fbad (adds a Tensor(Range, F) ctor on btas::Tensor mirroring TA::Tensor's range+lambda ctor).
  • Closes the remaining gaps so a TA::DistArray<TA::Tensor<btas::Tensor<T>>, Policy> (btas-inner ToT) is usable end-to-end alongside the existing TA-inner ToT, including through TA::einsum's ToT * T and ToT * ToT paths. The outer tile is always TA::Tensor; btas::Tensor is inner-only and remains free of outer-tile operations (permute/reshape/batch/...).

Notable TA-side changes

  • external/btas.h: nested_rank<btas::Tensor<...>> partial spec (so einsum's MaxNestedArray classifies btas-inner ToT correctly); 6-arg gemm(alpha, A, B, beta, C, helper) overload matching the TA::Tensor signature; size_of<MemorySpace>(btas::Tensor) in namespace btas so ADL finds it from TA::Tensor's recursive size_of.
  • tensor/operators.h + new tensor/operators_body.ipp: lift the element-wise ops (T+T/T-T/T*T/-T/T*N/N*T/in-place variants/Perm*T/operator<<) into a shared .ipp body, included once into namespace TiledArray and once into namespace btas, gated by disjoint per-namespace ta_ops_match_tensor_v predicates.
  • tensor/print.h + .ipp: split NDArrayPrinter::print's Index template param into ExtentIndex / StrideIndex (btas::Tensor's range exposes extent_data as unsigned long* but stride_data as long long*).
  • tensor/tensor.h: fix broken SFINAE in Tensor::subt template (missing _t on enable_if).
  • tensor/type_traits.h: rewrite result_tensor_helper to derive the result via TensorA::rebind_t<numeric_type> (the "rebind allocator on numeric" operation both TA::Tensor and btas::Tensor expose). Drops the requirement that the input expose allocator_type.
  • tensor/kernels.h: tensor_contract / tensor_hadamard use free CPO calls instead of .permute() / .mult() / .mult_to() member calls.
  • einsum/tiledarray.h: DeNestedArray<Array> for ToT inputs wraps the inner numeric in TA::Tensor (not the inner tile type); sum_tot_2_tos produces TA::Tensor<numeric_type> and uses unqualified sum() so ADL picks the right overload.
  • dist_array.h: volume()'s reduce uses arg->total_size() when the inner tile exposes it, else falls back to arg->size().

New test

  • tests/btas_zb_inner_tile.cpp: sniff tests instantiating TA::Tensor<btas::Tensor<int, btas::zb::RangeNd<>, ...>> as the ToT inner tile, exercising subt/add/scale through the cross-namespace operator path.

Test plan

  • check_serial-tiledarray (np=1) passes.
  • check-tiledarray (np=1 + np=2) passes.
  • No regression in MPQC downstream (PNO/CSV CC paths build and run).

evaleev added 2 commits May 12, 2026 16:36
BTAS 7e64fbad adds a (Range, F) ctor on btas::Tensor that mirrors
TA::Tensor's range+lambda ctor. Needed for tile-type-agnostic per-index
inner-tile construction (e.g. MPQC's jacobi_update for btas-inner ToT
amplitudes).
Closes the remaining gaps so a TA::DistArray<TA::Tensor<btas::Tensor<T>>,
Policy> (btas-inner ToT) is usable end-to-end alongside the existing
TA-inner ToT, including through einsum's ToT * T and ToT * ToT paths.
The outer tile is always TA::Tensor; btas::Tensor is *inner-only* and
remains free of outer-tile operations (permute/reshape/batch/...).

Notable changes:

- external/btas.h:
  - nested_rank<btas::Tensor<...>> partial spec so einsum's
    MaxNestedArray correctly classifies btas-inner ToT.
  - 6-arg gemm(alpha, A, B, beta, C, helper) overload matching the
    TA::Tensor signature (the existing 5-arg form is accumulate-only).
  - size_of<MemorySpace>(btas::Tensor) in namespace btas so ADL finds
    it from TA::Tensor's recursive size_of when the inner tile is btas.

- tensor/operators.h + new operators_body.ipp:
  - Lift T+T/T-T/T*T/-T/T*N/N*T/T+=T/T-=T/T*=T/T*=N/Perm*T and the
    contiguous-tensor operator<< into a shared .ipp body, included
    once into namespace TiledArray and once into namespace btas, gated
    by disjoint per-namespace ta_ops_match_tensor_v predicates. Fixes
    ADL of these operators inside TA::Tensor's lambdas for btas inner.

- tensor/print.h + .ipp:
  - Split NDArrayPrinter::print's Index template param into
    ExtentIndex / StrideIndex so btas::Tensor (whose range exposes
    extent_data as unsigned-long* but stride_data as long-long*)
    drives the printer.

- tensor/tensor.h:
  - Fix broken SFINAE in Tensor::subt template
    (typename = std::enable_if<...> missing the _t).

- tensor/type_traits.h:
  - Rewrite result_tensor_helper to derive the result via
    TensorA::rebind_t<numeric_type> (the "rebind allocator on numeric"
    operation both TA::Tensor and btas::Tensor expose). Drops the
    requirement that the input expose allocator_type.

- tensor/kernels.h:
  - tensor_contract / tensor_hadamard: member-style .permute(),
    .mult(), .mult_to() -> free CPO calls so ADL dispatches via
    namespace btas for btas tiles and tile_op/tile_interface.h for TA.

- einsum/tiledarray.h:
  - DeNestedArray<Array> for ToT inputs now wraps the inner numeric
    in TA::Tensor rather than re-using the inner tile type as a new
    outer tile (preserves the "btas is inner-only" rule across
    DeNest).
  - sum_tot_2_tos lambda likewise produces TA::Tensor<numeric_type>,
    and replaces the tot(ix).sum() member with an unqualified sum()
    so ADL finds the right overload (free fn in TA / namespace btas).

- dist_array.h:
  - volume(): reduce_op uses arg->total_size() when the inner tile
    exposes it (TA::Tensor), else falls back to arg->size() (btas).

- tests/btas_zb_inner_tile.cpp (+ tests/CMakeLists.txt):
  - Sniff tests instantiating TA::Tensor<btas::Tensor<int,
    btas::zb::RangeNd<>, ...>> as the ToT inner tile, exercising
    subt / add / scale through the cross-namespace operator path.
@evaleev evaleev changed the title from "build: bump BTAS pin to pick up Tensor (Range, generator) ctor" to "enable btas::Tensor as inner tile of TA::Tensor-of-Tensor + bump BTAS pin" May 12, 2026
Two fixes needed by MPQC's CSV CCk validation tests (which exercise
TA::Tensor<btas::Tensor<T>> inner tiles end-to-end):

1. external/btas.h: mirror TA::Tensor's empty-/null-argument early-exits
   in the btas free-function suite (add/add_to/subt/subt_to/mult/mult_to,
   their factored and permuted variants, scale/scale_to/neg/neg_to). TA's
   ToT lambdas default-construct result inner tiles and accumulate into
   them, so consumers regularly hit the empty-result-then-add_to(arg)
   pattern; without these guards, TA's congruent-range assertion fires
   inside the underlying TensorInterface machinery.

2. tensor/tensor_interface.h: replace range_.includes(index_ordinal) with
   range_.includes_ordinal(index_ordinal) in TensorInterface's operator[]
   / at_ordinal. The integral includes(Ordinal) overload on TA::Range
   asserts rank!=1 to disambiguate from the includes(Index)
   coordinate-tuple form; TI's ordinal-lookup path was inadvertently
   tripping that for rank-1 inner tiles (e.g. PNO-basis 1-D energies).
evaleev added a commit to ValeevGroup/SeQuant that referenced this pull request May 12, 2026
…n array

The "regular" (non-nested) companion array used in ToT * T einsum
contractions had its outer tile set to the *inner* tile type of the
input ToT array. For TA-inner ToT that happens to work (TA::Tensor is a
valid outer tile too), but for btas-inner ToT it produced a
DistArray<btas::Tensor, ...> whose tile lacks the outer-tile API einsum
needs (permute/reshape/batch/...).

The outer tile must always be TA::Tensor; inner-tile types like
btas::Tensor are *inner-only*. Fix compatible_regular_distarray_type
to wrap the inner's numeric type in TA::Tensor.

Also bumps the TiledArray pin to the version that lifts the matching
TA-side restrictions (ValeevGroup/tiledarray#546).
@evaleev evaleev requested a review from Copilot May 12, 2026 23:50

Copilot AI left a comment


Pull request overview

This PR bumps the pinned BTAS revision and extends TiledArray’s Tensor-of-Tensor (ToT) plumbing so btas::Tensor can be used as the inner tile type (with TA::Tensor remaining the outer tile), including through TA::einsum paths.

Changes:

  • Update BTAS integration to support btas::Tensor as ToT inner tiles (traits, ADL hooks, gemm, size_of, congruency helpers for btas::zb::RangeNd).
  • Refactor element-wise tensor operators into a shared include (operators_body.ipp) injected into both namespace TiledArray and namespace btas.
  • Add a serial “sniff test” covering construction and basic ToT ops with btas::zb::RangeNd inner tiles.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Per-file summary:

  • tests/CMakeLists.txt: Adds the new btas inner-tile test to the test suite.
  • tests/btas_zb_inner_tile.cpp: New compile/runtime sniff tests for btas zb::RangeNd inner tiles in ToT.
  • src/TiledArray/tensor/type_traits.h: Adds operator gating predicate; changes result-tensor deduction to use rebind_t.
  • src/TiledArray/tensor/tensor.h: Fixes SFINAE (enable_if_t) in Tensor::subt template.
  • src/TiledArray/tensor/tensor_interface.h: Uses includes_ordinal() for ordinal bounds asserts; minor formatting.
  • src/TiledArray/tensor/print.h: Splits printer index types into independently deducible extent/stride index types.
  • src/TiledArray/tensor/print.ipp: Updates printer template definitions to match the new extent/stride index parameters.
  • src/TiledArray/tensor/operators.h: Includes shared operator body; keeps TA-specific operators here.
  • src/TiledArray/tensor/operators_body.ipp: New shared implementation of element-wise ops and contiguous operator<<.
  • src/TiledArray/tensor/kernels.h: Switches to free-function CPO calls (permute, mult, mult_to) for broader tile support.
  • src/TiledArray/external/btas.h: Adds zb-range congruency/Range conversion, btas free ops with empty-handling, gemm overload, nested-rank, ADL size_of, and btas-side operator injection.
  • src/TiledArray/einsum/tiledarray.h: Adjusts denesting for ToT to always yield TA::Tensor<numeric>; uses ADL-friendly free ops in inner loops.
  • src/TiledArray/dist_array.h: Makes volume() reduction prefer total_size() when available, else size().
  • external/versions.cmake: Bumps the tracked BTAS git tag.


Comment on lines 493 to +515
template <typename Op, typename TensorA, typename TensorB,
typename Allocator = void,
typename = std::enable_if_t<is_nested_tensor_v<TensorA, TensorB>>>
struct result_tensor_helper {
private:
using TensorA_ = std::remove_reference_t<TensorA>;
using TensorB_ = std::remove_reference_t<TensorB>;
using value_type_A = typename TensorA_::value_type;
using value_type_B = typename TensorB_::value_type;
using allocator_type_A = typename TensorA_::allocator_type;
using allocator_type_B = typename TensorB_::allocator_type;

public:
using numeric_type = binop_result_t<Op, value_type_A, value_type_B>;
using allocator_type =
std::conditional_t<std::is_same_v<void, Allocator> &&
std::is_same_v<allocator_type_A, allocator_type_B>,
allocator_type_A, Allocator>;

// Result tensor type stays in TensorA's family with the allocator rebound to
// hold `numeric_type`. Both TA::Tensor and btas::Tensor expose this as
// `rebind_t<U>` (TA::Tensor via std::allocator_traits::rebind_alloc; btas
// via storage_traits::rebind_t). An explicit @tparam Allocator override only
// applies when TensorA is a TA::Tensor.
  using result_type =
-     std::conditional_t<std::is_same_v<void, allocator_type>,
-                        TA::Tensor<numeric_type>,
-                        TA::Tensor<numeric_type, allocator_type>>;
+     std::conditional_t<std::is_same_v<void, Allocator> ||
+                            !is_ta_tensor_v<TensorA_>,
+                        typename TensorA_::template rebind_t<numeric_type>,
+                        TA::Tensor<numeric_type, Allocator>>;
Comment on lines +120 to +125
/// Predicate used by the shared operator body in
/// @c TiledArray/tensor/operators_body.ipp to gate the element-wise tensor
/// operators that are injected into @c namespace TiledArray . The btas-side
/// copy of the same operators (in @c external/btas.h) partial-specializes
/// this predicate to @c std::false_type for @c btas::Tensor so the two
/// namespaces' operators stay non-overlapping under ADL.
Comment on lines +93 to 105
//
// @c ExtentIndex and @c StrideIndex are independently deducible so callers
// can pass arrays of different integer types — needed for @c btas::Tensor ,
// whose range exposes @c extent_data() as @c unsigned-long* but
// @c stride_data() as @c long-long* .
  template <typename T, typename Char = char,
-           typename Index = Range1::index1_type,
+           typename ExtentIndex = Range1::index1_type,
+           typename StrideIndex = ExtentIndex,
            typename CharTraits = std::char_traits<Char>>
- void print(const T* data, const std::size_t order, const Index* extents,
-            const Index* strides, std::basic_ostream<Char, CharTraits>& os,
+ void print(const T* data, const std::size_t order, const ExtentIndex* extents,
+            const StrideIndex* strides,
+            std::basic_ostream<Char, CharTraits>& os,
             std::size_t extra_indentation = 0);
Comment on lines +32 to +34
// Sanity-check the size claim — fail loudly here if range layout drifts.
static_assert(sizeof(btas::zb::RangeNd<>) == 14,
"zb::RangeNd default layout must remain 14 bytes");