enable btas::Tensor as inner tile of TA::Tensor-of-Tensor + bump BTAS pin#546
Open
evaleev wants to merge 3 commits into
BTAS 7e64fbad adds a (Range, F) ctor on btas::Tensor that mirrors TA::Tensor's range+lambda ctor. Needed for tile-type-agnostic per-index inner-tile construction (e.g. MPQC's jacobi_update for btas-inner ToT amplitudes).
Closes the remaining gaps so a TA::DistArray<TA::Tensor<btas::Tensor<T>>,
Policy> (btas-inner ToT) is usable end-to-end alongside the existing
TA-inner ToT, including through einsum's ToT * T and ToT * ToT paths.
The outer tile is always TA::Tensor; btas::Tensor is *inner-only* and
remains free of outer-tile operations (permute/reshape/batch/...).
Notable changes:
- external/btas.h:
- nested_rank<btas::Tensor<...>> partial spec so einsum's
MaxNestedArray correctly classifies btas-inner ToT.
- 6-arg gemm(alpha, A, B, beta, C, helper) overload matching the
TA::Tensor signature (the existing 5-arg form is accumulate-only).
- size_of<MemorySpace>(btas::Tensor) in namespace btas so ADL finds
it from TA::Tensor's recursive size_of when the inner tile is btas.
- tensor/operators.h + new operators_body.ipp:
- Lift T+T/T-T/T*T/-T/T*N/N*T/T+=T/T-=T/T*=T/T*=N/Perm*T and the
contiguous-tensor operator<< into a shared .ipp body, included
once into namespace TiledArray and once into namespace btas, gated
by disjoint per-namespace ta_ops_match_tensor_v predicates. Fixes
ADL of these operators inside TA::Tensor's lambdas for btas inner.
- tensor/print.h + .ipp:
- Split NDArrayPrinter::print's Index template param into
ExtentIndex / StrideIndex so btas::Tensor (whose range exposes
extent_data as unsigned-long* but stride_data as long-long*)
drives the printer.
- tensor/tensor.h:
- Fix broken SFINAE in Tensor::subt template
(typename = std::enable_if<...> missing the _t).
- tensor/type_traits.h:
- Rewrite result_tensor_helper to derive the result via
TensorA::rebind_t<numeric_type> (the "rebind allocator on numeric"
operation both TA::Tensor and btas::Tensor expose). Drops the
requirement that the input expose allocator_type.
- tensor/kernels.h:
- tensor_contract / tensor_hadamard: member-style .permute(),
.mult(), .mult_to() -> free CPO calls so ADL dispatches via
namespace btas for btas tiles and tile_op/tile_interface.h for TA.
- einsum/tiledarray.h:
- DeNestedArray<Array> for ToT inputs now wraps the inner numeric
in TA::Tensor rather than re-using the inner tile type as a new
outer tile (preserves the "btas is inner-only" rule across
DeNest).
- sum_tot_2_tos lambda likewise produces TA::Tensor<numeric_type>,
and replaces the tot(ix).sum() member with an unqualified sum()
so ADL finds the right overload (free fn in TA / namespace btas).
- dist_array.h:
- volume(): reduce_op uses arg->total_size() when the inner tile
exposes it (TA::Tensor), else falls back to arg->size() (btas).
- tests/btas_zb_inner_tile.cpp (+ tests/CMakeLists.txt):
- Sniff tests instantiating TA::Tensor<btas::Tensor<int,
btas::zb::RangeNd<>, ...>> as the ToT inner tile, exercising
subt / add / scale through the cross-namespace operator path.
Two fixes needed by MPQC's CSV CCk validation tests (which exercise TA::Tensor<btas::Tensor<T>> inner tiles end-to-end):

1. external/btas.h: mirror TA::Tensor's empty-/null-argument early-exits in the btas free-function suite (add/add_to/subt/subt_to/mult/mult_to, their factored and permuted variants, scale/scale_to/neg/neg_to). TA's ToT lambdas default-construct result inner tiles and accumulate into them, so consumers regularly hit the empty-result-then-add_to(arg) pattern; without these guards, TA's congruent-range assertion fires inside the underlying TensorInterface machinery.

2. tensor/tensor_interface.h: replace range_.includes(index_ordinal) with range_.includes_ordinal(index_ordinal) in TensorInterface's operator[] / at_ordinal. The integral includes(Ordinal) overload on TA::Range asserts rank != 1 to disambiguate from the includes(Index) coordinate-tuple form; TI's ordinal-lookup path was inadvertently tripping that for rank-1 inner tiles (e.g. PNO-basis 1-D energies).
evaleev added a commit to ValeevGroup/SeQuant that referenced this pull request on May 12, 2026:
…n array

The "regular" (non-nested) companion array used in ToT * T einsum contractions had its outer tile set to the *inner* tile type of the input ToT array. For TA-inner ToT that happens to work (TA::Tensor is a valid outer tile too), but for btas-inner ToT it produced a DistArray<btas::Tensor, ...> whose tile lacks the outer-tile API einsum needs (permute/reshape/batch/...).

The outer tile must always be TA::Tensor; inner-tile types like btas::Tensor are *inner-only*. Fix compatible_regular_distarray_type to wrap the inner's numeric type in TA::Tensor.

Also bumps the TiledArray pin to the version that lifts the matching TA-side restrictions (ValeevGroup/tiledarray#546).
Pull request overview
This PR bumps the pinned BTAS revision and extends TiledArray’s Tensor-of-Tensor (ToT) plumbing so btas::Tensor can be used as the inner tile type (with TA::Tensor remaining the outer tile), including through TA::einsum paths.
Changes:
- Update BTAS integration to support `btas::Tensor` as ToT inner tiles (traits, ADL hooks, `gemm`, `size_of`, congruency helpers for `btas::zb::RangeNd`).
- Refactor element-wise tensor operators into a shared include (`operators_body.ipp`) injected into both `namespace TiledArray` and `namespace btas`.
- Add a serial "sniff test" covering construction and basic ToT ops with `btas::zb::RangeNd` inner tiles.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/CMakeLists.txt | Adds the new btas inner-tile test to the test suite. |
| tests/btas_zb_inner_tile.cpp | New compile/runtime sniff tests for btas zb::RangeNd inner tiles in ToT. |
| src/TiledArray/tensor/type_traits.h | Adds operator gating predicate; changes result-tensor deduction to use rebind_t. |
| src/TiledArray/tensor/tensor.h | Fixes SFINAE (enable_if_t) in Tensor::subt template. |
| src/TiledArray/tensor/tensor_interface.h | Uses includes_ordinal() for ordinal bounds asserts; minor formatting. |
| src/TiledArray/tensor/print.h | Splits printer index types into independently deducible extent/stride index types. |
| src/TiledArray/tensor/print.ipp | Updates printer template definitions to match the new extent/stride index parameters. |
| src/TiledArray/tensor/operators.h | Includes shared operator body; keeps TA-specific operators here. |
| src/TiledArray/tensor/operators_body.ipp | New shared implementation of element-wise ops and contiguous operator<<. |
| src/TiledArray/tensor/kernels.h | Switches to free-function CPO calls (permute, mult, mult_to) for broader tile support. |
| src/TiledArray/external/btas.h | Adds zb-range congruency/Range conversion, btas free ops with empty-handling, gemm overload, nested-rank, ADL size_of, and btas-side operator injection. |
| src/TiledArray/einsum/tiledarray.h | Adjusts denesting for ToT to always yield TA::Tensor<numeric>; uses ADL-friendly free ops in inner loops. |
| src/TiledArray/dist_array.h | Makes volume() reduction prefer total_size() when available, else size(). |
| external/versions.cmake | Bumps the tracked BTAS git tag. |
Comment on lines 493 to +515
```diff
 template <typename Op, typename TensorA, typename TensorB,
           typename Allocator = void,
           typename = std::enable_if_t<is_nested_tensor_v<TensorA, TensorB>>>
 struct result_tensor_helper {
  private:
   using TensorA_ = std::remove_reference_t<TensorA>;
   using TensorB_ = std::remove_reference_t<TensorB>;
   using value_type_A = typename TensorA_::value_type;
   using value_type_B = typename TensorB_::value_type;
   using allocator_type_A = typename TensorA_::allocator_type;
   using allocator_type_B = typename TensorB_::allocator_type;

  public:
   using numeric_type = binop_result_t<Op, value_type_A, value_type_B>;
   using allocator_type =
       std::conditional_t<std::is_same_v<void, Allocator> &&
                              std::is_same_v<allocator_type_A, allocator_type_B>,
                          allocator_type_A, Allocator>;

+  // Result tensor type stays in TensorA's family with the allocator rebound to
+  // hold `numeric_type`. Both TA::Tensor and btas::Tensor expose this as
+  // `rebind_t<U>` (TA::Tensor via std::allocator_traits::rebind_alloc; btas
+  // via storage_traits::rebind_t). An explicit @tparam Allocator override only
+  // applies when TensorA is a TA::Tensor.
   using result_type =
-      std::conditional_t<std::is_same_v<void, allocator_type>,
-                         TA::Tensor<numeric_type>,
-                         TA::Tensor<numeric_type, allocator_type>>;
+      std::conditional_t<std::is_same_v<void, Allocator> ||
+                             !is_ta_tensor_v<TensorA_>,
+                         typename TensorA_::template rebind_t<numeric_type>,
+                         TA::Tensor<numeric_type, Allocator>>;
```
Comment on lines +120 to +125

```cpp
/// Predicate used by the shared operator body in
/// @c TiledArray/tensor/operators_body.inl to gate the element-wise tensor
/// operators that are injected into @c namespace TiledArray . The btas-side
/// copy of the same operators (in @c external/btas.h) partial-specializes
/// this predicate to @c std::false_type for @c btas::Tensor so the two
/// namespaces' operators stay non-overlapping under ADL.
```
Comment on lines +93 to 105

```diff
 //
+// @c ExtentIndex and @c StrideIndex are independently deducible so callers
+// can pass arrays of different integer types — needed for @c btas::Tensor ,
+// whose range exposes @c extent_data() as @c unsigned-long* but
+// @c stride_data() as @c long-long* .
 template <typename T, typename Char = char,
-          typename Index = Range1::index1_type,
+          typename ExtentIndex = Range1::index1_type,
+          typename StrideIndex = ExtentIndex,
           typename CharTraits = std::char_traits<Char>>
-void print(const T* data, const std::size_t order, const Index* extents,
-           const Index* strides, std::basic_ostream<Char, CharTraits>& os,
+void print(const T* data, const std::size_t order, const ExtentIndex* extents,
+           const StrideIndex* strides,
+           std::basic_ostream<Char, CharTraits>& os,
            std::size_t extra_indentation = 0);
```
Comment on lines +32 to +34

```cpp
// Sanity-check the size claim — fail loudly here if range layout drifts.
static_assert(sizeof(btas::zb::RangeNd<>) == 14,
              "zb::RangeNd default layout must remain 14 bytes");
```
Test plan

- `check_serial-tiledarray` (np=1) passes.
- `check-tiledarray` (np=1 and np=2) passes.