
enable btas::Tensor as inner tile of TA::Tensor-of-Tensor + bump BTAS pin#546

Open
evaleev wants to merge 3 commits into master from evaleev/build/bump-btas-pin

Conversation

Member

@evaleev evaleev commented May 12, 2026

Summary

  • Bumps BTAS pin to 7e64fbad (adds a Tensor(Range, F) ctor on btas::Tensor mirroring TA::Tensor's range+lambda ctor).
  • Closes the remaining gaps so a TA::DistArray<TA::Tensor<btas::Tensor<T>>, Policy> (btas-inner ToT) is usable end-to-end alongside the existing TA-inner ToT, including through TA::einsum's ToT * T and ToT * ToT paths. The outer tile is always TA::Tensor; btas::Tensor is inner-only and remains free of outer-tile operations (permute/reshape/batch/...).

Notable TA-side changes

  • external/btas.h: nested_rank<btas::Tensor<...>> partial spec (so einsum's MaxNestedArray classifies btas-inner ToT correctly); 6-arg gemm(alpha, A, B, beta, C, helper) overload matching the TA::Tensor signature; size_of<MemorySpace>(btas::Tensor) in namespace btas so ADL finds it from TA::Tensor's recursive size_of.
  • tensor/operators.h + new tensor/operators_body.ipp: lift the element-wise ops (T+T/T-T/T*T/-T/T*N/N*T/in-place variants/Perm*T/operator<<) into a shared .ipp body, included once into namespace TiledArray and once into namespace btas, gated by disjoint per-namespace ta_ops_match_tensor_v predicates.
  • tensor/print.h + .ipp: split NDArrayPrinter::print's Index template param into ExtentIndex / StrideIndex (btas::Tensor's range exposes extent_data as unsigned long* but stride_data as long long*).
  • tensor/tensor.h: fix broken SFINAE in Tensor::subt template (missing _t on enable_if).
  • tensor/type_traits.h: rewrite result_tensor_helper to derive the result via TensorA::rebind_t<numeric_type> (the "rebind allocator on numeric" operation both TA::Tensor and btas::Tensor expose). Drops the requirement that the input expose allocator_type.
  • tensor/kernels.h: tensor_contract / tensor_hadamard use free CPO calls instead of .permute() / .mult() / .mult_to() member calls.
  • einsum/tiledarray.h: DeNestedArray<Array> for ToT inputs wraps the inner numeric in TA::Tensor (not the inner tile type); sum_tot_2_tos produces TA::Tensor<numeric_type> and uses unqualified sum() so ADL picks the right overload.
  • dist_array.h: volume()'s reduce uses arg->total_size() when the inner tile exposes it, else falls back to arg->size().

New test

  • tests/btas_zb_inner_tile.cpp: sniff tests instantiating TA::Tensor<btas::Tensor<int, btas::zb::RangeNd<>, ...>> as the ToT inner tile, exercising subt/add/scale through the cross-namespace operator path.

Test plan

  • check_serial-tiledarray (np=1) passes.
  • check-tiledarray (np=1 + np=2) passes.
  • No regression in MPQC downstream (PNO/CSV CC paths build and run).

evaleev added 2 commits May 12, 2026 16:36
BTAS 7e64fbad adds a (Range, F) ctor on btas::Tensor that mirrors
TA::Tensor's range+lambda ctor. Needed for tile-type-agnostic per-index
inner-tile construction (e.g. MPQC's jacobi_update for btas-inner ToT
amplitudes).
Closes the remaining gaps so a TA::DistArray<TA::Tensor<btas::Tensor<T>>,
Policy> (btas-inner ToT) is usable end-to-end alongside the existing
TA-inner ToT, including through einsum's ToT * T and ToT * ToT paths.
The outer tile is always TA::Tensor; btas::Tensor is *inner-only* and
remains free of outer-tile operations (permute/reshape/batch/...).

Notable changes:

- external/btas.h:
  - nested_rank<btas::Tensor<...>> partial spec so einsum's
    MaxNestedArray correctly classifies btas-inner ToT.
  - 6-arg gemm(alpha, A, B, beta, C, helper) overload matching the
    TA::Tensor signature (the existing 5-arg form is accumulate-only).
  - size_of<MemorySpace>(btas::Tensor) in namespace btas so ADL finds
    it from TA::Tensor's recursive size_of when the inner tile is btas.

- tensor/operators.h + new operators_body.ipp:
  - Lift T+T/T-T/T*T/-T/T*N/N*T/T+=T/T-=T/T*=T/T*=N/Perm*T and the
    contiguous-tensor operator<< into a shared .ipp body, included
    once into namespace TiledArray and once into namespace btas, gated
    by disjoint per-namespace ta_ops_match_tensor_v predicates. Fixes
    ADL of these operators inside TA::Tensor's lambdas for btas inner.

- tensor/print.h + .ipp:
  - Split NDArrayPrinter::print's Index template param into
    ExtentIndex / StrideIndex so btas::Tensor (whose range exposes
    extent_data as unsigned-long* but stride_data as long-long*)
    drives the printer.

- tensor/tensor.h:
  - Fix broken SFINAE in Tensor::subt template
    (typename = std::enable_if<...> missing the _t).

- tensor/type_traits.h:
  - Rewrite result_tensor_helper to derive the result via
    TensorA::rebind_t<numeric_type> (the "rebind allocator on numeric"
    operation both TA::Tensor and btas::Tensor expose). Drops the
    requirement that the input expose allocator_type.

- tensor/kernels.h:
  - tensor_contract / tensor_hadamard: member-style .permute(),
    .mult(), .mult_to() -> free CPO calls so ADL dispatches via
    namespace btas for btas tiles and tile_op/tile_interface.h for TA.

- einsum/tiledarray.h:
  - DeNestedArray<Array> for ToT inputs now wraps the inner numeric
    in TA::Tensor rather than re-using the inner tile type as a new
    outer tile (preserves the "btas is inner-only" rule across
    DeNest).
  - sum_tot_2_tos lambda likewise produces TA::Tensor<numeric_type>,
    and replaces the tot(ix).sum() member with an unqualified sum()
    so ADL finds the right overload (free fn in TA / namespace btas).

- dist_array.h:
  - volume(): reduce_op uses arg->total_size() when the inner tile
    exposes it (TA::Tensor), else falls back to arg->size() (btas).

- tests/btas_zb_inner_tile.cpp (+ tests/CMakeLists.txt):
  - Sniff tests instantiating TA::Tensor<btas::Tensor<int,
    btas::zb::RangeNd<>, ...>> as the ToT inner tile, exercising
    subt / add / scale through the cross-namespace operator path.
@evaleev evaleev changed the title from "build: bump BTAS pin to pick up Tensor (Range, generator) ctor" to "enable btas::Tensor as inner tile of TA::Tensor-of-Tensor + bump BTAS pin" May 12, 2026
Two fixes needed by MPQC's CSV CCk validation tests (which exercise
TA::Tensor<btas::Tensor<T>> inner tiles end-to-end):

1. external/btas.h: mirror TA::Tensor's empty-/null-argument early-exits
   in the btas free-function suite (add/add_to/subt/subt_to/mult/mult_to,
   their factored and permuted variants, scale/scale_to/neg/neg_to). TA's
   ToT lambdas default-construct result inner tiles and accumulate into
   them, so consumers regularly hit the empty-result-then-add_to(arg)
   pattern; without these guards, TA's congruent-range assertion fires
   inside the underlying TensorInterface machinery.

2. tensor/tensor_interface.h: replace range_.includes(index_ordinal) with
   range_.includes_ordinal(index_ordinal) in TensorInterface's operator[]
   / at_ordinal. The integral includes(Ordinal) overload on TA::Range
   asserts rank!=1 to disambiguate from the includes(Index)
   coordinate-tuple form; TI's ordinal-lookup path was inadvertently
   tripping that for rank-1 inner tiles (e.g. PNO-basis 1-D energies).
evaleev added a commit to ValeevGroup/SeQuant that referenced this pull request May 12, 2026
…n array

The "regular" (non-nested) companion array used in ToT * T einsum
contractions had its outer tile set to the *inner* tile type of the
input ToT array. For TA-inner ToT that happens to work (TA::Tensor is a
valid outer tile too), but for btas-inner ToT it produced a
DistArray<btas::Tensor, ...> whose tile lacks the outer-tile API einsum
needs (permute/reshape/batch/...).

The outer tile must always be TA::Tensor; inner-tile types like
btas::Tensor are *inner-only*. Fix compatible_regular_distarray_type
to wrap the inner's numeric type in TA::Tensor.

Also bumps the TiledArray pin to the version that lifts the matching
TA-side restrictions (ValeevGroup/tiledarray#546).
@evaleev evaleev requested a review from Copilot May 12, 2026 23:50

Copilot AI left a comment


Pull request overview

This PR bumps the pinned BTAS revision and extends TiledArray’s Tensor-of-Tensor (ToT) plumbing so btas::Tensor can be used as the inner tile type (with TA::Tensor remaining the outer tile), including through TA::einsum paths.

Changes:

  • Update BTAS integration to support btas::Tensor as ToT inner tiles (traits, ADL hooks, gemm, size_of, congruency helpers for btas::zb::RangeNd).
  • Refactor element-wise tensor operators into a shared include (operators_body.ipp) injected into both namespace TiledArray and namespace btas.
  • Add a serial “sniff test” covering construction and basic ToT ops with btas::zb::RangeNd inner tiles.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Per-file summary:

  • tests/CMakeLists.txt: Adds the new btas inner-tile test to the test suite.
  • tests/btas_zb_inner_tile.cpp: New compile/runtime sniff tests for btas zb::RangeNd inner tiles in ToT.
  • src/TiledArray/tensor/type_traits.h: Adds operator gating predicate; changes result-tensor deduction to use rebind_t.
  • src/TiledArray/tensor/tensor.h: Fixes SFINAE (enable_if_t) in Tensor::subt template.
  • src/TiledArray/tensor/tensor_interface.h: Uses includes_ordinal() for ordinal bounds asserts; minor formatting.
  • src/TiledArray/tensor/print.h: Splits printer index types into independently deducible extent/stride index types.
  • src/TiledArray/tensor/print.ipp: Updates printer template definitions to match the new extent/stride index parameters.
  • src/TiledArray/tensor/operators.h: Includes shared operator body; keeps TA-specific operators here.
  • src/TiledArray/tensor/operators_body.ipp: New shared implementation of element-wise ops and contiguous operator<<.
  • src/TiledArray/tensor/kernels.h: Switches to free-function CPO calls (permute, mult, mult_to) for broader tile support.
  • src/TiledArray/external/btas.h: Adds zb-range congruency/Range conversion, btas free ops with empty-handling, gemm overload, nested-rank, ADL size_of, and btas-side operator injection.
  • src/TiledArray/einsum/tiledarray.h: Adjusts denesting for ToT to always yield TA::Tensor<numeric>; uses ADL-friendly free ops in inner loops.
  • src/TiledArray/dist_array.h: Makes volume() reduction prefer total_size() when available, else size().
  • external/versions.cmake: Bumps the tracked BTAS git tag.


Comment on lines 493 to +515
template <typename Op, typename TensorA, typename TensorB,
typename Allocator = void,
typename = std::enable_if_t<is_nested_tensor_v<TensorA, TensorB>>>
struct result_tensor_helper {
private:
using TensorA_ = std::remove_reference_t<TensorA>;
using TensorB_ = std::remove_reference_t<TensorB>;
using value_type_A = typename TensorA_::value_type;
using value_type_B = typename TensorB_::value_type;
using allocator_type_A = typename TensorA_::allocator_type;
using allocator_type_B = typename TensorB_::allocator_type;

public:
using numeric_type = binop_result_t<Op, value_type_A, value_type_B>;
using allocator_type =
std::conditional_t<std::is_same_v<void, Allocator> &&
std::is_same_v<allocator_type_A, allocator_type_B>,
allocator_type_A, Allocator>;

// Result tensor type stays in TensorA's family with the allocator rebound to
// hold `numeric_type`. Both TA::Tensor and btas::Tensor expose this as
// `rebind_t<U>` (TA::Tensor via std::allocator_traits::rebind_alloc; btas
// via storage_traits::rebind_t). An explicit @tparam Allocator override only
// applies when TensorA is a TA::Tensor.
  using result_type =
-     std::conditional_t<std::is_same_v<void, allocator_type>,
-                        TA::Tensor<numeric_type>,
-                        TA::Tensor<numeric_type, allocator_type>>;
+     std::conditional_t<std::is_same_v<void, Allocator> ||
+                            !is_ta_tensor_v<TensorA_>,
+                        typename TensorA_::template rebind_t<numeric_type>,
+                        TA::Tensor<numeric_type, Allocator>>;
Comment on lines +120 to +125
/// Predicate used by the shared operator body in
/// @c TiledArray/tensor/operators_body.ipp to gate the element-wise tensor
/// operators that are injected into @c namespace TiledArray . The btas-side
/// copy of the same operators (in @c external/btas.h) partial-specializes
/// this predicate to @c std::false_type for @c btas::Tensor so the two
/// namespaces' operators stay non-overlapping under ADL.
Comment on lines +93 to 105
//
// @c ExtentIndex and @c StrideIndex are independently deducible so callers
// can pass arrays of different integer types — needed for @c btas::Tensor ,
// whose range exposes @c extent_data() as @c unsigned-long* but
// @c stride_data() as @c long-long* .
  template <typename T, typename Char = char,
-           typename Index = Range1::index1_type,
+           typename ExtentIndex = Range1::index1_type,
+           typename StrideIndex = ExtentIndex,
            typename CharTraits = std::char_traits<Char>>
- void print(const T* data, const std::size_t order, const Index* extents,
-            const Index* strides, std::basic_ostream<Char, CharTraits>& os,
+ void print(const T* data, const std::size_t order, const ExtentIndex* extents,
+            const StrideIndex* strides,
+            std::basic_ostream<Char, CharTraits>& os,
             std::size_t extra_indentation = 0);
Comment on lines +32 to +34
// Sanity-check the size claim — fail loudly here if range layout drifts.
static_assert(sizeof(btas::zb::RangeNd<>) == 14,
"zb::RangeNd default layout must remain 14 bytes");