Fix swiglu_decode intermediate comparison to use chained reference #105

Merged
andrej merged 2 commits into amd:devel from
albiol2004:fix-swiglu-decode-rectangular
Apr 17, 2026

@albiol2004
Contributor

Aligns swiglu_decode/test.py with swiglu_prefill/test.py by verifying the intermediate buffer against a chained reference built from the observed AIE left_swished and right buffers, instead of against the CPU-computed golden_ref["intermediate"].

The golden-reference path amplifies legitimate, sub-tolerance bf16 drift from upstream stages (e.g. SiLU of very-negative inputs where the AIE LUT rounds to 0.0 while fp32 CPU silu preserves a tiny negative value) through the multiplication against a large-magnitude right operand, producing spurious "got 0.0, expected -1.27"-style failures. The AIE kernels themselves are numerically correct, the observed intermediate matches observed_left_swished * observed_right exactly, and the final output already passes at a tighter tolerance than the intermediate stage.

This issue surfaces at rectangular FFN shapes (e.g. embedding_dim=1024, hidden_dim=3584) where the statistics of the SiLU input distribution make near-zero LUT outputs more common than at the previously-tested square 2048² shape. Adds (1024, 3584) to the parametrization so regressions in rectangular decode are caught.
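
The amplification described above can be reproduced with plain Python. This is an illustrative sketch only: the input -12.0, the right operand 20000.0, the assumption that the AIE LUT returns exactly 0.0 for that input, and the use of math.isclose as the tolerance check are hypothetical stand-ins, not values or helpers taken from the test suite.

```python
import math


def silu(x: float) -> float:
    """Reference SiLU in double precision: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


# Tolerances quoted in this PR; math.isclose is an assumed stand-in
# for whatever comparison helper the real test uses.
REL_TOL, ABS_TOL = 0.04, 0.4

x = -12.0          # hypothetical very-negative SiLU input
right = 20000.0    # hypothetical large-magnitude right operand

golden_left = silu(x)   # tiny negative value on the CPU
observed_left = 0.0     # assumed LUT result on the AIE

golden_intermediate = golden_left * right
observed_intermediate = observed_left * right

# The left operand itself is well within tolerance.
left_ok = math.isclose(observed_left, golden_left,
                       rel_tol=REL_TOL, abs_tol=ABS_TOL)

# Comparing against the CPU golden value fails after the multiply.
golden_ok = math.isclose(observed_intermediate, golden_intermediate,
                         rel_tol=REL_TOL, abs_tol=ABS_TOL)

# Comparing against the chained reference (observed inputs) passes.
chained_ok = math.isclose(observed_intermediate, observed_left * right,
                          rel_tol=REL_TOL, abs_tol=ABS_TOL)

print(left_ok, golden_ok, chained_ok)
```

The point of the sketch is that the upstream drift is tiny in absolute terms, yet the product against a large operand exceeds the absolute tolerance when measured against the CPU golden value.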

Added

  • (1024, 3584) parametrization in iron/operators/swiglu_decode/test.py, reflecting Qwen3.5-0.8B FFN dims so rectangular decode is covered alongside the existing square smoke test.

Changed

  • iron/operators/swiglu_decode/test.py: verify intermediate against a chained reference (observed_left_swished * observed_right) rather than golden_ref["intermediate"], matching the approach already in swiglu_prefill/test.py. Tightens the tolerance to rel_tol=0.04, abs_tol=0.4 accordingly (same values used for the output check and for prefill).
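
A minimal sketch of what a chained-reference check could look like, assuming the three buffers have already been read back as flat sequences of floats. The function name chained_mismatches, the list-based buffers, and math.isclose as the comparison are assumptions for illustration; the actual helper in swiglu_decode/test.py may differ.

```python
import math
from typing import List, Sequence


def chained_mismatches(
    observed_left_swished: Sequence[float],
    observed_right: Sequence[float],
    observed_intermediate: Sequence[float],
    rel_tol: float = 0.04,
    abs_tol: float = 0.4,
) -> List[int]:
    """Return indices where intermediate != left_swished * right.

    The reference is built from the observed upstream buffers, so
    rounding that happened in earlier stages is not counted again.
    """
    if not (len(observed_left_swished) == len(observed_right)
            == len(observed_intermediate)):
        raise ValueError("buffers must have the same length")

    bad = []
    for i, (left, right, got) in enumerate(
        zip(observed_left_swished, observed_right, observed_intermediate)
    ):
        expected = left * right  # chained reference for this element
        if not math.isclose(got, expected, rel_tol=rel_tol, abs_tol=abs_tol):
            bad.append(i)
    return bad


# Hypothetical buffers: the first element mimics a LUT output of 0.0.
left = [0.0, 0.5, -0.25]
right = [20000.0, 2.0, 4.0]

good = chained_mismatches(left, right, [0.0, 1.0, -1.0])
broken = chained_mismatches(left, right, [0.0, 3.0, -1.0])
print(good, broken)
```

A check of this form still catches a faulty multiply stage (the second call flags the corrupted element) while ignoring drift inherited from the SiLU stage.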

Removed

  • None.

Testing

Verified on NPU2 (Strix, aie2p):

  • pytest iron/operators/swiglu_decode/test.py -v --iterations 1 : both 2048×2048 and 1024×3584 pass.
  • pytest iron/operators/ -m "not extensive" --iterations 1 : no regressions.

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR has been reviewed and approved.
  3. All checks are passing.

The decode test verified `intermediate` against golden_ref["intermediate"]
(CPU-computed silu(golden_left) * golden_right), while the prefill test
uses a chained reference built from the observed AIE left_swished and
right buffers. That inconsistency surfaces as spurious failures at
rectangular FFN shapes (e.g. embedding=1024, hidden=3584): the AIE SiLU
LUT rounds near-zero outputs to exactly 0.0 where fp32 CPU silu keeps a
tiny negative value, and the subsequent multiply against a large-magnitude
right operand amplifies that sub-tolerance drift into "got 0.0, expected
-1.27"-style mismatches.

The AIE kernels are numerically correct, the observed intermediate
matches observed_left_swished * observed_right exactly, and the final
output already passes at a tighter tolerance. Only the verification
methodology was wrong.

Switch to the prefill-style chained reference and tighten tolerance to
(rel=0.04, abs=0.4), same as the output check and prefill intermediate.
Add (1024, 3584) to the parametrization so rectangular decode is
covered in CI.
@andrej
Collaborator

andrej commented Apr 17, 2026

Thanks, this is a useful improvement. I just kicked off the CI; if it passes, I'll merge it.

@github-actions
Contributor

github-actions bot commented Apr 17, 2026

CI Test Results

d22ca66 (2026_04_17_16_04_27)

IRONCLAD - CI Summary

Examples

Test Krackan Phoenix
llama_3.2_1b_prompt_1024_tokens_1 pass -
llama_3.2_1b_prompt_1024_tokens_40 pass -
llama_3.2_1b_prompt_13_tokens_1 pass -
llama_3.2_1b_prompt_13_tokens_40 pass -

Small

Test Krackan Phoenix
M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128 pass pass
M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1 pass -
M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1 pass pass
M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1 pass pass
M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1 pass pass
M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1 pass pass
M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1 pass -
M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048 pass pass
M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024 pass pass
M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512 pass pass
M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256 pass -
M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8 pass pass
M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8 pass pass
M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1 pass pass
M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4 pass pass
M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024 pass pass
M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024 pass pass
M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024 pass pass
M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024 pass -
M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1 pass -
embedding_dim_1024-hidden_dim_3584 pass pass
embedding_dim_2048-hidden_dim_2048 pass pass
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048 pass pass
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32 pass pass
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False pass pass
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True pass pass
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024 pass pass
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32 pass pass
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False pass pass
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True pass pass
input_length_2048-num_aie_columns_1-tile_size_2048 pass pass
input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0 pass pass
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024 pass pass
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32 pass pass
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False pass pass
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True pass pass
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512 pass pass
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32 pass pass
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False pass pass
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True pass pass
input_length_2048-num_aie_columns_2-tile_size_1024 pass pass
input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0 pass pass
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512 pass pass
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32 pass pass
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False pass pass
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True pass pass
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256 pass pass
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32 pass pass
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False pass pass
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True pass -
input_length_2048-num_aie_columns_4-tile_size_512 pass pass
input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0 pass pass
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256 pass -
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32 pass -
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False pass -
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True pass -
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128 pass -
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32 pass -
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False pass -
input_length_2048-num_aie_columns_8-tile_size_256 pass -
input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0 pass -
input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048 pass pass
input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128 pass -
input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024 pass pass
input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024 pass pass
input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512 pass pass
input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512 pass pass
input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256 pass -
input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256 pass pass
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024 pass pass
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048 pass pass
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512 pass pass
rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0 pass pass
rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0 pass pass
rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0 pass pass
rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0 pass -
rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0 pass pass
rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0 pass pass
rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0 pass pass
rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0 pass -
seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0 pass -
seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False pass pass

Extensive

Test Krackan Phoenix
(no data) - -
Krackan - Small

IRONCLAD

Tested on 2026_04_17_16_04_27 at commit d22ca66.

Test Checks Latency (mean) Bandwidth (mean) Throughput (mean)
M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128 ✅ 5/5 n/a 0.21 0.21
M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1 ✅ 5/5 2455.90 3.88 1525.06
M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1 ✅ 5/5 211.18 1.05 44.91
M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1 ✅ 5/5 227.00 0.99 42.11
M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1 ✅ 5/5 49274.66 0.51 348.66
M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1 ✅ 5/5 28661.62 0.88 599.41
M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1 ✅ 5/5 7506.72 3.35 2290.20
M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048 ✅ 5/5 n/a 12.66 12.65
M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024 ✅ 5/5 n/a 24.13 24.11
M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512 ✅ 5/5 n/a 35.47 35.45
M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256 ✅ 5/5 n/a 42.49 42.46
M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8 ✅ 5/5 199.30 2.66 n/a
M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8 ✅ 5/5 190.46 2.80 n/a
M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1 ✅ 5/5 2452.18 3.34 876.43
M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4 ✅ 5/5 3554.90 0.37 19.73
M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024 ✅ 5/5 n/a 13.15 13.14
M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024 ✅ 5/5 n/a 23.39 23.38
M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024 ✅ 5/5 n/a 39.47 39.44
M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024 ✅ 5/5 n/a 42.37 42.34
M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1 ✅ 5/5 1856.86 3.61 1115.00
embedding_dim_1024-hidden_dim_3584 ✅ 5/5 4407.68 0.00 n/a
embedding_dim_2048-hidden_dim_2048 ✅ 5/5 4391.40 0.00 n/a
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048 ✅ 30/30 173.52 0.05 n/a
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32 ✅ 5/5 166.74 0.03 n/a
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False ✅ 5/5 168.86 0.05 n/a
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True ✅ 5/5 203.52 0.06 n/a
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024 ✅ 25/25 180.87 0.05 n/a
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32 ✅ 5/5 175.64 0.03 n/a
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False ✅ 5/5 165.66 0.05 n/a
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True ✅ 5/5 200.48 0.05 n/a
input_length_2048-num_aie_columns_1-tile_size_2048 ✅ 10/10 166.06 0.08 n/a
input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0 ✅ 5/5 194.58 0.06 n/a
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024 ✅ 30/30 170.71 0.05 n/a
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32 ✅ 5/5 160.74 0.03 n/a
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False ✅ 5/5 172.38 0.05 n/a
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True ✅ 5/5 192.96 0.05 n/a
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512 ✅ 25/25 171.38 0.05 n/a
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32 ✅ 5/5 158.80 0.03 n/a
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False ✅ 5/5 162.22 0.05 n/a
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True ✅ 5/5 201.58 0.05 n/a
input_length_2048-num_aie_columns_2-tile_size_1024 ✅ 10/10 176.52 0.07 n/a
input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0 ✅ 5/5 167.40 0.08 n/a
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512 ✅ 30/30 186.47 0.05 n/a
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32 ✅ 5/5 176.44 0.03 n/a
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False ✅ 5/5 175.10 0.05 n/a
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True ✅ 5/5 198.22 0.05 n/a
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256 ✅ 25/25 203.03 0.04 n/a
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32 ✅ 5/5 189.20 0.03 n/a
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False ✅ 5/5 192.74 0.04 n/a
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True ✅ 5/5 212.62 0.04 n/a
input_length_2048-num_aie_columns_4-tile_size_512 ✅ 10/10 179.85 0.07 n/a
input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0 ✅ 5/5 202.16 0.06 n/a
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256 ✅ 30/30 208.66 0.04 n/a
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32 ✅ 5/5 181.14 0.03 n/a
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False ✅ 5/5 190.82 0.05 n/a
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True ✅ 5/5 210.50 0.04 n/a
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128 ✅ 25/25 231.43 0.04 n/a
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32 ✅ 5/5 212.56 0.02 n/a
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False ✅ 5/5 203.14 0.04 n/a
input_length_2048-num_aie_columns_8-tile_size_256 ✅ 10/10 260.15 0.05 n/a
input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0 ✅ 5/5 216.48 0.06 n/a
input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048 ✅ 5/5 150.04 0.06 n/a
input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128 ✅ 5/5 257.56 0.03 n/a
input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024 ✅ 5/5 142.12 0.06 n/a
input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024 ✅ 5/5 178.78 0.05 n/a
input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512 ✅ 5/5 145.92 0.06 n/a
input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512 ✅ 5/5 180.34 0.05 n/a
input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256 ✅ 5/5 169.30 0.05 n/a
input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256 ✅ 5/5 179.12 0.05 n/a
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024 ✅ 5/5 201.28 0.66 n/a
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048 ✅ 5/5 160.84 0.83 n/a
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512 ✅ 5/5 212.50 0.62 n/a
rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0 ✅ 5/5 183.84 0.54 n/a
rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0 ✅ 5/5 185.92 0.57 n/a
rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0 ✅ 5/5 166.76 0.61 n/a
rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0 ✅ 5/5 216.04 0.46 n/a
rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0 ✅ 5/5 194.64 0.39 n/a
rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0 ✅ 5/5 184.50 0.41 n/a
rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0 ✅ 5/5 228.78 0.36 n/a
rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0 ✅ 5/5 203.24 0.37 n/a
seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0 ✅ 5/5 40672.00 0.21 n/a
seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False ✅ 5/5 10049.04 0.22 n/a

Trends:

IRONCLAD Trends

M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
d22ca66 | 2026-04-17 15:58:56 | 0.24 (n/a) | 0.21 (n/a) | 0.21 (n/a) | 0.19 (n/a) | 0.02 (n/a) | 0.23 (n/a) | 0.21 (n/a) | 0.20 (n/a) | 0.19 (n/a) | 0.02 (n/a)

M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
d22ca66 | 2026-04-17 15:58:56 | 4.39 (n/a) | 3.88 (n/a) | 3.94 (n/a) | 3.37 (n/a) | 0.48 (n/a) | 2792.50 (n/a) | 2455.90 (n/a) | 2388.00 (n/a) | 2144.60 (n/a) | 306.97 (n/a) | 1724.96 (n/a) | 1525.06 (n/a) | 1549.12 (n/a) | 1324.76 (n/a) | 187.80 (n/a)

M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
d22ca66 | 2026-04-17 15:58:56 | 1.20 (n/a) | 1.05 (n/a) | 1.04 (n/a) | 0.98 (n/a) | 0.09 (n/a) | 225.20 (n/a) | 211.18 (n/a) | 212.30 (n/a) | 184.50 (n/a) | 15.99 (n/a) | 51.16 (n/a) | 44.91 (n/a) | 44.45 (n/a) | 41.91 (n/a) | 3.67 (n/a)

M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
d22ca66 | 2026-04-17 15:58:56 | 1.15 (n/a) | 0.99 (n/a) | 0.96 (n/a) | 0.83 (n/a) | 0.12 (n/a) | 265.70 (n/a) | 227.00 (n/a) | 230.30 (n/a) | 192.60 (n/a) | 28.54 (n/a) | 49.00 (n/a) | 42.11 (n/a) | 40.97 (n/a) | 35.51 (n/a) | 5.30 (n/a)

M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
d22ca66 | 2026-04-17 15:58:56 | 0.51 (n/a) | 0.51 (n/a) | 0.51 (n/a) | 0.51 (n/a) | 0.00 (n/a) | 49346.40 (n/a) | 49274.66 (n/a) | 49262.80 (n/a) | 49243.40 (n/a) | 41.70 (n/a) | 348.88 (n/a) | 348.66 (n/a) | 348.74 (n/a) | 348.15 (n/a) | 0.29 (n/a)

M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
d22ca66 | 2026-04-17 15:58:56 | 0.88 (n/a) | 0.88 (n/a) | 0.88 (n/a) | 0.87 (n/a) | 0.00 (n/a) | 28830.60 (n/a) | 28661.62 (n/a) | 28647.90 (n/a) | 28536.70 (n/a) | 106.70 (n/a) | 602.03 (n/a) | 599.41 (n/a) | 599.69 (n/a) | 595.89 (n/a) | 2.23 (n/a)

M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
d22ca66 | 2026-04-17 15:58:56 | 3.47 (n/a) | 3.35 (n/a) | 3.30 (n/a) | 3.26 (n/a) | 0.10 (n/a) | 7723.70 (n/a) | 7506.72 (n/a) | 7627.60 (n/a) | 7247.40 (n/a) | 221.08 (n/a) | 2370.49 (n/a) | 2290.20 (n/a) | 2252.32 (n/a) | 2224.31 (n/a) | 68.06 (n/a)

M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
d22ca66 | 2026-04-17 15:58:56 | 13.12 (n/a) | 12.66 (n/a) | 13.09 (n/a) | 11.07 (n/a) | 0.89 (n/a) | 13.12 (n/a) | 12.65 (n/a) | 13.09 (n/a) | 11.06 (n/a) | 0.89 (n/a)

M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
d22ca66 | 2026-04-17 15:58:56 | 24.33 (n/a) | 24.13 (n/a) | 24.14 (n/a) | 23.93 (n/a) | 0.14 (n/a) | 24.31 (n/a) | 24.11 (n/a) | 24.13 (n/a) | 23.91 (n/a) | 0.14 (n/a)

M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
d22ca66 | 2026-04-17 15:58:56 | 40.52 (n/a) | 35.47 (n/a) | 38.44 (n/a) | 21.20 (n/a) | 8.03 (n/a) | 40.50 (n/a) | 35.45 (n/a) | 38.42 (n/a) | 21.19 (n/a) | 8.02 (n/a)

M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
d22ca66 | 2026-04-17 15:58:56 | 45.18 (n/a) | 42.49 (n/a) | 42.56 (n/a) | 38.61 (n/a) | 2.49 (n/a) | 45.15 (n/a) | 42.46 (n/a) | 42.54 (n/a) | 38.59 (n/a) | 2.49 (n/a)

M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
d22ca66 | 2026-04-17 15:58:56 | 3.16 (n/a) | 2.66 (n/a) | 2.62 (n/a) | 2.32 (n/a) | 0.31 (n/a) | 226.10 (n/a) | 199.30 (n/a) | 200.10 (n/a) | 165.80 (n/a) | 21.92 (n/a)

M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
d22ca66 | 2026-04-17 15:58:56 | 3.48 (n/a) | 2.80 (n/a) | 2.62 (n/a) | 2.45 (n/a) | 0.41 (n/a) | 213.70 (n/a) | 190.46 (n/a) | 199.80 (n/a) | 150.80 (n/a) | 25.00 (n/a)

M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
d22ca66 | 2026-04-17 15:58:56 | 4.20 (n/a) | 3.34 (n/a) | 3.27 (n/a) | 2.93 (n/a) | 0.51 (n/a) | 2748.30 (n/a) | 2452.18 (n/a) | 2463.40 (n/a) | 1917.20 (n/a) | 330.23 (n/a) | 1102.62 (n/a) | 876.43 (n/a) | 858.14 (n/a) | 769.19 (n/a) | 134.00 (n/a)

M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
d22ca66 | 2026-04-17 15:58:56 | 0.53 (n/a) | 0.37 (n/a) | 0.34 (n/a) | 0.29 (n/a) | 0.10 (n/a) | 4331.70 (n/a) | 3554.90 (n/a) | 3707.10 (n/a) | 2335.10 (n/a) | 737.75 (n/a) | 28.74 (n/a) | 19.73 (n/a) | 18.10 (n/a) | 15.49 (n/a) | 5.18 (n/a)

M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
d22ca66 | 2026-04-17 15:58:56 | 13.59 (n/a) | 13.15 (n/a) | 13.05 (n/a) | 12.92 (n/a) | 0.26 (n/a) | 13.58 (n/a) | 13.14 (n/a) | 13.04 (n/a) | 12.92 (n/a) | 0.26 (n/a)

M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
d22ca66 | 2026-04-17 15:58:56 | 24.45 (n/a) | 23.39 (n/a) | 23.71 (n/a) | 20.86 (n/a) | 1.47 (n/a) | 24.43 (n/a) | 23.38 (n/a) | 23.70 (n/a) | 20.85 (n/a) | 1.47 (n/a)

M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
d22ca66 | 2026-04-17 15:58:56 | 40.37 (n/a) | 39.47 (n/a) | 39.86 (n/a) | 37.42 (n/a) | 1.16 (n/a) | 40.34 (n/a) | 39.44 (n/a) | 39.83 (n/a) | 37.40 (n/a) | 1.16 (n/a)

M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
d22ca66 | 2026-04-17 15:58:56 | 44.79 (n/a) | 42.37 (n/a) | 42.38 (n/a) | 39.30 (n/a) | 2.03 (n/a) | 44.76 (n/a) | 42.34 (n/a) | 42.36 (n/a) | 39.28 (n/a) | 2.03 (n/a)

M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
d22ca66 | 2026-04-17 15:58:56 | 4.01 (n/a) | 3.61 (n/a) | 3.70 (n/a) | 3.12 (n/a) | 0.34 (n/a) | 2132.20 (n/a) | 1856.86 (n/a) | 1799.90 (n/a) | 1658.20 (n/a) | 181.77 (n/a) | 1239.41 (n/a) | 1115.00 (n/a) | 1141.83 (n/a) | 963.91 (n/a) | 104.79 (n/a)

embedding_dim_1024-hidden_dim_3584

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
d22ca66 | 2026-04-17 15:58:56 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 4893.06 (n/a) | 4407.68 (n/a) | 4621.98 (n/a) | 3524.15 (n/a) | 538.46 (n/a)

embedding_dim_2048-hidden_dim_2048

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
d22ca66 | 2026-04-17 15:58:56 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 4745.04 (n/a) | 4391.40 (n/a) | 4374.31 (n/a) | 4148.42 (n/a) | 236.29 (n/a)

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
d22ca66 | 2026-04-17 15:58:56 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 227.30 (n/a) | 173.52 (n/a) | 175.85 (n/a) | 110.50 (n/a) | 31.96 (n/a)

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
d22ca66 | 2026-04-17 15:58:56 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 204.20 (n/a) | 166.74 (n/a) | 172.00 (n/a) | 128.10 (n/a) | 29.18 (n/a)

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
d22ca66 | 2026-04-17 15:58:56 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 209.60 (n/a) | 168.86 (n/a) | 171.90 (n/a) | 124.60 (n/a) | 30.54 (n/a)

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
d22ca66 | 2026-04-17 15:58:56 | 0.08 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 247.00 (n/a) | 203.52 (n/a) | 193.00 (n/a) | 149.10 (n/a) | 40.63 (n/a)

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
d22ca66 | 2026-04-17 15:58:56 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 381.80 (n/a) | 180.87 (n/a) | 167.90 (n/a) | 113.30 (n/a) | 50.66 (n/a)

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
d22ca66 | 2026-04-17 15:58:56 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 196.10 (n/a) | 175.64 (n/a) | 170.70 (n/a) | 167.30 (n/a) | 11.98 (n/a)

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
d22ca66 | 2026-04-17 15:58:56 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.00 (n/a) | 179.30 (n/a) | 165.66 (n/a) | 167.20 (n/a) | 146.80 (n/a) | 12.91 (n/a)

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True

Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
d22ca66 | 2026-04-17 15:58:56 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 277.90 (n/a) | 200.48 (n/a) | 189.60 (n/a) | 169.20 (n/a) | 44.96 (n/a)

input_length_2048-num_aie_columns_1-tile_size_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.09 (n/a) | 0.08 (n/a) | 0.08 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 210.60 (n/a) | 166.06 (n/a) | 158.30 (n/a) | 132.70 (n/a) | 25.17 (n/a) |

input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.07 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 242.80 (n/a) | 194.58 (n/a) | 195.10 (n/a) | 168.00 (n/a) | 30.00 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 223.20 (n/a) | 170.71 (n/a) | 169.10 (n/a) | 119.30 (n/a) | 30.43 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 182.70 (n/a) | 160.74 (n/a) | 153.70 (n/a) | 151.00 (n/a) | 13.35 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 225.40 (n/a) | 172.38 (n/a) | 174.30 (n/a) | 139.10 (n/a) | 35.69 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 219.80 (n/a) | 192.96 (n/a) | 200.10 (n/a) | 149.10 (n/a) | 27.60 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 238.40 (n/a) | 171.38 (n/a) | 167.60 (n/a) | 128.40 (n/a) | 31.64 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 189.10 (n/a) | 158.80 (n/a) | 155.40 (n/a) | 142.90 (n/a) | 18.51 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 183.00 (n/a) | 162.22 (n/a) | 171.30 (n/a) | 137.30 (n/a) | 20.23 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 243.60 (n/a) | 201.58 (n/a) | 188.00 (n/a) | 175.60 (n/a) | 27.78 (n/a) |

input_length_2048-num_aie_columns_2-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.10 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 224.00 (n/a) | 176.52 (n/a) | 175.35 (n/a) | 126.40 (n/a) | 30.63 (n/a) |

input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.10 (n/a) | 0.08 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 190.10 (n/a) | 167.40 (n/a) | 177.20 (n/a) | 124.50 (n/a) | 26.47 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.06 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 386.90 (n/a) | 186.47 (n/a) | 183.35 (n/a) | 127.60 (n/a) | 53.02 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 221.20 (n/a) | 176.44 (n/a) | 183.10 (n/a) | 117.40 (n/a) | 37.53 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 244.40 (n/a) | 175.10 (n/a) | 159.50 (n/a) | 119.30 (n/a) | 56.44 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 245.00 (n/a) | 198.22 (n/a) | 201.80 (n/a) | 157.70 (n/a) | 35.58 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.06 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 346.20 (n/a) | 203.03 (n/a) | 187.90 (n/a) | 137.00 (n/a) | 47.81 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 222.00 (n/a) | 189.20 (n/a) | 200.60 (n/a) | 126.00 (n/a) | 37.78 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.00 (n/a) | 215.40 (n/a) | 192.74 (n/a) | 193.10 (n/a) | 164.20 (n/a) | 19.69 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.00 (n/a) | 231.20 (n/a) | 212.62 (n/a) | 214.20 (n/a) | 191.10 (n/a) | 17.35 (n/a) |

input_length_2048-num_aie_columns_4-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.10 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.05 (n/a) | 0.02 (n/a) | 238.20 (n/a) | 179.85 (n/a) | 187.55 (n/a) | 120.90 (n/a) | 36.34 (n/a) |

input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.08 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 285.70 (n/a) | 202.16 (n/a) | 192.20 (n/a) | 158.10 (n/a) | 48.84 (n/a) |

input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.06 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 336.80 (n/a) | 208.66 (n/a) | 207.25 (n/a) | 133.30 (n/a) | 45.55 (n/a) |

input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 275.60 (n/a) | 181.14 (n/a) | 162.70 (n/a) | 121.60 (n/a) | 57.56 (n/a) |

input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 273.60 (n/a) | 190.82 (n/a) | 171.20 (n/a) | 124.80 (n/a) | 56.67 (n/a) |

input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.00 (n/a) | 228.70 (n/a) | 210.50 (n/a) | 204.30 (n/a) | 196.60 (n/a) | 12.94 (n/a) |

input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 345.80 (n/a) | 231.43 (n/a) | 220.80 (n/a) | 155.50 (n/a) | 50.14 (n/a) |

input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 224.30 (n/a) | 212.56 (n/a) | 220.60 (n/a) | 182.40 (n/a) | 17.46 (n/a) |

input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.06 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 241.10 (n/a) | 203.14 (n/a) | 220.80 (n/a) | 126.20 (n/a) | 45.47 (n/a) |

input_length_2048-num_aie_columns_8-tile_size_256

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.09 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 362.20 (n/a) | 260.15 (n/a) | 255.50 (n/a) | 136.00 (n/a) | 86.25 (n/a) |

input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.08 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 327.10 (n/a) | 216.48 (n/a) | 196.20 (n/a) | 158.80 (n/a) | 67.80 (n/a) |

input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.07 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 186.40 (n/a) | 150.04 (n/a) | 148.80 (n/a) | 112.70 (n/a) | 31.56 (n/a) |

input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.05 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 329.10 (n/a) | 257.56 (n/a) | 253.10 (n/a) | 169.60 (n/a) | 64.20 (n/a) |

input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.07 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 171.20 (n/a) | 142.12 (n/a) | 136.80 (n/a) | 114.80 (n/a) | 22.22 (n/a) |

input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.08 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 235.20 (n/a) | 178.78 (n/a) | 203.50 (n/a) | 99.10 (n/a) | 55.25 (n/a) |

input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.07 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 190.20 (n/a) | 145.92 (n/a) | 138.90 (n/a) | 115.70 (n/a) | 32.59 (n/a) |

input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 216.10 (n/a) | 180.34 (n/a) | 170.20 (n/a) | 159.40 (n/a) | 24.34 (n/a) |

input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 228.10 (n/a) | 169.30 (n/a) | 165.80 (n/a) | 113.60 (n/a) | 48.79 (n/a) |

input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.06 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 215.00 (n/a) | 179.12 (n/a) | 182.20 (n/a) | 138.70 (n/a) | 32.86 (n/a) |

input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.79 (n/a) | 0.66 (n/a) | 0.66 (n/a) | 0.56 (n/a) | 0.10 (n/a) | 235.80 (n/a) | 201.28 (n/a) | 197.60 (n/a) | 166.40 (n/a) | 30.25 (n/a) |

input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 1.05 (n/a) | 0.83 (n/a) | 0.79 (n/a) | 0.71 (n/a) | 0.14 (n/a) | 185.50 (n/a) | 160.84 (n/a) | 164.90 (n/a) | 125.30 (n/a) | 25.53 (n/a) |

input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.72 (n/a) | 0.62 (n/a) | 0.61 (n/a) | 0.58 (n/a) | 0.06 (n/a) | 226.00 (n/a) | 212.50 (n/a) | 215.10 (n/a) | 182.10 (n/a) | 17.77 (n/a) |

rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.63 (n/a) | 0.54 (n/a) | 0.53 (n/a) | 0.44 (n/a) | 0.07 (n/a) | 223.70 (n/a) | 183.84 (n/a) | 186.30 (n/a) | 156.80 (n/a) | 26.47 (n/a) |

rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.65 (n/a) | 0.57 (n/a) | 0.62 (n/a) | 0.32 (n/a) | 0.14 (n/a) | 307.80 (n/a) | 185.92 (n/a) | 159.00 (n/a) | 151.70 (n/a) | 68.23 (n/a) |

rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.76 (n/a) | 0.61 (n/a) | 0.61 (n/a) | 0.42 (n/a) | 0.13 (n/a) | 232.70 (n/a) | 166.76 (n/a) | 160.90 (n/a) | 129.40 (n/a) | 40.02 (n/a) |

rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.50 (n/a) | 0.46 (n/a) | 0.44 (n/a) | 0.42 (n/a) | 0.04 (n/a) | 232.40 (n/a) | 216.04 (n/a) | 225.30 (n/a) | 196.20 (n/a) | 17.31 (n/a) |

rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.48 (n/a) | 0.39 (n/a) | 0.40 (n/a) | 0.27 (n/a) | 0.07 (n/a) | 268.90 (n/a) | 194.64 (n/a) | 182.50 (n/a) | 154.30 (n/a) | 43.54 (n/a) |

rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.50 (n/a) | 0.41 (n/a) | 0.40 (n/a) | 0.34 (n/a) | 0.06 (n/a) | 215.00 (n/a) | 184.50 (n/a) | 184.50 (n/a) | 146.90 (n/a) | 24.69 (n/a) |

rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.55 (n/a) | 0.36 (n/a) | 0.35 (n/a) | 0.21 (n/a) | 0.13 (n/a) | 359.30 (n/a) | 228.78 (n/a) | 211.10 (n/a) | 134.70 (n/a) | 84.15 (n/a) |

rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.51 (n/a) | 0.37 (n/a) | 0.36 (n/a) | 0.31 (n/a) | 0.08 (n/a) | 234.20 (n/a) | 203.24 (n/a) | 203.40 (n/a) | 145.70 (n/a) | 35.54 (n/a) |

seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.21 (n/a) | 0.21 (n/a) | 0.21 (n/a) | 0.21 (n/a) | 0.00 (n/a) | 40769.90 (n/a) | 40672.00 (n/a) | 40703.20 (n/a) | 40568.30 (n/a) | 88.33 (n/a) |

seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:58:56 | 0.28 (n/a) | 0.22 (n/a) | 0.24 (n/a) | 0.15 (n/a) | 0.05 (n/a) | 13804.32 (n/a) | 10049.04 (n/a) | 8894.02 (n/a) | 7540.09 (n/a) | 2471.57 (n/a) |

Krackan - Examples

IRONCLAD

Tested on 2026_04_17_15_55_25 at commit d22ca66.

| Test | Checks | TTFT (mean) | TPS (mean) |
| --- | --- | --- | --- |
| llama_3.2_1b_prompt_1024_tokens_1 | ✅ 5/5 | 2.14 | n/a |
| llama_3.2_1b_prompt_1024_tokens_40 | ✅ 5/5 | 2.17 | 4.19 |
| llama_3.2_1b_prompt_13_tokens_1 | ✅ 5/5 | 2.10 | n/a |
| llama_3.2_1b_prompt_13_tokens_40 | ✅ 5/5 | 2.10 | 4.16 |

Trends:

IRONCLAD Trends

llama_3.2_1b_prompt_1024_tokens_1

| Commit/Date | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:49:39 | 2.15 (n/a) | 2.14 (n/a) | 2.14 (n/a) | 2.12 (n/a) | 0.01 (n/a) |

llama_3.2_1b_prompt_1024_tokens_40

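| Commit/Date | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:49:39 | 4.21 (n/a) | 4.19 (n/a) | 4.18 (n/a) | 4.18 (n/a) | 0.01 (n/a) | 2.31 (n/a) | 2.17 (n/a) | 2.14 (n/a) | 2.11 (n/a) | 0.08 (n/a) |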

llama_3.2_1b_prompt_13_tokens_1

| Commit/Date | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:49:39 | 2.11 (n/a) | 2.10 (n/a) | 2.09 (n/a) | 2.08 (n/a) | 0.01 (n/a) |

llama_3.2_1b_prompt_13_tokens_40

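| Commit/Date | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:49:39 | 4.18 (n/a) | 4.16 (n/a) | 4.16 (n/a) | 4.15 (n/a) | 0.01 (n/a) | 2.12 (n/a) | 2.10 (n/a) | 2.09 (n/a) | 2.08 (n/a) | 0.02 (n/a) |

Phoenix - Small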

IRONCLAD

Tested on 2026_04_17_15_52_31 at commit d22ca66.

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128 | ✅ 5/5 | n/a | 0.09 | 0.09 |
| M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1 | ✅ 5/5 | 534.08 | 0.44 | 18.57 |
| M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1 | ✅ 5/5 | 463.04 | 0.51 | 21.61 |
| M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1 | ✅ 5/5 | 82064.18 | 0.31 | 209.40 |
| M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1 | ✅ 5/5 | 24103.96 | 1.04 | 712.91 |
| M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048 | ✅ 5/5 | n/a | 3.59 | 3.59 |
| M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024 | ✅ 5/5 | n/a | 6.90 | 6.90 |
| M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512 | ✅ 5/5 | n/a | 10.03 | 10.03 |
| M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8 | ✅ 5/5 | 493.80 | 1.27 | n/a |
| M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8 | ✅ 5/5 | 619.44 | 0.87 | n/a |
| M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1 | ✅ 5/5 | 4148.80 | 2.14 | 560.09 |
| M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4 | ✅ 5/5 | 6066.28 | 0.21 | 11.23 |
| M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024 | ✅ 5/5 | n/a | 3.67 | 3.67 |
| M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024 | ✅ 5/5 | n/a | 6.94 | 6.93 |
| M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024 | ✅ 5/5 | n/a | 10.35 | 10.34 |
| embedding_dim_1024-hidden_dim_3584 | ✅ 5/5 | 15219.08 | 0.00 | n/a |
| embedding_dim_2048-hidden_dim_2048 | ✅ 5/5 | 12921.50 | 0.00 | n/a |
| input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048 | ✅ 30/30 | 374.20 | 0.02 | n/a |
| input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32 | ✅ 5/5 | 278.52 | 0.02 | n/a |
| input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False | ✅ 5/5 | 371.10 | 0.03 | n/a |
| input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True | ✅ 5/5 | 404.56 | 0.03 | n/a |
| input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024 | ✅ 25/25 | 550.24 | 0.02 | n/a |
| input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32 | ✅ 5/5 | 346.78 | 0.02 | n/a |
| input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False | ✅ 5/5 | 300.20 | 0.03 | n/a |
| input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True | ✅ 5/5 | 397.50 | 0.03 | n/a |
| input_length_2048-num_aie_columns_1-tile_size_2048 | ✅ 10/10 | 431.99 | 0.03 | n/a |
| input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0 | ✅ 5/5 | 367.30 | 0.04 | n/a |
| input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024 | ✅ 30/30 | 408.83 | 0.02 | n/a |
| input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32 | ✅ 5/5 | 347.80 | 0.02 | n/a |
| input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False | ✅ 5/5 | 397.54 | 0.02 | n/a |
| input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True | ✅ 5/5 | 354.14 | 0.03 | n/a |
| input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512 | ✅ 25/25 | 511.99 | 0.02 | n/a |
| input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32 | ✅ 5/5 | 379.90 | 0.02 | n/a |
| input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False | ✅ 5/5 | 375.66 | 0.02 | n/a |
| input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True | ✅ 5/5 | 361.24 | 0.03 | n/a |
| input_length_2048-num_aie_columns_2-tile_size_1024 | ✅ 10/10 | 404.81 | 0.04 | n/a |
| input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0 | ✅ 5/5 | 408.02 | 0.04 | n/a |
| input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512 | ✅ 30/30 | 573.29 | 0.02 | n/a |
| input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32 | ✅ 5/5 | 442.50 | 0.01 | n/a |
| input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False | ✅ 5/5 | 448.08 | 0.02 | n/a |
| input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True | ✅ 5/5 | 404.62 | 0.03 | n/a |
| input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256 | ✅ 25/25 | 426.91 | 0.02 | n/a |
| input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32 | ✅ 5/5 | 435.74 | 0.02 | n/a |
| input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False | ✅ 5/5 | 508.12 | 0.02 | n/a |
| input_length_2048-num_aie_columns_4-tile_size_512 | ✅ 10/10 | 553.49 | 0.03 | n/a |
| input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0 | ✅ 5/5 | 1163.34 | 0.01 | n/a |
| input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048 | ✅ 5/5 | 362.56 | 0.02 | n/a |
| input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024 | ✅ 5/5 | 400.76 | 0.03 | n/a |
| input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024 | ✅ 5/5 | 307.86 | 0.03 | n/a |
| input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512 | ✅ 5/5 | 511.04 | 0.02 | n/a |
| input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512 | ✅ 5/5 | 470.98 | 0.02 | n/a |
| input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256 | ✅ 5/5 | 397.98 | 0.02 | n/a |
| input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024 | ✅ 5/5 | 1210.36 | 0.14 | n/a |
| input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048 | ✅ 5/5 | 526.58 | 0.27 | n/a |
| input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512 | ✅ 5/5 | 843.96 | 0.24 | n/a |
| rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0 | ✅ 5/5 | 415.24 | 0.25 | n/a |
| rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0 | ✅ 5/5 | 1332.80 | 0.14 | n/a |
| rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0 | ✅ 5/5 | 436.34 | 0.26 | n/a |
| rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0 | ✅ 5/5 | 513.56 | 0.16 | n/a |
| rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0 | ✅ 5/5 | 418.60 | 0.18 | n/a |
| rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0 | ✅ 5/5 | 431.84 | 0.18 | n/a |
| seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False | ✅ 5/5 | 23691.76 | 0.09 | n/a |

Trends:

IRONCLAD Trends

M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:49:32 | 0.11 (n/a) | 0.09 (n/a) | 0.09 (n/a) | 0.06 (n/a) | 0.02 (n/a) | 0.10 (n/a) | 0.09 (n/a) | 0.09 (n/a) | 0.06 (n/a) | 0.02 (n/a) |

M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:49:32 | 0.57 (n/a) | 0.44 (n/a) | 0.48 (n/a) | 0.32 (n/a) | 0.11 (n/a) | 685.10 (n/a) | 534.08 (n/a) | 465.00 (n/a) | 391.30 (n/a) | 134.43 (n/a) | 24.12 (n/a) | 18.57 (n/a) | 20.29 (n/a) | 13.77 (n/a) | 4.50 (n/a) |

M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:49:32 | 0.62 (n/a) | 0.51 (n/a) | 0.49 (n/a) | 0.32 (n/a) | 0.12 (n/a) | 695.90 (n/a) | 463.04 (n/a) | 448.30 (n/a) | 357.10 (n/a) | 138.33 (n/a) | 26.43 (n/a) | 21.61 (n/a) | 21.05 (n/a) | 13.56 (n/a) | 5.27 (n/a) |

M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:49:32 | 0.31 (n/a) | 0.31 (n/a) | 0.30 (n/a) | 0.30 (n/a) | 0.01 (n/a) | 83995.90 (n/a) | 82064.18 (n/a) | 82544.90 (n/a) | 80567.10 (n/a) | 1479.54 (n/a) | 213.24 (n/a) | 209.40 (n/a) | 208.13 (n/a) | 204.53 (n/a) | 3.77 (n/a) |

M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:49:32 | 1.07 (n/a) | 1.04 (n/a) | 1.05 (n/a) | 1.02 (n/a) | 0.02 (n/a) | 24664.70 (n/a) | 24103.96 (n/a) | 23992.50 (n/a) | 23607.40 (n/a) | 410.81 (n/a) | 727.73 (n/a) | 712.91 (n/a) | 716.05 (n/a) | 696.54 (n/a) | 12.11 (n/a) |

M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:49:32 | 3.77 (n/a) | 3.59 (n/a) | 3.64 (n/a) | 3.42 (n/a) | 0.15 (n/a) | 3.76 (n/a) | 3.59 (n/a) | 3.64 (n/a) | 3.41 (n/a) | 0.15 (n/a) |

M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:49:32 | 7.65 (n/a) | 6.90 (n/a) | 7.43 (n/a) | 5.33 (n/a) | 0.98 (n/a) | 7.65 (n/a) | 6.90 (n/a) | 7.43 (n/a) | 5.33 (n/a) | 0.98 (n/a) |

M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:49:32 | 14.08 (n/a) | 10.03 (n/a) | 8.31 (n/a) | 7.25 (n/a) | 3.35 (n/a) | 14.07 (n/a) | 10.03 (n/a) | 8.30 (n/a) | 7.25 (n/a) | 3.35 (n/a) |

M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:49:32 | 1.99 (n/a) | 1.27 (n/a) | 1.25 (n/a) | 0.62 (n/a) | 0.56 (n/a) | 845.60 (n/a) | 493.80 (n/a) | 417.80 (n/a) | 263.60 (n/a) | 238.30 (n/a) |

M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:49:32 | 1.02 (n/a) | 0.87 (n/a) | 0.92 (n/a) | 0.63 (n/a) | 0.15 (n/a) | 831.10 (n/a) | 619.44 (n/a) | 567.90 (n/a) | 515.80 (n/a) | 126.32 (n/a) |

M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| d22ca66 — 2026-04-17 15:49:32 | 3.60 (n/a) | 2.14 (n/a) | 1.95 (n/a) | 1.53 (n/a) | 0.85 (n/a) | 5273.20 (n/a) | 4148.80 (n/a) | 4137.70 (n/a) | 2240.00 (n/a) | 1212.67 (n/a) | 943.71 (n/a) | 560.09 (n/a) | 510.89 (n/a) | 400.88 (n/a) | 222.14 (n/a) |

M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.23 (n/a) | 0.21 (n/a) | 0.22 (n/a) | 0.17 (n/a) | 0.03 (n/a) | 7526.60 (n/a) | 6066.28 (n/a) | 5708.90 (n/a) | 5395.80 (n/a) | 887.84 (n/a) | 12.44 (n/a) | 11.23 (n/a) | 11.76 (n/a) | 8.92 (n/a) | 1.47 (n/a) |

M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 3.85 (n/a) | 3.67 (n/a) | 3.77 (n/a) | 3.28 (n/a) | 0.23 (n/a) | 3.85 (n/a) | 3.67 (n/a) | 3.77 (n/a) | 3.28 (n/a) | 0.23 (n/a) |

M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 7.59 (n/a) | 6.94 (n/a) | 6.81 (n/a) | 6.10 (n/a) | 0.65 (n/a) | 7.59 (n/a) | 6.93 (n/a) | 6.81 (n/a) | 6.10 (n/a) | 0.65 (n/a) |

M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 14.08 (n/a) | 10.35 (n/a) | 8.38 (n/a) | 7.76 (n/a) | 3.12 (n/a) | 14.07 (n/a) | 10.34 (n/a) | 8.38 (n/a) | 7.75 (n/a) | 3.12 (n/a) |

embedding_dim_1024-hidden_dim_3584

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 17835.51 (n/a) | 15219.08 (n/a) | 14942.19 (n/a) | 13667.88 (n/a) | 1562.50 (n/a) |

embedding_dim_2048-hidden_dim_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 20637.86 (n/a) | 12921.50 (n/a) | 13793.82 (n/a) | 7057.89 (n/a) | 5813.28 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.04 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 773.40 (n/a) | 374.20 (n/a) | 324.60 (n/a) | 233.50 (n/a) | 134.54 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 313.70 (n/a) | 278.52 (n/a) | 286.10 (n/a) | 228.70 (n/a) | 31.34 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 552.60 (n/a) | 371.10 (n/a) | 385.20 (n/a) | 193.00 (n/a) | 145.19 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 478.30 (n/a) | 404.56 (n/a) | 434.50 (n/a) | 311.70 (n/a) | 67.80 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 2462.80 (n/a) | 550.24 (n/a) | 358.90 (n/a) | 186.40 (n/a) | 556.68 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 486.60 (n/a) | 346.78 (n/a) | 270.40 (n/a) | 245.90 (n/a) | 124.75 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 530.20 (n/a) | 300.20 (n/a) | 240.70 (n/a) | 221.70 (n/a) | 129.84 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.05 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 572.80 (n/a) | 397.50 (n/a) | 429.00 (n/a) | 215.90 (n/a) | 161.11 (n/a) |

input_length_2048-num_aie_columns_1-tile_size_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.06 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 593.50 (n/a) | 431.99 (n/a) | 490.40 (n/a) | 204.70 (n/a) | 148.17 (n/a) |

input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.07 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 555.40 (n/a) | 367.30 (n/a) | 298.20 (n/a) | 186.70 (n/a) | 161.62 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.05 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 1439.60 (n/a) | 408.83 (n/a) | 327.45 (n/a) | 158.90 (n/a) | 236.60 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 496.40 (n/a) | 347.80 (n/a) | 293.30 (n/a) | 235.80 (n/a) | 126.58 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 519.50 (n/a) | 397.54 (n/a) | 410.00 (n/a) | 239.40 (n/a) | 126.06 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.04 (n/a) | 0.03 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 594.20 (n/a) | 354.14 (n/a) | 239.80 (n/a) | 232.40 (n/a) | 167.75 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 1903.20 (n/a) | 511.99 (n/a) | 465.20 (n/a) | 194.30 (n/a) | 358.22 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 600.90 (n/a) | 379.90 (n/a) | 268.90 (n/a) | 234.50 (n/a) | 172.06 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.03 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 634.30 (n/a) | 375.66 (n/a) | 312.70 (n/a) | 244.90 (n/a) | 154.23 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 585.70 (n/a) | 361.24 (n/a) | 298.90 (n/a) | 280.10 (n/a) | 127.59 (n/a) |

input_length_2048-num_aie_columns_2-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.07 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 0.02 (n/a) | 1072.30 (n/a) | 404.81 (n/a) | 331.55 (n/a) | 182.30 (n/a) | 252.33 (n/a) |

input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.06 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 581.20 (n/a) | 408.02 (n/a) | 456.80 (n/a) | 205.10 (n/a) | 167.65 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 2458.10 (n/a) | 573.29 (n/a) | 416.30 (n/a) | 208.60 (n/a) | 533.34 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 887.70 (n/a) | 442.50 (n/a) | 374.10 (n/a) | 245.50 (n/a) | 262.42 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 586.10 (n/a) | 448.08 (n/a) | 483.40 (n/a) | 201.30 (n/a) | 147.55 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 715.40 (n/a) | 404.62 (n/a) | 388.90 (n/a) | 205.90 (n/a) | 193.33 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 656.80 (n/a) | 426.91 (n/a) | 445.50 (n/a) | 202.80 (n/a) | 130.51 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 692.60 (n/a) | 435.74 (n/a) | 496.80 (n/a) | 196.20 (n/a) | 211.31 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 630.80 (n/a) | 508.12 (n/a) | 591.90 (n/a) | 247.20 (n/a) | 159.08 (n/a) |

input_length_2048-num_aie_columns_4-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 1948.30 (n/a) | 553.49 (n/a) | 400.10 (n/a) | 280.80 (n/a) | 500.85 (n/a) |

input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 1951.90 (n/a) | 1163.34 (n/a) | 927.40 (n/a) | 499.20 (n/a) | 725.01 (n/a) |

input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 434.70 (n/a) | 362.56 (n/a) | 418.70 (n/a) | 249.80 (n/a) | 90.52 (n/a) |

input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.05 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.02 (n/a) | 576.60 (n/a) | 400.76 (n/a) | 439.80 (n/a) | 149.60 (n/a) | 176.73 (n/a) |

input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 454.60 (n/a) | 307.86 (n/a) | 268.20 (n/a) | 257.60 (n/a) | 83.04 (n/a) |

input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 660.20 (n/a) | 511.04 (n/a) | 480.40 (n/a) | 349.70 (n/a) | 142.40 (n/a) |

input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 569.70 (n/a) | 470.98 (n/a) | 515.70 (n/a) | 340.20 (n/a) | 98.99 (n/a) |

input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 560.50 (n/a) | 397.98 (n/a) | 422.40 (n/a) | 219.10 (n/a) | 125.96 (n/a) |

input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.25 (n/a) | 0.14 (n/a) | 0.12 (n/a) | 0.05 (n/a) | 0.07 (n/a) | 2409.90 (n/a) | 1210.36 (n/a) | 1084.70 (n/a) | 521.40 (n/a) | 712.45 (n/a) |

input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.41 (n/a) | 0.27 (n/a) | 0.24 (n/a) | 0.18 (n/a) | 0.08 (n/a) | 719.10 (n/a) | 526.58 (n/a) | 538.70 (n/a) | 323.40 (n/a) | 141.19 (n/a) |

input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.48 (n/a) | 0.24 (n/a) | 0.22 (n/a) | 0.06 (n/a) | 0.15 (n/a) | 2101.00 (n/a) | 843.96 (n/a) | 608.20 (n/a) | 271.00 (n/a) | 719.19 (n/a) |

rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.31 (n/a) | 0.25 (n/a) | 0.26 (n/a) | 0.18 (n/a) | 0.05 (n/a) | 549.80 (n/a) | 415.24 (n/a) | 374.60 (n/a) | 312.80 (n/a) | 94.94 (n/a) |

rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.36 (n/a) | 0.14 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.14 (n/a) | 2065.60 (n/a) | 1332.80 (n/a) | 1884.30 (n/a) | 271.30 (n/a) | 877.03 (n/a) |

rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.40 (n/a) | 0.26 (n/a) | 0.20 (n/a) | 0.15 (n/a) | 0.11 (n/a) | 641.30 (n/a) | 436.34 (n/a) | 488.40 (n/a) | 243.90 (n/a) | 164.09 (n/a) |

rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.28 (n/a) | 0.16 (n/a) | 0.13 (n/a) | 0.10 (n/a) | 0.07 (n/a) | 741.60 (n/a) | 513.56 (n/a) | 548.00 (n/a) | 263.60 (n/a) | 173.46 (n/a) |

rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.24 (n/a) | 0.18 (n/a) | 0.19 (n/a) | 0.14 (n/a) | 0.04 (n/a) | 543.40 (n/a) | 418.60 (n/a) | 387.00 (n/a) | 309.80 (n/a) | 100.60 (n/a) |

rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.25 (n/a) | 0.18 (n/a) | 0.15 (n/a) | 0.14 (n/a) | 0.05 (n/a) | 541.60 (n/a) | 431.84 (n/a) | 480.50 (n/a) | 298.60 (n/a) | 117.42 (n/a) |

seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| d22ca66 — 2026-04-17 15:49:32 | 0.14 (n/a) | 0.09 (n/a) | 0.08 (n/a) | 0.08 (n/a) | 0.03 (n/a) | 27577.98 (n/a) | 23691.76 (n/a) | 26186.00 (n/a) | 14697.13 (n/a) | 5263.67 (n/a) |

Phoenix - Examples

IRONCLAD

Tested on 2026_04_17_15_56_32 at commit d22ca66.

| Test | Checks | TTFT (mean) | TPS (mean) |
|---|---|---|---|

Trends: [figure: IRONCLAD Trends]

@andrej merged commit 6b7326b into amd:devel on Apr 17, 2026
5 checks passed