aiter


AITER is AMD's centralized repository of high-performance AI operators for accelerating AI workloads. It serves as a unified home for customer operator-level requests across a range of needs: developers can focus on operators, while customers integrate the op collection into their own private or public frameworks.

Feature summary:

  • C++-level API
  • Python-level API
  • The underlying kernels may come from Triton, CK, or hand-written assembly
  • Not just inference kernels, but also training kernels and GEMM+communication kernels, allowing workarounds for architecture limitations in any kernel-framework combination

Installation

git clone --recursive https://github.com/ROCm/aiter.git
cd aiter
python3 setup.py develop

If you forgot --recursive when cloning, run the following from the aiter directory:

git submodule sync && git submodule update --init --recursive

FlyDSL (Optional)

AITER's FusedMoE supports FlyDSL-based kernels for mixed-precision MOE (e.g., A4W4). FlyDSL is optional — when not installed, AITER automatically falls back to CK kernels.

pip install --pre flydsl

Or install all optional dependencies at once:

pip install -r requirements.txt
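The optional-dependency behavior described above can be sketched with a standard try-import fallback. This is an illustrative pattern only, not AITER's actual dispatch code; the function name `select_moe_backend` is hypothetical.

```python
# Hedged sketch of the optional-backend pattern (names hypothetical):
# prefer FlyDSL kernels when the package imports, otherwise fall back to CK.
try:
    import flydsl  # optional dependency; absent unless `pip install --pre flydsl`
    HAS_FLYDSL = True
except ImportError:
    HAS_FLYDSL = False

def select_moe_backend():
    """Pick the FusedMoE kernel backend based on what is installed."""
    return "flydsl" if HAS_FLYDSL else "ck"
```

Either branch yields a working FusedMoE; FlyDSL only changes which kernels run.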

Triton-based Communication (Iris)

AITER supports GPU-initiated communication using the Iris library. This enables high-performance Triton-based communication primitives like reduce-scatter and all-gather.

Installation

Install with Triton communication support:

# Install AITER with Triton communication dependencies
pip install -e .
pip install -r requirements-triton-comms.txt

For more details, see docs/triton_comms.md.
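For readers unfamiliar with these collectives, the semantics of reduce-scatter and all-gather can be illustrated in plain Python. This sketch shows only what the primitives compute, not the Iris API, which runs GPU-initiated transfers inside Triton kernels.

```python
# Illustrative collective semantics (not the Iris API):
# with N ranks, reduce-scatter sums the per-rank buffers elementwise and
# leaves each rank holding one 1/N shard of the sum; all-gather then
# concatenates the shards so every rank recovers the full reduced buffer.
def reduce_scatter(bufs):
    n = len(bufs)
    total = [sum(vals) for vals in zip(*bufs)]  # elementwise sum across ranks
    shard = len(total) // n
    return [total[i * shard:(i + 1) * shard] for i in range(n)]  # shard i -> rank i

def all_gather(shards):
    return [x for s in shards for x in s]  # concatenate shards in rank order
```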

Run operators supported by aiter

A number of op tests are included. Run them individually, for example: python3 op_tests/test_layernorm2d.py

| Ops | Description |
| --- | --- |
| ELEMENT WISE | +, -, *, / |
| SIGMOID | sigmoid(x) = 1 / (1 + e^-x) |
| ALLREDUCE | Reduce + Broadcast |
| KVCACHE | W_K, W_V |
| MHA | Multi-Head Attention |
| MLA | Multi-head Latent Attention with KV-Cache layout |
| PA | Paged Attention |
| FusedMoE | Mixture of Experts |
| QUANT | BF16/FP16 -> FP8/INT4 |
| RMSNORM | Root mean square normalization |
| LAYERNORM | x = (x - μ) / (σ² + ϵ)^0.5 |
| ROPE | Rotary Position Embedding |
| GEMM | D = αAB + βC |
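Some of the formulas listed above can be spelled out as pure-Python reference implementations. These are illustrative definitions only, not AITER's optimized kernels, and the epsilon defaults are assumptions.

```python
import math

def sigmoid(x):
    """sigmoid(x) = 1 / (1 + e^-x)"""
    return 1.0 / (1.0 + math.exp(-x))

def layernorm(xs, eps=1e-5):
    """x = (x - mu) / (sigma^2 + eps)^0.5, over one feature vector."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return [(x - mu) / math.sqrt(var + eps) for x in xs]

def rmsnorm(xs, eps=1e-5):
    """Divide by the root mean square of the vector (no mean subtraction)."""
    rms = math.sqrt(sum(x * x for x in xs) / len(xs) + eps)
    return [x / rms for x in xs]

def gemm(A, B, C, alpha=1.0, beta=1.0):
    """D = alpha * A @ B + beta * C, for list-of-lists matrices."""
    n, k, m = len(A), len(B), len(B[0])
    return [[alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
             for j in range(m)] for i in range(n)]
```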

About

AI Tensor Engine for ROCm
