aiter


AITER is AMD's centralized repository of high-performance AI operators for accelerating AI workloads. It serves as a unified home for customer operator-level requests across a range of needs: developers can focus on operators, while customers integrate the op collection into their own private or public frameworks.

Feature summary:

  • C++-level API
  • Python-level API
  • The underlying kernels may come from Triton, CK, or hand-written assembly
  • Not just inference kernels, but also training kernels and GEMM+communication kernels, allowing workarounds for architecture limitations in any kernel-framework combination

Installation

git clone --recursive https://github.com/ROCm/aiter.git
cd aiter
python3 setup.py develop

If you forgot --recursive when cloning, run the following from the aiter directory:

git submodule sync && git submodule update --init --recursive

FlyDSL (Optional)

AITER's FusedMoE supports FlyDSL-based kernels for mixed-precision MOE (e.g., A4W4). FlyDSL is optional — when not installed, AITER automatically falls back to CK kernels.

pip install --pre flydsl

Or install all optional dependencies at once:

pip install -r requirements.txt
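The optional-dependency behavior described above can be sketched with a standard try-import fallback. This is an illustrative pattern only, not AITER's actual dispatch code; the function name `select_moe_backend` is hypothetical.

```python
# Hedged sketch of the optional-backend pattern (names hypothetical):
# prefer FlyDSL kernels when the package imports, otherwise fall back to CK.
try:
    import flydsl  # optional dependency; absent unless `pip install --pre flydsl`
    HAS_FLYDSL = True
except ImportError:
    HAS_FLYDSL = False

def select_moe_backend():
    """Pick the FusedMoE kernel backend based on what is installed."""
    return "flydsl" if HAS_FLYDSL else "ck"
```

Either branch yields a working FusedMoE; FlyDSL only changes which kernels run.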

Triton-based Communication (Iris)

AITER supports GPU-initiated communication using the Iris library. This enables high-performance Triton-based communication primitives like reduce-scatter and all-gather.

Installation

Install with Triton communication support:

# Install AITER with Triton communication dependencies
pip install -e .
pip install -r requirements-triton-comms.txt

For more details, see docs/triton_comms.md.
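For readers unfamiliar with these collectives, the semantics of reduce-scatter and all-gather can be illustrated in plain Python. This sketch shows only what the primitives compute, not the Iris API, which runs GPU-initiated transfers inside Triton kernels.

```python
# Illustrative collective semantics (not the Iris API):
# with N ranks, reduce-scatter sums the per-rank buffers elementwise and
# leaves each rank holding one 1/N shard of the sum; all-gather then
# concatenates the shards so every rank recovers the full reduced buffer.
def reduce_scatter(bufs):
    n = len(bufs)
    total = [sum(vals) for vals in zip(*bufs)]  # elementwise sum across ranks
    shard = len(total) // n
    return [total[i * shard:(i + 1) * shard] for i in range(n)]  # shard i -> rank i

def all_gather(shards):
    return [x for s in shards for x in s]  # concatenate shards in rank order
```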

Run operators supported by aiter

A number of op tests are included. Run them individually, for example: python3 op_tests/test_layernorm2d.py

| Ops | Description |
| --- | --- |
| ELEMENT WISE | +, -, *, / |
| SIGMOID | sigmoid(x) = 1 / (1 + e^-x) |
| ALLREDUCE | Reduce + Broadcast |
| KVCACHE | W_K, W_V |
| MHA | Multi-Head Attention |
| MLA | Multi-head Latent Attention with KV-Cache layout |
| PA | Paged Attention |
| FusedMoE | Mixture of Experts |
| QUANT | BF16/FP16 -> FP8/INT4 |
| RMSNORM | Root mean square normalization |
| LAYERNORM | x = (x - μ) / (σ² + ϵ)^0.5 |
| ROPE | Rotary Position Embedding |
| GEMM | D = αAB + βC |
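Some of the formulas listed above can be spelled out as pure-Python reference implementations. These are illustrative definitions only, not AITER's optimized kernels, and the epsilon defaults are assumptions.

```python
import math

def sigmoid(x):
    """sigmoid(x) = 1 / (1 + e^-x)"""
    return 1.0 / (1.0 + math.exp(-x))

def layernorm(xs, eps=1e-5):
    """x = (x - mu) / (sigma^2 + eps)^0.5, over one feature vector."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return [(x - mu) / math.sqrt(var + eps) for x in xs]

def rmsnorm(xs, eps=1e-5):
    """Divide by the root mean square of the vector (no mean subtraction)."""
    rms = math.sqrt(sum(x * x for x in xs) / len(xs) + eps)
    return [x / rms for x in xs]

def gemm(A, B, C, alpha=1.0, beta=1.0):
    """D = alpha * A @ B + beta * C, for list-of-lists matrices."""
    n, k, m = len(A), len(B), len(B[0])
    return [[alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
             for j in range(m)] for i in range(n)]
```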

About

AI Tensor Engine for ROCm
