OmniHub: Tools for AI/ML Workload Analysis and Characterization

Introduction

This repository provides utilities (scripts, tools, and container images) to execute and analyze AI/ML workloads on AMD systems, evaluating performance across system scales, model granularities, and AMD performance tools. The tools have been tested with local ML models and pre-built Docker and Apptainer images, and they support flexible mix-and-match configurations through simple command-line arguments. Our examples use the Hugging Face and vLLM APIs; support for additional frameworks is available on request (contact us).

ML Models Sample List

Llama (v2-v4, 3B-405B)
DeepSeek-R1
AMD-OLMo (1B)
Mistral (7B-24B)
Qwen (v2.5-7B)
NVLM (72B)
More can be added on request (contact us).

See here for more details on the available ML models on Frontier and the HPC Fund clusters.

Frameworks

Hugging Face: inference and finetuning
vLLM: inference
PyTorch: training and inference
SGLang: coming soon...

Systems

HPC Fund: multi-node, multi-GPU; Apptainer
Frontier (OLCF): multi-node, multi-GPU; Apptainer

Tools Sample List

rocprof/rocprof-compute
PyTorch Profiler
omnitrace
omnistat
More can be added on request (contact us).

Check the list of supported tools for more details about tools and their execution modes.

Generating and Executing Jobs with OmniHub

Our test clusters use SLURM for job management. OmniHub provides the tool omnihub-generate-job to automatically create SLURM job scripts that fit your target environment. Using this tool, researchers can easily generate and execute jobs with different combinations of ML models, container platforms, number of nodes, and performance tools. For non-SLURM systems, the tool can serve as a reference to modify launch commands as needed.

In its most basic form, SLURM job script generation and job execution work as follows:

git clone https://github.com/AMDResearch/omnihub.git $HOME/omnihub
cd $HOME/omnihub

./omnihub-generate-job --omnihub-dir $HOME/omnihub \
  --app-config applications/hf-infer/config-example.yaml \
  --output hf-infer.slurm
sbatch hf-infer.slurm

Here, --omnihub-dir points to your working copy of the OmniHub repository on the cluster, and --app-config gives the path of the application configuration file relative to the OmniHub directory. The --output flag names the generated SLURM script, which is then submitted with sbatch. Refer to this document for more examples of using the omnihub-generate-job tool.

Generated SLURM scripts run scripts/sanity-check.sh on the allocated nodes before the main workload. This script checks ROCm availability (rocminfo), GPU compute mode, and PyTorch NCCL communication (broadcast and all_reduce under torchrun, via scripts/sanity_torch_dist.py). If the sanity step fails with errors such as NCCL "connection refused" or traffic directed at link-local addresses (for example 169.254.x.x), fix cluster networking first, e.g. set NCCL_SOCKET_IFNAME / GLOO_SOCKET_IFNAME appropriately and make sure MASTER_ADDR is reachable, before debugging application code.
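A quick way to spot the link-local symptom before submitting a job is to check what MASTER_ADDR resolves to. The sketch below uses only Python's standard `socket` and `ipaddress` modules; the helper name `is_link_local` is hypothetical and is not part of OmniHub or scripts/sanity-check.sh.

```python
import ipaddress
import os
import socket


def is_link_local(host: str) -> bool:
    """Return True if `host` (a hostname or IPv4 literal) resolves to a link-local address."""
    # gethostbyname returns an IPv4 string; IP literals are returned unchanged.
    addr = socket.gethostbyname(host)
    return ipaddress.ip_address(addr).is_link_local


if __name__ == "__main__":
    # MASTER_ADDR is the standard torch.distributed rendezvous variable.
    master = os.environ.get("MASTER_ADDR")
    if master is None:
        print("MASTER_ADDR is not set")
    elif is_link_local(master):
        print(f"MASTER_ADDR={master} is link-local; check NCCL_SOCKET_IFNAME / GLOO_SOCKET_IFNAME")
    else:
        print(f"MASTER_ADDR={master} is not link-local")
```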

List of Command Line Options for `omnihub-generate-job`

| Flag | Options | Description |
| --- | --- | --- |
| `omnihub-dir` | | OmniHub working directory. |
| `app-config` | | Path to the application config, relative to the OmniHub directory. |
| `cluster` | `hpcfund`, `frontier` | Cluster name. |
| `partition` | | Partition or job queue name. |
| `num-nodes` | | Number of nodes to allocate. |
| `platform` | `apptainer`, `docker` | Container platform to use. |
| `runner` | `manual`, `torchrun` | Distributed runner for multi-node jobs. |
| `tools` | List of tools | Profiling tools to enable. |
| `time-limit` | | SLURM job time limit. |
| `tasks-per-node` | | Override the number of SLURM tasks per node. |
| `image` | | Custom container image path. |
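When scripting many generate-and-submit cycles, it can help to build the command line programmatically. This is a sketch under assumptions: only `--omnihub-dir`, `--app-config`, and `--output` are confirmed by the example above; the sketch assumes every other option in the table maps to a long flag of the same name (`--num-nodes`, `--cluster`, ...) and that `--tools` accepts space-separated names, as in the omnihub-sweep example. The helpers `build_generate_cmd` and `generate_and_submit` are hypothetical.

```python
import shlex
import subprocess

# Tool names from the "List of Supported Tools" table.
SUPPORTED_TOOLS = {
    "rocprof-compute", "omnistat", "omnistat-rocprofiler-pmc1",
    "omnistat-rocprofiler-pmc2", "omnitrace", "pytorch-stats", "pytorch-trace",
    "rocprofv1-stats", "rocprofv2-pmc", "rocprofv3-stats", "rccl-info",
}


def build_generate_cmd(omnihub_dir, app_config, output, tools=(), **options):
    """Return an argv list for omnihub-generate-job.

    Assumption: each keyword option maps to a --flag of the same name,
    with underscores turned into dashes (num_nodes -> --num-nodes).
    """
    unknown = set(tools) - SUPPORTED_TOOLS
    if unknown:
        raise ValueError(f"unsupported tools: {sorted(unknown)}")
    cmd = ["./omnihub-generate-job",
           "--omnihub-dir", omnihub_dir,
           "--app-config", app_config,
           "--output", output]
    for key, value in options.items():
        cmd += [f"--{key.replace('_', '-')}", str(value)]
    if tools:
        cmd += ["--tools", *tools]
    return cmd


def generate_and_submit(cmd, output, dry_run=True):
    """Run the generator, then sbatch the script; with dry_run, only print the commands."""
    for argv in (cmd, ["sbatch", output]):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Validating tool names locally catches typos before a job reaches the queue; the generator itself remains the authority on which flags it accepts.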

List of Supported Tools

| Tool | Description |
| --- | --- |
| `rocprof-compute` | Collect all performance counters. |
| `omnistat` | Low-overhead system metrics, sampled at 1s intervals. |
| `omnistat-rocprofiler-pmc1` | Low-overhead performance counter collection, sampled at 1s intervals, 1st set of PMCs. |
| `omnistat-rocprofiler-pmc2` | Low-overhead performance counter collection, sampled at 1s intervals, 2nd set of PMCs. |
| `omnitrace` | Application tracing. |
| `pytorch-stats` | Collects detailed statistics of PyTorch operations. |
| `pytorch-trace` | PyTorch execution traces compatible with TensorBoard. |
| `rocprofv1-stats` | Kernel execution stats (to be deprecated soon). |
| `rocprofv2-pmc` | Profiling with performance counters (configuration). |
| `rocprofv3-stats` | Kernel execution stats. |
| `rccl-info` | Collects statistics of RCCL collective calls. |

Example Applications

Explore example applications that demonstrate the usage of various ML models and configurations:

Hugging Face Inference: LLM inference using the Hugging Face API.
Hugging Face Finetuning: Finetune LLMs.
vLLM Inference: Inference via the vLLM API.
vLLM Latency Benchmark: Measure response times.
vLLM Throughput Benchmark: Test concurrent processing.
PyTorch Training: Train a simple CNN with PyTorch.
PyTorch Inference: Run inference on a simple CNN.

Optional Note: To review vLLM benchmark scripts, launch an OmniHub container and navigate to /app/vllm/benchmarks.

For Apptainer, run:
apptainer shell /path/to/omnihub-image.sif
cd /app/vllm/benchmarks/
See docs/images.md for cluster-specific image paths.

All examples include YAML configuration files specifying the main entrypoint, tensor parallel size, and other settings. A snippet of the Hugging Face inference configuration file is provided below.

# Specifies the main script to run the application. The main function in the
# application may be decorated with `omnihub.entrypoint` and other functions may be
# decorated with `omnihub.tools.profile` to enable detailed profiling.
entrypoint: applications/hf-infer/infer.py

# Provides details for loading the model. The script will check if the model exists in
# `OMNIHUB_MODELS_DIR` before automatically downloading from Hugging Face.
ModelArguments:
  pretrained_model_name_or_path: meta-llama/Llama-3.1-8B-Instruct
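The local-first lookup described in the config comment can be sketched as follows. This illustrates the idea, not OmniHub's implementation: it assumes local models live in a subdirectory of `OMNIHUB_MODELS_DIR` named after the Hugging Face model ID (for example `<OMNIHUB_MODELS_DIR>/meta-llama/Llama-3.1-8B-Instruct`), and the helper `resolve_model_path` is hypothetical. When no local copy exists, it returns the model ID unchanged, which Hugging Face `from_pretrained` calls resolve by downloading from the Hub.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_model_path(model_id: str, models_dir: Optional[str] = None) -> str:
    """Prefer a local copy under OMNIHUB_MODELS_DIR; otherwise return the Hub model ID."""
    models_dir = models_dir or os.environ.get("OMNIHUB_MODELS_DIR")
    if models_dir:
        # Assumed layout: <models_dir>/<org>/<model-name>
        candidate = Path(models_dir) / model_id
        if candidate.is_dir():
            return str(candidate)
    # No local copy: from_pretrained() would download it from the Hugging Face Hub.
    return model_id
```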

Sweeping Jobs and Application Arguments

The OmniHub sweep tool (omnihub-sweep) automates job generation and submission. It creates different job setups based on CLI flags and generates application configurations from templates.

Application templates are based on application config files and let you list multiple values for certain fields. The sweep tool then produces every possible combination of those fields. For example, the following template, which sweeps the input and output lengths for the vLLM latency benchmark, generates 4 configuration files with different combinations of input_len/output_len: 32/128, 32/256, 64/128, and 64/256.

entrypoint: /app/vllm/benchmarks/benchmark_latency.py
model: meta-llama/Llama-3.1-8B-Instruct
tensor_parallel_size: 1
input_len:
  - 32
  - 64
output_len:
  - 128
  - 256
batch_size: 8
# more configuration params go here...

Note: To create a template for the vLLM throughput benchmark or any other example app, start from the sample config file in the app directory.
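The template expansion described above amounts to a Cartesian product over the list-valued fields. A minimal sketch with `itertools.product`, assuming the template has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`); `expand_template` is a hypothetical name, and omnihub-sweep's own ordering and file naming may differ.

```python
import itertools


def expand_template(template):
    """Return one config dict per combination of the template's list-valued fields."""
    sweep_keys = [k for k, v in template.items() if isinstance(v, list)]
    fixed = {k: v for k, v in template.items() if k not in sweep_keys}
    configs = []
    for values in itertools.product(*(template[k] for k in sweep_keys)):
        config = dict(fixed)
        config.update(zip(sweep_keys, values))
        configs.append(config)
    return configs


template = {
    "entrypoint": "/app/vllm/benchmarks/benchmark_latency.py",
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "tensor_parallel_size": 1,
    "input_len": [32, 64],
    "output_len": [128, 256],
    "batch_size": 8,
}
for cfg in expand_template(template):
    print(cfg["input_len"], cfg["output_len"])  # 32 128, 32 256, 64 128, 64 256
```

One limitation of this simple rule: any field whose value is legitimately a list would also be treated as a sweep dimension.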

The omnihub-sweep CLI also allows sweeping over job-related options, including:

partitions: cluster partitions.
num-nodes: number of nodes.
tools: profiling tools to enable (pass --tools multiple times to sweep over different sets of tools).

For example, with the template above stored in a file named vllm-latency-template.yaml, the following omnihub-sweep command generates 4 configuration files and then submits jobs for each of them in 2 partitions with 2 different sets of tools, for 16 jobs in total:

mkdir sweep-vllm
./omnihub-sweep --omnihub-dir $PWD --sweep-dir ./sweep-vllm \
  --template vllm-latency-template.yaml \
  --partitions mi2104x mi2508x \
  --tools omnistat --tools omnistat rocprofv3-stats

Starting a new sweep
.. Generated configurations: 4
Number of jobs in this sweep: 16
Submitting job: mi2104x/1/omnistat/config-00001.yaml
Submitting job: mi2104x/1/omnistat/config-00003.yaml
Submitting job: mi2104x/1/omnistat/config-00002.yaml
Submitting job: mi2104x/1/omnistat/config-00000.yaml
Submitting job: mi2104x/1/omnistat,rocprofv3-stats/config-00001.yaml
Submitting job: mi2104x/1/omnistat,rocprofv3-stats/config-00003.yaml
Submitting job: mi2104x/1/omnistat,rocprofv3-stats/config-00002.yaml
Submitting job: mi2104x/1/omnistat,rocprofv3-stats/config-00000.yaml
Submitting job: mi2508x/1/omnistat/config-00001.yaml
Submitting job: mi2508x/1/omnistat/config-00003.yaml
Submitting job: mi2508x/1/omnistat/config-00002.yaml
Submitting job: mi2508x/1/omnistat/config-00000.yaml
Submitting job: mi2508x/1/omnistat,rocprofv3-stats/config-00001.yaml
Submitting job: mi2508x/1/omnistat,rocprofv3-stats/config-00003.yaml
Submitting job: mi2508x/1/omnistat,rocprofv3-stats/config-00002.yaml
Submitting job: mi2508x/1/omnistat,rocprofv3-stats/config-00000.yaml
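The job count follows from multiplying the sweep dimensions: 4 configurations × 2 partitions × 1 node count × 2 tool sets = 16 jobs. The sketch below enumerates the same matrix and builds paths in the `partition/num-nodes/tools/config-NNNNN.yaml` pattern seen in the log above. The pattern is inferred from that log, `job_paths` is a hypothetical helper, and the order it produces differs from omnihub-sweep's submission order.

```python
import itertools


def job_paths(partitions, num_nodes, tool_sets, num_configs):
    """Enumerate sweep jobs as partition/num-nodes/tools/config-NNNNN.yaml paths."""
    paths = []
    for part, nodes, tools, idx in itertools.product(
        partitions, num_nodes, tool_sets, range(num_configs)
    ):
        paths.append(f"{part}/{nodes}/{','.join(tools)}/config-{idx:05d}.yaml")
    return paths


jobs = job_paths(
    partitions=["mi2104x", "mi2508x"],
    num_nodes=[1],
    tool_sets=[["omnistat"], ["omnistat", "rocprofv3-stats"]],
    num_configs=4,
)
print(len(jobs))  # 16
```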

To test sweeps and job generation without submitting jobs to the cluster, use the --dry-run flag.

Processing Metrics and Results

After running OmniHub-generated jobs, you can use omnihub-process to summarize the results in a standardized format. The summary includes details from various sources:

Job configuration options
Application configuration options
Application metrics (if available)
Default monitor metrics
Omnistat report metrics

To process and index the results, use:

./omnihub-process --results-dir /path/to/results/omnihub -j 4
./omnihub-index --results-dir /path/to/results/omnihub --output index

Note: omnihub-index requires that omnihub-process has completed successfully and that processed data is present in each job directory. If no processed data is found, omnihub-index will exit with an error message. Please ensure all jobs have been processed before running the index step.
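Because omnihub-index depends on omnihub-process having succeeded, a wrapper script should stop before indexing if processing fails. A minimal sketch using `subprocess.run(check=True)`, which raises `CalledProcessError` on a non-zero exit status; it assumes omnihub-process signals failure through its exit code, and the `run` parameter of the hypothetical `process_and_index` helper lets the command sequence be inspected without executing anything.

```python
import subprocess


def process_and_index(results_dir, jobs=4, output="index", run=subprocess.run):
    """Run omnihub-process, then omnihub-index only if processing succeeded."""
    steps = [
        ["./omnihub-process", "--results-dir", results_dir, "-j", str(jobs)],
        ["./omnihub-index", "--results-dir", results_dir, "--output", output],
    ]
    for argv in steps:
        # check=True raises CalledProcessError on a non-zero exit, so a failed
        # processing step prevents the index step from running.
        run(argv, check=True)
    return steps
```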

The omnihub-index command above generates an index.csv file in the top directory of the repository. The CSV uses two header rows and can be loaded in Pandas as follows:

import pandas
df = pandas.read_csv("index.csv", header=[0,1], index_col=0)
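Without pandas, the two header rows can be combined into `(group, field)` tuple keys with Python's standard `csv` module, similar to the MultiIndex columns that `header=[0,1]` produces. This sketch assumes exactly two header rows with the row label in the first column; the column names in the sample are hypothetical placeholders, not the actual index.csv schema, and `read_two_header_csv` is a hypothetical helper.

```python
import csv
import io


def read_two_header_csv(fileobj):
    """Return {row_label: {(group, field): value}} from a CSV with two header rows."""
    reader = csv.reader(fileobj)
    top = next(reader)
    sub = next(reader)
    # If a top-level label is left blank for following columns in the same
    # group, carry the previous label forward.
    groups, current = [], ""
    for label in top:
        current = label or current
        groups.append(current)
    keys = list(zip(groups, sub))[1:]  # first column holds the row label
    return {row[0]: dict(zip(keys, row[1:])) for row in reader}


sample = io.StringIO(
    ",job,,app\n"
    "id,partition,num_nodes,model\n"
    "0,mi2104x,1,llama\n"
)
table = read_two_header_csv(sample)
print(table["0"][("job", "partition")])  # mi2104x
```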

Known Issues

rocprofiler-compute does not work with multi-node runs.
Tools such as pytorch-trace, omnitrace, and rocprofv3-stats can generate many GB of trace data per rank, which can quickly exhaust disk space.
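Before enabling the trace-heavy tools named above, a pre-flight check on free space in the results filesystem can save a failed run. A sketch using `shutil.disk_usage`; the 100 GB threshold and the helper name `check_trace_space` are arbitrary placeholders, not OmniHub defaults, so size the threshold to your rank count and run length.

```python
import shutil

# Tools this README names as producing large per-rank traces.
TRACE_HEAVY_TOOLS = {"pytorch-trace", "omnitrace", "rocprofv3-stats"}


def check_trace_space(results_dir, tools, min_free_gb=100.0):
    """Return a warning string if trace-heavy tools are enabled and free space is low, else None."""
    heavy = sorted(TRACE_HEAVY_TOOLS.intersection(tools))
    if not heavy:
        return None
    free_gb = shutil.disk_usage(results_dir).free / 1e9
    if free_gb < min_free_gb:
        return f"{', '.join(heavy)} enabled but only {free_gb:.1f} GB free in {results_dir}"
    return None
```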

Developer Corner

If you want to contribute to OmniHub, read this document first for the developer prerequisites.

Contact

Email: dl.RAD-omnihub@amd.com
GitHub Discussions
