Benchmark: MI300X multi-instance Wan 2.2 concurrency #17

@Stanley-blik

Description

Goal

Determine the optimal number of concurrent Wan 2.2 instances per MI300X GPU.

Theoretical estimate: 4-9 instances per GPU (32-72 total on an 8-GPU node). This still needs validation on real hardware.
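The issue does not say how the 4-9 range was derived. One plausible reading is a VRAM-only bound: divide the 192 GB of HBM3 on an MI300X by the peak VRAM of a single instance. The sketch below shows that arithmetic. The per-instance peaks (48 GB and 21 GB) are hypothetical values picked only so the result lands on the stated range, not measurements; `max_instances` and `reserve_gb` are illustrative names.

```python
import math

MI300X_VRAM_GB = 192  # HBM3 capacity of one MI300X


def max_instances(per_instance_peak_gb, reserve_gb=0.0, total_gb=MI300X_VRAM_GB):
    """VRAM-only upper bound on concurrent instances per GPU.

    Ignores compute and memory-bandwidth contention, which the
    scale test is meant to measure.
    """
    if per_instance_peak_gb <= 0:
        raise ValueError("per_instance_peak_gb must be positive")
    return math.floor((total_gb - reserve_gb) / per_instance_peak_gb)


if __name__ == "__main__":
    # Hypothetical per-instance peaks, NOT measured values.
    for peak_gb in (48, 21):
        per_gpu = max_instances(peak_gb)
        print(f"{peak_gb} GB/instance: {per_gpu} per GPU, {per_gpu * 8} per 8-GPU node")
```

A VRAM bound is only a ceiling. Throughput may stop improving well before memory runs out, which is why the baseline and scale tests below are needed.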

Tasks

  • Get access to MI300X hardware (RunPod, Hot Aisle, or Azure ND MI300X v5)
  • Run baseline: single instance peak VRAM + time per clip
  • Scale test: 2, 3, 4 instances on same GPU — measure throughput
  • Test weight sharing approach (shared model, independent working memory)
  • Full node test: optimal × 8 GPUs
  • Record all metrics (VRAM, bandwidth, time/clip, total throughput)
  • Update docs/research/mi300x-benchmarking.md with results
  • Set default max_instances_per_gpu in config based on findings
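As a rough sketch of how the recorded metrics could feed the `max_instances_per_gpu` default: compute total throughput for each scale-test run, drop runs that exceed a VRAM headroom limit, and keep the instance count with the highest throughput. The `Run` record, the `headroom` factor of 0.9, and the function names are all assumptions made for illustration; the real protocol lives in docs/research/mi300x-benchmarking.md. The sample numbers are placeholders, not benchmark results.

```python
from dataclasses import dataclass


@dataclass
class Run:
    instances: int        # concurrent instances on one GPU
    peak_vram_gb: float   # peak VRAM used on the GPU during the run
    sec_per_clip: float   # mean wall-clock seconds per clip, per instance


def clips_per_hour(run):
    """Total throughput of one GPU across all of its instances."""
    return run.instances * 3600.0 / run.sec_per_clip


def pick_max_instances(runs, vram_limit_gb=192.0, headroom=0.9):
    """Return the instance count with the best throughput that fits in VRAM."""
    fits = [r for r in runs if r.peak_vram_gb <= vram_limit_gb * headroom]
    if not fits:
        raise ValueError("no run fits within the VRAM headroom")
    return max(fits, key=clips_per_hour).instances


if __name__ == "__main__":
    # Placeholder values for illustration only, NOT measurements.
    sample = [
        Run(instances=1, peak_vram_gb=40, sec_per_clip=100),
        Run(instances=2, peak_vram_gb=80, sec_per_clip=120),
        Run(instances=3, peak_vram_gb=120, sec_per_clip=150),
        Run(instances=4, peak_vram_gb=180, sec_per_clip=190),
    ]
    best = pick_max_instances(sample)
    print(f"max_instances_per_gpu = {best}")
```

With these placeholder inputs, the 4-instance run has the highest raw throughput but is rejected for exceeding the headroom limit. This is the kind of trade-off the default should encode. For the full node test, the per-GPU throughput at the chosen count would be multiplied by 8, assuming GPUs do not contend with each other.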

Details

See docs/research/mi300x-benchmarking.md for full protocol.

Blocked By

Access to MI300X hardware.

Metadata

Assignees

No one assigned

Labels

gpu-backend (GPU inference server and model deployment), priority:medium (Medium priority), research (Research and experimentation)
