Add 4 verified GPUs and GPU compatibility test script#60

Open
slacki-ai wants to merge 1 commit into longtermrisk:v0.9 from slacki-ai:add-verified-gpus

Conversation

@slacki-ai
Contributor

Summary

  • Adds 4 new GPUs to `VERIFIED_GPUs` after testing both finetuning and inference with our CUDA 12.8 docker image: RTX 6000 Ada, RTX 3090, A40, RTX A4500
  • Adds `tests/gpu_compatibility_check.py`, a manual test script that spawns RunPod pods directly (bypassing the cluster manager) to validate GPU/docker-image compatibility
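The list change itself is small. A sketch of what the diff looks like, assuming `VERIFIED_GPUs` is a module-level list (the module path and the pre-existing entries shown are placeholders, not taken from the repo):

```python
# e.g. openweights/cluster/gpus.py (path illustrative)
# GPUs confirmed to run both finetuning and inference with the
# CUDA 12.8 docker image.
VERIFIED_GPUs = [
    "A100",      # pre-existing entry (placeholder)
    "H100",      # pre-existing entry (placeholder)
    # Newly verified in this PR:
    "6000Ada",   # RTX 6000 Ada
    "RTX3090",
    "A40",
    "A4500",     # RTX A4500
]
```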

Test results

Each GPU was tested by spawning a RunPod pod with our production docker image and running a minimal 3-step SFT finetuning job and a 3-prompt vLLM inference job on unsloth/Llama-3.2-1B-Instruct.
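The two jobs above are deliberately minimal, just enough to prove the CUDA stack works end to end. A sketch of what each per-GPU job spec might look like (field names are illustrative assumptions, not the real OpenWeights job schema):

```python
# Minimal finetuning job: 3 optimizer steps of SFT on a 1B model,
# enough to exercise CUDA, the trainer, and checkpointing.
FT_JOB = {
    "type": "sft",
    "model": "unsloth/Llama-3.2-1B-Instruct",
    "max_steps": 3,
}

# Minimal inference job: 3 trivial prompts through vLLM,
# enough to exercise model loading and generation.
INFERENCE_JOB = {
    "type": "inference",
    "model": "unsloth/Llama-3.2-1B-Instruct",
    "prompts": ["Hello.", "What is 2+2?", "Name a color."],
}
```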

| GPU | Finetuning | Inference | Verdict |
| --- | --- | --- | --- |
| RTX 6000 Ada (`6000Ada`) | ✅ | ✅ | Added |
| A40 | ✅ | ✅ | Added |
| RTX 3090 | ✅ | ✅ | Added |
| RTX A4500 | ✅ | ✅ | Added |
| RTX 5090 | ❌ | ❌ | Broken: CUDA 12.8 incompatible (workers crash with `vram=0`) |
| RTX 4090 | ❌ | ❌ | Broken: same CUDA detection failure |
| RTX 2000 Ada | ❌ | ❌ | Broken: same CUDA detection failure |
| RTX A5000 | — | — | Not testable: "Does not have the resources" |
| RTX 4000 Ada | — | — | No RunPod stock available |
| RTX A4000 | — | — | No RunPod stock available |
| L4 | — | — | No RunPod stock available |
| L40S | — | — | No RunPod stock available |

About the test script

`tests/gpu_compatibility_check.py` is a standalone manual script, not picked up by pytest since it has no `test_` prefix. Usage:

```
python tests/gpu_compatibility_check.py --gpus L40 A100 A40
python tests/gpu_compatibility_check.py --gpus A40 --job-types inference
```

It creates RunPod pods directly, submits targeted OpenWeights jobs to them, polls for completion, and reports a results table plus JSON output.
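The polling step can be sketched as follows. This is a minimal illustration, assuming each job exposes some way to read its current status; `wait_for_jobs` and the status-string values are made up for the sketch and are not the script's actual API:

```python
import time

def wait_for_jobs(jobs, poll_interval=5, timeout=600):
    """Poll jobs until each reaches a terminal state, or until timeout.

    `jobs` maps a job id to a zero-argument callable returning the job's
    current status string (in the real script this would query the
    OpenWeights backend). Returns a dict of job id -> final status.
    """
    deadline = time.monotonic() + timeout
    results = {}
    while jobs and time.monotonic() < deadline:
        for job_id, get_status in list(jobs.items()):
            status = get_status()
            if status in ("completed", "failed", "canceled"):
                results[job_id] = status   # terminal: stop polling this job
                del jobs[job_id]
        if jobs:
            time.sleep(poll_interval)
    for job_id in jobs:                    # still running at the deadline
        results[job_id] = "timeout"
    return results
```

The per-job loop with a shared deadline keeps one stuck pod from blocking results for the GPUs that finished.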

Test plan

  • Verified finetuning completes on all 4 GPUs
  • Verified inference completes on all 4 GPUs
  • Test script syntax validated
  • Reviewer: confirm new GPUs are desirable for the verified list

🤖 Generated with Claude Code

Tested finetuning and inference on RunPod with our CUDA 12.8 docker image.
Added 6000Ada, RTX3090, A40, and A4500 to VERIFIED_GPUs after confirming
both job types complete successfully.

Includes tests/gpu_compatibility_check.py — a manual test script that
spawns RunPod pods directly (bypassing the cluster manager) to validate
GPU compatibility. Not picked up by pytest (no test_ prefix).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>