GPU Backend: AWS and GCP deployment support #11

@Stanley-blik

Description

Overview

Support deploying the GPU backend on AWS (p4d/p5) and GCP (a2/a3) instances.

Tasks

  • AWS p4d.24xlarge (8× A100) deployment scripts
  • AWS p5.48xlarge (8× H100) deployment scripts
  • GCP a2-highgpu-8g (8× A100) deployment scripts
  • Terraform/Pulumi IaC templates
  • Auto-scaling configuration
  • Spot instance support for cost savings
  • AMI/image baking with pre-loaded model weights
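As a rough illustration of how the deployment scripts might share one description of the supported instances, here is a minimal sketch of a target catalogue. The instance types and GPU counts come from the task list above; the `GpuTarget` class, the `TARGETS` keys, and `targets_for` are hypothetical names chosen for this example and are not part of the project.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GpuTarget:
    """One deployable instance shape (hypothetical structure)."""
    provider: str       # "aws" or "gcp"
    instance_type: str  # cloud provider's instance/machine type name
    gpu_model: str
    gpu_count: int


# Entries mirror the task list; the dictionary keys are illustrative only.
TARGETS = {
    "aws-a100": GpuTarget("aws", "p4d.24xlarge", "A100", 8),
    "aws-h100": GpuTarget("aws", "p5.48xlarge", "H100", 8),
    "gcp-a100": GpuTarget("gcp", "a2-highgpu-8g", "A100", 8),
}


def targets_for(provider: str) -> list[GpuTarget]:
    """Return every catalogued target for the given provider."""
    return [t for t in TARGETS.values() if t.provider == provider]
```

Keeping the catalogue in one place would let the IaC templates, auto-scaling configuration, and image-baking steps read the same instance definitions instead of hard-coding them separately.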

Acceptance Criteria

  • One-command deployment to AWS or GCP
  • Model weights pre-loaded or cached for fast cold start
  • Spot instance fallback logic
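One possible shape for the spot fallback criterion is sketched below, assuming the launcher is injected as a plain callable so that no particular cloud SDK call is implied. `CapacityError`, `launch_with_fallback`, and the `"spot"` / `"on-demand"` market strings are hypothetical names for this example; a real script would wrap the provider's own API and error types.

```python
from __future__ import annotations

from typing import Callable


class CapacityError(Exception):
    """Raised by a launcher when the requested capacity is unavailable."""


def launch_with_fallback(
    launch: Callable[[str, str], str],
    instance_type: str,
    spot_attempts: int = 2,
) -> tuple[str, str]:
    """Try spot capacity first, then fall back to on-demand.

    `launch(instance_type, market)` is an assumed callable that returns an
    instance ID or raises CapacityError. Returns (market, instance_id).
    """
    for _ in range(spot_attempts):
        try:
            return "spot", launch(instance_type, "spot")
        except CapacityError:
            continue  # spot unavailable; retry until attempts run out
    # On-demand errors are not caught here and propagate to the caller.
    return "on-demand", launch(instance_type, "on-demand")
```

Passing the launcher in as an argument keeps the fallback policy testable without cloud credentials, and the same function could serve both AWS and GCP launchers.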

Metadata

Assignees

No one assigned

    Labels

    gpu-backend (GPU inference server and model deployment)
    infra (Infrastructure, deployment, and cloud providers)
    milestone:v2 (Post-MVP improvements)
    priority:low (Low priority)
