
Add PhysicsIQ benchmark reproduction cookbook for Cosmos3 #194

Draft: akashgokul wants to merge 1 commit into NVIDIA:main from akashgokul:feature/physicsiq-benchmark-notebook

@akashgokul
Collaborator

Adds an end-to-end notebook that reproduces the PhysicsIQ benchmark with Cosmos3-Super (and Cosmos3-Nano) via the native cosmos-framework PyTorch entrypoint. It covers both the I2V and V2V task formats, with verified Cosmos3-Super reference scores (I2V: 43.8, V2V: 59.7). It also adds the I2V and V2V prompts we used under assets/.
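When rerunning the notebook, it can help to compare a reproduced score against these references programmatically. The sketch below is hypothetical and not part of the notebook: REFERENCE_SCORES mirrors the Cosmos3-Super numbers reported in this PR, while the within_reference helper and its 1.0-point default tolerance are illustrative assumptions, not thresholds defined by PhysicsIQ or by this PR.

```python
# Cosmos3-Super reference PhysicsIQ scores, as reported in this PR.
REFERENCE_SCORES = {"i2v": 43.8, "v2v": 59.7}


def within_reference(task, score, tolerance=1.0):
    """Return True if `score` is within `tolerance` points of the reference.

    HYPOTHETICAL helper: the default tolerance is an illustrative choice,
    not one stated in the PR. Run-to-run differences (seeds, hardware,
    library versions) may shift scores slightly.
    """
    reference = REFERENCE_SCORES[task.lower()]
    return abs(score - reference) <= tolerance
```

For example, within_reference("I2V", my_i2v_score) accepts either capitalization of the task name and raises KeyError for an unknown task rather than silently passing.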

akashgokul marked this pull request as draft on June 5, 2026, 21:00
akashgokul force-pushed the feature/physicsiq-benchmark-notebook branch 2 times, most recently from 8fa9ce7 to 90346ea, on June 5, 2026, 21:30

Adds an end-to-end notebook for reproducing the PhysicsIQ benchmark with
Cosmos3-Super using the native cosmos-framework PyTorch entrypoint.

Location: evaluation/cosmos3/Physics_IQ/

Contents:
- run_with_cosmos_framework.ipynb: walks through the I2V and V2V task
  formats end-to-end: download the PhysicsIQ dataset, generate videos,
  stage the outputs, and optionally score them with the official
  PhysicsIQ scorer.
- assets/i2v_prompts.json: 198 per-case I2V prompts + negative prompts
- assets/v2v_prompts.json: 198 per-case V2V prompts + negative prompts

Reference scores (Cosmos3-Super): I2V 43.8, V2V 59.7.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
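A quick sanity check on the prompt assets before a long generation run might look like the following. This is a minimal sketch, assuming each assets/*_prompts.json file is a JSON object keyed by case ID whose values carry a "prompt" field and an optional "negative_prompt" field; the actual schema is not shown in this PR, and load_prompts is a hypothetical helper, not a function from the notebook or from cosmos-framework.

```python
import json
from pathlib import Path

# Case count stated in this PR for both the I2V and V2V prompt assets.
EXPECTED_CASES = 198


def load_prompts(path, expected=EXPECTED_CASES):
    """Load a prompt asset into {case_id: (prompt, negative_prompt)}.

    ASSUMPTION: the file is a JSON object keyed by case ID, e.g.
    {"case_id": {"prompt": "...", "negative_prompt": "..."}, ...}.
    The real schema of assets/*_prompts.json may differ.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"{path}: expected a JSON object keyed by case ID")

    prompts = {}
    for case_id, entry in data.items():
        if not isinstance(entry, dict) or not entry.get("prompt"):
            raise ValueError(f"{path}: case {case_id!r} has no prompt")
        # A missing negative prompt is treated as an empty string.
        prompts[case_id] = (entry["prompt"], entry.get("negative_prompt", ""))

    # Catch truncated or partially edited asset files early.
    if expected is not None and len(prompts) != expected:
        raise ValueError(f"{path}: found {len(prompts)} cases, expected {expected}")
    return prompts
```

Calling load_prompts("assets/i2v_prompts.json") fails fast on a missing case or empty prompt, rather than partway through generation.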

akashgokul force-pushed the feature/physicsiq-benchmark-notebook branch from 90346ea to 79ddcae on June 5, 2026, 21:31