Add PhysicsIQ benchmark reproduction cookbook for Cosmos3 #194
Draft
akashgokul wants to merge 1 commit into
8fa9ce7 to
90346ea
Adds an end-to-end notebook for reproducing the PhysicsIQ benchmark with Cosmos3-Super using the native cosmos-framework PyTorch entrypoint.

Location: evaluation/cosmos3/Physics_IQ/

Contents:
- run_with_cosmos_framework.ipynb: walks through the I2V and V2V task formats end-to-end (download the PhysicsIQ dataset, generate, stage, and optionally score with the official PhysicsIQ scorer)
- assets/i2v_prompts.json: 198 per-case I2V prompts + negative prompts
- assets/v2v_prompts.json: 198 per-case V2V prompts + negative prompts

Reference scores (Cosmos3-Super): I2V 43.8, V2V 59.7.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
90346ea to
79ddcae
Adds an end-to-end notebook for reproducing the PhysicsIQ benchmark with Cosmos3-Super (and Cosmos3-Nano) via the native cosmos-framework PyTorch entrypoint. The notebook covers both the I2V and V2V task formats and includes verified reference scores, which the commit message attributes to Cosmos3-Super (I2V: 43.8, V2V: 59.7). Also adds the prompts we used for I2V and V2V under assets/.
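
For reviewers who want a quick sanity check on the two prompt files before running generation, here is a minimal sketch. The schema is an assumption: this PR page does not show the JSON layout, so the sketch assumes each file is a list of objects with `prompt` and `negative_prompt` keys, and the helper names `check_prompts` and `check_prompt_file` are hypothetical, not part of the notebook. Only the expected count of 198 entries per file comes from the description above.

```python
import json
from pathlib import Path

# Per-task prompt count stated in the PR description.
EXPECTED_CASES = 198


def check_prompts(entries, expected=EXPECTED_CASES):
    """Return a list of human-readable problems found in the prompt entries.

    ASSUMPTION: each entry is a dict with non-empty string values under
    "prompt" and "negative_prompt". The real files may use different keys.
    """
    problems = []
    if len(entries) != expected:
        problems.append(f"expected {expected} entries, found {len(entries)}")
    for i, entry in enumerate(entries):
        for key in ("prompt", "negative_prompt"):
            value = entry.get(key) if isinstance(entry, dict) else None
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {i}: missing or empty {key!r}")
    return problems


def check_prompt_file(path):
    """Load one prompt file (assumed to be a JSON list) and validate it."""
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    return check_prompts(entries)
```

If the assumed layout holds, calling `check_prompt_file("assets/i2v_prompts.json")` from evaluation/cosmos3/Physics_IQ/ would return an empty list for a well-formed file. If the files are keyed by case ID instead, the key names and the iteration would need adjusting.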