MegatronBridge: hf_token validator is too strict — rejects non-gated models and breaks CI #844

@rutayan-nv

Bug Description

MegatronBridgeCmdArgs.validate_hf_token raises a ValueError (surfaced by pydantic as a ValidationError) when hf_token is empty and HF_TOKEN is not set in the environment, even for models that do not require a Hugging Face token (e.g., Qwen3-235B-A22B, which has gated: False on Hugging Face).

This breaks test_test_definitions.py in CI environments where HF_TOKEN is not set.

Root Cause

The validator in megatron_bridge.py:

@field_validator("hf_token", mode="after", check_fields=False)
def validate_hf_token(cls, v):
    token = (v or "").strip() or os.environ.get("HF_TOKEN", "").strip()
    if not token:
        raise ValueError(
            "cmd_args.hf_token is required. Please set HF_TOKEN environment variable ..."
        )
    return token

However, in Megatron-Bridge's argument_parser.py, --hf_token is optional with no default — it is only required for gated models. The mismatch means cloudai enforces a requirement that Megatron-Bridge itself does not.
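
To illustrate what "optional with no default" means on the consuming side, here is a minimal sketch using the standard-library argparse. The parser below is a hypothetical stand-in, not the actual contents of Megatron-Bridge's argument_parser.py; only the flag name --hf_token is taken from this issue.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for illustration; not the real Megatron-Bridge parser.
    parser = argparse.ArgumentParser(prog="hypothetical-launcher")
    # No required=True and no default=: argparse stores None when the flag is absent.
    parser.add_argument("--hf_token", type=str, help="Only needed for gated models")
    return parser


args_without = build_parser().parse_args([])
args_with = build_parser().parse_args(["--hf_token", "hf_example"])

print(args_without.hf_token)  # None: the flag was omitted
print(args_with.hf_token)     # hf_example
```

With a flag declared this way, omitting it is a normal, valid invocation, which is why a stricter check upstream rejects configurations the consumer would accept.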

Steps to Reproduce

  1. Add a MegatronBridge test TOML without hf_token (or with hf_token = "") for a non-gated model
  2. Run pytest tests/test_test_definitions.py without HF_TOKEN set in the environment
  3. Observe ValidationError: cmd_args.hf_token is required

# conf/.../test/my_model.toml
[cmd_args]
model_family_name = "qwen"
model_recipe_name = "qwen3_235b_a22b"
compute_dtype = "bf16"
# no hf_token (model is not gated)

pydantic_core._pydantic_core.ValidationError: 1 validation error for MegatronBridgeTestDefinition
cmd_args.hf_token
  Value error, cmd_args.hf_token is required. Please set HF_TOKEN environment variable (recommended)
  or cmd_args.hf_token with your actual HF token value.

Expected Behavior

hf_token should be optional. When empty, it should be passed as None to Megatron-Bridge (which already handles None gracefully). The validator should not raise for non-gated models.

Suggested Fix

Make hf_token optional: fall back to the HF_TOKEN environment variable when the field is empty, and return None instead of raising when neither is set:

hf_token: str | None = Field(default=None)

@field_validator("hf_token", mode="after")
@classmethod
def validate_hf_token(cls, v):
    # Prefer the explicit field value, then fall back to the HF_TOKEN environment variable.
    token = (v or "").strip() or os.environ.get("HF_TOKEN", "").strip()
    return token or None  # None is valid; Megatron-Bridge handles it
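
The resolution order in the suggested validator can be checked in isolation, without pydantic. The sketch below assumes a hypothetical helper, resolve_hf_token, which is not part of cloudai; it mirrors the validator body and takes the environment as a parameter so the cases are easy to exercise.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def resolve_hf_token(value: str | None, environ: Mapping[str, str] = os.environ) -> str | None:
    """Hypothetical helper mirroring the suggested validator body."""
    # An explicit value wins; otherwise fall back to HF_TOKEN; otherwise None.
    token = (value or "").strip() or environ.get("HF_TOKEN", "").strip()
    return token or None


# Explicit value is used and stripped of surrounding whitespace.
explicit = resolve_hf_token("  hf_abc  ", {})
# Empty field falls back to the environment variable.
from_env = resolve_hf_token("", {"HF_TOKEN": "hf_env"})
# Nothing set anywhere: None instead of an exception.
unset = resolve_hf_token(None, {})
# Whitespace-only values count as unset.
blank = resolve_hf_token("   ", {"HF_TOKEN": "  "})

print(explicit, from_env, unset, blank)
```

The last two cases are the ones that currently fail in CI: with the stricter validator they raise, while under this sketch they resolve to None.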
