Don't fail import when the gated Llama-2 tokenizer is inaccessible (c4.py) by kylefoxaustin · Pull Request #10 · IST-DASLab/Quartet

kylefoxaustin · 2026-06-03T23:18:55Z

Problem

src/data/c4.py loads the gated meta-llama/Llama-2-7b-hf tokenizer at module-import time (as a top-level statement). Because src/data/utils.py imports c4, the load runs whenever any dataset is imported, so a user without gated access to that repo cannot train on any dataset (wikitext, shakespeare, etc.). They hit:

huggingface_hub.errors.GatedRepoError: 403 Client Error ...
OSError: You are trying to access a gated repo.
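The failure can be reproduced without the real repo or network access. The sketch below builds a hypothetical toy package (`toy_data`, with stand-in `c4.py` and `utils.py` modules that are not Quartet's actual code). In it, `c4.py` performs a failing load as a top-level statement, and importing `utils` alone is enough to trigger the error.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for src/data/c4.py: the load is a top-level
# statement, so it executes whenever the module is imported.
UNGUARDED_C4 = """\
def load_tokenizer(repo_id):
    raise OSError("You are trying to access a gated repo.")

tokenizer = load_tokenizer("meta-llama/Llama-2-7b-hf")
"""

# Hypothetical stand-in for src/data/utils.py: imports c4 unconditionally.
UTILS = "from . import c4\n"


def import_utils_with(c4_source: str) -> str:
    """Write the toy package to a temp dir and try to import toy_data.utils."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "toy_data"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "c4.py").write_text(c4_source)
        (pkg / "utils.py").write_text(UTILS)
        importlib.invalidate_caches()
        sys.path.insert(0, tmp)
        try:
            importlib.import_module("toy_data.utils")
            return "ok"
        except OSError as exc:
            return f"import failed: {exc}"
        finally:
            sys.path.remove(tmp)
            # Drop cached modules so a later call imports fresh copies.
            for name in [m for m in sys.modules if m.split(".")[0] == "toy_data"]:
                del sys.modules[name]


print(import_utils_with(UNGUARDED_C4))
# import failed: You are trying to access a gated repo.
```

Nothing in the toy `utils.py` mentions the tokenizer, yet its import fails. That is why selecting wikitext or shakespeare is not enough to avoid the error.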

Fix

Guard the module-level load with try/except so that importing c4 never fails. The c4 dataset still loads and uses the tokenizer when it is actually selected (a user training on c4 is expected to have access).
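A minimal sketch of the guard, assuming the load is wrapped in two small helpers. `try_load_tokenizer` and `require_tokenizer` are hypothetical names, not taken from the PR. The loader is injected so the sketch runs offline; in c4.py it would presumably be `transformers.AutoTokenizer.from_pretrained`. Catching `OSError` is also an assumption. It matches the final line of the traceback above, but the PR does not say which exceptions it catches.

```python
import warnings
from typing import Callable, Optional

REPO_ID = "meta-llama/Llama-2-7b-hf"

# A loader takes a repo id and returns a tokenizer (assumed interface).
Loader = Callable[[str], object]


def try_load_tokenizer(loader: Loader, repo_id: str = REPO_ID) -> Optional[object]:
    """Import-time load: return None instead of raising if access fails."""
    try:
        return loader(repo_id)
    except OSError as exc:  # a gated/inaccessible repo surfaces as OSError
        warnings.warn(
            f"Could not load tokenizer {repo_id!r} ({exc}); "
            "it is only required for the c4 dataset.",
            stacklevel=2,
        )
        return None


def require_tokenizer(cached: Optional[object], loader: Loader, repo_id: str = REPO_ID) -> object:
    """Call only when c4 is selected: reuse the cached tokenizer or load it
    now, letting any access error propagate to the user."""
    if cached is not None:
        return cached
    return loader(repo_id)


# Hypothetical usage inside c4.py:
#   TOKENIZER = try_load_tokenizer(AutoTokenizer.from_pretrained)
# and later, only on the c4 data-preparation path:
#   tok = require_tokenizer(TOKENIZER, AutoTokenizer.from_pretrained)
```

Retrying in `require_tokenizer` avoids failing on a `None` tokenizer. A user without access still gets the original gated-repo error, but only at the point where c4 is actually requested.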

Testing

--dataset wikitext now imports and trains with no access to meta-llama/Llama-2-7b-hf.
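A regression test along these lines could pin the behavior down. To run without transformers or the Hub, it uses a hypothetical toy package in place of src/data. The guarded `c4.py` stand-in and the `get_c4_data` / `get_dataset` names are assumptions for illustration only. The test checks both halves of the fix: a non-c4 dataset works without access, and selecting c4 still raises.

```python
import importlib
import sys
import tempfile
import unittest
from pathlib import Path

# Hypothetical stand-in for the patched src/data/c4.py: the top-level load is
# wrapped in try/except, and the c4 path retries (and may raise) on demand.
GUARDED_C4 = """\
REPO_ID = "meta-llama/Llama-2-7b-hf"

def load_tokenizer(repo_id):
    # Simulates a user without gated access.
    raise OSError("You are trying to access a gated repo.")

try:
    tokenizer = load_tokenizer(REPO_ID)
except OSError:
    tokenizer = None  # only the c4 dataset needs it

def get_c4_data():
    return tokenizer if tokenizer is not None else load_tokenizer(REPO_ID)
"""

# Hypothetical stand-in for src/data/utils.py: imports c4 unconditionally.
UTILS = """\
from . import c4

def get_dataset(name):
    if name == "c4":
        return c4.get_c4_data()
    return f"{name} data"
"""


class GatedTokenizerImportTest(unittest.TestCase):
    def setUp(self):
        self._tmp = tempfile.TemporaryDirectory()
        pkg = Path(self._tmp.name) / "toy_data"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "c4.py").write_text(GUARDED_C4)
        (pkg / "utils.py").write_text(UTILS)
        importlib.invalidate_caches()
        sys.path.insert(0, self._tmp.name)

    def tearDown(self):
        sys.path.remove(self._tmp.name)
        # Forget the toy modules so each test imports a fresh copy.
        for name in [m for m in sys.modules if m.split(".")[0] == "toy_data"]:
            del sys.modules[name]
        self._tmp.cleanup()

    def test_other_datasets_work_without_access(self):
        utils = importlib.import_module("toy_data.utils")
        self.assertEqual(utils.get_dataset("wikitext"), "wikitext data")

    def test_c4_still_requires_access(self):
        utils = importlib.import_module("toy_data.utils")
        with self.assertRaises(OSError):
            utils.get_dataset("c4")
```

Saved to a test file, this runs with `python -m unittest`.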



kylefoxaustin wants to merge 1 commit into IST-DASLab:main from kylefoxaustin:fix-c4-gated-tokenizer-import
