Version the training data with DVC #4

Merged: danielbusnz merged 1 commit into main from add-dvc-data-versioning on May 29, 2026
Conversation

@danielbusnz (Owner)

Tracks the data/ training pool with DVC. Git holds the *.jsonl.dvc pointers, and the data itself lives in a Cloudflare R2 remote (run dvc pull to fetch it). The frozen evals/holdout.jsonl stays committed directly so the benchmark reproduces without DVC. R2 credentials live in the gitignored .dvc/config.local; only the endpoint is committed. The README is updated with the pull instructions.
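As a rough illustration of the layout described above, the sketch below lists DVC-tracked training files whose pointer is in git but whose data has not been fetched yet. It assumes each `data/<name>.jsonl.dvc` pointer sits next to the file it tracks (DVC's default); the function name `missing_dvc_data` is hypothetical and not part of this PR.

```python
from pathlib import Path


def missing_dvc_data(data_dir: Path) -> list[Path]:
    """Return data files that have a .dvc pointer but are absent on disk.

    Assumption: pointers are named "<file>.jsonl.dvc" and live beside
    the file they track, as in the data/ layout this PR describes.
    """
    missing = []
    for pointer in sorted(data_dir.glob("*.jsonl.dvc")):
        # Dropping the trailing ".dvc" suffix yields the tracked data path.
        data_file = pointer.with_suffix("")
        if not data_file.exists():
            missing.append(data_file)
    return missing
```

Calling `missing_dvc_data(Path("data"))` after a fresh clone should return every tracked file; an empty list after `dvc pull` suggests the training pool is fully restored. The frozen evals/holdout.jsonl is not covered by this check, since it is committed directly.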

Track data/*.jsonl with DVC (git holds the .dvc pointers, data lives in an
R2 remote). The frozen evals/holdout.jsonl stays committed directly so the
benchmark reproduces without DVC. Credentials live in the gitignored
.dvc/config.local; only the endpoint is committed.
danielbusnz merged commit 4859ab0 into main on May 29, 2026 (3 checks passed)
danielbusnz deleted the add-dvc-data-versioning branch on May 29, 2026, 17:13