From-scratch pretraining of a 135M-parameter Transformer on 10B FineWeb-Edu tokens using a single NVIDIA L20 GPU, with a public checkpoint and lm-eval comparisons.
Updated Jun 27, 2026 - Python