A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust
-
Updated
Mar 19, 2025 - Python
A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust
An educational Python project for learning tokenization step by step by building character-level, byte-level, and BPE tokenizers from scratch.
Simple and easy to understand PyTorch implementation of Large Language Model (LLM) GPT and LLAMA from scratch with detailed steps. Implemented: Byte-Pair Tokenizer, Rotational Positional Embedding (RoPe), SwishGLU, RMSNorm, Mixture of Experts (MOE). Tested on Taylor Swift song lyrics dataset.
A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust
Byte Pair Encoding Implementation From Scratch (Rust)
A modular Byte Pair Encoding (BPE) tokenizer implemented in Python
This repository provides a comprehensive toolkit for Byte Pair Encoding (BPE), encompassing functionalities for vocabulary training, encoding, decoding, and vector training.
Implementation of Byte-Pair-Encoding (BPE) tokenization
Add a description, image, and links to the byte-pair-tokenizer topic page so that developers can more easily learn about it.
To associate your repository with the byte-pair-tokenizer topic, visit your repo's landing page and select "manage topics."