allura-org/dumbo

▓█████▄  █    ██  ███▄ ▄███▓ ▄▄▄▄    ▒█████     __QQ  
▒██▀ ██▌ ██  ▓██▒▓██▒▀█▀ ██▒▓█████▄ ▒██▒  ██▒  (_)_"> 
░██   █▌▓██  ▒██░▓██    ▓██░▒██▒ ▄██▒██░  ██▒ _)      
░▓█▄   ▌▓▓█  ░██░▒██    ▒██ ▒██░█▀  ▒██   ██░ version 0.0.1-rc
░▒████▓ ▒▒█████▓ ▒██▒   ░██▒░▓█  ▀█▓░ ████▓▒░ Research Preview, please steal
 ▒▒▓  ▒ ░▒▓▒ ▒ ▒ ░ ▒░   ░  ░░▒▓███▀▒░ ▒░▒░▒░  
 ░ ▒  ▒ ░░▒░ ░ ░ ░  ░      ░▒░▒   ░   ░ ▒ ▒░  
 ░ ░  ░  ░░░ ░ ░ ░      ░    ░    ░ ░ ░ ░ ▒   
   ░       ░            ░    ░          ░ ░   
============================================================================
dumbo is a modular-by-design machine learning program, originally created
for training llms.

USAGE
`dumbo config.yml` runs the training run described in config.yml. see
`examples/` for further (bad, vibe-configed) examples of configuration files.

LIMITATIONS / TODOS
- very unoptimized and some parts are untested (todo!!!)
    - needs flash_attn model patch(?)
    - needs cut cross entropy model patch
    - needs quantization/qlora patch
- single gpu only (for now)
- will only support fsdp even when multigpu is fully supported 
    (contributions welcome for deepspeed if you maintain it)
- no RL, online or offline (for now!)

MOTIVATION
dumbo was designed to absolve the sins of previous training harnesses
and frameworks:
- it is plugin-first and designed to be extensible
    - all default features are implemented as plugins and can be
      easily swapped out
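
to make "plugin-first" concrete, here is a minimal sketch of one way such a
design can work: a registry maps (kind, name) to a factory, and a config picks
which factory to use. every name below (`PLUGINS`, `register`, `build`, the
`optimizer` config key) is hypothetical and made up for illustration; this is
not dumbo's actual api or config schema.

```python
# hypothetical sketch: none of these names come from dumbo itself
from typing import Any, Callable, Dict

# registry: kind -> name -> factory
PLUGINS: Dict[str, Dict[str, Callable[..., Any]]] = {}


def register(kind: str, name: str):
    """decorator that files a factory under (kind, name)."""
    def wrap(factory: Callable[..., Any]) -> Callable[..., Any]:
        PLUGINS.setdefault(kind, {})[name] = factory
        return factory
    return wrap


# "default features" are just plugins that happen to be registered up front
@register("optimizer", "sgd")
def make_sgd(lr: float = 0.01) -> dict:
    return {"type": "sgd", "lr": lr}


@register("optimizer", "adamw")
def make_adamw(lr: float = 0.001) -> dict:
    return {"type": "adamw", "lr": lr}


def build(kind: str, name: str, **kwargs: Any) -> Any:
    """look up a plugin by kind and name, then call it with the given options."""
    try:
        factory = PLUGINS[kind][name]
    except KeyError:
        raise ValueError(f"no {kind} plugin named {name!r}") from None
    return factory(**kwargs)


# swapping a component means changing one name in the (assumed) config
config = {"optimizer": {"name": "adamw", "lr": 3e-4}}
opt_cfg = dict(config["optimizer"])
optimizer = build("optimizer", opt_cfg.pop("name"), **opt_cfg)
```

in a layout like this, replacing a default is a matter of registering another
factory under the same kind and naming it in the config; the core loop never
has to know which implementation it received.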

ACKNOWLEDGEMENTS
- allura <3
- moonshot ai for creating kimi, the model that oneshot half of this codebase
- anthropic for creating claude code, the harness that oneshot half of this
    codebase

About

a modular-by-design llm (and more!) trainer
