[NeurIPS 2025 Oral] Official Code for Exploring Diffusion Transformer Designs via Grafting
Updated Jan 9, 2026 - Jupyter Notebook
Run IBM Granite 4.0 locally on Raspberry Pi 5 with Ollama. This is a privacy-first AI: it runs 100% locally, so your data never leaves your device. There are no cloud uploads and no third-party tracking.
Mech-interp suite for Granite4 models that use the Mamba-2 architecture
Early baby steps towards a long-term vision regarding Mamba-2's state interpretability.
A simple, minimalistic, and explainable code implementation of Nemotron 3 Nano in JAX
Mechanistic interpretability for State-Space Models: SAEs, feature visualization, and a Hub registry for Mamba/Mamba-2.
Systematic study of LoRA fine-tuning strategies for IBM Granite 4.0-H-Micro (Mamba-2 + Transformer hybrid). Demonstrates the impact of architecture-aware target selection and SSM core parameter co-training, including analysis of PEFT serialization behavior. Reports up to 37% relative improvement over LoRA-only baselines.
A simple, minimalistic, and explainable JAX implementation of Mamba 2 & Mamba 3
Integrated mechanistic interpretability + sparse autoencoder framework for Hybrid SSM-Attention models (Mamba-2, Hymba, RWKV-7). v0.1.2 alpha: real forward-pass intervention + mean-ablation patching shipped, CPU smoke; GPU/real adapters in v0.2.