OpenMOSS presents a collection of our research on Large Language Models and Multimodal Foundation Models, supported by Shanghai Innovation Institute (SII), Fudan University, and MOSI.AI.
📌 This page is a curated overview. For the complete and most recent list of repositories, visit the OpenMOSS organization.
Last updated: 2026-05-27
Foundation language models and training infrastructure.
- MOSS GitHub ⭐ 12.1k
- An open-source tool-augmented conversational language model from Fudan University — the founding project of the OpenMOSS series.
- BandPO GitHub
- Probability-aware bounds for LLM reinforcement learning. Replaces canonical PPO/GRPO clipping with dynamic bounds to resolve exploration bottlenecks and prevent entropy collapse.
- CoLLiE GitHub
- A library for efficient collaborative training of large language models.
Multimodal models for visual understanding.
- MOSS-VL GitHub
- Core multimodal model series within the OpenMOSS ecosystem, dedicated to visual understanding. Includes the XRoPE architecture and a fully open training stack.
- AnyGPT GitHub ⭐ 880+
- Unified multimodal LLM with discrete sequence modeling.
End-to-end speech, audio, and music foundation models.
- MOSS-TTS-Nano GitHub ⭐ 3.2k
- A 0.1B-parameter open-source multilingual TTS model — runs in real time on CPU without a GPU, designed for local demos, web serving, and lightweight product integration.
- MOSS-TTS GitHub ⭐ 1.9k
- Open-source TTS family covering stable long-form speech, multi-speaker dialogue, voice/character design, environmental sound effects, and real-time streaming TTS.
- MOSS-TTSD GitHub ⭐ 1.3k
- Spoken dialogue generation model with expressive multi-speaker synthesis, long-context modeling, flexible speaker control, multilingual support, and zero-shot voice cloning.
- 🤗 HuggingFace: MOSS-TTSD-v0.5
- MOSS-Audio GitHub
- Open-source foundation model for unified audio understanding — speech, sound, music, captioning, QA, and reasoning in real-world scenarios.
- MOSS-Audio-Tokenizer GitHub
- Causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of audio, it supports streaming and variable bitrates and delivers SOTA reconstruction quality.
- MOSS-Speech GitHub
- A true speech-to-speech large language model without text guidance.
- MOSS-Music GitHub
- Music understanding model for captioning, lyrics ASR, structural analysis, chord/key/tempo reasoning, and long-form musical QA.
- SpeechGPT-2.0-preview GitHub
- GPT-4o-level, real-time spoken dialogue system.
Video understanding and synchronized video–audio generation.
- MOVA GitHub ⭐ 1k
- Towards scalable and synchronized video–audio generation.
- MOSS-Video-Preview GitHub
- A real-time video understanding foundation model built on Llama-3.2-Vision, with comprehensively extended video processing and multimodal reasoning capabilities.
Embodied AI: humanoid control, robotic manipulation, and embodied planning.
- FRoM-W1 GitHub
- Towards general humanoid whole-body control with language instructions (arXiv 2026). Supports Unitree H1/G1 and FFTAI humanoid robots.
- RoboOmni GitHub
- Proactive robot manipulation in omni-modal context.
- Embodied-Planner-R1 arXiv GitHub
- A reinforcement learning framework that enables LLMs to acquire embodied planning capabilities through autonomous exploration with sparse rewards.
- RoboJuDo GitHub
- Deployment framework for the FRoM-W1 humanoid project.
- VehicleWorld GitHub
- The first comprehensive multi-device environment for intelligent vehicle interaction, modeling the complex interconnected systems of modern cockpits.
Mechanistic interpretability of large language models.
- Llamascopium GitHub (formerly Language-Model-SAEs)
- A performant, fully distributed framework for training, analyzing, and visualizing Sparse Autoencoders (SAEs) and their frontier variants, enabling scalable and systematic mechanistic interpretability research.
- 🤗 HuggingFace: Llama Scope
- 🌐 Visualization: Llama Scope on Neuronpedia
- Lorsa GitHub
- Low-rank sparse attention for interpretability.
The Embodied AI Team empowers large models to execute real-world tasks, aiming to automate tedious chores and unlock superhuman intelligence through environmental interaction. We believe true AI emerges from engaging with the physical world.
- VLABench arXiv GitHub — ICCV 2025
- The first large-scale robot manipulation benchmark designed to fairly evaluate the multi-dimensional ability of general-purpose Vision-Language-Action models.
- Dual Preference Optimization for Embodied Task Planning arXiv GitHub — ACL 2025
- A unified learning framework that empowers embodied agents with stronger world modeling and embodied planning ability via dual preference optimization.
- World-Aware-Planning arXiv GitHub
- World-aware narrative enhancement that bridges high-level task instructions and nuanced real-world environment details.
- Embodied-Planner-R1 arXiv GitHub
- RL framework enabling LLMs to acquire embodied planning capabilities through autonomous exploration with sparse rewards.
- Awesome-WAM GitHub ⭐ 560+
- A curated, continuously updated collection of reading lists, paper blogs, and resources on World Action Models (WAMs) in embodied AI.
The SII-OpenMOSS New Architecture Team explores new architectures and paradigms of LLMs, particularly from the perspective of long-context capability and efficiency.
- ReAttention arXiv GitHub — ICLR 2025
- A training-free approach that enables LLMs to extrapolate to infinite context length with a finite attention scope.
- LongLLaDA arXiv GitHub — AAAI 2026
- The first systematic investigation comparing the long-context performance of diffusion LLMs and traditional autoregressive LLMs.
- RoPE++ GitHub — ICLR 2026
- Beyond Real: imaginary extension of Rotary Position Embeddings for long-context LLMs.
- Sparse-dLLM GitHub
- Sparse diffusion-based large language models.
- FourierAttention arXiv
- A training-free framework that exploits the heterogeneous roles of Transformer head dimensions.
- Thus Spake Long-Context LLM arXiv GitHub
- A survey on the lifecycle of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation.
- GAOKAO-MM GitHub — ACL 2024 Findings
- A Chinese human-level benchmark for multimodal model evaluation.
- HalluQA GitHub
- Dataset and evaluation script for measuring hallucination in Chinese large language models.
- Say-I-Don't-Know GitHub — ICML 2024
- Can AI assistants know what they don't know?
- LongSafety GitHub
- Safety evaluation for long-context LLMs.
- UnifiedToolHub GitHub
- A comprehensive project supporting LLM-based tool use: it unifies dataset formats and provides training, annotation, and evaluation functionality.
- ABC-Bench GitHub
- A benchmark for agentic backend coding — evaluates whether code agents can explore repos, edit code, configure environments, deploy services, and pass external end-to-end API tests.
For collaborations, internships, or general inquiries: openmoss@sii.edu.cn