OpenMOSS

OpenMOSS presents a collection of our research on Large Language Models and Multimodal Foundation Models, supported by Shanghai Innovation Institute (SII), Fudan University, and MOSI.AI.

📌 This page is a curated overview. For the complete and most recent list of repositories, visit the OpenMOSS organization.

Last updated: 2026-05-27

Projects

MOSS-LLM

Foundation language models and training infrastructure.

MOSS GitHub ⭐ 12.1k
- An open-source tool-augmented conversational language model from Fudan University — the founding project of the OpenMOSS series.
BandPO GitHub
- Probability-aware bounds for LLM reinforcement learning. Replaces canonical PPO/GRPO clipping with dynamic bounds to resolve exploration bottlenecks and prevent entropy collapse.
CoLLiE GitHub
- A library for collaborative training of large language models in an efficient way.

MOSS-VL

Multimodal models for visual understanding.

MOSS-VL GitHub
- Core multimodal model series within the OpenMOSS ecosystem, dedicated to visual understanding. Includes the XRoPE architecture and a fully open training stack.
AnyGPT GitHub ⭐ 880+
- Unified multimodal LLM with discrete sequence modeling.

MOSS-Audio

End-to-end speech, audio, and music foundation models.

MOSS-TTS-Nano GitHub ⭐ 3.2k
- A 0.1B-parameter open-source multilingual TTS model — runs in real time on CPU without a GPU, designed for local demos, web serving, and lightweight product integration.
MOSS-TTS GitHub ⭐ 1.9k
- Open-source TTS family covering stable long-form speech, multi-speaker dialogue, voice/character design, environmental sound effects, and real-time streaming TTS.
MOSS-TTSD GitHub ⭐ 1.3k
- Spoken dialogue generation model with expressive multi-speaker synthesis, long-context modeling, flexible speaker control, multilingual support, and zero-shot voice cloning.
- 🤗 HuggingFace: MOSS-TTSD-v0.5
MOSS-Audio GitHub
- Open-source foundation model for unified audio understanding — speech, sound, music, captioning, QA, and reasoning in real-world scenarios.
MOSS-Audio-Tokenizer GitHub
- Causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of audio, it supports streaming and variable bitrates and delivers state-of-the-art (SOTA) reconstruction quality.
MOSS-Speech GitHub
- A true speech-to-speech large language model without text guidance.
MOSS-Music GitHub
- Music understanding model for captioning, lyrics ASR, structural analysis, chord/key/tempo reasoning, and long-form musical QA.
SpeechGPT-2.0-preview GitHub
- GPT-4o-level, real-time spoken dialogue system.

MOSS-Video

Video understanding and synchronized video–audio generation.

MOVA GitHub ⭐ 1k
- Towards scalable and synchronized video–audio generation.
MOSS-Video-Preview GitHub
- A real-time video understanding foundation model built on Llama-3.2-Vision, with comprehensively extended video processing and multimodal reasoning capabilities.

MOSS-Robot

Embodied AI: humanoid control, robotic manipulation, and embodied planning.

FRoM-W1 GitHub
- Towards general humanoid whole-body control with language instructions (arXiv 2026). Supports Unitree H1/G1 and FFTAI humanoid robots.
RoboOmni GitHub
- Proactive robot manipulation in omni-modal context.
Embodied-Planner-R1 arXiv GitHub
- A reinforcement learning framework that enables LLMs to acquire embodied planning capabilities through autonomous exploration with sparse rewards.
RoboJuDo GitHub
- Deployment framework for the FRoM-W1 humanoid project.
VehicleWorld GitHub
- First comprehensive multi-device environment for intelligent vehicle interaction, modeling complex interconnected systems in modern cockpits.

MOSS-Aiology

Mechanistic interpretability of large language models.

Llamascopium GitHub (formerly Language-Model-SAEs)
- A performant, fully-distributed framework for training, analyzing, and visualizing Sparse Autoencoders (SAEs) and frontier variants, empowering scalable and systematic mechanistic interpretability research.
- 🤗 HuggingFace: Llama Scope
- 🌐 Visualization: Llama Scope on Neuronpedia
Lorsa GitHub
- Low-rank sparse attention for interpretability.

Research

Embodied-AI

The Embodied AI Team empowers large models to execute real-world tasks, aiming to automate tedious chores and unlock superhuman intelligence through environmental interaction. We believe true AI emerges from engaging with the physical world.

VLABench arXiv GitHub — ICCV 2025
- The first large-scale robot manipulation benchmark designed to fairly evaluate the multi-dimensional abilities of general-purpose Vision-Language-Action models.
Dual Preference Optimization for Embodied Task Planning arXiv GitHub — ACL 2025
- A unified learning framework that empowers embodied agents with stronger world modeling and embodied planning ability via dual preference optimization.
World-Aware-Planning arXiv GitHub
- World-aware narrative enhancement bridging high-level task instructions and nuanced real-world environment details.
Embodied-Planner-R1 arXiv GitHub
- RL framework enabling LLMs to acquire embodied planning capabilities through autonomous exploration with sparse rewards.
Awesome-WAM GitHub ⭐ 560+
- A curated, continuously updated reading list, paper blogs, and resources for World Action Models in embodied AI.

NewArch

The SII-OpenMOSS New Architecture Team explores new architectures and paradigms of LLMs, particularly from the perspective of long-context capability and efficiency.

ReAttention arXiv GitHub — ICLR 2025
- Training-free approach that enables LLMs to support infinite context length extrapolation with finite attention scope.
LongLLaDA arXiv GitHub — AAAI 2026
- First systematic investigation comparing long-context performance of diffusion LLMs and traditional auto-regressive LLMs.
RoPE++ GitHub — ICLR 2026
- Beyond Real: imaginary extension of Rotary Position Embeddings for long-context LLMs.
Sparse-dLLM GitHub
- Sparse diffusion-based large language models.
FourierAttention arXiv
- Training-free framework that exploits the heterogeneous roles of transformer head dimensions.
Thus Spake Long-Context LLM arXiv GitHub
- A survey on the lifecycle of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation.

Multimodal Evaluation

GAOKAO-MM GitHub — ACL 2024 Findings
- A Chinese human-level benchmark for multimodal model evaluation.

Alignment & Safety

HalluQA GitHub
- Dataset and evaluation script for evaluating hallucinations in Chinese large language models.
Say-I-Don't-Know GitHub — ICML 2024
- Can AI assistants know what they don't know?
LongSafety GitHub
- Safety evaluation for long-context LLMs.

Tool Use & Agents

UnifiedToolHub GitHub
- A comprehensive project supporting LLM-based tool use — unifies dataset formats and provides training, annotation, and evaluation functionalities.
ABC-Bench GitHub
- A benchmark for agentic backend coding — evaluates whether code agents can explore repos, edit code, configure environments, deploy services, and pass external end-to-end API tests.

Contact

For collaborations, internships, or general inquiries: openmoss@sii.edu.cn
