OpenMOSS

OpenMOSS presents a collection of our research on Large Language Models and Multimodal Foundation Models, supported by Shanghai Innovation Institute (SII), Fudan University, and MOSI.AI.

📌 This page is a curated overview. For the complete and most recent list of repositories, visit the OpenMOSS organization.

Last updated: 2026-05-27


Projects

MOSS-LLM

Foundation language models and training infrastructure.

  • MOSS GitHub ⭐ 12.1k
    • An open-source tool-augmented conversational language model from Fudan University — the founding project of the OpenMOSS series.
  • BandPO GitHub
    • Probability-aware bounds for LLM reinforcement learning. Replaces canonical PPO/GRPO clipping with dynamic bounds to resolve exploration bottlenecks and prevent entropy collapse.
  • CoLLiE GitHub
    • A library for efficient collaborative training of large language models.

MOSS-VL

Multimodal models for visual understanding.

  • MOSS-VL GitHub
    • Core multimodal model series within the OpenMOSS ecosystem, dedicated to visual understanding. Includes the XRoPE architecture and a fully open training stack.
  • AnyGPT GitHub ⭐ 880+
    • Unified multimodal LLM with discrete sequence modeling.

MOSS-Audio

End-to-end speech, audio, and music foundation models.

  • MOSS-TTS-Nano GitHub ⭐ 3.2k
    • A 0.1B-parameter open-source multilingual TTS model — runs in real time on CPU without a GPU, designed for local demos, web serving, and lightweight product integration.
  • MOSS-TTS GitHub ⭐ 1.9k
    • Open-source TTS family covering stable long-form speech, multi-speaker dialogue, voice/character design, environmental sound effects, and real-time streaming TTS.
  • MOSS-TTSD GitHub ⭐ 1.3k
    • Spoken dialogue generation model with expressive multi-speaker synthesis, long-context modeling, flexible speaker control, multilingual support, and zero-shot voice cloning.
    • 🤗 HuggingFace: MOSS-TTSD-v0.5
  • MOSS-Audio GitHub
    • Open-source foundation model for unified audio understanding — speech, sound, music, captioning, QA, and reasoning in real-world scenarios.
  • MOSS-Audio-Tokenizer GitHub
    • Causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of audio; supports streaming and variable bitrates and delivers SOTA reconstruction quality.
  • MOSS-Speech GitHub
    • A true speech-to-speech large language model without text guidance.
  • MOSS-Music GitHub
    • Music understanding model for captioning, lyrics ASR, structural analysis, chord/key/tempo reasoning, and long-form musical QA.
  • SpeechGPT-2.0-preview GitHub
    • GPT-4o-level, real-time spoken dialogue system.

MOSS-Video

Video understanding and synchronized video–audio generation.

  • MOVA GitHub ⭐ 1k
    • Towards scalable and synchronized video–audio generation.
  • MOSS-Video-Preview GitHub
    • A real-time video understanding foundation model built on Llama-3.2-Vision, with comprehensively extended video processing and multimodal reasoning capabilities.

MOSS-Robot

Embodied AI: humanoid control, robotic manipulation, and embodied planning.

  • FRoM-W1 GitHub
    • Towards general humanoid whole-body control with language instructions (arXiv 2026). Supports Unitree H1/G1 and FFTAI humanoid robots.
  • RoboOmni GitHub
    • Proactive robot manipulation in omni-modal context.
  • Embodied-Planner-R1 arXiv GitHub
    • A reinforcement learning framework that enables LLMs to acquire embodied planning capabilities through autonomous exploration with sparse rewards.
  • RoboJuDo GitHub
    • Deployment framework for the FRoM-W1 humanoid project.
  • VehicleWorld GitHub
    • The first comprehensive multi-device environment for intelligent vehicle interaction, modeling the complex interconnected systems of modern cockpits.

MOSS-Aiology

Mechanistic interpretability of large language models.

  • Llamascopium GitHub (formerly Language-Model-SAEs)
    • A performant, fully-distributed framework for training, analyzing, and visualizing Sparse Autoencoders (SAEs) and frontier variants, empowering scalable and systematic mechanistic interpretability research.
    • 🤗 HuggingFace: Llama Scope
    • 🌐 Visualization: Llama Scope on Neuronpedia
  • Lorsa GitHub
    • Low-rank sparse attention for interpretability.

Research

Embodied-AI

The Embodied AI Team empowers large models to execute real-world tasks, aiming to automate tedious chores and unlock superhuman intelligence through environmental interaction. We believe true AI emerges from engaging with the physical world.

  • VLABench arXiv GitHub — ICCV 2025
    • The first large-scale robot manipulation benchmark designed to fairly evaluate the multi-dimensional ability of general-purpose Vision-Language-Action models.
  • Dual Preference Optimization for Embodied Task Planning arXiv GitHub — ACL 2025
    • A unified learning framework that empowers embodied agents with stronger world modeling and embodied planning ability via dual preference optimization.
  • World-Aware-Planning arXiv GitHub
    • World-aware narrative enhancement that bridges high-level task instructions and fine-grained real-world environment details.
  • Embodied-Planner-R1 arXiv GitHub
    • RL framework enabling LLMs to acquire embodied planning capabilities through autonomous exploration with sparse rewards.
  • Awesome-WAM GitHub ⭐ 560+
    • A curated, continuously updated collection of reading lists, paper blogs, and resources on World Action Models (WAM) in embodied AI.

NewArch

The SII-OpenMOSS New Architecture Team explores new architectures and paradigms of LLMs, particularly from the perspective of long-context capability and efficiency.

  • ReAttention arXiv GitHub — ICLR 2025
    • A training-free approach that enables LLMs to extrapolate to infinite context length with a finite attention scope.
  • LongLLaDA arXiv GitHub — AAAI 2026
    • The first systematic investigation comparing the long-context performance of diffusion LLMs and traditional autoregressive LLMs.
  • RoPE++ GitHub — ICLR 2026
    • Beyond Real: imaginary extension of Rotary Position Embeddings for long-context LLMs.
  • Sparse-dLLM GitHub
    • Sparse diffusion-based large language models.
  • FourierAttention arXiv
    • Training-free framework that exploits the heterogeneous roles of transformer head dimensions.
  • Thus Spake Long-Context LLM arXiv GitHub
    • A survey on the lifecycle of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation.

Multimodal Evaluation

  • GAOKAO-MM GitHub — ACL 2024 Findings
    • A Chinese human-level benchmark for multimodal model evaluation.

Alignment & Safety

  • HalluQA GitHub
    • Dataset and evaluation script for evaluating hallucinations in Chinese large language models.
  • Say-I-Don't-Know GitHub — ICML 2024
    • Can AI assistants know what they don't know?
  • LongSafety GitHub
    • Safety evaluation for long-context LLMs.

Tool Use & Agents

  • UnifiedToolHub GitHub
    • A comprehensive project supporting LLM-based tool use — unifies dataset formats and provides training, annotation, and evaluation functionalities.
  • ABC-Bench GitHub
    • A benchmark for agentic backend coding — evaluates whether code agents can explore repos, edit code, configure environments, deploy services, and pass external end-to-end API tests.

Contact

For collaborations, internships, or general inquiries: openmoss@sii.edu.cn
