embodied.cpp

Embodied.cpp is an inference runtime for embodied AI models — Vision-Language-Action (VLA) and World-Action Models (WAM) that let robots perceive and act in the real world. It runs these models efficiently on heterogeneous hardware (CPU / CUDA GPU / NPU) using GGUF weights, and ships ready-to-use servers and evaluation clients.

Supported Models

VLA Models

VLA models take sensor images and language instructions as input and output robot action commands directly.

Model	Status	What it does
pi0.5	✅	PaliGemma-based policy — great starting point for experimentation
HY-VLA	✅	Hunyuan dual-tower vision-language model with action head; supports RoboTwin dual-arm evaluation
StarVLA	🚧	Modular backbone with swappable action heads (coming soon)
OpenVLA	🚧	7B open VLA with Llama 2 + DINOv2/SigLIP features (coming soon)
Qwen-VLA	🚧	Qwen3.5-4B backbone with DiT flow-matching action decoder (coming soon)

World-Action Models

World-Action models predict future video or latent trajectories as part of planning actions — they reason about "what happens next" before deciding what to do.

Model	Status	What it does
LingBot-VA	✅	Video-action model with VAE bridge — evaluated on LIBERO task suites
DreamZero	🚧	14B video-diffusion world model for zero-shot policies (coming soon)
UnifoLM-WMA-0	🚧	Multi-embodiment robot learning with world-model action head (coming soon)
Being-H0.7	🚧	Latent world-action model with future-aware reasoning (coming soon)
FastWAM	🚧	Fast WAM that skips test-time future generation for speed (coming soon)

Quick Start

1. Prepare dependencies

# Clone the repo and fetch third-party code
git clone <repo-url> && cd embodied.cpp
./patches/init_third_party.sh

2. Build

CPU-only:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --target vla-server lingbot-world-server -j$(nproc)

CUDA GPU:

cmake -S . -B build \
  -DCMAKE_BUILD_TYPE=Release \
  -DGGML_CUDA=ON \
  -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
  -DCMAKE_CUDA_ARCHITECTURES=<your-arch> \
  -DProtobuf_PROTOC_EXECUTABLE=/usr/bin/protoc
cmake --build build --target lingbot-world-server -j$(nproc)
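If you are unsure what to pass for `<your-arch>`, one option is to ask the driver. The sketch below assumes a recent NVIDIA driver whose `nvidia-smi` supports the `compute_cap` query field. The helper names are illustrative and are not part of this repository.

```python
import subprocess


def cap_to_cmake_arch(compute_cap: str) -> str:
    """Convert a compute capability such as '8.6' into the CMake form '86'."""
    major, minor = compute_cap.strip().split(".")
    return f"{int(major)}{int(minor)}"


def detect_cuda_architectures() -> str:
    """Return a semicolon-separated architecture list for all visible GPUs.

    Assumption: `nvidia-smi --query-gpu=compute_cap` is available, which is
    only the case on reasonably recent drivers.
    """
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    # One line per GPU; de-duplicate so identical cards are listed once.
    archs = {cap_to_cmake_arch(line) for line in out.splitlines() if line.strip()}
    return ";".join(sorted(archs, key=int))


# Example (requires an NVIDIA GPU and driver):
#   print(f"-DCMAKE_CUDA_ARCHITECTURES={detect_cuda_architectures()}")
```

When more than one architecture is returned, quote the value on the shell command line so the semicolon is not treated as a command separator.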

3. Start a server

# VLA server (pi0.5, HY-VLA); the mmproj path in brackets is optional
./build/vla-server --model <path-to-gguf> [<path-to-mmproj>]

# LingBot world-action server
./build/lingbot-world-server --model <path-to-gguf>

4. Evaluate in simulation (example: LIBERO with LingBot-VA)

# Install the LIBERO runtime once
bash eval/sim/libero/setup_libero.sh

# Run a test episode
eval/sim/libero/libero_uv/.venv/bin/python eval/client/run_sim_client_direct.py \
  --arch lingbot_va \
  --libero-suite object \
  --task-id 0 \
  --n-episodes 1 \
  --tokenizer /path/to/lingbot-va-tokenizer \
  --vla-addr tcp://localhost:5555
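To sweep several tasks of a suite, the invocation above can be wrapped in a small loop. This is a minimal sketch that reuses only the flags shown above. The function names and the choice of Python interpreter are assumptions, and the only thing it checks is the client's process exit status.

```python
import subprocess
import sys

# Client script path as used in the Quick Start command.
CLIENT = "eval/client/run_sim_client_direct.py"


def build_client_cmd(suite, task_id, tokenizer, addr="tcp://localhost:5555",
                     n_episodes=1, python=sys.executable):
    """Assemble the argument list for one LingBot-VA LIBERO client run."""
    return [
        python, CLIENT,
        "--arch", "lingbot_va",
        "--libero-suite", suite,
        "--task-id", str(task_id),
        "--n-episodes", str(n_episodes),
        "--tokenizer", tokenizer,
        "--vla-addr", addr,
    ]


def run_tasks(suite, task_ids, tokenizer, **kwargs):
    """Run tasks one after another; return IDs whose client exited non-zero."""
    failed = []
    for task_id in task_ids:
        result = subprocess.run(build_client_cmd(suite, task_id, tokenizer, **kwargs))
        if result.returncode != 0:
            failed.append(task_id)
    return failed
```

For example, `run_tasks("object", range(10), "/path/to/lingbot-va-tokenizer", python="eval/sim/libero/libero_uv/.venv/bin/python")` would run every task in the object suite with the LIBERO virtual environment's interpreter, assuming the server from step 3 is already running.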

Run a Server

Executable	What it serves
`./build/vla-server`	VLA models — takes observations + text, outputs robot action chunks
`./build/lingbot-world-server`	LingBot-VA world-action model — video-conditioned future-aware planning
`./build/hy-vla-direct-debug`	Debug HY-VLA in-process (no server)

Run any of these executables with --help to see its model, checkpoint, and quantization options.
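Evaluation clients reach a running server through an address such as tcp://localhost:5555 (the value used in the Quick Start). When scripting a launch, it can help to wait until that port accepts connections before starting a client. The sketch below assumes the server listens on a plain TCP endpoint. It only checks that the port is open and does not speak the ZeroMQ/Protobuf protocol, and the function names are hypothetical.

```python
import socket
import time
from urllib.parse import urlsplit


def parse_tcp_addr(addr):
    """Split 'tcp://host:port' into (host, port); reject anything else."""
    parts = urlsplit(addr)
    if parts.scheme != "tcp" or parts.hostname is None or parts.port is None:
        raise ValueError(f"expected tcp://host:port, got {addr!r}")
    return parts.hostname, parts.port


def wait_for_port(addr, timeout_s=30.0, interval_s=0.5):
    """Poll until a TCP connection to addr succeeds or the timeout expires."""
    host, port = parse_tcp_addr(addr)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection; success means "listening".
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)
    return False
```

A launcher script could call `wait_for_port("tcp://localhost:5555")` after starting the server process and abort if it returns `False`.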

Evaluate in Simulation

LIBERO

LIBERO tests robotic manipulation skills on four task suites: spatial, object, goal, and 10. A fifth suite, long (90 tasks), is also available. Each --libero-suite value maps to a LIBERO suite name as follows:

--libero-suite spatial  → libero_spatial
--libero-suite object   → libero_object
--libero-suite goal     → libero_goal
--libero-suite 10       → libero_10
--libero-suite long     → libero_90

Use --task-id 0..9 (or 0..89 for long) to pick individual tasks.
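The mapping and task-ID ranges above can be captured in a small lookup, which is handy for validating arguments before launching a long evaluation. This is a sketch of that bookkeeping only. The names `LIBERO_SUITES` and `resolve_suite` are illustrative and do not come from the client code.

```python
# --libero-suite value -> (LIBERO suite name, number of tasks),
# taken from the mapping and task-ID ranges listed above.
LIBERO_SUITES = {
    "spatial": ("libero_spatial", 10),
    "object": ("libero_object", 10),
    "goal": ("libero_goal", 10),
    "10": ("libero_10", 10),
    "long": ("libero_90", 90),
}


def resolve_suite(flag, task_id):
    """Return the LIBERO suite name, or raise if the flag or task ID is invalid."""
    if flag not in LIBERO_SUITES:
        raise ValueError(f"unknown suite {flag!r}; expected one of {sorted(LIBERO_SUITES)}")
    name, n_tasks = LIBERO_SUITES[flag]
    if not 0 <= task_id < n_tasks:
        raise ValueError(f"task id {task_id} out of range 0..{n_tasks - 1} for {flag!r}")
    return name
```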

RoboTwin

RoboTwin is a dual-arm robot benchmark with real-world-style manipulation tasks. Run HY-VLA natively in C++:

bash eval/sim/robotwin/setup_robotwin.sh   # one-time setup

# GGML_CUDA_DISABLE_GRAPHS=1 turns off ggml's CUDA graph execution for this run
GGML_CUDA_DISABLE_GRAPHS=1 \
eval/sim/robotwin/robotwin_uv/.venv/bin/python \
  eval/client/run_robotwin_native_hy_vla.py \
  --model <path-to-gguf> \
  --task-name place_empty_cup \
  --episodes 1

See eval/sim/robotwin/README.md for detailed setup modes and troubleshooting.

Convert Your Own Model

GGUF conversion scripts are in scripts/:

Script	Converts
`convert_pi05_to_gguf.py`	pi0.5 model weights
`convert_pi05_mmproj_to_gguf.py`	pi0.5 multimodal projector
`convert_hy_vla_to_gguf.py`	HY-VLA combined vision+action
`convert_lingbot_va_to_gguf.py`	LingBot-VA transformer + companion GGUFs

Quantization helpers:

Script	Quantizes
`quantize_hy_vla_gguf.py`	HY-VLA models
`quantize_lingbot_wan_gguf.py`	LingBot-VA models

Project Structure

What lives where, in plain language:

Directory	What it contains
`models/`	C++ model implementations (pi0.5, HY-VLA, LingBot-VA)
`runtime/`	Model registry, architecture detection, shared utilities
`adapter/`	I/O boundary — translates sensor/simulator data into typed inputs the models understand
`serving/`	Server code (ZeroMQ/Protobuf) for VLA and LingBot APIs
`kernels/`	Custom CUDA kernels (used when building with GPU support)
`scripts/`	GGUF conversion, quantization, and evaluation helpers
`tools/`	Local debug utilities
`patches/`	Third-party code patches applied during setup
`eval/`	Evaluation clients and simulation setups (LIBERO, RoboTwin)

License

This project is released under the Apache License 2.0. Third-party dependencies, model checkpoints, datasets, and upstream reference implementations are distributed under their own licenses.

Acknowledgements

Foundational projects this build depends on:

llama.cpp (LLM inference engine)
vla.cpp (unified VLA runtime)
LIBERO (manipulation benchmark)
RoboTwin (dual-arm robot benchmark)
