LegoNav

LegoNav is a modular vision-language navigation framework designed like Lego bricks: each module (S1, S2) is independently replaceable and freely composable.

S2 (High-level reasoning): Any vision-language model — local GPU or cloud API.
S1 (Low-level motion control): Any navigation policy — NavDP, GNM, ViNT, NoMaD, DD-PPO, iPlanner, ViPlanner, …

Plug in any combination of the two and run it on Jetson edge hardware.

🏗️ System Architecture

Modular Design

LegoNav decouples high-level "where to go" (S2) from low-level "how to move" (S1).

graph TB
    subgraph S2["S2 — Visual Language Reasoning"]
        direction TB
        VLM["VLM Backend\n─────────────\nlocal: Qwen2.5-VL / Qwen3-VL\napi: GPT-4o / Gemini / Kimi / …"]
        S2Srv["S2 HTTP Server :8890\nPOST /s2_step"]
        VLM --> S2Srv
    end

    subgraph Jetson["Jetson Edge — S1 Navigation Policy"]
        direction TB
        CAM["RGB-D Camera"]
        ROS["ROS2 Node\nros_client.py"]
        PIPE["LegoNavPipeline\npipeline.py"]
        S1["S1 Client\nNavDP / GNM / ViNT / NoMaD\nDD-PPO / iPlanner / ViPlanner"]
        CTRL["MPC + PID Controller"]
        ROBOT["Mobile Robot"]

        CAM --> ROS
        ROS --> PIPE
        PIPE --> S1
        S1 --> CTRL
        CTRL -->|"v, ω"| ROBOT
    end

    USER["Language Instruction\n'Go to the black chair'"]
    USER --> ROS
    PIPE <-->|"HTTP REST (LAN)"| S2Srv

Navigation Loop

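LegoNav tracks goals in world coordinates. The pixel goal returned by S2 is projected to a 3D world point once, using depth and odometry, and is then re-expressed in the camera frame on every frame. The goal therefore stays geometrically consistent as the robot moves, without re-querying the VLM.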

sequenceDiagram
    participant Cam as RGB-D Camera
    participant ROS as ROS2 Node (+ Odometry)
    participant S2 as S2 Server (VLM)
    participant Pipe as LegoNavPipeline
    participant S1 as S1 Policy
    participant Robot as Robot Actuator

    ROS->>S2: POST /s2_step {image, instruction}
    S2-->>ROS: Structured task list (JSON) — pixel goal / rotation / stop

    loop Per Frame
        Cam->>ROS: RGB-D frame + odom [x, y, yaw]
        ROS->>Pipe: step(rgb, depth, odom)

        alt First step of pixel_point task
            Pipe->>Pipe: pixel + depth + odom → world_target [wx, wy, wz]
            Note over Pipe: 3D anchor locked once, never re-identified
        end

        Pipe->>Pipe: world_target + odom → camera_goal [x_fwd, y_left, z_up]
        Pipe->>S1: pointgoal_step(camera_goal, rgb, depth)
        S1-->>Pipe: trajectory (B, T, 3)
        Pipe-->>ROS: {mode, trajectory, camera_goal, …}
        ROS->>Robot: MPC / PID execution
    end
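
The two coordinate transforms in the loop above can be sketched in a few lines of Python. This is an illustrative sketch, not the code in pipeline.py: the function names are hypothetical, the camera is assumed to sit level at the robot origin, odometry is treated as planar [x, y, yaw], and the intrinsics used in the usage lines are made-up values rather than the contents of GEMINI_336L_INTRINSIC.

```python
import math


def pixel_to_camera_goal(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into [x_fwd, y_left, z_up]."""
    x_right = (u - cx) * depth_m / fx  # optical frame: +x points right
    y_down = (v - cy) * depth_m / fy   # optical frame: +y points down
    return (depth_m, -x_right, -y_down)


def camera_to_world(goal, odom):
    """Lock the goal as a world-frame anchor using the current odom [x, y, yaw]."""
    x_fwd, y_left, z_up = goal
    x, y, yaw = odom
    c, s = math.cos(yaw), math.sin(yaw)
    return (x + c * x_fwd - s * y_left, y + s * x_fwd + c * y_left, z_up)


def world_to_camera(world_target, odom):
    """Re-express the locked world anchor in the current camera frame."""
    wx, wy, wz = world_target
    x, y, yaw = odom
    dx, dy = wx - x, wy - y
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * dx + s * dy, -s * dx + c * dy, wz)


# Hypothetical intrinsics for a 1280x720 image (assumed values).
FX, FY, CX, CY = 600.0, 600.0, 640.0, 360.0

# First step of a pixel_point task: lock the anchor once.
goal = pixel_to_camera_goal(640, 360, 2.0, FX, FY, CX, CY)
world_target = camera_to_world(goal, (1.0, 1.0, math.pi / 2))

# Every later frame: only odometry changes, the anchor does not.
camera_goal = world_to_camera(world_target, (1.0, 2.0, math.pi / 2))
```

In the usage lines the robot drives one metre toward a target that was two metres ahead, so the recomputed camera goal ends up about one metre ahead even though the VLM was never asked again.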

Deployment Modes

graph LR
    subgraph ModeA["Mode A: Local S1 (Recommended)"]
        J1["Jetson"] -- "local inference" --> N1["NavDP (--local_s1)"]
        J1 <-- "HTTP :8890" --> G1["GPU Server (S2)"]
    end

    subgraph ModeB["Mode B: Remote S1"]
        J2["Jetson"] <-- "HTTP :8901" --> N2["S1 Policy Server"]
        J2 <-- "HTTP :8890" --> G2["GPU Server (S2)"]
    end

    subgraph ModeC["Mode C: S2 API (No GPU Required)"]
        PC["Any Machine"] -- "runs S2 server" --> SRV["S2 Server :8890"]
        SRV <-- "HTTPS" --> API["Cloud VLM API\nOpenAI / Gemini / Kimi / Qwen"]
    end

📂 Project Structure

LegoNav/
├── legonav/
│   ├── server/       # S2: VLM HTTP server (port 8890)
│   ├── clients/      # S1: Policy clients (NavDP, GNM, ViNT, etc.)
│   ├── core/         # Orchestration (LegoNavPipeline)
│   ├── robot/        # ROS2 node & MPC/PID controllers
│   └── utils/        # Shared utilities
├── scripts/          # Launch scripts for Jetson & S2
├── tests/            # Connectivity & pipeline tests
└── requirements_*.txt # Environment-specific dependencies

💻 Requirements

Hardware

Component	Minimum	Recommended
S2 GPU (local)	16 GB VRAM	24+ GB VRAM
S1 Edge	Jetson Orin NX 8 GB	Jetson Orin NX 16 GB
Camera	Astra S (640×480)	Gemini 336L (1280×720)

Software

Python 3.10+, PyTorch 2.1+, ROS2 Humble, CUDA 11.8+.

🛠️ Installation

S2 Server (GPU Machine)

conda create -n legonav_s2 python=3.10 && conda activate legonav_s2
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -e . && pip install -r requirements_server.txt

S1 Edge (Jetson)

conda create -n legonav_s1 python=3.10 && conda activate legonav_s1
# Clone NavDP as a sibling: git clone https://github.com/InternRobotics/NavDP
# Install prebuilt torchvision wheel for aarch64 if available
pip install -e . && pip install -r requirements_jetson.txt
sudo apt install ros-humble-cv-bridge ros-humble-message-filters

⚡ Quick Start

1. Start S2 Server

# Local GPU (Qwen2.5-VL)
python -m legonav.server.s2_server --model_path /path/to/Qwen2.5-VL-7B-Instruct

# Cloud API (GPT-4o)
OPENAI_API_KEY=sk-xxx python -m legonav.server.s2_server --backend api --provider openai --model_path gpt-4o

2. Test S2 Connectivity

python tests/test_s2_client.py --host 127.0.0.1 --port 8890 --random --instruction "Go to the chair"
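
For a manual check without the test script, a request to the S2 endpoint could be assembled roughly as below. Only the endpoint (POST /s2_step on port 8890) and the two field names (image, instruction) come from this README. The base64 encoding of the image and the helper name build_s2_request are assumptions, so check tests/test_s2_client.py for the actual wire format.

```python
import base64
import json
import urllib.request


def build_s2_request(image_bytes, instruction, host="127.0.0.1", port=8890):
    """Build (but do not send) a POST /s2_step request.

    Assumption: the server accepts JSON with a base64-encoded image.
    """
    payload = {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "instruction": instruction,
    }
    return urllib.request.Request(
        f"http://{host}:{port}/s2_step",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_s2_request(b"\x00\x01", "Go to the chair")

# Sending it requires a running S2 server:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     tasks = json.loads(resp.read())
```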

3. Pipeline Test (S2 Only)

python -m legonav.core.pipeline \
    --s2_host 127.0.0.1 --s2_port 8890 \
    --random --skip_s1 \
    --instruction "Turn left, go to the door"

4. Full Jetson Deployment

# Terminal 1: Robot base & Camera
ros2 launch wheeltec_robot base_node.launch.py
ros2 launch orbbec_camera gemini_336l.launch.py

# Terminal 2: LegoNav ROS2 Node
conda activate legonav_s1
python -m legonav.robot.ros_client \
    --instruction "Go to the black chair" \
    --s2_host 192.168.1.100 \
    --local_s1 --s1_checkpoint /path/to/navdp.ckpt --s1_half

🧩 S2 — Visual Language Model

Supported Backends

Local (--backend local): Load model weights on your GPU (Qwen2.5-VL, Qwen3-VL).
API (--backend api): Call an external VLM through an OpenAI-compatible API.

Provider	`--provider`	Example Models	Env Var
OpenAI	`openai`	`gpt-4o`, `gpt-4-turbo`	`OPENAI_API_KEY`
Google	`gemini`	`gemini-1.5-pro`, `gemini-2.0-flash`	`GEMINI_API_KEY`
Moonshot	`kimi`	`moonshot-v1-vision`	`MOONSHOT_API_KEY`
DashScope	`qwen`	`qwen-vl-max`	`DASHSCOPE_API_KEY`
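
Before launching the server with --backend api, it can help to confirm that the right environment variable is set. The sketch below encodes the table above as a lookup. The function name check_api_key is hypothetical and is not part of LegoNav.

```python
import os

# Provider flag value -> environment variable, copied from the table above.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "kimi": "MOONSHOT_API_KEY",
    "qwen": "DASHSCOPE_API_KEY",
}


def check_api_key(provider, environ=os.environ):
    """Return the env var name for a provider, or raise if it is unset."""
    try:
        var = PROVIDER_ENV_VARS[provider]
    except KeyError:
        raise ValueError(f"unknown provider {provider!r}") from None
    if not environ.get(var):
        raise RuntimeError(
            f"set {var} before starting the S2 server with --provider {provider}"
        )
    return var
```

Passing a plain dict as environ makes the check easy to exercise without touching the real environment.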

🧱 S1 — Navigation Policy

All S1 clients inherit from BaseS1Client, so any of them can be swapped into the LegoNavPipeline without other changes.

Supported Policies

Client	Base Model	Goal Support	Stop mechanism
`NavDPClient`	NavDP	Pixel, Point, Image, No-goal	Learned Critic
`GNMClient`	GNM	Image, No-goal	Distance
`ViNTClient`	ViNT	Image, No-goal	Distance
`NoMaDClient`	NoMaD	Image, No-goal	Distance
`DDPPOClient`	DD-PPO	Pixel, Point	Action=STOP
`ViPlannerClient`	ViPlanner	Pixel, Point	Distance

Pipeline Output Fields

`mode`	Key Fields
`"trajectory"`	`trajectory (1,T,3)`, `all_trajectory`, `values`, `target`, `camera_goal [x,y,z]`
`"rotate"`	`rotation_rad` (positive = CCW / left)
`"stop"`	—
`"error"`	`message`, `s2` (raw S2 response)
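
A consumer of the pipeline output would typically branch on mode. The sketch below shows one way to do that using only the fields listed above. The function name summarize_step is hypothetical, and trajectory is assumed to be indexable as a nested (1, T, 3) sequence.

```python
import math


def summarize_step(result):
    """Turn one pipeline result dict into a short human-readable summary."""
    mode = result.get("mode")
    if mode == "trajectory":
        waypoints = result["trajectory"][0]  # documented shape (1, T, 3)
        return f"follow {len(waypoints)} waypoints"
    if mode == "rotate":
        rad = result["rotation_rad"]  # positive = CCW / left
        direction = "left (CCW)" if rad > 0 else "right (CW)"
        return f"rotate {abs(math.degrees(rad)):.1f} deg {direction}"
    if mode == "stop":
        return "stop"
    if mode == "error":
        return f"error: {result.get('message', 'unknown')}"
    raise ValueError(f"unexpected mode: {mode!r}")
```

Raising on an unknown mode, rather than ignoring it, keeps a controller from silently doing nothing if new modes are added later.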

🏗️ Extending LegoNav

To add a new S1 policy, subclass BaseS1Client and implement the step method for the goal type your policy supports:

from legonav.clients.base_client import BaseS1Client
from legonav.core.pipeline import LegoNavPipeline  # assumed module path (core/pipeline.py)

class MyPolicyClient(BaseS1Client):
    algo_name = "my_policy"

    def pixelgoal_step(self, pixel_goals, rgb_images, depth_images):
        # Your model logic here: it must produce traj, a (B, T, 3) trajectory
        return self._wrap_single_trajectory(traj)

# Usage in pipeline
pipeline = LegoNavPipeline(s2_host="...", s1_client=MyPolicyClient(...))

Helper methods available in BaseS1Client:

_wrap_single_trajectory(traj): Wraps (B,T,3) into the standard response tuple.
_waypoints_to_trajectory(wps, T): Converts waypoints to trajectory.
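
The README does not say how _waypoints_to_trajectory resamples its input. As one plausible interpretation only, the sketch below linearly interpolates a list of (x, y, z) waypoints to exactly T points. The function name resample_waypoints is hypothetical and this is not the BaseS1Client implementation.

```python
def resample_waypoints(wps, T):
    """Linearly resample a polyline of (x, y, z) waypoints to T points."""
    if not wps:
        raise ValueError("need at least one waypoint")
    if T < 2:
        raise ValueError("T must be at least 2")
    n = len(wps)
    if n == 1:
        return [tuple(wps[0])] * T  # a single waypoint is simply repeated
    out = []
    for i in range(T):
        s = i * (n - 1) / (T - 1)  # position along the waypoint index axis
        k = min(int(s), n - 2)     # index of the segment start
        frac = s - k               # fraction travelled along that segment
        a, b = wps[k], wps[k + 1]
        out.append(tuple(a[d] + frac * (b[d] - a[d]) for d in range(3)))
    return out
```

Note that this spaces points evenly by waypoint index, not by arc length, so unevenly spaced waypoints produce unevenly spaced trajectory points.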

📷 Camera Intrinsics

Camera	Resolution	Constant
Gemini 336L (default)	1280×720	`GEMINI_336L_INTRINSIC`
Astra S	640×480	`ASTRA_S_INTRINSIC`

🔍 Troubleshooting

ModuleNotFoundError: Run pip install -e . from the repository root.
NAVDP_ROOT error: navdp_agent.py expects NavDP/ as a sibling directory.
Jetson OOM: Use --s1_half for FP16 inference.
S2 503 error: The model is still loading; wait for /health to return "ok".
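
To avoid hitting the 503 during startup, a launcher can poll /health before sending the first instruction. The sketch below is a minimal version under stated assumptions: the README only says that /health returns "ok", so the check is a loose substring match on the response body, and the helper names fetch_health and wait_until_ready are hypothetical.

```python
import time
import urllib.request


def fetch_health(url="http://127.0.0.1:8890/health", timeout=2.0):
    """Return the response body as text, or None if the server is not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:  # URLError and HTTPError (e.g. 503) are OSError subclasses
        return None


def wait_until_ready(fetch=fetch_health, attempts=30, delay_s=2.0, sleep=time.sleep):
    """Poll until the health body contains "ok"; return False if attempts run out."""
    for attempt in range(attempts):
        body = fetch()
        if body is not None and "ok" in body.lower():
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

The fetch and sleep arguments are injectable so the loop can be tested without a running server.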

🙏 Acknowledgements

Special thanks to the authors of NavDP, Qwen, VisualNav Transformer, and Habitat for their foundational models and frameworks.
