TingdeLiu/LegoNav


LegoNav

LegoNav is a modular vision-language navigation framework designed like Lego bricks: each module (S1, S2) is independently replaceable and freely composable.

  • S2 (High-level reasoning): Any vision-language model, running on a local GPU or behind a cloud API.
  • S1 (Low-level motion control): Any navigation policy, such as NavDP, GNM, ViNT, NoMaD, DD-PPO, iPlanner, ViPlanner, …

Plug in any combination and run it on Jetson edge hardware.



🏗️ System Architecture

Modular Design

LegoNav decouples high-level "where to go" (S2) from low-level "how to move" (S1).

graph TB
    subgraph S2["S2 — Visual Language Reasoning"]
        direction TB
        VLM["VLM Backend\n─────────────\nlocal: Qwen2.5-VL / Qwen3-VL\napi: GPT-4o / Gemini / Kimi / …"]
        S2Srv["S2 HTTP Server :8890\nPOST /s2_step"]
        VLM --> S2Srv
    end

    subgraph Jetson["Jetson Edge — S1 Navigation Policy"]
        direction TB
        CAM["RGB-D Camera"]
        ROS["ROS2 Node\nros_client.py"]
        PIPE["LegoNavPipeline\npipeline.py"]
        S1["S1 Client\nNavDP / GNM / ViNT / NoMaD\nDD-PPO / iPlanner / ViPlanner"]
        CTRL["MPC + PID Controller"]
        ROBOT["Mobile Robot"]

        CAM --> ROS
        ROS --> PIPE
        PIPE --> S1
        S1 --> CTRL
        CTRL -->|"v, ω"| ROBOT
    end

    USER["Language Instruction\n'Go to the black chair'"]
    USER --> ROS
    PIPE <-->|"HTTP REST (LAN)"| S2Srv
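
The HTTP link between the pipeline and S2 can be exercised by hand. The sketch below builds a `POST /s2_step` request with Python's standard library. The endpoint, port, and the `image` and `instruction` fields come from the diagrams in this README; the base64 image encoding and the helper name `build_s2_request` are assumptions, so check `legonav/server/` for the actual request schema.

```python
import base64
import json
import urllib.request


def build_s2_request(host, port, image_bytes, instruction):
    """Build (but do not send) a POST /s2_step request.

    Assumption: the server accepts a JSON body with a base64-encoded image.
    """
    payload = {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "instruction": instruction,
    }
    return urllib.request.Request(
        url=f"http://{host}:{port}/s2_step",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_s2_request("127.0.0.1", 8890, b"<jpeg bytes>", "Go to the black chair")
# To send it against a running server:
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       tasks = json.loads(resp.read())
```

Building the request separately from sending it makes it easy to inspect the payload before pointing it at a real server.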

Navigation Loop

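LegoNav tracks goals in world coordinates. The pixel goal returned by S2 is projected to a 3D world point once, using depth and odometry. On every later frame that fixed point is re-expressed in the current camera frame, so the goal stays geometrically consistent without re-querying the VLM per frame.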

sequenceDiagram
    participant Cam as RGB-D Camera
    participant ROS as ROS2 Node (+ Odometry)
    participant S2 as S2 Server (VLM)
    participant Pipe as LegoNavPipeline
    participant S1 as S1 Policy
    participant Robot as Robot Actuator

    ROS->>S2: POST /s2_step {image, instruction}
    S2-->>ROS: Structured task list (JSON) — pixel goal / rotation / stop

    loop Per Frame
        Cam->>ROS: RGB-D frame + odom [x, y, yaw]
        ROS->>Pipe: step(rgb, depth, odom)

        alt First step of pixel_point task
            Pipe->>Pipe: pixel + depth + odom → world_target [wx, wy, wz]
            Note over Pipe: 3D anchor locked once, never re-identified
        end

        Pipe->>Pipe: world_target + odom → camera_goal [x_fwd, y_left, z_up]
        Pipe->>S1: pointgoal_step(camera_goal, rgb, depth)
        S1-->>Pipe: trajectory (B, T, 3)
        Pipe-->>ROS: {mode, trajectory, camera_goal, …}
        ROS->>Robot: MPC / PID execution
    end
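
The two geometric steps in the loop (pixel + depth + odom → world_target, then world_target + odom → camera_goal) can be sketched with a pinhole model and a planar rigid transform. This is a minimal illustration, not the code in `pipeline.py`: the function names and intrinsic values are hypothetical, and it assumes the camera sits at the robot origin, is aligned with the robot heading, and that the robot moves in a plane.

```python
import math


def pixel_to_camera(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into [x_fwd, y_left, z_up]."""
    x_right = (u - cx) * depth_m / fx  # optical frame: x points right
    y_down = (v - cy) * depth_m / fy   # optical frame: y points down
    return (depth_m, -x_right, -y_down)


def camera_to_world(p_cam, odom):
    """Rotate by yaw and translate by (x, y) to lock the goal in the world frame."""
    x, y, yaw = odom
    c, s = math.cos(yaw), math.sin(yaw)
    x_fwd, y_left, z_up = p_cam
    return (x + c * x_fwd - s * y_left, y + s * x_fwd + c * y_left, z_up)


def world_to_camera(p_world, odom):
    """Inverse transform: express the locked world goal in the current frame."""
    x, y, yaw = odom
    c, s = math.cos(yaw), math.sin(yaw)
    dx, dy = p_world[0] - x, p_world[1] - y
    return (c * dx + s * dy, -s * dx + c * dy, p_world[2])


# Hypothetical intrinsics for a 1280x720 image (not the repo's constants).
FX, FY, CX, CY = 600.0, 600.0, 640.0, 360.0

# First step: the goal pixel is at the image center, 2 m away, robot at the origin.
world_target = camera_to_world(
    pixel_to_camera(640, 360, 2.0, FX, FY, CX, CY), (0.0, 0.0, 0.0)
)

# Later frame: the robot has moved 1 m forward and turned 90 degrees left.
camera_goal = world_to_camera(world_target, (1.0, 0.0, math.pi / 2))
```

In this example the locked target is at (2, 0) in the world, so after the move it lies about 1 m to the robot's right (negative `y_left`), which is what `camera_goal` encodes.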

Deployment Modes

graph LR
    subgraph ModeA["Mode A: Local S1 (Recommended)"]
        J1["Jetson"] -- "local inference" --> N1["NavDP (--local_s1)"]
        J1 <-- "HTTP :8890" --> G1["GPU Server (S2)"]
    end

    subgraph ModeB["Mode B: Remote S1"]
        J2["Jetson"] <-- "HTTP :8901" --> N2["S1 Policy Server"]
        J2 <-- "HTTP :8890" --> G2["GPU Server (S2)"]
    end

    subgraph ModeC["Mode C: S2 API (No GPU Required)"]
        PC["Any Machine"] -- "runs S2 server" --> SRV["S2 Server :8890"]
        SRV <-- "HTTPS" --> API["Cloud VLM API\nOpenAI / Gemini / Kimi / Qwen"]
    end

📂 Project Structure

LegoNav/
├── legonav/
│   ├── server/       # S2: VLM HTTP server (port 8890)
│   ├── clients/      # S1: Policy clients (NavDP, GNM, ViNT, etc.)
│   ├── core/         # Orchestration (LegoNavPipeline)
│   ├── robot/        # ROS2 node & MPC/PID controllers
│   └── utils/        # Shared utilities
├── scripts/          # Launch scripts for Jetson & S2
├── tests/            # Connectivity & pipeline tests
└── requirements_*.txt # Environment-specific dependencies

💻 Requirements

Hardware

| Component | Minimum | Recommended |
|---|---|---|
| S2 GPU (local) | 16 GB VRAM | 24+ GB VRAM |
| S1 Edge | Jetson Orin NX 8 GB | Jetson Orin NX 16 GB |
| Camera | Astra S (640×480) | Gemini 336L (1280×720) |

Software

  • Python 3.10+, PyTorch 2.1+, ROS2 Humble, CUDA 11.8+.

🛠️ Installation

S2 Server (GPU Machine)

conda create -n legonav_s2 python=3.10 && conda activate legonav_s2
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -e . && pip install -r requirements_server.txt

S1 Edge (Jetson)

conda create -n legonav_s1 python=3.10 && conda activate legonav_s1
# Clone NavDP as a sibling: git clone https://github.com/InternRobotics/NavDP
# Install prebuilt torchvision wheel for aarch64 if available
pip install -e . && pip install -r requirements_jetson.txt
sudo apt install ros-humble-cv-bridge ros-humble-message-filters

⚡ Quick Start

1. Start S2 Server

# Local GPU (Qwen2.5-VL)
python -m legonav.server.s2_server --model_path /path/to/Qwen2.5-VL-7B-Instruct

# Cloud API (GPT-4o)
OPENAI_API_KEY=sk-xxx python -m legonav.server.s2_server --backend api --provider openai --model_path gpt-4o

2. Test S2 Connectivity

python tests/test_s2_client.py --host 127.0.0.1 --port 8890 --random --instruction "Go to the chair"

3. Pipeline Test (S2 Only)

python -m legonav.core.pipeline \
    --s2_host 127.0.0.1 --s2_port 8890 \
    --random --skip_s1 \
    --instruction "Turn left, go to the door"

4. Full Jetson Deployment

# Terminal 1: Robot base & Camera
ros2 launch wheeltec_robot base_node.launch.py
ros2 launch orbbec_camera gemini_336l.launch.py

# Terminal 2: LegoNav ROS2 Node
conda activate legonav_s1
python -m legonav.robot.ros_client \
    --instruction "Go to the black chair" \
    --s2_host 192.168.1.100 \
    --local_s1 --s1_checkpoint /path/to/navdp.ckpt --s1_half

🧩 S2 — Visual Language Model

Supported Backends

  • Local (--backend local): Load model weights on your GPU (Qwen2.5-VL, Qwen3-VL).
  • API (--backend api): Call an external VLM through an OpenAI-compatible API.

| Provider | --provider | Example Models | Env Var |
|---|---|---|---|
| OpenAI | openai | gpt-4o, gpt-4-turbo | OPENAI_API_KEY |
| Google | gemini | gemini-1.5-pro, gemini-2.0-flash | GEMINI_API_KEY |
| Moonshot | kimi | moonshot-v1-vision | MOONSHOT_API_KEY |
| DashScope | qwen | qwen-vl-max | DASHSCOPE_API_KEY |
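
Before launching the server with `--backend api`, it can help to confirm that the matching key is exported. The sketch below copies the provider-to-variable mapping from the table above; the helper name `missing_api_key` is hypothetical and not part of LegoNav.

```python
import os

# Provider -> environment variable, as listed in the provider table.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "kimi": "MOONSHOT_API_KEY",
    "qwen": "DASHSCOPE_API_KEY",
}


def missing_api_key(provider, environ=os.environ):
    """Return the name of the unset env var for `provider`, or None if it is set."""
    try:
        var = PROVIDER_ENV_VARS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return None if environ.get(var) else var


missing = missing_api_key("openai")
if missing:
    print(f"Set {missing} before starting the S2 server with --provider openai")
```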

🧱 S1 — Navigation Policy

All S1 clients inherit from BaseS1Client, so they can be swapped in LegoNavPipeline without changing the rest of the pipeline.

Supported Policies

| Client | Base Model | Goal Support | Stop Mechanism |
|---|---|---|---|
| NavDPClient | NavDP | Pixel, Point, Image, No-goal | Learned Critic |
| GNMClient | GNM | Image, No-goal | Distance |
| ViNTClient | ViNT | Image, No-goal | Distance |
| NoMaDClient | NoMaD | Image, No-goal | Distance |
| DDPPOClient | DD-PPO | Pixel, Point | Action=STOP |
| ViPlannerClient | ViPlanner | Pixel, Point | Distance |
Pipeline Output Fields

| mode | Key Fields |
|---|---|
| "trajectory" | trajectory (1,T,3), all_trajectory, values, target, camera_goal [x,y,z] |
| "rotate" | rotation_rad (positive = CCW / left) |
| "stop" | |
| "error" | message, s2 (raw S2 response) |
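
A consumer of the pipeline output typically branches on `mode`. The sketch below assumes each step returns a plain dict with the fields listed above and that `trajectory` can be indexed like a nested list; `describe_step` is a hypothetical helper for illustration.

```python
def describe_step(result):
    """Map one pipeline result dict to a short, human-readable action."""
    mode = result.get("mode")
    if mode == "trajectory":
        # trajectory has shape (1, T, 3): one batch of T waypoints.
        waypoints = result["trajectory"][0]
        return f"follow {len(waypoints)} waypoints"
    if mode == "rotate":
        angle = result["rotation_rad"]
        direction = "left" if angle > 0 else "right"  # positive = CCW / left
        return f"rotate {direction} by {abs(angle):.2f} rad"
    if mode == "stop":
        return "stop"
    if mode == "error":
        return f"error: {result.get('message', 'unknown')}"
    raise ValueError(f"unexpected mode: {mode!r}")


print(describe_step({"mode": "rotate", "rotation_rad": 0.5}))
```

Raising on an unknown mode keeps a controller from silently ignoring a result it does not understand.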

🏗️ Extending LegoNav

To add a new S1 policy, subclass BaseS1Client and implement the goal methods your policy supports:

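from legonav.clients.base_client import BaseS1Client
# Import path inferred from the project layout (legonav/core/pipeline.py)
from legonav.core.pipeline import LegoNavPipeline

class MyPolicyClient(BaseS1Client):
    algo_name = "my_policy"

    def pixelgoal_step(self, pixel_goals, rgb_images, depth_images):
        # Your model logic here; it should produce `traj` with shape (B, T, 3)
        traj = ...  # placeholder
        return self._wrap_single_trajectory(traj)

# Usage in pipeline
pipeline = LegoNavPipeline(s2_host="...", s1_client=MyPolicyClient(...))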

Helper methods available in BaseS1Client:

  • _wrap_single_trajectory(traj): Wraps (B,T,3) into the standard response tuple.
  • _waypoints_to_trajectory(wps, T): Converts waypoints to trajectory.
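
For intuition about what a waypoints-to-trajectory conversion can involve, the sketch below resamples a sparse polyline to exactly T points, evenly spaced by arc length. It is a standalone illustration under that assumption, not the implementation of `_waypoints_to_trajectory`; the function name `resample_waypoints` is hypothetical.

```python
import math


def resample_waypoints(waypoints, T):
    """Resample a polyline of (x, y, z) waypoints to T evenly spaced points."""
    if T < 2 or len(waypoints) < 2:
        raise ValueError("need at least 2 waypoints and T >= 2")

    # Cumulative arc length at each input waypoint.
    cum = [0.0]
    for a, b in zip(waypoints, waypoints[1:]):
        cum.append(cum[-1] + math.dist(a, b))
    total = cum[-1]

    out, seg = [], 0
    for i in range(T):
        s = total * i / (T - 1)  # target arc length for output point i
        # Advance to the segment that contains arc length s.
        while seg < len(waypoints) - 2 and cum[seg + 1] < s:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        t = 0.0 if span == 0 else (s - cum[seg]) / span
        a, b = waypoints[seg], waypoints[seg + 1]
        out.append(tuple(pa + t * (pb - pa) for pa, pb in zip(a, b)))
    return out


# An L-shaped path of total length 2, resampled to 5 points.
trajectory = resample_waypoints([(0, 0, 0), (1, 0, 0), (1, 1, 0)], T=5)
```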

📷 Camera Intrinsics

| Camera | Resolution | Constant |
|---|---|---|
| Gemini 336L (default) | 1280×720 | GEMINI_336L_INTRINSIC |
| Astra S | 640×480 | ASTRA_S_INTRINSIC |

🔍 Troubleshooting

  • ModuleNotFoundError: Run pip install -e . from the repository root.
  • NAVDP_ROOT error: navdp_agent.py expects NavDP/ as a sibling directory.
  • Jetson OOM: Use --s1_half for FP16 inference.
  • S2 503 error: The model is still loading; wait for /health to return "ok".
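
Waiting for the model to load can be scripted instead of retried by hand. The sketch below polls `/health` until the response body mentions "ok". The endpoint and the "ok" value come from the note above; the exact response format, the helper names, and the default timeouts are assumptions, and the substring check is deliberately loose.

```python
import time
import urllib.error
import urllib.request


def http_get_text(url, timeout_s=2.0):
    """Return the response body as text, or None if the server is not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, and HTTP errors such as 503.
        return None


def wait_for_s2(host, port, timeout_s=300.0, poll_s=2.0,
                fetch=http_get_text, sleep=time.sleep):
    """Poll /health until the body mentions "ok"; return False on timeout."""
    url = f"http://{host}:{port}/health"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        body = fetch(url)
        if body is not None and "ok" in body.lower():
            return True
        sleep(poll_s)
    return False
```

A call such as `wait_for_s2("127.0.0.1", 8890)` before starting the pipeline avoids sending `/s2_step` requests while the model is still loading. The `fetch` and `sleep` parameters exist only so the loop can be tested without a network.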

🙏 Acknowledgements

Special thanks to the authors of NavDP, Qwen, VisualNav Transformer, and Habitat for their foundational models and frameworks.
