feat(backends): add NanoDeploy backend with dlslime-ctrl discovery by JimyMa · Pull Request #15 · DeepLink-org/DLRouter

JimyMa · 2026-06-02T16:55:11Z

Summary

Integrate NanoDeploy's single-process, OpenAI-compatible server (nanodeploy serve) as a first-class DLRouter backend, selected with --backend nanodeploy.
Add BackendType.NANODEPLOY and the nanoctrl service-discovery mode, which polls a dlslime-ctrl entity registry for nanodeploy nodes and reconciles their HTTP endpoints into the NodeManager, registering served-model-name, model-path, and basename aliases.
Auto-discovery activates in hybrid serving when --ctrl_address is set; manual POST /nodes/add still works otherwise.
Docs: README updated with a supported-backend row, a dedicated NanoDeploy + dlslime-ctrl quick start, and a request example.
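The nanoctrl reconciliation step can be sketched as a diff between one registry poll and the router's current node table. This is a minimal sketch under stated assumptions: the entity shape (`kind`, `metadata.url`, `metadata.served_model_name`, `metadata.model_path`) and the `NodeView`, `model_aliases`, and `reconcile` names are hypothetical, not the actual dlslime-ctrl schema or DLRouter code.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class NodeView:
    """Hypothetical router-side record: one HTTP endpoint plus the model ids it answers to."""
    url: str
    aliases: Set[str] = field(default_factory=set)


def model_aliases(served_model_name: Optional[str], model_path: str) -> Set[str]:
    # The served name, the full model path, and the path basename all map to the same model.
    aliases = {model_path, os.path.basename(model_path.rstrip("/"))}
    if served_model_name:
        aliases.add(served_model_name)
    return aliases


def reconcile(current: Dict[str, NodeView], entities: List[dict]) -> Tuple[List[NodeView], List[str]]:
    """Diff one registry poll against the router's node table.

    Returns (nodes to add or update, URLs to remove).
    """
    desired: Dict[str, NodeView] = {}
    for ent in entities:
        if ent.get("kind") != "nanodeploy":  # hypothetical entity-type field
            continue  # ignore entities registered by other services
        meta = ent["metadata"]
        url = meta["url"]
        desired[url] = NodeView(url, model_aliases(meta.get("served_model_name"), meta["model_path"]))
    upserts = [node for url, node in desired.items()
               if url not in current or current[url].aliases != node.aliases]
    removals = [url for url in current if url not in desired]
    return upserts, removals
```

Computing the diff separately from applying it keeps the polling loop idempotent: nodes that vanish from the registry are dropped, and an alias change re-registers the node.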

Running NanoDeploy with DLRouter

1. Start the dlslime-ctrl control plane (only needed for auto-discovery)

dlslime-ctrl server --redis-url redis://127.0.0.1:6379

2. Start the NanoDeploy OpenAI server

# inside the nanodeploy conda env
nanodeploy serve /path/to/Qwen3-0.6B \
  --host 0.0.0.0 --port 8100 \
  --served-model-name Qwen3-0.6B \
  --ctrl_address 127.0.0.1:4479

Notes:

The positional argument is the model path (you can also use --model /path/to/...).
--served-model-name is the public model id; if omitted it defaults to the basename of the model path.
--ctrl_address enables self-registration and heartbeats to dlslime-ctrl. Omit it to run as a standalone HTTP server.
All other Config fields (--ray_address, --tp, etc.) keep the same names and semantics as in engine_server.py. --host and --port set the address the uvicorn HTTP API binds to.
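The argument rules in these notes (model path given positionally or via --model, basename of the path as the default served name, --ctrl_address toggling registration) can be mirrored with `argparse`. This is a hypothetical sketch covering only the documented flags; it is not NanoDeploy's actual parser, and `parse_serve_args`, `model_pos`, and `standalone` are illustrative names.

```python
import argparse
import os
from typing import List


def parse_serve_args(argv: List[str]) -> argparse.Namespace:
    # Covers only the flags documented above; the real CLI also takes --host, --port, --tp, etc.
    p = argparse.ArgumentParser(prog="nanodeploy serve")
    p.add_argument("model_pos", nargs="?", help="model path (positional form)")
    p.add_argument("--model", help="model path (flag form)")
    p.add_argument("--served-model-name", dest="served_model_name")
    p.add_argument("--ctrl_address", help="dlslime-ctrl address; omit for standalone mode")
    args = p.parse_args(argv)
    args.model = args.model or args.model_pos
    if not args.model:
        p.error("a model path is required")  # prints usage and exits with status 2
    if not args.served_model_name:
        # Default public model id: basename of the model path (trailing slash tolerated).
        args.served_model_name = os.path.basename(args.model.rstrip("/"))
    # Without --ctrl_address there is no self-registration or heartbeat.
    args.standalone = args.ctrl_address is None
    return args
```

For example, `parse_serve_args(["/path/to/Qwen3-0.6B"])` yields a served model name of `Qwen3-0.6B` in standalone mode.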

3. Call NanoDeploy directly (bypass DLRouter, verify the server itself)

curl http://localhost:8100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen3-0.6B","messages":[{"role":"user","content":"Hello"}]}'

Other endpoints:

curl http://localhost:8100/health        # health check
curl http://localhost:8100/v1/models     # served name, model path, and basename all work as model aliases

# /v1/completions (text completion)
curl http://localhost:8100/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen3-0.6B","prompt":"Once upon a time","max_tokens":64}'

# streaming (SSE; -N disables curl's output buffering)
curl -N http://localhost:8100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen3-0.6B","messages":[{"role":"user","content":"Hello"}],"stream":true}'
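For streaming, a client has to reassemble the text from server-sent events. The sketch below assumes NanoDeploy follows the OpenAI chat-completions streaming format (`data: {...}` chunks carrying `choices[].delta.content`, terminated by `data: [DONE]`); that format is an assumption here, and `collect_stream_text` is a hypothetical helper.

```python
import json
from typing import Iterable, List


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the content deltas of an OpenAI-style chat-completions SSE stream."""
    parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            parts.append(delta.get("content") or "")  # role-only deltas add nothing
    return "".join(parts)
```

With a live server, one could pass `(line.decode() for line in resp)`, where `resp` comes from `urllib.request.urlopen(...)` on the streaming request; keeping the parser free of I/O makes it easy to test.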

4. Call through DLRouter (end-to-end)

# DLRouter auto-discovers NanoDeploy nodes from dlslime-ctrl
python -m dlrouter \
  --backend nanodeploy \
  --serving_strategy hybrid \
  --ctrl_address 127.0.0.1:4479

# Request hits port 8000 (DLRouter), which forwards to 8100 (NanoDeploy)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen3-0.6B","messages":[{"role":"user","content":"Hello"}]}'

Without dlslime-ctrl, drop --ctrl_address on the DLRouter side and register the node manually:

curl -X POST http://localhost:8000/nodes/add \
  -H "Content-Type: application/json" \
  -d '{"url":"http://127.0.0.1:8100"}'
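The same registration and routed request can be issued from Python using only the standard library. This sketch builds `urllib.request.Request` objects equivalent to the curl commands above; the helper names are hypothetical, and nothing is sent until a request is passed to `urlopen`.

```python
import json
import urllib.request


def json_post(url: str, body: dict) -> urllib.request.Request:
    # Build (but do not send) a JSON POST, equivalent to `curl -X POST -H ... -d ...`.
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def add_node_request(router: str, node_url: str) -> urllib.request.Request:
    # Manual registration when dlslime-ctrl discovery is not used.
    return json_post(f"{router.rstrip('/')}/nodes/add", {"url": node_url})


def chat_request(base: str, model: str, prompt: str) -> urllib.request.Request:
    # Works against DLRouter (port 8000) or NanoDeploy directly (port 8100).
    return json_post(
        f"{base.rstrip('/')}/v1/chat/completions",
        {"model": model, "messages": [{"role": "user", "content": prompt}]},
    )
```

For example, `urllib.request.urlopen(add_node_request("http://localhost:8000", "http://127.0.0.1:8100"))` registers the node, after which `urllib.request.urlopen(chat_request("http://localhost:8000", "Qwen3-0.6B", "Hello"))` goes through DLRouter.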

Test plan

pytest tests/core/test_nanoctrl_discovery.py tests/backends/test_backend_contracts.py
End-to-end: start dlslime-ctrl and nanodeploy serve, run python -m dlrouter --backend nanodeploy --serving_strategy hybrid --ctrl_address 127.0.0.1:4479, then send a /v1/chat/completions request with curl (verified manually).

Integrate NanoDeploy's single-process OpenAI server (`nanodeploy serve`) as a first-class DLRouter backend. Adds the `nanodeploy` BackendType and the `nanoctrl` service-discovery mode, which polls a dlslime-ctrl entity registry for `nanodeploy` nodes and reconciles their HTTP endpoints (served model name, model path, and basename aliases) into the NodeManager. Auto-discovery activates in hybrid serving when `--ctrl_address` is set; manual `POST /nodes/add` still works otherwise.

Co-authored-by: Cursor <cursoragent@cursor.com>

Implement prefill/decode disaggregation for the NanoDeploy backend:
- supports_pd_disagg() now returns True and handle_pd_request runs the two-stage flow: prefill node returns a KV migration payload, decode node RDMA-pulls the KV and generates the completion, then prefill KV blocks are released via POST /pd/free.
- Forward kv_transfer_params to NanoDeploy serve nodes.
- When the prefill node fully finishes a request locally (e.g. first token is EOS) it returns no migration payload; return that completion directly (with a streaming SSE fallback) instead of erroring.
- nanoctrl discovery maps entity metadata.role -> EngineRole PREFILL/DECODE/HYBRID instead of always HYBRID.
- Update backend contract and discovery tests accordingly.

Co-authored-by: Cursor <cursoragent@cursor.com>
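The prefill/decode flow and the role mapping described in this commit can be sketched as follows. This is a simplified, non-streaming illustration with injected callables standing in for the HTTP calls; `pd_two_stage`, `role_from_metadata`, the lowercase role strings, the HYBRID fallback for missing or unknown roles, and the `kv_transfer_params` key in the prefill response are all assumptions, not the actual DLRouter `handle_pd_request` signature.

```python
from enum import Enum
from typing import Any, Callable, Dict


class EngineRole(Enum):
    PREFILL = "prefill"  # assumed wire values for metadata.role
    DECODE = "decode"
    HYBRID = "hybrid"


def role_from_metadata(metadata: Dict[str, Any]) -> EngineRole:
    # Assumption: a missing or unrecognized role falls back to HYBRID (the pre-PD behaviour).
    raw = str(metadata.get("role", "")).strip().lower()
    try:
        return EngineRole(raw)
    except ValueError:
        return EngineRole.HYBRID


def pd_two_stage(
    request: Dict[str, Any],
    prefill: Callable[[Dict[str, Any]], Dict[str, Any]],
    decode: Callable[[Dict[str, Any]], Dict[str, Any]],
    free_prefill: Callable[[Dict[str, Any]], None],
) -> Dict[str, Any]:
    first = prefill(request)
    migration = first.get("kv_transfer_params")  # hypothetical key for the KV migration payload
    if not migration:
        # Prefill finished the request locally (e.g. first token was EOS): return it as-is.
        return first
    try:
        # Decode node pulls the KV described by the payload and generates the completion.
        return decode({**request, "kv_transfer_params": migration})
    finally:
        # Stands in for POST /pd/free: release the prefill node's KV blocks.
        free_prefill(migration)
```

Releasing the prefill blocks in a `finally` clause is a design choice in this sketch: it frees them even when decode fails, at the cost of not being able to retry decode from the same KV.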

JimyMa requested a review from Denny991 June 3, 2026 06:05

JimyMa assigned caikun-pjlab and Denny991 Jun 3, 2026

JimyMa requested review from caikun-pjlab and removed request for Denny991 June 3, 2026 06:09

JimyMa and others added 3 commits June 7, 2026 12:53

Update README.md (46f3d7c)
Update README.md (788c0f0)


JimyMa wants to merge 4 commits into main from init_nanodeploy_backend
