feat(backends): add NanoDeploy backend with dlslime-ctrl discovery#15

Open
JimyMa wants to merge 4 commits into main from init_nanodeploy_backend

@JimyMa JimyMa commented Jun 2, 2026

Summary

  • Integrate NanoDeploy's single-process OpenAI server (nanodeploy serve) as a first-class DLRouter backend (--backend nanodeploy).
  • Add BackendType.NANODEPLOY and the nanoctrl service-discovery mode that polls a dlslime-ctrl entity registry for nanodeploy nodes and reconciles their HTTP endpoints into the NodeManager (served-model-name, model-path, and basename aliases).
  • Auto-discovery activates in hybrid serving when --ctrl_address is set; manual POST /nodes/add still works otherwise.
  • Docs: README updated with a supported-backend row, a dedicated NanoDeploy + dlslime-ctrl quick start, and a request example.
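The reconcile step described in the summary can be pictured as a diff between a registry snapshot and the nodes the router already knows. The following is a minimal sketch under stated assumptions, not the code in this PR: `Entity`, `aliases`, and `reconcile` are hypothetical names, and the field names on `Entity` are guesses at what a dlslime-ctrl entry carries.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """Hypothetical shape of one registry entry; field names are assumptions."""
    url: str
    model_path: str
    served_model_name: Optional[str] = None


def aliases(entity: Entity) -> set:
    """Names a node answers to: served name, model path, and path basename."""
    basename = PurePosixPath(entity.model_path.rstrip("/")).name
    # The served name falls back to the basename when it is not set.
    served = entity.served_model_name or basename
    return {served, entity.model_path, basename}


def reconcile(known: dict, entities: list):
    """Diff a registry snapshot against known nodes, keyed by URL.

    Returns (to_add, to_remove, desired) and does not mutate `known`.
    """
    desired = {e.url: aliases(e) for e in entities}
    # New nodes, plus nodes whose alias set changed since the last poll.
    to_add = {url: names for url, names in desired.items() if known.get(url) != names}
    # Nodes that disappeared from the registry.
    to_remove = sorted(url for url in known if url not in desired)
    return to_add, to_remove, desired
```

A polling loop would call `reconcile` on each snapshot and apply `to_add` and `to_remove` to the NodeManager; how the real implementation keys nodes or handles heartbeat expiry is not shown here.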

Running NanoDeploy with DLRouter

1. Start the dlslime-ctrl control plane (only needed for auto-discovery)

dlslime-ctrl server --redis-url redis://127.0.0.1:6379

2. Start the NanoDeploy OpenAI server

# inside the nanodeploy conda env
nanodeploy serve /path/to/Qwen3-0.6B \
  --host 0.0.0.0 --port 8100 \
  --served-model-name Qwen3-0.6B \
  --ctrl_address 127.0.0.1:4479

Notes:

  • The positional argument is the model path (you can also use --model /path/to/...).
  • --served-model-name is the public model id; if omitted it defaults to the basename of the model path.
  • --ctrl_address enables self-registration + heartbeat to dlslime-ctrl. Omit it to run as a standalone HTTP server.
  • All other Config fields (--ray_address, --tp, etc.) keep the same names and semantics as in engine_server.py. --host and --port bind the uvicorn HTTP API.

3. Call NanoDeploy directly (bypass DLRouter, verify the server itself)

curl http://localhost:8100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen3-0.6B","messages":[{"role":"user","content":"Hello"}]}'

Other endpoints:

curl http://localhost:8100/health        # health check
curl http://localhost:8100/v1/models     # served-name / path / basename are all aliases

# /v1/completions (text completion)
curl http://localhost:8100/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen3-0.6B","prompt":"Once upon a time","max_tokens":64}'

# streaming
curl -N http://localhost:8100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen3-0.6B","messages":[{"role":"user","content":"Hello"}],"stream":true}'
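For scripted checks of the streaming endpoint, the response can be reduced to plain text with a small parser. This is a sketch that assumes the server emits OpenAI-style server-sent events (`data: {...}` chunks carrying `choices[].delta.content`, terminated by `data: [DONE]`); the PR does not spell out the wire format, and `iter_stream_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from OpenAI-style chat completion SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Feed it the decoded lines of the HTTP response body, for example from the output of the `curl -N` command above, and join the yielded pieces to get the full reply.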

4. Call through DLRouter (end-to-end)

# DLRouter auto-discovers NanoDeploy nodes from dlslime-ctrl
python -m dlrouter \
  --backend nanodeploy \
  --serving_strategy hybrid \
  --ctrl_address 127.0.0.1:4479

# Request hits port 8000 (DLRouter), which forwards to 8100 (NanoDeploy)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen3-0.6B","messages":[{"role":"user","content":"Hello"}]}'

Without dlslime-ctrl, drop --ctrl_address on the DLRouter side and register the node manually:

curl -X POST http://localhost:8000/nodes/add \
  -H "Content-Type: application/json" \
  -d '{"url":"http://127.0.0.1:8100"}'
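The same manual registration can be issued from Python when nodes are brought up by a script. This sketch only builds the request shown in the curl command above; `build_add_node_request` is a hypothetical helper, and the response body of `/nodes/add` is not described in this PR, so it is left uninterpreted.

```python
import json
import urllib.request


def build_add_node_request(router_url: str, node_url: str) -> urllib.request.Request:
    """Build the POST /nodes/add request that registers one node with DLRouter."""
    body = json.dumps({"url": node_url}).encode("utf-8")
    return urllib.request.Request(
        url=router_url.rstrip("/") + "/nodes/add",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it; for example, `build_add_node_request("http://localhost:8000", "http://127.0.0.1:8100")` mirrors the curl call.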

Test plan

  • pytest tests/core/test_nanoctrl_discovery.py tests/backends/test_backend_contracts.py
  • End-to-end: nanodeploy serve + dlslime-ctrl + python -m dlrouter --backend nanodeploy --serving_strategy hybrid --ctrl_address 127.0.0.1:4479, then a /v1/chat/completions curl (verified working manually).

Integrate NanoDeploy's single-process OpenAI server (`nanodeploy serve`) as
a first-class DLRouter backend. Adds the `nanodeploy` BackendType and the
`nanoctrl` service-discovery mode, which polls a dlslime-ctrl entity registry
for `nanodeploy` nodes and reconciles their HTTP endpoints (served model name,
model path, and basename aliases) into the NodeManager.

Auto-discovery activates in hybrid serving when `--ctrl_address` is set;
manual `POST /nodes/add` still works otherwise.

Co-authored-by: Cursor <cursoragent@cursor.com>
@JimyMa JimyMa requested a review from Denny991 June 3, 2026 06:05
@JimyMa JimyMa requested review from caikun-pjlab and removed request for Denny991 June 3, 2026 06:09
JimyMa and others added 3 commits June 7, 2026 12:53
Implement prefill/decode disaggregation for the NanoDeploy backend:
- supports_pd_disagg() now returns True and handle_pd_request runs the
  two-stage flow: prefill node returns a KV migration payload, decode node
  RDMA-pulls the KV and generates the completion, then prefill KV blocks are
  released via POST /pd/free.
- Forward kv_transfer_params to NanoDeploy serve nodes.
- When the prefill node fully finishes a request locally (e.g. first token is
  EOS) it returns no migration payload; return that completion directly
  (with a streaming SSE fallback) instead of erroring.
- nanoctrl discovery maps entity metadata.role -> EngineRole
  PREFILL/DECODE/HYBRID instead of always HYBRID.
- Update backend contract and discovery tests accordingly.

Co-authored-by: Cursor <cursoragent@cursor.com>
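
The two-stage flow and the role mapping from the commit above can be sketched as follows. This is an illustration under assumptions, not the body of `handle_pd_request`: the `prefill`, `decode`, and `free` callables stand in for the HTTP calls, the migration payload is assumed to travel under a `kv_transfer_params` key in the prefill response, the string values of `EngineRole` are guesses, and falling back to HYBRID for a missing or unknown role is an assumed default.

```python
from enum import Enum
from typing import Any, Callable, Dict


class EngineRole(Enum):
    # String values are assumptions; the real enum may differ.
    PREFILL = "prefill"
    DECODE = "decode"
    HYBRID = "hybrid"


def role_from_metadata(metadata: Dict[str, Any]) -> EngineRole:
    """Map entity metadata.role to an EngineRole (assumed default: HYBRID)."""
    raw = str(metadata.get("role", "")).strip().lower()
    try:
        return EngineRole(raw)
    except ValueError:
        return EngineRole.HYBRID


def run_pd_request(
    request: Dict[str, Any],
    prefill: Callable[[Dict[str, Any]], Dict[str, Any]],
    decode: Callable[[Dict[str, Any]], Dict[str, Any]],
    free: Callable[[Any], None],
) -> Dict[str, Any]:
    """Prefill, then decode with the migrated KV, then release prefill blocks."""
    result = prefill(request)
    kv_params = result.get("kv_transfer_params")
    if not kv_params:
        # Prefill finished the request locally (e.g. first token was EOS),
        # so there is nothing to migrate and nothing to free.
        return result
    try:
        return decode({**request, "kv_transfer_params": kv_params})
    finally:
        # Stand-in for POST /pd/free; freeing even when decode fails is a
        # design choice of this sketch.
        free(kv_params)
```

Releasing the prefill blocks in a `finally` clause keeps a failed decode from leaking KV memory on the prefill node; whether the PR does the same is not stated in the commit message.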