feat(backends): add NanoDeploy backend with dlslime-ctrl discovery#15
Open
JimyMa wants to merge 4 commits into
Open
feat(backends): add NanoDeploy backend with dlslime-ctrl discovery#15JimyMa wants to merge 4 commits into
JimyMa wants to merge 4 commits into
Conversation
Integrate NanoDeploy's single-process OpenAI server (`nanodeploy serve`) as a first-class DLRouter backend. Adds the `nanodeploy` BackendType and the `nanoctrl` service-discovery mode, which polls a dlslime-ctrl entity registry for `nanodeploy` nodes and reconciles their HTTP endpoints (served model name, model path, and basename aliases) into the NodeManager. Auto-discovery activates in hybrid serving when `--ctrl_address` is set; manual `POST /nodes/add` still works otherwise. Co-authored-by: Cursor <cursoragent@cursor.com>
Implement prefill/decode disaggregation for the NanoDeploy backend: - supports_pd_disagg() now returns True and handle_pd_request runs the two-stage flow: prefill node returns a KV migration payload, decode node RDMA-pulls the KV and generates the completion, then prefill KV blocks are released via POST /pd/free. - Forward kv_transfer_params to NanoDeploy serve nodes. - When the prefill node fully finishes a request locally (e.g. first token is EOS) it returns no migration payload; return that completion directly (with a streaming SSE fallback) instead of erroring. - nanoctrl discovery maps entity metadata.role -> EngineRole PREFILL/DECODE/HYBRID instead of always HYBRID. - Update backend contract and discovery tests accordingly. Co-authored-by: Cursor <cursoragent@cursor.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
nanodeploy serve) as a first-class DLRouter backend (--backend nanodeploy).BackendType.NANODEPLOYand thenanoctrlservice-discovery mode that polls a dlslime-ctrl entity registry fornanodeploynodes and reconciles their HTTP endpoints into the NodeManager (served-model-name, model-path, and basename aliases).--ctrl_addressis set; manualPOST /nodes/addstill works otherwise.Running NanoDeploy with DLRouter
1. Start the dlslime-ctrl control plane (only needed for auto-discovery)
2. Start the NanoDeploy OpenAI server
# inside the nanodeploy conda env nanodeploy serve /path/to/Qwen3-0.6B \ --host 0.0.0.0 --port 8100 \ --served-model-name Qwen3-0.6B \ --ctrl_address 127.0.0.1:4479Notes:
--model /path/to/...).--served-model-nameis the public model id; if omitted it defaults to the basename of the model path.--ctrl_addressenables self-registration + heartbeat to dlslime-ctrl. Omit it to run as a standalone HTTP server.Configfields (--ray_address,--tp, etc.) share the same names/semantics asengine_server.py.--host/--portbind the uvicorn HTTP API.3. Call NanoDeploy directly (bypass DLRouter, verify the server itself)
Other endpoints:
4. Call through DLRouter (end-to-end)
Without dlslime-ctrl, drop
--ctrl_addresson the DLRouter side and register the node manually:Test plan
pytest tests/core/test_nanoctrl_discovery.py tests/backends/test_backend_contracts.pynanodeploy serve+dlslime-ctrl+python -m dlrouter --backend nanodeploy --serving_strategy hybrid --ctrl_address 127.0.0.1:4479, then a/v1/chat/completionscurl (verified working manually).