Mtp optimization ema overlap #1310
Conversation
Code Review
This pull request updates benchmarking and speculative decoding scripts, notably adding SSE stream parsing logic to benchmark_sharegpt.py and introducing a vLLM baseline script. Review feedback identifies a critical bug where the stream parameter was disabled while the logic depends on it, along with concerns regarding non-portable hardcoded paths, potential UTF-8 decoding issues in the stream buffer, and the use of SIGKILL for process termination. Corrections were also requested for a backend mismatch in the newly added no_mtp_fa3.sh script.
| "top_p": 1.0, | ||
| "temperature": 0, | ||
| "stream": True, | ||
| # "stream": True, |
The `stream` parameter is commented out, but the logic added in lines 240-261 is specifically designed to parse an SSE (Server-Sent Events) stream. If streaming is disabled, the server returns a standard JSON response, and the parsing loop will fail to extract any content because it expects lines starting with `data:`. This will likely result in empty output and incorrect benchmark metrics.
| # "stream": True, | |
| "stream": True, |
```python
chunks.append(delta_time)
# OpenAI-compatible stream is SSE; one TCP chunk may contain
# partial/multiple events. Parse by complete lines safely.
sse_buffer += chunk.decode("utf-8", errors="ignore")
```
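On the UTF-8 concern the review summary raises: `decode("utf-8", errors="ignore")` silently drops bytes whenever a multi-byte character straddles a TCP chunk boundary. A minimal sketch of the usual fix with an incremental decoder, reusing `resp` from the sketch above (names are illustrative):

```python
import codecs

# An incremental decoder holds back an incomplete trailing multi-byte
# sequence and finishes it when the next chunk arrives, instead of
# silently dropping those bytes as errors="ignore" does.
decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

sse_buffer = ""
for chunk in resp.iter_content(chunk_size=None):
    sse_buffer += decoder.decode(chunk)
    # ...line-buffered SSE parsing as above...
```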
```bash
pkill -9 -f "vllm serve" 2>/dev/null || true
pkill -9 -f "vllm.entrypoints.openai.api_server" 2>/dev/null || true
```
Using `pkill -9` (SIGKILL) is generally discouraged, as it prevents processes from performing necessary cleanup (e.g., releasing GPU memory or deleting temporary files). It is better to send the default SIGTERM first.
Suggested change:
```diff
-pkill -9 -f "vllm serve" 2>/dev/null || true
-pkill -9 -f "vllm.entrypoints.openai.api_server" 2>/dev/null || true
+pkill -f "vllm serve" 2>/dev/null || true
+pkill -f "vllm.entrypoints.openai.api_server" 2>/dev/null || true
```
```bash
    --model_dir ${MODEL_DIR} \
    --disable_dynamic_prompt_cache \
    --graph_grow_step_size 1 \
    --llm_decode_att_backend triton
```
The script is named no_mtp_fa3.sh, yet it sets `--llm_decode_att_backend triton`; the backend does not match the FA3 configuration the script name implies and should be corrected.
```bash
PATH=/data/nvme0/chenjunyi/miniconda3/envs/lightllm/bin:$PATH

LOADWORKER=18 /data/nvme0/chenjunyi/miniconda3/envs/lightllm/bin/python -m lightllm.server.api_server --port 8088 \
```
These hardcoded, user-specific conda paths (`/data/nvme0/chenjunyi/...`) make the script non-portable; the interpreter should be resolved from the environment instead.
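A minimal sketch of one portable alternative, assuming the script runs inside an activated conda environment; `PYTHON_BIN` is a hypothetical override knob, and `MODEL_DIR` is borrowed from the other script in this PR:

```bash
# PYTHON_BIN is a hypothetical override; by default, resolve the
# interpreter from the active (e.g., conda) environment instead of a
# user-specific absolute path.
PYTHON_BIN="${PYTHON_BIN:-$(command -v python)}"

LOADWORKER=18 "${PYTHON_BIN}" -m lightllm.server.api_server --port 8088 \
    --model_dir "${MODEL_DIR}"
```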