Mtp optimization ema overlap #1310
Conversation
Code Review
This pull request updates benchmarking and speculative decoding scripts, notably adding SSE stream parsing logic to benchmark_sharegpt.py and introducing a vLLM baseline script. Review feedback identifies a critical bug where the stream parameter was disabled while the logic depends on it, along with concerns regarding non-portable hardcoded paths, potential UTF-8 decoding issues in the stream buffer, and the use of SIGKILL for process termination. Corrections were also requested for a backend mismatch in the newly added no_mtp_fa3.sh script.
| "top_p": 1.0, | ||
| "temperature": 0, | ||
| "stream": True, | ||
| # "stream": True, |
The `stream` parameter is commented out, but the logic added in lines 240-261 is specifically designed to parse an SSE (Server-Sent Events) stream. If streaming is disabled, the server returns a standard JSON response, and the parsing loop will fail to extract any content because it expects lines starting with `data:`. This will likely result in empty output and incorrect benchmark metrics.
| # "stream": True, | |
| "stream": True, |
```python
chunks.append(delta_time)
# OpenAI-compatible stream is SSE; one TCP chunk may contain
# partial/multiple events. Parse by complete lines safely.
sse_buffer += chunk.decode("utf-8", errors="ignore")
```
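On the UTF-8 concern the review summary raises: `decode("utf-8", errors="ignore")` silently drops bytes whenever a multi-byte character straddles a TCP chunk boundary. A minimal sketch of the usual fix with an incremental decoder, reusing `resp` from the sketch above (names are illustrative):

```python
import codecs

# An incremental decoder holds back an incomplete trailing multi-byte
# sequence and finishes it when the next chunk arrives, instead of
# silently dropping those bytes as errors="ignore" does.
decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

sse_buffer = ""
for chunk in resp.iter_content(chunk_size=None):
    sse_buffer += decoder.decode(chunk)
    # ...line-buffered SSE parsing as above...
```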
```bash
pkill -9 -f "vllm serve" 2>/dev/null || true
pkill -9 -f "vllm.entrypoints.openai.api_server" 2>/dev/null || true
```
Using `pkill -9` (SIGKILL) is generally discouraged, as it prevents processes from performing necessary cleanup (e.g., releasing GPU memory or deleting temporary files). It is better to send the default SIGTERM first.
Suggested change:
```diff
-pkill -9 -f "vllm serve" 2>/dev/null || true
-pkill -9 -f "vllm.entrypoints.openai.api_server" 2>/dev/null || true
+pkill -f "vllm serve" 2>/dev/null || true
+pkill -f "vllm.entrypoints.openai.api_server" 2>/dev/null || true
```
```bash
    --model_dir ${MODEL_DIR} \
    --disable_dynamic_prompt_cache \
    --graph_grow_step_size 1 \
    --llm_decode_att_backend triton
```
The script is named no_mtp_fa3.sh, yet it sets `--llm_decode_att_backend triton`; the backend does not match the FA3 configuration the script name implies and should be corrected.
```bash
PATH=/data/nvme0/chenjunyi/miniconda3/envs/lightllm/bin:$PATH

LOADWORKER=18 /data/nvme0/chenjunyi/miniconda3/envs/lightllm/bin/python -m lightllm.server.api_server --port 8088 \
```
These hardcoded, user-specific conda paths (`/data/nvme0/chenjunyi/...`) make the script non-portable; the interpreter should be resolved from the environment instead.
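A minimal sketch of one portable alternative, assuming the script runs inside an activated conda environment; `PYTHON_BIN` is a hypothetical override knob, and `MODEL_DIR` is borrowed from the other script in this PR:

```bash
# PYTHON_BIN is a hypothetical override; by default, resolve the
# interpreter from the active (e.g., conda) environment instead of a
# user-specific absolute path.
PYTHON_BIN="${PYTHON_BIN:-$(command -v python)}"

LOADWORKER=18 "${PYTHON_BIN}" -m lightllm.server.api_server --port 8088 \
    --model_dir "${MODEL_DIR}"
```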