Skip to content

Mtp optimization ema overlap#1310

Open
flyinglandlord wants to merge 3 commits into
mtp_optimizationfrom
mtp_optimization_ema_overlap
Open

Mtp optimization ema overlap#1310
flyinglandlord wants to merge 3 commits into
mtp_optimizationfrom
mtp_optimization_ema_overlap

Conversation

@flyinglandlord
Copy link
Copy Markdown
Collaborator

No description provided.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates benchmarking and speculative decoding scripts, notably adding SSE stream parsing logic to benchmark_sharegpt.py and introducing a vLLM baseline script. Review feedback identifies a critical bug where the stream parameter was disabled while the logic depends on it, along with concerns regarding non-portable hardcoded paths, potential UTF-8 decoding issues in the stream buffer, and the use of SIGKILL for process termination. Corrections were also requested for a backend mismatch in the newly added no_mtp_fa3.sh script.

"top_p": 1.0,
"temperature": 0,
"stream": True,
# "stream": True,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The stream parameter is commented out, but the logic added in lines 240-261 is specifically designed to parse an SSE (Server-Sent Events) stream. If streaming is disabled, the server will return a standard JSON response, and the parsing loop will fail to extract any content because it expects lines starting with data:. This will likely result in empty output and incorrect benchmark metrics.

Suggested change
# "stream": True,
"stream": True,

chunks.append(delta_time)
# OpenAI-compatible stream is SSE; one TCP chunk may contain
# partial/multiple events. Parse by complete lines safely.
sse_buffer += chunk.decode("utf-8", errors="ignore")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Decoding individual chunks with errors="ignore" can lead to data corruption if a multi-byte UTF-8 character is split across TCP chunks. A more robust approach is to use an IncrementalDecoder from the codecs module to handle partial characters correctly.

Comment on lines +200 to +201
pkill -9 -f "vllm serve" 2>/dev/null || true
pkill -9 -f "vllm.entrypoints.openai.api_server" 2>/dev/null || true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using pkill -9 (SIGKILL) is generally discouraged as it prevents processes from performing necessary cleanup (e.g., releasing GPU memory or deleting temporary files). It is better to use the default SIGTERM first.

Suggested change
pkill -9 -f "vllm serve" 2>/dev/null || true
pkill -9 -f "vllm.entrypoints.openai.api_server" 2>/dev/null || true
pkill -f "vllm serve" 2>/dev/null || true
pkill -f "vllm.entrypoints.openai.api_server" 2>/dev/null || true

--model_dir ${MODEL_DIR} \
--disable_dynamic_prompt_cache \
--graph_grow_step_size 1 \
--llm_decode_att_backend triton No newline at end of file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is a mismatch between the filename (no_mtp_fa3.sh) and the configured attention backend (triton). If the intention of this script is to test the FlashAttention 3 backend, this parameter should be updated.

Suggested change
--llm_decode_att_backend triton
--llm_decode_att_backend fa3

Comment on lines +19 to +21
PATH=/data/nvme0/chenjunyi/miniconda3/envs/lightllm/bin:$PATH

LOADWORKER=18 /data/nvme0/chenjunyi/miniconda3/envs/lightllm/bin/python -m lightllm.server.api_server --port 8088 \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The script contains hardcoded absolute paths to a specific user's conda environment (/data/nvme0/chenjunyi/...). This makes the script non-portable and difficult for other developers to use. Consider using environment variables or assuming the environment is already activated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant