fix benchmark template by Dogacel · Pull Request #552 · sgl-project/SpecForge

Dogacel · 2026-04-30T04:47:07Z

Motivation

The chat templates of some models (e.g., gpt-oss, Qwen3) are not rendered correctly in SGLang, which causes the benchmarks to report incorrect accuracy.

This patch checks whether SGLang provides a chat template for the given model. If it does not, the benchmark applies the chat formatting itself using the HF tokenizer and sends the templated text directly to SGLang.

Modifications

Auto-detect the chat template format and override it if SGLang fails to detect it.
Fix the GSM8K benchmark, which was not applying the chat format correctly.

It also adds 3 new benchmarks:

SVAMP: Simple Variations on Arithmetic Math word Problems - https://github.com/arkilpatel/SVAMP
GSM1K: A GSM8K-style benchmark designed to measure overfitting to GSM8K.
Alpaca: Tatsu-lab's conversational fine-tuning dataset - https://huggingface.co/datasets/tatsu-lab/alpaca.
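The PR text does not show how answers are scored. For arithmetic word-problem sets such as GSM8K, SVAMP and GSM1K, a common approach, sketched here purely as an assumption, is to take the last number in the model's completion and compare it numerically with the reference answer. The helper names (`extract_last_number`, `is_correct`, `accuracy`) are hypothetical.

```python
import re
from typing import List, Optional

# Optional minus sign, digits with optional thousands separators, optional decimals.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_last_number(completion: str) -> Optional[float]:
    """Return the last number in the completion, ignoring thousands separators."""
    matches = _NUMBER.findall(completion)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def is_correct(completion: str, reference: float, tol: float = 1e-6) -> bool:
    """Hypothetical scorer: numeric match of the last number within a tolerance."""
    pred = extract_last_number(completion)
    return pred is not None and abs(pred - reference) <= tol


def accuracy(completions: List[str], references: List[float]) -> float:
    """Fraction of completions whose extracted answer matches the reference."""
    if not references:
        return 0.0
    hits = sum(is_correct(c, r) for c, r in zip(completions, references))
    return hits / len(references)
```

Because the last number is used, a completion that is badly formatted (for example, one produced without the correct chat template) can easily end on an unrelated number and be scored wrong.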

Related Issues

Fixes #551

Accuracy Test

Tested on 3 model families with different template formats: gpt-oss-20b, Qwen3-8B, and Llama-3.1-8B.

For any model whose accuracy was reported with the previous benchmark, verify that SGLang was able to auto-detect its chat template. Moreover, all models should re-report their GSM8K results, because the template was not applied correctly and this lowers the measured accuracy.

Checklist

Format your code according to Code Formatting with Pre-Commit.
Add unit tests as outlined in Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://sgl-fru7574.slack.com/archives/C09784E3EN6 to discuss your PR.

gemini-code-assist · 2026-04-30T04:47:10Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

fix benchmark template

86c528b

Dogacel requested a review from FrankLeeeee as a code owner April 30, 2026 04:47

Dogacel mentioned this pull request May 5, 2026 in:
[Bug] EAGLE3 Tree Decoding CUDA error: an illegal memory access was encountered sgl-project/sglang#24402 (Closed, 5 tasks)


Dogacel wants to merge 1 commit into sgl-project:main from Dogacel:fix-benchmark-template
