feat: Add streaming tool-call parse buffer limit to prevent excessive memory usage by pskiran1 · Pull Request #8811 · triton-inference-server/server

pskiran1 · 2026-05-31T16:09:39Z

What does the PR do?

The streaming tool-call parser (partial_json_parser.loads()) re-parses the full accumulated output on every chunk, so CPU and memory use grow excessively for large tool-call arguments. This PR adds a configurable per-request buffer cap, --max-tool-call-parse-bytes, that truncates the stream gracefully when the cap is exceeded.
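As a rough illustration of the cap described above, the following sketch stops a stream once the accumulated text would pass a limit. It is not the PR's implementation: `stream_with_parse_cap` and the module-level constant are hypothetical names, and the only parts taken from the PR are the shape of the size check and the 16 KiB default.

```python
from typing import Iterable, Iterator

# Hypothetical constant, mirroring the 16 KiB default shown in this PR's diff.
MAX_TOOL_CALL_PARSE_BYTES = 16 * 1024


def stream_with_parse_cap(
    chunks: Iterable[str], max_parse_bytes: int = MAX_TOOL_CALL_PARSE_BYTES
) -> Iterator[str]:
    """Yield chunks until the accumulated text would exceed the cap."""
    previous_text = ""
    for delta_text in chunks:
        # Same shape as the check in the diff: compare the size the buffer
        # would reach after appending this chunk against the limit.
        if len(previous_text) + len(delta_text) > max_parse_bytes:
            # Stop here instead of re-parsing an ever-growing buffer.
            return
        previous_text += delta_text
        yield delta_text


emitted = list(
    stream_with_parse_cap(["a" * 10, "b" * 10, "c" * 10], max_parse_bytes=25)
)
print(len(emitted))  # 2: the third chunk would push the buffer to 30
```

In a real server the truncation point would also need to finish the response cleanly and cancel the in-flight inference; the sketch leaves both out.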

Checklist

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

Related PRs:

Where should the reviewer start?

Test plan:

CI Pipeline ID: 53226753

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

whoisj

LGTM. I did have a couple of non-blocking questions.

Would be good if we could get @yinggeh to review this as well, but please merge by EoD Friday even if he's not able to get a review completed by then.

whoisj · 2026-06-04T18:59:21Z

        self.chat_template = load_chat_template(chat_template)

+        if self.tool_call_parser is not None:
+            print(


interesting, why print("[INFO] ...") instead of logger.info()?

pskiran1 · Jun 5, 2026

Currently, in openai_frontend, the root logger is not configured in main, so logging does not appear to work. I have therefore been using print statements, similar to the approach in fastapi_frontend.py.

yinggeh · Jun 9, 2026

You mean the following lines don't work at all?

server/python/openai/openai_frontend/engine/triton_engine.py, lines 757 to 763 in afee02f:

    except Exception:
        logger.debug(
            "Failed to cancel inference after tool-call parse "
            "truncation (request %s)",
            request_id,
            exc_info=True,
        )
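For context on this thread, here is a minimal sketch of standard-library logging behaviour, not of the openai_frontend code itself. In a fresh interpreter the root logger defaults to WARNING, so `logger.info()` and `logger.debug()` records are dropped until something configures logging. The logger name and message text below are made up for the demonstration.

```python
import io
import logging

logger = logging.getLogger("openai_frontend_demo")  # hypothetical logger name

# In a fresh interpreter the root logger's level is WARNING, so INFO and
# DEBUG records are filtered out before any handler sees them.
enabled_before = logger.isEnabledFor(logging.INFO)

# Configure the root logger once at startup. force=True (Python 3.8+)
# replaces any handlers that were installed earlier.
stream = io.StringIO()
logging.basicConfig(
    level=logging.INFO,
    stream=stream,
    format="[%(levelname)s] %(message)s",
    force=True,
)

logger.info("tool-call parser enabled")
logger.debug("still dropped at INFO level")
enabled_after = logger.isEnabledFor(logging.INFO)

print(stream.getvalue().strip())
```

Even with this configuration, the `logger.debug(...)` call quoted above would stay silent unless the level were lowered to `logging.DEBUG`.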

whoisj · 2026-06-04T18:59:33Z

+                and len(previous_text) + len(delta_text)
+                > self.max_tool_call_parse_bytes
+            ):
+                print(


better as logger.warning()?
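One detail about the check in this excerpt: if `previous_text` and `delta_text` are `str` objects (an assumption, since the diff does not show their types), `len()` counts Unicode code points rather than bytes. The sketch below uses hypothetical helpers, `utf8_size` and `would_exceed`, to show how the two measures diverge for non-ASCII text.

```python
def utf8_size(text: str) -> int:
    """Size of text in bytes once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def would_exceed(previous_text: str, delta_text: str, max_bytes: int) -> bool:
    """Byte-based variant of the check in the diff (hypothetical helper)."""
    return utf8_size(previous_text) + utf8_size(delta_text) > max_bytes


sample = "na\u00efve \u65e5\u672c\u8a9e"  # 9 characters, 16 UTF-8 bytes
print(len(sample), utf8_size(sample))

# A limit of 10 passes when counting characters but not when counting bytes.
print(len(sample) > 10, would_exceed("", sample, 10))
```

Encoding on every chunk has a cost of its own, so a character count may be a deliberate approximation; the point is only that the two measures can differ.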

yinggeh · 2026-06-05T09:42:52Z

Docs?

yinggeh · 2026-06-05T09:46:56Z

+# streaming tool-call parser processes per request.
+# Since the parser re-parses the entire buffer with each new chunk,
+# this limit helps bound per-request CPU and memory usage.
+DEFAULT_MAX_TOOL_CALL_PARSE_BYTES: int = 16 * 1024


Why 16 KiB? Can this limit be bigger?
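To give the size question some numbers, the arithmetic below estimates how much text is scanned in total when the whole buffer is re-parsed after every chunk. The 16-character chunk size is an arbitrary assumption and `total_reparsed` is a hypothetical helper. Real cost depends on the parser and the token sizes.

```python
def total_reparsed(total_size: int, chunk_size: int) -> int:
    """Characters scanned if the full buffer is re-parsed after each chunk."""
    scanned = 0
    buffered = 0
    while buffered < total_size:
        buffered = min(buffered + chunk_size, total_size)
        scanned += buffered  # one full pass over everything received so far
    return scanned


for size_kib in (16, 64, 1024):
    size = size_kib * 1024
    print(f"{size_kib} KiB buffer -> {total_reparsed(size, chunk_size=16)} scanned")
```

Total work grows roughly with the square of the buffer size: under these assumptions, raising the cap from 16 KiB to 64 KiB multiplies the scanned volume by about 16, not 4.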


pskiran1 · 2026-06-05T17:05:16Z

Docs?

Sorry, I missed committing the README changes earlier. The documentation has now been updated.
Thank you.


pskiran1 added 5 commits May 31, 2026 10:47

Update

baa3e0f

Update

dd6d54a

Update

d4a894d

Update

dea81e5

Fix pre-commit errors

a98e846

pskiran1 requested review from mattwittwer, whoisj and yinggeh June 1, 2026 03:29

pskiran1 added the PR: feat A new feature label Jun 1, 2026

whoisj previously approved these changes Jun 4, 2026


yinggeh reviewed Jun 5, 2026


Comment thread python/openai/openai_frontend/engine/triton_engine.py Outdated

Update python/openai/openai_frontend/engine/utils/tool_call_parsers/u…

bc9227c

…tils.py Co-authored-by: Yingge He <157551214+yinggeh@users.noreply.github.com>

pskiran1 dismissed whoisj’s stale review via bc9227c June 5, 2026 11:20

pskiran1 added 8 commits June 5, 2026 16:51

Update

bf646c6

Merge branch 'spolisetty/tri-1016-psirt-triton-openai-frontend-auto-t…

ec0faff

…oolpparsing-can-oom-kill' of https://github.com/triton-inference-server/server into spolisetty/tri-1016-psirt-triton-openai-frontend-auto-toolpparsing-can-oom-kill

Merge branch 'main' into spolisetty/tri-1016-psirt-triton-openai-fron…

328eb9f

…tend-auto-toolpparsing-can-oom-kill

Update

ff0f19a

Update

8fb93db

Fix pre-commit

52c477b

Fix pre-commit

2bc32ba

Add doc

5ec9ea7

pskiran1 requested a review from yinggeh June 5, 2026 17:23

Merge branch 'main' into spolisetty/tri-1016-psirt-triton-openai-fron…

afee02f

…tend-auto-toolpparsing-can-oom-kill

pskiran1 wants to merge 15 commits into main from spolisetty/tri-1016-psirt-triton-openai-frontend-auto-toolpparsing-can-oom-kill

3 participants
