feat: Add streaming tool-call parse buffer limit to prevent excessive memory usage #8811
self.chat_template = load_chat_template(chat_template)

if self.tool_call_parser is not None:
    print(
Interesting, why print("[INFO] ...") instead of logger.info()?
Currently, openai_frontend does not configure the root logger in main, so logger output does not appear. I have therefore been using print statements, following the same approach as fastapi_frontend.py.
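The explanation above is consistent with Python's logging defaults: until something configures it, the root logger's level is WARNING, so logger.info() calls are dropped. The following is a minimal sketch, assuming main() can configure logging once before the frontends start. The helper name configure_logging and the format string are hypothetical and not part of the PR.

```python
import logging
import sys
from typing import Optional, TextIO


def configure_logging(level: int = logging.INFO, stream: Optional[TextIO] = None) -> None:
    """Attach a handler to the root logger so module-level loggers emit output.

    Hypothetical helper; force=True (Python 3.8+) replaces any handlers that
    are already attached to the root logger.
    """
    logging.basicConfig(
        level=level,
        format="%(levelname)s %(name)s: %(message)s",
        stream=stream if stream is not None else sys.stderr,
        force=True,
    )


if __name__ == "__main__":
    # Assumed call site: the top of the frontend's main().
    configure_logging()
    logging.getLogger(__name__).info("tool-call parser enabled")
```

Once the root logger has a handler and an INFO level, named loggers such as logging.getLogger(__name__) propagate to it, so the "[INFO]" prefix no longer needs to be hand-written in print calls.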
Do you mean the following lines don't work at all?
server/python/openai/openai_frontend/engine/triton_engine.py
Lines 757 to 763 in afee02f
    and len(previous_text) + len(delta_text)
    > self.max_tool_call_parse_bytes
):
    print(
Docs?
# streaming tool-call parser processes per request.
# Since the parser re-parses the entire buffer with each new chunk,
# this limit helps bound per-request CPU and memory usage.
DEFAULT_MAX_TOOL_CALL_PARSE_BYTES: int = 16 * 1024
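To make the mechanism described in the comment above concrete, here is a minimal sketch of a per-request capped buffer that re-parses the whole accumulated text on every chunk and stops once the cap would be exceeded. The class CappedToolCallBuffer and its feed() method are hypothetical, not the frontend's actual code. The parser is injected so the sketch runs without extra dependencies; in the real frontend it would be partial_json_parser.loads(), as named in the PR description. This sketch measures UTF-8 encoded size, while the reviewed condition compares len(previous_text) + len(delta_text). What the real frontend emits after truncation is not modeled here.

```python
from typing import Any, Callable, Optional

# Default taken from the diff above.
DEFAULT_MAX_TOOL_CALL_PARSE_BYTES: int = 16 * 1024


class CappedToolCallBuffer:
    """Hypothetical per-request buffer for streaming tool-call parsing."""

    def __init__(
        self,
        parse: Callable[[str], Any],
        max_bytes: int = DEFAULT_MAX_TOOL_CALL_PARSE_BYTES,
    ) -> None:
        self._parse = parse          # assumed: a tolerant (partial) JSON parser
        self._max_bytes = max_bytes
        self._text = ""
        self._size = 0              # UTF-8 length of self._text
        self.truncated = False

    def feed(self, delta_text: str) -> Optional[Any]:
        """Append one chunk; return the latest parse result, or None once truncated."""
        if self.truncated:
            return None
        delta_size = len(delta_text.encode("utf-8"))
        if self._size + delta_size > self._max_bytes:
            # Graceful truncation: this chunk and all later ones are not parsed.
            self.truncated = True
            return None
        self._text += delta_text
        self._size += delta_size
        # Re-parse the entire buffer, which is the cost the cap bounds.
        return self._parse(self._text)
```

Because the check happens before the buffer grows, the parser never sees more than max_bytes of text for a single request, regardless of how long the model keeps generating.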
Why 16 KiB? Can this limit be bigger?
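One rough way to reason about that question, offered as an illustration rather than the author's rationale: if the whole buffer is re-parsed after every chunk, total parse work grows with the square of the cap. The sketch assumes fixed 16-byte chunks and parse cost proportional to input length; both are simplifying assumptions, since real chunk sizes depend on the tokenizer and detokenizer.

```python
def total_bytes_reparsed(limit_bytes: int, chunk_bytes: int) -> int:
    """Bytes fed to the parser when the full buffer is re-parsed after every
    fixed-size chunk, until the buffer reaches limit_bytes."""
    n = limit_bytes // chunk_bytes  # chunks accepted before the cap is hit
    # Chunk i (1-based) triggers a parse of i * chunk_bytes bytes,
    # so the total is chunk_bytes * (1 + 2 + ... + n).
    return chunk_bytes * n * (n + 1) // 2


for limit in (16 * 1024, 64 * 1024, 256 * 1024):
    print(f"{limit // 1024:>4} KiB cap -> {total_bytes_reparsed(limit, 16):,} bytes re-parsed")
```

Under these assumptions a 16 KiB cap means about 8.4 MB of cumulative re-parsing per request, and a 64 KiB cap about 134 MB. Quadrupling the limit therefore costs roughly 16 times more parse work, which is why raising it is not free even when memory is plentiful.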
…tils.py Co-authored-by: Yingge He <157551214+yinggeh@users.noreply.github.com>
…oolpparsing-can-oom-kill' of https://github.com/triton-inference-server/server into spolisetty/tri-1016-psirt-triton-openai-frontend-auto-toolpparsing-can-oom-kill
…tend-auto-toolpparsing-can-oom-kill
Sorry, I missed committing the README changes earlier. The documentation has now been updated.
What does the PR do?
The streaming tool-call parser (partial_json_parser.loads()) re-parses the full accumulated output on every chunk, so total parsing work grows roughly quadratically with output length, causing excessive CPU and memory growth for large tool-call arguments. This PR adds a configurable per-request buffer cap, --max-tool-call-parse-bytes, that gracefully truncates the stream when exceeded.

Checklist
<commit_type>: <Title>
Commit Type:
Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
Where should the reviewer start?
Test plan:
Caveats:
Background
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)