feat: Enable PyTorch2 Batching Tests #8814
…-inference-server/server into mwittwer/enable_pytorch2_batching
Pull request overview
This PR expands the PyTorch AOTInductor (PT2 / torch_aoti) QA assets to support and validate batching, including dynamic batching behavior, plus adds new sequence-batching models and corresponding L0 test coverage.
Changes:
- Export AOTI models (simple add/sub + torchvision) with a dynamic batch dimension and configure `max_batch_size: 8` + dynamic batching.
- Add new AOTI batching-coverage models (variable non-batch dim, multi-instance) and AOTI sequence-batching models (including forward-interface + initial_state + negative-load variants).
- Extend `L0_torch_aoti` tests to cover batched inference, dynamic batching coalescing, multi-instance correctness, variable-shape batching, sequence scheduling, and negative load-failure checks.
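As a rough illustration of the first change, the sketch below renders a batch-enabled model configuration from a Python format string, in the style the QA generators appear to use. The helper name `render_batched_config`, the model name, the tensor names, and the dims are all hypothetical and are not taken from this PR; the backend field is left out because the source does not show it. Only `max_batch_size: 8` and the presence of a `dynamic_batching` block come from the summary above.

```python
# Hypothetical sketch: render a config.pbtxt fragment for a batched model.
# Tensor names, dims and the helper itself are illustrative assumptions.

CONFIG_TEMPLATE = """\
name: "{model_name}"
max_batch_size: {max_batch_size}
input [
  {{
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ {dim} ]
  }}
]
output [
  {{
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ {dim} ]
  }}
]
dynamic_batching {{ }}
"""


def render_batched_config(model_name, max_batch_size=8, dim=16):
    """Return config text; dims exclude the batch dimension."""
    if max_batch_size < 1:
        # Dynamic batching only makes sense when batching is enabled.
        raise ValueError("max_batch_size must be >= 1 for a batched model")
    return CONFIG_TEMPLATE.format(
        model_name=model_name, max_batch_size=max_batch_size, dim=dim
    )


if __name__ == "__main__":
    print(render_batched_config("example_aoti_addsub"))
```

Because `max_batch_size` is non-zero, the listed dims describe a single request without the leading batch dimension, and an empty `dynamic_batching` block enables the dynamic batcher with default settings.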
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| qa/L0_torch_aoti/torch_aoti_infer_test.py | Adds batched inference cases, dynamic batching coalescing checks, variable-shape and multi-instance coverage, and new sequence-batching tests. |
| qa/L0_torch_aoti/test.sh | Adds additional models to the repo setup, pulls in sequence models, and runs a new negative load-failure phase. |
| qa/common/gen_qa_models.py | Exports AOTI models with dynamic batch dims, sets max_batch_size, and adds new batching-coverage model generators. |
| qa/common/gen_qa_model_repository | Wires AOTI implicit-sequence model generation into the model repository build step. |
| qa/common/gen_qa_implicit_models.py | Implements AOTI sequence model + configs (including variants and negative configs) and adds a --torch-aoti flag. |
Comments suppressed due to low confidence (1)
qa/L0_torch_aoti/test.sh:197
- The redirection operator is incorrect (`&1>2`). This does not redirect output to stderr as intended; use `1>&2` so the test runner properly captures failures.
if [[ ${RET} -ne 0 ]]; then
echo -e "${COLOR_ERROR}\n***\n*** Test Suite FAILED\n***${COLOR_RESET}" &1>2
else
kill -s SIGINT ${SERVER_PID}
wait ${SERVER_PID} || true
fi
rm -rf ${BAD_MODELDIR}
Remove the model directories at the top of the test instead, so that they can be inspected after the test completes.
Updated and moved to the top of the tests:
https://github.com/triton-inference-server/server/pull/8814/changes#diff-97b7cf613201d3c908b9cb50d67db76ebf5bb026a737d67a8a805ba0fe653d9cR83-R85
max_sequence_idle_microseconds: 5000000
control_input [
{{
name: "INPUT__2"
We should probably also have a test that uses the other parameter naming schema.
Added `test_forward_interface_sequence` to cover the ARGS[...]/RESULT[...] schema.
@whoisj Should we test functionality of dynamic and sequence batching in L0_batcher and L0_sequence_batcher?
We should, but we should NOT block this PR because of it.
What does the PR do?
Makes the AOTI test models batch-capable and adds coverage:
- The simple add/sub model is exported with a dynamic batch dim and max_batch_size: 8.
- A new sequence (implicit-state accumulator) model + config is added to gen_qa_implicit_models.py and wired into gen_qa_model_repository.
- torch_aoti_infer_test.py gains batched inference cases (batch 1/4/8 across dtypes) plus a sequence test class (single + interleaved sequences), and test.sh is updated to pull and run them.
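To make the expected behavior of the sequence tests concrete, here is a minimal pure-Python reference for an implicit-state accumulator, assuming the model simply keeps a running sum per sequence. The class `AccumulatorReference` and the helper `run_interleaved` are hypothetical names for illustration; the actual model, its inputs, and the test code in this PR may differ.

```python
# Hypothetical reference model: running sum kept per sequence ID.
# This only mimics the idea of implicit state; it does not call Triton.
from dataclasses import dataclass, field


@dataclass
class AccumulatorReference:
    state: dict = field(default_factory=dict)

    def infer(self, sequence_id, value, start=False, end=False):
        if start:
            # A start flag resets the state for this sequence.
            self.state[sequence_id] = 0
        elif sequence_id not in self.state:
            raise KeyError(f"sequence {sequence_id} was never started")
        self.state[sequence_id] += value
        total = self.state[sequence_id]
        if end:
            # An end flag releases the state slot.
            del self.state[sequence_id]
        return total


def run_interleaved(model, sequences):
    """Send requests round-robin across sequences; collect outputs per ID."""
    outputs = {sid: [] for sid in sequences}
    longest = max((len(v) for v in sequences.values()), default=0)
    for step in range(longest):
        for sid, values in sequences.items():
            if step < len(values):
                out = model.infer(
                    sid,
                    values[step],
                    start=(step == 0),
                    end=(step == len(values) - 1),
                )
                outputs[sid].append(out)
    return outputs


if __name__ == "__main__":
    print(run_interleaved(AccumulatorReference(), {1: [1, 2, 3], 2: [10, 20]}))
```

An interleaved test passes only if each sequence's outputs match what it would produce when run alone, which is the property a per-sequence state slot is meant to guarantee.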
Checklist
<commit_type>: <Title>
Commit Type:
Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
triton-inference-server/pytorch_backend#196
Where should the reviewer start?
Test plan:
Caveats:
Background
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)