Fix VLLM inference HTTP connection limits#186
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces configurable connection limits for OpenAI clients using httpx.Limits and adds a fail_soft error handling mode to the robotics reward_score function to gracefully capture and log inference failures. Feedback was provided to improve the exception chain traversal logic in _exception_chain so that it correctly respects Python's exception context suppression when __cause__ is explicitly set to None.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| current: BaseException | None = exc | ||
| while current is not None and id(current) not in seen: | ||
| seen.add(id(current)) | ||
| chain.append( | ||
| { | ||
| "type": type(current).__name__, | ||
| "message": _truncate_debug_text(str(current)), | ||
| } | ||
| ) | ||
| current = current.__cause__ or current.__context__ |
There was a problem hiding this comment.
In _exception_chain, the traversal logic current = current.__cause__ or current.__context__ does not respect Python's exception context suppression (i.e., raise ... from None). When __cause__ is explicitly set to None, __suppress_context__ is set to True, but the current implementation will still fall back to __context__ because None is falsy.
To correctly respect context suppression, we should check __suppress_context__ when __cause__ is None.
| current: BaseException | None = exc | |
| while current is not None and id(current) not in seen: | |
| seen.add(id(current)) | |
| chain.append( | |
| { | |
| "type": type(current).__name__, | |
| "message": _truncate_debug_text(str(current)), | |
| } | |
| ) | |
| current = current.__cause__ or current.__context__ | |
| current: BaseException | None = exc | |
| while current is not None and id(current) not in seen: | |
| seen.add(id(current)) | |
| chain.append( | |
| { | |
| "type": type(current).__name__, | |
| "message": _truncate_debug_text(str(current)), | |
| } | |
| ) | |
| if current.__cause__ is not None: | |
| current = current.__cause__ | |
| elif getattr(current, "__suppress_context__", False): | |
| current = None | |
| else: | |
| current = current.__context__ |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3f9c5e738b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".
| error_column: str = "robometer_error", | ||
| debug_column: str = "robometer_debug", | ||
| max_concurrent_requests: int = 256, | ||
| fail_soft: bool = True, |
There was a problem hiding this comment.
Keep API failures fail-fast by default
With fail_soft defaulting to true, any InferenceAPICallError or InferenceRetryError from the pooling endpoint is converted into empty reward_score and robometer_success lists. In environments where the Robometer/vLLM endpoint is misconfigured or returns non-transient errors such as 400/404/invalid JSON, the pipeline now silently marks every episode as unscored instead of failing, which can corrupt downstream experiment metrics unless callers know to opt out with fail_soft=False.
Useful? React with 👍 / 👎.
Co-authored-by: Codex <codex@openai.com>
3f9c5e7 to
7e321a8
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7e321a88fe
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".
Co-authored-by: OpenAI Codex <codex@openai.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 798cc15649
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".
| session = aiohttp.ClientSession( | ||
| connector=connector, | ||
| headers=dict(self.headers), | ||
| timeout=aiohttp.ClientTimeout(total=_ENDPOINT_TIMEOUT_SECONDS), | ||
| ) |
There was a problem hiding this comment.
Preserve environment proxy support for endpoint clients
In deployments that reach OpenAI-compatible endpoints through HTTP_PROXY/HTTPS_PROXY, this new aiohttp.ClientSession does not opt into environment proxy settings (trust_env=True). The previous httpx.AsyncClient honored those env settings by default, so after this switch the same requests bypass the configured proxy and fail with connection/timeouts even though the environment is configured correctly. Please enable trust_env or otherwise plumb proxy configuration into the session.
Useful? React with 👍 / 👎.
Co-authored-by: OpenAI Codex <codex@openai.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c47bbbde04
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".
| response = requests.request( | ||
| method="PUT", | ||
| url=url, | ||
| headers=required_headers, | ||
| content=payload_bytes, |
There was a problem hiding this comment.
Use requests' data argument for uploads
When a cloud run needs to upload a NEW file, this now calls requests.request(..., content=payload_bytes), but requests does not accept a content keyword (the body parameter is data). That raises a TypeError before any PUT is sent, and it is not caught by the requests.RequestException handler, so cloud file uploads fail instead of uploading the payload.
Useful? React with 👍 / 👎.
Co-authored-by: OpenAI Codex <codex@openai.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9149c88670
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".
| return await _handle_json_response( | ||
| response, | ||
| url=_request_url(client, endpoint_path), | ||
| request_body=dict(payload), |
There was a problem hiding this comment.
Handle aiohttp body read failures as retryable
With aiohttp, client.post() returns after the response headers, so the actual body read happens inside _handle_json_response() via response.json()/response.text() outside this try. If an endpoint drops the connection or times out while streaming the JSON body, aiohttp raises errors such as ClientPayloadError/timeout from this line; _retry() then treats them as non-InferenceAPICallError and fails on the first attempt instead of retrying a transient transport failure. Wrap the body parsing in the same transport-error conversion path or catch these aiohttp read errors in _handle_json_response().
Useful? React with 👍 / 👎.
Co-authored-by: OpenAI Codex <codex@openai.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: abe4f760cc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".
| "hf://datasets/lerobot/libero", | ||
| ) | ||
| OUTPUT_ROOT = os.environ.get( | ||
| "ROBOMETER_OUTPUT_ROOT", |
There was a problem hiding this comment.
Default Robometer example to row task metadata
When ROBOMETER_TASK is unset, this example still passes the generic string "complete the robot manipulation task" into reward_score, so _resolve_task_text never uses each LeRobotRow.tasks value. On multi-task inputs such as the default lerobot/libero dataset, every episode is scored against the same vague instruction rather than its actual task, which can produce misleading reward/success columns for users who run the example as-is; leave this as None unless the environment variable is explicitly set.
Useful? React with 👍 / 👎.
This reverts commit 8d06a03.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 035f88f577
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".
| output = f"{OUTPUT_ROOT}/{stamp}" | ||
|
|
||
| ( | ||
| mdr.read_lerobot(INPUT_DATASET, num_shards=NUM_SHARDS) |
There was a problem hiding this comment.
Add docs for the Robometer workflow
This adds a user-visible Robometer scoring workflow, but the commit has no corresponding docs/ update; /workspace/refiner/AGENTS.md requires that “Any new feature ... must include corresponding doc updates”. Without a Mintlify docs page, users have to infer the environment variables, cloud launch behavior, and expected output columns from this example script instead of the documented workflow, so please add the matching docs in the same change set.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 035f88f577
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".
| output = f"{OUTPUT_ROOT}/{stamp}" | ||
|
|
||
| ( | ||
| mdr.read_lerobot(INPUT_DATASET, num_shards=NUM_SHARDS) |
There was a problem hiding this comment.
Add docs for the Robometer workflow
This adds a user-visible Robometer scoring workflow, but the commit has no corresponding docs/ update; /workspace/refiner/AGENTS.md requires that “Any new feature ... must include corresponding doc updates”. Without a Mintlify docs page, users have to infer the environment variables, cloud launch behavior, and expected output columns from this example script instead of the documented workflow, so please add the matching docs in the same change set.
Useful? React with 👍 / 👎.
Summary
Root cause
Refiner's max_concurrent_requests only controlled an asyncio semaphore. The underlying httpx.AsyncClient was still using httpx defaults: max_connections=100 and max_keepalive_connections=20. A Robometer run configured for 256/512 concurrent requests could therefore queue hundreds of large multimodal payloads inside the client/runtime instead of actually sending them to vLLM.
There was a second bug in the retry layer: httpx.RemoteProtocolError is an httpx.TransportError, but not an httpx.NetworkError, so Modal tunnel disconnects could bypass Refiner's retry wrapper and fail the worker directly.
With Robometer-sized payloads, the 512-request Modal repro generated about 1.08 GB of serialized JSON, so low-memory workers can still be a separate resource issue if max_in_flight is set very high.
Duplicate check
I checked the open Refiner PRs for concurrency/httpx/VLLM/pooling work. PR #139 is related to adaptive inference rate limiting, but it targets older text-generation runtime code and does not wire generate_pooling/VLLM httpx connection limits or transport-error wrapping.
Verification
AI assistance was used for this change.