Skip to content

Fix VLLM inference HTTP connection limits#186

Merged
Hynek Kydlíček (hynky1999) merged 41 commits into
mainfrom
codex/vllm-http-connection-limits
Jun 3, 2026
Merged

Fix VLLM inference HTTP connection limits#186
Hynek Kydlíček (hynky1999) merged 41 commits into
mainfrom
codex/vllm-http-connection-limits

Conversation

@hynky1999

@hynky1999 Hynek Kydlíček (hynky1999) commented Jun 2, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • wire OpenAI-compatible httpx connection limits to Refiner's max_concurrent_requests setting
  • apply the same limit wiring to VLLM runtime-service clients used by generate_pooling / Robometer
  • retry/wrap all httpx transport errors, including RemoteProtocolError from Modal tunnels
  • make Robometer inference transport failures fail-soft with bounded debug columns instead of killing the whole run after retries are exhausted

Root cause

Refiner's max_concurrent_requests only controlled an asyncio semaphore. The underlying httpx.AsyncClient was still using httpx defaults: max_connections=100 and max_keepalive_connections=20. A Robometer run configured for 256/512 concurrent requests could therefore queue hundreds of large multimodal payloads inside the client/runtime instead of actually sending them to vLLM.

There was a second bug in the retry layer: httpx.RemoteProtocolError is an httpx.TransportError, but not an httpx.NetworkError, so Modal tunnel disconnects could bypass Refiner's retry wrapper and fail the worker directly.

With Robometer-sized payloads, the 512-request Modal repro generated about 1.08 GB of serialized JSON, so low-memory workers can still be a separate resource issue if max_in_flight is set very high.

Duplicate check

I checked the open Refiner PRs for concurrency/httpx/VLLM/pooling work. PR #139 is related to adaptive inference rate limiting, but it targets older text-generation runtime code and does not wire generate_pooling/VLLM httpx connection limits or transport-error wrapping.

Verification

  • .venv/bin/python -m pytest tests/inference/test_generate_pooling.py tests/inference/test_transport.py tests/robotics/test_reward.py -q
  • uv run ruff check src/refiner/inference/internal/transport.py src/refiner/inference/providers/openai.py src/refiner/inference/internal/runtime.py src/refiner/robotics/reward.py tests/inference/test_generate_pooling.py tests/inference/test_transport.py tests/robotics/test_reward.py
  • commit hooks: ruff, ruff format, ty
  • Modal direct vLLM tunnel repro: 512/512 Robometer /pooling requests succeeded with explicit httpx limits; app ap-mXRNao7oX2D3TLZ8maN8j5
  • Macrodata dev Refiner cloud repro: https://dev.macrodata.co/jobs/macrodata/019e89cb-0c3f-77e0-9315-dcb817961b61 completed; stage 0 used async_map.max_in_flight=512, max_concurrent_requests=512, max in_flight=512, waiting_requests=0, successful_requests=588

AI assistance was used for this change.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces configurable connection limits for OpenAI clients using httpx.Limits and adds a fail_soft error handling mode to the robotics reward_score function to gracefully capture and log inference failures. Feedback was provided to improve the exception chain traversal logic in _exception_chain so that it correctly respects Python's exception context suppression when __cause__ is explicitly set to None.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/refiner/robotics/reward.py Outdated
Comment on lines +243 to +252
current: BaseException | None = exc
while current is not None and id(current) not in seen:
seen.add(id(current))
chain.append(
{
"type": type(current).__name__,
"message": _truncate_debug_text(str(current)),
}
)
current = current.__cause__ or current.__context__

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In _exception_chain, the traversal logic current = current.__cause__ or current.__context__ does not respect Python's exception context suppression (i.e., raise ... from None). When __cause__ is explicitly set to None, __suppress_context__ is set to True, but the current implementation will still fall back to __context__ because None is falsy.

To correctly respect context suppression, we should check __suppress_context__ when __cause__ is None.

Suggested change
current: BaseException | None = exc
while current is not None and id(current) not in seen:
seen.add(id(current))
chain.append(
{
"type": type(current).__name__,
"message": _truncate_debug_text(str(current)),
}
)
current = current.__cause__ or current.__context__
current: BaseException | None = exc
while current is not None and id(current) not in seen:
seen.add(id(current))
chain.append(
{
"type": type(current).__name__,
"message": _truncate_debug_text(str(current)),
}
)
if current.__cause__ is not None:
current = current.__cause__
elif getattr(current, "__suppress_context__", False):
current = None
else:
current = current.__context__

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3f9c5e738b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

Comment thread src/refiner/robotics/reward.py Outdated
error_column: str = "robometer_error",
debug_column: str = "robometer_debug",
max_concurrent_requests: int = 256,
fail_soft: bool = True,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep API failures fail-fast by default

With fail_soft defaulting to true, any InferenceAPICallError or InferenceRetryError from the pooling endpoint is converted into empty reward_score and robometer_success lists. In environments where the Robometer/vLLM endpoint is misconfigured or returns non-transient errors such as 400/404/invalid JSON, the pipeline now silently marks every episode as unscored instead of failing, which can corrupt downstream experiment metrics unless callers know to opt out with fail_soft=False.

Useful? React with 👍 / 👎.

Co-authored-by: Codex <codex@openai.com>
@hynky1999 Hynek Kydlíček (hynky1999) force-pushed the codex/vllm-http-connection-limits branch from 3f9c5e7 to 7e321a8 Compare June 2, 2026 19:23

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7e321a88fe

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

Comment thread src/refiner/robotics/reward.py
Co-authored-by: OpenAI Codex <codex@openai.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 798cc15649

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

Comment on lines +98 to +102
session = aiohttp.ClientSession(
connector=connector,
headers=dict(self.headers),
timeout=aiohttp.ClientTimeout(total=_ENDPOINT_TIMEOUT_SECONDS),
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve environment proxy support for endpoint clients

In deployments that reach OpenAI-compatible endpoints through HTTP_PROXY/HTTPS_PROXY, this new aiohttp.ClientSession does not opt into environment proxy settings (trust_env=True). The previous httpx.AsyncClient honored those env settings by default, so after this switch the same requests bypass the configured proxy and fail with connection/timeouts even though the environment is configured correctly. Please enable trust_env or otherwise plumb proxy configuration into the session.

Useful? React with 👍 / 👎.

Co-authored-by: OpenAI Codex <codex@openai.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c47bbbde04

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

Comment thread src/refiner/platform/client/api.py Outdated
Comment on lines 346 to 350
response = requests.request(
method="PUT",
url=url,
headers=required_headers,
content=payload_bytes,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use requests' data argument for uploads

When a cloud run needs to upload a NEW file, this now calls requests.request(..., content=payload_bytes), but requests does not accept a content keyword (the body parameter is data). That raises a TypeError before any PUT is sent, and it is not caught by the requests.RequestException handler, so cloud file uploads fail instead of uploading the payload.

Useful? React with 👍 / 👎.

Co-authored-by: OpenAI Codex <codex@openai.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9149c88670

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

Comment on lines 171 to 174
return await _handle_json_response(
response,
url=_request_url(client, endpoint_path),
request_body=dict(payload),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle aiohttp body read failures as retryable

With aiohttp, client.post() returns after the response headers, so the actual body read happens inside _handle_json_response() via response.json()/response.text() outside this try. If an endpoint drops the connection or times out while streaming the JSON body, aiohttp raises errors such as ClientPayloadError/timeout from this line; _retry() then treats them as non-InferenceAPICallError and fails on the first attempt instead of retrying a transient transport failure. Wrap the body parsing in the same transport-error conversion path or catch these aiohttp read errors in _handle_json_response().

Useful? React with 👍 / 👎.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: abe4f760cc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

"hf://datasets/lerobot/libero",
)
OUTPUT_ROOT = os.environ.get(
"ROBOMETER_OUTPUT_ROOT",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Default Robometer example to row task metadata

When ROBOMETER_TASK is unset, this example still passes the generic string "complete the robot manipulation task" into reward_score, so _resolve_task_text never uses each LeRobotRow.tasks value. On multi-task inputs such as the default lerobot/libero dataset, every episode is scored against the same vague instruction rather than its actual task, which can produce misleading reward/success columns for users who run the example as-is; leave this as None unless the environment variable is explicitly set.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 035f88f577

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

output = f"{OUTPUT_ROOT}/{stamp}"

(
mdr.read_lerobot(INPUT_DATASET, num_shards=NUM_SHARDS)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Add docs for the Robometer workflow

This adds a user-visible Robometer scoring workflow, but the commit has no corresponding docs/ update; /workspace/refiner/AGENTS.md requires that “Any new feature ... must include corresponding doc updates”. Without a Mintlify docs page, users have to infer the environment variables, cloud launch behavior, and expected output columns from this example script instead of the documented workflow, so please add the matching docs in the same change set.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 035f88f577

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

output = f"{OUTPUT_ROOT}/{stamp}"

(
mdr.read_lerobot(INPUT_DATASET, num_shards=NUM_SHARDS)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Add docs for the Robometer workflow

This adds a user-visible Robometer scoring workflow, but the commit has no corresponding docs/ update; /workspace/refiner/AGENTS.md requires that “Any new feature ... must include corresponding doc updates”. Without a Mintlify docs page, users have to infer the environment variables, cloud launch behavior, and expected output columns from this example script instead of the documented workflow, so please add the matching docs in the same change set.

Useful? React with 👍 / 👎.

@hynky1999 Hynek Kydlíček (hynky1999) merged commit dfe9141 into main Jun 3, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant