Fix VLLM inference HTTP connection limits by hynky1999 · Pull Request #186 · macrodata-labs/refiner

Hynek Kydlíček (hynky1999) · 2026-06-02T18:56:59Z

Summary

wire OpenAI-compatible httpx connection limits to Refiner's max_concurrent_requests setting
apply the same limit wiring to VLLM runtime-service clients used by generate_pooling / Robometer
retry/wrap all httpx transport errors, including RemoteProtocolError from Modal tunnels
make Robometer inference transport failures fail-soft with bounded debug columns instead of killing the whole run after retries are exhausted

Root cause

Refiner's max_concurrent_requests only controlled an asyncio semaphore. The underlying httpx.AsyncClient was still using httpx defaults: max_connections=100 and max_keepalive_connections=20. A Robometer run configured for 256/512 concurrent requests could therefore queue hundreds of large multimodal payloads inside the client/runtime instead of actually sending them to vLLM.
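The effect can be reproduced without a network. The sketch below is a stdlib-only simulation, not Refiner's code: an outer asyncio.Semaphore stands in for max_concurrent_requests and an inner one stands in for the HTTP client's connection pool. The names run_batch, request_limit, and pool_limit are hypothetical.

```python
import asyncio


async def run_batch(n_requests: int, request_limit: int, pool_limit: int) -> int:
    """Return the peak number of simulated requests that were in flight at once."""
    request_slots = asyncio.Semaphore(request_limit)  # stands in for max_concurrent_requests
    pool_slots = asyncio.Semaphore(pool_limit)  # stands in for the client's connection pool
    on_wire = 0
    peak = 0

    async def one_request() -> None:
        nonlocal on_wire, peak
        async with request_slots:  # admitted by the application-level limit
            async with pool_slots:  # but may still wait here for a connection
                on_wire += 1
                peak = max(peak, on_wire)
                await asyncio.sleep(0.01)  # pretend to talk to the server
                on_wire -= 1

    await asyncio.gather(*(one_request() for _ in range(n_requests)))
    return peak


if __name__ == "__main__":
    # Pool left at 100: the semaphore admits 512, yet only 100 are ever in flight.
    print(asyncio.run(run_batch(512, request_limit=512, pool_limit=100)))
    # Pool raised to match the semaphore: all 512 can be in flight together.
    print(asyncio.run(run_batch(512, request_limit=512, pool_limit=512)))
```

In this model the remaining 412 admitted requests sit between the two limits, which is where the payloads described above would be held.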

There was a second bug in the retry layer: httpx.RemoteProtocolError is an httpx.TransportError, but not an httpx.NetworkError, so Modal tunnel disconnects could bypass Refiner's retry wrapper and fail the worker directly.
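A minimal sketch of why the narrower check misses tunnel disconnects. The classes are stand-ins that mirror only the relationship stated above (in httpx, RemoteProtocolError reaches TransportError through an intermediate ProtocolError class, omitted here), and the two predicate functions are hypothetical, not Refiner's retry wrapper.

```python
# Stand-in classes; real code would use the httpx exception types of the same names.
class TransportError(Exception):
    """Base class for transport-level failures."""


class NetworkError(TransportError):
    """Socket-level failures such as connect, read, or write errors."""


class RemoteProtocolError(TransportError):
    """The peer sent malformed or truncated HTTP."""


def retryable_narrow(exc: BaseException) -> bool:
    # Narrow predicate: only network errors count as transient.
    return isinstance(exc, NetworkError)


def retryable_wide(exc: BaseException) -> bool:
    # Wide predicate: any transport error is treated as transient.
    return isinstance(exc, TransportError)


if __name__ == "__main__":
    tunnel_drop = RemoteProtocolError("peer closed the connection mid-response")
    print(retryable_narrow(tunnel_drop))  # False: the error escapes the retry path
    print(retryable_wide(tunnel_drop))  # True: the error is retried
```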

With Robometer-sized payloads, the 512-request Modal repro generated about 1.08 GB of serialized JSON, so memory on low-memory workers remains a separate resource constraint when max_in_flight is set very high.

Duplicate check

I checked the open Refiner PRs for concurrency/httpx/VLLM/pooling work. PR #139 is related to adaptive inference rate limiting, but it targets older text-generation runtime code and does not wire generate_pooling/VLLM httpx connection limits or transport-error wrapping.

Verification

.venv/bin/python -m pytest tests/inference/test_generate_pooling.py tests/inference/test_transport.py tests/robotics/test_reward.py -q
uv run ruff check src/refiner/inference/internal/transport.py src/refiner/inference/providers/openai.py src/refiner/inference/internal/runtime.py src/refiner/robotics/reward.py tests/inference/test_generate_pooling.py tests/inference/test_transport.py tests/robotics/test_reward.py
commit hooks: ruff, ruff format, ty
Modal direct vLLM tunnel repro: 512/512 Robometer /pooling requests succeeded with explicit httpx limits; app ap-mXRNao7oX2D3TLZ8maN8j5
Macrodata dev Refiner cloud repro: https://dev.macrodata.co/jobs/macrodata/019e89cb-0c3f-77e0-9315-dcb817961b61 completed; stage 0 used async_map.max_in_flight=512, max_concurrent_requests=512, max in_flight=512, waiting_requests=0, successful_requests=588

AI assistance was used for this change.

gemini-code-assist

Code Review

This pull request introduces configurable connection limits for OpenAI clients using httpx.Limits and adds a fail_soft error handling mode to the robotics reward_score function to gracefully capture and log inference failures. Feedback was provided to improve the exception chain traversal logic in _exception_chain so that it correctly respects Python's exception context suppression when __cause__ is explicitly set to None.

gemini-code-assist · 2026-06-02T18:58:10Z

+    current: BaseException | None = exc
+    while current is not None and id(current) not in seen:
+        seen.add(id(current))
+        chain.append(
+            {
+                "type": type(current).__name__,
+                "message": _truncate_debug_text(str(current)),
+            }
+        )
+        current = current.__cause__ or current.__context__


In _exception_chain, the traversal logic current = current.__cause__ or current.__context__ does not respect Python's exception context suppression (i.e., raise ... from None). When __cause__ is explicitly set to None, __suppress_context__ is set to True, but the current implementation will still fall back to __context__ because None is falsy.

To correctly respect context suppression, we should check __suppress_context__ when __cause__ is None.

Suggested change

-    current: BaseException | None = exc
-    while current is not None and id(current) not in seen:
-        seen.add(id(current))
-        chain.append(
-            {
-                "type": type(current).__name__,
-                "message": _truncate_debug_text(str(current)),
-            }
-        )
-        current = current.__cause__ or current.__context__
+    current: BaseException | None = exc
+    while current is not None and id(current) not in seen:
+        seen.add(id(current))
+        chain.append(
+            {
+                "type": type(current).__name__,
+                "message": _truncate_debug_text(str(current)),
+            }
+        )
+        if current.__cause__ is not None:
+            current = current.__cause__
+        elif getattr(current, "__suppress_context__", False):
+            current = None
+        else:
+            current = current.__context__

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3f9c5e738b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

chatgpt-codex-connector · 2026-06-02T18:59:16Z

+    error_column: str = "robometer_error",
+    debug_column: str = "robometer_debug",
    max_concurrent_requests: int = 256,
+    fail_soft: bool = True,


Keep API failures fail-fast by default

With fail_soft defaulting to true, any InferenceAPICallError or InferenceRetryError from the pooling endpoint is converted into empty reward_score and robometer_success lists. In environments where the Robometer/vLLM endpoint is misconfigured or returns non-transient errors such as 400/404/invalid JSON, the pipeline now silently marks every episode as unscored instead of failing, which can corrupt downstream experiment metrics unless callers know to opt out with fail_soft=False.
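One way to read this suggestion, as a rough sketch rather than the code in reward.py: keep the soft path available but make the caller opt in. InferenceCallFailed and score_row are hypothetical stand-ins; only the fail_soft flag and the reward_score and robometer_error column names come from the diff above.

```python
class InferenceCallFailed(Exception):
    """Hypothetical stand-in for the inference errors named above."""


def score_row(call, *, fail_soft: bool = False) -> dict:
    """Score one row; swallow failures only when the caller opts in."""
    try:
        return {"reward_score": call(), "robometer_error": None}
    except InferenceCallFailed as exc:
        if not fail_soft:
            raise  # default: a broken endpoint stops the run
        # Opt-in: keep the row, leave it unscored, and record why.
        return {"reward_score": [], "robometer_error": str(exc)}


def broken_endpoint():
    raise InferenceCallFailed("404 from /pooling")


if __name__ == "__main__":
    print(score_row(lambda: [0.1, 0.9]))
    print(score_row(broken_endpoint, fail_soft=True))
```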

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7e321a88fe

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 798cc15649

chatgpt-codex-connector · 2026-06-02T20:05:55Z

+            session = aiohttp.ClientSession(
+                connector=connector,
+                headers=dict(self.headers),
+                timeout=aiohttp.ClientTimeout(total=_ENDPOINT_TIMEOUT_SECONDS),
+            )


Preserve environment proxy support for endpoint clients

In deployments that reach OpenAI-compatible endpoints through HTTP_PROXY/HTTPS_PROXY, this new aiohttp.ClientSession does not opt into environment proxy settings (trust_env=True). The previous httpx.AsyncClient honored those env settings by default, so after this switch the same requests bypass the configured proxy and fail with connection/timeouts even though the environment is configured correctly. Please enable trust_env or otherwise plumb proxy configuration into the session.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c47bbbde04

chatgpt-codex-connector · 2026-06-02T20:23:11Z

+            response = requests.request(
                method="PUT",
                url=url,
                headers=required_headers,
                content=payload_bytes,


Use requests' data argument for uploads

When a cloud run needs to upload a NEW file, this now calls requests.request(..., content=payload_bytes), but requests does not accept a content keyword (the body parameter is data). That raises a TypeError before any PUT is sent, and it is not caught by the requests.RequestException handler, so cloud file uploads fail instead of uploading the payload.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9149c88670

chatgpt-codex-connector · 2026-06-02T21:49:51Z

+        return await _handle_json_response(
            response,
            url=_request_url(client, endpoint_path),
            request_body=dict(payload),


Handle aiohttp body read failures as retryable

With aiohttp, client.post() returns after the response headers, so the actual body read happens inside _handle_json_response() via response.json()/response.text() outside this try. If an endpoint drops the connection or times out while streaming the JSON body, aiohttp raises errors such as ClientPayloadError/timeout from this line; _retry() then treats them as non-InferenceAPICallError and fails on the first attempt instead of retrying a transient transport failure. Wrap the body parsing in the same transport-error conversion path or catch these aiohttp read errors in _handle_json_response().
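The shape of the requested fix can be sketched with stdlib stand-ins. Nothing here is Refiner's or aiohttp's API: FakeResponse, post_json, and RetriesExhausted are hypothetical, and OSError stands in for the client's transport errors. The point is only that the body read sits inside the same try as the send.

```python
import asyncio


class RetriesExhausted(Exception):
    """Hypothetical error raised once the retry budget is spent."""


class FakeResponse:
    """Stand-in response whose body read can fail after the headers arrived."""

    def __init__(self, fail_on_read: bool) -> None:
        self._fail_on_read = fail_on_read

    async def json(self) -> dict:
        if self._fail_on_read:
            raise ConnectionResetError("connection dropped while reading body")
        return {"ok": True}


async def post_json(send, *, attempts: int = 3) -> dict:
    """Retry when either sending or reading the body hits a transport error."""
    last_error = None
    for _ in range(attempts):
        try:
            response = await send()  # returns once the headers are in
            return await response.json()  # body read stays inside the same try
        except OSError as exc:  # stand-in for the client's transport errors
            last_error = exc
    raise RetriesExhausted(f"gave up after {attempts} attempts") from last_error


async def demo() -> tuple:
    outcomes = iter([True, False])  # first body read fails, second succeeds
    calls = 0

    async def send() -> FakeResponse:
        nonlocal calls
        calls += 1
        return FakeResponse(fail_on_read=next(outcomes))

    result = await post_json(send)
    return result, calls


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ({'ok': True}, 2)
```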

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: abe4f760cc

chatgpt-codex-connector · 2026-06-03T08:43:22Z

+    "hf://datasets/lerobot/libero",
+)
+OUTPUT_ROOT = os.environ.get(
+    "ROBOMETER_OUTPUT_ROOT",


Default Robometer example to row task metadata

When ROBOMETER_TASK is unset, this example still passes the generic string "complete the robot manipulation task" into reward_score, so _resolve_task_text never uses each LeRobotRow.tasks value. On multi-task inputs such as the default lerobot/libero dataset, every episode is scored against the same vague instruction rather than its actual task, which can produce misleading reward/success columns for users who run the example as-is; leave this as None unless the environment variable is explicitly set.
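A small sketch of the suggested default, assuming the example reads the variable with os.environ. The helpers task_override and resolve_task are hypothetical (the real resolution lives in _resolve_task_text), and picking the first entry of a row's tasks is an assumption made only for illustration.

```python
import os
from typing import Mapping, Optional, Sequence


def task_override(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return ROBOMETER_TASK if it is set and non-empty, else None."""
    value = environ.get("ROBOMETER_TASK", "").strip()
    return value or None


def resolve_task(override: Optional[str], row_tasks: Sequence[str]) -> Optional[str]:
    """Prefer an explicit override; otherwise fall back to the row's own task text."""
    if override is not None:
        return override
    return row_tasks[0] if row_tasks else None


if __name__ == "__main__":
    # Variable unset: each row is scored against its own task.
    print(resolve_task(task_override({}), ["put the bowl on the plate"]))
    # Variable set: the explicit instruction wins for every row.
    print(resolve_task(task_override({"ROBOMETER_TASK": "open the drawer"}), ["put the bowl on the plate"]))
```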

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 035f88f577

chatgpt-codex-connector · 2026-06-03T08:58:17Z

+    output = f"{OUTPUT_ROOT}/{stamp}"
+
+    (
+        mdr.read_lerobot(INPUT_DATASET, num_shards=NUM_SHARDS)


Add docs for the Robometer workflow

This adds a user-visible Robometer scoring workflow, but the commit has no corresponding docs/ update; /workspace/refiner/AGENTS.md requires that “Any new feature ... must include corresponding doc updates”. Without a Mintlify docs page, users have to infer the environment variables, cloud launch behavior, and expected output columns from this example script instead of the documented workflow, so please add the matching docs in the same change set.

gemini-code-assist Bot reviewed Jun 2, 2026

chatgpt-codex-connector Bot reviewed Jun 2, 2026

Align OpenAI client limits with inference concurrency

7e321a8

Co-authored-by: Codex <codex@openai.com>

Hynek Kydlíček (hynky1999) force-pushed the codex/vllm-http-connection-limits branch from 3f9c5e7 to 7e321a8 Compare June 2, 2026 19:23

chatgpt-codex-connector Bot reviewed Jun 2, 2026

Comment thread src/refiner/robotics/reward.py

Use aiohttp for OpenAI inference transport

798cc15

Co-authored-by: OpenAI Codex <codex@openai.com>

chatgpt-codex-connector Bot reviewed Jun 2, 2026

Remove direct httpx usage

c47bbbd

Co-authored-by: OpenAI Codex <codex@openai.com>

chatgpt-codex-connector Bot reviewed Jun 2, 2026

Address aiohttp transport review findings

9149c88

Co-authored-by: OpenAI Codex <codex@openai.com>

chatgpt-codex-connector Bot reviewed Jun 2, 2026

Hynek Kydlíček (hynky1999) and others added 19 commits June 2, 2026 23:53

Tighten platform response typing

64cdf96

Co-authored-by: OpenAI Codex <codex@openai.com>

Keep httpx outside inference transport

100690b

Co-authored-by: OpenAI Codex <codex@openai.com>

Tighten inference client cleanup typing

c5e5e1c

Co-authored-by: OpenAI Codex <codex@openai.com>

Tighten inference transport client typing

2b94597

Co-authored-by: OpenAI Codex <codex@openai.com>

Remove temporary Robometer fail-soft debug path

af1ee90

Co-authored-by: OpenAI Codex <codex@openai.com>

Release Robometer decoded frames before inference await

b31a567

Stream Robometer sampled frames

b130ece

Add Robometer reward example

a5933dc

Use Libero in Robometer reward example

f13a814

Retry aiohttp response body read failures

70f7ae3

Use aiohttp response interface in inference transport

e991814

Inline inference endpoint URL join

1f0b05d

Document Robometer frame sampling length source

34b7679

Use aiohttp response status fields directly

1529242

Inline aiohttp response reads

66ea27c

Inline error response JSON parsing

9ad2dea

Inline inference transport response metadata

65d270a

Use refiner extras in Robometer example

c1d9dc7

Raise Robometer example concurrency default

3a47442

Hynek Kydlíček (hynky1999) added 7 commits June 3, 2026 09:29

Limit Robometer example input shards

73f2497

Document Robometer reward example

2f7b586

Update provider test response fakes

8c26b55

Raise Robometer transport retry budget

4f8b7cd

Lower Robometer example concurrency default

c73c817

Skip empty LeRobot shard commits

8d06a03

Focus Robometer tests on sampling

abe4f76

chatgpt-codex-connector Bot reviewed Jun 3, 2026

Revert "Skip empty LeRobot shard commits"

035f88f

This reverts commit 8d06a03.

chatgpt-codex-connector Bot reviewed Jun 3, 2026

Hynek Kydlíček (hynky1999) added 10 commits June 3, 2026 11:46

Use NVIDIA Libero LeRobot dataset

658dea5

Test Robometer empty video handling

beb1831

Trim Robometer example docstring

5ec0235

Remove Robometer retry payload override

2d3d558

Update Robometer reward scoring docs

bf9d6e8

Update score rewards example

0162db8

Trim reward scoring docs

770a503

Explain Robometer reward scoring usefulness

e297e72

Refine Robometer docs wording

6a354c6

Clarify Robometer task description docs

ea53b64

Hynek Kydlíček (hynky1999) merged commit dfe9141 into main Jun 3, 2026
5 checks passed
