fix: batch #74, #75 — explicit timeouts on OpenAI and Gemini API calls#84
Conversation
- OpenAI: httpx.Timeout on AsyncOpenAI / openai-compatible client (read=600s, connect=10s) - Gemini: call_gemini_with_timeout() wraps asyncio.to_thread with wait_for - utils.py: OPENAI_TIMEOUT_SEC / OPENAI_CONNECT_TIMEOUT_SEC constants aligned with GEMINI_TIMEOUT_SEC - All refine/verify/suggest/metadata/catchphrase/quiz call sites updated for both providers - tests/test_timeout.py covers constants, client init, and the new helpers Closes #74, #75. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- gemini.py: short timeout (30s) for files.delete; a stuck delete must not stall job teardown - api/settings.py: route OpenAI key-validation through create_openai_compatible_client with short timeout; wrap Gemini models.list with call_gemini_with_timeout (was blocking the event loop) - utils.py: add SHORT_RPC_TIMEOUT_SEC constant for cheap RPCs - metadata.py: drop redundant what-comment - test_timeout.py: replace inspect-based default-timeout test with monkeypatch-based behavioral test Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds an explicit timeout policy for all outbound OpenAI and Gemini LLM calls to comply with the repo’s async-first rule and prevent indefinite hangs, while also consolidating repeated Gemini to_thread patterns into a shared helper.
Changes:
- Introduces centralized timeout constants and a
call_gemini_with_timeout()helper to enforceasyncio.wait_for(...to_thread...)across Gemini call sites. - Extends
create_openai_compatible_client()to apply explicithttpx.Timeoutdefaults (and allow overrides) for OpenAI/OpenAI-compatible clients; migrates Whisper to use this helper. - Updates API key validation to use short, explicit timeouts and avoids event-loop blocking for Gemini model listing; adds unit tests for timeout behavior.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/test_timeout.py | Adds unit tests covering timeout constants, OpenAI client timeout defaults/overrides, and Gemini timeout helper behavior. |
| src/services/utils.py | Defines timeout policy constants, extends OpenAI-compatible client creation with explicit timeouts, and adds call_gemini_with_timeout(). |
| src/services/whisper.py | Routes Whisper transcription client creation through the timeout-configured OpenAI-compatible client helper. |
| src/services/refine.py | Switches Gemini calls from raw to_thread to call_gemini_with_timeout() for consistent timeout enforcement. |
| src/services/quiz.py | Switches Gemini calls to call_gemini_with_timeout() for consistent timeout enforcement. |
| src/services/metadata.py | Switches Gemini calls to call_gemini_with_timeout() and hardens response.text handling. |
| src/services/gemini.py | Removes inline timeout boilerplate in favor of call_gemini_with_timeout() and uses a shorter timeout for cleanup deletes. |
| src/services/catchphrase.py | Switches Gemini calls to call_gemini_with_timeout() for consistent timeout enforcement. |
| src/api/settings.py | Applies short explicit timeouts during key validation and avoids blocking the event loop for Gemini model listing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| async def call_gemini_with_timeout( | ||
| fn: Callable[..., T], | ||
| *args: Any, | ||
| timeout: float = GEMINI_TIMEOUT_SEC, | ||
| **kwargs: Any, | ||
| ) -> T: | ||
| """Run a synchronous Gemini SDK call in a worker thread with an asyncio timeout. | ||
|
|
||
| The Gemini Python SDK is blocking, so the project wraps every call in | ||
| ``asyncio.to_thread``. Without ``asyncio.wait_for`` the thread can wait | ||
| indefinitely if the API hangs (.claude/rules/async-first.md). Use this | ||
| helper for every Gemini call site so the timeout policy stays uniform. | ||
| """ | ||
| return await asyncio.wait_for( | ||
| asyncio.to_thread(fn, *args, **kwargs), | ||
| timeout=timeout, | ||
| ) |
There was a problem hiding this comment.
Thanks — the analysis is correct on both points. Addressed and deferred respectively in 10b820f:
Adopted: timeout logging. Wrapped asyncio.wait_for in try/except TimeoutError and added logger.warning(...) with the timeout budget and fn.__qualname__ so repeated hangs are surface-able from server logs. Re-raise unchanged so existing caller try/except contracts (e.g. gemini.transcribe_with_gemini's best-effort files.delete) keep working.
Deferred: dedicated bounded executor. The thread-leak risk is real but bounded by this app's operational shape:
- VoiceSRT is a single-user-ish desktop tool, not a throughput service. Typical concurrency is 1 job at a time; 5–10 is the realistic upper bound.
- The default
ThreadPoolExecutorhasmin(32, os.cpu_count() + 4)slots — ~36 on a typical machine. Exhaustion needs ≥36 concurrent hung Gemini calls. - A dedicated executor adds lifecycle (create/shutdown), config (pool size), and error paths for a hypothetical-but-not-likely scale.
Per the repo's scope-discipline (CLAUDE.md / system-level guidance: "don't introduce abstractions beyond what the task requires"), we'll revisit if/when concurrent job runners land or telemetry shows actual exhaustion. The warning log added here is the operational tripwire that would surface that signal.
Reference: src/services/utils.py:193-208 @ 10b820f
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. 🤖 Generated with Claude Code |
Address Copilot PR #84 feedback on src/services/utils.py:196 — the suggestion to add explicit timeout logging is adopted. The companion suggestion to introduce a dedicated bounded executor is deferred (see PR thread): the default thread pool has ~36 slots and exhaustion would require dozens of concurrent hung Gemini calls, which the single-user operational shape of this app does not realistically reach. - Wrap asyncio.wait_for in try/except TimeoutError - logger.warning includes timeout budget + fn qualname so repeated hangs are diagnosable from server logs - Re-raise unchanged so existing caller try/except contracts are preserved Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| # Best-effort cleanup; a stuck delete must not stall job teardown. | ||
| try: | ||
| await asyncio.to_thread(client.files.delete, name=uploaded.name) | ||
| await call_gemini_with_timeout(client.files.delete, name=uploaded.name, timeout=SHORT_RPC_TIMEOUT_SEC) | ||
| except Exception: | ||
| logger.warning("Failed to delete uploaded file: %s", uploaded.name) |
There was a problem hiding this comment.
Good catch — fixed in 1ccbb12. Restructured into try/finally so the delete runs even on generate_content / parse_json_response failure. CLAUDE.md mandates this exact pattern ("Temp files: Always clean up in finally blocks").
The prompt build + generate-content path moved to a helper _transcribe_uploaded() so the outer function holds a single, narrow finally around just the cleanup — keeps the upload→use→delete lifecycle visible at the function-top level.
| from collections.abc import Callable | ||
| from typing import Any, TypeVar | ||
|
|
||
| import httpx |
There was a problem hiding this comment.
Fair point — fixed in 1ccbb12. Added httpx>=0.27.0 to [project].dependencies. Production no longer depends on openai's transitive httpx to satisfy our module-level import.
| The Gemini Python SDK is blocking, so the project wraps every call in | ||
| ``asyncio.to_thread``. Without ``asyncio.wait_for`` the thread can wait | ||
| indefinitely if the API hangs (.claude/rules/async-first.md). Use this | ||
| helper for every Gemini call site so the timeout policy stays uniform. |
There was a problem hiding this comment.
Correct — fixed in 1ccbb12. Migrated model_validator._check_gemini's await asyncio.to_thread(_retrieve) to await call_gemini_with_timeout(_retrieve, timeout=SHORT_RPC_TIMEOUT_SEC). Validation is a cheap one-shot lookup so it gets the 30s short budget rather than the full 600s.
Now issue #75's "all Gemini calls have explicit asyncio.wait_for timeouts" goal is fully met across src/services/.
Three findings from Copilot: 1. src/services/model_validator.py:125 — bare `asyncio.to_thread(_retrieve)` for Gemini model validation had no timeout, contradicting the helper docstring claim of "every Gemini call site". Migrate to `call_gemini_with_timeout(_retrieve, timeout=SHORT_RPC_TIMEOUT_SEC)` so issue #75's "all Gemini calls timeout-wrapped" goal is fully met. 2. src/services/utils.py imports httpx at module level but pyproject.toml only listed it under [project.optional-dependencies].dev — runtime relied on openai's transitive httpx dep. Promote httpx>=0.27.0 to [project].dependencies so production installs cannot break from an openai dependency-graph change. 3. src/services/gemini.py:transcribe_with_gemini — file cleanup ran only on the success path; a raise from generate_content / parse_json_response leaked the uploaded file on Gemini's side. Restructure into try/finally per CLAUDE.md "Temp files: Always clean up in `finally` blocks" — split the prompt-build + generate-content into _transcribe_uploaded() so the outer function holds a single, narrow finally block around it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| dependencies = [ | ||
| "fastapi>=0.115.0,<1.0.0", | ||
| "httpx>=0.27.0", | ||
| "uvicorn[standard]>=0.34.0,<1.0.0", | ||
| "jinja2>=3.1.0,<4.0.0", |
There was a problem hiding this comment.
Both points adopted in 39cb4bc:
- Upper bound:
httpx>=0.27.0,<1.0.0— matches the project-wide convention (every other[project].dependenciesentry constrains the major version). - Dedup: removed
httpx>=0.27.0from[optional-dependencies].devsince runtime deps now cover dev installs too. Prevents drift between the two lists.
Address Copilot PR #84 round-3 feedback on pyproject.toml: - Add `<1.0.0` upper bound on `httpx>=0.27.0` to match the project-wide convention (every other [project].dependencies entry constrains the major version) — guards against an httpx 1.x release silently breaking prod installs. - Remove the now-redundant `httpx>=0.27.0` from [optional-dependencies].dev since runtime deps cover dev installs too. Prevents version drift between the two lists. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/services/model_validator.py:136
call_gemini_with_timeout()emits a warning log when it times out; this handler also logs a warning for all exceptions, so a timeout will produce duplicate warnings for the same event. Consider special-casingTimeoutError/asyncio.TimeoutErrorhere (e.g., skip logging or log at debug), while keeping the warning for other transport errors you still want surfaced.
try:
await call_gemini_with_timeout(_retrieve, timeout=SHORT_RPC_TIMEOUT_SEC)
except Exception as e:
# google-genai does not expose structured 404 — fall back to string match.
msg = str(e)
if "404" in msg or "NOT_FOUND" in msg or "not found" in msg.lower():
raise ModelNotAvailableError("gemini", model, msg) from e
logger.warning("Gemini model validation skipped (transport): %s", e)
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Closes #74
Closes #75
Summary
call_gemini_with_timeout()consolidates theasyncio.wait_for(asyncio.to_thread(...))pattern at 7+ call sitesOPENAI_TIMEOUT_SEC=600,OPENAI_CONNECT_TIMEOUT_SEC=10,SHORT_RPC_TIMEOUT_SEC=30;GEMINI_TIMEOUT_SECmoved fromgemini.pytoutils.pycreate_openai_compatible_client()with an optionaltimeoutkwarg, used by/api/settings/.../test_keyfor a 30s validation budgetsrc/api/settings.py:206was callingclient.models.list()synchronously inside an async handler (event-loop blocking) — now wrapped incall_gemini_with_timeoutproperlyImplementation notes
catchphrase.py,quiz.py,refine.pyx3,metadata.pyx2,gemini.py) migrated from inlineasyncio.wait_for(asyncio.to_thread(...))boilerplate to the newcall_gemini_with_timeout(fn, *args, **kwargs)helpergemini.transcribe_with_geminiusesSHORT_RPC_TIMEOUT_SEC(30s) for thefiles.deletecleanup so a stuck delete cannot stall job teardownwhisper.transcribe_with_whispernow routes throughcreate_openai_compatible_client("openai", ...)for consistency with the new timeout policygenai.Client(api_key=...)constructions remain inline — intentionally not consolidated (scope discipline; no current need beyond cosmetic dedup)tests/test_timeout.py— 10 unit tests covering constants policy, OpenAI client timeout (default + ollama + custom override), and the Gemini helper (return value / args / timeout firing / exception propagation / default-value behavioral spy)Pre-PR review summary
Full reviews are saved in the plugin cache:
~/.claude/cache/gh-issue-driven/74-75-batch_explicit-timeouts.gate2.md🤖 Generated via
/gh-issue-driven:ship