Skip to content

fix: batch #74, #75 — explicit timeouts on OpenAI and Gemini API calls#84

Merged
JFK merged 6 commits into
mainfrom
74-75-batch/explicit-timeouts
May 18, 2026
Merged

fix: batch #74, #75 — explicit timeouts on OpenAI and Gemini API calls#84
JFK merged 6 commits into
mainfrom
74-75-batch/explicit-timeouts

Conversation

@JFK
Copy link
Copy Markdown
Owner

@JFK JFK commented May 18, 2026

Closes #74
Closes #75

Summary

  • Add explicit timeouts to every OpenAI and Gemini external API call (CLAUDE.md / .claude/rules/async-first.md compliance)
  • New helper call_gemini_with_timeout() consolidates the asyncio.wait_for(asyncio.to_thread(...)) pattern at 7+ call sites
  • New constants OPENAI_TIMEOUT_SEC=600, OPENAI_CONNECT_TIMEOUT_SEC=10, SHORT_RPC_TIMEOUT_SEC=30; GEMINI_TIMEOUT_SEC moved from gemini.py to utils.py
  • Extends create_openai_compatible_client() with an optional timeout kwarg, used by /api/settings/.../test_key for a 30s validation budget
  • Side fix: src/api/settings.py:206 was calling client.models.list() synchronously inside an async handler (event-loop blocking) — now wrapped in call_gemini_with_timeout properly

Implementation notes

  • All 7 Gemini call sites (catchphrase.py, quiz.py, refine.py x3, metadata.py x2, gemini.py) migrated from inline asyncio.wait_for(asyncio.to_thread(...)) boilerplate to the new call_gemini_with_timeout(fn, *args, **kwargs) helper
  • gemini.transcribe_with_gemini uses SHORT_RPC_TIMEOUT_SEC (30s) for the files.delete cleanup so a stuck delete cannot stall job teardown
  • whisper.transcribe_with_whisper now routes through create_openai_compatible_client("openai", ...) for consistency with the new timeout policy
  • 8 genai.Client(api_key=...) constructions remain inline — intentionally not consolidated (scope discipline; no current need beyond cosmetic dedup)
  • New tests/test_timeout.py — 10 unit tests covering constants policy, OpenAI client timeout (default + ollama + custom override), and the Gemini helper (return value / args / timeout firing / exception propagation / default-value behavioral spy)

Pre-PR review summary

  • gate2 mode: advisor-only
  • audit: skipped
  • binary_gate: (none)
  • cso: green
  • qa-lead: green
  • cto: green
  • review provider: code-review

Full reviews are saved in the plugin cache:

  • ~/.claude/cache/gh-issue-driven/74-75-batch_explicit-timeouts.gate2.md

🤖 Generated via /gh-issue-driven:ship

JFK and others added 2 commits May 18, 2026 23:12
- OpenAI: httpx.Timeout on AsyncOpenAI / openai-compatible client (read=600s, connect=10s)
- Gemini: call_gemini_with_timeout() wraps asyncio.to_thread with wait_for
- utils.py: OPENAI_TIMEOUT_SEC / OPENAI_CONNECT_TIMEOUT_SEC constants aligned with GEMINI_TIMEOUT_SEC
- All refine/verify/suggest/metadata/catchphrase/quiz call sites updated for both providers
- tests/test_timeout.py covers constants, client init, and the new helpers

Closes #74, #75.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- gemini.py: short timeout (30s) for files.delete; a stuck delete must not stall job teardown
- api/settings.py: route OpenAI key-validation through create_openai_compatible_client with short timeout; wrap Gemini models.list with call_gemini_with_timeout (was blocking the event loop)
- utils.py: add SHORT_RPC_TIMEOUT_SEC constant for cheap RPCs
- metadata.py: drop redundant what-comment
- test_timeout.py: replace inspect-based default-timeout test with monkeypatch-based behavioral test

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 18, 2026 14:29
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an explicit timeout policy for all outbound OpenAI and Gemini LLM calls to comply with the repo’s async-first rule and prevent indefinite hangs, while also consolidating repeated Gemini to_thread patterns into a shared helper.

Changes:

  • Introduces centralized timeout constants and a call_gemini_with_timeout() helper to enforce asyncio.wait_for(...to_thread...) across Gemini call sites.
  • Extends create_openai_compatible_client() to apply explicit httpx.Timeout defaults (and allow overrides) for OpenAI/OpenAI-compatible clients; migrates Whisper to use this helper.
  • Updates API key validation to use short, explicit timeouts and avoids event-loop blocking for Gemini model listing; adds unit tests for timeout behavior.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/test_timeout.py Adds unit tests covering timeout constants, OpenAI client timeout defaults/overrides, and Gemini timeout helper behavior.
src/services/utils.py Defines timeout policy constants, extends OpenAI-compatible client creation with explicit timeouts, and adds call_gemini_with_timeout().
src/services/whisper.py Routes Whisper transcription client creation through the timeout-configured OpenAI-compatible client helper.
src/services/refine.py Switches Gemini calls from raw to_thread to call_gemini_with_timeout() for consistent timeout enforcement.
src/services/quiz.py Switches Gemini calls to call_gemini_with_timeout() for consistent timeout enforcement.
src/services/metadata.py Switches Gemini calls to call_gemini_with_timeout() and hardens response.text handling.
src/services/gemini.py Removes inline timeout boilerplate in favor of call_gemini_with_timeout() and uses a shorter timeout for cleanup deletes.
src/services/catchphrase.py Switches Gemini calls to call_gemini_with_timeout() for consistent timeout enforcement.
src/api/settings.py Applies short explicit timeouts during key validation and avoids blocking the event loop for Gemini model listing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/services/utils.py Outdated
Comment on lines +180 to +196
async def call_gemini_with_timeout(
fn: Callable[..., T],
*args: Any,
timeout: float = GEMINI_TIMEOUT_SEC,
**kwargs: Any,
) -> T:
"""Run a synchronous Gemini SDK call in a worker thread with an asyncio timeout.

The Gemini Python SDK is blocking, so the project wraps every call in
``asyncio.to_thread``. Without ``asyncio.wait_for`` the thread can wait
indefinitely if the API hangs (.claude/rules/async-first.md). Use this
helper for every Gemini call site so the timeout policy stays uniform.
"""
return await asyncio.wait_for(
asyncio.to_thread(fn, *args, **kwargs),
timeout=timeout,
)
Copy link
Copy Markdown
Owner Author

@JFK JFK May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — the analysis is correct on both points. Addressed and deferred respectively in 10b820f:

Adopted: timeout logging. Wrapped asyncio.wait_for in try/except TimeoutError and added logger.warning(...) with the timeout budget and fn.__qualname__ so repeated hangs are surface-able from server logs. Re-raise unchanged so existing caller try/except contracts (e.g. gemini.transcribe_with_gemini's best-effort files.delete) keep working.

Deferred: dedicated bounded executor. The thread-leak risk is real but bounded by this app's operational shape:

  • VoiceSRT is a single-user-ish desktop tool, not a throughput service. Typical concurrency is 1 job at a time; 5–10 is the realistic upper bound.
  • The default ThreadPoolExecutor has min(32, os.cpu_count() + 4) slots — ~36 on a typical machine. Exhaustion needs ≥36 concurrent hung Gemini calls.
  • A dedicated executor adds lifecycle (create/shutdown), config (pool size), and error paths for a hypothetical-but-not-likely scale.

Per the repo's scope-discipline (CLAUDE.md / system-level guidance: "don't introduce abstractions beyond what the task requires"), we'll revisit if/when concurrent job runners land or telemetry shows actual exhaustion. The warning log added here is the operational tripwire that would surface that signal.

Reference: src/services/utils.py:193-208 @ 10b820f

@JFK
Copy link
Copy Markdown
Owner Author

JFK commented May 18, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

Address Copilot PR #84 feedback on src/services/utils.py:196 — the
suggestion to add explicit timeout logging is adopted. The companion
suggestion to introduce a dedicated bounded executor is deferred (see
PR thread): the default thread pool has ~36 slots and exhaustion would
require dozens of concurrent hung Gemini calls, which the single-user
operational shape of this app does not realistically reach.

- Wrap asyncio.wait_for in try/except TimeoutError
- logger.warning includes timeout budget + fn qualname so repeated hangs
  are diagnosable from server logs
- Re-raise unchanged so existing caller try/except contracts are preserved

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Comment thread src/services/gemini.py Outdated
Comment on lines 63 to 67
# Best-effort cleanup; a stuck delete must not stall job teardown.
try:
await asyncio.to_thread(client.files.delete, name=uploaded.name)
await call_gemini_with_timeout(client.files.delete, name=uploaded.name, timeout=SHORT_RPC_TIMEOUT_SEC)
except Exception:
logger.warning("Failed to delete uploaded file: %s", uploaded.name)
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 1ccbb12. Restructured into try/finally so the delete runs even on generate_content / parse_json_response failure. CLAUDE.md mandates this exact pattern ("Temp files: Always clean up in finally blocks").

The prompt build + generate-content path moved to a helper _transcribe_uploaded() so the outer function holds a single, narrow finally around just the cleanup — keeps the upload→use→delete lifecycle visible at the function-top level.

src/services/gemini.py:28-40 @ 1ccbb12

Comment thread src/services/utils.py
from collections.abc import Callable
from typing import Any, TypeVar

import httpx
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point — fixed in 1ccbb12. Added httpx>=0.27.0 to [project].dependencies. Production no longer depends on openai's transitive httpx to satisfy our module-level import.

pyproject.toml:9 @ 1ccbb12

Comment thread src/services/utils.py
Comment on lines +188 to +191
The Gemini Python SDK is blocking, so the project wraps every call in
``asyncio.to_thread``. Without ``asyncio.wait_for`` the thread can wait
indefinitely if the API hangs (.claude/rules/async-first.md). Use this
helper for every Gemini call site so the timeout policy stays uniform.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct — fixed in 1ccbb12. Migrated model_validator._check_gemini's await asyncio.to_thread(_retrieve) to await call_gemini_with_timeout(_retrieve, timeout=SHORT_RPC_TIMEOUT_SEC). Validation is a cheap one-shot lookup so it gets the 30s short budget rather than the full 600s.

Now issue #75's "all Gemini calls have explicit asyncio.wait_for timeouts" goal is fully met across src/services/.

src/services/model_validator.py:124-130 @ 1ccbb12

Three findings from Copilot:

1. src/services/model_validator.py:125 — bare `asyncio.to_thread(_retrieve)`
   for Gemini model validation had no timeout, contradicting the helper
   docstring claim of "every Gemini call site". Migrate to
   `call_gemini_with_timeout(_retrieve, timeout=SHORT_RPC_TIMEOUT_SEC)` so
   issue #75's "all Gemini calls timeout-wrapped" goal is fully met.

2. src/services/utils.py imports httpx at module level but pyproject.toml
   only listed it under [project.optional-dependencies].dev — runtime
   relied on openai's transitive httpx dep. Promote httpx>=0.27.0 to
   [project].dependencies so production installs cannot break from an
   openai dependency-graph change.

3. src/services/gemini.py:transcribe_with_gemini — file cleanup ran only
   on the success path; a raise from generate_content / parse_json_response
   leaked the uploaded file on Gemini's side. Restructure into try/finally
   per CLAUDE.md "Temp files: Always clean up in `finally` blocks" — split
   the prompt-build + generate-content into _transcribe_uploaded() so the
   outer function holds a single, narrow finally block around it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Comment thread pyproject.toml
Comment on lines 7 to 11
dependencies = [
"fastapi>=0.115.0,<1.0.0",
"httpx>=0.27.0",
"uvicorn[standard]>=0.34.0,<1.0.0",
"jinja2>=3.1.0,<4.0.0",
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both points adopted in 39cb4bc:

  • Upper bound: httpx>=0.27.0,<1.0.0 — matches the project-wide convention (every other [project].dependencies entry constrains the major version).
  • Dedup: removed httpx>=0.27.0 from [optional-dependencies].dev since runtime deps now cover dev installs too. Prevents drift between the two lists.

pyproject.toml:7-22 @ 39cb4bc

Address Copilot PR #84 round-3 feedback on pyproject.toml:

- Add `<1.0.0` upper bound on `httpx>=0.27.0` to match the project-wide
  convention (every other [project].dependencies entry constrains the
  major version) — guards against an httpx 1.x release silently breaking
  prod installs.
- Remove the now-redundant `httpx>=0.27.0` from [optional-dependencies].dev
  since runtime deps cover dev installs too. Prevents version drift
  between the two lists.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/services/model_validator.py:136

  • call_gemini_with_timeout() emits a warning log when it times out; this handler also logs a warning for all exceptions, so a timeout will produce duplicate warnings for the same event. Consider special-casing TimeoutError/asyncio.TimeoutError here (e.g., skip logging or log at debug), while keeping the warning for other transport errors you still want surfaced.
    try:
        await call_gemini_with_timeout(_retrieve, timeout=SHORT_RPC_TIMEOUT_SEC)
    except Exception as e:
        # google-genai does not expose structured 404 — fall back to string match.
        msg = str(e)
        if "404" in msg or "NOT_FOUND" in msg or "not found" in msg.lower():
            raise ModelNotAvailableError("gemini", model, msg) from e
        logger.warning("Gemini model validation skipped (transport): %s", e)

Comment thread src/services/gemini.py
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@JFK JFK merged commit 57aa7ab into main May 18, 2026
2 checks passed
@JFK JFK deleted the 74-75-batch/explicit-timeouts branch May 18, 2026 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants