fix(proxy): invoke post-call guardrails on pass-through endpoint responses#2

Open
tuhinspatra wants to merge 2 commits into main from fix/passthrough-post-call-guardrails

tuhinspatra commented Apr 22, 2026

Summary

Upstream bug fix for BerriAI/litellm — enables post-call guardrail invocation on pass-through endpoints.

What this enables

The Rubrik guardrail plugin (predibase/llmrouter) uses apply_guardrail(input_type="response") to check
tool calls. Without this fix, pass-through routes like /vertex_ai/* skip post-call hooks entirely, so tool
blocking never fires for Gemini models accessed via pass-through.

Changes

  • Wire post_call_success_hook into non-streaming pass-through response path
  • Add call_type fallback in UnifiedLLMGuardrails for pass-through endpoints
  • Gate on explicit guardrail config (opt-in only, no backwards-compat break)
  • Handle ModifyResponseException with proper logging and failure hooks
  • 4 unit tests
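The changes listed above could be sketched roughly as follows. This is a minimal, self-contained illustration and not the litellm implementation: `apply_post_call_guardrails`, the two hook signatures, the local `ModifyResponseException` class, and the `guardrail_violation` error type are hypothetical stand-ins. Only the names `post_call_success_hook`, `post_call_failure_hook`, and `ModifyResponseException` come from this PR.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional, Tuple


class ModifyResponseException(Exception):
    """Hypothetical stand-in for the guardrail exception named in this PR."""

    def __init__(self, message: str) -> None:
        super().__init__(message)
        self.message = message


# Assumed hook shapes; the real proxy hooks may take different arguments.
SuccessHook = Callable[[Dict[str, Any], Dict[str, Any]], Awaitable[Optional[Dict[str, Any]]]]
FailureHook = Callable[[Dict[str, Any], Exception], Awaitable[None]]


async def apply_post_call_guardrails(
    request_data: Dict[str, Any],
    response_body: Dict[str, Any],
    success_hook: SuccessHook,
    failure_hook: FailureHook,
) -> Tuple[int, Dict[str, Any]]:
    guardrails = request_data.get("guardrails")
    if not guardrails:
        # Opt-in only: without explicit guardrail config the response is untouched.
        return 200, response_body

    # Build a fresh dict for the hook instead of mutating the shared request body.
    hook_data = dict(request_data)
    hook_data["metadata"] = {
        **request_data.get("metadata", {}),
        "guardrails": list(guardrails),
    }

    try:
        modified = await success_hook(hook_data, response_body)
    except ModifyResponseException as exc:
        try:
            await failure_hook(hook_data, exc)
        except Exception:
            # Defensive: a failing failure hook must not mask the guardrail result.
            pass
        # Guardrail violations are reported as a 200 with an error envelope.
        return 200, {"error": {"message": exc.message, "type": "guardrail_violation"}}

    return 200, response_body if modified is None else modified


failures: List[str] = []


async def block_tool_calls(data: Dict[str, Any], response: Dict[str, Any]) -> None:
    """Example guardrail: reject any response that contains tool calls."""
    if response.get("tool_calls"):
        raise ModifyResponseException("tool call blocked by guardrail")
    return None


async def record_failure(data: Dict[str, Any], exc: Exception) -> None:
    """Example failure hook: remember what was blocked."""
    failures.append(str(exc))
```

Calling `asyncio.run(apply_post_call_guardrails({"guardrails": ["demo"]}, {"tool_calls": [{"name": "x"}]}, block_tool_calls, record_failure))` exercises the blocked path, while a request without a `guardrails` key is returned unchanged.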

Test Plan

  • test_post_call_success_hook_called_when_guardrails_configured — verifies hook fires with guardrails
  • test_post_call_success_hook_skipped_when_no_guardrails — verifies no-op without guardrails (backwards compat)
  • test_modify_response_exception_returns_200 — verifies guardrail violation returns 200 with message, calls post_call_failure_hook, includes usage
  • test_pass_through_call_type_resolved_from_logging_obj — verifies unified guardrail resolves call_type for pass-through endpoints
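The call_type fallback exercised by the last test might look something like the sketch below. It assumes the fallback simply reads `call_type` from the logging object when the request data does not carry one. `resolve_call_type` is a hypothetical helper, and the `CallTypes` enum here is a local stand-in whose values are illustrative rather than copied from litellm.

```python
from enum import Enum
from types import SimpleNamespace
from typing import Any, Dict, Optional


class CallTypes(str, Enum):
    """Local stand-in enum; member values are assumptions for illustration."""

    completion = "completion"
    pass_through = "pass_through_endpoint"


def resolve_call_type(data: Dict[str, Any], logging_obj: Optional[Any]) -> Optional[str]:
    """Prefer an explicit call_type, else fall back to the logging object."""
    explicit = data.get("call_type")
    if explicit:
        return explicit
    # Pass-through requests may not set call_type on the request data,
    # so look at what the logging object recorded instead.
    return getattr(logging_obj, "call_type", None)


# A pass-through request: no call_type in the data, but the logging object has one.
logging_obj = SimpleNamespace(call_type=CallTypes.pass_through.value)
resolved = resolve_call_type({}, logging_obj)
```

Using the enum's `.value` rather than a string literal keeps the comparison in one place if the underlying string ever changes.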

Run tests:

PYTHONPATH=. python3 -m pytest tests/test_litellm/proxy/pass_through_endpoints/test_passthrough_post_call_guardrails.py -v

Next steps

tuhinspatra force-pushed the fix/passthrough-post-call-guardrails branch 3 times, most recently from 032ed13 to a2d1d95, on April 22, 2026 21:53
…onses (BerriAI#20270)

Wire post_call_success_hook into non-streaming pass-through response path,
gated on explicit guardrail config (opt-in only, no backwards-compat break).

- Call post_call_success_hook after reading non-streaming response body
- Build enriched hook_data with guardrails metadata and litellm_logging_obj
  at call site (avoids mutation of _parsed_body which is shared by logging)
- Handle ModifyResponseException with provider-agnostic error envelope,
  post_call_failure_hook, and defensive try/except
- Strip stale content-length when guardrail modifies response body
- Move ModifyResponseException to litellm.exceptions to break cyclic import;
  re-export from custom_guardrail for backwards compat
- Add call_type fallback in UnifiedLLMGuardrails for pass-through endpoints
  using CallTypes.pass_through.value enum

5 tests covering the post-call guardrail invocation on pass-through endpoints:
- post_call_success_hook fires when guardrails configured
- post_call_success_hook skipped when no guardrails (backwards compat)
- ModifyResponseException returns 200 with provider-agnostic error
- UnifiedLLMGuardrails resolves call_type from logging_obj for pass-through
- ModifyResponseException re-export from custom_guardrail stays in sync
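The "strip stale content-length" item in the commit message can be illustrated with a small sketch. It assumes headers are available as a plain string mapping and that the layer serving the response recomputes the length when the header is absent. `headers_for_body` is a hypothetical helper, not a function from this PR.

```python
from typing import Dict, Mapping


def headers_for_body(
    upstream_headers: Mapping[str, str],
    upstream_body: bytes,
    final_body: bytes,
) -> Dict[str, str]:
    """Return response headers that are safe to send alongside final_body."""
    headers = dict(upstream_headers)
    if final_body == upstream_body:
        # Body untouched: the upstream content-length is still accurate.
        return headers
    # Body was rewritten by a guardrail: the old length is stale, so drop it.
    # Header names are case-insensitive, hence the lower() comparison.
    return {k: v for k, v in headers.items() if k.lower() != "content-length"}


upstream = {"Content-Length": "10", "Content-Type": "application/json"}
cleaned = headers_for_body(upstream, b"0123456789", b'{"error": "blocked"}')
```

Leaving a stale length in place would make clients truncate or wait on the rewritten body, which is why the header is removed rather than passed through.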
tuhinspatra force-pushed the fix/passthrough-post-call-guardrails branch from a2d1d95 to 32cdace on April 23, 2026 20:07
seph-barker pushed a commit that referenced this pull request May 18, 2026
…ps (BerriAI#28028)

* fix(ci): flag codecov uploads and enable carryforward

Coverage uploads from GHA and CircleCI were unflagged. Commits that
receive the push-triggered workflows more than once (re-runs, or branches
cut at the same SHA) accumulated many overlapping flagless sessions, and
Codecov's per-commit merge dropped the largest, ubiquitously-imported
files (router.py, proxy_server.py, main.py, utils.py, cost_calculator.py)
from the report even though the uploaded XMLs contained them.

- codecov.yaml: flag_management.default_rules.carryforward: true
- GHA reusable bases: tag each upload with its workflow/shard name
- CircleCI: tag the combined upload "circleci"; also combine the
  agent / google_generate_content_endpoint / litellm_utils datafiles
  that were produced and required but missing from the combine list

* fix(ci): close coverage gaps in proxy-legacy, router-unit, auth-ui, caching-redis

- test-unit-proxy-legacy: route through _test-unit-base so the full
  proxy_unit_tests suite (incl. comprehensive test_proxy_server*.py) is
  measured and uploaded with per-group flags (was plain pytest, no --cov)
- _test-unit-services-base: declare the enable-redis input + the six
  secrets test-unit-caching-redis passes; that workflow had a workflow_call
  signature mismatch and startup_failed on every push (never ran).
  Changes are additive/optional - proxy-db and security callers unchanged
- circleci: add --cov + persist + combine + upload-coverage requires for
  litellm_router_unit_testing (tests/router_unit_tests) and
  auth_ui_unit_tests (tests/proxy_admin_ui_tests); neither was covered
  anywhere. Redundant -k subset jobs left as-is (local_testing covers them)

* fix(ci): remove dead GHA Redis workflow; keep Redis on CircleCI only

CircleCI redis_caching_unit_tests already runs the exact same files
(tests/local_testing/test_dual_cache.py, test_redis_batch_optimizations.py,
test_router_utils.py) with --cov, and that datafile is already combined
and uploaded. The GHA test-unit-caching-redis workflow was redundant and
had never run (workflow_call signature mismatch -> startup_failure on
every push).

- Delete .github/workflows/test-unit-caching-redis.yml
- Revert _test-unit-services-base.yml to the flag-fix state (drop the
  enable-redis input / secrets / env wiring added only to prop up the
  GHA Redis workflow); the verified per-upload flags line is kept
- The only single-star "litellm_*" branch glob lived in the deleted
  file; no other single-star globs exist, so none remain to widen

* fix(ci): keep proxy-legacy as a standalone job to preserve required check names

Routing proxy-legacy through the reusable workflow renamed each check from
the bare matrix name (e.g. "proxy-response-and-misc") to
"proxy-response-and-misc / Run tests". Those bare names are required status
checks in branch protection, so the old contexts never reported and PRs sat
"Expected — Waiting for status to be reported" indefinitely.

Restore the original standalone matrix job (job name == matrix name, so the
required contexts report again) and add coverage in place: --cov on pytest
plus an OIDC Codecov upload flagged proxy-legacy-<group>. Net effect of the
gap-#2 fix is preserved (flagged coverage for tests/proxy_unit_tests/**)
without changing any check name.

* revert(ci): drop all proxy-legacy changes from this PR

tests/proxy_unit_tests/** is already fully covered by test-unit-proxy-db
(its shard-coverage guard fails CI if any file in that dir is unassigned),
which this PR already flags + carryforwards. Adding --cov and id-token:write
to the legacy pull_request job was redundant and put OIDC on a job that runs
untrusted PR code. Restore the file to the base version verbatim so this PR
no longer touches proxy-legacy at all (also restores its original required
check names). Retiring proxy-legacy in favor of proxy-db on pull_request is
a separate effort that needs a branch-protection change.