Skip to content

fix(sentry): path rules beat parent_sampled in traces_sampler#925

Merged
thomasrockhu-codecov merged 2 commits into
mainfrom
thomas-hu/sentry-sampler-path-rules-precedence
May 21, 2026
Merged

fix(sentry): path rules beat parent_sampled in traces_sampler#925
thomasrockhu-codecov merged 2 commits into
mainfrom
thomas-hu/sentry-sampler-path-rules-precedence

Conversation

@thomasrockhu-codecov
Copy link
Copy Markdown
Contributor

@thomasrockhu-codecov thomasrockhu-codecov commented May 21, 2026

Summary

Path-based drop and down-sample rules in the api traces_sampler (introduced in #924) are currently being defeated by trace propagation from the shelter ingress. Reorder the sampler so path rules win over parent_sampled, restoring the intended sampling rates on /webhooks/github, /health/, /monitoring/..., and badge URLs.

The problem

shelter/libs/helpers.py initializes the Sentry SDK with traces_sample_rate=1.0:

```python
sentry_sdk.init(
dsn=sentry_dsn,
environment=f"{app_name}-{APP_ENV}",
traces_sample_rate=1.0,
...
)
```

shelter samples every inbound webhook at 100% and forwards the request to codecov-api with sentry-trace headers. On the api side, the current sampler does this first:

```python
parent_sampled = sampling_context.get("parent_sampled")
if parent_sampled is True:
return 1.0 # ← shelter wins; path rules never run
```

So every webhook coming through shelter gets sampled at 100% on api regardless of the route. The 0.1% webhook_github_rate from #924 only applies to requests that hit api without going through shelter (internal callers, k8s probes).

Verification in production

After #924 rolled out to production-release-894811b, I checked the data:

  • /webhooks/github on the new api release: ~50,000 transactions / 15 min (expected ~50 at a 0.1% rate)
  • Every single sampled transaction had a non-empty parent_span:

```
parent_span: be1c69666492f039
parent_span: a036d3cc41de412e
parent_span: ba42426f4dba6c06
parent_span: 97ac1285a0638d39
parent_span: ab8bb9b2833eb42b
```

That's the shelter→api hop overriding the api-side sampler.

Health checks (/, /health/, /monitoring/metrics) and badge URLs are not affected in practice because they don't typically come through shelter — those rules are working correctly today. But the precedence is wrong in principle, and /webhooks/github is the single largest cost center on the org-wide spans bill.

The fix

One-block reorder in codecov.sentry_sampling.make_traces_sampler:

```python
def traces_sampler(sampling_context):
# Path-based rules first: a "drop this route" decision must beat
# parent_sampled, otherwise shelter's traces_sample_rate=1.0 in
# front of api would re-sample every webhook/badge/health probe.
path = _extract_path(sampling_context)
if path:
if path in _HEALTH_PATHS:
return 0.0
if path.startswith(_MONITORING_PREFIX):
return 0.0
if path in _WEBHOOK_GITHUB_PATHS:
return webhook_github_rate
if _BADGE_PATH_RE.search(path):
return badge_rate

# Otherwise honor upstream sampling decisions so distributed traces
# from internal callers (worker → api, etc.) stay coherent.
parent_sampled = sampling_context.get(\"parent_sampled\")
if parent_sampled is True:
    return 1.0
if parent_sampled is False:
    return 0.0

return default_rate

```

The semantic change is precise: a path rule always applies on a route it matches, regardless of what upstream decided. For everything else, parent_sampled continues to govern, so worker→api Celery beats / cron-initiated traces stay coherent.

What about distributed traces?

For routes covered by a path rule we accept a partial distributed trace as a tradeoff: shelter's webhook-ingress span is kept at 100% (under its own SDK init), and the api-side downstream is sampled per the rule. That's the right tradeoff for high-volume, low-variance routes like /webhooks/github — we keep enough samples to debug latency and errors, and we drop the 99.9% noise.

The cleaner long-term fix is to give shelter its own traces_sampler that down-samples /webhooks/github at the trace origin. I'll file a follow-up for that — it touches the shelter repo and is a larger change. This PR is the immediate cost cut.

Tests

TestParentSampled in test_sentry_sampling.py is expanded to cover both directions:

  • parent_sampled=True no longer overrides the rates on /, /health/, /api_health/, /monitoring/metrics, /webhooks/github, /webhooks/github/, default-badge and bundle-badge URLs.
  • parent_sampled=False similarly doesn't override path rules (a slightly stricter interpretation than the alternative — keeps the semantics symmetric and simple).
  • parent_sampled=True still wins on routes without a path rule (e.g. /api/v2/foo) so distributed-trace coherence is preserved where it matters.
  • parent_sampled=False still drops to 0 on non-rule routes.
  • Non-HTTP transactions (Celery/cron) with parent_sampled=True and no path → 1.0 (unchanged).

```
$ pytest apps/codecov-api/codecov/tests/test_sentry_sampling.py -q
62 passed in 0.19s
```

Expected impact after rollout

Re-querying /webhooks/github 30 min after deploy should show:

  • transactions / 15 min on api dropping from ~50,000 to ~50 (1000× reduction)
  • span count proportionally
  • distributed traces remain visible at the shelter level for incident triage; the api-side detail is sampled

That eliminates the last major chunk of the org-wide span bill on api. Combined with #923 (middleware spans off) and #924 (other routes), this should be effectively the end-state for the api project. ecdn-api and slack-app still need their respective deploys to fully roll over.

Refs

Test plan

  • CI green
  • After deploy, verify in Sentry Explore:
    • transaction:/webhooks/github project:api release:production-release-<new> count drops by ~1000× vs the current rate
    • Other webhook providers (/webhooks/gitlab, /webhooks/bitbucket, ...) remain unchanged in volume
    • Default-rate routes (uploads, REST API, GraphQL) remain unchanged
    • Distributed traces that do start in api (no parent) and propagate to worker still link correctly

Made with Cursor


Note

Medium Risk
Changes Sentry tracing sampling precedence for high-volume routes, which can significantly alter production trace volume and distributed-trace continuity. Risk is mitigated by expanded unit coverage across parent-sampled and path-rule combinations.

Overview
Updates codecov.sentry_sampling.make_traces_sampler to evaluate path-based drop/downsample rules before honoring parent_sampled, ensuring /health*, /monitoring/*, badge URLs, and /webhooks/github keep their intended sampling rates even when upstream services propagate a sampled trace.

Expands TestParentSampled to assert that both parent_sampled=True and parent_sampled=False no longer override these path rules, while parent_sampled still controls sampling for non-matching routes.

Reviewed by Cursor Bugbot for commit 1c346f2. Bugbot is set up for automated code reviews on this repo. Configure here.

The shelter ingress sits in front of codecov-api and initializes its own
Sentry SDK with `traces_sample_rate=1.0`. Every request shelter forwards
arrives at api with `parent_sampled=True`, which the current sampler
honors before checking path rules. The result: webhooks, badges, and
health probes coming through shelter are re-sampled at 100% on api,
defeating the purpose of the per-route rates introduced in #924.

Verified in production after #924's rollout: `/webhooks/github` on the
new release was still ingesting ~50K transactions / 15 min on api, every
one of them with a non-empty `parent_span` (i.e. shelter-initiated trace
propagation), instead of the ~50 we'd expect at a 0.001 sample rate.

Reorder the sampler so path rules run first. The intent of a path rule
("we don't want traces for this route") should beat any upstream
sampling decision; for everything else, we still honor `parent_sampled`
so distributed traces from internal callers (worker → api, etc.) stay
coherent.

Tests are extended to cover both directions: `parent_sampled=True` no
longer overrides health/monitoring/webhook/badge rates, but still wins
on routes without a path rule.

Long term we should also stop shelter from starting traces at 100% for
high-volume webhook routes — sampling decisions belong at the trace
origin. That's a separate change that touches the shelter repo and SDK
init.
@sentry
Copy link
Copy Markdown
Contributor

sentry Bot commented May 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.90%. Comparing base (894811b) to head (1c346f2).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #925      +/-   ##
==========================================
- Coverage   91.90%   91.90%   -0.01%     
==========================================
  Files        1317     1317              
  Lines       50699    50698       -1     
  Branches     1625     1625              
==========================================
- Hits        46595    46594       -1     
  Misses       3798     3798              
  Partials      306      306              
Flag Coverage Δ
apiunit 94.95% <100.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@codecov-notifications
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

📢 Thoughts on this report? Let us know!

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Collapse the multi-paragraph Precedence section in the module docstring
to a single 4-line paragraph and drop the inline comments inside the
sampler function and TestParentSampled. The control flow and test names
are self-describing.
@thomasrockhu-codecov thomasrockhu-codecov added this pull request to the merge queue May 21, 2026
Merged via the queue into main with commit 1e5678c May 21, 2026
40 checks passed
@thomasrockhu-codecov thomasrockhu-codecov deleted the thomas-hu/sentry-sampler-path-rules-precedence branch May 21, 2026 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants