fix(sentry): path rules beat parent_sampled in traces_sampler#925
Merged
thomasrockhu-codecov merged 2 commits intoMay 21, 2026
Merged
Conversation
The shelter ingress sits in front of codecov-api and initializes its own Sentry SDK with `traces_sample_rate=1.0`. Every request shelter forwards arrives at api with `parent_sampled=True`, which the current sampler honors before checking path rules. The result: webhooks, badges, and health probes coming through shelter are re-sampled at 100% on api, defeating the purpose of the per-route rates introduced in #924. Verified in production after #924's rollout: `/webhooks/github` on the new release was still ingesting ~50K transactions / 15 min on api, every one of them with a non-empty `parent_span` (i.e. shelter-initiated trace propagation), instead of the ~50 we'd expect at a 0.001 sample rate. Reorder the sampler so path rules run first. The intent of a path rule ("we don't want traces for this route") should beat any upstream sampling decision; for everything else, we still honor `parent_sampled` so distributed traces from internal callers (worker → api, etc.) stay coherent. Tests are extended to cover both directions: `parent_sampled=True` no longer overrides health/monitoring/webhook/badge rates, but still wins on routes without a path rule. Long term we should also stop shelter from starting traces at 100% for high-volume webhook routes — sampling decisions belong at the trace origin. That's a separate change that touches the shelter repo and SDK init.
Contributor
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #925 +/- ##
==========================================
- Coverage 91.90% 91.90% -0.01%
==========================================
Files 1317 1317
Lines 50699 50698 -1
Branches 1625 1625
==========================================
- Hits 46595 46594 -1
Misses 3798 3798
Partials 306 306
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! 🚀 New features to boost your workflow:
|
Collapse the multi-paragraph Precedence section in the module docstring to a single 4-line paragraph and drop the inline comments inside the sampler function and TestParentSampled. The control flow and test names are self-describing.
jason-ford-codecov
approved these changes
May 21, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Path-based drop and down-sample rules in the api
traces_sampler(introduced in #924) are currently being defeated by trace propagation from the shelter ingress. Reorder the sampler so path rules win overparent_sampled, restoring the intended sampling rates on/webhooks/github,/health/,/monitoring/..., and badge URLs.The problem
shelter/libs/helpers.pyinitializes the Sentry SDK withtraces_sample_rate=1.0:```python
sentry_sdk.init(
dsn=sentry_dsn,
environment=f"{app_name}-{APP_ENV}",
traces_sample_rate=1.0,
...
)
```
shelter samples every inbound webhook at 100% and forwards the request to codecov-api with
sentry-traceheaders. On the api side, the current sampler does this first:```python
parent_sampled = sampling_context.get("parent_sampled")
if parent_sampled is True:
return 1.0 # ← shelter wins; path rules never run
```
So every webhook coming through shelter gets sampled at 100% on api regardless of the route. The 0.1%
webhook_github_ratefrom #924 only applies to requests that hit api without going through shelter (internal callers, k8s probes).Verification in production
After #924 rolled out to
production-release-894811b, I checked the data:/webhooks/githubon the new api release: ~50,000 transactions / 15 min (expected ~50 at a 0.1% rate)parent_span:```
parent_span: be1c69666492f039
parent_span: a036d3cc41de412e
parent_span: ba42426f4dba6c06
parent_span: 97ac1285a0638d39
parent_span: ab8bb9b2833eb42b
```
That's the shelter→api hop overriding the api-side sampler.
Health checks (
/,/health/,/monitoring/metrics) and badge URLs are not affected in practice because they don't typically come through shelter — those rules are working correctly today. But the precedence is wrong in principle, and/webhooks/githubis the single largest cost center on the org-wide spans bill.The fix
One-block reorder in
codecov.sentry_sampling.make_traces_sampler:```python
def traces_sampler(sampling_context):
# Path-based rules first: a "drop this route" decision must beat
# parent_sampled, otherwise shelter's traces_sample_rate=1.0 in
# front of api would re-sample every webhook/badge/health probe.
path = _extract_path(sampling_context)
if path:
if path in _HEALTH_PATHS:
return 0.0
if path.startswith(_MONITORING_PREFIX):
return 0.0
if path in _WEBHOOK_GITHUB_PATHS:
return webhook_github_rate
if _BADGE_PATH_RE.search(path):
return badge_rate
```
The semantic change is precise: a path rule always applies on a route it matches, regardless of what upstream decided. For everything else,
parent_sampledcontinues to govern, so worker→api Celery beats / cron-initiated traces stay coherent.What about distributed traces?
For routes covered by a path rule we accept a partial distributed trace as a tradeoff: shelter's webhook-ingress span is kept at 100% (under its own SDK init), and the api-side downstream is sampled per the rule. That's the right tradeoff for high-volume, low-variance routes like
/webhooks/github— we keep enough samples to debug latency and errors, and we drop the 99.9% noise.The cleaner long-term fix is to give shelter its own
traces_samplerthat down-samples/webhooks/githubat the trace origin. I'll file a follow-up for that — it touches the shelter repo and is a larger change. This PR is the immediate cost cut.Tests
TestParentSampledintest_sentry_sampling.pyis expanded to cover both directions:parent_sampled=Trueno longer overrides the rates on/,/health/,/api_health/,/monitoring/metrics,/webhooks/github,/webhooks/github/, default-badge and bundle-badge URLs.parent_sampled=Falsesimilarly doesn't override path rules (a slightly stricter interpretation than the alternative — keeps the semantics symmetric and simple).parent_sampled=Truestill wins on routes without a path rule (e.g./api/v2/foo) so distributed-trace coherence is preserved where it matters.parent_sampled=Falsestill drops to 0 on non-rule routes.parent_sampled=Trueand no path → 1.0 (unchanged).```
$ pytest apps/codecov-api/codecov/tests/test_sentry_sampling.py -q
62 passed in 0.19s
```
Expected impact after rollout
Re-querying
/webhooks/github30 min after deploy should show:That eliminates the last major chunk of the org-wide span bill on api. Combined with #923 (middleware spans off) and #924 (other routes), this should be effectively the end-state for the api project. ecdn-api and slack-app still need their respective deploys to fully roll over.
Refs
traces_samplerprecedence: https://docs.sentry.io/platforms/python/configuration/sampling/Test plan
transaction:/webhooks/github project:api release:production-release-<new>count drops by ~1000× vs the current rate/webhooks/gitlab,/webhooks/bitbucket, ...) remain unchanged in volumeMade with Cursor
Note
Medium Risk
Changes Sentry tracing sampling precedence for high-volume routes, which can significantly alter production trace volume and distributed-trace continuity. Risk is mitigated by expanded unit coverage across parent-sampled and path-rule combinations.
Overview
Updates
codecov.sentry_sampling.make_traces_samplerto evaluate path-based drop/downsample rules before honoringparent_sampled, ensuring/health*,/monitoring/*, badge URLs, and/webhooks/githubkeep their intended sampling rates even when upstream services propagate a sampled trace.Expands
TestParentSampledto assert that bothparent_sampled=Trueandparent_sampled=Falseno longer override these path rules, whileparent_sampledstill controls sampling for non-matching routes.Reviewed by Cursor Bugbot for commit 1c346f2. Bugbot is set up for automated code reviews on this repo. Configure here.