fix(verify): one-shot consumption of email confirmation tokens#281
Conversation
Coverage Report for CI Build 25600756364Coverage increased (+0.2%) to 85.165%Details
Uncovered Changes
Coverage RegressionsNo coverage regressions found. Coverage Stats
💛 - Coveralls |
3a2a2d6 to
0e39694
Compare
umputun
left a comment
There was a problem hiding this comment.
a few concerns, the first two worth resolving before merge:
-
MarkUsedinterface should return(bool, error)before this ships. Right now any future Redis/DB adapter has to choose between fail-open (returnfalseon backend timeout, allow replay) and fail-closed (returntrue, lock users out during outages). Both choices are wrong, and onceVerifConfirmationStoreis exported with thebool-only signature, changing it later is a breaking change. Suggest:type VerifConfirmationStore interface { MarkUsed(key string, ttl time.Duration) (alreadyUsed bool, err error) }
LoginHandlerthen fails closed on non-nil err with a logged warning. The in-memory default just returnsnil. (provider/verify.go:53-59,v2/provider/verify.go:65-70) -
(*inMemoryVerifStore).MarkUsedis O(n) under the only mutex. Every redemption walks the entireusedmap for expired-entry eviction, holdingsync.Mutexthe whole time. At sustained traffic the map grows towardrequests/sec × 30minand each handler call serializes on a full-map scan. Cheap fixes: gate the sweep onlen(used) > Nor on alastSweepclock, or skip the loop entirely on the hot path (correctness is preserved by the lookup-then-insert pattern, the cap is justrequests within TTL window). (provider/verify.go:74-86,v2/provider/verify.go:84-98)
design points worth addressing (suggested resolutions inline):
-
Default fail-open behind a load balancer is the right default, but the README warning needs to be sharper. Keep the in-memory store as the default. The library's deployment profile is mostly single-instance services that get free protection. The problem is that "instance A consumed, instance B accepts" is a silent failure with no log line, and the current README reads as a recommendation rather than a hard requirement. Suggest replacing the warning with something like:
Multi-instance deployments behind a load balancer must supply a shared backend. With the default in-memory store, replay protection works only on the instance that originally consumed the token; an attacker who hits any other instance can still replay within the TTL. This is silent: the request succeeds and the auth flow completes normally, with no log indicating the protection was bypassed.
-
MarkUsedruns before the handshake-format check, avatar fetch, andTokenService.Set. That's the right point for security (atomic check-and-mark before side effects), but it means a transient downstream failure (avatar saver flapping, token store hiccup) burns a valid token and the user gets 403 on retry. Suggest keeping the current ordering. Magic-link flows let the user re-request a fresh email link, which is preferable to a non-atomic consume-then-rollback design. Worth adding a one-line note in the godoc onLoginHandleror in the README saying "consumption is final; transient errors require requesting a new confirmation email." (provider/verify.go:147-203)
minor:
-
The integration replay test only fires N=2 sequentially. The atomicity claim of
MarkUsedrests on the mutex serializing concurrent same-key calls, but no test exercises that. At.Parallel/errgroup test (50 goroutines redeem the same token, exactly one returnsfalse) locks in the contract cheaply. -
House style for similar policy hooks (
token.AllowedHosts,token.Audience,token.Validator) provides a*Funcadapter for closure-based config. Consider addingVerifConfirmationStoreFuncfor symmetry, or note that the omission is intentional.
The verify provider sends one-shot magic-link tokens by email; the
Email sender's debug log dumped the full body verbatim:
[DEBUG] send "Confirmation token: <jwt>" to victim@example.com
Anyone with log access (centralized logging, crash bundles, support
tools, mail-gateway-adjacent observability) could redeem that token
within its TTL — independently of the legitimate recipient. The
verify-replay PR (#281) limits reuse to one consumption, but the
log-reader can still race the user for that one consumption.
Log only the recipient and the body length:
[DEBUG] send 142-byte message to victim@example.com
Same fix in v1 (provider/sender/email.go:84) and v2
(v2/provider/sender/email.go:84), single PR.
Test: TestEmail_SendDoesNotLogBody captures logger output, sends a
known-secret body to a non-existent SMTP host, and asserts the body
substring is absent while the recipient is present. Added in both
modules.
a3b60da to
d903497
Compare
|
addressed all blocking and the in-scope non-blocking points in the last push:
go fix + race + lint clean in both modules. |
|
no extra changes this round — the previous batch (1-6 from the original review) covered everything in scope. CI is the only thing left to wait on. |
umputun
left a comment
There was a problem hiding this comment.
all 6 round-1 points addressed cleanly, thx. Round-2 introduced a few new things that need a look, plus two I missed in round 1.
new in round 2:
-
VerifConfirmationStoreFuncis missing a typed-nil guard.Opts{VerifConfirmationStore: provider.VerifConfirmationStoreFunc(nil)}produces a non-nil interface that survives theif e.ConfirmationStore != nilcheck atauth.go:435and panics atMarkUsedon the first redemption. The mirrortoken.AllowedHostsFuncdoes have this guard (seev2/provider/redirect.go:35). Same inprovider/verify.go:65-67and v2. -
fail-closed branch has no test. The
(bool, error)signature was added precisely so a backend outage rejects rather than fails open, and the godoc atprovider/verify.go:55-62makes it a security guarantee, but no test exercisesmarkErr != nil → 403. AVerifConfirmationStoreFunc(func(...) (bool, error) { return false, errors.New("boom") })test that asserts 403 + body would lock the contract cheaply. (provider/verify.go:190-194, v2 mirror) -
amortized sweep is dark code.
inMemoryVerifStoreSweepEvery=256plus the eviction branch atprovider/verify.go:109-115are not exercised by any test. Cheap fix: convert the const to a privatevarso a test can lower it to e.g. 4 and verify a sweep actually runs and prunes expired entries. -
TTL contract godoc allows non-conforming early eviction.
provider/verify.go:55-62says TTL is upper bound and "implementations may evict earlier under memory pressure". A conforming Redis adapter under memory pressure could drop the consumed marker before the JWT expires, reopening the replay window in exactly the multi-instance case the PR is supposed to harden against. Tighten the contract to "must retain markers for at least the supplied TTL, or return an error if it cannot".
things I missed in round 1:
-
verifConfirmStoreOfield name. trailing-Oforsync.Onceis non-idiomatic, house style isonce(seeavatar/localfs.go:19,v2/token/jwt.go:91).auth.go:39,v2/auth.go:39. -
token logged via
r.URL.String()on the new fail-closed branch.rest.SendErrorJSONlogs the full URL including?token=<jwt>. The handler's parse/expired branches already do this, so no regression in baseline behavior, but the new fail-closed branch is materially worse: when a Redis outage triggers it the token is still live (store didn't record consumption), so a single log line equals an unredeemed magic link. Worth either hashing the token in the log message on the two new branches, or scrubbingtoken=from the URL before passing the request toSendErrorJSON. (provider/verify.go:189-193, v2 mirror)
coveralls -0.08% is patch-coverage on the new fail-closed + sweep branches, which is exactly what items 2 and 3 above would fix.
The email confirmation JWT issued by VerifyHandler.sendConfirmation was
only protected by its 30-minute expiry: any party who could read the
confirmation link (forwarded email, mail-gateway logs, mailbox archive)
could redeem it independently within the window, creating a separate
authenticated session for the same address.
Add a VerifConfirmationStore interface (MarkUsed key, ttl -> bool) and a
default in-memory implementation (sync.Mutex-protected map keyed by
SHA-256 of the raw token). When VerifyHandler.ConfirmationStore is
non-nil, LoginHandler records the token as consumed on first redemption
and rejects replays with 403 "confirmation token already consumed".
The store key is the SHA-256 of the raw token rather than a jti claim,
so existing token fixtures (which carry no jti) are still de-dup'd
correctly without changing the wire format.
Wired through Service.AddVerifProvider in both auth.go and v2/auth.go.
The store is selected once via sync.Once, with this precedence:
1. Opts.VerifConfirmationStore if non-nil (caller-supplied; required
for multi-instance deployments — Redis or any shared KV).
2. provider.NewInMemoryVerifStore() default — fine for single
instance, broken for LB-fronted multi-instance deployments where
instance A's "used" set is unknown to instance B. Documented in
README under "Confirmation token replay protection".
Tests:
* TestInMemoryVerifStore — store-level coverage (replay, expiry, key
independence).
* TestVerifyHandler_LoginAcceptConfirm_RejectsReplay — integration:
same token redeemed twice -> 200 then 403.
* TestService_AddVerifProvider_UsesCustomConfirmationStore — Opts
injection works.
* TestService_AddVerifProvider_DefaultsToInMemory — default install
+ sync.Once reuse on subsequent AddVerifProvider calls.
|
addressed all 6 round-2 items in the last push:
Verified the scrub on v1 fail-closed test output: go fix + race + lint clean in both modules. |
umputun
left a comment
There was a problem hiding this comment.
all 6 round-2 items verified clean: typed-nil func guard at the handler boundary, fail-closed branch test, sweep-testable via var, TTL godoc tightened, store-name rename, scrub helper applied only on the live-token branches via r.Clone. Tests + race + lint green in both modules.
one issue worth a follow-up (not blocking, marking approve so this can land):
typed-nil VerifConfirmationStoreFunc in Opts silently disables the default in-memory store. The handler-level guard added in round 2 (provider/verify.go:217) correctly normalizes a typed-nil func to no-store, but the Service-level check at auth.go:449-454 (and v2/auth.go:450-455) is still a plain s.opts.VerifConfirmationStore != nil test. A typed-nil VerifConfirmationStoreFunc is a non-nil interface wrapping a nil func, so it survives that check, provider.NewInMemoryVerifStore() is skipped, and at redemption the handler's typed-nil guard normalizes the func to nil. Net result: a user who writes Opts{VerifConfirmationStore: VerifConfirmationStoreFunc(nil)} gets neither their func nor the default store, and replay protection is silently disabled for that exact configuration.
Same shape as the *avatar.Proxy typed-nil in #286 just with a different consequence (silent loss of protection vs panic). The fix is the same as you used in LoginHandler, applied one layer up:
s.verifConfirmStoreOnce.Do(func() {
store := s.opts.VerifConfirmationStore
if fn, ok := store.(provider.VerifConfirmationStoreFunc); ok && fn == nil {
store = nil
}
if store != nil {
s.verifConfirmStore = store
return
}
s.verifConfirmStore = provider.NewInMemoryVerifStore()
})Worth adding TestService_AddVerifProvider_TypedNilStoreFuncFallsBackToDefault symmetric to the existing handler-level typed-nil test, in both v1 and v2.
minor follow-ups:
-
The
verifConfirmStoreO→verifConfirmStoreOncerename made the field longer than the column other fields are aligned to inService.gofmt -d auth.go v2/auth.goreports a diff. CI doesn't enforce gofmt so this is cosmetic, but a one-linegofmt -wpass before merge avoids a noisy diff in the next contributor's IDE-on-save. -
scrubTokenFromRequest's defensive early-return atprovider/verify.go:148-149(r == nil || r.URL == nil || token == "") is the 2 uncovered lines coveralls is flagging. They're dead code along the only invocation path. Either drop the early-return (function is private, contract is "callers always pass a request with token") or add a 5-line direct unit test. Defensive code is fine to keep, just worth noting why coverage isn't 100%. -
adapter-author guidance for
MarkUsederrors: a poorly-implemented Redis/DB adapter that constructsfmt.Errorf("SET %s ...: %w", key, err)would surface the SHA-256 hash through the err log. Not a defect in the in-tree implementation. Worth a one-line note in theVerifConfirmationStoreinterface godoc telling adapter authors not to embedkey(or any caller-supplied data) in returned errors.
nothing blocking. The remaining round-2 work is solid.
…dance Followups to #281 (verify replay protection) raised on the post-merge review. None blocking, all small. 1. Service-level typed-nil VerifConfirmationStoreFunc guard. The handler-level guard added in round 2 normalizes a typed-nil func to nil, but the AddVerifProvider check at auth.go was still a plain `s.opts.VerifConfirmationStore != nil` test. A typed-nil VerifConfirmationStoreFunc is a non-nil interface wrapping a nil func, so it survived that check, the in-memory default was skipped, and at redemption the handler's typed-nil guard normalized it to nil — net result: a user who wrote `Opts{VerifConfirmationStore: VerifConfirmationStoreFunc(nil)}` got neither their func nor the default, and replay protection was silently disabled for that exact configuration. Same shape as the *avatar.Proxy typed-nil case fixed in #286 with a different consequence (silent loss of protection vs panic). Apply the same shape of guard one layer up. New regression test TestService_AddVerifProvider_TypedNilStoreFuncFallsBackToDefault in both v1 and v2. 2. gofmt -w on auth.go / v2/auth.go. The verifConfirmStoreO -> verifConfirmStoreOnce rename in #281 made the field longer than the surrounding column alignment. Cosmetic; CI doesn't enforce gofmt but a noisy IDE-on-save diff for the next contributor. 3. scrubTokenFromRequest unit test (TestScrubTokenFromRequest) in both v1 and v2 — covers the defensive early-return that coveralls flagged as uncovered after #281 (token-missing returns r unchanged, nil r returns nil, token-present returns redacted clone with other query params preserved). 4. Adapter-author guidance on VerifConfirmationStore.MarkUsed godoc: tells external Redis/DB adapter authors not to embed the supplied key in returned errors, since the handler logs err on the fail-closed branch and the key is the SHA-256 of a still-live JWT.
Summary
The email confirmation JWT issued by
VerifyHandler.sendConfirmationwas protected only by its 30-minute expiry: any party who could read the confirmation link (forwarded email, mail-gateway logs, mailbox archive) could redeem it independently within the window, creating a separate authenticated session for the same address.Change
Add a pluggable
VerifConfirmationStoreinterface and an in-memory default. On first redemption the token is recorded as consumed; replays are rejected with403 confirmation token already consumed.Key design choices:
(bool, error)signature -- a Redis/DB adapter must be able to signal a backend outage so callers can fail closed instead of either fail-open (allow replay) or always-reject (lock users out). The TTL contract is strict: dropping a marker before its ttl while the JWT is still valid reopens the replay window the store is meant to close.jticlaim. Existing tokens issued before this fix carry no jti, but they are still de-dup'd correctly because the signed token bytes are unique per issuance. No wire-format change.VerifConfirmationStoreFuncadapter for closure-based config, mirroring theSenderFunc/token.AllowedHostsFunchouse pattern. Includes a typed-nil guard at the call site (a non-nil interface wrapping a nil func is treated as no store configured, matchingAllowedHostsFunc).inMemoryVerifStoreSweepEvery=256) instead of walking the map under the mutex on every redemption -- the lookup-then-insert pattern keeps correctness; sweeping every Nth insert keeps the lock window short.Opts.VerifConfirmationStore. Multi-instance deployments behind a load balancer MUST supply a shared backend (Redis or any shared KV) -- otherwise instance A's "used" set is unknown to instance B and the protection silently fails. The README change makes this an explicit must-do, not a recommendation.sync.OnceinAddVerifProvider: precedence isOpts.VerifConfirmationStoreif non-nil, elseprovider.NewInMemoryVerifStore()default. SubsequentAddVerifProvidercalls reuse the same store.markErr != nilandalreadyUsedSendErrorJSON sites use ascrubTokenFromRequest(r)helper to set?token=<redacted>before logging. The fail-closed branch is materially worse than the existing parse/expired branches because the store did NOT record consumption -- the JWT in the URL is still live.Both
auth.goandv2/auth.goget the same wiring.Test
TestInMemoryVerifStore-- direct unit coverage:MarkUsed, expiry, key independencealreadyUsed=false. Locks the atomicity contract that the mutex ordering provides.TestVerifConfirmationStoreFunc-- adapter calls f and surfaces both bool and err.TestVerifyHandler_LoginAcceptConfirm_RejectsReplay-- integration: sametestConfirmedTokenredeemed twice; first 200, second 403 with "already" in body.TestVerifyHandler_LoginAcceptConfirm_TypedNilStoreFuncDoesNotPanic--var nilFn VerifConfirmationStoreFunc; ConfirmationStore: nilFnsurvives the!= nilcheck; the handler must treat it as no store, not panic.TestVerifyHandler_LoginAcceptConfirm_FailClosedOnStoreError--VerifConfirmationStoreFunc(func(...) (bool, error) { return false, fmt.Errorf("redis down") })triggers the security-critical fail-closed branch; asserts 403 + "store unavailable".TestService_AddVerifProvider_UsesCustomConfirmationStore--Opts.VerifConfirmationStoreinjection takes precedence over the default.TestService_AddVerifProvider_DefaultsToInMemory-- default install when not configured;sync.Onceensures the same store is reused across multipleAddVerifProvidercalls.Same suite added in v1 and v2. Full
go test -race ./...green in both modules;golangci-lint run --new-from-rev=master0 issues.Multi-instance migration note
Existing single-instance callers transparently get replay rejection on upgrade. LB-fronted callers must opt into a shared store via
Opts.VerifConfirmationStore(e.g. Redis), or replay rejection will be inconsistent across instances. The README change in this PR makes this explicit and adds a sample Redis adapter showing the(bool, error)contract.