fix: upload CLI defaults to replay=True, add --no-replay opt-out #22
colombod wants to merge 2 commits into
The server /events endpoint accepts ?replay=true to bypass the 7-day
in-memory idempotency cache. The upload CLI now passes this by default
so re-uploading old sessions always lands correctly in Neo4j regardless
of how recently the same events were previously sent.
--no-replay restores the old cache-enforced deduplication for callers
that want explicit deduplication (e.g. uploading a session that is
currently being captured live).
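To make the cache behaviour concrete, here is a minimal sketch of an in-memory idempotency cache with a 7-day TTL and a replay bypass. It is hypothetical: the IdempotencyCache class, the event-id key and should_process() are assumptions for illustration, not the server's actual code.

```python
import time
from collections.abc import Callable

SEVEN_DAYS = 7 * 24 * 3600  # TTL in seconds


class IdempotencyCache:
    """Hypothetical in-memory idempotency cache with a TTL and a replay bypass."""

    def __init__(
        self,
        ttl_seconds: float = SEVEN_DAYS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen: dict[str, float] = {}  # event_id -> time first accepted

    def should_process(self, event_id: str, replay: bool = False) -> bool:
        """Return True if the event should be written downstream."""
        if replay:
            # ?replay=true: skip the cache check entirely.
            return True
        now = self._clock()
        first_seen = self._seen.get(event_id)
        if first_seen is not None and now - first_seen < self._ttl:
            # Seen inside the TTL window: silently dropped as a duplicate.
            return False
        self._seen[event_id] = now
        return True


cache = IdempotencyCache()
print(cache.should_process("evt-1"))               # True: first delivery
print(cache.should_process("evt-1"))               # False: deduplicated
print(cache.should_process("evt-1", replay=True))  # True: replay bypasses the cache
```

Without the bypass, a re-upload inside the TTL window is dropped with no error, which is the silent data loss this PR addresses.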
Files changed:
- uploader.py: replay: bool = True param, params={"replay": True} on POST
- cli.py: --no-replay flag, replay=not args.no_replay forwarding,
_DETAILED_HELP IDEMPOTENCY section rewritten to reflect new default
- tests/test_uploader.py: test default sends params={"replay": True}
- tests/test_cli.py: --no-replay behavioural tests
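A minimal sketch of the CLI side, assuming a plain argparse parser. build_parser(), replay_from_args(), the "upload" prog name and the positional path argument are hypothetical stand-ins for whatever cli.py actually defines. A store_true flag defaults to False, so replay=not args.no_replay is True unless the user opts out:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical minimal parser; the real cli.py defines more arguments.
    parser = argparse.ArgumentParser(prog="upload")
    parser.add_argument("path", help="JSONL session file to upload")
    parser.add_argument(
        "--no-replay",
        action="store_true",  # stored as args.no_replay, False when absent
        help="Honour the server's idempotency cache instead of bypassing it "
        "(e.g. for a session that is still being captured live).",
    )
    return parser


def replay_from_args(argv: list[str]) -> bool:
    args = build_parser().parse_args(argv)
    # replay is on by default; --no-replay flips it off.
    return not args.no_replay


print(replay_from_args(["session.jsonl"]))                 # True
print(replay_from_args(["session.jsonl", "--no-replay"]))  # False
```

argparse.BooleanOptionalAction (Python 3.9+) would generate both --replay and --no-replay; a single store_true opt-out keeps the surface to the one flag this PR describes.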
httpx serialises Python bool True to the string "True" (capital T) via str(),
but the server expects lowercase "true". This was causing the replay flag to
be silently ignored on POST /events calls.
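The general pitfall is easy to reproduce with the standard library, independent of any particular HTTP client: urllib.parse.urlencode applies str() to non-string values, so a Python bool is encoded with a capital T. How a given client encodes bools is client-specific, which is a good reason to pass the explicit lowercase string the server expects:

```python
from urllib.parse import urlencode

# str() on a bool gives "True", and urlencode() uses str() for non-string values.
print(str(True))                      # True
print(urlencode({"replay": True}))    # replay=True
# Passing the literal string sends exactly what the server parses.
print(urlencode({"replay": "true"}))  # replay=true
```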
Changes:
- uploader.py: Changed {"replay": True} to {"replay": "true"} and updated
the type hint from dict[str, Any] | None to dict[str, str] | None.
- test_uploader.py: Updated the assertion to match the corrected string value
and added an explanatory comment describing why the string form is required.
- cli.py: Corrected the help text reference from replay=True to replay=true
and reformatted [--no-replay] to its own continuation line for consistency.
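A sketch of the corrected shape, with the hypothetical names replay_params() and upload_events() standing in for the real code in uploader.py. The post callable is assumed to take the URL plus json= and params= keyword arguments, as httpx.Client.post does; a recording fake replaces the real client so the sketch runs offline:

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from typing import Any


def replay_params(replay: bool) -> dict[str, str] | None:
    """Hypothetical helper: query params for POST /events.

    Uses the literal lowercase string "true", so no bool-to-string
    conversion happens inside the HTTP client.
    """
    return {"replay": "true"} if replay else None


def upload_events(
    events: Iterable[dict[str, Any]],
    post: Callable[..., Any],
    replay: bool = True,
) -> None:
    """Hypothetical upload loop: one POST per event, replay on by default."""
    params = replay_params(replay)
    for event in events:
        post("/events", json=event, params=params)


# A recording fake in place of httpx.Client.post, so the sketch runs offline.
calls: list[dict[str, Any]] = []


def fake_post(url: str, **kwargs: Any) -> None:
    calls.append({"url": url, **kwargs})


upload_events([{"type": "session:start"}], fake_post)
print(calls[0]["params"])  # {'replay': 'true'}
```

Passing params=None with replay=False leaves the query string empty, so the server falls back to its cache-enforced deduplication.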
All 168 tests pass. Ruff clean. Pyright 0 errors.
Generated with Amplifier
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Reviewed. Diagnosis (7-day cache silently dedup'ing replays so Session.started_at goes missing) and fix (default …) … Could you link or quote the server-side …? This is a server-repo question, not a critique of this PR. If the server is pure-MERGE: approve immediately. (The new default has no interaction with the real-time hook path or CR-1 parent_id propagation; the upload CLI is offline-batch only.)
Verified against the server repo (…). All Neo4j writes go through …:
# Nodes (line 402-404)
"UNWIND $rows AS row "
"MERGE (n:Session {node_id: row.node_id, workspace: row.props.workspace}) "
"SET n += row.props"
# Non-session nodes (line 411-413) — same pattern
"MERGE (n {node_id: row.node_id, workspace: row.props.workspace}) "
"SET n += row.props"
# Edges (line 475-479) — same pattern
"MERGE (src)-[r:{edge_type}]->(dst) "
"SET r += row.props"
Label patches (lines 426, 440, 450): …
In-memory counters (…): …
No webhooks or external HTTP calls in the handler chain; grepped …
Blob store (…): …
One edge case to be aware of: …
Conclusion: The server is pure-MERGE for all durable state.
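Why pure-MERGE makes replay safe, as a rough in-memory model: the sketch below mimics MERGE on (node_id, workspace) followed by SET n += row.props, using a dict in place of Neo4j. merge_rows() and Store are hypothetical, and Cypher details such as labels and null handling are ignored, but it shows that applying identical rows twice leaves the stored state unchanged:

```python
from typing import Any

# Hypothetical stand-in for the Neo4j node store, keyed like the MERGE
# pattern: (node_id, workspace) -> node properties.
Store = dict[tuple[str, str], dict[str, Any]]


def merge_rows(store: Store, rows: list[dict[str, Any]]) -> None:
    """Rough model of: UNWIND $rows AS row MERGE (n {...}) SET n += row.props."""
    for row in rows:
        key = (row["node_id"], row["props"]["workspace"])
        # MERGE: match the node on its key properties, or create it.
        node = store.setdefault(key, {"node_id": key[0], "workspace": key[1]})
        # SET n += row.props: add or overwrite only the keys present in props.
        node.update(row["props"])


rows = [{"node_id": "s1", "props": {"workspace": "w1", "started_at": "2024-01-01T00:00:00Z"}}]
store: Store = {}
merge_rows(store, rows)
snapshot = {key: dict(props) for key, props in store.items()}
merge_rows(store, rows)   # replay the same rows
print(store == snapshot)  # True: state unchanged by the replay
```

With differing props, the later write wins for each key it sets, and keys it does not mention are left alone.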
Problem
The upload CLI never passed ?replay=true to the server. The server's in-memory idempotency cache (7-day TTL) silently dropped session:start events on re-upload, leaving Session nodes without started_at. As a result, re-uploaded old sessions appeared in Neo4j without their original timestamps.
Changes
- uploader.py: run_upload() gains replay: bool = True. When True (the default), every POST is sent with ?replay=true, bypassing the server-side cache. Set it to False to restore the old deduplication behaviour.
- cli.py: --no-replay flag added. The default is replay=True (no flag needed). The _DETAILED_HELP IDEMPOTENCY section is rewritten to describe the new default accurately: the cache is bypassed, and Neo4j idempotency is provided by MERGE + SET n += row.props.
Why --no-replay instead of --replay
The upload CLI is always a deliberate replay of existing JSONL data. There is no scenario where silently skipping events is the right default. --no-replay is the explicit override for the rare case (e.g. uploading a live, in-progress session).
Verification
- Tests (uv run pytest -q)
- Ruff: All checks passed!
- Pyright: 0 errors, 0 warnings, 0 informations
- New default-replay test: fails with assert None == {'replay': True} on main, passes on this branch