feat(testing): e2e scenario harness — real boot, echo provider, cross-LLM evaluation #170
ryaker wants to merge 4 commits into
SECURITY.md was last updated for v0.6, but Zora is now v0.12.0. Nine undocumented security subsystems have been added since v0.6.

New sections added:
- Irreversibility Scoring: 0-100 per-action scores, configurable thresholds
- HITL Approval Gate: ApprovalQueue routing via Telegram/Signal
- Session Risk Forecasting: MemoryRiskForecaster drift/salami/creep signals
- Subagent Reputation: AgentCooldown with escalating denial thresholds
- Channel Security: CaMeL dual-LLM quarantine, Casbin RBAC, 4 invariants
- Per-Project Security Policy: .zora/security-policy.toml with parent ceiling
- zora security audit: daemon startup gate
- Tool Hook Pipeline: 6 built-in hooks running before every tool call

Updated sections:
- Security Architecture Summary table: 9 new subsystem rows
- OWASP matrix: encoding coverage, channel quarantine, ASI-06 row
- Audit event types: 12 new v0.12 event types documented
- Implementation status: all previously "in progress" items now Active

Closes #159 (PR D from security review plan)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
mv (40) was listed after shell_exec (50), breaking the sort order. Addresses a Gemini code review suggestion on PR #159.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
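The following is a minimal sketch of how 0-100 per-action irreversibility scores, a sorted score table and configurable thresholds could fit together. Only `mv` (40) and `shell_exec` (50) come from the commits above. The other tool names, the threshold values and the `Decision` labels are hypothetical placeholders, not Zora's actual configuration or API.

```typescript
// Hypothetical sketch: irreversibility scores (0-100) mapped to gate decisions.
// Only mv = 40 and shell_exec = 50 are taken from the PR text; everything else is assumed.

type Decision = 'allow' | 'require_approval' | 'deny';

interface Thresholds {
  approveAt: number; // scores >= approveAt go to a human approval gate (assumed)
  denyAt: number; // scores >= denyAt are blocked outright (assumed)
}

// Hypothetical score table, kept sorted by ascending score.
const SCORES: ReadonlyArray<readonly [string, number]> = [
  ['read_file', 0],
  ['mv', 40],
  ['shell_exec', 50],
  ['git_push', 80],
];

// True when every entry's score is >= the previous entry's score.
function isSortedByScore(table: ReadonlyArray<readonly [string, number]>): boolean {
  return table.every(([, score], i) => i === 0 || table[i - 1]![1] <= score);
}

// Maps a score to a decision using the (assumed) threshold semantics above.
function decide(score: number, t: Thresholds): Decision {
  if (score < 0 || score > 100) throw new RangeError(`score out of range: ${score}`);
  if (score >= t.denyAt) return 'deny';
  if (score >= t.approveAt) return 'require_approval';
  return 'allow';
}

const thresholds: Thresholds = { approveAt: 50, denyAt: 90 };
for (const [tool, score] of SCORES) {
  console.log(`${tool} (${score}) -> ${decide(score, thresholds)}`);
}
```

A sortedness check like `isSortedByScore` would catch the kind of ordering slip this commit fixes.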
…og + Meta Graph API social

4-agent Zora team (signal → writer → image → publisher):
- SignalAgent: pulls topics-next, aborts if < 3 signals
- WriterAgent: StoryBrand MDX with ≥3 cited expert signals
- ImageAgent: hero (customer-focused 16:9) + social (Sophia 1:1)
- PublisherAgent: git + vercel --prod + Facebook/Instagram Graph API

Human gate via Telegram after WriterAgent (2hr timeout → auto-approve).
Social posting via Meta Graph API (not browser automation).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s-LLM eval pattern, CI on linux+macos

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CodeAnt AI is reviewing your PR.
Caution: Review failed. The pull request was closed or merged during review.
Walkthrough
The PR introduces end-to-end testing infrastructure with a new …
Sequence Diagram(s)
sequenceDiagram
participant User as User/Scheduler
participant SignalAgent as Signal Agent
participant WriterAgent as Writer Agent
participant HumanGate as Human Gate<br/>(Telegram)
participant ImageAgent as Image Agent
participant PublisherAgent as Publisher Agent
participant Workspace as ~/.zora/workspace
participant GitHub as GitHub<br/>+ Social APIs
User->>SignalAgent: trigger content-pipeline
SignalAgent->>SignalAgent: sophia-wire topics-next<br/>+ brief/context
SignalAgent->>Workspace: write {TODAY}-brief.json<br/>(signals, domains, audience)
SignalAgent->>WriterAgent: transfer control
WriterAgent->>Workspace: read {TODAY}-brief.json
WriterAgent->>WriterAgent: generate StoryBrand MDX<br/>(3+ signal citations)
WriterAgent->>Workspace: write {slug}.mdx<br/>+ preview.txt
WriterAgent->>HumanGate: approval request
HumanGate->>HumanGate: await Telegram<br/>approve/reject<br/>(120 min timeout)
alt approval granted
HumanGate->>ImageAgent: proceed
else timeout/reject
HumanGate->>User: abort/alert
end
ImageAgent->>Workspace: read {TODAY}-brief.json
ImageAgent->>ImageAgent: NanoBanana MCP:<br/>hero (16:9)<br/>+ social (1:1)
ImageAgent->>Workspace: save images<br/>to content/images/
ImageAgent->>PublisherAgent: transfer
PublisherAgent->>GitHub: copy MDX + hero image<br/>to content/blog/
PublisherAgent->>GitHub: git add/commit/push<br/>+ vercel --prod
PublisherAgent->>GitHub: verify HTTP 200
PublisherAgent->>PublisherAgent: sophia-wire<br/>topics-publish
PublisherAgent->>PublisherAgent: generate soundbites
PublisherAgent->>GitHub: Facebook Graph API<br/>(with retry on fail)
PublisherAgent->>GitHub: Instagram publish<br/>(2-step, 1 retry)
PublisherAgent->>User: Telegram completion<br/>via Claude Ops
sequenceDiagram
participant Harness as E2E Harness<br/>(Vitest)
participant CLI as Zora CLI<br/>(zora ask)
participant PrimaryProvider as Primary Provider<br/>(EchoProvider)
participant SecondaryProvider as Secondary Provider<br/>(EchoProvider)
participant SessionStore as ~/.zora/sessions<br/>(JSONL)
Harness->>Harness: copy e2e-config.toml<br/>to temp .zora/
Harness->>CLI: spawn: zora ask<br/>'reverse: ...'<br/>(ZORA_CONFIG_DIR)
CLI->>PrimaryProvider: route (rank 1)
PrimaryProvider->>PrimaryProvider: keyword match<br/>→ reversed text
PrimaryProvider->>CLI: emit events<br/>(task.start, text, task.end, done)
CLI->>SessionStore: write JSONL<br/>session file
Harness->>SessionStore: parse session<br/>verify event order<br/>+ provider source
Harness->>CLI: spawn: zora ask<br/>'evaluate: <output>'<br/>(disabled primary)
CLI->>SecondaryProvider: failover (rank 2)
SecondaryProvider->>SecondaryProvider: keyword match: evaluate<br/>→ 'EVALUATION: ...'
SecondaryProvider->>CLI: emit events
CLI->>SessionStore: write JSONL<br/>(2nd session file)
Harness->>SessionStore: verify EVALUATION: present<br/>+ timestamp ordering<br/>+ parallel execution
Harness->>Harness: assert all exit 0<br/>+ no prompt leaks
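The EchoProvider's role in the diagrams above is deterministic. It matches keywords such as `reverse:` or `evaluate` and then emits `task.start`, `text`, `task.end` and `done`. The sketch below approximates that behaviour under assumptions. The event payload shape, the `echoRespond` name, the fallback echo and the exact `EVALUATION:` wording are hypothetical, not the PR's actual `EchoProvider` implementation.

```typescript
// Hypothetical sketch of a deterministic, keyword-driven echo provider.
// Event type names follow the sequence diagram; payload shapes are assumed.

type EchoEvent =
  | { type: 'task.start'; source: string; content: { task: string } }
  | { type: 'text'; source: string; content: { text: string } }
  | { type: 'task.end'; source: string; content: Record<string, never> }
  | { type: 'done'; source: string; content: Record<string, never> };

// Pure function: the same prompt always produces the same reply.
function respond(prompt: string): string {
  const trimmed = prompt.trim();
  if (trimmed.toLowerCase().startsWith('reverse:')) {
    // "reverse: abc" -> "cba" (Array.from keeps multi-unit code points intact)
    return Array.from(trimmed.slice('reverse:'.length).trim()).reverse().join('');
  }
  if (/\bevaluate\b/i.test(trimmed)) {
    // Fixed prefix so an evaluator step can be asserted on without a real LLM.
    return `EVALUATION: received ${trimmed.length} characters`;
  }
  return trimmed; // Default: echo the prompt back unchanged.
}

// Emits the same ordered event sequence for every call, tagged with the provider name.
function echoRespond(providerName: string, prompt: string): EchoEvent[] {
  return [
    { type: 'task.start', source: providerName, content: { task: prompt } },
    { type: 'text', source: providerName, content: { text: respond(prompt) } },
    { type: 'task.end', source: providerName, content: {} },
    { type: 'done', source: providerName, content: {} },
  ];
}
```

Because the output depends only on the prompt, a harness can assert exact strings without API keys or network access.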
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 3 passed
Summary of Changes (Gemini Code Assist)
This pull request adds an end-to-end scenario harness, together with a new deterministic EchoProvider and a cross-LLM evaluation pattern. It also updates the security documentation to describe the defense-in-depth features added in v0.12.0. Finally, it adds a detailed multi-agent content pipeline example that orchestrates routines with a human-in-the-loop approval step.
Sequence Diagram
This PR adds a deterministic Echo provider and a scenario-based E2E harness that boots the real CLI. The core flow validates provider selection and a two-step generator-then-evaluator pattern. Each CLI run writes its own session log, and the suite runs in CI on Linux and macOS.
sequenceDiagram
participant CI
participant E2EHarness
participant ZoraCLI
participant ProviderRouter
participant EchoProvider
participant SessionStore
CI->>E2EHarness: Run e2e scenarios
E2EHarness->>ZoraCLI: Ask generator task
ZoraCLI->>ProviderRouter: Load config and choose available provider
ProviderRouter->>EchoProvider: Execute task
EchoProvider-->>ZoraCLI: Deterministic response events
ZoraCLI->>SessionStore: Write session JSONL file
E2EHarness->>ZoraCLI: Ask evaluator task with generated output
ZoraCLI->>ProviderRouter: Route to evaluator provider and execute
ProviderRouter-->>E2EHarness: Evaluator result with second session file
Generated by CodeAnt AI
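The harness's core check parses each run's session JSONL, confirms the `task.start` → `text` → `task.end` → `done` ordering and checks which provider produced the events. It could look roughly like the sketch below. `parseJsonl` reuses the helper name from the test file, but this body is an assumption. So are the `source` field name and the `assertEventOrder` helper; none of them is the harness's actual code.

```typescript
import * as fs from 'node:fs';

type SessionEvent = Record<string, unknown>;

// Parse a JSONL file: one JSON object per non-empty line.
function parseJsonl(file: string): SessionEvent[] {
  return fs
    .readFileSync(file, 'utf8')
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line) as SessionEvent);
}

const EXPECTED_ORDER = ['task.start', 'text', 'task.end', 'done'] as const;

// Checks that the expected event types appear in this order (other events may be
// interleaved) and that every event carrying a `source` field came from expectedSource.
function assertEventOrder(events: SessionEvent[], expectedSource: string): void {
  let next = 0;
  for (const event of events) {
    if (next < EXPECTED_ORDER.length && event['type'] === EXPECTED_ORDER[next]) next++;
    const source = event['source'];
    if (source !== undefined && source !== expectedSource) {
      throw new Error(`event from unexpected provider: ${String(source)}`);
    }
  }
  if (next !== EXPECTED_ORDER.length) {
    throw new Error(`missing or out-of-order event: expected ${EXPECTED_ORDER[next]}`);
  }
}
```

Matching the events as a subsequence, rather than requiring exact equality, keeps the check stable if the CLI later adds extra event types between the four expected ones.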
Code Review
This pull request introduces a significant security hardening update (v0.12.0) with a layered defense-in-depth stack, including irreversibility scoring, human-in-the-loop approval, and session risk forecasting. It also refactors the content pipeline with new agents (Signal, Writer, Image, Publisher) and integrates a human approval gate. Additionally, a new EchoProvider and comprehensive end-to-end testing framework have been added. Feedback suggests that the timeout_action in the human approval gate should default to reject to prevent unintended content publication, and the deployment verification process should use a more robust polling mechanism instead of a fixed sleep duration.
Reply *approve* to publish, *reject* to cancel.
Auto-publishes in *2 hours* if no response.
"""
timeout_action = "approve" # Auto-approve after timeout (not reject)
The timeout_action is set to approve, which means content will be automatically published if not reviewed within 2 hours. This could lead to unintended or erroneous content being published without explicit approval. For safety, it's generally better to default to reject on timeout to prevent accidental publications. This ensures that a human must explicitly approve the content before it goes live.
Suggested change:
timeout_action = "reject" # Auto-reject after timeout (safer default)
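The practical difference between the two `timeout_action` values is easiest to see in code. This is a minimal sketch that assumes a promise which resolves when a human replies. The `awaitApproval` name and its signature are illustrative, not Zora's `ApprovalQueue` API.

```typescript
type Decision = 'approve' | 'reject';

// Races a human decision against a timeout. If the timeout fires first, the result
// falls back to `timeoutAction`. With the default 'reject', silence never publishes.
async function awaitApproval(
  waitForDecision: Promise<Decision>,
  timeoutMs: number,
  timeoutAction: Decision = 'reject',
): Promise<{ decision: Decision; timedOut: boolean }> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<{ decision: Decision; timedOut: boolean }>((resolve) => {
    timer = setTimeout(() => resolve({ decision: timeoutAction, timedOut: true }), timeoutMs);
  });
  try {
    return await Promise.race([
      waitForDecision.then((decision) => ({ decision, timedOut: false })),
      timeout,
    ]);
  } finally {
    clearTimeout(timer); // Don't keep the process alive once a decision has arrived.
  }
}
```

Reporting `timedOut` separately lets the pipeline send a different alert for "nobody answered" than for "explicitly rejected".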
sleep 120
curl -s -o /dev/null -w "%{http_code}" https://www.mymoneycoach.ai/blog/{slug}
Using a fixed sleep 120 to wait for deployment propagation is brittle, as deployment times can vary. A more robust approach is to poll the URL in a loop with a timeout until it returns a 200 status code. This avoids both unnecessary waiting and failures due to longer-than-expected deployment times.
Suggested change:
URL="https://www.mymoneycoach.ai/blog/{slug}"
for i in {1..12}; do # Poll for 2 minutes (12 * 10s)
  HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "$URL")
  if [ "$HTTP_CODE" -eq 200 ]; then
    echo "Deployment live!"
    break
  fi
  echo "Attempt $i/12: Not live yet (HTTP $HTTP_CODE). Retrying in 10s..."
  sleep 10
done
# Fail the step instead of continuing silently if the page never came up.
if [ "$HTTP_CODE" -ne 200 ]; then
  echo "Deployment not live after 2 minutes (last HTTP $HTTP_CODE)" >&2
  exit 1
fi
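If the publisher step were scripted in Node rather than bash, the same polling approach might look like the sketch below. `pollUntilLive` and its options are hypothetical. The probe is injected, so in practice it can use the global `fetch` (built into Node 18+) and in tests it can be a stub.

```typescript
// Hypothetical sketch: poll a URL until it returns HTTP 200 or the attempt budget runs out.
type Probe = (url: string) => Promise<number>; // resolves to an HTTP status code

// Default probe using the global fetch. Network errors count as "not live" (status 0).
const fetchStatus: Probe = async (url) => {
  try {
    const res = await fetch(url);
    return res.status;
  } catch {
    return 0;
  }
};

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function pollUntilLive(
  url: string,
  { attempts = 12, intervalMs = 10_000, probe = fetchStatus }:
    { attempts?: number; intervalMs?: number; probe?: Probe } = {},
): Promise<boolean> {
  for (let i = 1; i <= attempts; i++) {
    const status = await probe(url);
    if (status === 200) return true;
    console.log(`Attempt ${i}/${attempts}: not live yet (HTTP ${status})`);
    if (i < attempts) await sleep(intervalMs); // no pointless wait after the last attempt
  }
  return false; // Caller should treat this as a failed deployment.
}
```

Returning a boolean instead of throwing leaves it to the caller to decide whether a slow deploy aborts the pipeline or only triggers an alert.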
Superseded by #171 (clean cherry-pick onto main)
const E2E_CONFIG_FIXTURE = path.join(REPO_ROOT, 'tests', 'fixtures', 'e2e-config.toml');
const E2E_POLICY_FIXTURE = path.join(REPO_ROOT, 'tests', 'fixtures', 'e2e-policy.toml');

// The Orchestrator always writes sessions to ~/.zora/sessions (its baseDir defaults
// to os.homedir()/.zora regardless of ZORA_CONFIG_DIR). Tests track this dir.
Suggestion: ZORA_REAL_PROVIDERS=1 is never read, so test:e2e:real still loads the echo-only fixture and never exercises real providers. Select the fixture file based on ZORA_REAL_PROVIDERS so the real-provider path is actually tested. [logic error]
Severity Level: Major ⚠️
- ⚠️ `test:e2e:real` does not validate Claude/Gemini integration.
- ⚠️ Real-provider regressions can ship undetected.
Suggested change:
const USE_REAL_PROVIDERS = process.env['ZORA_REAL_PROVIDERS'] === '1';
const E2E_CONFIG_FIXTURE = path.join(
  REPO_ROOT,
  'tests',
  'fixtures',
  USE_REAL_PROVIDERS ? 'e2e-config-real.toml' : 'e2e-config.toml',
);
// Still needed: createTempZoraDir() copies this fixture into ~/.zora.
const E2E_POLICY_FIXTURE = path.join(REPO_ROOT, 'tests', 'fixtures', 'e2e-policy.toml');
// The Orchestrator always writes sessions to ~/.zora/sessions (its baseDir defaults
// to os.homedir()/.zora regardless of ZORA_CONFIG_DIR). Tests track this dir.
Steps of Reproduction ✅
1. Run `npm run test:e2e:real`; script sets `ZORA_REAL_PROVIDERS=1` in `package.json:28`.
2. Test constants still hardcode `e2e-config.toml` at
`tests/e2e/scenario-harness.test.ts:31`; no runtime read of `ZORA_REAL_PROVIDERS`.
3. `createTempZoraDir()` copies that fixture into temp config at
`tests/e2e/scenario-harness.test.ts:72`.
4. The copied fixture defines only echo providers (`tests/fixtures/e2e-config.toml:7-20`).
5. CLI provider factory builds `EchoProvider` for `type='echo'` at
`src/cli/index.ts:60-80`, so real-provider route is never exercised.Prompt for AI Agent 🤖
**Path:** tests/e2e/scenario-harness.test.ts
**Line:** 31:35
const globalZoraDir = path.join(os.homedir(), '.zora');
fs.mkdirSync(globalZoraDir, { recursive: true });
fs.copyFileSync(E2E_POLICY_FIXTURE, globalPolicyPath);
}
Suggestion: The test creates ~/.zora/policy.toml when missing but never removes it, leaving a permissive test policy in the developer's real home directory after tests complete. Add cleanup for the file created by the harness to avoid persistent global security-state mutation. [security]
Severity Level: Major ⚠️
- ⚠️ Leaves persistent `~/.zora/policy.toml` after tests.
- ⚠️ Later local runs inherit permissive test policy.
Suggested change:
  // Registered inside the if-block, so only a policy created by the harness is removed.
  process.once('exit', () => {
    try {
      fs.rmSync(globalPolicyPath, { force: true });
    } catch {
      // Best-effort cleanup
    }
  });
}
Steps of Reproduction ✅
1. Start with no `~/.zora/policy.toml`; `createTempZoraDir()` checks and writes one at
`tests/e2e/scenario-harness.test.ts:75-79`.
2. Written file is permissive test policy (`tests/fixtures/e2e-policy.toml:1,22` allows
broad access).
3. Test teardown only removes temp dirs (`tests/e2e/scenario-harness.test.ts:249-252`),
not global policy.
4. Future CLI runs load global policy as required base from
`src/config/policy-loader.ts:102-107`, so test-created policy persists beyond test
execution.
**Path:** tests/e2e/scenario-harness.test.ts
**Line:** 80:80
const newFiles = sessionFilesNewerThan(sinceMs);
expect(newFiles.length, 'Expected a new session file').toBeGreaterThan(0);

const events = parseJsonl(newFiles[0]!);
Suggestion: Selecting newFiles[0] can read an unrelated session when other Zora runs create files in the same global sessions directory, causing false routing assertions. Match the session file by the scenario prompt before asserting provider source. [logic error]
Severity Level: Major ⚠️
- ⚠️ Scenario 3 can fail from unrelated session writes.
- ⚠️ Provider-routing check becomes flaky and nondeterministic.
Suggested change:
const targetPrompt = 'summarize this task for the evaluator';
const matchingFile = newFiles.find((file) => {
  const events = parseJsonl(file);
  return events.some((e) => {
    if (e['type'] !== 'task.start') return false;
    const content = e['content'] as { task?: unknown } | undefined;
    return typeof content?.task === 'string' && content.task === targetPrompt;
  });
});
expect(matchingFile, 'Expected a session file for this scenario prompt').toBeDefined();
const events = parseJsonl(matchingFile!);
Steps of Reproduction ✅
1. Scenario 3 writes one task via `spawnAsk('summarize...')` at
`tests/e2e/scenario-harness.test.ts:317-320`.
2. It gathers all `~/.zora/sessions/*.jsonl` newer than timestamp
(`sessionFilesNewerThan`, `tests/e2e/scenario-harness.test.ts:169-177`) and sorts by
mtime.
3. Session directory is global (`GLOBAL_SESSIONS_DIR`,
`tests/e2e/scenario-harness.test.ts:36`) and shared with any other `zora-agent ask`
process.
4. Assertion parses only `newFiles[0]` (`tests/e2e/scenario-harness.test.ts:327`), so any
unrelated newer session can be selected.
5. This can mis-assert provider routing at `tests/e2e/scenario-harness.test.ts:333`
against the wrong task file.
**Path:** tests/e2e/scenario-harness.test.ts
**Line:** 327:327
const newFiles = sessionFilesNewerThan(sinceMs);
expect(newFiles.length, 'Expected a session file from fallback run').toBeGreaterThan(0);

const events = parseJsonl(newFiles[0]!);
Suggestion: This failover check also parses only the newest session file, which can belong to a different run in the shared global sessions directory and produce false failures. Resolve the session by matching the task payload first. [logic error]
Severity Level: Major ⚠️
- ⚠️ Scenario 4 failover check can target wrong session.
- ⚠️ E2E failover signal becomes flaky on shared machines.
Suggested change:
const targetPrompt = 'Write a function that counts characters';
const matchingFile = newFiles.find((file) => {
  const events = parseJsonl(file);
  return events.some((e) => {
    if (e['type'] !== 'task.start') return false;
    const content = e['content'] as { task?: unknown } | undefined;
    return typeof content?.task === 'string' && content.task === targetPrompt;
  });
});
expect(matchingFile, 'Expected a session file for fallback scenario prompt').toBeDefined();
const events = parseJsonl(matchingFile!);
Steps of Reproduction ✅
1. Scenario 4 runs failover prompt at `tests/e2e/scenario-harness.test.ts:349-352`.
2. It then reads all newer files from shared `~/.zora/sessions` using
`sessionFilesNewerThan()` (`tests/e2e/scenario-harness.test.ts:169-177`).
3. It parses only newest file `newFiles[0]` (`tests/e2e/scenario-harness.test.ts:360`) and
checks for `echo-evaluator`.
4. If another Zora process writes a newer session in the same window, parsed events belong
to another task, producing false failover failures.
**Path:** tests/e2e/scenario-harness.test.ts
**Line:** 360:360
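Another option targets the shared global directory behind both session-matching comments and the global `policy.toml` mutation: give each spawned CLI its own home directory. The test file notes that the Orchestrator's base dir defaults to `os.homedir()/.zora`. In Node, `os.homedir()` returns `HOME` on POSIX and `USERPROFILE` on Windows when those variables are set, so overriding them in the child's environment should redirect `~/.zora`. The sketch below demonstrates the redirect with a plain `node -e` child. `isolatedEnv` is hypothetical, and this page does not say whether Zora's CLI needs any other variables.

```typescript
import { spawnSync } from 'node:child_process';
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Builds a child-process environment whose home directory (and thus ~/.zora) is tempHome.
function isolatedEnv(tempHome: string, configDir: string): NodeJS.ProcessEnv {
  return {
    ...process.env,
    HOME: tempHome, // read by os.homedir() on POSIX
    USERPROFILE: tempHome, // read by os.homedir() on Windows
    ZORA_CONFIG_DIR: configDir,
  };
}

// Demonstration: a child Node process reports the overridden home directory.
const tempHome = fs.mkdtempSync(path.join(os.tmpdir(), 'zora-home-'));
const child = spawnSync(
  process.execPath,
  ['-e', 'process.stdout.write(require("os").homedir())'],
  { env: isolatedEnv(tempHome, path.join(tempHome, '.zora')), encoding: 'utf8' },
);
console.log(child.stdout === tempHome ? 'child sees isolated home' : `unexpected: ${child.stdout}`);
```

With a per-test home, every session file and policy the CLI writes lands in a directory the test owns and can delete wholesale.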
CodeAnt AI finished reviewing your PR.
User description
Summary
Test plan
🤖 Generated with Claude Code
CodeAnt-AI Description
Add a built-in echo provider and end-to-end scenario harness
What Changed
Impact
- ✅ Reliable e2e runs without API keys
- ✅ Fewer release regressions in CLI flows
- ✅ Wider CI coverage on macOS and Linux
Summary by CodeRabbit
New Features
Documentation
Tests