feat(testing): e2e scenario harness — real boot, echo provider, cross-LLM evaluation by ryaker · Pull Request #170 · ryaker/zora

ryaker · 2026-03-25T04:25:13Z

User description

Summary

- Adds EchoProvider (type: echo): deterministic, no API keys, works in CI
- 7 e2e scenarios: basic task routing, session persistence, provider routing, failover, cross-provider evaluation, prompt injection safety, concurrency
- Cross-LLM evaluation pattern: generator task → evaluator task (Gemini checks Claude's work); both run on echo in CI, and on real providers with test:e2e:real
- CI job on ubuntu-latest + macos-latest (no Windows; POSIX paths throughout)
- test:e2e runs in CI; test:e2e:real is for local runs with real API keys

Test plan

- `npm run build:backend` passes
- `ZORA_E2E=1 npx vitest run tests/e2e/` passes all 7 scenarios (4.73s)
- CI e2e job green on ubuntu-latest and macos-latest

🤖 Generated with Claude Code

CodeAnt-AI Description

Add a built-in echo provider and end-to-end scenario harness

What Changed

- Added a no-key echo provider that returns predictable outputs, so e2e runs can pass without external AI access
- Added end-to-end scenarios that boot the CLI, verify session files, and test provider fallback, cross-step evaluation, injection handling, and concurrent runs
- Added e2e fixtures and CI jobs on Ubuntu and macOS so these scenarios run automatically
- Updated security docs and content-pipeline examples to reflect the current workflow and security model

Impact

✅ Reliable e2e runs without API keys
✅ Fewer release regressions in CLI flows
✅ Wider CI coverage on macOS and Linux


Summary by CodeRabbit

New Features
- Human-in-the-loop approval gates for multi-agent workflows with configurable timeouts and auto-actions
- Enhanced security architecture with layered action-control scoring, HITL approval queues, reputation-based throttling, and session-level risk detection
Documentation
- Multi-agent content pipeline example workflow (signal, writer, image, publisher agents)
- Cross-LLM evaluation testing pattern guide
- Security architecture documentation updates
Tests
- E2E test suite and CI integration for scenario validation

SECURITY.md was last updated for v0.6, but Zora is now v0.12.0. Nine undocumented security subsystems have been added since v0.6.

New sections added:
- Irreversibility Scoring: 0-100 per-action scores, configurable thresholds
- HITL Approval Gate: ApprovalQueue routing via Telegram/Signal
- Session Risk Forecasting: MemoryRiskForecaster drift/salami/creep signals
- Subagent Reputation: AgentCooldown with escalating denial thresholds
- Channel Security: CaMeL dual-LLM quarantine, Casbin RBAC, 4 invariants
- Per-Project Security Policy: .zora/security-policy.toml with parent ceiling
- zora security audit: daemon startup gate
- Tool Hook Pipeline: 6 built-in hooks running before every tool call

Updated sections:
- Security Architecture Summary table: 9 new subsystem rows
- OWASP matrix: encoding coverage, channel quarantine, ASI-06 row
- Audit event types: 12 new v0.12 event types documented
- Implementation status: all previously "in progress" items now Active

Closes #159 (PR D from security review plan)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

mv (40) was listed after shell_exec (50), breaking sort order. Fixes Gemini code review suggestion on PR #159.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
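The irreversibility scoring described in these commits gives each action a 0-100 score and compares it with configurable warn/flag/auto-deny thresholds. The TypeScript sketch below illustrates that idea together with the sort-order fix. Only the two scores (`mv` = 40, `shell_exec` = 50) come from the commit message; `classifyScore`, `sortByScore`, and the threshold numbers are hypothetical placeholders, not Zora's actual defaults or API.

```typescript
// Hypothetical shapes; Zora's real scoring types are not shown in this PR text.
interface ActionScore {
  action: string;
  score: number; // 0 = fully reversible, 100 = irreversible
}

type Verdict = 'allow' | 'warn' | 'flag' | 'auto-deny';

interface Thresholds {
  warn: number;
  flag: number;
  autoDeny: number;
}

// Illustrative threshold values only (assumption, not Zora's defaults).
const EXAMPLE_THRESHOLDS: Thresholds = { warn: 30, flag: 60, autoDeny: 90 };

// Map a 0-100 score to the strictest verdict whose threshold it reaches.
function classifyScore(score: number, t: Thresholds = EXAMPLE_THRESHOLDS): Verdict {
  if (score < 0 || score > 100) throw new RangeError(`score out of range: ${score}`);
  if (score >= t.autoDeny) return 'auto-deny';
  if (score >= t.flag) return 'flag';
  if (score >= t.warn) return 'warn';
  return 'allow';
}

// Keep the documented table in ascending score order, as the sort-order fix requires.
function sortByScore(rows: readonly ActionScore[]): ActionScore[] {
  return [...rows].sort((a, b) => a.score - b.score);
}

// The two scores named in the commit message.
const table = sortByScore([
  { action: 'shell_exec', score: 50 },
  { action: 'mv', score: 40 },
]);

for (const row of table) {
  console.log(`${row.action} (${row.score}): ${classifyScore(row.score)}`);
}
```

Checking thresholds from strictest to loosest means a score that crosses several thresholds always gets the most restrictive verdict.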

…og + Meta Graph API social 4-agent Zora team (signal → writer → image → publisher):
- SignalAgent: pulls topics-next, aborts if < 3 signals
- WriterAgent: StoryBrand MDX with ≥3 cited expert signals
- ImageAgent: hero (customer-focused 16:9) + social (Sophia 1:1)
- PublisherAgent: git + vercel --prod + Facebook/Instagram Graph API

Human gate via Telegram after WriterAgent (2hr timeout → auto-approve). Social posting via Meta Graph API (not browser automation).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…s-LLM eval pattern, CI on linux+macos

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

codeant-ai · 2026-03-25T04:25:17Z

CodeAnt AI is reviewing your PR.


coderabbitai · 2026-03-25T04:25:26Z

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough


The PR introduces end-to-end testing infrastructure with a new EchoProvider for deterministic testing, extends CI/CD with an e2e GitHub Actions job, documents a comprehensive multi-agent content pipeline workflow (signal-writer-gate-image-publisher) with human approval gates, and substantially updates security documentation to v0.12.0 architecture including irreversibility scoring, HITL approval, and reputation systems.

Changes

Cohort / File(s)	Summary
E2E Testing Infrastructure `.github/workflows/ci.yml`, `package.json`, `tests/e2e/scenario-harness.test.ts`, `tests/fixtures/e2e-*.toml`	Added GitHub Actions e2e job on ubuntu/macOS matrix, npm scripts for `test:e2e` and `test:e2e:real`, comprehensive scenario harness test (501 lines) spawning real CLI with seven test scenarios including provider routing, failover, cross-provider evaluation, prompt-injection robustness, and concurrency; includes EchoProvider-based and real-provider fixture configs.
Echo Provider Implementation `src/providers/echo-provider.ts`, `src/providers/index.ts`, `src/cli/daemon.ts`, `src/cli/index.ts`, `src/types.ts`	Introduced `EchoProvider` class (191 lines) implementing deterministic, keyword-driven responses for testing; updated provider factory switches in CLI entry points and daemon; extended `KnownProviderType` union to include `'echo'`; re-exported provider from barrel.
Content Pipeline Multi-Agent Workflow `examples/routines/content-pipeline.toml`, `examples/routines/content-pipeline/{signal,writer,image,publisher}-agent.md`	Restructured single-task pipeline into sequential multi-agent orchestration: signal-agent (topic retrieval with hard-abort conditions), writer-agent (StoryBrand MDX generation with citation requirements), human gate (Telegram approval with 120-min timeout), image-agent (NanoBanana MCP parallel image generation), and publisher-agent (multi-platform social publishing via Graph API and Instagram). Updated runtime config (model, cost tier, timeout, one-shot mode) and introduced environment/secrets injection.
E2E Testing Documentation & Fixtures `docs/testing/e2e-cross-llm-evaluation.md`, `tests/fixtures/e2e-config-real.toml.example`, `tests/fixtures/e2e-policy.toml`	Added E2E cross-LLM evaluation pattern documentation (generator→evaluator with `EVALUATION:` output assertion), example real-provider config fixture with Claude + Gemini ranked providers, and minimal permissive e2e policy fixture (filesystem/shell/action/network/budget allowlists).
Security Documentation Update `SECURITY.md`	Expanded v0.6→v0.12 security architecture (311 lines): added irreversibility scoring (warn/flag/auto-deny thresholds), HITL approval queue (Telegram/Signal with 5-min timeout, scoped allow/deny, ceiling windows), session-level risk forecasting, per-subagent reputation escalation (throttle/warn/shutdown with 24h reset), CaMeL quarantine processor, Casbin RBAC-with-domains authorization, per-project security policies (`.zora/security-policy.toml`, parent-ceiling inheritance), `zora security audit` pre-flight gate, and updated event taxonomy and audit log examples.
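The Echo Provider row follows a common registration pattern: widen the `KnownProviderType` union with `'echo'` and add a matching case to the provider factory switch. The sketch below shows that pattern under stated assumptions: the `'claude'`/`'gemini'` union members, the `Provider` interface, the stub classes, and `createProvider`/`createProviders` are hypothetical stand-ins, not Zora's actual code in `src/types.ts` or `src/cli/index.ts`.

```typescript
// Hypothetical subset of provider types; the real union lives in src/types.ts.
type KnownProviderType = 'claude' | 'gemini' | 'echo';

interface ProviderConfig {
  name: string;
  type: KnownProviderType;
  rank: number;
}

// Hypothetical minimal provider contract (Zora's real interface is richer).
interface Provider {
  readonly name: string;
  readonly type: KnownProviderType;
}

class EchoProviderStub implements Provider {
  readonly type: KnownProviderType = 'echo';
  readonly name: string;
  constructor(name: string) {
    this.name = name;
  }
}

class RemoteProviderStub implements Provider {
  readonly name: string;
  readonly type: KnownProviderType;
  constructor(name: string, type: KnownProviderType) {
    this.name = name;
    this.type = type;
  }
}

function createProvider(cfg: ProviderConfig): Provider {
  switch (cfg.type) {
    case 'echo':
      // No API key required, so this branch is safe to hit in CI.
      return new EchoProviderStub(cfg.name);
    case 'claude':
    case 'gemini':
      return new RemoteProviderStub(cfg.name, cfg.type);
    default: {
      // Exhaustiveness check: a new union member without a case fails to compile here.
      const unreachable: never = cfg.type;
      throw new Error(`Unknown provider type: ${String(unreachable)}`);
    }
  }
}

// Build providers in rank order (rank 1 first), mirroring ranked routing/failover.
function createProviders(configs: readonly ProviderConfig[]): Provider[] {
  return [...configs].sort((a, b) => a.rank - b.rank).map(createProvider);
}
```

The `never` assignment in `default` is what makes "extend the union" and "extend the switch" a single reviewable change: forgetting the case is a compile error rather than a runtime surprise.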

Sequence Diagram(s)

sequenceDiagram
    participant User as User/Scheduler
    participant SignalAgent as Signal Agent
    participant WriterAgent as Writer Agent
    participant HumanGate as Human Gate<br/>(Telegram)
    participant ImageAgent as Image Agent
    participant PublisherAgent as Publisher Agent
    participant Workspace as ~/.zora/workspace
    participant GitHub as GitHub<br/>+ Social APIs

    User->>SignalAgent: trigger content-pipeline
    SignalAgent->>SignalAgent: sophia-wire topics-next<br/>+ brief/context
    SignalAgent->>Workspace: write {TODAY}-brief.json<br/>(signals, domains, audience)
    SignalAgent->>WriterAgent: transfer control

    WriterAgent->>Workspace: read {TODAY}-brief.json
    WriterAgent->>WriterAgent: generate StoryBrand MDX<br/>(3+ signal citations)
    WriterAgent->>Workspace: write {slug}.mdx<br/>+ preview.txt
    WriterAgent->>HumanGate: approval request

    HumanGate->>HumanGate: await Telegram<br/>approve/reject<br/>(120 min timeout)
    alt approved, or timeout (auto-approve)
        HumanGate->>ImageAgent: proceed
    else rejected
        HumanGate->>User: abort/alert
    end

    ImageAgent->>Workspace: read {TODAY}-brief.json
    ImageAgent->>ImageAgent: NanoBanana MCP:<br/>hero (16:9)<br/>+ social (1:1)
    ImageAgent->>Workspace: save images<br/>to content/images/

    ImageAgent->>PublisherAgent: transfer

    PublisherAgent->>GitHub: copy MDX + hero image<br/>to content/blog/
    PublisherAgent->>GitHub: git add/commit/push<br/>+ vercel --prod
    PublisherAgent->>GitHub: verify HTTP 200
    PublisherAgent->>PublisherAgent: sophia-wire<br/>topics-publish
    PublisherAgent->>PublisherAgent: generate soundbites
    PublisherAgent->>GitHub: Facebook Graph API<br/>(with retry on fail)
    PublisherAgent->>GitHub: Instagram publish<br/>(2-step, 1 retry)
    PublisherAgent->>User: Telegram completion<br/>via Claude Ops

sequenceDiagram
    participant Harness as E2E Harness<br/>(Vitest)
    participant CLI as Zora CLI<br/>(zora ask)
    participant PrimaryProvider as Primary Provider<br/>(EchoProvider)
    participant SecondaryProvider as Secondary Provider<br/>(EchoProvider)
    participant SessionStore as ~/.zora/sessions<br/>(JSONL)

    Harness->>Harness: copy e2e-config.toml<br/>to temp .zora/

    Harness->>CLI: spawn: zora ask<br/>'reverse: ...'<br/>(ZORA_CONFIG_DIR)
    CLI->>PrimaryProvider: route (rank 1)
    PrimaryProvider->>PrimaryProvider: keyword match<br/>→ reversed text
    PrimaryProvider->>CLI: emit events<br/>(task.start, text, task.end, done)
    CLI->>SessionStore: write JSONL<br/>session file

    Harness->>SessionStore: parse session<br/>verify event order<br/>+ provider source

    Harness->>CLI: spawn: zora ask<br/>'evaluate: <output>'<br/>(disabled primary)
    CLI->>SecondaryProvider: failover (rank 2)
    SecondaryProvider->>SecondaryProvider: keyword match: evaluate<br/>→ 'EVALUATION: ...'
    SecondaryProvider->>CLI: emit events
    CLI->>SessionStore: write JSONL<br/>(2nd session file)

    Harness->>SessionStore: verify EVALUATION: present<br/>+ timestamp ordering<br/>+ parallel execution
    Harness->>Harness: assert all exit 0<br/>+ no prompt leaks
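The generator then evaluator flow in the diagram above can be captured as a small, provider-agnostic helper. This is a sketch under assumptions: `AskFn` is a hypothetical function that runs one task on a named provider and resolves with its text output (in the real harness, that role is played by spawning `zora ask`). Only the `evaluate:` prompt keyword and the `EVALUATION:` marker come from the PR; `runCrossEvaluation` and `fakeAsk` are illustrative.

```typescript
// Hypothetical: run one task on a named provider and resolve with its text output.
type AskFn = (prompt: string, provider: string) => Promise<string>;

interface CrossEvalResult {
  generated: string;
  evaluation: string;
}

async function runCrossEvaluation(
  ask: AskFn,
  task: string,
  generator: string,
  evaluator: string,
): Promise<CrossEvalResult> {
  // Independent verification only makes sense with a different provider.
  if (generator === evaluator) {
    throw new Error('generator and evaluator must be different providers');
  }
  const generated = await ask(task, generator);
  const evaluation = await ask(`evaluate: ${generated}`, evaluator);
  // Same contract the harness asserts on: the evaluator emits an EVALUATION: line.
  if (!evaluation.includes('EVALUATION:')) {
    throw new Error(`evaluator output lacks EVALUATION: marker: ${evaluation}`);
  }
  return { generated, evaluation };
}

// Echo-style stand-in for CI: reverses "reverse:" tasks and answers "evaluate:" tasks.
const fakeAsk: AskFn = async (prompt, provider) => {
  if (prompt.startsWith('evaluate:')) return `EVALUATION: PASS (checked by ${provider})`;
  if (prompt.startsWith('reverse:')) {
    return prompt.slice('reverse:'.length).trim().split('').reverse().join('');
  }
  return prompt;
};
```

Swapping `fakeAsk` for a CLI-backed implementation is what would turn the CI run into a real cross-LLM check; the helper itself does not change.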

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Develop #3: Defines core provider interface types and KnownProviderType union in src/types.ts, which this PR directly extends by adding 'echo' literal and implementing a conforming provider class.
Implement full remediation roadmap (R1-R30) across P0, P1, P2 #27: Modifies the provider factory switch statement in daemon/CLI code (createProviders function), which this PR extends with echo provider handling in the same switch.
Add Ollama provider and cost-tier routing constraints #51: Adds a new provider type (ollama) to the provider-registration infrastructure using the same pattern of factory switch updates, type additions, and exports that this PR follows for echo.

Suggested labels

size:XXL, feature:testing, feature:security, feature:examples

🐰 A harness springs to life, with echoes that test,

While agents converse through a pipeline blessed,

From signal to publish, with gates in between,

The swiftest e2e flow we've ever seen! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the three main components: e2e scenario harness, echo provider for deterministic testing, and cross-LLM evaluation pattern.
Docstring Coverage	✅ Passed	Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%.


gemini-code-assist · 2026-03-25T04:25:36Z

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the project's testing infrastructure by introducing a robust end-to-end scenario harness, complete with a new deterministic EchoProvider and a sophisticated cross-LLM evaluation pattern. Concurrently, it updates the security documentation to reflect a suite of advanced defense-in-depth features for version 0.12.0, moving towards a more layered and resilient security posture. Additionally, a detailed multi-agent content pipeline example is provided, showcasing complex routine orchestration with human-in-the-loop approval.

Highlights

E2E Scenario Harness: Introduced a comprehensive end-to-end testing harness with 7 distinct scenarios, including basic task routing, session persistence, provider routing, failover, cross-provider evaluation, prompt injection safety, and concurrency. These tests simulate real-world interactions by booting Zora via the CLI.
EchoProvider for Deterministic Testing: Added a new EchoProvider that offers deterministic responses based on prompt keywords. This allows E2E tests to verify system behavior without requiring real LLM API keys, making tests reliable and suitable for CI environments.
Cross-LLM Evaluation Pattern: Implemented a cross-LLM evaluation pattern within the E2E tests, where a 'generator' LLM produces output and a separate 'evaluator' LLM checks it. This pattern supports independent verification and can use different models or providers.
Updated Security Documentation: The SECURITY.md file was significantly updated to detail new v0.12.0 security hardening features. These include irreversibility scoring, human-in-the-loop approval gates, session risk forecasting, subagent reputation tracking, channel quarantine, Casbin RBAC for channel authorization, per-project security policies, a startup audit gate, and a six-hook tool pipeline.
Expanded Content Pipeline Example: Enhanced the content-pipeline example routine to demonstrate a multi-agent workflow with a human approval gate. This includes SignalAgent, WriterAgent, ImageAgent, and PublisherAgent working sequentially with defined inputs, outputs, and error handling.

Ignored Files

Ignored by pattern: .github/workflows/** (1)
- .github/workflows/ci.yml


codeant-ai · 2026-03-25T04:26:35Z

Sequence Diagram

This PR adds a deterministic Echo provider and a scenario-based E2E harness that boots the real CLI. The core flow validates provider selection and a two-step generator then evaluator pattern, with each CLI run writing its own session log and running in CI across Linux and macOS.

sequenceDiagram
    participant CI
    participant E2EHarness
    participant ZoraCLI
    participant ProviderRouter
    participant EchoProvider
    participant SessionStore

    CI->>E2EHarness: Run e2e scenarios
    E2EHarness->>ZoraCLI: Ask generator task
    ZoraCLI->>ProviderRouter: Load config and choose available provider
    ProviderRouter->>EchoProvider: Execute task
    EchoProvider-->>ZoraCLI: Deterministic response events
    ZoraCLI->>SessionStore: Write session JSONL file

    E2EHarness->>ZoraCLI: Ask evaluator task with generated output
    ZoraCLI->>ProviderRouter: Route to evaluator provider and execute
    ProviderRouter-->>E2EHarness: Evaluator result with second session file

Generated by CodeAnt AI
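Each CLI run writes one JSONL session file, and the harness checks the order of events in it. A sketch of that check, assuming each line is a JSON object with a `type` field and using the `task.start`, `text`, `task.end`, `done` sequence from the CodeRabbit diagram. `parseJsonlText` and `hasEventsInOrder` are hypothetical string-based helpers, not the harness's file-based `parseJsonl`, and the sample event payloads are made up.

```typescript
type SessionEvent = Record<string, unknown>;

// Parse JSONL text: one JSON object per non-empty line.
function parseJsonlText(text: string): SessionEvent[] {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as SessionEvent);
}

// True if the `expected` types occur in this order (other events may be interleaved).
function hasEventsInOrder(events: readonly SessionEvent[], expected: readonly string[]): boolean {
  let next = 0;
  for (const e of events) {
    if (next < expected.length && e['type'] === expected[next]) next++;
  }
  return next === expected.length;
}

// Made-up sample session for illustration.
const sample = [
  '{"type":"task.start","content":{"task":"reverse: abc"}}',
  '{"type":"text","content":"cba"}',
  '{"type":"task.end"}',
  '{"type":"done"}',
].join('\n');

const events = parseJsonlText(sample);
console.log(hasEventsInOrder(events, ['task.start', 'text', 'task.end', 'done']));
```

Checking for an ordered subsequence rather than an exact list keeps the assertion stable if the CLI later adds extra event types between the ones the test cares about.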

codeant-ai · 2026-03-25T04:27:01Z

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review Routing Bug The `code` / `function` / `write` branch is checked before the evaluation branch, so prompts such as `review this code` or `write a review` can return a code snippet instead of the intended evaluation response. Please validate the keyword precedence and tighten the matching so evaluator tasks are classified correctly. Environment Side Effect This setup writes a fixture into the real `~/.zora` directory when `policy.toml` is missing. That makes the test suite non-isolated, can leave persistent state behind after the run, and may fail on systems where the home directory is not writable. Edge Case Word counting should be validated for empty or whitespace-only tasks. The current split logic can report one word for an empty prompt, which can make e2e assertions inconsistent.

gemini-code-assist

Code Review

This pull request introduces a significant security hardening update (v0.12.0) with a layered defense-in-depth stack, including irreversibility scoring, human-in-the-loop approval, and session risk forecasting. It also refactors the content pipeline with new agents (Signal, Writer, Image, Publisher) and integrates a human approval gate. Additionally, a new EchoProvider and comprehensive end-to-end testing framework have been added. Feedback suggests that the timeout_action in the human approval gate should default to reject to prevent unintended content publication, and the deployment verification process should use a more robust polling mechanism instead of a fixed sleep duration.

gemini-code-assist · 2026-03-25T04:27:43Z

+Reply *approve* to publish, *reject* to cancel.
+Auto-publishes in *2 hours* if no response.
 """
+timeout_action = "approve"       # Auto-approve after timeout (not reject)


The timeout_action is set to approve, which means content will be automatically published if not reviewed within 2 hours. This could lead to unintended or erroneous content being published without explicit approval. For safety, it's generally better to default to reject on timeout to prevent accidental publications. This ensures that a human must explicitly approve the content before it goes live.

timeout_action = "reject" # Auto-reject after timeout (safer default)
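The suggestion amounts to making the gate fail closed. A sketch of an approval wait that falls back to a configurable timeout action, defaulting to reject. It assumes a hypothetical `waitForReply` promise that resolves when a human answers (the Telegram reply, in the pipeline); `awaitApproval` and its types are illustrative, not Zora's gate implementation.

```typescript
type Decision = 'approve' | 'reject';

interface ApprovalOutcome {
  decision: Decision;
  timedOut: boolean;
}

// Resolve with the human's reply, or with timeoutAction if none arrives in time.
function awaitApproval(
  waitForReply: Promise<Decision>,
  timeoutMs: number,
  timeoutAction: Decision = 'reject', // fail closed unless explicitly configured otherwise
): Promise<ApprovalOutcome> {
  return new Promise<ApprovalOutcome>((resolve, reject) => {
    const timer = setTimeout(() => resolve({ decision: timeoutAction, timedOut: true }), timeoutMs);
    waitForReply.then(
      (decision) => {
        clearTimeout(timer);
        resolve({ decision, timedOut: false });
      },
      (err: unknown) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// For the 2-hour gate: awaitApproval(telegramReply, 2 * 60 * 60 * 1000)
// where telegramReply is a hypothetical Promise<Decision>.
```

Returning `timedOut` alongside the decision lets the caller distinguish "a human rejected" from "nobody answered", which is useful for alerting.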

gemini-code-assist · 2026-03-25T04:27:44Z

+sleep 120
+curl -s -o /dev/null -w "%{http_code}" https://www.mymoneycoach.ai/blog/{slug}


Using a fixed sleep 120 to wait for deployment propagation is brittle, as deployment times can vary. A more robust approach is to poll the URL in a loop with a timeout until it returns a 200 status code. This avoids both unnecessary waiting and failures due to longer-than-expected deployment times.

Suggested change

-sleep 120
-curl -s -o /dev/null -w "%{http_code}" https://www.mymoneycoach.ai/blog/{slug}
+URL="https://www.mymoneycoach.ai/blog/{slug}"
+for i in {1..12}; do # Poll for 2 minutes (12 * 10s)
+  HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "$URL")
+  if [ "$HTTP_CODE" -eq 200 ]; then
+    echo "Deployment live!"
+    break
+  fi
+  echo "Attempt $i/12: Not live yet (HTTP $HTTP_CODE). Retrying in 10s..."
+  sleep 10
+done
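One gap in the suggested loop: if the page never returns 200, the loop simply ends and the next step runs anyway. A TypeScript sketch of the same poll-with-timeout idea that reports failure explicitly; `pollUntil` and the injected `probe` are hypothetical, and in the pipeline the probe would perform the HTTP 200 check against the blog URL.

```typescript
interface PollOptions {
  attempts: number; // e.g. 12
  intervalMs: number; // e.g. 10000, for the same 2-minute budget as the shell loop
}

const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

// Call probe until it returns true; resolve with the successful attempt number,
// or reject after the last attempt so the caller cannot silently continue.
async function pollUntil(probe: () => Promise<boolean>, opts: PollOptions): Promise<number> {
  for (let attempt = 1; attempt <= opts.attempts; attempt++) {
    if (await probe()) return attempt;
    if (attempt < opts.attempts) await sleep(opts.intervalMs);
  }
  throw new Error(`not ready after ${opts.attempts} attempts`);
}

// Usage (hypothetical probe): await pollUntil(blogReturns200, { attempts: 12, intervalMs: 10000 });
```

In the shell version, the equivalent fix would be to exit non-zero after the loop when `HTTP_CODE` is still not 200.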

ryaker · 2026-03-25T04:30:49Z

Superseded by #171 (clean cherry-pick onto main)

codeant-ai · 2026-03-25T04:32:32Z

+const E2E_CONFIG_FIXTURE = path.join(REPO_ROOT, 'tests', 'fixtures', 'e2e-config.toml');
+const E2E_POLICY_FIXTURE = path.join(REPO_ROOT, 'tests', 'fixtures', 'e2e-policy.toml');
+
+// The Orchestrator always writes sessions to ~/.zora/sessions (its baseDir defaults
+// to os.homedir()/.zora regardless of ZORA_CONFIG_DIR). Tests track this dir.


Suggestion: ZORA_REAL_PROVIDERS=1 is never read, so test:e2e:real still loads the echo-only fixture and never exercises real providers. Select the fixture file based on ZORA_REAL_PROVIDERS so the real-provider path is actually tested. [logic error]

Severity Level: Major ⚠️

- ⚠️ `test:e2e:real` does not validate Claude/Gemini integration.
- ⚠️ Real-provider regressions can ship undetected.

Suggested change

-const E2E_CONFIG_FIXTURE = path.join(REPO_ROOT, 'tests', 'fixtures', 'e2e-config.toml');
 const E2E_POLICY_FIXTURE = path.join(REPO_ROOT, 'tests', 'fixtures', 'e2e-policy.toml');
 
 // The Orchestrator always writes sessions to ~/.zora/sessions (its baseDir defaults
 // to os.homedir()/.zora regardless of ZORA_CONFIG_DIR). Tests track this dir.
+const USE_REAL_PROVIDERS = process.env['ZORA_REAL_PROVIDERS'] === '1';
+const E2E_CONFIG_FIXTURE = path.join(
+  REPO_ROOT,
+  'tests',
+  'fixtures',
+  USE_REAL_PROVIDERS ? 'e2e-config-real.toml' : 'e2e-config.toml',
+);

Steps of Reproduction ✅

1. Run `npm run test:e2e:real`; the script sets `ZORA_REAL_PROVIDERS=1` in `package.json:28`.
2. Test constants still hardcode `e2e-config.toml` at `tests/e2e/scenario-harness.test.ts:31`; nothing reads `ZORA_REAL_PROVIDERS` at runtime.
3. `createTempZoraDir()` copies that fixture into the temp config at `tests/e2e/scenario-harness.test.ts:72`.
4. The copied fixture defines only echo providers (`tests/fixtures/e2e-config.toml:7-20`).
5. The CLI provider factory builds `EchoProvider` for `type='echo'` at `src/cli/index.ts:60-80`, so the real-provider route is never exercised.

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** tests/e2e/scenario-harness.test.ts **Line:** 31:35 **Comment:** *Logic Error: `ZORA_REAL_PROVIDERS=1` is never read, so `test:e2e:real` still loads the echo-only fixture and never exercises real providers. Select the fixture file based on `ZORA_REAL_PROVIDERS` so the real-provider path is actually tested. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
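A related hazard: the PR ships `tests/fixtures/e2e-config-real.toml.example`, so with the suggested switch a missing `e2e-config-real.toml` would most likely surface as an ENOENT error from the copy inside `createTempZoraDir()`. A sketch that selects the fixture and fails early with a clearer message; `selectConfigFixture` is hypothetical, and the idea of creating the real fixture by copying the `.example` file is an assumption about local setup.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Pick the e2e config fixture based on ZORA_REAL_PROVIDERS, failing fast if it is absent.
function selectConfigFixture(
  fixturesDir: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const useReal = env['ZORA_REAL_PROVIDERS'] === '1';
  const file = path.join(fixturesDir, useReal ? 'e2e-config-real.toml' : 'e2e-config.toml');
  if (!fs.existsSync(file)) {
    const hint = useReal ? ' (create it from e2e-config-real.toml.example)' : '';
    throw new Error(`Missing e2e config fixture: ${file}${hint}`);
  }
  return file;
}
```

Injecting `env` keeps the function testable without mutating `process.env` in the test process.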


codeant-ai · 2026-03-25T04:32:32Z

+    const globalZoraDir = path.join(os.homedir(), '.zora');
+    fs.mkdirSync(globalZoraDir, { recursive: true });
+    fs.copyFileSync(E2E_POLICY_FIXTURE, globalPolicyPath);
+  }


Suggestion: The test creates ~/.zora/policy.toml when missing but never removes it, leaving a permissive test policy in the developer's real home directory after tests complete. Add cleanup for the file created by the harness to avoid persistent global security-state mutation. [security]

Severity Level: Major ⚠️

- ⚠️ Leaves persistent `~/.zora/policy.toml` after tests.
- ⚠️ Later local runs inherit permissive test policy.

Suggested change

     fs.copyFileSync(E2E_POLICY_FIXTURE, globalPolicyPath);
+    process.once('exit', () => {
+      try {
+        fs.rmSync(globalPolicyPath, { force: true });
+      } catch {
+        // Best-effort cleanup
+      }
+    });
   }

Steps of Reproduction ✅

1. Start with no `~/.zora/policy.toml`; `createTempZoraDir()` checks for it and writes one at `tests/e2e/scenario-harness.test.ts:75-79`.
2. The written file is a permissive test policy (`tests/fixtures/e2e-policy.toml:1,22` allows broad access).
3. Test teardown only removes temp dirs (`tests/e2e/scenario-harness.test.ts:249-252`), not the global policy.
4. Future CLI runs load the global policy as the required base (`src/config/policy-loader.ts:102-107`), so the test-created policy persists beyond test execution.

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** tests/e2e/scenario-harness.test.ts **Line:** 80:80 **Comment:** *Security: The test creates `~/.zora/policy.toml` when missing but never removes it, leaving a permissive test policy in the developer's real home directory after tests complete. Add cleanup for the file created by the harness to avoid persistent global security-state mutation. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
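An alternative to a process exit hook is to record exactly which files the harness created and remove only those in teardown (for example from a test framework's after-all hook). A sketch, assuming hypothetical `ensureFixture`/`cleanupCreated` helpers: a pre-existing `policy.toml` is never overwritten or deleted, because only files this process wrote are tracked.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Files this process created, so teardown never touches a user's pre-existing files.
const createdPaths: string[] = [];

// Copy `source` to `target` only if `target` is missing; remember it if we created it.
function ensureFixture(source: string, target: string): boolean {
  if (fs.existsSync(target)) return false;
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.copyFileSync(source, target, fs.constants.COPYFILE_EXCL); // never overwrite
  createdPaths.push(target);
  return true;
}

// Call from teardown; removes newest first and ignores errors.
function cleanupCreated(): void {
  while (createdPaths.length > 0) {
    const p = createdPaths.pop()!;
    try {
      fs.rmSync(p, { force: true });
    } catch {
      // Best-effort cleanup, as in the suggestion.
    }
  }
}
```

`COPYFILE_EXCL` closes the small window between the existence check and the copy: if another process creates the file first, the copy throws instead of clobbering it.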


codeant-ai · 2026-03-25T04:32:32Z

+    const newFiles = sessionFilesNewerThan(sinceMs);
+    expect(newFiles.length, 'Expected a new session file').toBeGreaterThan(0);
+
+    const events = parseJsonl(newFiles[0]!);


Suggestion: Selecting newFiles[0] can read an unrelated session when other Zora runs create files in the same global sessions directory, causing false routing assertions. Match the session file by the scenario prompt before asserting provider source. [logic error]

Severity Level: Major ⚠️

- ⚠️ Scenario 3 can fail from unrelated session writes.
- ⚠️ Provider-routing check becomes flaky and nondeterministic.

Suggested change

-    const events = parseJsonl(newFiles[0]!);
+    const targetPrompt = 'summarize this task for the evaluator';
+    const matchingFile = newFiles.find((file) => {
+      const events = parseJsonl(file);
+      return events.some((e) => {
+        if (e['type'] !== 'task.start') return false;
+        const content = e['content'] as { task?: unknown } | undefined;
+        return typeof content?.task === 'string' && content.task === targetPrompt;
+      });
+    });
+    expect(matchingFile, 'Expected a session file for this scenario prompt').toBeDefined();
+    const events = parseJsonl(matchingFile!);

Steps of Reproduction ✅

1. Scenario 3 writes one task via `spawnAsk('summarize...')` at `tests/e2e/scenario-harness.test.ts:317-320`.
2. It gathers all `~/.zora/sessions/*.jsonl` newer than a timestamp (`sessionFilesNewerThan`, `tests/e2e/scenario-harness.test.ts:169-177`) and sorts them by mtime.
3. The session directory is global (`GLOBAL_SESSIONS_DIR`, `tests/e2e/scenario-harness.test.ts:36`) and shared with any other `zora-agent ask` process.
4. The assertion parses only `newFiles[0]` (`tests/e2e/scenario-harness.test.ts:327`), so any unrelated newer session can be selected.
5. This can mis-assert provider routing at `tests/e2e/scenario-harness.test.ts:333` against the wrong task file.

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** tests/e2e/scenario-harness.test.ts **Line:** 327:327 **Comment:** *Logic Error: Selecting `newFiles[0]` can read an unrelated session when other Zora runs create files in the same global sessions directory, causing false routing assertions. Match the session file by the scenario prompt before asserting provider source. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.


codeant-ai · 2026-03-25T04:32:32Z

+      const newFiles = sessionFilesNewerThan(sinceMs);
+      expect(newFiles.length, 'Expected a session file from fallback run').toBeGreaterThan(0);
+
+      const events = parseJsonl(newFiles[0]!);


Suggestion: This failover check also parses only the newest session file, which can belong to a different run in the shared global sessions directory and produce false failures. Resolve the session by matching the task payload first. [logic error]

Severity Level: Major ⚠️

- ⚠️ Scenario 4 failover check can target wrong session.
- ⚠️ E2E failover signal becomes flaky on shared machines.

Suggested change

-      const events = parseJsonl(newFiles[0]!);
+      const targetPrompt = 'Write a function that counts characters';
+      const matchingFile = newFiles.find((file) => {
+        const events = parseJsonl(file);
+        return events.some((e) => {
+          if (e['type'] !== 'task.start') return false;
+          const content = e['content'] as { task?: unknown } | undefined;
+          return typeof content?.task === 'string' && content.task === targetPrompt;
+        });
+      });
+      expect(matchingFile, 'Expected a session file for fallback scenario prompt').toBeDefined();
+      const events = parseJsonl(matchingFile!);

Steps of Reproduction ✅

1. Scenario 4 runs the failover prompt at `tests/e2e/scenario-harness.test.ts:349-352`.
2. It then reads all newer files from the shared `~/.zora/sessions` using `sessionFilesNewerThan()` (`tests/e2e/scenario-harness.test.ts:169-177`).
3. It parses only the newest file, `newFiles[0]` (`tests/e2e/scenario-harness.test.ts:360`), and checks for `echo-evaluator`.
4. If another Zora process writes a newer session in the same window, the parsed events belong to another task, producing false failover failures.

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** tests/e2e/scenario-harness.test.ts **Line:** 360:360 **Comment:** *Logic Error: This failover check also parses only the newest session file, which can belong to a different run in the shared global sessions directory and produce false failures. Resolve the session by matching the task payload first. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
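The scenario 3 and scenario 4 suggestions duplicate the same lookup. One way to share it is a helper that takes the candidate files, the expected prompt, and a parser. This sketch assumes the harness's `parseJsonl(file)` returns records with `type` and `content.task`, as both suggestions imply; `findSessionFileForPrompt` is a hypothetical name.

```typescript
type SessionRecord = Record<string, unknown>;

// Return the first session file whose task.start event carries exactly `prompt`.
function findSessionFileForPrompt(
  files: readonly string[],
  prompt: string,
  parse: (file: string) => SessionRecord[],
): string | undefined {
  return files.find((file) =>
    parse(file).some((e) => {
      if (e['type'] !== 'task.start') return false;
      const content = e['content'] as { task?: unknown } | undefined;
      return typeof content?.task === 'string' && content.task === prompt;
    }),
  );
}

// In a scenario (hypothetical call site):
// const file = findSessionFileForPrompt(newFiles, 'Write a function that counts characters', parseJsonl);
// expect(file, 'Expected a session file for fallback scenario prompt').toBeDefined();
```

Passing the parser in keeps the helper independent of the filesystem, so it can be unit-tested with in-memory records.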


codeant-ai · 2026-03-25T04:32:36Z

CodeAnt AI finished reviewing your PR.

ryaker-LG and others added 4 commits March 20, 2026 22:33

docs(security): sort action score table by irreversibility score

50a8b8c


feat(testing): e2e scenario harness — EchoProvider, 7 scenarios, cros…

21bf9a2


codeant-ai added the size:XXL label (This PR changes 1000+ lines, ignoring generated files) Mar 25, 2026

gemini-code-assist Bot reviewed Mar 25, 2026


ryaker closed this Mar 25, 2026

codeant-ai Bot reviewed Mar 25, 2026

