feat(testing): e2e scenario harness — real boot, echo provider, cross-LLM evaluation#170

Closed
ryaker wants to merge 4 commits into main from feat/e2e-scenario-harness

ryaker commented Mar 25, 2026

Owner

User description

Summary

  • Adds EchoProvider (type: echo) — deterministic, no API keys, works in CI
  • 7 e2e scenarios: basic task routing, session persistence, provider routing, failover, cross-provider evaluation, prompt injection safety, concurrency
  • Cross-LLM evaluation pattern: generator task → evaluator task (Gemini checks Claude's work); both tasks run on the echo provider in CI and on real providers with test:e2e:real
  • CI job on ubuntu-latest + macos-latest (no Windows — POSIX paths throughout)
  • test:e2e runs in CI; test:e2e:real for local runs with real API keys
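
A minimal sketch of what a deterministic echo provider could look like, assuming the event sequence named in the walkthrough (task.start, text, task.end, done) and the `reverse:` keyword used by the harness. The `EchoEvent` shape, the `echoReply` and `runEchoTask` helpers, and the provider name `echo-primary` are hypothetical; the actual `EchoProvider` in `src/providers/echo-provider.ts` is not shown on this page and may differ.

```typescript
// Hypothetical event shape for illustration; the real Zora event types are not shown here.
type EchoEvent = {
  type: 'task.start' | 'text' | 'task.end' | 'done';
  source: string;
  content?: { task?: string; text?: string };
};

// Deterministic reply: the same task always yields the same text, with no network or API key.
function echoReply(task: string): string {
  const match = /^reverse:\s*(.*)$/i.exec(task);
  if (match) {
    return (match[1] ?? '').split('').reverse().join('');
  }
  return `ECHO: ${task}`;
}

// Builds the events in the order the harness is described as checking.
function runEchoTask(providerName: string, task: string): EchoEvent[] {
  return [
    { type: 'task.start', source: providerName, content: { task } },
    { type: 'text', source: providerName, content: { text: echoReply(task) } },
    { type: 'task.end', source: providerName },
    { type: 'done', source: providerName },
  ];
}

const events = runEchoTask('echo-primary', 'reverse: abc');
console.log(events.map((e) => e.type).join(' -> '));
console.log(events[1]?.content?.text);
```

Because the reply depends only on the prompt, assertions on it stay stable across CI runs and operating systems.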

Test plan

  • npm run build:backend passes
  • ZORA_E2E=1 npx vitest run tests/e2e/ — all 7 scenarios pass (4.73s)
  • CI e2e job green on ubuntu-latest and macos-latest

🤖 Generated with Claude Code


CodeAnt-AI Description

Add a built-in echo provider and end-to-end scenario harness

What Changed

  • Added a no-key echo provider that returns predictable outputs, so e2e runs can pass without external AI access
  • Added end-to-end scenarios that boot the CLI, verify session files, test provider fallback, cross-step evaluation, injection handling, and concurrent runs
  • Added e2e fixtures and CI jobs on Ubuntu and macOS so these scenarios run automatically
  • Updated security docs and content-pipeline examples to reflect the current workflow and security model

Impact

✅ Reliable e2e runs without API keys
✅ Fewer release regressions in CLI flows
✅ Wider CI coverage on macOS and Linux

Summary by CodeRabbit

  • New Features

    • Human-in-the-loop approval gates for multi-agent workflows with configurable timeouts and auto-actions
    • Enhanced security architecture with layered action-control scoring, HITL approval queues, reputation-based throttling, and session-level risk detection
  • Documentation

    • Multi-agent content pipeline example workflow (signal, writer, image, publisher agents)
    • Cross-LLM evaluation testing pattern guide
    • Security architecture documentation updates
  • Tests

    • E2E test suite and CI integration for scenario validation

ryaker-LG and others added 4 commits March 20, 2026 22:33
SECURITY.md was last updated for v0.6 but Zora is now v0.12.0.
Nine undocumented security subsystems have been added since v0.6.

New sections added:
- Irreversibility Scoring: 0-100 per-action scores, configurable thresholds
- HITL Approval Gate: ApprovalQueue routing via Telegram/Signal
- Session Risk Forecasting: MemoryRiskForecaster drift/salami/creep signals
- Subagent Reputation: AgentCooldown with escalating denial thresholds
- Channel Security: CaMeL dual-LLM quarantine, Casbin RBAC, 4 invariants
- Per-Project Security Policy: .zora/security-policy.toml with parent ceiling
- zora security audit: daemon startup gate
- Tool Hook Pipeline: 6 built-in hooks running before every tool call

Updated sections:
- Security Architecture Summary table: 9 new subsystem rows
- OWASP matrix: encoding coverage, channel quarantine, ASI-06 row
- Audit event types: 12 new v0.12 event types documented
- Implementation status: all previously "in progress" items now Active

Closes #159 (PR D from security review plan)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
mv (40) was listed after shell_exec (50), breaking sort order.
Fixes Gemini code review suggestion on PR #159.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…og + Meta Graph API social

4-agent Zora team (signal → writer → image → publisher):
- SignalAgent: pulls topics-next, aborts if < 3 signals
- WriterAgent: StoryBrand MDX with ≥3 cited expert signals
- ImageAgent: hero (customer-focused 16:9) + social (Sophia 1:1)
- PublisherAgent: git + vercel --prod + Facebook/Instagram Graph API

Human gate via Telegram after WriterAgent (2hr timeout → auto-approve).
Social posting via Meta Graph API (not browser automation).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s-LLM eval pattern, CI on linux+macos

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
codeant-ai Bot commented Mar 25, 2026

CodeAnt AI is reviewing your PR.


coderabbitai Bot commented Mar 25, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

The PR introduces end-to-end testing infrastructure with a new EchoProvider for deterministic testing, extends CI/CD with an e2e GitHub Actions job, documents a comprehensive multi-agent content pipeline workflow (signal-writer-gate-image-publisher) with human approval gates, and substantially updates security documentation to v0.12.0 architecture including irreversibility scoring, HITL approval, and reputation systems.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| E2E Testing Infrastructure: .github/workflows/ci.yml, package.json, tests/e2e/scenario-harness.test.ts, tests/fixtures/e2e-*.toml | Added GitHub Actions e2e job on ubuntu/macOS matrix, npm scripts for test:e2e and test:e2e:real, comprehensive scenario harness test (501 lines) spawning real CLI with seven test scenarios including provider routing, failover, cross-provider evaluation, prompt-injection robustness, and concurrency; includes EchoProvider-based and real-provider fixture configs. |
| Echo Provider Implementation: src/providers/echo-provider.ts, src/providers/index.ts, src/cli/daemon.ts, src/cli/index.ts, src/types.ts | Introduced EchoProvider class (191 lines) implementing deterministic, keyword-driven responses for testing; updated provider factory switches in CLI entry points and daemon; extended KnownProviderType union to include 'echo'; re-exported provider from barrel. |
| Content Pipeline Multi-Agent Workflow: examples/routines/content-pipeline.toml, examples/routines/content-pipeline/{signal,writer,image,publisher}-agent.md | Restructured single-task pipeline into sequential multi-agent orchestration: signal-agent (topic retrieval with hard-abort conditions), writer-agent (StoryBrand MDX generation with citation requirements), human gate (Telegram approval with 120-min timeout), image-agent (NanoBanana MCP parallel image generation), and publisher-agent (multi-platform social publishing via Graph API and Instagram). Updated runtime config (model, cost tier, timeout, one-shot mode) and introduced environment/secrets injection. |
| E2E Testing Documentation & Fixtures: docs/testing/e2e-cross-llm-evaluation.md, tests/fixtures/e2e-config-real.toml.example, tests/fixtures/e2e-policy.toml | Added E2E cross-LLM evaluation pattern documentation (generator→evaluator with EVALUATION: output assertion), example real-provider config fixture with Claude + Gemini ranked providers, and minimal permissive e2e policy fixture (filesystem/shell/action/network/budget allowlists). |
| Security Documentation Update: SECURITY.md | Expanded v0.6→v0.12 security architecture (311 lines): added irreversibility scoring (warn/flag/auto-deny thresholds), HITL approval queue (Telegram/Signal with 5-min timeout, scoped allow/deny, ceiling windows), session-level risk forecasting, per-subagent reputation escalation (throttle/warn/shutdown with 24h reset), CaMeL quarantine processor, Casbin RBAC-with-domains authorization, per-project security policies (.zora/security-policy.toml, parent-ceiling inheritance), zora security audit pre-flight gate, and updated event taxonomy and audit log examples. |
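
To make the warn/flag/auto-deny tiers of irreversibility scoring concrete, here is a rough sketch of threshold classification over a 0-100 score. The threshold values (40, 65, 95), the `Verdict` and `Thresholds` types, and `classifyAction` are all assumptions for illustration; SECURITY.md itself is not reproduced on this page. The two sample scores (mv = 40, shell_exec = 50) come from the commit message earlier in this thread.

```typescript
type Verdict = 'allow' | 'warn' | 'flag' | 'auto-deny';

interface Thresholds {
  warn: number;
  flag: number;
  autoDeny: number;
}

// Hypothetical defaults; the real configurable thresholds are not shown in this PR page.
const DEFAULT_THRESHOLDS: Thresholds = { warn: 40, flag: 65, autoDeny: 95 };

// Maps a 0-100 irreversibility score to the strictest tier it reaches.
function classifyAction(score: number, t: Thresholds = DEFAULT_THRESHOLDS): Verdict {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be within 0..100, got ${score}`);
  }
  if (score >= t.autoDeny) return 'auto-deny';
  if (score >= t.flag) return 'flag';
  if (score >= t.warn) return 'warn';
  return 'allow';
}

// Scores quoted in the commit message: mv (40) and shell_exec (50).
const documented = [
  { tool: 'shell_exec', score: 50 },
  { tool: 'mv', score: 40 },
];

// Sorting ascending by score restores the order that commit fixed.
const sorted = documented.slice().sort((a, b) => a.score - b.score);
for (const entry of sorted) {
  console.log(`${entry.tool}: ${entry.score} -> ${classifyAction(entry.score)}`);
}
```

Checking the highest tier first keeps the function correct even if a project policy tightens one threshold without touching the others.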

Sequence Diagram(s)

sequenceDiagram
    participant User as User/Scheduler
    participant SignalAgent as Signal Agent
    participant WriterAgent as Writer Agent
    participant HumanGate as Human Gate<br/>(Telegram)
    participant ImageAgent as Image Agent
    participant PublisherAgent as Publisher Agent
    participant Workspace as ~/.zora/workspace
    participant GitHub as GitHub<br/>+ Social APIs

    User->>SignalAgent: trigger content-pipeline
    SignalAgent->>SignalAgent: sophia-wire topics-next<br/>+ brief/context
    SignalAgent->>Workspace: write {TODAY}-brief.json<br/>(signals, domains, audience)
    SignalAgent->>WriterAgent: transfer control

    WriterAgent->>Workspace: read {TODAY}-brief.json
    WriterAgent->>WriterAgent: generate StoryBrand MDX<br/>(3+ signal citations)
    WriterAgent->>Workspace: write {slug}.mdx<br/>+ preview.txt
    WriterAgent->>HumanGate: approval request

    HumanGate->>HumanGate: await Telegram<br/>approve/reject<br/>(120 min timeout)
    alt approval granted
        HumanGate->>ImageAgent: proceed
    else timeout/reject
        HumanGate->>User: abort/alert
    end

    ImageAgent->>Workspace: read {TODAY}-brief.json
    ImageAgent->>ImageAgent: NanoBanana MCP:<br/>hero (16:9)<br/>+ social (1:1)
    ImageAgent->>Workspace: save images<br/>to content/images/

    ImageAgent->>PublisherAgent: transfer

    PublisherAgent->>GitHub: copy MDX + hero image<br/>to content/blog/
    PublisherAgent->>GitHub: git add/commit/push<br/>+ vercel --prod
    PublisherAgent->>GitHub: verify HTTP 200
    PublisherAgent->>PublisherAgent: sophia-wire<br/>topics-publish
    PublisherAgent->>PublisherAgent: generate soundbites
    PublisherAgent->>GitHub: Facebook Graph API<br/>(with retry on fail)
    PublisherAgent->>GitHub: Instagram publish<br/>(2-step, 1 retry)
    PublisherAgent->>User: Telegram completion<br/>via Claude Ops
sequenceDiagram
    participant Harness as E2E Harness<br/>(Vitest)
    participant CLI as Zora CLI<br/>(zora ask)
    participant PrimaryProvider as Primary Provider<br/>(EchoProvider)
    participant SecondaryProvider as Secondary Provider<br/>(EchoProvider)
    participant SessionStore as ~/.zora/sessions<br/>(JSONL)

    Harness->>Harness: copy e2e-config.toml<br/>to temp .zora/

    Harness->>CLI: spawn: zora ask<br/>'reverse: ...'<br/>(ZORA_CONFIG_DIR)
    CLI->>PrimaryProvider: route (rank 1)
    PrimaryProvider->>PrimaryProvider: keyword match<br/>→ reversed text
    PrimaryProvider->>CLI: emit events<br/>(task.start, text, task.end, done)
    CLI->>SessionStore: write JSONL<br/>session file

    Harness->>SessionStore: parse session<br/>verify event order<br/>+ provider source

    Harness->>CLI: spawn: zora ask<br/>'evaluate: <output>'<br/>(disabled primary)
    CLI->>SecondaryProvider: failover (rank 2)
    SecondaryProvider->>SecondaryProvider: keyword match: evaluate<br/>→ 'EVALUATION: ...'
    SecondaryProvider->>CLI: emit events
    CLI->>SessionStore: write JSONL<br/>(2nd session file)

    Harness->>SessionStore: verify EVALUATION: present<br/>+ timestamp ordering<br/>+ parallel execution
    Harness->>Harness: assert all exit 0<br/>+ no prompt leaks
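
The "parse session, verify event order" step in the diagram above could be sketched roughly as follows. This assumes each session file is JSONL with one event object per line and a string `type` field; `parseJsonlText` and `hasEventOrder` are hypothetical helpers (the harness's own `parseJsonl` takes a file path and is not shown here).

```typescript
type SessionEvent = { type: string; [key: string]: unknown };

// Parses JSONL text: one JSON object per non-empty line.
function parseJsonlText(text: string): SessionEvent[] {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as SessionEvent);
}

// Returns true when `expected` occurs in `events` as an ordered subsequence,
// so extra events between the expected ones do not fail the check.
function hasEventOrder(events: SessionEvent[], expected: string[]): boolean {
  let next = 0;
  for (const event of events) {
    if (next < expected.length && event.type === expected[next]) {
      next += 1;
    }
  }
  return next === expected.length;
}

const sample = [
  '{"type":"task.start","source":"echo-primary"}',
  '{"type":"text","source":"echo-primary"}',
  '{"type":"task.end","source":"echo-primary"}',
  '{"type":"done","source":"echo-primary"}',
].join('\n');

const parsed = parseJsonlText(sample);
console.log(hasEventOrder(parsed, ['task.start', 'text', 'task.end', 'done']));
```

A subsequence check is a deliberate choice here: it tolerates additional event types being added later while still catching reordered or missing lifecycle events.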

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • Develop #3: Defines core provider interface types and KnownProviderType union in src/types.ts, which this PR directly extends by adding 'echo' literal and implementing a conforming provider class.
  • Implement full remediation roadmap (R1-R30) across P0, P1, P2 #27: Modifies the provider factory switch statement in daemon/CLI code (createProviders function), which this PR extends with echo provider handling in the same switch.
  • Add Ollama provider and cost-tier routing constraints #51: Adds a new provider type (ollama) to the provider-registration infrastructure using the same pattern of factory switch updates, type additions, and exports that this PR follows for echo.

Suggested labels

size:XXL, feature:testing, feature:security, feature:examples


🐰 A harness springs to life, with echoes that test,

While agents converse through a pipeline blessed,

From signal to publish, with gates in between,

The swiftest e2e flow we've ever seen! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the three main components: e2e scenario harness, echo provider for deterministic testing, and cross-LLM evaluation pattern. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%. |

@gemini-code-assist

Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the project's testing infrastructure by introducing a robust end-to-end scenario harness, complete with a new deterministic EchoProvider and a sophisticated cross-LLM evaluation pattern. Concurrently, it updates the security documentation to reflect a suite of advanced defense-in-depth features for version 0.12.0, moving towards a more layered and resilient security posture. Additionally, a detailed multi-agent content pipeline example is provided, showcasing complex routine orchestration with human-in-the-loop approval.

Highlights

  • E2E Scenario Harness: Introduced a comprehensive end-to-end testing harness with 7 distinct scenarios, including basic task routing, session persistence, provider routing, failover, cross-provider evaluation, prompt injection safety, and concurrency. These tests simulate real-world interactions by booting Zora via the CLI.
  • EchoProvider for Deterministic Testing: Added a new EchoProvider that offers deterministic responses based on prompt keywords. This allows E2E tests to verify system behavior without requiring real LLM API keys, making tests reliable and suitable for CI environments.
  • Cross-LLM Evaluation Pattern: Implemented a cross-LLM evaluation pattern within the E2E tests, where a 'generator' LLM produces output and a separate 'evaluator' LLM checks it. This pattern supports independent verification and can use different models or providers.
  • Updated Security Documentation: The SECURITY.md file was significantly updated to detail new v0.12.0 security hardening features. These include irreversibility scoring, human-in-the-loop approval gates, session risk forecasting, subagent reputation tracking, channel quarantine, Casbin RBAC for channel authorization, per-project security policies, a startup audit gate, and a six-hook tool pipeline.
  • Expanded Content Pipeline Example: Enhanced the content-pipeline example routine to demonstrate a multi-agent workflow with a human approval gate. This includes SignalAgent, WriterAgent, ImageAgent, and PublisherAgent working sequentially with defined inputs, outputs, and error handling.
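
The generator → evaluator pattern described in the highlights can be sketched as two task runs chained together, with an assertion on the `EVALUATION:` marker. In this sketch `RunTask` is a hypothetical synchronous stand-in for spawning `zora ask`, and the two echo-style functions are invented for illustration; only the `evaluate:` prompt prefix and the `EVALUATION:` marker come from the walkthrough.

```typescript
// Hypothetical stand-in for one CLI run: prompt in, provider output out.
type RunTask = (prompt: string) => string;

interface EvalResult {
  generated: string;
  evaluation: string;
  passed: boolean;
}

// Runs the generator, feeds its output to the evaluator, and checks for the marker.
function crossEvaluate(generate: RunTask, evaluate: RunTask, task: string): EvalResult {
  const generated = generate(task);
  const evaluation = evaluate(`evaluate: ${generated}`);
  return { generated, evaluation, passed: evaluation.includes('EVALUATION:') };
}

// Echo-style stand-ins (assumptions): in CI both sides would be deterministic.
const echoGenerator: RunTask = (prompt) => `GENERATED(${prompt})`;
const echoEvaluator: RunTask = (prompt) => `EVALUATION: received ${prompt.length} characters`;

const result = crossEvaluate(echoGenerator, echoEvaluator, 'summarize this task for the evaluator');
console.log(result.passed, result.evaluation);
```

Passing the two runners in as parameters is what lets the same test body use echo providers in CI and two different real models locally.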
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/ci.yml

codeant-ai Bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files) Mar 25, 2026
codeant-ai Bot commented Mar 25, 2026

Sequence Diagram

This PR adds a deterministic Echo provider and a scenario-based E2E harness that boots the real CLI. The core flow validates provider selection and a two-step generator then evaluator pattern, with each CLI run writing its own session log and running in CI across Linux and macOS.

sequenceDiagram
    participant CI
    participant E2EHarness
    participant ZoraCLI
    participant ProviderRouter
    participant EchoProvider
    participant SessionStore

    CI->>E2EHarness: Run e2e scenarios
    E2EHarness->>ZoraCLI: Ask generator task
    ZoraCLI->>ProviderRouter: Load config and choose available provider
    ProviderRouter->>EchoProvider: Execute task
    EchoProvider-->>ZoraCLI: Deterministic response events
    ZoraCLI->>SessionStore: Write session JSONL file

    E2EHarness->>ZoraCLI: Ask evaluator task with generated output
    ZoraCLI->>ProviderRouter: Route to evaluator provider and execute
    ProviderRouter-->>E2EHarness: Evaluator result with second session file

Generated by CodeAnt AI

codeant-ai Bot commented Mar 25, 2026

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • Routing Bug
    The code / function / write branch is checked before the evaluation branch, so prompts such as review this code or write a review can return a code snippet instead of the intended evaluation response. Please validate the keyword precedence and tighten the matching so evaluator tasks are classified correctly.

  • Environment Side Effect
    This setup writes a fixture into the real ~/.zora directory when policy.toml is missing. That makes the test suite non-isolated, can leave persistent state behind after the run, and may fail on systems where the home directory is not writable.

  • Edge Case
    Word counting should be validated for empty or whitespace-only tasks. The current split logic can report one word for an empty prompt, which can make e2e assertions inconsistent.
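
The first and third nitpicks above can be illustrated together. This is a sketch under assumptions, not the provider's actual code: `classifyTask` and `countWords` are hypothetical names, and the keyword lists only contain words mentioned in this review (evaluate, review, code, function, write). It tests the evaluation branch first with whole-word matching, and returns zero words for empty or whitespace-only input.

```typescript
type Branch = 'evaluation' | 'code' | 'echo';

// Evaluation keywords are checked first, so "review this code" and
// "write a review" are not claimed by the code branch.
function classifyTask(task: string): Branch {
  if (/\b(evaluate|review)\b/i.test(task)) return 'evaluation';
  if (/\b(code|function|write)\b/i.test(task)) return 'code';
  return 'echo';
}

// Trimming first avoids the split pitfall: ''.split(' ') yields [''], length 1.
function countWords(task: string): number {
  const trimmed = task.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

console.log(classifyTask('review this code'));
console.log(classifyTask('Write a function that counts characters'));
console.log(countWords(''), countWords('   '), countWords('two  words'));
```

The `\b` word boundaries also keep a prompt such as "rewrite" from matching the `write` keyword by accident.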

gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a significant security hardening update (v0.12.0) with a layered defense-in-depth stack, including irreversibility scoring, human-in-the-loop approval, and session risk forecasting. It also refactors the content pipeline with new agents (Signal, Writer, Image, Publisher) and integrates a human approval gate. Additionally, a new EchoProvider and comprehensive end-to-end testing framework have been added. Feedback suggests that the timeout_action in the human approval gate should default to reject to prevent unintended content publication, and the deployment verification process should use a more robust polling mechanism instead of a fixed sleep duration.

Reply *approve* to publish, *reject* to cancel.
Auto-publishes in *2 hours* if no response.
"""
timeout_action = "approve" # Auto-approve after timeout (not reject)

Contributor

high

The timeout_action is set to approve, which means content will be automatically published if not reviewed within 2 hours. This could lead to unintended or erroneous content being published without explicit approval. For safety, it's generally better to default to reject on timeout to prevent accidental publications. This ensures that a human must explicitly approve the content before it goes live.

timeout_action = "reject"       # Auto-reject after timeout (safer default)
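
To show why the timeout default matters, here is a small sketch of how a gate decision could be resolved. `resolveGate`, `GateConfig`, and `SAFE_GATE` are hypothetical names, and the real routine engine may resolve gates differently; only the two-hour timeout and the `approve`/`reject` values come from the config under review.

```typescript
type Decision = 'approve' | 'reject';

interface GateConfig {
  timeoutMs: number;
  timeoutAction: Decision;
}

// Reject-on-timeout, as the review suggests; two hours matches the routine under review.
const SAFE_GATE: GateConfig = { timeoutMs: 2 * 60 * 60 * 1000, timeoutAction: 'reject' };

// Resolves the gate from the reply received so far and the time elapsed.
function resolveGate(
  reply: Decision | undefined,
  elapsedMs: number,
  config: GateConfig = SAFE_GATE,
): Decision | 'pending' {
  if (reply !== undefined) return reply; // an explicit human answer always wins
  if (elapsedMs >= config.timeoutMs) return config.timeoutAction;
  return 'pending';
}

console.log(resolveGate(undefined, 60 * 1000));
console.log(resolveGate(undefined, SAFE_GATE.timeoutMs));
console.log(resolveGate(undefined, SAFE_GATE.timeoutMs, { ...SAFE_GATE, timeoutAction: 'approve' }));
```

With `timeoutAction: 'approve'`, the only path to publication that involves no human is silence, which is exactly the case the reviewer flags.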

Comment on lines +56 to +57
sleep 120
curl -s -o /dev/null -w "%{http_code}" https://www.mymoneycoach.ai/blog/{slug}

Contributor

medium

Using a fixed sleep 120 to wait for deployment propagation is brittle, as deployment times can vary. A more robust approach is to poll the URL in a loop with a timeout until it returns a 200 status code. This avoids both unnecessary waiting and failures due to longer-than-expected deployment times.

Suggested change
- sleep 120
- curl -s -o /dev/null -w "%{http_code}" https://www.mymoneycoach.ai/blog/{slug}
+ URL="https://www.mymoneycoach.ai/blog/{slug}"
+ for i in {1..12}; do # Poll for 2 minutes (12 * 10s)
+   HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "$URL")
+   if [ "$HTTP_CODE" -eq 200 ]; then
+     echo "Deployment live!"
+     break
+   fi
+   echo "Attempt $i/12: Not live yet (HTTP $HTTP_CODE). Retrying in 10s..."
+   sleep 10
+ done
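
The same polling idea, sketched in TypeScript with an explicit success or failure result (the shell loop above ends the same way whether or not the page ever returned 200). `waitForLive` is a hypothetical helper; the probe and sleep functions are injected so the demo runs without network access or real delays. A real probe would issue an HTTP request and return its status code.

```typescript
const realSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => {
    setTimeout(resolve, ms);
  });

// Polls `probe` until it reports HTTP 200 or the attempts are exhausted.
// Returns true on success and false on timeout so the caller can fail the deploy step.
async function waitForLive(
  probe: () => Promise<number>,
  attempts: number = 12,
  delayMs: number = 10000,
  sleep: (ms: number) => Promise<void> = realSleep,
): Promise<boolean> {
  for (let i = 1; i <= attempts; i += 1) {
    const status = await probe();
    if (status === 200) return true;
    if (i < attempts) await sleep(delayMs); // no pointless wait after the final attempt
  }
  return false;
}

// Demo with a fake probe: 503 twice, then 200. No network, no real delay.
let calls = 0;
const fakeProbe = async (): Promise<number> => {
  calls += 1;
  return calls < 3 ? 503 : 200;
};

waitForLive(fakeProbe, 12, 0, async () => {}).then((live) => {
  console.log(`live=${live} after ${calls} probes`);
});
```

Returning a boolean rather than only logging lets the publisher step stop before posting links to a page that never went live.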

ryaker commented Mar 25, 2026
Owner Author

Superseded by #171 (clean cherry-pick onto main)

ryaker closed this Mar 25, 2026
Comment on lines +31 to +35
const E2E_CONFIG_FIXTURE = path.join(REPO_ROOT, 'tests', 'fixtures', 'e2e-config.toml');
const E2E_POLICY_FIXTURE = path.join(REPO_ROOT, 'tests', 'fixtures', 'e2e-policy.toml');

// The Orchestrator always writes sessions to ~/.zora/sessions (its baseDir defaults
// to os.homedir()/.zora regardless of ZORA_CONFIG_DIR). Tests track this dir.

Suggestion: ZORA_REAL_PROVIDERS=1 is never read, so test:e2e:real still loads the echo-only fixture and never exercises real providers. Select the fixture file based on ZORA_REAL_PROVIDERS so the real-provider path is actually tested. [logic error]

Severity Level: Major ⚠️
- ⚠️ `test:e2e:real` does not validate Claude/Gemini integration.
- ⚠️ Real-provider regressions can ship undetected.
Suggested change
- const E2E_CONFIG_FIXTURE = path.join(REPO_ROOT, 'tests', 'fixtures', 'e2e-config.toml');
- const E2E_POLICY_FIXTURE = path.join(REPO_ROOT, 'tests', 'fixtures', 'e2e-policy.toml');
- // The Orchestrator always writes sessions to ~/.zora/sessions (its baseDir defaults
- // to os.homedir()/.zora regardless of ZORA_CONFIG_DIR). Tests track this dir.
+ const USE_REAL_PROVIDERS = process.env['ZORA_REAL_PROVIDERS'] === '1';
+ const E2E_CONFIG_FIXTURE = path.join(
+   REPO_ROOT,
+   'tests',
+   'fixtures',
+   USE_REAL_PROVIDERS ? 'e2e-config-real.toml' : 'e2e-config.toml',
+ );
Steps of Reproduction ✅
1. Run `npm run test:e2e:real`; script sets `ZORA_REAL_PROVIDERS=1` in `package.json:28`.

2. Test constants still hardcode `e2e-config.toml` at
`tests/e2e/scenario-harness.test.ts:31`; no runtime read of `ZORA_REAL_PROVIDERS`.

3. `createTempZoraDir()` copies that fixture into temp config at
`tests/e2e/scenario-harness.test.ts:72`.

4. The copied fixture defines only echo providers (`tests/fixtures/e2e-config.toml:7-20`).

5. CLI provider factory builds `EchoProvider` for `type='echo'` at
`src/cli/index.ts:60-80`, so real-provider route is never exercised.

const globalZoraDir = path.join(os.homedir(), '.zora');
fs.mkdirSync(globalZoraDir, { recursive: true });
fs.copyFileSync(E2E_POLICY_FIXTURE, globalPolicyPath);
}

Suggestion: The test creates ~/.zora/policy.toml when missing but never removes it, leaving a permissive test policy in the developer's real home directory after tests complete. Add cleanup for the file created by the harness to avoid persistent global security-state mutation. [security]

Severity Level: Major ⚠️
- ⚠️ Leaves persistent `~/.zora/policy.toml` after tests.
- ⚠️ Later local runs inherit permissive test policy.
Suggested change
  }
+ process.once('exit', () => {
+   try {
+     fs.rmSync(globalPolicyPath, { force: true });
+   } catch {
+     // Best-effort cleanup
+   }
+ });
Steps of Reproduction ✅
1. Start with no `~/.zora/policy.toml`; `createTempZoraDir()` checks and writes one at
`tests/e2e/scenario-harness.test.ts:75-79`.

2. Written file is permissive test policy (`tests/fixtures/e2e-policy.toml:1,22` allows
broad access).

3. Test teardown only removes temp dirs (`tests/e2e/scenario-harness.test.ts:249-252`),
not global policy.

4. Future CLI runs load global policy as required base from
`src/config/policy-loader.ts:102-107`, so test-created policy persists beyond test
execution.
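
One caveat worth sketching: cleanup should remove the policy file only when the harness itself created it, otherwise a developer's pre-existing `~/.zora/policy.toml` could be deleted. From the excerpt it is not clear whether the suggested exit handler is scoped that way. The helper below, `installIfMissing`, is hypothetical and the demo runs against a throwaway directory rather than the real home directory.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Copies `fixture` to `target` only when `target` is absent, and returns a cleanup
// function that deletes the file only if this call created it.
function installIfMissing(fixture: string, target: string): () => void {
  if (fs.existsSync(target)) {
    return () => {}; // pre-existing file: never touch it
  }
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.copyFileSync(fixture, target);
  return () => fs.rmSync(target, { force: true });
}

// Demo in a temp sandbox; the paths below are stand-ins for the real fixture and home dir.
const sandbox = fs.mkdtempSync(path.join(os.tmpdir(), 'zora-e2e-'));
const fixture = path.join(sandbox, 'e2e-policy.toml');
fs.writeFileSync(fixture, '# permissive test policy\n');
const target = path.join(sandbox, 'home', '.zora', 'policy.toml');

const cleanup = installIfMissing(fixture, target);
console.log('installed:', fs.existsSync(target));
cleanup();
console.log('still present after cleanup:', fs.existsSync(target));
```

In a Vitest suite the returned function would typically be called from an `afterAll` hook, which also runs when assertions fail.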

const newFiles = sessionFilesNewerThan(sinceMs);
expect(newFiles.length, 'Expected a new session file').toBeGreaterThan(0);

const events = parseJsonl(newFiles[0]!);

Suggestion: Selecting newFiles[0] can read an unrelated session when other Zora runs create files in the same global sessions directory, causing false routing assertions. Match the session file by the scenario prompt before asserting provider source. [logic error]

Severity Level: Major ⚠️
- ⚠️ Scenario 3 can fail from unrelated session writes.
- ⚠️ Provider-routing check becomes flaky and nondeterministic.
Suggested change
- const events = parseJsonl(newFiles[0]!);
+ const targetPrompt = 'summarize this task for the evaluator';
+ const matchingFile = newFiles.find((file) => {
+   const events = parseJsonl(file);
+   return events.some((e) => {
+     if (e['type'] !== 'task.start') return false;
+     const content = e['content'] as { task?: unknown } | undefined;
+     return typeof content?.task === 'string' && content.task === targetPrompt;
+   });
+ });
+ expect(matchingFile, 'Expected a session file for this scenario prompt').toBeDefined();
+ const events = parseJsonl(matchingFile!);
Steps of Reproduction ✅
1. Scenario 3 writes one task via `spawnAsk('summarize...')` at
`tests/e2e/scenario-harness.test.ts:317-320`.

2. It gathers all `~/.zora/sessions/*.jsonl` newer than timestamp
(`sessionFilesNewerThan`, `tests/e2e/scenario-harness.test.ts:169-177`) and sorts by
mtime.

3. Session directory is global (`GLOBAL_SESSIONS_DIR`,
`tests/e2e/scenario-harness.test.ts:36`) and shared with any other `zora-agent ask`
process.

4. Assertion parses only `newFiles[0]` (`tests/e2e/scenario-harness.test.ts:327`), so any
unrelated newer session can be selected.

5. This can mis-assert provider routing at `tests/e2e/scenario-harness.test.ts:333`
against the wrong task file.
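
For readers without the harness source at hand, the two helpers involved might look roughly like this. The bodies are assumptions reconstructed from the description above (collect `.jsonl` files newer than a timestamp, sort by mtime, then match on the `task.start` payload); the real `sessionFilesNewerThan` takes only `sinceMs`, and `findSessionForTask` is a hypothetical name.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

type SessionEvent = { type?: string; content?: { task?: unknown } };

// Lists *.jsonl files in `dir` modified at or after `sinceMs`, newest first.
function sessionFilesNewerThan(dir: string, sinceMs: number): string[] {
  if (!fs.existsSync(dir)) return [];
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith('.jsonl'))
    .map((name) => path.join(dir, name))
    .filter((file) => fs.statSync(file).mtimeMs >= sinceMs)
    .sort((a, b) => fs.statSync(b).mtimeMs - fs.statSync(a).mtimeMs);
}

// Picks the session whose task.start event carries exactly this prompt.
function findSessionForTask(files: string[], task: string): string | undefined {
  return files.find((file) =>
    fs
      .readFileSync(file, 'utf8')
      .split('\n')
      .filter((line) => line.trim() !== '')
      .some((line) => {
        const event = JSON.parse(line) as SessionEvent;
        return event.type === 'task.start' && event.content?.task === task;
      }),
  );
}

// Demo: two sessions in a temp dir, only one belongs to the scenario.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'zora-sessions-'));
fs.writeFileSync(path.join(dir, 'mine.jsonl'), '{"type":"task.start","content":{"task":"summarize this"}}\n');
fs.writeFileSync(path.join(dir, 'other.jsonl'), '{"type":"task.start","content":{"task":"unrelated"}}\n');

const match = findSessionForTask(sessionFilesNewerThan(dir, 0), 'summarize this');
console.log(match ? path.basename(match) : 'no match');
```

Matching on the prompt removes the dependence on mtime ordering, so an unrelated run that finishes later cannot be picked up by mistake.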

const newFiles = sessionFilesNewerThan(sinceMs);
expect(newFiles.length, 'Expected a session file from fallback run').toBeGreaterThan(0);

const events = parseJsonl(newFiles[0]!);

Suggestion: This failover check also parses only the newest session file, which can belong to a different run in the shared global sessions directory and produce false failures. Resolve the session by matching the task payload first. [logic error]

Severity Level: Major ⚠️
- ⚠️ Scenario 4 failover check can target wrong session.
- ⚠️ E2E failover signal becomes flaky on shared machines.
Suggested change
- const events = parseJsonl(newFiles[0]!);
+ const targetPrompt = 'Write a function that counts characters';
+ const matchingFile = newFiles.find((file) => {
+   const events = parseJsonl(file);
+   return events.some((e) => {
+     if (e['type'] !== 'task.start') return false;
+     const content = e['content'] as { task?: unknown } | undefined;
+     return typeof content?.task === 'string' && content.task === targetPrompt;
+   });
+ });
+ expect(matchingFile, 'Expected a session file for fallback scenario prompt').toBeDefined();
+ const events = parseJsonl(matchingFile!);
Steps of Reproduction ✅
1. Scenario 4 runs failover prompt at `tests/e2e/scenario-harness.test.ts:349-352`.

2. It then reads all newer files from shared `~/.zora/sessions` using
`sessionFilesNewerThan()` (`tests/e2e/scenario-harness.test.ts:169-177`).

3. It parses only newest file `newFiles[0]` (`tests/e2e/scenario-harness.test.ts:360`) and
checks for `echo-evaluator`.

4. If another Zora process writes a newer session in the same window, parsed events belong
to another task, producing false failover failures.

codeant-ai Bot commented Mar 25, 2026

CodeAnt AI finished reviewing your PR.

Labels

size:XXL This PR changes 1000+ lines, ignoring generated files

2 participants