feat: add OWASP Agentic Top 10 automated test harness (closes #152) #436

Open
arcgod-design wants to merge 4 commits into sreerevanth:main from arcgod-design:feat/issue-152-owasp-test

@arcgod-design

Contributor

Summary

Adds a standalone CLI test harness that validates AgentWatch against OWASP Agentic Top 10 exploits, running prompt injections, path traversals, and shell exploit commands through the detection layer.

Changes

  • Created scripts/owasp_test_harness.py — standalone CLI script with 19 OWASP attack vectors
  • Created tests/test_owasp_harness.py — 12 tests covering scanner, red-team, and CLI integration
  • Supports --json output for CI consumption
  • Supports --fail-fast to stop on first critical finding
  • Exits with code 0 on pass and code 1 on critical findings
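
As a rough illustration of how the `--json` and `--fail-fast` flags and the exit codes listed above could fit together, here is a minimal sketch. It is not the code in scripts/owasp_test_harness.py: the `Finding` dataclass, the `run` and `main` helpers, and the idea that each check is a callable returning a finding are all assumptions made for this example.

```python
import argparse
import json
from dataclasses import asdict, dataclass


@dataclass
class Finding:
    # Hypothetical result record; the real harness may track different fields.
    vector: str     # e.g. "A01"
    critical: bool  # assumed to mean "this finding should fail the run"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="OWASP harness sketch")
    parser.add_argument("--json", action="store_true", help="emit JSON for CI")
    parser.add_argument("--fail-fast", action="store_true",
                        help="stop on the first critical finding")
    return parser


def run(checks, fail_fast: bool) -> list:
    """Run each check in order, optionally stopping at the first critical finding."""
    findings = []
    for check in checks:
        finding = check()
        findings.append(finding)
        if fail_fast and finding.critical:
            break
    return findings


def main(argv=None, checks=()) -> int:
    """Return 0 when no finding is critical, 1 otherwise."""
    args = build_parser().parse_args(argv)
    findings = run(checks, args.fail_fast)
    if args.json:
        print(json.dumps([asdict(f) for f in findings]))
    else:
        for f in findings:
            print(f"{f.vector}: {'CRITICAL' if f.critical else 'ok'}")
    return 1 if any(f.critical for f in findings) else 0
```

A script entry point would typically pass the return value of `main()` to `sys.exit` so that CI sees the 0 or 1 exit code.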

Usage

# Text report
python scripts/owasp_test_harness.py

# JSON for CI
python scripts/owasp_test_harness.py --json

# Stop on first critical finding
python scripts/owasp_test_harness.py --fail-fast
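
For CI consumption, a wrapper along these lines could invoke the harness and act on both the exit code and the JSON report. This is a sketch under assumptions: `run_harness` is a hypothetical helper, and because the JSON schema is not described in this PR, the report is returned as parsed data without reading any specific keys.

```python
import json
import subprocess
import sys


def run_harness(cmd):
    """Run a harness command and return (passed, parsed_report_or_None)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    try:
        report = json.loads(proc.stdout)
    except json.JSONDecodeError:
        # Output was not valid JSON (for example, the text report was printed).
        report = None
    # Exit code 0 means pass; code 1 means critical findings.
    return proc.returncode == 0, report


# Command taken from the usage section above, run with the current interpreter.
HARNESS_CMD = [sys.executable, "scripts/owasp_test_harness.py", "--json"]
```

Calling `run_harness(HARNESS_CMD)` from the repository root would then give a boolean a CI step can gate on, plus the parsed report for archiving.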

Testing

  • 12/12 harness tests passed
  • 25/25 cost tests passed (no regressions)
  • Script produces clean CLI report and valid JSON output

OWASP Vectors Covered

| Vector | Category | Attack Count |
|--------|----------|--------------|
| A01 | Prompt Injection | 2 |
| A02 | Tool Abuse | 2 |
| A03 | Excessive Permissions | 2 |
| A04 | Unsafe Code Execution | 2 |
| A05 | Data Exfiltration | 2 |
| A06 | Goal Hijacking | 1 |
| A07 | Context Poisoning | 2 |
| A08 | Trust Boundary | 2 |
| A09 | Insecure Memory | 2 |
| A10 | Supply Chain | 2 |
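
As a quick sanity check, the per-vector counts in the table above can be written as data and summed to confirm they match the 19 attack vectors mentioned under Changes. The `ATTACK_COUNTS` name is made up for this example and is not taken from the harness.

```python
# Vector ID -> (category, attack count), copied from the table above.
ATTACK_COUNTS = {
    "A01": ("Prompt Injection", 2),
    "A02": ("Tool Abuse", 2),
    "A03": ("Excessive Permissions", 2),
    "A04": ("Unsafe Code Execution", 2),
    "A05": ("Data Exfiltration", 2),
    "A06": ("Goal Hijacking", 1),
    "A07": ("Context Poisoning", 2),
    "A08": ("Trust Boundary", 2),
    "A09": ("Insecure Memory", 2),
    "A10": ("Supply Chain", 2),
}

total = sum(count for _category, count in ATTACK_COUNTS.values())
print(total)  # 19
```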

Impact

Provides a repeatable, automated way to validate AgentWatch's security defenses against the OWASP Agentic Top 10. Can be integrated into CI pipelines to catch regressions in safety detection coverage.

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Contributor

Warning

Review limit reached

@arcgod-design, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 10 minutes and 32 seconds. Learn how PR review limits work.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a04d8e7c-440e-4d7b-b446-8add959c9025

📥 Commits

Reviewing files that changed from the base of the PR and between 3b1f4b5 and 4b6cd50.

📒 Files selected for processing (4)
  • agentwatch/cli/main.py
  • scripts/owasp_test_harness.py
  • tests/test_cost.py
  • tests/test_owasp_harness.py

@github-actions

github-actions Bot commented Jun 18, 2026

🧪 PR Test Results

| Check | Result |
|-------|--------|
| Tests (pytest tests/) | ✅ success |
| Lint (ruff check .) | ✅ success |
| Coverage (agentwatch) | 73.28% |

Python 3.12 · commit 4b6cd50

@sreerevanth

Owner

@arcgod-design this is ready to merge; please resolve the conflicts.

arcgod-design force-pushed the feat/issue-152-owasp-test branch from 9aa027e to 16b6be4 on June 18, 2026, 20:32
@sreerevanth

Owner

@arcgod-design please fix CI and join the Discord.
