From 198612cf6a8a7ec046e8de420b5cc5ef1edb95b7 Mon Sep 17 00:00:00 2001 From: Nick Beaugeard Date: Thu, 23 Apr 2026 10:33:10 +1000 Subject: [PATCH 1/4] Updated Nick's Notes --- AGENTS.md | 281 ++++++++++++++++++++++++------------------- docs/requirements.md | 207 +++++++++++++++++++++++++++---- 2 files changed, 335 insertions(+), 153 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index e0a43d0..520502e 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,127 +1,154 @@ -# AGENTS.md - -## Mission - -Implement and maintain Symphony according to `SPEC.md` using: - -- `.NET 10` (`net10.0`) -- SQLite for durable orchestrator state -- GitHub as the source of truth for issues, pull requests, milestones, and repository versioning - -## Source of Truth - -- Functional behavior: `SPEC.md` -- Implementation sequencing: `IMPLEMENTATION_PLAN.md` -- If plan and spec conflict, follow `SPEC.md` and update the plan. - -## Current Product Decisions - -Locked on 2026-03-05: - -1. Ship Worker + HTTP API in v1. -2. Use EF Core + SQLite. -3. Design for multi-instance safety. -4. Use GitHub PAT auth in v1. -5. Candidate filter includes state + label + milestone. -6. Dispatch only issues (no PR-only work items). -7. Success state uses GitHub issue `Closed`. -8. Use shared clone + Git worktrees for per-issue workspaces. -9. Use permissive auto-approve policy in v1. -10. Ship `github_graphql` extension in v1. -11. Default capacity: `5` agents, `10` minute poll interval. -12. Run as Windows Service in target deployment. - -## Non-Negotiable Constraints - -1. Do not weaken safety constraints from sections 9, 10, and 15 of the spec. -2. Keep workspace path containment checks mandatory. -3. Keep protocol parsing strict on stdout; never parse stderr as protocol events. -4. Never log secrets (`GITHUB_TOKEN`, workflow secrets, auth headers). -5. Track writes are agent/tool driven; do not add hidden orchestrator-side business writes unless explicitly requested. - -## Architecture Rules - -1. 
Keep clear boundaries: -- Core domain/orchestration logic must not depend on concrete infra APIs. -- Infrastructure adapters implement interfaces from core. - -2. Prefer composition over shared mutable globals: -- Orchestrator state must be explicit and testable. -- Background services should use scoped dependencies per tick/run. - -3. Persistence: -- Use SQLite with migrations. -- Persist only state needed for recovery, observability, and debugging. -- Include DB-backed lease/claim semantics for multi-instance safety. - -4. GitHub integration: -- Use GraphQL endpoint by default (`https://api.github.com/graphql`). -- Normalize all tracker payloads to the spec domain model before use. -- Use PAT auth for v1. -- Filter candidates by configured state + label + milestone. -- Exclude PR-only items from dispatch. - -## Coding Standards - -- C# latest language version supported by .NET 10 SDK. -- Nullable reference types enabled. -- Async all the way for I/O paths. -- Cancellation tokens respected in polling, subprocess, and HTTP calls. -- Keep classes focused and small; split large orchestration behaviors into feature services. -- Prefer built-in ASP.NET Core and .NET primitives over third-party packages unless justified. - -## Config and Options - -- Read workflow from `WORKFLOW.md`. -- Resolve `$ENV_VAR` values in config. -- Fail fast on invalid required config. -- Validate options at startup and before dispatch cycles where required by spec. -- Default to `max_concurrent_agents=5` and `polling.interval_ms=600000` unless explicitly overridden. - -## Testing Expectations - -Minimum for any non-trivial change: - -1. Unit tests for business logic and state transitions. -2. Integration tests for infra boundaries touched (SQLite, GitHub adapter, protocol client). -3. Update/add conformance tests mapped to section 17 of `SPEC.md`. - -Prefer SQLite-backed integration tests over fake in-memory DB providers. 
- -## Observability Expectations - -- Structured logs with issue/session correlation fields. -- Clear event names for dispatch, retry, stop, cleanup, and protocol errors. -- Snapshot/status output must be derived from orchestrator state, not ad hoc caches. -- Include lease ownership/heartbeat visibility for multi-instance troubleshooting. - -## Delivery Workflow - -1. Start by citing relevant spec section(s) in PR description. -2. Implement smallest vertical slice that can be validated. -3. Add or update tests. -4. Run build + tests locally before handing off. -5. Document behavior changes in `README.md` or `docs/` when applicable. - -## Suggested Commands - -```powershell -dotnet restore -dotnet build -dotnet test -``` - -If migrations are used: - -```powershell -dotnet ef migrations add --project src/Symphony.Infrastructure/Persistence.Sqlite -dotnet ef database update --project src/Symphony.Infrastructure/Persistence.Sqlite -``` - -## Definition of Done (Per Change) - -1. Behavior aligns with `SPEC.md`. -2. Tests prove the behavior or failure mode. -3. Logging and error paths are explicit. -4. No secrets exposed. -5. Reviewer can trace change from spec clause to implementation. +# AGENTS.md + +This file is the operating contract for any agent — human or AI — making changes in this repository. Read it before making a change. If it conflicts with a user instruction, surface the conflict and ask before proceeding. + +## 1. Mission + +This repository hosts two concerns that must not be conflated: + +1. **Primary product:** the semantic test mining platform specified in [docs/requirements.md](docs/requirements.md), with conceptual background in [docs/concept.md](docs/concept.md). New application work targets this product. +2. **Retained tooling:** Symphony (`SPEC.md`, `IMPLEMENTATION_PLAN.md`, `WORKFLOW.md`, `src/Symphony.*`, `tests/Symphony.*`, `symphony_docs/`). 
Symphony is preserved because it is used *by* this project; it is not the product described by `docs/requirements.md`. + +Every change must declare, in its pull request description, which of the two concerns it targets. Cross-boundary changes are split into separate pull requests. + +## 2. Sources of Truth + +In order of precedence, for test mining platform work: + +1. `docs/requirements.md` (product specification) +2. `docs/concept.md` (design rationale; informative, not normative) +3. `README.md` (user-facing framing) + +For Symphony work: + +1. `SPEC.md` +2. `IMPLEMENTATION_PLAN.md` +3. `WORKFLOW.md` + +If a spec and a plan disagree, the spec wins and the plan is updated. If requirements and Symphony docs disagree about repository direction, `docs/requirements.md` wins for product work; Symphony docs continue to govern Symphony work. + +## 3. Non-Negotiable Guardrails + +These are hard stops. Do not weaken them without an explicit, documented decision. + +1. **Never persist, log, or emit captured secrets.** This includes `GITHUB_TOKEN`, workflow secrets, target-application credentials, session cookies, Playwright storage states, bearer tokens, and anything classified as `Sensitive`. If you are unsure whether a value is sensitive, treat it as sensitive. +2. **Never launch a Playwright browser context against a target URL absent from the administrator-managed allow-list** (requirements §12.4). The allow-list check is server-side and runs before context creation. +3. **Never bypass encryption at rest.** Storage states, cookies, cached auth material, and sensitive scenario variables are encrypted at rest (requirements §12.5). No code path may write them in plaintext. +4. **Never edit generated source files by hand and commit the result.** Generated output is a derived artefact. Regenerate from scenario data and templates (requirements FR-GEN-001). +5.
**Never auto-apply healing to a persisted scenario in v1.** Healing produces proposals; humans approve (requirements FR-HEAL-003). +6. **Never let AI be the runtime-critical path.** AI assists; deterministic logic decides (requirements principle 3). +7. **Never mutate Symphony files from a test-mining task, or vice versa.** Scope discipline is enforced at PR review. +8. **Never parse stderr as protocol events, and never parse secrets out of logs.** (Inherited Symphony constraint retained for Symphony work.) + +## 4. Architecture Rules + +1. Clear boundaries. Core domain and scenario contracts must not depend on concrete infrastructure APIs. Infrastructure adapters implement interfaces declared in core. +2. Ports and adapters for the big four external systems: Playwright, persistence, artefact storage, identity provider. Each has a core-defined interface and an isolated adapter project. +3. Persistence: + - EF Core with migrations. + - PostgreSQL is the v1 default provider (requirements §6.2). SQL Server is a deferred option behind the same abstraction. + - No provider-specific SQL in domain or application layers. Where it is unavoidable, it lives only in infrastructure, isolated, gated by a provider capability check, and justified. + - Optimistic concurrency (`RowVersion`) on mutable aggregates; soft delete on `Scenario`, `RecordingSession`, `GenerationArtifact`, and artefact records (requirements §7.2.0). +4. Recording engine: + - Playwright-native instrumentation hooks (`AddInitScriptAsync` and exposed bindings) are the primary event transport. Console scraping is a bounded fallback only (FR-REC-009). + - Recorder script ships from source control with a pinned `RecorderVersion` surfaced in captures (requirements §22.4). No runtime fetch from third-party origins. +5. Replay engine: + - Isolated browser contexts per run (FR-REP-005). + - Deterministic heuristics first; AI is advisory at most. +6. Generation: + - Scenarios are the source of truth. Output is reproducible (FR-GEN-001, FR-GEN-010).
+ - Snapshot/approval tests for generator output (FR-GEN, §14.4). + +## 5. Coding Standards + +- C# language version supported by the SDK pinned in `Directory.Build.props` / `dotnet-tools.json`. Do not bump the SDK without an explicit, reviewed change. +- Nullable reference types enabled. +- Async all the way for I/O; cancellation tokens propagated through polling, subprocess, and HTTP. +- Analyzer warnings treated as errors in core projects. +- Keep classes focused and small. Split large orchestration into feature services. +- Prefer built-in ASP.NET Core and .NET primitives over third-party packages unless a concrete justification is recorded. + +## 6. Security Hygiene + +1. Secret-like strings must not appear in diffs. CI secret-scanning runs on every PR (requirements §22.3). +2. Any new captured data field or outbound network call is called out explicitly in the PR description. +3. Dependency vulnerability checks run before merge. +4. Structured logs include correlation IDs (recording session, scenario version, replay run, user) but never raw event payloads that could contain sensitive values. +5. Masking policy changes are retroactive: changing a classification to `Sensitive` must apply masking to previously captured values in all subsequent previews and regenerated output (requirements §12.5). + +## 7. Testing Expectations + +Minimum bar for any non-trivial change: + +1. Unit tests for business logic and state transitions. +2. Integration tests for infrastructure boundaries touched. Use real PostgreSQL (Testcontainers) for persistence integration — not an in-memory provider. +3. Conformance tests tagged with the requirement identifiers they cover (e.g. `FR-REC-004`). +4. Generator changes require snapshot tests (FR-GEN, §14.4). +5. Flaky tests are fixed or quarantined within one working day; persistent quarantine is not acceptable. +6. For Symphony work, keep or extend the `SPEC.md` section-17 conformance tests. + +## 8.
Configuration and Options + +- Read target URL allow-lists, retention policies, and generation defaults from the configuration surface described in requirements §11. +- Resolve `$ENV_VAR` values in configuration. +- Fail fast on invalid required configuration at startup or on administrative save. +- Keep configurable defaults aligned with requirements §22 (build) and §7.4 (retention). + +## 9. Observability + +- Structured logs with recording session id, scenario id, scenario version id, replay run id, user id, browser context id (requirements §13.1). +- Operational metrics per §13.2 — sufficient to verify the performance targets in §15.2. +- Snapshot/status surfaces derived from orchestrator or scenario engine state, not ad hoc caches. +- Artefact traceability: every screenshot, trace, DOM snapshot, and generated bundle links back to the session, version, or replay run that produced it. + +## 10. Delivery Workflow + +1. Cite the relevant requirement identifier(s) (e.g. `FR-REC-004`) in the PR description. +2. State which of the two concerns from §1 this change targets. +3. Implement the smallest vertical slice that demonstrates the requirement. +4. Add or update tests mapped to the cited requirements. +5. Run `dotnet build` and `dotnet test` locally before handing off. +6. Update `README.md` or `docs/` if user-visible behaviour changed. +7. Keep PRs scoped. Refactors that cross the Symphony/test-mining boundary are split. + +## 11. 
Suggested Commands + +```powershell +dotnet restore +dotnet build +dotnet test +``` + +Database migrations (platform persistence project, once introduced): + +```powershell +dotnet ef migrations add <MigrationName> --project src/TestMining.Platform.Persistence +dotnet ef database update --project src/TestMining.Platform.Persistence +``` + +Symphony migrations (retained tooling): + +```powershell +dotnet ef migrations add <MigrationName> --project src/Symphony.Infrastructure/Persistence.Sqlite +dotnet ef database update --project src/Symphony.Infrastructure/Persistence.Sqlite +``` + +## 12. Definition of Done (Per Change) + +1. Behaviour aligns with `docs/requirements.md` (or `SPEC.md` for Symphony work). +2. Tests prove the behaviour or failure mode, tagged with requirement identifiers. +3. Logging and error paths are explicit. No raw sensitive values in logs. +4. No secrets, keys, or tokens in diffs or output. +5. Reviewer can trace the change from requirement clause to implementation to test. +6. Retention, security, or captured-data surface changes are explicitly called out in the PR. +7. Generated code is reproducible from scenario data, generation profile, and template version; no hand-edits committed. + +## 13. Escalation + +If a requirement is ambiguous, silent on an edge case, or appears to contradict another section: + +1. Stop. Do not guess. +2. Raise the question in the PR description or as a clarifying note. +3. Record the resolution either by updating `docs/requirements.md` under the originating section or by adding an architecture decision record under `docs/adr/` once that directory exists. + +Shortcuts that trade safety, auditability, or reproducibility for speed are not acceptable. When an obstacle appears, identify the root cause rather than bypassing a guardrail.
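Guardrail 2 above (and requirements §12.4 below) requires a server-side allow-list check before any Playwright browser context is created, but neither defines what "matches an entry" means. The following is a minimal sketch, written in Python only because it is compact and testable (the platform itself is C#). It assumes a hypothetical per-environment `ALLOW_LIST` of base URLs and treats a match as the same scheme, host, and effective port, with the target path at or below the entry's path on a segment boundary. `is_target_allowed` and `start_session` are illustrative names, not existing APIs.

```python
from urllib.parse import urlsplit

# Hypothetical per-environment allow-list; real entries would come from the
# administrator-managed configuration surface (requirements §11, §12.4).
ALLOW_LIST = {
    "staging": [
        "https://app.staging.example.com/",
        "https://admin.staging.example.com/portal/",
    ],
}

_DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin_and_path(url):
    """Split a URL into a normalised (scheme, host, port) origin and a path."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # hostname is already lower-cased by urlsplit
    port = parts.port or _DEFAULT_PORTS.get(scheme)
    return (scheme, host, port), (parts.path or "/")


def is_target_allowed(target_url, environment, allow_list=ALLOW_LIST):
    """Return True only if target_url matches an entry for this environment.

    Assumed rule: identical origin, and the target path equals the entry path
    or lies beneath it on a "/" boundary (so "/portalx" does not match "/portal/").
    """
    target_origin, target_path = _origin_and_path(target_url)
    for entry in allow_list.get(environment, []):
        entry_origin, entry_path = _origin_and_path(entry)
        if target_origin != entry_origin:
            continue
        prefix = entry_path if entry_path.endswith("/") else entry_path + "/"
        if target_path == entry_path.rstrip("/") or target_path.startswith(prefix):
            return True
    return False


def start_session(target_url, environment):
    # The guard runs before any browser context would be launched.
    if not is_target_allowed(target_url, environment):
        raise PermissionError(f"Target URL not allow-listed for {environment!r}: {target_url}")
    return "context-created"  # placeholder for creating the Playwright browser context
```

Comparing parsed origins rather than raw string prefixes is the design point: a naive `startswith("https://app.staging.example.com")` would also accept `https://app.staging.example.com.evil.test/`.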
diff --git a/docs/requirements.md b/docs/requirements.md index 30aa532..8979e47 100644 --- a/docs/requirements.md +++ b/docs/requirements.md @@ -249,14 +249,16 @@ The implementation shall use: - PostgreSQL or SQL Server as the durable database - Roslyn and/or Scriban for code generation -### 6.2 Database Provider Portability +### 6.2 Database Provider Strategy -The application shall support PostgreSQL and SQL Server as first-class database providers. To make this feasible: +To make v1 deliverable without doubling persistence cost, the platform shall adopt the following strategy: +- PostgreSQL shall be the **primary and only required provider for v1**. Local development, shared non-production, and production-like deployments shall all use PostgreSQL unless explicitly reconfigured. +- SQL Server shall be a **deferred-but-supported secondary provider**. The persistence layer shall be designed so that SQL Server can be added later without redesigning the domain or repositories, but v1 acceptance does not require a working SQL Server deployment. - The persistence layer shall use EF Core abstractions and migrations. -- Provider-specific SQL shall be avoided unless isolated behind clearly bounded infrastructure components. -- Data types, indexing strategies, JSON storage, and concurrency rules shall be designed to work on both supported providers. -- One provider may be chosen as the default local-development profile, but the schema and data-access patterns must remain portable. +- Provider-specific SQL shall be avoided unless isolated behind clearly bounded infrastructure components and gated by a provider capability check. +- Data types, indexing strategies, JSON storage, and concurrency rules shall be designed to remain portable to SQL Server. Use of PostgreSQL-only features (for example `jsonb` operators, array columns, partial indexes) is permitted in v1 provided the feature can be substituted or approximated on SQL Server later. 
+- Integration tests shall run against real PostgreSQL (for example Testcontainers) rather than an in-memory provider, because in-memory providers do not exercise migration or JSON behaviour. ### 6.3 Browser Support @@ -272,6 +274,18 @@ The canonical persisted business artefact shall be `Scenario`. Generated tests, The persistence model shall include, at minimum, these entities. +#### 7.2.0 Common Entity Requirements + +Unless explicitly exempted, every persistent entity defined in this section shall carry: + +- A stable primary key (`Id`) that is globally unique (GUID/UUID) and assigned by the application, not the database. +- Audit fields: `CreatedAtUtc`, `CreatedByUserId`, `UpdatedAtUtc`, `UpdatedByUserId` (the last two may be nullable where the entity is immutable by design, such as `ScenarioVersion` and `RecordedStep`). +- A `RowVersion` (optimistic concurrency token) column on entities that are editable by more than one workflow, including `Scenario`, `RecordingSession`, `HealingSuggestion`, and administrative configuration entities. +- A soft-delete marker (`IsDeleted`, `DeletedAtUtc`, `DeletedByUserId`) on `Scenario`, `RecordingSession`, `GenerationArtifact`, and artefact records. Immutable versions (e.g. `ScenarioVersion`, `RecordedStep`) shall not support soft delete; deletion of a scenario shall be a cascade-tombstoning operation documented in the data model. +- Timestamps stored in UTC; the domain shall not depend on database-local clocks for ordering decisions beyond audit. + +Field lists in the subsections below name the entity-specific fields and may omit the common fields above for brevity. + #### 7.2.1 RecordingSession Represents an interactive capture session started by a user. @@ -470,7 +484,23 @@ Required fields: - `AppliedAtUtc` - `AppliedByUserId` -#### 7.2.10 GenerationArtifact +#### 7.2.10 UserAccount + +Represents an authenticated principal of the platform. 
Identity data may be federated from an external provider, but the platform shall maintain a local record to support authorship attribution, audit trails, and role assignment. + +Required fields: + +- `Id` +- `ExternalSubjectId` (nullable; populated when federated) +- `DisplayName` +- `Email` +- `Role` (the role set shall include at least `Administrator`, `Author`, and `Viewer`) +- `IsDisabled` +- `LastLoginAtUtc` + +All audit references in other entities (for example `StartedByUserId`, `CreatedByUserId`, `TriggeredByUserId`) shall resolve to `UserAccount.Id`. + +#### 7.2.11 GenerationArtifact Represents a generated bundle or file set. @@ -534,6 +564,16 @@ Each artefact shall have: - Storage path or provider-specific handle - Created timestamp - Optional checksum +- Size in bytes + +The platform shall enforce default retention and size ceilings so artefact storage does not grow unbounded in the absence of administrative action: + +- Screenshots and DOM snapshots: retained for at least 30 days after the owning session or replay run completes; individual artefact size cap of 10 MB unless an administrator raises it. +- Playwright traces: retained for at least 14 days; individual trace cap of 200 MB. +- Session logs and replay diagnostic bundles: retained for at least 30 days. +- Generated source bundles: retained until the owning scenario is deleted. + +Administrators may extend any retention window or raise any size cap through the configuration surface defined in Section 11. A background cleanup process shall delete artefacts that have exceeded their retention window and record the deletion in the audit trail. ## 7.5 State Model Expectations @@ -937,7 +977,15 @@ Every healing proposal shall include evidence showing: #### FR-HEAL-003 Human Approval -Healing changes that alter the persisted scenario shall require explicit approval unless an administrator enables a controlled auto-apply policy for deterministic high-confidence matches.
+In v1, every healing change that alters the persisted scenario shall require explicit human approval. Automatic application of healing proposals to persisted scenarios shall not ship in v1. + +A controlled auto-apply policy for deterministic high-confidence matches may be introduced in a later phase, subject to all of the following: + +- The feature shall be disabled by default. +- Auto-apply shall be gated behind an administrator-only setting scoped per environment (never globally by default). +- Auto-apply shall be restricted to `SuggestionType` values classified as locator-only, deterministic, and non-destructive. Changes to assertion semantics, step ordering, variable classification, or sensitive-data policy shall never be auto-applied. +- A dry-run mode shall be available that records what would have been applied without mutating the scenario. +- Every auto-applied change shall still create a new immutable `ScenarioVersion` with a traceable `HealingSuggestion` link. #### FR-HEAL-004 AI as Adviser @@ -1129,11 +1177,24 @@ The system shall allow masking rules by field name, selector metadata, or manual - Stored event payloads - Generation previews -### 12.4 Browser Session Isolation +### 12.4 Target URL Allow-List Enforcement + +The platform shall enforce an administrator-managed allow-list of target base URLs. A recording or replay session shall fail to start if its requested target URL does not match an entry on the allow-list for the selected environment. The allow-list check shall be performed server-side before any Playwright browser context is launched. The allow-list shall be auditable, and changes to it shall be captured in the audit trail defined in Section 12.7. + +### 12.5 Encryption of Captured Session Material + +The platform captures material that is sensitive by construction: authentication cookies, Playwright storage states, raw event payloads containing user input, and user-authored scenario variables marked sensitive.
The following protections shall apply: + +- Playwright storage states, cookies, and any cached authentication material shall be encrypted at rest using platform-managed keys sourced from a secure secret provider. +- Scenario variables classified as `Sensitive` shall be encrypted at rest. Their plaintext shall never be written to generated source, generation previews, logs, or UI previews. +- Raw event payloads shall have sensitive fields masked or encrypted at rest according to the `MaskedFieldPolicyJson` in force at recording time. A field whose classification is changed to sensitive after recording shall be retroactively masked in subsequent previews and re-generation. +- Key material shall not be checked into source control. Key rotation shall be supported at least by re-encrypting affected records during a controlled administrative operation. + +### 12.6 Browser Session Isolation Each recording or replay session shall execute in an isolated Playwright browser context unless explicitly configured otherwise by an administrator. -### 12.5 Audit Logging +### 12.7 Audit Logging The system shall maintain an audit trail for: @@ -1218,11 +1279,14 @@ The platform shall favour deterministic execution. Generated tests and replay fl ### 15.2 Performance -The system shall feel responsive for interactive authoring. Initial targets: +The system shall feel responsive for interactive authoring. Initial targets, measured on a reference developer workstation (quad-core modern CPU, SSD, local PostgreSQL), expressed as percentiles over a rolling sample of at least 20 operations of the same type: + +- Time from "Start recording" click to Playwright page navigation ready: p50 ≤ 5 s, p95 ≤ 10 s. +- End-to-end latency from a captured browser event to its corresponding step appearing in the timeline: p50 ≤ 750 ms, p95 ≤ 2 s. +- Code generation for a scenario of up to 100 meaningful steps: p50 ≤ 8 s, p95 ≤ 15 s. 
+- Replay start-up overhead (time from replay trigger to first step execution) shall not exceed 10 s at p95. -- Start a recording session within 10 seconds under normal local conditions -- Render timeline updates to the UI within 2 seconds of capture under normal load -- Generate code for a typical scenario of up to 100 meaningful steps within 15 seconds +These are product-quality targets, not contractual SLAs. Observability required under Section 13 shall be sufficient to verify them post-deployment. ### 15.3 Scalability @@ -1342,20 +1406,47 @@ Phase 4 may add: - Healing proposal assistance - Page object refactoring suggestions -## 19. Acceptance Criteria +### 18.5 Phase Exit Criteria + +A phase shall not be declared complete until the criteria below are demonstrably met. These are conformance gates, not aspirational targets. + +Phase 1 exit: +- A user can authenticate, record a simple CRUD workflow end-to-end against a fixture web application, and see the persisted scenario. +- Locator candidates are ranked and persisted for every targetable step. +- Generated C# Playwright output compiles against the emitted helper library. +- Replay executes the generated scenario against the same fixture and reports pass/fail per step. +- URL allow-list enforcement (Section 12.4) and encryption of storage state (Section 12.5) are enforced. -The implementation shall be considered to satisfy this specification only when all of the following are demonstrably true: +Phase 2 exit: +- Assertion inference produces at least one outcome-oriented assertion suggestion for every save/submit/navigate step in the Phase 1 fixture library. +- Variable classification is editable in the UI and round-trips through regeneration without loss. +- Replay diagnostics bundles include screenshots at each failure, locator resolution attempts, and a failure category. -1. A user can record a real browser workflow through the UI and the system persists a structured scenario. -2. 
The scenario can be reviewed, edited, and approved without direct database manipulation. -3. The system generates readable C# Playwright artefacts from the approved scenario. -4. The generated or scenario-derived replay executes through Playwright and produces useful diagnostics. -5. Locator candidates are ranked and persisted rather than reduced to a single opaque selector. -6. Sensitive values can be masked and remain masked across previews and logs. -7. Assertions are outcome-oriented and can be approved or rejected before generation. -8. A locator drift failure can produce a deterministic healing proposal with evidence. -9. The application runs on ASP.NET Core with a Blazor Server frontend and a PostgreSQL or SQL Server-backed persistence layer. -10. Generated output remains reproducible from scenario data and generation settings. +Phase 3 exit: +- Login bootstrap options (FR-ADM-002) are demonstrable end-to-end. +- Healing review workflow can approve a deterministic locator healing proposal and create a new immutable scenario version referencing the originating replay run. +- Artefact retention cleanup (Section 7.4) runs on schedule and is observable. + +Phase 4 exit (if undertaken): +- Every AI-assisted suggestion is advisory, is never auto-applied, and is labelled as AI-sourced in the UI and audit trail. +- Deterministic diagnostics continue to run and are presented alongside any AI suggestion. + +## 19. Acceptance Criteria + +The implementation shall be considered to satisfy this specification only when all of the following are demonstrably true and covered by at least one automated test traceable to the cited requirement identifiers. + +1. A user can record a real browser workflow through the UI and the system persists a structured scenario. [FR-REC-001, FR-REC-004, FR-REC-010] +2. The scenario can be reviewed, edited, and approved without direct database manipulation. [FR-AUTH-001 through FR-AUTH-006] +3.
The system generates readable C# Playwright artefacts from the approved scenario. [FR-GEN-001 through FR-GEN-010] +4. The generated or scenario-derived replay executes through Playwright and produces useful diagnostics. [FR-REP-001, FR-REP-002, FR-REP-003] +5. Locator candidates are ranked and persisted rather than reduced to a single opaque selector. [FR-INF-004, FR-INF-005] +6. Sensitive values can be masked and remain masked across previews and logs, and are encrypted at rest where stored. [FR-REC-007, 12.3, 12.5] +7. Assertions are outcome-oriented and can be approved or rejected before generation. [FR-INF-007, FR-AUTH-003] +8. A locator drift failure can produce a deterministic healing proposal with evidence and require human approval before altering a scenario in v1. [FR-HEAL-001 through FR-HEAL-005] +9. The application runs on ASP.NET Core with a Blazor Server frontend and a PostgreSQL-backed persistence layer, with SQL Server pluggability preserved as a deferred option. [6.1, 6.2] +10. Generated output remains reproducible bit-for-bit (modulo timestamps declared as non-deterministic) from scenario data plus generation profile plus template version. [FR-GEN-001, FR-GEN-010] +11. Recording and replay refuse to start against target URLs not present on the administrator-managed allow-list. [12.4] +12. Observability emits the identifiers listed in 13.1 and permits verification of the performance targets in 15.2. ## 20. Risks and Constraints @@ -1390,6 +1481,70 @@ The following decisions remain implementation-level choices unless later locked - Which authentication provider is used for the host application - Which database provider is used as the primary development default -## 22. Final Requirement Statement +## 22. Build, Test, and Delivery Requirements + +### 22.1 Repository Boundaries + +The test mining platform shall be introduced into this repository as a new vertical slice using the `TestMining.Platform.*` naming convention described in Section 17. 
Existing Symphony projects (`src/Symphony.*`, `tests/Symphony.*`, `symphony_docs/`, and `SPEC.md`) are retained tooling assets and shall not be mutated by work targeting this specification unless a task explicitly requires it. + +### 22.2 Build System + +- The solution shall build with the .NET SDK version pinned for the repository (conventionally via `global.json`), using the shared build settings in `Directory.Build.props` and the local tool manifest `dotnet-tools.json`. Any SDK upgrade shall be an explicit, documented change. +- `dotnet restore`, `dotnet build`, and `dotnet test` shall succeed from a clean checkout, with no outstanding warnings in core projects, where warnings are treated as errors. +- Projects shall enable nullable reference types and treat analyzer warnings as errors where practical. + +### 22.3 Continuous Integration + +The repository shall include a CI pipeline (GitHub Actions is the expected host) that, on every pull request touching the platform: + +1. Restores and builds the full solution. +2. Runs unit and integration tests, including PostgreSQL-backed integration tests using a containerised database. +3. Runs generator snapshot tests (Section 14.4). +4. Publishes test results and code coverage summaries. +5. Fails the pipeline if any new secret-looking string appears in diffs or if retention/policy violations are detected by linting rules once established. + +### 22.4 Recorder Script Delivery + +The browser-side recorder script is a first-class build artefact: + +- It shall live in source control under a clearly named project directory and shall be built deterministically as part of the platform build. +- Its version shall be pinned and surfaced in captured recordings as `RecorderVersion` metadata so replay and healing can detect incompatible captures. +- It shall be injected via Playwright-supported initialisation hooks. Runtime fetching of the recorder from an external origin is prohibited. + +### 22.5 Test Layering and Flake Budget + +- Unit tests shall be the default: deterministic, no network, no browser.
+- Integration tests shall exercise real infrastructure (PostgreSQL, Playwright against fixture web applications shipped with the repo). +- End-to-end tests shall be kept small in number and tagged so they can be excluded from fast developer loops. +- Any test that becomes flaky shall be either fixed or quarantined within one working day; persistent quarantine is not acceptable. + +## 23. Implementer Guardrails + +These guardrails exist to prevent the most likely failure modes of agentic or human implementers working on this specification. + +### 23.1 Prohibited Actions + +1. Writing captured credentials, cookies, storage states, tokens, or any value classified as `Sensitive` to logs, screenshot metadata, generation previews, or generated code. +2. Launching a Playwright browser context against a target URL that does not match the configured allow-list. +3. Introducing provider-specific SQL or raw ADO.NET calls into domain or application layers. Provider-specific code shall live in clearly isolated infrastructure components. +4. Editing generated source files by hand and checking the result in. Generated output shall be reproduced from scenario data and templates only. +5. Promoting an AI-assisted suggestion to a runtime-critical path. AI may advise; deterministic logic decides. +6. Deleting or restructuring Symphony-related files while working on test mining platform tasks. +7. Silently relaxing any non-functional requirement in Section 15, security requirement in Section 12, or acceptance criterion in Section 19. + +### 23.2 Required Behaviours + +1. Cite the requirement identifier (e.g. `FR-REC-004`, `FR-HEAL-003`) that motivates a code change in the pull request description. +2. For any behaviour change, update or add a conformance test traceable to the cited requirement. +3. Use feature-flagged rollout for anything that touches healing auto-apply, AI assistance, or cross-environment configuration. +4. 
When a requirement is ambiguous, raise a clarification rather than guessing. Record the resolution in this document or in an architecture decision record. + +### 23.3 Pull Request Discipline + +1. Pull requests shall be scoped to a single vertical slice. Drive-by refactors, especially across the Symphony/test-mining boundary, shall be split into separate changes. +2. Every PR shall identify new or changed artefact retention characteristics, new captured data fields, or new outbound network calls. +3. Secret-scanning and dependency vulnerability checks shall run before merge. + +## 24. Final Requirement Statement This repository’s new application shall be a C#-based semantic test mining platform built on ASP.NET Core, Blazor Server, Microsoft.Playwright for .NET, and PostgreSQL or SQL Server. It shall record browser interactions, infer structured scenarios, rank resilient locators, generate maintainable Playwright C# tests, replay them with strong diagnostics, and support deterministic healing while keeping structured scenarios as the enduring source of truth. From b9cac1e6260608223c6fb6af5d0256cd5eb12660 Mon Sep 17 00:00:00 2001 From: Nick Beaugeard Date: Thu, 23 Apr 2026 10:38:54 +1000 Subject: [PATCH 2/4] Update AGENTS.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- AGENTS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/AGENTS.md b/AGENTS.md index 520502e..8dab768 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -61,7 +61,7 @@ These are hard stops. Do not weaken them without an explicit, documented decisio ## 5. Coding Standards -- C# language version supported by the SDK pinned in `Directory.Build.props` / `dotnet-tools.json`. Do not bump the SDK without an explicit, reviewed change. +- C# language version must remain compatible with the repository's configured .NET build settings. Do not change SDK expectations without an explicit, reviewed change. - Nullable reference types enabled. 
- Async all the way for I/O; cancellation tokens propagated through polling, subprocess, and HTTP. - Analyzer warnings treated as errors in core projects. From 3a2f11a91075871a62b5f7c7fa69bd8614ce9afd Mon Sep 17 00:00:00 2001 From: Nick Beaugeard Date: Thu, 23 Apr 2026 10:39:07 +1000 Subject: [PATCH 3/4] Update docs/requirements.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- docs/requirements.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/requirements.md b/docs/requirements.md index 8979e47..d71df61 100644 --- a/docs/requirements.md +++ b/docs/requirements.md @@ -1179,7 +1179,7 @@ The system shall allow masking rules by field name, selector metadata, or manual ### 12.4 Target URL Allow-List Enforcement -The platform shall enforce an administrator-managed allow-list of target base URLs. A recording or replay session shall fail to start if its requested target URL does not match an entry on the allow-list for the selected environment. The allow-list check shall be performed server-side before any Playwright browser context is launched. The allow-list shall be auditable and changes shall be captured in the audit trail defined in Section 12.6. +The platform shall enforce an administrator-managed allow-list of target base URLs. A recording or replay session shall fail to start if its requested target URL does not match an entry on the allow-list for the selected environment. The allow-list check shall be performed server-side before any Playwright browser context is launched. The allow-list shall be auditable and changes shall be captured in the audit trail defined in Section 12.7. 
### 12.5 Encryption of Captured Session Material From c9ccaf7e14506a40991d818de21cca7300ba6f30 Mon Sep 17 00:00:00 2001 From: Nick Beaugeard Date: Thu, 23 Apr 2026 10:39:16 +1000 Subject: [PATCH 4/4] Update docs/requirements.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- docs/requirements.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/requirements.md b/docs/requirements.md index d71df61..92fbddb 100644 --- a/docs/requirements.md +++ b/docs/requirements.md @@ -1489,7 +1489,7 @@ The test mining platform shall be introduced into this repository as a new verti ### 22.2 Build System -- The solution shall build with the .NET SDK pinned by the repository's existing tooling manifests (`Directory.Build.props`, `dotnet-tools.json`). Any SDK upgrade shall be an explicit, documented change. +- The solution shall build with the repository's documented required .NET SDK version. Any SDK upgrade shall be an explicit, documented change. - `dotnet restore`, `dotnet build`, and `dotnet test` shall succeed from a clean checkout with no unresolved warnings treated as errors in core projects. - Projects shall enable nullable reference types and treat analyzer warnings as errors where practical.