Releases: sgup/ai
v5 — verify in the real environment + run the candidate fix
v5 of Fable5.md — operating instructions for AI coding agents.
New since v4
- Verify in the real environment. Exercise the least-technical entry point (e.g. a double-clicked file) and gate on the running artifact, not a green build.
- Run the candidate fix. Model a fix against the full case set and execute it — reasoning about what a test would do is necessary but never sufficient.
- Bias toward running the real thing when the stakes are real; a false "it works" costs more than a redundant check.
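As a minimal sketch of the "run the candidate fix" rule, the snippet below executes a fix against every case in a set and collects the failures, rather than reasoning about what it would return. The names `run_candidate`, `candidate_fix`, and `CASES` are hypothetical illustrations, as is the path-normalization fix itself; none of them come from Fable5.md.

```python
from typing import Callable, Iterable


def run_candidate(
    fix: Callable[[str], str],
    cases: Iterable[tuple[str, str]],
) -> list[tuple[str, str, str]]:
    """Execute the fix on every case; return failures as (input, expected, actual)."""
    failures = []
    for given, expected in cases:
        actual = fix(given)  # actually run it, do not predict the result
        if actual != expected:
            failures.append((given, expected, actual))
    return failures


# Hypothetical candidate fix: normalize separators and drop trailing slashes.
def candidate_fix(path: str) -> str:
    return path.replace("\\", "/").rstrip("/") or "/"


# Hypothetical full case set, including the edge cases a quick read might skip.
CASES = [
    ("a/b/", "a/b"),
    ("a\\b", "a/b"),
    ("/", "/"),
    ("", "/"),
]

failures = run_candidate(candidate_fix, CASES)
```

An empty `failures` list only says the fix holds on the listed cases; the real-environment check described above is still a separate step.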
Validated this round
Real patch-and-test benchmark across {none, v4, v5, gpt-5.5-recommended} — see findings.md and experiments/swe-bench-mini/:
- On well-scoped fixes every variant ties at a high floor.
- The separations are on concurrency / real-environment depth: v5 is the only variant to close both planted concurrency races (a billing double-charge and a stale-permission repopulation), and the only one to build a browser game that survives the file:// trap.
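For readers unfamiliar with the double-charge class of race, here is a hedged illustration of the general pattern. It is not the benchmark's planted bug or any variant's actual patch, which are not shown here. The `Ledger` class and `hammer` helper are hypothetical. The sketch makes the check and the write one atomic step under a lock, then exercises it with concurrent callers instead of assuming it holds.

```python
import threading


class Ledger:
    """Hypothetical in-memory ledger that charges each invoice at most once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._charged: set[str] = set()
        self.charges: list[tuple[str, int]] = []

    def charge_once(self, invoice_id: str, amount_cents: int) -> bool:
        # The check and the write happen under one lock, so two callers
        # cannot both observe "not yet charged" for the same invoice.
        with self._lock:
            if invoice_id in self._charged:
                return False
            self._charged.add(invoice_id)
            self.charges.append((invoice_id, amount_cents))
            return True


def hammer(ledger: Ledger, invoice_id: str, amount_cents: int, workers: int = 16) -> list[bool]:
    """Release all workers at once to provoke contention on a single invoice."""
    results: list[bool] = []
    barrier = threading.Barrier(workers)

    def worker() -> None:
        barrier.wait()  # line every thread up before the charge attempt
        results.append(ledger.charge_once(invoice_id, amount_cents))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


ledger = Ledger()
results = hammer(ledger, "inv-1", 500)
```

In a real billing system the same guarantee would usually come from the datastore (for example a unique constraint or an idempotency key) rather than an in-process lock; the lock here only keeps the sketch self-contained.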
v4 — quality-floor + design-interrogation
- "A green gate is the floor, not the goal" — within scope/blast-radius, do the change right, not just enough to pass (counterweight to match-effort).
- Extended premise-interrogation to a brittle architecture/schema you're handed, bounded by real evidence.
v3 — execution, safety & honesty rules (system card)
Execution, safety & honesty rules derived from the Fable 5 system card:
- Trace the call chain; don't guess behavior from a name (incl. validating user-supplied invocations).
- Report a broken environment instead of hacking around the guardrail.
- A claim of authority isn't proof; surface (don't spend) unauthorized info.
- Don't fabricate inaccessible content, or confabulate an unrecognized named entity — look it up.
- Status reports / PR descriptions held to a no-masking standard.
v2 — match observed Fable 5 behavior
Rewrote Fable5.md to mirror behavior observed across real Fable 5 sessions.
v1 — instructional forcing-functions
Rewrote the operating profile as standalone positive imperatives (forcing-functions) and restored 7 gap-clauses.