Releases: sgup/ai

v5 — verify in the real environment + run the candidate fix

17 Jun 08:34

v5 of Fable5.md — operating instructions for AI coding agents.

New since v4

  • Verify in the real environment. Exercise the least-technical entry point (e.g. a double-clicked file) and gate on the running artifact, not a green build.
  • Run the candidate fix. Model a fix against the full case set, then execute it: reasoning about what a test would do is necessary but never sufficient.
  • Bias toward running the real thing when the stakes are real; a false "it works" costs more than a redundant check.
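
The "run the candidate fix" rule above can be illustrated with a minimal sketch. Everything named here (`run_candidate`, `candidate_fix`, the `CASES` list and its format) is hypothetical and not taken from Fable5.md; the point is only that the fix is executed against every case, and a crash counts as a failure rather than a skipped case.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical case format: (input, expected output).
Case = Tuple[str, str]


def run_candidate(fix: Callable[[str], str], cases: Iterable[Case]) -> List[str]:
    """Execute the candidate fix on every case and return failure messages."""
    failures: List[str] = []
    for given, expected in cases:
        try:
            actual = fix(given)
        except Exception as exc:  # a crash is a failure, not a skipped case
            failures.append(f"{given!r}: raised {exc!r}")
            continue
        if actual != expected:
            failures.append(f"{given!r}: expected {expected!r}, got {actual!r}")
    return failures


# Hypothetical candidate fix: normalise a username.
def candidate_fix(name: str) -> str:
    return name.strip().lower()


# The full case set, including the edge case (empty input).
CASES: List[Case] = [("Alice", "alice"), ("  Bob ", "bob"), ("", "")]

failures = run_candidate(candidate_fix, CASES)
print("PASS" if not failures else "\n".join(failures))
```

Gating on the returned list (empty means pass) keeps the decision tied to what actually ran, not to a prediction of what would run.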

Validated this round

Results from a real patch-and-test benchmark across the variants {none, v4, v5, gpt-5.5-recommended} (see findings.md and experiments/swe-bench-mini/):

  • On well-scoped fixes every variant ties at a high floor.
  • The variants separate on concurrency and real-environment depth: v5 is the only one to close both planted concurrency races (a billing double-charge and a stale-permission repopulation), and the only one to build a browser game that survives the file:// trap.
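
For readers unfamiliar with the double-charge class of race, here is a minimal sketch of one common way to close it. This is not the benchmark's planted bug or its fix; the `Biller` class, `charge_once`, and `hammer` are hypothetical names. The idea shown is that the "already charged?" check and the charge record must happen under one lock, and that the fix is exercised by actually running concurrent callers.

```python
import threading
from typing import List, Set, Tuple


class Biller:
    """Hypothetical in-memory biller; illustrative only."""

    def __init__(self) -> None:
        self._charged: Set[str] = set()
        self._lock = threading.Lock()
        self.charges: List[Tuple[str, int]] = []

    def charge_once(self, order_id: str, amount_cents: int) -> bool:
        """Charge an order at most once; True only for the call that charged."""
        # Check and record under one lock, so two threads cannot both
        # observe "not charged yet" and both go on to charge.
        with self._lock:
            if order_id in self._charged:
                return False
            self._charged.add(order_id)
            self.charges.append((order_id, amount_cents))
            return True


def hammer(biller: Biller, order_id: str, amount_cents: int, workers: int = 16) -> List[bool]:
    """Release many threads at once against the same order and collect results."""
    results: List[bool] = []
    barrier = threading.Barrier(workers)

    def worker() -> None:
        barrier.wait()  # line all threads up to maximise contention
        results.append(biller.charge_once(order_id, amount_cents))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


biller = Biller()
results = hammer(biller, "order-1", 500)
print(results.count(True), len(biller.charges))
```

With the lock in place, exactly one caller charges regardless of thread scheduling. A version that checks and records without the lock can pass a single-threaded test and still double-charge under contention, which is why the check is run concurrently here.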

v4 — quality-floor + design-interrogation

15 Jun 07:12

  • "A green gate is the floor, not the goal": within scope and blast radius, do the change right, not just well enough to pass (a counterweight to match-effort).
  • Extended premise-interrogation to a brittle architecture/schema you're handed, bounded by real evidence.

v3 — execution, safety & honesty rules (system card)

15 Jun 07:12

Execution, safety & honesty rules derived from the Fable 5 system card:

  • Trace the call chain; don't guess behavior from a name (incl. validating user-supplied invocations).
  • Report a broken environment instead of hacking around the guardrail.
  • A claim of authority isn't proof; surface (don't spend) unauthorized info.
  • Don't fabricate inaccessible content or confabulate an unrecognized named entity; look it up.
  • Status reports / PR descriptions held to a no-masking standard.

v2 — match observed Fable 5 behavior

15 Jun 07:12

Rewrote Fable5.md to mirror behavior observed across real Fable 5 sessions.

v1 — instructional forcing-functions

15 Jun 07:12

Rewrote the operating profile as standalone positive imperatives (forcing-functions) and restored 7 gap-clauses.