FlowGuard

A Python toolkit for checking risky AI-agent workflow changes before they become code or release claims.

Public release	Schema	Runtime	License
`v0.52.3`	`1.0`	Python standard library only	MIT

English comes first. A Chinese mirror follows below.

What FlowGuard Is

FlowGuard is a small Python toolkit and AI-agent workflow for checking the risky part of a software change before an agent writes more code.

It asks the agent to turn the danger zone into a finite state model, run that model, and inspect counterexample traces. That makes problems such as duplicate side effects, stale test evidence, broken UI recovery paths, or unsupported "done" claims visible before they become maintenance debt.

It does not call an LLM API. It is not a prompt trick. It is not a replacement for tests. It is a model-first preflight layer for work where order, state, retries, side effects, UI paths, validation evidence, or release confidence matter.

The Problem

AI coding agents are good at local edits. That is useful, but it creates a common failure mode: the nearby code looks fixed while the whole workflow is already wrong.

For example:

You ask an agent to fix retry handling.
The agent changes the function near the bug.
The visible test passes.
The same job is processed again later.
A side effect happens twice because the workflow never modeled the repeated input.

FlowGuard is built for that kind of problem. Instead of telling the agent to "be careful", it asks the agent to name the state, inputs, outputs, side effects, owners, and evidence gates that decide whether the next step is safe.

How It Works

The core shape is:

Input x State -> Set(Output x State)

In plain language:

Input is the event coming in, such as a job, retry, UI click, file payload, or release action.
State is what the system remembers before the event.
Output is what the step says happened.
The new State is what the system remembers after the step.
Set(...) means one input may have several legal branches, and the model must say what they are.

The practical loop is:

risky AI action
-> small executable model
-> invariants, scenarios, and freshness checks
-> counterexample trace
-> revise the plan, code, tests, UI, or claim

The important output is often the counterexample: a concrete sequence of states that shows why the current plan should not continue unchanged.

What It Helps Catch

Situation	What can go wrong	What FlowGuard makes visible
Retry or repeated job processing	the same input creates a second side effect	a repeated-input trace and an idempotency invariant
Cache or refresh logic	old state is reused after it should be invalid	state fields and freshness rules that need to change
UI workflows	buttons exist, but the user cannot recover, cancel, or reach a terminal state	launch-to-terminal journeys, visible controls, disabled reasons, and recovery paths
Refactors	a new module split loses the real state or side-effect owner	facade boundaries, state owners, side-effect owners, and parity evidence
Tests and releases	an old passing test is treated as proof after code, docs, models, or fixtures changed	evidence freshness and minimum revalidation requirements
Parent and child models	one local green check is treated as whole-system confidence	child evidence, parent reattachment, sibling impact, and scoped confidence
Public claims	a README, release note, or "done" message says more than current evidence supports	the claim boundary and the missing proof

FlowGuard can help design the workflow before code exists, and it can help check whether later evidence still supports a claim. The claim is always bounded: a FlowGuard pass means the declared model obligations passed. It does not mean the entire production system is correct.

Quick Start

Clone the repository and install it in editable mode:

git clone https://github.com/liuyingxuvka/FlowGuard.git
cd FlowGuard
python -m pip install -e .
python -m flowguard schema-version

You should see schema version 1.0.

Run a small example that compares a correct model with broken variants:

python examples/job_matching/run_checks.py

The example should report:

the correct model is OK;
the broken duplicate-record model has invariant violations;
the broken repeated-scoring model has invariant violations;
the report includes counterexample traces showing the repeated input path.

That example is intentionally abstract. It does not search real jobs or call an AI model. It exists to show the FlowGuard pattern: repeated inputs, state writes, invariants, and counterexamples.

Use It In Another Project

For a target project, add FlowGuard adoption records so future agents can find the repository, version, and local rules:

python -m flowguard project-adopt --root <target-project>
python -m flowguard project-audit --root <target-project>
python -m flowguard project-upgrade --root <target-project>

Then start small:

choose one risky boundary
-> name the error class you want to prevent
-> describe Input, State, Output, side effects, owners, and completion evidence
-> add one invariant or scenario
-> add one known-bad case
-> run the check
-> inspect the counterexample
-> revise the plan, code, tests, UI, or claim

Escalate only when the risk needs it. A retry bug may need a small model. A release claim, UI flow, refactor split, or parent/child model chain may need a stronger route.

Minimal Model Sketch

The full runnable version is in examples/job_matching. The idea is small:

@dataclass(frozen=True)
class State:
    processed: tuple[str, ...] = ()
    side_effects: int = 0


@dataclass(frozen=True)
class Input:
    job_id: str


class ProcessJob:
    accepted_input_type = Input
    reads = ("processed", "side_effects")
    writes = ("processed", "side_effects")

    def apply(self, input_obj: Input, state: State):
        if input_obj.job_id in state.processed:
            return [FunctionResult("already_processed", state, label="deduplicated_retry")]
        return [
            FunctionResult(
                "processed",
                replace(
                    state,
                    processed=state.processed + (input_obj.job_id,),
                    side_effects=state.side_effects + 1,
                ),
                label="first_processing",
            )
        ]

The model is useful only when it also includes a bad case and a rule worth checking, such as "the same job may not create duplicate side effects."

When To Use It

Use FlowGuard when the next action depends on workflow state, not just on local code text.

Good fits:

AI-agent coding work with multiple stages, handoffs, or validation gates;
retries, deduplication, cache refresh, queues, ingestion, and repeated jobs;
UI flows where visible controls do not prove recovery paths;
refactors where public entrypoints and side effects must stay compatible;
test or release processes where old evidence can be mistaken for current proof;
parent/child model chains where local evidence must be reattached before broad confidence.

Bad fits:

one-line typo fixes;
formatting-only changes;
tasks with no meaningful state, side effect, order, or evidence boundary;
claims that need statistical truth, business truth, or production telemetry rather than structural workflow checks.

Advanced Agent Workflows

You can skip this section if you are only trying the first example.

FlowGuard has one model-first kernel and several route-specific helper layers. The route names are for AI agents and maintainers who need to choose the smallest owner for the current risk.

Route	Use it when
`model-first-function-flow`	ordinary behavior or state modeling is enough
`flowguard-existing-model-preflight`	an existing modeled system should be checked before adding a new boundary
`flowguard-development-process-flow`	staged work, multi-skill setup, install, archive, publish, release, or done confidence depends on evidence freshness
`flowguard-ui-flow-structure`	UI controls, visible surface, journeys, overlays, recovery, and implementation evidence need modeling
`flowguard-code-structure-recommendation`	a functional model should drive module, facade, owner, side-effect, config, or validation boundaries
`flowguard-structure-mesh`	a large script, package, command, or public API split needs compatibility and parity evidence
`flowguard-test-mesh`	validation is slow, layered, stale, skipped, release-only, or split across child suites
`flowguard-model-test-alignment`	model obligations, code contracts, and test evidence need direct comparison
`flowguard-model-mesh`	parent/child model evidence, sibling impact, or oversized model surfaces need mesh governance
`flowguard-model-topology-hazard-review`	a locally green model may still imply future-use hazards
`flowguard-architecture-reduction`	duplicated handlers, adapters, modules, branches, or validation layers may be contracted without changing behavior
`flowguard-model-miss-review`	runtime, tests, replay, logs, or manual checks failed after a FlowGuard model passed

Useful template commands:

python -m flowguard project-template
python -m flowguard risk-intent-template
python -m flowguard risk-template-library-template
python -m flowguard development-process-flow-template
python -m flowguard ui-flow-structure-template
python -m flowguard code-structure-recommendation-template
python -m flowguard model-test-alignment-template
python -m flowguard test-mesh-template
python -m flowguard structure-mesh-template
python -m flowguard closure-contract-template
python -m flowguard topology-hazard-template
python -m flowguard risk-template-search "completion evidence"

Run python -m flowguard --help for the full CLI list.

Relationship To The Guard Family

Project	Focus
FlowGuard	stateful behavior, process flow, evidence freshness, parent/child model confidence
LogicGuard	claims, evidence, warrants, assumptions, rebuttals, scope, and overclaiming in written reasoning
PhysicsGuard	low-fidelity residual checks and model-building blueprints for physical simulation debugging
FlowPilot	long-running project orchestration and route control for AI-agent software work

Documentation Map

File	Purpose
`docs/concept.md`	short conceptual introduction
`docs/modeling_protocol.md`	core model-first protocol
`docs/api_surface.md`	public Python API overview
`docs/invariant_examples.md`	examples of useful invariants
`docs/development_process_flow.md`	staged development, validation freshness, archive, publish, and release gates
`docs/ui_flow_structure.md`	UI interaction and structure modeling
`docs/code_structure_recommendation.md`	model-derived code structure recommendations
`docs/structure_mesh.md`	refactor and module split governance
`docs/test_evidence_mesh.md`	layered validation and evidence freshness
`docs/model_test_alignment.md`	model obligation and test evidence alignment
`docs/model_mesh_protocol.md`	parent/child model mesh governance
`docs/model_topology_hazard_review.md`	topology-grounded future-use hazard review
`docs/model_similarity_consolidation.md`	model-to-model relation review and consolidation handoffs
`docs/flowguard_closure_contract.md`	closure contract for complete FlowGuard use
`docs/risk_evidence_ledger.md`	risk-to-model-to-code-to-evidence confidence boundary
`docs/runtime_gateway_adoption.md`	runtime gateway adoption levels and critical-state writer inventory

Repository Layout

flowguard/     Core library, review helpers, templates, mesh routes, CLI
examples/      Small executable models and public self-reviews
docs/          Protocols, API notes, examples, and adoption guidance
tests/         Focused regression tests for public helpers
assets/        README hero image and generation notes

Public Boundary

This repository is a public starter and reference implementation. It includes the library, examples, protocol docs, public templates, and AI-agent skill material, including Codex-compatible skills.

It does not include private project logs, credentials, customer data, or a claim that every real system is fully covered. FlowGuard checks the model and evidence you declare. Real software still needs tests, code review, UI review, production-facing validation, and human judgment where those are relevant.

License

MIT. See LICENSE.

中文说明

FlowGuard 是给 AI 编程 agent 用的工作流预检工具。它帮助 agent 在继续写代码、改测试、改 UI 或发布声明之前，先检查最容易出问题的那段流程。

它的核心不是让 agent “小心一点”，而是让 agent 把危险路径写成一个小型可执行状态模型。模型跑起来以后，可以提前暴露重复副作用、过期证据、缺失恢复路径、或者 done / release 声明已经不成立这类问题。

FlowGuard 不调用 LLM API，不是 prompt trick，也不是普通测试的替代品。它更像一个结构化预检层：当顺序、状态、重试、副作用、UI 路径、验证证据或发布信心会影响结果时，先把这些关系说清楚、跑一遍、看反例。

为什么需要它

AI 编程 agent 很擅长局部修改。问题是，局部代码看起来修好了，不代表整个 workflow 真的安全。

一个常见例子：

你让 agent 修 retry 逻辑。
agent 改了 bug 附近的函数。
眼前的测试通过了。
后面同一个 job 又被处理了一次。
因为 workflow 没有建模重复输入，某个副作用又发生了一次。

FlowGuard 就是为这种情况设计的。它要求 agent 在动手前说清楚：输入是什么，系统现在记住了什么，这一步会输出什么，会改哪些状态，会产生什么副作用，谁拥有这个边界，哪些证据才算当前有效。

它怎么工作

核心模型是：

Input x State -> Set(Output x State)

翻成人话：

Input 是进来的事件，比如一个 job、一次 retry、一次 UI 点击、一个文件 payload 或一次 release 动作。
State 是系统在这一步之前记住的东西。
Output 是这一步说自己做了什么。
新的 State 是这一步之后系统记住的东西。
Set(...) 表示同一个输入可能有多个合法分支，不能只写 happy path。

实际工作循环是：

危险 AI 行动
-> 小型可执行模型
-> invariant、scenario 和证据新鲜度检查
-> counterexample trace
-> 修改计划、代码、测试、UI 或声明

最有价值的结果通常是 counterexample：一条具体的状态序列，告诉你为什么当前计划不能原样继续。

它能帮你抓什么问题

场景	可能坏在哪里	FlowGuard 让什么变清楚
retry 或重复 job	同一个输入产生第二次副作用	重复输入 trace 和幂等 invariant
cache 或 refresh	旧状态在应该失效后仍被使用	哪些 state 字段和 freshness 规则需要改变
UI workflow	按钮存在，但用户不能恢复、取消或到达终态	从启动到终态的 journey、可见控件、禁用原因和恢复路径
refactor	新模块拆分后，真实 state owner 或 side-effect owner 丢失	facade 边界、state owner、side-effect owner 和 parity evidence
测试和发布	旧测试通过被误当作当前证明	evidence freshness 和最低 revalidation 要求
父子模型	一个局部 green 被误当作整体可信	child evidence、parent reattachment、sibling impact 和 scoped confidence
公开声明	README、release note 或 done 说得比证据更多	claim boundary 和缺失 proof

FlowGuard 可以在代码还没写之前帮助设计 workflow，也可以在后面检查证据是否还能支持当前声明。但它的结论永远有边界：FlowGuard 通过，只表示你声明的模型义务通过，不表示整个生产系统已经正确。

快速开始

克隆仓库并以 editable 模式安装：

git clone https://github.com/liuyingxuvka/FlowGuard.git
cd FlowGuard
python -m pip install -e .
python -m flowguard schema-version

你应该看到 schema version 1.0。

然后运行一个小例子：

python examples/job_matching/run_checks.py

这个例子会对比一个正确模型和两个坏模型。你应该能看到：

正确模型是 OK；
broken duplicate-record model 有 invariant violation；
broken repeated-scoring model 有 invariant violation；
输出里有 counterexample trace，展示重复输入怎么走到错误状态。

这个例子是抽象的。它不搜索真实岗位，也不调用 AI 模型。它只用来展示 FlowGuard 的基本方式：重复输入、状态写入、invariant 和反例。

接入到另一个项目

如果要让另一个项目支持 FlowGuard adoption，可以先写入项目记录：

python -m flowguard project-adopt --root <target-project>
python -m flowguard project-audit --root <target-project>
python -m flowguard project-upgrade --root <target-project>

然后从一个小风险边界开始：

选择一个危险边界
-> 命名你要防住的错误类型
-> 描述 Input、State、Output、副作用、owner 和完成证据
-> 写一个 invariant 或 scenario
-> 放入一个 known-bad case
-> 运行检查
-> 看 counterexample
-> 修改计划、代码、测试、UI 或声明

只有风险真的需要时才升级到高级路线。一个 retry bug 可能只需要小模型；release claim、UI flow、refactor split 或 parent/child model chain 才可能需要更强的路线。

最小模型长什么样

完整可运行版本在 examples/job_matching。基本思路是：

@dataclass(frozen=True)
class State:
    processed: tuple[str, ...] = ()
    side_effects: int = 0


@dataclass(frozen=True)
class Input:
    job_id: str


class ProcessJob:
    accepted_input_type = Input
    reads = ("processed", "side_effects")
    writes = ("processed", "side_effects")

    def apply(self, input_obj: Input, state: State):
        if input_obj.job_id in state.processed:
            return [FunctionResult("already_processed", state, label="deduplicated_retry")]
        return [
            FunctionResult(
                "processed",
                replace(
                    state,
                    processed=state.processed + (input_obj.job_id,),
                    side_effects=state.side_effects + 1,
                ),
                label="first_processing",
            )
        ]

这个模型只有在同时写了坏例子和检查规则时才有价值。比如规则可以是：“同一个 job 不应该产生重复副作用。”

什么时候用

当下一步是否安全取决于 workflow state，而不只是取决于局部代码文本时，用 FlowGuard。

适合：

有多个阶段、handoff 或 validation gate 的 AI-agent coding work；
retry、deduplication、cache refresh、queue、ingestion 和重复 job；
可见控件不等于合法恢复路径的 UI flow；
公开入口和 side effect 必须保持兼容的 refactor；
旧 evidence 可能被误当作当前 proof 的测试或发布流程；
child green 需要重新接回 parent 才能支持 broad confidence 的父子模型。

不适合：

一行 typo；
纯格式修改；
没有 meaningful state、side effect、顺序或 evidence boundary 的任务；
需要统计事实、业务事实或生产 telemetry，而不是结构化 workflow 检查的声明。

高级 Agent 工作流

如果你只是想跑第一个例子，可以先跳过这一节。

FlowGuard 有一个 model-first kernel 和多条 route-specific helper layer。route 名称主要给 AI agent 和维护者用，用来选择当前风险的最小 owner。

Route	什么时候用
`model-first-function-flow`	普通行为/状态建模就够了
`flowguard-existing-model-preflight`	已有 modeled system 需要先查现有边界，再决定是否新增
`flowguard-development-process-flow`	staged work、multi-skill setup、install、archive、publish、release 或 done confidence 取决于证据新鲜度
`flowguard-ui-flow-structure`	UI 控件、可见表面、journey、overlay、恢复路径和实现证据需要建模
`flowguard-code-structure-recommendation`	function model 要推导 module、facade、owner、side-effect、config 或 validation boundary
`flowguard-structure-mesh`	大脚本、包、命令或 public API 拆分需要兼容性和 parity evidence
`flowguard-test-mesh`	验证很慢、分层、过期、被 skip、release-only，或分散在 child suite
`flowguard-model-test-alignment`	model obligation、code contract 和 test evidence 需要直接对齐
`flowguard-model-mesh`	parent/child model evidence、sibling impact 或 oversized model surface 需要治理
`flowguard-model-topology-hazard-review`	本地 green 模型仍可能有未来复发风险
`flowguard-architecture-reduction`	重复 handler、adapter、module、branch 或 validation layer 可能可以安全收缩
`flowguard-model-miss-review`	runtime、test、replay、log 或人工检查在 FlowGuard 通过后仍然失败

常用模板命令：

python -m flowguard project-template
python -m flowguard risk-intent-template
python -m flowguard risk-template-library-template
python -m flowguard development-process-flow-template
python -m flowguard ui-flow-structure-template
python -m flowguard code-structure-recommendation-template
python -m flowguard model-test-alignment-template
python -m flowguard test-mesh-template
python -m flowguard structure-mesh-template
python -m flowguard closure-contract-template
python -m flowguard topology-hazard-template
python -m flowguard risk-template-search "completion evidence"

完整 CLI 列表可以运行：

python -m flowguard --help

Guard Family 关系

项目	关注点
FlowGuard	stateful behavior、process flow、evidence freshness、parent/child model confidence
LogicGuard	写作推理里的 claim、evidence、warrant、assumption、rebuttal、scope 和 overclaiming
PhysicsGuard	物理仿真调试中的低保真 residual check 和模型构建蓝图
FlowPilot	长周期 AI-agent 软件工作的项目编排和路线控制

文档入口

文件	作用
`docs/concept.md`	简短概念介绍
`docs/modeling_protocol.md`	核心 model-first 协议
`docs/api_surface.md`	公开 Python API 概览
`docs/invariant_examples.md`	常用 invariant 示例
`docs/development_process_flow.md`	staged development、validation freshness、archive、publish 和 release gate
`docs/ui_flow_structure.md`	UI interaction 和结构建模
`docs/code_structure_recommendation.md`	模型推导代码结构建议
`docs/structure_mesh.md`	refactor 和 module split 治理
`docs/test_evidence_mesh.md`	分层验证和证据新鲜度
`docs/model_test_alignment.md`	模型义务和测试证据对齐
`docs/model_mesh_protocol.md`	parent/child model mesh 治理
`docs/model_topology_hazard_review.md`	从模型拓扑推断未来使用风险的审查
`docs/model_similarity_consolidation.md`	model-to-model 关系审查和 consolidation handoff
`docs/flowguard_closure_contract.md`	完整 FlowGuard 使用的 closure contract
`docs/risk_evidence_ledger.md`	risk-to-model-to-code-to-evidence 信心边界
`docs/runtime_gateway_adoption.md`	runtime gateway adoption level 和 critical-state writer inventory

仓库结构

flowguard/     核心库、review helpers、templates、mesh routes、CLI
examples/      小型可执行模型和公开 self-review
docs/          协议、API 说明、示例和 adoption guidance
tests/         针对公开 helper 的回归测试
assets/        README hero image 和生成说明

公开边界

这个仓库适合作为公开 starter 和 reference implementation。它包含库代码、示例、协议文档、公开模板和通用 AI-agent skill material，其中也包括 Codex-compatible skills。

它不包含私有项目日志、credential、客户数据，也不声称模型覆盖了所有真实系统。FlowGuard 检查的是你声明的模型和证据。真实软件仍然需要测试、code review、UI review、production-facing validation，以及必要的人类判断。

许可证

MIT. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 166 Commits
.agents/skills		.agents/skills
.flowguard		.flowguard
.github/workflows		.github/workflows
assets/readme-hero		assets/readme-hero
docs		docs
examples		examples
flowguard		flowguard
openspec		openspec
scripts		scripts
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
ROADMAP.md		ROADMAP.md
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

FlowGuard

What FlowGuard Is

The Problem

How It Works

What It Helps Catch

Quick Start

Use It In Another Project

Minimal Model Sketch

When To Use It

Advanced Agent Workflows

Relationship To The Guard Family

Documentation Map

Repository Layout

Public Boundary

License

中文说明

为什么需要它

它怎么工作

它能帮你抓什么问题

快速开始

接入到另一个项目

最小模型长什么样

什么时候用

高级 Agent 工作流

Guard Family 关系

文档入口

仓库结构

公开边界

许可证

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 86

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages