Status: Experimental / pre-1.0. APIs and folder layout may change.
Ranger is an execution engine for long-lived, stateful workflows that use LLMs, tools, and human input. You write plain Python functions that declare what data they need and what data they produce. Ranger figures out the order to run them and records each step in a local database so you can inspect, replay, and compare runs later.
Instead of hand-rolled while True loops and ad-hoc logging, you get a simple dataflow:
state goes in → steps fire when their inputs are ready → new state comes out. Ranger keeps the full trace so you can see how a result was produced and how a change in model, prompt, or code affects behavior.
- Replayable runs
  Every run is stored in .ranger/<domain>.db with state snapshots and step metadata. You can re-run a scenario, step through it, or diff runs across versions.
- Automatic scheduling
  Steps declare the keys they read and write. Ranger only runs a step when its inputs are present and at least one output needs an update, and it batches non-conflicting steps for a bit of parallelism.
- Less orchestration code
  Most control flow falls out of the data dependencies. You focus on small, testable functions; Ranger takes care of “what should run next?”
- Guardrail-friendly
  Because reads and writes are explicit, it’s straightforward to layer policies, checks, or risk scoring around the engine without burying them in glue code.
If you like treating workflows as dataflow—“what do we know now, what can we safely compute next?”—Ranger is built with that style in mind.
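To make the guardrail point concrete, here is a minimal sketch of a policy check that looks only at a capability's declared output keys before anything runs. `PROTECTED` and `violations` are hypothetical names invented for this illustration; they are not part of Ranger.

```python
from fnmatch import fnmatchcase

# Hypothetical policy: glob patterns for State keys that need extra approval.
PROTECTED = ["prod.*", "secrets.*"]


def violations(outputs, protected=PROTECTED):
    """Return the declared output keys that match a protected pattern."""
    return [
        key for key in outputs
        if any(fnmatchcase(key, pattern) for pattern in protected)
    ]


print(violations(["tests.gen", "run.result"]))   # []
print(violations(["prod.deploy", "tests.gen"]))  # ['prod.deploy']
```

Because the check needs only the declared keys, it can run before the capability executes and stays out of the capability's own code.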
At the core there are just three ideas:
- State
  A key–value map with optional per-key validation, for example:

      {
        "repo.ast": {...},
        "tests.gen": {...},
        "run.result": {...},
      }

- Capabilities (Steps / Tools / LLMs / Humans)
  Each capability declares:
  - inputs=[...] – the keys it must see in State before running
  - outputs=[...] – the keys it promises to write back into State

  It returns a small dict of {key: value} updates; Ranger merges that into the snapshot atomically.
- Planner + Goal
  - A capability becomes ready when:
    - all its inputs exist in State, and
    - at least one of its outputs is missing, or any input changed since it last ran.
  - The engine takes a snapshot, finds all ready units, picks a compatible batch (no two write the same key), runs them, commits, and repeats.
  - After each batch, a Goal predicate checks whether you’re done (or explains why you’re blocked).
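The readiness rule above can be sketched in plain Python. This is an illustration under stated assumptions, not Ranger's implementation: `Capability` and `is_ready` are hypothetical names, and treating a capability that has never run (with all outputs already present) as not ready is a choice made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Capability:
    """Hypothetical stand-in for a declared capability (not Ranger's class)."""
    name: str
    inputs: tuple
    outputs: tuple
    # Input values seen the last time this capability ran; None means never ran.
    last_inputs: Optional[dict] = None


def is_ready(cap, state):
    # Every declared input must already exist in State.
    if any(key not in state for key in cap.inputs):
        return False
    # Ready if at least one declared output is still missing...
    if any(key not in state for key in cap.outputs):
        return True
    # ...or if any input changed since the capability last ran.
    current = {key: state[key] for key in cap.inputs}
    return cap.last_inputs is not None and current != cap.last_inputs


inc = Capability("inc", inputs=("a",), outputs=("b",))
print(is_ready(inc, {}))                # False: input "a" is missing
print(is_ready(inc, {"a": 1}))          # True: output "b" is missing
inc.last_inputs = {"a": 1}
print(is_ready(inc, {"a": 1, "b": 2}))  # False: nothing changed
print(is_ready(inc, {"a": 5, "b": 2}))  # True: "a" changed since the last run
```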
Execution loop at a glance:
flowchart TD
S["Snapshot(State)"] --> R["Find ready capabilities"]
R -->|none| B["Blocked → explain missing inputs / unmet goal"]
R -->|some| C["Choose batch with disjoint outputs"]
C --> E["Execute batch"]
E --> M["Merge results into State"]
M --> G{"Goal satisfied?"}
G -- "No" --> S
G -- "Yes" --> D["Done"]
- Snapshot(State) freezes the current evidence so readiness decisions are deterministic and replayable.
- Find ready capabilities surfaces every transformation whose declared inputs exist and at least one promised output still needs to be written.
- Choose batch with disjoint outputs enforces conflict-free parallelism; the engine rejects any combination that would contend for the same key.
- Execute batch → Merge results runs the selected capabilities, merges their updates atomically, and records provenance alongside timings.
- Goal satisfied? evaluates a declarative predicate that either certifies completion or explains which facts are still missing.
No loops or if/else chains live in user code. Control flow emerges from how State evolves.
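For intuition, the loop in the flowchart can be modeled in a few lines of plain Python. This is a toy sketch, not Ranger's engine: the `run` function and the `(fn, inputs, outputs)` tuple shape are assumptions made for the example, readiness is simplified to "inputs present, some output missing", and the batch is chosen greedily.

```python
from types import MappingProxyType


def run(capabilities, initial, goal, max_steps=10):
    """Toy engine loop; `capabilities` holds (fn, inputs, outputs) tuples."""
    state, trace = dict(initial), []
    for _ in range(max_steps):
        # Snapshot: a read-only copy, so all readiness checks see the same facts.
        snapshot = MappingProxyType(dict(state))
        # Ready: all inputs present and at least one output still missing.
        ready = [
            cap for cap in capabilities
            if all(k in snapshot for k in cap[1])
            and any(k not in snapshot for k in cap[2])
        ]
        if not ready:
            raise RuntimeError("blocked: no capability is ready")
        # Batch: greedily keep capabilities whose outputs do not overlap.
        batch, claimed = [], set()
        for cap in ready:
            if claimed.isdisjoint(cap[2]):
                batch.append(cap)
                claimed.update(cap[2])
        # Execute against the snapshot, then merge all updates in one step.
        updates = {}
        for fn, _inputs, _outputs in batch:
            updates.update(fn(snapshot))
        state.update(updates)
        trace.append([fn.__name__ for fn, _, _ in batch])
        if goal(state):
            return state, trace
    raise RuntimeError("step budget exhausted before the goal passed")


def inc(s):
    return {"b": s["a"] + 1}


def double(s):
    return {"c": s["b"] * 2}


def neg(s):
    return {"d": -s["a"]}


caps = [(inc, ["a"], ["b"]), (double, ["b"], ["c"]), (neg, ["a"], ["d"])]
final, trace = run(caps, {"a": 1}, goal=lambda s: s.get("c") == 4)
print(trace)  # [['inc', 'neg'], ['double']]
print(final)  # {'a': 1, 'b': 2, 'd': -1, 'c': 4}
```

Note that `inc` and `neg` land in the same batch because their outputs are disjoint, while `double` has to wait until `"b"` exists.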
Everything you ship inside Ranger is a state transformation over State. Decorators describe what each capability reads and writes so the planner and runners can schedule work deterministically.
- @step – Pure function that reads immutable inputs, emits new facts, and can be retried freely. Validation in core.validate makes sure you only reference declared keys.
- @tool – Side-effecting capability (HTTP calls, shell work, filesystem, etc.) executed through the Python runner. Declares exactly which State it mutates so provenance stays intact.
- @llm – Structured LLM call configured through core.llm.provider. Prompts, schemas, and retry policies are part of the decorator, keeping generations reproducible.
- @human – Human-in-the-loop checkpoint executed by core.runners.human_runner, often used for approvals.
- @goal – Declarative predicate that certifies completion or explains why you are still blocked.
- Agent – Thin runner (core.engine.Agent) that keeps applying ready capabilities inside your budget until the goal passes.
All decorators operate on the same immutable snapshot.
- Contracts – core.capability and core.plan track every key that a capability reads or writes, enabling deterministic scheduling and guardrail enforcement.
- Provenance – core.provenance records coverage, timings, and evidence for every capability so you can replay or audit any run later.
- Execution internals – Modules under topology/ currently provide planning, context packing, and guard helpers used by the engine and runtimes.
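As a rough picture of what context packing involves, the sketch below keeps the highest-scoring pieces of text that fit a token budget. Everything here is an assumption made for illustration: `estimate_tokens`, `pack`, the four-characters-per-token rule of thumb, and the `(score, text)` item shape are hypothetical and do not describe the actual code under topology/.

```python
def estimate_tokens(text):
    # Rule-of-thumb estimate (assumption): roughly four characters per token.
    return max(1, len(text) // 4)


def pack(items, budget, estimator=estimate_tokens):
    """Greedily keep the highest-scoring texts that fit within `budget` tokens.

    `items` is a list of (score, text) pairs; how scores are computed is
    out of scope for this sketch.
    """
    chosen, used = [], 0
    for _score, text in sorted(items, key=lambda item: item[0], reverse=True):
        cost = estimator(text)
        if used + cost <= budget:
            chosen.append(text)
            used += cost
    return chosen, used


items = [(0.9, "x" * 40), (0.5, "y" * 40), (0.7, "z" * 20)]
chosen, used = pack(items, budget=16)
print(len(chosen), used)  # 2 15

# Swapping in a different estimator, here one token per whitespace-separated word.
chosen_words, used_words = pack(items, budget=16, estimator=lambda t: len(t.split()))
print(len(chosen_words), used_words)  # 3 3
```

Passing the estimator as a parameter is what makes the estimate replaceable without touching the packing logic.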
Ranger intentionally keeps its core structures lightweight today. That makes it flexible, but it also means some README language can be interpreted as stronger guarantees than the code currently enforces:
- State entries store Python Any values (core/workspace.py) unless you add explicit write specs or validators.
- Capabilities are declared via read/write key sets and a runner (core/capability.py), not a full static type system.
- Validation is practical but thin (core/validate.py): predicate checks, optional JSON Schema, and required-key checks.
- Context packing in topology/packer.py uses heuristic scoring and an approximate token estimate by default (now pluggable via a custom estimator).
If you need strict schema enforcement end-to-end, treat Ranger as an orchestration substrate and add stronger contracts in your own capability layer (or via WriteSpec + schema validation).
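One way to add a stronger contract in your own capability layer is to wrap each function so its updates are checked before they reach the engine. The sketch below is independent of Ranger's own validators; `checked` is a hypothetical helper written for this example.

```python
def checked(fn, outputs, validators=None):
    """Wrap a capability function so its updates are checked before merging."""
    validators = validators or {}
    declared = set(outputs)

    def wrapper(state):
        updates = fn(state)
        # Reject writes to keys the capability never declared.
        undeclared = set(updates) - declared
        if undeclared:
            raise KeyError(f"undeclared writes: {sorted(undeclared)}")
        # Apply the per-key predicate, if one was supplied.
        for key, value in updates.items():
            check = validators.get(key)
            if check is not None and not check(value):
                raise ValueError(f"validation failed for {key!r}: {value!r}")
        return updates

    return wrapper


def inc(state):
    return {"b": state["a"] + 1}


safe_inc = checked(
    inc,
    outputs=["b"],
    validators={"b": lambda v: isinstance(v, int) and v >= 0},
)
print(safe_inc({"a": 1}))  # {'b': 2}
```

Calling `safe_inc({"a": -5})` raises `ValueError`, because the result `-4` fails the predicate for `"b"`.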
Here’s a tiny example with no APIs or models. It computes c = (a + 1) * 2 and stops when c == 4.

    from core.sdk import step, goal, Agent

    @step(inputs=["a"], outputs=["b"])
    def inc(state):
        return {"b": state["a"] + 1}

    @step(inputs=["b"], outputs=["c"])
    def double(state):
        return {"c": state["b"] * 2}

    @goal(scope={"c"})
    def done(state):
        return state.get("c") == 4

    if __name__ == "__main__":
        agent = Agent([inc, double])
        result = agent.run(initial={"a": 1}, goal=done, max_steps=10)
        assert result.ok, result
        print("Final c:", result.final.value("c"))

There is no orchestration in this script. The engine:
- Sees "a" → runs inc → writes "b".
- Sees "b" → runs double → writes "c".
- Goal sees "c == 4" → run finishes.
That entire trace is stored in .ranger/demo.db: the snapshot deltas, timings, and goal evaluations. You can replay it with ranger scenario, diff it against future runs, or visualize it with ranger visualize to confirm the scheduling behaves as intended.
Because control flow is driven by State, ReAct patterns fall out naturally:

    from core.sdk import step, tool, llm, goal, Agent

    # my_llm_provider is not defined in this snippet; it stands for a provider
    # object you have configured elsewhere.

    # REASON: decide whether to search or answer
    @llm(
        inputs=["question", "obs"],
        outputs=["thought"],
        system="Return JSON {thought:str} with either 'search' or 'answer'.",
        template='{"thought":"{{ "search" if (obs|length) < 2 else "answer" }}"}',
        schema={
            "type": "object",
            "properties": {"thought": {"type": "string"}},
            "required": ["thought"],
        },
        provider=my_llm_provider,
    )
    def think(state):
        ...

    # PLAN: craft a query when needed
    @step(inputs=["question", "thought"], outputs=["query"])
    def plan(state):
        if state["thought"] == "search":
            return {"query": f"{state['question']} key facts"}
        return {"query": ""}

    # ACT: perform the search (side effect → Tool)
    @tool(inputs=["query", "obs"], outputs=["obs"])
    def search(state):
        q = state["query"]
        obs = state.get("obs", [])
        if not q:
            return {"obs": obs}
        return {"obs": obs + [{"source": "stub", "snippet": f"Result:{q}"}]}

    # WRITE: produce an answer draft
    @llm(
        inputs=["question", "obs"],
        outputs=["answer.draft"],
        system="Return JSON {text:str, citations:list}.",
        template='{"text":"Answer to {{question}}.","citations":[]}',
        schema={
            "type": "object",
            "properties": {"text": {"type": "string"}, "citations": {"type": "array"}},
            "required": ["text", "citations"],
        },
        provider=my_llm_provider,
    )
    def write(state):
        ...

    @goal(scope={"answer.draft"})
    def answered(state):
        return "answer.draft" in state

    if __name__ == "__main__":
        Agent([think, plan, search, write]).run(
            initial={"question": "What is ReAct?", "obs": []},
            goal=answered,
            max_steps=20,
        )

The engine automatically:
- loops between Reason → Plan → Search while thought == "search",
- then shifts to Reason → Write once there are enough observations.
You never write that loop explicitly.
Behind the scenes, readiness is recomputed whenever new evidence lands in "obs", and batch selection keeps writes disjoint so think, plan, search, and write do not race over the same keys. The scenario database retains every prompt, response, and decision, so you can audit why a particular answer was produced.
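The schemas attached to the @llm capabilities above imply a check along these lines on each model reply. The sketch uses only the standard library and handles just the required and type keywords for the three types that appear in the example; `parse_structured` is a hypothetical helper, and how Ranger itself applies schemas is not shown here.

```python
import json

# JSON Schema type names used in the example above, mapped to Python types.
_TYPES = {"string": str, "array": list, "object": dict}


def parse_structured(raw, schema):
    """Parse a model reply and check it against a tiny subset of JSON Schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key in schema.get("required", []):
        if key not in data:
            raise ValueError(f"missing required key: {key}")
    for key, spec in schema.get("properties", {}).items():
        expected = _TYPES.get(spec.get("type"))
        if key in data and expected and not isinstance(data[key], expected):
            raise ValueError(f"{key} should be of type {spec['type']}")
    return data


draft_schema = {
    "type": "object",
    "properties": {"text": {"type": "string"}, "citations": {"type": "array"}},
    "required": ["text", "citations"],
}
print(parse_structured('{"text": "Answer.", "citations": []}', draft_schema))
# {'text': 'Answer.', 'citations': []}
```

A reply such as `{"text": "x"}` raises `ValueError` because `citations` is missing, which is the kind of failure a retry policy would react to.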
For now, Ranger is meant to be used from a clone of this repository.

    git clone https://github.com/skishore23/ranger.git
    cd ranger
    pip install -e .  # editable install for development

- Add ranger to your project’s virtualenv (via the pip install -e . above).
- Import from core.sdk, define a few capabilities, and call Agent.run(...) as in the examples.
- Inspect result.final and, if needed, the underlying scenario database for that run.
The ranger CLI helps you scaffold and inspect agents that follow Ranger’s conventions:

    # Scaffold a new agent package + smoke tests
    ranger init demo-agent

    # Inspect memory atoms in a run database
    ranger trace demo.db --domain demo --limit 20

    # Replay a run as a "scenario" and dump JSON coverage + goals
    ranger scenario demo.db --domain demo --json

    # Visualize capability graphs (requires Graphviz + optional extras)
    ranger visualize agents.testwriter.agent:TestWriterAgent --repo . --format svg
    ranger visualize agents.deep_research.agent:DeepResearchAgent --repo . --format svg

The scaffold mirrors the bundled agents and wires up memory + LLM regions via boot.py. Run ranger --help for a full list of commands.
For larger projects you codify a plan once and let the runtime enforce it with evidence checks:
- Build plain capabilities with @step, @tool, @llm, @human.
- Compose them into a Plan using core.plan.plan and core.plan.action.
- Wrap the compiled plan in an AgentRuntime subclass that:
  - configures memory domains and filenames,
  - registers LLM profiles,
  - applies guard regions,
  - exposes a simple .run(...) facade for callers.
agents/common/runtime.AgentRuntime wires those pieces into budgets and visualization hooks so you can focus on domain behavior rather than plumbing.
Example sketch:

    from agents.common import AgentRuntime
    from boot import get_default_budget, setup_openai_llm
    from core.llm.provider import register_llm_profile
    from core.plan import plan, action

    from . import capabilities  # your @step/@tool/@llm functions

    class MyAgent(AgentRuntime):
        def __init__(self, **runtime_options):
            super().__init__(
                budget=get_default_budget(),
                memory_key="myagent.memory",
                memory_domain="myagent",
                db_filename="myagent.db",
                **runtime_options,
            )

        def build_plan(self):
            register_llm_profile(
                "myagent.generate",
                region_key="myagent.llm",
                defaults={"model": "gpt-4o-mini", "temperature": 0.0},
                region_factory=lambda: setup_openai_llm(
                    key="myagent.llm",
                    model="gpt-4o-mini",
                    temperature=0.0,
                ),
            )
            stages = [
                capabilities.step1,
                capabilities.step2,
                # ...
            ]
            return plan(*[action(cap) for cap in stages])

        def run(self, *, max_steps: int = 120):
            return self.run_agent(
                initial={"myagent.config": {...}},
                goal=capabilities.my_goal,
                max_steps=max_steps,
            )

AgentRuntime takes care of registry resets, scenario harness, and visualization so your façade stays small.
Attach regions (see regions/) for external memory or LLM providers, and register capabilities so the planner can schedule them automatically.
Every run is backed by a scenario database under .ranger/. For a given domain:
- State snapshots (before/after each batch)
- Capability executions (inputs, outputs, timings)
- Goal evaluations and “why not yet done” explanations
all land in one file, e.g. .ranger/testwriter.db.
You can then:
- Inspect raw atoms:

      ranger trace testwriter.db --domain testwriter --limit 50

- Replay and summarize:

      ranger scenario testwriter.db --domain testwriter --json

- Render a capability graph:

      ranger visualize agents.testwriter.agent:TestWriterAgent --repo . --format svg

This makes Ranger feel closer to a build system or debugger than to a black-box chatbot.
ranger/scenario.py drives those commands by replaying entries from .ranger/<domain>.db and rehydrating the provenance captured by core.provenance. Because every snapshot and capability write is recorded, audits become a matter of querying evidence rather than reproducing a flaky chat transcript.
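To illustrate why recorded writes make replay and diffing cheap, here is a small SQLite sketch. The `steps` table layout and the helpers `record_step`, `replay`, and `diff_runs` are assumptions made for this example; they do not describe the actual schema of the files under .ranger/.

```python
import json
import sqlite3


def record_step(conn, run_id, seq, capability, updates):
    """Append one capability execution to the trace."""
    conn.execute(
        "INSERT INTO steps (run_id, seq, capability, updates) VALUES (?, ?, ?, ?)",
        (run_id, seq, capability, json.dumps(updates, sort_keys=True)),
    )


def replay(conn, run_id, initial):
    """Rebuild a run's final state by re-applying its recorded updates in order."""
    state = dict(initial)
    rows = conn.execute(
        "SELECT updates FROM steps WHERE run_id = ? ORDER BY seq", (run_id,)
    )
    for (updates,) in rows:
        state.update(json.loads(updates))
    return state


def diff_runs(conn, run_a, run_b, initial):
    """Return {key: (value_in_a, value_in_b)} for keys whose final values differ."""
    a, b = replay(conn, run_a, initial), replay(conn, run_b, initial)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE steps (run_id TEXT, seq INTEGER, capability TEXT, updates TEXT)"
)
record_step(conn, "run-1", 0, "inc", {"b": 2})
record_step(conn, "run-1", 1, "double", {"c": 4})
record_step(conn, "run-2", 0, "inc", {"b": 2})
record_step(conn, "run-2", 1, "double", {"c": 6})  # e.g. after changing `double`

print(replay(conn, "run-1", {"a": 1}))              # {'a': 1, 'b': 2, 'c': 4}
print(diff_runs(conn, "run-1", "run-2", {"a": 1}))  # {'c': (4, 6)}
```

Replay never calls a model or a tool: it only re-applies stored updates, which is what keeps audits deterministic.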
This repo is organized into a few layers:
- core/ – the engine: State snapshots, planner, runners, merge logic, provenance tracking, validation, and visualization support.
- ranger/ – CLI entry points plus tooling for tracing, visualization, and scaffolding.
- agents/ – sample agents (test-writer, deep research) that show how to pair plans with runtime facades.
- regions/ – memory and provider bindings (SQLite memory, OpenAI LLM, etc.) that you can register inside runtimes.
- topology/ – planning and context-packing utilities used by the runtime (still experimental).
- docs/ – API and agent guides.
- tests/ – unit and integration coverage for engine, runners, and bundled agents.
Depending only on core/ + ranger/ gives you the execution engine and CLI; agents/ and regions/ stay optional reference implementations.
Ranger is a good fit if:
- You are building non-trivial LLM systems (test writers, research agents, safety pipelines, etc.) that:
  - touch many sources of truth,
  - have multiple asynchronous or side-effecting steps,
  - need replayable, explainable behavior.
- You want to treat orchestration as a data problem (“what facts do we know now, what can we compute next?”) instead of manually coding loops.
- You care about guardrails and risk controls that can inspect and shape State over time.
It might be overkill if:
- You just need a single LLM call plus one or two tools.
- You don’t care about replay, provenance, or long-term maintainability of flows.