Cognition optimization layer for LLM agents — reduce reasoning cost, improve determinism, and accelerate execution by reusing thought patterns, predicting context, and structuring knowledge.
```
User Query
│
▼
┌─────────────────────┐
│ Intent Detector │ → classifies type (rag/analysis/agent/creative)
│ │ + depth (low/medium/high)
└────────┬────────────┘
│
▼
┌─────────────────────┐
│ Thought Reuse Engine│ → embeds query → ANN search → injects reasoning steps
│ (TRE) │ cosine similarity ≥ 0.78 → template matched
└────────┬────────────┘
│
▼
┌──────────────────────────┐
│ Token Budget Optimizer │ → allocates tokens per component
│ (IATB) │ intent + depth → context/reasoning/response split
└────────┬─────────────────┘
│
▼
┌──────────────────────────┐
│ Hierarchical Context │ → BFS traversal with importance pruning
│ Graph (HCG) │ concept → entity → fact → raw
└────────┬─────────────────┘
│
▼
┌──────────────────────────┐
│ Predictive Context Loader│ → Markov-chain next-query prediction
│ (PCL) │ → rule-based fallback
└────────┬─────────────────┘
│
▼
┌──────────────────────────┐
│ LLM Execution │ → model-agnostic (OpenAI-compatible endpoint)
│ (any provider) │ system = reasoning steps + context graph
└────────┬─────────────────┘
│
▼
┌──────────────────────────┐
│ Validation Layer │ → post-process, word count check, warnings
└────────┬─────────────────┘
│
▼
ProcessResult { response, tokens_saved, used_template, prediction_next, … }
```
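The Thought Reuse Engine step above is the core shortcut: instead of re-deriving a chain of reasoning, CEE embeds the incoming query and looks for a previously successful template whose cosine similarity clears the 0.78 threshold. A minimal sketch of that matching logic — the template dict shape and the linear scan are illustrative stand-ins, not the actual `thought_reuse_engine.py` API:

```python
import numpy as np

THRESHOLD = 0.78  # CEE_THOUGHT_MATCH_THRESHOLD

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_template(query_vec: np.ndarray, templates: list[dict]) -> dict | None:
    """Return the best-matching ThoughtTemplate, or None if nothing clears the threshold."""
    best, best_score = None, THRESHOLD
    for tpl in templates:  # in practice an ANN index replaces this linear scan
        score = cosine(query_vec, tpl["embedding"])
        if score >= best_score:
            best, best_score = tpl, score
    return best

# On a match, the template's reasoning steps are injected into the system prompt;
# on a miss, the pipeline falls through to normal reasoning.
```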
- Optimise thinking, not tokens — structured reasoning reuse beats raw compression
- Reuse reasoning patterns — ThoughtTemplates extracted from successful executions
- Predict instead of react — Markov-chain pre-loading of likely follow-ups (sketched after this list)
- Preserve intent, not text — HCG retains semantic structure rather than verbatim text
- Model-agnostic — any OpenAI-compatible provider works out of the box
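A toy sketch of that Markov-chain prediction: transitions between consecutive queries are counted, and the highest-probability successors are pre-loaded. Class and method names here are illustrative, not the actual `predictive_context_loader.py` interface:

```python
from collections import Counter, defaultdict

class NextQueryPredictor:
    """First-order Markov chain over observed query transitions."""

    def __init__(self) -> None:
        self.transitions: dict[str, Counter] = defaultdict(Counter)

    def observe(self, prev_query: str, next_query: str) -> None:
        self.transitions[prev_query][next_query] += 1

    def predict(self, query: str, k: int = 2) -> list[str]:
        """Most likely follow-ups; an empty list triggers the rule-based fallback."""
        return [q for q, _ in self.transitions[query].most_common(k)]

predictor = NextQueryPredictor()
predictor.observe("compare python and go", "what are the trade-offs?")
predictor.observe("compare python and go", "which should I choose?")
print(predictor.predict("compare python and go"))
```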
Quick start:

```
cd agentdyne9/cee
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -e ".[dev]"
cp .env.example .env
# Edit .env — set your OPENAI_API_KEY and any overrides
```

Key environment variables:
| Variable | Default | Description |
|---|---|---|
| `OPENAI_API_KEY` | (empty) | LLM provider key. Empty = stub mode (offline) |
| `OPENAI_BASE_URL` | `https://api.openai.com/v1` | Any OpenAI-compatible endpoint |
| `CEE_LLM_MODEL` | `gpt-4o-mini` | Model identifier |
| `CEE_EMBEDDING_MODEL` | `text-embedding-3-small` | Embedding model |
| `REDIS_URL` | (empty) | Redis URL; empty = in-memory fallback |
| `CEE_CHROMA_PATH` | `.cee_data/chroma` | ChromaDB persistence path |
| `CEE_MAX_TOKENS` | `4096` | Total token budget per request |
| `CEE_THOUGHT_MATCH_THRESHOLD` | `0.78` | Cosine similarity threshold for template reuse |
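These variables are loaded through Pydantic Settings (`cee/config.py`). A minimal sketch of that mapping — field names follow the table above, but the actual class layout in `config.py` may differ:

```python
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    """Environment-driven configuration; unset variables fall back to these defaults."""

    openai_api_key: str = ""                      # empty => offline stub mode
    openai_base_url: str = "https://api.openai.com/v1"
    cee_llm_model: str = "gpt-4o-mini"
    cee_embedding_model: str = "text-embedding-3-small"
    redis_url: str = ""                           # empty => in-memory cache fallback
    cee_chroma_path: str = ".cee_data/chroma"
    cee_max_tokens: int = 4096
    cee_thought_match_threshold: float = 0.78

settings = Settings()  # reads the process env (and .env, if configured to)
```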
```
cee train-templates --inline
cee serve
# or: make run
```

The server starts at http://localhost:8000; interactive docs at http://localhost:8000/docs.
```
cee --help
cee serve                        # Start API server
cee serve --port 9000 --reload   # Dev mode with auto-reload
cee run --query "Explain BFS"    # Query via running server
cee run --query "..." --inline   # Query without a server (direct pipeline)
cee run --query "..." \
    --context '[{"type":"fact","content":"...","importance":0.9}]' \
    --agent-state '{"tools":["web_search"]}'
cee train-templates --inline     # Seed 10 starter templates
cee benchmark --inline           # Run 10-query benchmark
cee health                       # Check running server health
```

Run the full CEE pipeline (`POST /v1/process`). Request:

```json
{
"query": "Compare Python and Go for building microservices.",
"context": [
{
"type": "fact",
"content": "Our team has 5 years of Python experience.",
"importance": 0.9
}
],
"agent_state": {}
}
```

Response:

```json
{
"response": "Python and Go differ in several key dimensions...",
"intent": "analysis",
"used_template": {
"id": "tmpl-a1b2c3d4",
"pattern": "compare two technologies, frameworks, or approaches",
"success_rate": 0.84,
"use_count": 47
},
"context_used": ["node-abc", "node-def"],
"tokens_saved": 2048,
"prediction_next": [
"What are the trade-offs?",
"Which should I choose for high throughput?"
],
"warnings": []
}
```

Create a new ThoughtTemplate (`POST /v1/templates`). Request:

```json
{
"pattern": "explain a sorting algorithm",
"steps": [
"Define the algorithm in one sentence.",
"Walk through the steps with an example.",
"State time and space complexity.",
"Mention best and worst cases."
],
"tags": ["educational", "algorithms"]
}
```

Health check (`GET /v1/health`) response:

```json
{
  "status": "ok",
  "version": "0.1.0",
  "template_count": 10,
  "llm_model": "gpt-4o-mini"
}
```
```
make test       # Full suite (offline/stub — no API key needed)
make test-cov   # With HTML coverage report
make test-fast  # Skip slow tests
```

Tests run fully offline using the stub LLM client and an in-memory ChromaDB instance.
```
# Start CEE + Redis
make docker-up

# View logs
make docker-logs

# Stop everything
make docker-down
```

Services:
| Service | Port | Description |
|---|---|---|
| `cee` | 8000 | CEE API server |
| `redis` | 6379 | Cache backend |
```
make dev-install   # Install all dependencies + pre-commit hooks
make fmt           # Auto-format with ruff
make lint          # Lint with ruff
make type-check    # mypy strict type checking
make test          # Run tests
make seed          # Seed starter templates
make benchmark     # Run benchmark
make run           # Start dev server with reload
make clean         # Remove all build / cache artifacts
```

```
cee/
├── cee/
│ ├── __init__.py
│ ├── config.py # Pydantic Settings
│ ├── main.py # FastAPI app factory + lifespan
│ ├── pipeline.py # CEEPipeline orchestrator
│ ├── cli.py # Typer CLI (serve/run/benchmark/train)
│ ├── models/
│ │ ├── intent.py # IntentType, DepthLevel, Intent
│ │ ├── thought_template.py # ThoughtTemplate + EMA tracking
│ │ ├── context_node.py # ContextNode + weighted importance
│ │ └── prediction.py # Prediction model
│ ├── engines/
│ │ ├── intent_detector.py # Rule + LLM fallback classifier
│ │ ├── thought_reuse_engine.py # ANN template matching
│ │ ├── hierarchical_context_graph.py # BFS + importance pruning
│ │ ├── predictive_context_loader.py # Markov-chain predictor
│ │ ├── token_budget_optimizer.py # Intent-aware budget allocation
│ │ └── validation_layer.py # Post-processing validation
│ ├── storage/
│ │ ├── vector_store.py # ChromaDB adapter
│ │ └── cache.py # Redis + in-memory fallback
│ ├── utils/
│ │ ├── embeddings.py # OpenAI embeddings + stub fallback
│ │ ├── llm_client.py # Model-agnostic completion client
│ │ └── logging.py # structlog JSON configuration
│ └── api/
│ ├── schemas.py # Pydantic request/response models
│ └── router.py # FastAPI router (/process, /templates, /health)
├── tests/
│ ├── conftest.py # Fixtures (pipeline, api_client)
│ ├── test_models.py # Unit: Intent, ThoughtTemplate, ContextNode, Prediction
│ ├── test_engines.py # Unit: IntentDetector, HCG, TokenBudget, Validation
│ ├── test_pipeline.py # Integration: end-to-end pipeline
│ └── test_api.py # Integration: HTTP endpoints via ASGI
├── scripts/
│ └── seed_templates.py # 10 starter ThoughtTemplates
├── Dockerfile
├── docker-compose.yml
├── Makefile
├── pyproject.toml
├── requirements.txt
├── .env.example
└── .gitignore
```
For a HIGH depth ANALYSIS query with `max_tokens=4096`:

```
Total budget:    4096 tokens
Depth fraction:  1.00 (HIGH)
Effective:       4096 tokens
Split (analysis weights: 40% ctx / 20% rsn / 40% rsp):
  Context graph:    1638 tokens  ← HCG compression target
  Reasoning steps:   819 tokens  ← ThoughtTemplate injection
  LLM response:     1639 tokens  ← max_tokens passed to LLM
```

For a LOW depth RAG query:

```
Total budget:    4096 tokens
Depth fraction:  0.40 (LOW)
Effective:       1638 tokens
Tokens saved:    2458 tokens (60%)
```
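The arithmetic above is easy to reproduce. A minimal sketch of the allocation logic — the fractions and weights come from the two examples, while the MEDIUM fraction and the function signature are assumptions, not the real `token_budget_optimizer.py` interface:

```python
DEPTH_FRACTION = {"low": 0.40, "medium": 0.70, "high": 1.00}  # medium is an assumed midpoint
INTENT_WEIGHTS = {"analysis": (0.40, 0.20, 0.40)}             # (context, reasoning, response)

def allocate(max_tokens: int, intent: str, depth: str) -> dict[str, int]:
    effective = int(max_tokens * DEPTH_FRACTION[depth])
    ctx_w, rsn_w, _ = INTENT_WEIGHTS[intent]
    ctx = int(effective * ctx_w)
    rsn = int(effective * rsn_w)
    rsp = effective - ctx - rsn   # remainder, so the split always sums to `effective`
    return {"context": ctx, "reasoning": rsn, "response": rsp, "saved": max_tokens - effective}

print(allocate(4096, "analysis", "high"))
# {'context': 1638, 'reasoning': 819, 'response': 1639, 'saved': 0}
```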
To add a new intent type (a sketch follows the list):

- Add a value to `IntentType` in `cee/models/intent.py`
- Add regex patterns in `cee/engines/intent_detector.py`
- Add a weight tuple in `cee/engines/token_budget_optimizer.py`
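As a rough illustration of those three touch points (enum value, rule pattern, budget weights) — the names mirror the tree layout above but are illustrative, not the project's actual definitions:

```python
import re
from enum import Enum

class IntentType(str, Enum):          # cee/models/intent.py
    RAG = "rag"
    ANALYSIS = "analysis"
    AGENT = "agent"
    CREATIVE = "creative"
    SUMMARIZATION = "summarization"   # 1. the new intent value (hypothetical)

# 2. cee/engines/intent_detector.py — rule patterns tried before the LLM fallback
PATTERNS = {
    IntentType.SUMMARIZATION: [re.compile(r"\b(summari[sz]e|tl;?dr|condense)\b", re.I)],
}

# 3. cee/engines/token_budget_optimizer.py — (context, reasoning, response) weights
INTENT_WEIGHTS = {
    IntentType.SUMMARIZATION: (0.55, 0.10, 0.35),  # assumed weights, for illustration only
}
```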
Set `OPENAI_BASE_URL` to any OpenAI-compatible endpoint:

- Mistral: `https://api.mistral.ai/v1`
- Ollama (local): `http://localhost:11434/v1`
- Anthropic via proxy: use any compatible wrapper
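For example, to sanity-check a local Ollama endpoint with the official `openai` Python client (the model name is whatever you have pulled locally):

```python
from openai import OpenAI

# Point the standard OpenAI client at a local Ollama server;
# Ollama ignores the API key, but the client requires a non-empty string.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

reply = client.chat.completions.create(
    model="llama3.1",  # any locally pulled model
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(reply.choices[0].message.content)
```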
Create a custom template via the running server:

```
curl -X POST http://localhost:8000/v1/templates \
-H "Content-Type: application/json" \
-d '{
"pattern": "my custom query pattern",
"steps": ["Step 1", "Step 2", "Step 3"],
"tags": ["custom"]
}'
```

Roadmap:

- Reinforcement learning for template quality optimisation
- Multi-agent coordination layer
- Self-evolving reasoning templates
- CCL (Content Correctness Layer) integration
- OpenTelemetry tracing spans per pipeline stage
- gRPC transport option alongside REST
Apache-2.0 © AgentDyne