Cognition optimization layer for LLM agents — reduce reasoning cost, improve determinism, and accelerate execution by reusing thought patterns, predicting context, and structuring knowledge.
```
User Query
│
▼
┌─────────────────────┐
│ Intent Detector │ → classifies type (rag/analysis/agent/creative)
│ │ + depth (low/medium/high)
└────────┬────────────┘
│
▼
┌─────────────────────┐
│ Thought Reuse Engine│ → embeds query → ANN search → injects reasoning steps
│ (TRE) │ cosine similarity ≥ 0.78 → template matched
└────────┬────────────┘
│
▼
┌──────────────────────────┐
│ Token Budget Optimizer │ → allocates tokens per component
│ (IATB) │ intent + depth → context/reasoning/response split
└────────┬─────────────────┘
│
▼
┌──────────────────────────┐
│ Hierarchical Context │ → BFS traversal with importance pruning
│ Graph (HCG) │ concept → entity → fact → raw
└────────┬─────────────────┘
│
▼
┌──────────────────────────┐
│ Predictive Context Loader│ → Markov-chain next-query prediction
│ (PCL) │ → rule-based fallback
└────────┬─────────────────┘
│
▼
┌──────────────────────────┐
│ LLM Execution │ → model-agnostic (OpenAI-compatible endpoint)
│ (any provider) │ system = reasoning steps + context graph
└────────┬─────────────────┘
│
▼
┌──────────────────────────┐
│ Validation Layer │ → post-process, word count check, warnings
└────────┬─────────────────┘
│
▼
ProcessResult { response, tokens_saved, used_template, prediction_next, … }
```
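The Thought Reuse Engine step above is the core shortcut: instead of re-deriving a chain of reasoning, CEE embeds the incoming query and looks for a previously successful template whose cosine similarity clears the 0.78 threshold. A minimal sketch of that matching logic — the template dict shape and the linear scan are illustrative stand-ins, not the actual `thought_reuse_engine.py` API:

```python
import numpy as np

THRESHOLD = 0.78  # CEE_THOUGHT_MATCH_THRESHOLD

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_template(query_vec: np.ndarray, templates: list[dict]) -> dict | None:
    """Return the best-matching ThoughtTemplate, or None if nothing clears the threshold."""
    best, best_score = None, THRESHOLD
    for tpl in templates:  # in practice an ANN index replaces this linear scan
        score = cosine(query_vec, tpl["embedding"])
        if score >= best_score:
            best, best_score = tpl, score
    return best

# On a match, the template's reasoning steps are injected into the system prompt;
# on a miss, the pipeline falls through to normal reasoning.
```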
- Optimise thinking, not tokens — structured reasoning reuse beats raw compression
- Reuse reasoning patterns — ThoughtTemplates extracted from successful executions
- Predict instead of react — Markov-chain pre-loading of likely follow-ups (sketched after this list)
- Preserve intent, not text — HCG retains semantic structure rather than verbatim text
- Model-agnostic — any OpenAI-compatible provider works out of the box
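A toy sketch of that Markov-chain prediction: transitions between consecutive queries are counted, and the highest-probability successors are pre-loaded. Class and method names here are illustrative, not the actual `predictive_context_loader.py` interface:

```python
from collections import Counter, defaultdict

class NextQueryPredictor:
    """First-order Markov chain over observed query transitions."""

    def __init__(self) -> None:
        self.transitions: dict[str, Counter] = defaultdict(Counter)

    def observe(self, prev_query: str, next_query: str) -> None:
        self.transitions[prev_query][next_query] += 1

    def predict(self, query: str, k: int = 2) -> list[str]:
        """Most likely follow-ups; an empty list triggers the rule-based fallback."""
        return [q for q, _ in self.transitions[query].most_common(k)]

predictor = NextQueryPredictor()
predictor.observe("compare python and go", "what are the trade-offs?")
predictor.observe("compare python and go", "which should I choose?")
print(predictor.predict("compare python and go"))
```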
Quick start:

```
cd agentdyne9/cee
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -e ".[dev]"
cp .env.example .env
# Edit .env — set your OPENAI_API_KEY and any overrides
```

Key environment variables:
| Variable | Default | Description |
|---|---|---|
| `OPENAI_API_KEY` | (empty) | LLM provider key. Empty = stub mode (offline) |
| `OPENAI_BASE_URL` | `https://api.openai.com/v1` | Any OpenAI-compatible endpoint |
| `CEE_LLM_MODEL` | `gpt-4o-mini` | Model identifier |
| `CEE_EMBEDDING_MODEL` | `text-embedding-3-small` | Embedding model |
| `REDIS_URL` | (empty) | Redis URL; empty = in-memory fallback |
| `CEE_CHROMA_PATH` | `.cee_data/chroma` | ChromaDB persistence path |
| `CEE_MAX_TOKENS` | `4096` | Total token budget per request |
| `CEE_THOUGHT_MATCH_THRESHOLD` | `0.78` | Cosine similarity threshold for template reuse |
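These variables are loaded through Pydantic Settings (`cee/config.py`). A minimal sketch of that mapping — field names follow the table above, but the actual class layout in `config.py` may differ:

```python
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    """Environment-driven configuration; unset variables fall back to these defaults."""

    openai_api_key: str = ""                      # empty => offline stub mode
    openai_base_url: str = "https://api.openai.com/v1"
    cee_llm_model: str = "gpt-4o-mini"
    cee_embedding_model: str = "text-embedding-3-small"
    redis_url: str = ""                           # empty => in-memory cache fallback
    cee_chroma_path: str = ".cee_data/chroma"
    cee_max_tokens: int = 4096
    cee_thought_match_threshold: float = 0.78

settings = Settings()  # reads the process env (and .env, if configured to)
```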
```
cee train-templates --inline
cee serve
# or: make run
```

The server starts at http://localhost:8000; interactive docs at http://localhost:8000/docs.
```
cee --help
cee serve                        # Start API server
cee serve --port 9000 --reload   # Dev mode with auto-reload
cee run --query "Explain BFS"    # Query via running server
cee run --query "..." --inline   # Query without a server (direct pipeline)
cee run --query "..." \
    --context '[{"type":"fact","content":"...","importance":0.9}]' \
    --agent-state '{"tools":["web_search"]}'
cee train-templates --inline     # Seed 10 starter templates
cee benchmark --inline           # Run 10-query benchmark
cee health                       # Check running server health
```

Run the full CEE pipeline (`POST /v1/process`). Request:

```json
{
"query": "Compare Python and Go for building microservices.",
"context": [
{
"type": "fact",
"content": "Our team has 5 years of Python experience.",
"importance": 0.9
}
],
"agent_state": {}
}
```

Response:

```json
{
"response": "Python and Go differ in several key dimensions...",
"intent": "analysis",
"used_template": {
"id": "tmpl-a1b2c3d4",
"pattern": "compare two technologies, frameworks, or approaches",
"success_rate": 0.84,
"use_count": 47
},
"context_used": ["node-abc", "node-def"],
"tokens_saved": 2048,
"prediction_next": [
"What are the trade-offs?",
"Which should I choose for high throughput?"
],
"warnings": []
}
```

Create a new ThoughtTemplate (`POST /v1/templates`). Request:

```json
{
"pattern": "explain a sorting algorithm",
"steps": [
"Define the algorithm in one sentence.",
"Walk through the steps with an example.",
"State time and space complexity.",
"Mention best and worst cases."
],
"tags": ["educational", "algorithms"]
}
```

Health check (`GET /v1/health`) response:

```json
{
  "status": "ok",
  "version": "0.1.0",
  "template_count": 10,
  "llm_model": "gpt-4o-mini"
}
```
```
make test       # Full suite (offline/stub — no API key needed)
make test-cov   # With HTML coverage report
make test-fast  # Skip slow tests
```

Tests run fully offline using the stub LLM client and an in-memory ChromaDB instance.
```
# Start CEE + Redis
make docker-up

# View logs
make docker-logs

# Stop everything
make docker-down
```

Services:
| Service | Port | Description |
|---|---|---|
| `cee` | 8000 | CEE API server |
| `redis` | 6379 | Cache backend |
```
make dev-install   # Install all dependencies + pre-commit hooks
make fmt           # Auto-format with ruff
make lint          # Lint with ruff
make type-check    # mypy strict type checking
make test          # Run tests
make seed          # Seed starter templates
make benchmark     # Run benchmark
make run           # Start dev server with reload
make clean         # Remove all build / cache artifacts
```

```
cee/
├── cee/
│ ├── __init__.py
│ ├── config.py # Pydantic Settings
│ ├── main.py # FastAPI app factory + lifespan
│ ├── pipeline.py # CEEPipeline orchestrator
│ ├── cli.py # Typer CLI (serve/run/benchmark/train)
│ ├── models/
│ │ ├── intent.py # IntentType, DepthLevel, Intent
│ │ ├── thought_template.py # ThoughtTemplate + EMA tracking
│ │ ├── context_node.py # ContextNode + weighted importance
│ │ └── prediction.py # Prediction model
│ ├── engines/
│ │ ├── intent_detector.py # Rule + LLM fallback classifier
│ │ ├── thought_reuse_engine.py # ANN template matching
│ │ ├── hierarchical_context_graph.py # BFS + importance pruning
│ │ ├── predictive_context_loader.py # Markov-chain predictor
│ │ ├── token_budget_optimizer.py # Intent-aware budget allocation
│ │ └── validation_layer.py # Post-processing validation
│ ├── storage/
│ │ ├── vector_store.py # ChromaDB adapter
│ │ └── cache.py # Redis + in-memory fallback
│ ├── utils/
│ │ ├── embeddings.py # OpenAI embeddings + stub fallback
│ │ ├── llm_client.py # Model-agnostic completion client
│ │ └── logging.py # structlog JSON configuration
│ └── api/
│ ├── schemas.py # Pydantic request/response models
│ └── router.py # FastAPI router (/process, /templates, /health)
├── tests/
│ ├── conftest.py # Fixtures (pipeline, api_client)
│ ├── test_models.py # Unit: Intent, ThoughtTemplate, ContextNode, Prediction
│ ├── test_engines.py # Unit: IntentDetector, HCG, TokenBudget, Validation
│ ├── test_pipeline.py # Integration: end-to-end pipeline
│ └── test_api.py # Integration: HTTP endpoints via ASGI
├── scripts/
│ └── seed_templates.py # 10 starter ThoughtTemplates
├── Dockerfile
├── docker-compose.yml
├── Makefile
├── pyproject.toml
├── requirements.txt
├── .env.example
└── .gitignore
```
For a HIGH depth ANALYSIS query with `max_tokens=4096`:

```
Total budget:    4096 tokens
Depth fraction:  1.00 (HIGH)
Effective:       4096 tokens
Split (analysis weights: 40% ctx / 20% rsn / 40% rsp):
  Context graph:    1638 tokens  ← HCG compression target
  Reasoning steps:   819 tokens  ← ThoughtTemplate injection
  LLM response:     1639 tokens  ← max_tokens passed to LLM
```

For a LOW depth RAG query:

```
Total budget:    4096 tokens
Depth fraction:  0.40 (LOW)
Effective:       1638 tokens
Tokens saved:    2458 tokens (60%)
```
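The arithmetic above is easy to reproduce. A minimal sketch of the allocation logic — the fractions and weights come from the two examples, while the MEDIUM fraction and the function signature are assumptions, not the real `token_budget_optimizer.py` interface:

```python
DEPTH_FRACTION = {"low": 0.40, "medium": 0.70, "high": 1.00}  # medium is an assumed midpoint
INTENT_WEIGHTS = {"analysis": (0.40, 0.20, 0.40)}             # (context, reasoning, response)

def allocate(max_tokens: int, intent: str, depth: str) -> dict[str, int]:
    effective = int(max_tokens * DEPTH_FRACTION[depth])
    ctx_w, rsn_w, _ = INTENT_WEIGHTS[intent]
    ctx = int(effective * ctx_w)
    rsn = int(effective * rsn_w)
    rsp = effective - ctx - rsn   # remainder, so the split always sums to `effective`
    return {"context": ctx, "reasoning": rsn, "response": rsp, "saved": max_tokens - effective}

print(allocate(4096, "analysis", "high"))
# {'context': 1638, 'reasoning': 819, 'response': 1639, 'saved': 0}
```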
To add a new intent type (a sketch follows the list):

- Add a value to `IntentType` in `cee/models/intent.py`
- Add regex patterns in `cee/engines/intent_detector.py`
- Add a weight tuple in `cee/engines/token_budget_optimizer.py`
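As a rough illustration of those three touch points (enum value, rule pattern, budget weights) — the names mirror the tree layout above but are illustrative, not the project's actual definitions:

```python
import re
from enum import Enum

class IntentType(str, Enum):          # cee/models/intent.py
    RAG = "rag"
    ANALYSIS = "analysis"
    AGENT = "agent"
    CREATIVE = "creative"
    SUMMARIZATION = "summarization"   # 1. the new intent value (hypothetical)

# 2. cee/engines/intent_detector.py — rule patterns tried before the LLM fallback
PATTERNS = {
    IntentType.SUMMARIZATION: [re.compile(r"\b(summari[sz]e|tl;?dr|condense)\b", re.I)],
}

# 3. cee/engines/token_budget_optimizer.py — (context, reasoning, response) weights
INTENT_WEIGHTS = {
    IntentType.SUMMARIZATION: (0.55, 0.10, 0.35),  # assumed weights, for illustration only
}
```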
Set `OPENAI_BASE_URL` to any OpenAI-compatible endpoint:

- Mistral: `https://api.mistral.ai/v1`
- Ollama (local): `http://localhost:11434/v1`
- Anthropic via proxy: use any compatible wrapper
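For example, to sanity-check a local Ollama endpoint with the official `openai` Python client (the model name is whatever you have pulled locally):

```python
from openai import OpenAI

# Point the standard OpenAI client at a local Ollama server;
# Ollama ignores the API key, but the client requires a non-empty string.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

reply = client.chat.completions.create(
    model="llama3.1",  # any locally pulled model
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(reply.choices[0].message.content)
```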
Create a custom template via the running server:

```
curl -X POST http://localhost:8000/v1/templates \
-H "Content-Type: application/json" \
-d '{
"pattern": "my custom query pattern",
"steps": ["Step 1", "Step 2", "Step 3"],
"tags": ["custom"]
}'
```

Roadmap:

- Reinforcement learning for template quality optimisation
- Multi-agent coordination layer
- Self-evolving reasoning templates
- CCL (Content Correctness Layer) integration
- OpenTelemetry tracing spans per pipeline stage
- gRPC transport option alongside REST
Apache-2.0 © AgentDyne