OpexAgents

Cognition optimization layer for LLM agents — reduce reasoning cost, improve determinism, and accelerate execution by reusing thought patterns, predicting context, and structuring knowledge.


Architecture

User Query
    │
    ▼
┌─────────────────────┐
│   Intent Detector   │  → classifies type (rag/analysis/agent/creative)
│                     │    + depth (low/medium/high)
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Thought Reuse Engine│  → embeds query → ANN search → injects reasoning steps
│       (TRE)         │    cosine similarity ≥ 0.78 → template matched
└────────┬────────────┘
         │
         ▼
┌──────────────────────────┐
│  Token Budget Optimizer  │  → allocates tokens per component
│         (IATB)           │    intent + depth → context/reasoning/response split
└────────┬─────────────────┘
         │
         ▼
┌──────────────────────────┐
│  Hierarchical Context    │  → BFS traversal with importance pruning
│  Graph (HCG)             │    concept → entity → fact → raw
└────────┬─────────────────┘
         │
         ▼
┌──────────────────────────┐
│ Predictive Context Loader│  → Markov-chain next-query prediction
│        (PCL)             │    → rule-based fallback
└────────┬─────────────────┘
         │
         ▼
┌──────────────────────────┐
│     LLM Execution        │  → model-agnostic (OpenAI-compatible endpoint)
│   (any provider)         │    system = reasoning steps + context graph
└────────┬─────────────────┘
         │
         ▼
┌──────────────────────────┐
│   Validation Layer       │  → post-process, word count check, warnings
└────────┬─────────────────┘
         │
         ▼
ProcessResult { response, tokens_saved, used_template, prediction_next, … }
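
End to end, the pipeline takes a query plus optional context and returns a ProcessResult. Below is a minimal sketch of driving it in-process; it assumes CEEPipeline (cee/pipeline.py) exposes a process() method mirroring the /v1/process request schema, so the actual signature may differ:

from cee.pipeline import CEEPipeline

pipeline = CEEPipeline()

# Field names mirror the POST /v1/process request body (an assumption).
result = pipeline.process(
    query="Compare Python and Go for building microservices.",
    context=[{"type": "fact",
              "content": "Our team has 5 years of Python experience.",
              "importance": 0.9}],
    agent_state={},
)

print(result.response)         # final LLM answer
print(result.tokens_saved)     # budget reclaimed vs. sending everything raw
print(result.prediction_next)  # likely follow-up queries from the PCL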

Core Principles

  1. Optimise thinking, not tokens — structured reasoning reuse beats raw compression
  2. Reuse reasoning patterns — ThoughtTemplates extracted from successful executions
  3. Predict instead of react — Markov-chain pre-loading of likely follow-ups (see the sketch after this list)
  4. Preserve intent, not text — HCG retains semantic structure, not raw verbatim
  5. Model-agnostic — any OpenAI-compatible provider works out of the box
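
To make principle 3 concrete, here is a toy first-order Markov predictor over query kinds. This is illustrative only, not the PCL implementation in cee/engines/predictive_context_loader.py:

from collections import Counter, defaultdict

class ToyQueryPredictor:
    """First-order Markov chain over observed query kinds."""

    def __init__(self) -> None:
        self.transitions: dict[str, Counter] = defaultdict(Counter)
        self.previous: str | None = None

    def observe(self, kind: str) -> None:
        # Count the transition previous -> current.
        if self.previous is not None:
            self.transitions[self.previous][kind] += 1
        self.previous = kind

    def predict_next(self, k: int = 2) -> list[str]:
        # Most likely follow-ups after the last observed kind.
        if self.previous is None:
            return []
        return [kind for kind, _ in self.transitions[self.previous].most_common(k)]

predictor = ToyQueryPredictor()
for kind in ["compare", "tradeoffs", "compare", "choose", "compare"]:
    predictor.observe(kind)
print(predictor.predict_next())  # ['tradeoffs', 'choose']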

Quickstart

1. Install

cd agentdyne9/cee
python -m venv .venv
source .venv/bin/activate          # Windows: .venv\Scripts\activate
pip install -e ".[dev]"

2. Configure

cp .env.example .env
# Edit .env — set your OPENAI_API_KEY and any overrides

Key environment variables:

Variable                       Default                      Description
OPENAI_API_KEY                 (empty)                      LLM provider key. Empty = stub mode (offline)
OPENAI_BASE_URL                https://api.openai.com/v1    Any OpenAI-compatible endpoint
CEE_LLM_MODEL                  gpt-4o-mini                  Model identifier
CEE_EMBEDDING_MODEL            text-embedding-3-small       Embedding model
REDIS_URL                      (empty)                      Redis URL; empty = in-memory fallback
CEE_CHROMA_PATH                .cee_data/chroma             ChromaDB persistence path
CEE_MAX_TOKENS                 4096                         Total token budget per request
CEE_THOUGHT_MATCH_THRESHOLD    0.78                         Cosine similarity threshold for template reuse
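
The CEE_THOUGHT_MATCH_THRESHOLD gate compares the query embedding against stored template embeddings. An illustrative check, where templates are assumed to be plain dicts with an "embedding" field (the real lookup goes through the ANN index in the Thought Reuse Engine):

import math

THRESHOLD = 0.78  # CEE_THOUGHT_MATCH_THRESHOLD default

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def best_template(query_vec: list[float], templates: list[dict]) -> dict | None:
    """Return the closest template, or None if nothing clears the threshold."""
    score, template = max(
        ((cosine_similarity(query_vec, t["embedding"]), t) for t in templates),
        key=lambda pair: pair[0],
    )
    return template if score >= THRESHOLD else None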

3. Seed starter templates

cee train-templates --inline

4. Start the server

cee serve
# or: make run

The server starts at http://localhost:8000; interactive docs are at http://localhost:8000/docs.


CLI Reference

cee --help

cee serve                          # Start API server
cee serve --port 9000 --reload     # Dev mode with auto-reload

cee run --query "Explain BFS"      # Query via running server
cee run --query "..." --inline     # Query without a server (direct pipeline)
cee run --query "..." \
    --context '[{"type":"fact","content":"...","importance":0.9}]' \
    --agent-state '{"tools":["web_search"]}'

cee train-templates --inline       # Seed 10 starter templates
cee benchmark --inline             # Run 10-query benchmark
cee health                         # Check running server health

API Reference

POST /v1/process

Run the full CEE pipeline.

Request:

{
  "query": "Compare Python and Go for building microservices.",
  "context": [
    {
      "type": "fact",
      "content": "Our team has 5 years of Python experience.",
      "importance": 0.9
    }
  ],
  "agent_state": {}
}

Response:

{
  "response": "Python and Go differ in several key dimensions...",
  "intent": "analysis",
  "used_template": {
    "id": "tmpl-a1b2c3d4",
    "pattern": "compare two technologies, frameworks, or approaches",
    "success_rate": 0.84,
    "use_count": 47
  },
  "context_used": ["node-abc", "node-def"],
  "tokens_saved": 2048,
  "prediction_next": [
    "What are the trade-offs?",
    "Which should I choose for high throughput?"
  ],
  "warnings": []
}
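
The same call from Python, using httpx (any HTTP client works):

import httpx

payload = {
    "query": "Compare Python and Go for building microservices.",
    "context": [{"type": "fact",
                 "content": "Our team has 5 years of Python experience.",
                 "importance": 0.9}],
    "agent_state": {},
}

resp = httpx.post("http://localhost:8000/v1/process", json=payload, timeout=60.0)
resp.raise_for_status()
print(resp.json()["tokens_saved"])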

POST /v1/templates

Create a new ThoughtTemplate.

{
  "pattern": "explain a sorting algorithm",
  "steps": [
    "Define the algorithm in one sentence.",
    "Walk through the steps with an example.",
    "State time and space complexity.",
    "Mention best and worst cases."
  ],
  "tags": ["educational", "algorithms"]
}

GET /v1/health

{
  "status": "ok",
  "version": "0.1.0",
  "template_count": 10,
  "llm_model": "gpt-4o-mini"
}

Testing

make test              # Full suite (offline/stub — no API key needed)
make test-cov          # With HTML coverage report
make test-fast         # Skip slow tests

Tests run fully offline using the stub LLM client and an in-memory ChromaDB instance.


Docker

# Start CEE + Redis
make docker-up

# View logs
make docker-logs

# Stop everything
make docker-down

Services:

Service   Port   Description
cee       8000   CEE API server
redis     6379   Cache backend

Developer Workflow

make dev-install       # Install all dependencies + pre-commit hooks
make fmt               # Auto-format with ruff
make lint              # Lint with ruff
make type-check        # mypy strict type checking
make test              # Run tests
make seed              # Seed starter templates
make benchmark         # Run benchmark
make run               # Start dev server with reload
make clean             # Remove all build / cache artifacts

Project Structure

cee/
├── cee/
│   ├── __init__.py
│   ├── config.py               # Pydantic Settings
│   ├── main.py                 # FastAPI app factory + lifespan
│   ├── pipeline.py             # CEEPipeline orchestrator
│   ├── cli.py                  # Typer CLI (serve/run/benchmark/train)
│   ├── models/
│   │   ├── intent.py           # IntentType, DepthLevel, Intent
│   │   ├── thought_template.py # ThoughtTemplate + EMA tracking
│   │   ├── context_node.py     # ContextNode + weighted importance
│   │   └── prediction.py       # Prediction model
│   ├── engines/
│   │   ├── intent_detector.py          # Rule + LLM fallback classifier
│   │   ├── thought_reuse_engine.py     # ANN template matching
│   │   ├── hierarchical_context_graph.py # BFS + importance pruning
│   │   ├── predictive_context_loader.py  # Markov-chain predictor
│   │   ├── token_budget_optimizer.py     # Intent-aware budget allocation
│   │   └── validation_layer.py           # Post-processing validation
│   ├── storage/
│   │   ├── vector_store.py     # ChromaDB adapter
│   │   └── cache.py            # Redis + in-memory fallback
│   ├── utils/
│   │   ├── embeddings.py       # OpenAI embeddings + stub fallback
│   │   ├── llm_client.py       # Model-agnostic completion client
│   │   └── logging.py          # structlog JSON configuration
│   └── api/
│       ├── schemas.py          # Pydantic request/response models
│       └── router.py           # FastAPI router (/process, /templates, /health)
├── tests/
│   ├── conftest.py             # Fixtures (pipeline, api_client)
│   ├── test_models.py          # Unit: Intent, ThoughtTemplate, ContextNode, Prediction
│   ├── test_engines.py         # Unit: IntentDetector, HCG, TokenBudget, Validation
│   ├── test_pipeline.py        # Integration: end-to-end pipeline
│   └── test_api.py             # Integration: HTTP endpoints via ASGI
├── scripts/
│   └── seed_templates.py       # 10 starter ThoughtTemplates
├── Dockerfile
├── docker-compose.yml
├── Makefile
├── pyproject.toml
├── requirements.txt
├── .env.example
└── .gitignore

Data Flow: Token Budget Example

For a HIGH depth ANALYSIS query with max_tokens=4096:

Total budget:    4096 tokens
Depth fraction:  1.00  (HIGH)
Effective:       4096 tokens

Split (analysis weights: 40% ctx / 20% rsn / 40% rsp):
  Context graph:       1638 tokens  ← HCG compression target
  Reasoning steps:      819 tokens  ← ThoughtTemplate injection
  LLM response:        1639 tokens  ← max_tokens passed to LLM

For a LOW depth RAG query:

Total budget:    4096 tokens
Depth fraction:  0.40  (LOW)
Effective:       1638 tokens
Tokens saved:    2458 tokens (60%)
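
Both splits are plain arithmetic over the depth fraction and the per-intent weights. A sketch reproducing the numbers above (the optimizer's exact rounding and the fractions for other depths are assumptions):

ANALYSIS_WEIGHTS = (0.40, 0.20, 0.40)          # context / reasoning / response
DEPTH_FRACTION = {"low": 0.40, "high": 1.00}   # fractions shown in the example

def split_budget(max_tokens: int, depth: str, weights=ANALYSIS_WEIGHTS):
    effective = int(max_tokens * DEPTH_FRACTION[depth])
    ctx = int(effective * weights[0])
    rsn = int(effective * weights[1])
    rsp = effective - ctx - rsn                # remainder goes to the response
    return effective, ctx, rsn, rsp

print(split_budget(4096, "high"))  # (4096, 1638, 819, 1639)
print(split_budget(4096, "low"))   # effective 1638 -> 2458 tokens saved (60%)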

Extending CEE

Add a new intent type

  1. Add a value to IntentType in cee/models/intent.py
  2. Add regex patterns in cee/engines/intent_detector.py
  3. Add a weight tuple in cee/engines/token_budget_optimizer.py (see the sketch below)
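
A hedged sketch of steps 1 and 3. The existing enum values come from the Intent Detector diagram above; SUMMARIZATION and the INTENT_WEIGHTS mapping are hypothetical illustrations, not the module's actual shape:

# cee/models/intent.py  (step 1)
from enum import Enum

class IntentType(str, Enum):
    RAG = "rag"
    ANALYSIS = "analysis"
    AGENT = "agent"
    CREATIVE = "creative"
    SUMMARIZATION = "summarization"  # the new intent (hypothetical)

# cee/engines/token_budget_optimizer.py  (step 3)
# (context, reasoning, response) fractions; values are illustrative.
INTENT_WEIGHTS = {
    IntentType.SUMMARIZATION: (0.55, 0.10, 0.35),
}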

Swap the LLM provider

Set OPENAI_BASE_URL to any OpenAI-compatible endpoint:

  • Mistral: https://api.mistral.ai/v1
  • Ollama (local): http://localhost:11434/v1 (see the .env example below)
  • Anthropic via proxy: use any compatible wrapper
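
For example, pointing CEE at a local Ollama instance via .env (the model name is illustrative; any non-empty key avoids stub mode):

# .env
OPENAI_API_KEY=not-used-by-ollama
OPENAI_BASE_URL=http://localhost:11434/v1
CEE_LLM_MODEL=llama3.1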

Add a custom reasoning template

curl -X POST http://localhost:8000/v1/templates \
  -H "Content-Type: application/json" \
  -d '{
    "pattern": "my custom query pattern",
    "steps": ["Step 1", "Step 2", "Step 3"],
    "tags": ["custom"]
  }'

Roadmap

  • Reinforcement learning for template quality optimisation
  • Multi-agent coordination layer
  • Self-evolving reasoning templates
  • CCL (Content Correctness Layer) integration
  • OpenTelemetry tracing spans per pipeline stage
  • gRPC transport option alongside REST

License

Apache-2.0 © AgentDyne
