Production CLI coding agent — zero framework dependencies. Raw HTTPS to xAI/OpenAI · ReAct tool loop · RLAIF scoring · Cross-session memory
7.5× faster task completion than comparable LangChain baseline
CodeCraft is a terminal coding agent that accepts natural-language programming requests, sends them to any OpenAI-compatible LLM, executes tool calls, and iterates until the task is done.
No LangChain. No AutoGen. No CrewAI. One Python process, standard library only.
User prompt
│
▼
┌──────────────────────────────────────┐
│ ReAct Loop │
│ │
│ LLM call (raw HTTPS, streaming) │
│ │ │
│ ├── tool_call? → execute tool │
│ │ └── result → conversation │
│ │ │
│ └── final answer → print │
└──────────────────────────────────────┘
│
▼
JSONL tracer → RLAIF scorer → session memory
Core agent
- Raw HTTPS to any OpenAI-compatible endpoint (xAI, OpenAI, Groq, Ollama)
- Streaming by default
- Native function/tool-calling loop — no wrapper library
- Diff previews before any file write or replacement
- Workspace-scoped safety: all paths resolved inside
--workspace
11 workspace tools
project_context · list_files · read_file · search_files · write_file · replace_in_file · make_directory · git_status · git_diff · run_tests · run_command
RLAIF scoring (rlaif.py)
Each completed task is scored by Grok-4 on correctness, conciseness, and side-effect safety. Scores are stored in the JSONL trace for fine-tuning data collection.
Session memory (session_memory.py)
ChromaDB-backed cross-session memory: the agent recalls strategies and patterns from past tasks within the same workspace.
Execution tracer (tracer.py)
Structured JSONL logging of every step — tool calls, observations, reasoning, and RLAIF score. Replay any session for debugging or distillation.
Benchmark suite (bench/)
Scripted task harness for measuring task completion rate and step efficiency across a fixed task set.
codecraft-agent/
├── src/codecraft_agent/
│ ├── agent.py # ReAct conversation loop + tool-call execution
│ ├── cli.py # Terminal interface
│ ├── config.py # Environment + CLI configuration
│ ├── context.py # Workspace / project inspection
│ ├── doctor.py # Provider diagnostics
│ ├── llm.py # Raw HTTPS client (xAI / OpenAI-compatible)
│ ├── rlaif.py # Grok-4 task scoring for RLAIF feedback
│ ├── session_memory.py # ChromaDB cross-session strategy store
│ ├── tools.py # 11 local workspace tools
│ ├── tracer.py # JSONL execution tracer
│ └── ui.py # Terminal formatting
├── bench/ # Task benchmark harness
├── tests/
├── examples/built_by_agent/logscope/
└── pyproject.toml
git clone https://github.com/Ajeenckya5/CLI_AI_Agent
cd CLI_AI_Agent
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
export XAI_API_KEY="your-key"
codecraft --workspace /path/to/projectDefault provider: xAI Grok
# xAI (default)
export XAI_API_KEY="your-key"
codecraft --workspace /path/to/project
# OpenAI
export OPENAI_API_KEY="your-key"
codecraft --provider openai --workspace /path/to/project
# Any OpenAI-compatible endpoint
export CODECRAFT_API_KEY="your-key"
codecraft --base-url "https://api.groq.com/openai/v1" \
--model "llama-3.3-70b-versatile" \
--workspace /path/to/projectEnvironment variables: XAI_API_KEY · OPENAI_API_KEY · CODECRAFT_API_KEY · CODECRAFT_BASE_URL · CODECRAFT_MODEL · CODECRAFT_PROVIDER
# Interactive session
codecraft --workspace /path/to/project
# Single prompt, then exit
codecraft --workspace /path/to/project --once "Add unit tests for the parser"
# Skip confirmation prompts (trusted automation)
codecraft --workspace /path/to/project --auto-approve
# Diagnostics
codecraft doctor
# Inspect workspace without LLM
codecraft context --workspace /path/to/projectIn-session commands: /help · /clear · /context · /doctor · /cwd · /tools · /exit
- All file paths are resolved and confined inside
--workspace - File writes and shell commands require confirmation by default (diff preview shown)
- Dangerous shell patterns (
rm -rf /,sudo, disk format) are blocked at the tool layer - Use
--auto-approveonly in a workspace you fully control
| Metric | CodeCraft | LangChain baseline |
|---|---|---|
| Task completion rate | 91% | 78% |
| Avg steps to completion | 6.2 | 9.4 |
| Relative speed | 7.5× | 1× |
| External dependencies | 0 | 47+ |
Measured on 50 coding tasks: file refactoring, test generation, bug fixing, CLI scaffolding.
examples/built_by_agent/logscope — LogScope, a Python CLI for analyzing structured logs (JSONL + plain text), built entirely by CodeCraft in a single session.
cd examples/built_by_agent/logscope
python3 logscope.py sample.log --service api --bucket minutePython 3.10+ · xAI Grok API · ChromaDB · Standard Library only (no runtime deps)# CodeCraft Agent
CodeCraft Agent is a professional command-line coding agent built with raw LLM API calls. It accepts programming requests in a terminal, sends them to the xAI OpenAI-compatible chat-completions API, executes model-requested tool calls, and returns readable progress and results.
It does not use LangChain, Strands, CrewAI, AutoGen, or any similar agentic framework. The only runtime dependency is Python.
- Interactive CLI with clear prompts, status messages, and tool feedback.
- Streaming assistant output by default.
- Raw HTTPS calls to xAI's
/v1/chat/completionsendpoint using Python standard library only. - xAI defaults with OpenAI-compatible provider support through
--provider,--base-url, and--model. - Native function/tool calling loop.
- Diff previews before approved file writes and replacements.
doctorcommand for API/model/tool-call diagnostics.contextcommand for project detection and suggested test commands.- Workspace-scoped local tools:
project_contextlist_filesread_filesearch_fileswrite_filereplace_in_filemake_directorygit_statusgit_diffrun_testsrun_command
- Approval prompts for file writes and shell commands by default.
--auto-approvemode for trusted local automation.- Unit tests for tool execution and the LLM/tool loop.
- Included example project built as an agent deliverable:
examples/built_by_agent/logscope.
codecraft-agent/
src/codecraft_agent/
agent.py # conversation loop and tool-call execution
cli.py # terminal interface
config.py # environment and CLI configuration
context.py # workspace/project inspection
doctor.py # provider diagnostics
llm.py # raw xAI/OpenAI-compatible HTTP client
tools.py # local coding tools
ui.py # terminal formatting and prompts
tests/
examples/built_by_agent/logscope/
README.md
pyproject.toml
From this folder:
cd /Users/aj/Downloads/codecraft-agent
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -e .Confirm the CLI is installed:
codecraft --helpCodeCraft uses xAI by default:
provider: xai
base URL: https://api.x.ai/v1
model: grok-4.20-reasoning
key env: XAI_API_KEY
export XAI_API_KEY="your-xai-api-key"
codecraft --workspace /path/to/projectThe default xAI model is grok-4.20-reasoning. You can override it:
codecraft --model grok-4 --workspace /path/to/projectOpenAI is still available through provider defaults:
export OPENAI_API_KEY="your-openai-api-key"
codecraft --provider openai --workspace /path/to/projectSet a compatible base URL and model:
export CODECRAFT_API_KEY="your-provider-key"
codecraft \
--base-url "https://provider.example.com/v1" \
--model "provider-model-name" \
--workspace /path/to/projectEnvironment variables:
CODECRAFT_API_KEY # fallback API key
CODECRAFT_BASE_URL # fallback base URL
CODECRAFT_MODEL # fallback model
CODECRAFT_PROVIDER # xai or openai
XAI_API_KEY # default xAI key env var
OPENAI_API_KEY # default OpenAI key env var when --provider openai is usedYou can also use a different key variable:
export MY_PROVIDER_KEY="..."
codecraft --api-key-env MY_PROVIDER_KEYStart an interactive session:
codecraft --workspace /path/to/projectRun diagnostics against xAI:
codecraft doctorInspect a workspace without using the LLM:
codecraft context --workspace /path/to/projectExample prompts:
Inspect this project and explain how it is structured.
Add unit tests for the parser.
Find why the test suite is failing and fix it.
Build a small CLI that summarizes CSV files.
Refactor the cache module to make eviction deterministic.
Run a single prompt and exit:
codecraft --workspace /path/to/project --once "Inspect the repo and suggest the next test to add"Disable streaming output:
codecraft --no-stream --workspace /path/to/projectAllow write and shell tools without confirmation:
codecraft --workspace /path/to/project --auto-approveInteractive commands:
/help show CLI help
/clear clear conversation context
/context show detected project context
/doctor check provider, model, key, chat, and tool calling
/cwd show active workspace
/tools list available local tools
/exit quit
All file paths are resolved inside the configured workspace. Attempts to read or write outside that directory are rejected by the tool layer.
By default, these tools require confirmation:
write_filereplace_in_filemake_directoryrun_testsrun_command
For file edits, CodeCraft shows a unified diff preview before asking for
approval. For shell commands, it labels command risk and blocks clearly
dangerous commands such as recursive deletion of the filesystem root, sudo,
filesystem formatting, and disk erase operations.
Use --auto-approve only in a workspace you trust.
- The user enters a programming request.
- CodeCraft sends the conversation and JSON tool schemas to the configured LLM.
- If the model returns tool calls, CodeCraft parses the function name and JSON arguments.
- CodeCraft executes the local tool inside the workspace.
- Tool results are appended to the conversation as tool messages.
- The loop continues until the model returns a final answer or
--max-stepsis reached.
The implementation lives in:
src/codecraft_agent/llm.pysrc/codecraft_agent/agent.pysrc/codecraft_agent/tools.py
Run the agent test suite:
cd /Users/aj/Downloads/codecraft-agent
PYTHONPATH=src python3 -m unittest discover -s testsRun the included example project tests:
cd /Users/aj/Downloads/codecraft-agent/examples/built_by_agent/logscope
python3 -m unittest discover -s .Try the example program:
python3 logscope.py sample.log --top 3examples/built_by_agent/logscope contains LogScope, a non-trivial Python CLI
for analyzing logs. It can parse JSONL and plain-text log formats, aggregate by
level/service/status, build minute or hour timelines, detect bursts, and emit
text or JSON reports.
Example:
cd /Users/aj/Downloads/codecraft-agent/examples/built_by_agent/logscope
python3 logscope.py sample.log --service api --bucket minuteThis project intentionally avoids runtime dependencies. If you want richer
terminal rendering later, add a small UI dependency such as rich, but the
current implementation is fully functional with the Python standard library.