A small Go agent that talks to a local llama.cpp server (OpenAI-compatible HTTP API), runs a tool-using loop, and can accept input from a terminal CLI, Telegram, or a periodic cron “heartbeat”. This is a hobby project; do not rely on it for production or security-sensitive workloads.
- LLM backend — `llamacpp` provider: completions and streaming, including optional reasoning/thinking deltas when the server exposes them.
- Supervisor + sub-agents — The main agent can delegate via `spawn_agent` to a pool of specialised agents loaded from `~/.micro-agent/agents`, with LLM-based routing.
- Tools — Filesystem (`list_dir`, `read_file`, `write_file`, `append_file`, `edit_file`), optional `shell_exec`, long-term memory tools (`memory_save`, `memory_search`, `memory_delete`) when a store is available, HTTP `web_fetch`, and Browserless-backed `browser_search`/`browser_content` when `BROWSERLESS_URL` is set. `telegram_send` is registered when `TELEGRAM_BOT_TOKEN` is set.
- Long-term memory — Vector store backed by Milvus (embeddings from the same llama-server). If embeddings are disabled (`memory.embed`/`MEMORY_EMBED`) or Milvus is unreachable, the agent still runs with session memory only (in-process session store).
- Channels — Interactive readline CLI (with streamed thinking/reply styling and `/attach <path>` to queue document files for the next message), Telegram long polling (user document messages are downloaded and turned into text attachments), HTTP channel with an embedded browser chat UI (JSON or multipart uploads, SSE streaming), and an optional cron channel that injects periodic ticks from a heartbeat file (uses SQLite for tick state).
- Session handling — Per-channel session keys, plus an optional conversation tree on the CLI for branching sessions.
- Context control — Configurable message limits and compaction (`truncate` or `summarize` against a token threshold).
- Logging — `-v`/`-vv`/`-vvv` verbosity; optional log file; daemon mode can log to stderr and file together.
- Safety toggles — `--safe` (or `--no-fs`, `--no-web`, `--no-spawn`) to strip destructive filesystem tools, browser/fetch tools, and sub-agent spawning.
More detail lives under `docs/` (config, channels, memory, tools, multi-agent, core).
- Go 1.25+ (see `go.mod`).
- llama-server (or a compatible OpenAI-style server) reachable at the URL in config (default `http://localhost:8080`).
- Long-term memory (optional) — Running Milvus and `memory.embed: true` with a valid `milvus_addr` (see `examples/config.json`). Omit or disable embeddings for a simpler, session-only setup.
- Browserless (optional) — For JS-rendered search/content; the compose file includes a `browserless` service.
```shell
git clone https://github.com/offdev/micro-agent.git
cd micro-agent
go build -o ua ./cmd/ua
```

Install the binary wherever you prefer (e.g. `mv ua ~/bin/`). Ensure llama-server is running and points at your model.
- Default config path: `$UA_CONFIG` or `~/.micro-agent/config.json`. A missing file is OK; environment variables override file values.
- Copy and edit `examples/config.json` as a starting point.
- Optional prompts: `SYSTEM.md` next to your workdir parent (`~/.micro-agent/SYSTEM.md` by default), `AGENTS.md` under the workdir, and per-agent definitions under `~/.micro-agent/agents/`.
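For orientation, a minimal config might look like the sketch below. Only `memory.embed` and `milvus_addr` appear verbatim in this README; the other key names are guesses, so check `examples/config.json` for the authoritative schema before using this:

```json
{
  "llama_url": "http://localhost:8080",
  "workdir": "~/.micro-agent/work",
  "milvus_addr": "localhost:19530",
  "memory": {
    "embed": true
  }
}
```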
Important environment variables (see also comments in `cmd/ua/main.go`):
| Variable | Role |
|---|---|
| `LLAMA_URL` | Base URL of llama-server |
| `WORKDIR` | Process working directory for tools |
| `MEMORY_EMBED` | `false` to skip embeddings / long-term memory store |
| `MILVUS_ADDR` | Milvus gRPC address (e.g. `localhost:19530`) |
| `TELEGRAM_BOT_TOKEN` | Enables Telegram channel + tool |
| `HTTP_CHANNEL_ENABLED` | Enables HTTP channel + embedded chat UI |
| `HTTP_CHANNEL_LISTEN` | HTTP listen address (default `127.0.0.1:8765`) |
| `HTTP_CHANNEL_TOKEN` | Optional shared secret (`X-UA-Token`) for `/api/chat` |
| `BROWSERLESS_URL` | Browserless HTTP URL for browser tools |
| `CRON_ENABLED` | Enables heartbeat channel (often with `--daemon`) |
From the repository root after `go build`:

- `./ua` — With a TTY, you get the CLI by default. Use `--daemon` for headless mode (requires another channel such as Telegram or cron, or the process will exit with “no channels configured”).
- Use `--interactive` to force the CLI when stdin is not a TTY.
- For browser chat, set `HTTP_CHANNEL_ENABLED=true` and open `http://127.0.0.1:8765` (or your configured listen address).
- Streaming output uses SSE events (`thinking`, `delta`, `error`, `done`) in arrival order.
- Long lines and long tokens wrap cleanly; horizontal scrolling is disabled.
- Input controls: `Enter` sends the message, `Shift+Enter` inserts a newline.
- Response whitespace is preserved (including leading spaces in streamed deltas).
- Attachments — “Attach documents” sends a multipart `POST /api/chat` with `session_id`, `message`, and repeated `file` parts (plain text, PDF, images for vision, and other types per `docs/channel.md`). With no files selected, the UI uses JSON as before.
- If the HTTP channel is protected with `HTTP_CHANNEL_TOKEN`, set `localStorage.ua_http_token` in the browser devtools (or equivalent) so requests include `X-UA-Token`.
- Run `/attach /path/to/file` (quote paths with spaces) one or more times, then send your message on the next line. Pending paths are applied to that message only. Errors for a given file go to stderr.
Safer exploration (read-only style: no writes/edits/shell, no browser/fetch, no sub-agents):
```shell
./ua --safe
```

Verbosity: `-v`, `-vv`, `-vvv`.
The stack is defined in `docker-compose.yml` at the repository root. It builds the agent from `deploy/Dockerfile`, brings up Milvus (etcd, minio, standalone), optional Browserless, optional Attu, and runs the `ua` service with `network_mode: host` so the agent can reach llama-server and Milvus on localhost alongside the host.
Before first run on the host:

```shell
mkdir -p ~/.micro-agent
cp examples/config.json ~/.micro-agent/config.json
# Edit milvus_addr, model, URLs as needed.
touch ~/.micro-agent/SYSTEM.md  # or add real content; compose mounts it read-only
```

Build and start (from the repository root):

```shell
docker compose up --build
```

- llama-server is not in this compose file; run it on the host (or elsewhere) and set `LLAMA_URL` in the `ua` service environment if it is not `http://localhost:8080`.
- The `ua` service mounts `${HOME}/.micro-agent` into the container so the workdir, DB paths, and state stay on the host; config and `SYSTEM.md` are mounted read-only as in the compose file.
Running more safely in Docker — Prefer a restricted tool set and avoid mounting sensitive host paths beyond a dedicated agent directory. You can override the container command, for example:
```shell
docker compose run --rm ua ./ua --interactive --safe
```

(Adjust flags for `--daemon` + Telegram/cron if you do not use an interactive TTY.)
Because the optional stack uses host networking and privileged-adjacent services, treat this as local experimentation only, not an isolation boundary for untrusted code.
Execution is centred on `internal/core.Agent.Run`. The application wraps a supervisor (`internal/multiagent`) around that same loop and feeds it messages from channels (`internal/app`).
- If `Instructions` (system prompt) is non-empty and the conversation does not already start with a `system` message, the system prompt is prepended.
- A name → tool map is built once per `Run` for lookups.
- The callback chain runs `BeforeAgentLoop` (e.g. logging hooks when verbosity is enabled).
Each loop iteration:
1. `BeforeLLMCall` — Callbacks may transform the message list sent to the model.
2. Tool definitions — Current tools are serialized to the provider once for this turn.
3. LLM call — Either:
   - Streaming — If a `streamConsumer` is set (CLI/Telegram path), the provider's `Stream` is used; each `Delta` can carry incremental text, a `Thinking` flag for reasoning-only chunks, and a terminal `Done` with `Final` holding the full assistant turn (content + tool calls). Only the final assistant content is persisted; thinking streams are for display only.
   - Non-streaming — `Complete` returns a single `Response`.
4. Empty-response handling — If the model returns no text and no tool calls, the loop injects a short synthetic user nudge and retries, up to a small fixed number of times; otherwise `Run` returns an error (empty turns are not appended).
5. `AfterLLMCall` — Callbacks observe the response; errors propagate.
6. Persist assistant message — The assistant message (content + `tool_calls`, if any) is appended to the conversation.
7. Exit if no tools — If there are no tool calls, the outer loop ends.
8. Tool execution — Otherwise, each tool call is processed concurrently (goroutines + `WaitGroup`):
   - `BeforeToolExecution` may adjust or reject the call.
   - Unknown tools produce an error result string; known tools run `Execute`.
   - `AfterToolExecution` runs for logging/metrics; errors from this hook are ignored by design.
9. Tool results — For each call, a `tool` role message is appended (content or `error: …`), linked by `tool_call_id` and name.
- After the loop finishes, `AfterAgentLoop` runs on the callback chain.
- Sessions — `SessionStore` holds per-session `Conversation` state; each inbound message appends a user message, then `Supervisor.Run` (same as `Agent.Run`) runs the loop.
- Compaction — After a successful run, if the strategy is `summarize` and estimated tokens exceed the threshold, the app may replace the conversation with a summarised version before storing it back.
- Sub-agents — The supervisor's agent has `spawn_agent`; sub-agents in the pool do not get `spawn_agent`, avoiding unbounded recursion.
See LICENSE.