fluid-chey/visiontree

Voicetree

Obsidian meets Claude Code

Voicetree is an interactive graph view where nodes are either markdown notes or terminal-based agents (Claude Code, Codex, OpenCode, Gemini, etc.).

Agents can spawn their own subagents onto the graph. Agents will have the nearby nodes injected into their context. Agents are also able to edit and create their own nodes.

This project aims to build, from first principles, the most efficient possible human-AI interaction system.

Voicetree Demo


Why?

| Challenge | Voicetree Solution |
| --- | --- |
| Manual agent coordination | Agents can break down tasks into subgraphs and recursively spawn child terminals |
| Managing 4-10 agent terminals is overwhelming | Spatially organise agents, tasks and progress on the graph |
| Agents don't know what you know | You share the same memory graph with agents |
| Agents suffer context rot and lack memory | Defaults to short, focused sessions with automatic handover |

Install

Download links: macOS (Apple Silicon) | macOS (Intel) | Windows | Linux

macOS

brew tap voicetreelab/voicetree && brew install voicetree

Linux

curl -fsSL https://raw.githubusercontent.com/voicetreelab/voicetree/main/install.sh | sh

Windows: https://github.com/voicetreelab/voicetree/releases/latest/download/voicetree.exe


How It Works

Your agents (Claude Code, Codex, OpenCode, Gemini, etc.) live inside the graph, next to their tasks, plans, and progress updates.

Context retrieval: Agents see all nodes within a configurable radius and can run semantic search against local embeddings.

Spatial layout: Location-based memory is the most efficient way to remember things.

Externalized working memory: Each node represents a concept at any level of abstraction. The graph structure mirrors your mental model - relationships between ideas are represented exactly as you think about them, offloading cognitive load to the canvas.
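The semantic search mentioned under context retrieval can be pictured as ranking notes by vector similarity. The following is only a minimal sketch: the three-dimensional vectors, the `index` dictionary, and the `search` helper are all hypothetical, and Voicetree's actual embedding model and storage are not described here.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec: list[float], index: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Return the top_k note paths whose embeddings are most similar to the query."""
    ranked = sorted(index, key=lambda path: cosine(query_vec, index[path]), reverse=True)
    return ranked[:top_k]

# Toy embeddings, invented for illustration; real embeddings have hundreds of dimensions.
index = {
    "auth.md": [0.9, 0.1, 0.0],
    "billing.md": [0.1, 0.9, 0.1],
    "login-ui.md": [0.8, 0.2, 0.1],
}
results = search([1.0, 0.0, 0.0], index)
```

Here the query vector points along the first axis, so the two notes that lean the same way rank ahead of `billing.md`.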

In Detail

Nodes are markdown files, and connections are wikilinks to the .md file paths. You open rich markdown editors directly within the graph by hovering over a node (or use speech-to-graph mode).
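To make the "markdown files plus wikilinks" model concrete, here is a minimal sketch of turning a set of notes into an adjacency list. It assumes Obsidian-style `[[target]]`, `[[target|alias]]`, and `[[target#heading]]` links; the exact link syntax Voicetree writes, and the `extract_links` and `build_graph` helpers, are assumptions made for illustration.

```python
import re

# Matches [[target]], [[target|alias]] and [[target#heading]]; captures only the target.
WIKILINK = re.compile(r"\[\[([^\]\|#]+)(?:#[^\]\|]*)?(?:\|[^\]]*)?\]\]")

def extract_links(markdown: str) -> list[str]:
    """Return wikilink targets in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for match in WIKILINK.finditer(markdown):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)

def build_graph(notes: dict[str, str]) -> dict[str, list[str]]:
    """Map each note path to the existing note paths it links to."""
    return {
        path: [target for target in extract_links(body) if target in notes]
        for path, body in notes.items()
    }

# Hypothetical notes; links to files that do not exist are dropped.
notes = {
    "plan.md": "Split into [[data-model.md]] and [[ui.md|UI work]].",
    "data-model.md": "Feeds [[ui.md#props]]. See also [[missing.md]].",
    "ui.md": "No outgoing links.",
}
graph = build_graph(notes)
```

Keeping the graph derivable from plain files means any editor or agent that can write markdown can also add nodes and edges.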

You can spawn coding agents on a node. The contents of that node become the agent's task, and the agent is also given all context within an adjustable distance of the node and can run semantic search against local embeddings. This means agents see what you see. You share the same memory, the same second brain. The graph structure lets context retrieval target only what is most relevant rather than dumping the entire conversation history, avoiding the 30-60% performance gap reported between focused and full-context prompts [1].
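One way to read "all context within an adjustable distance" is a breadth-first walk outward from the agent's node. The sketch below assumes distance means hop count and that links are followed in both directions; both are assumptions, as is the `nodes_within` helper, since the real distance metric could equally be spatial.

```python
from collections import deque

def nodes_within(graph: dict[str, list[str]], start: str, radius: int) -> dict[str, int]:
    """Return {node: hop distance} for every node at most `radius` hops from `start`."""
    # Build an undirected neighbour map so backlinks count as context too.
    neighbours: dict[str, set[str]] = {}
    for src, targets in graph.items():
        neighbours.setdefault(src, set())
        for dst in targets:
            neighbours[src].add(dst)
            neighbours.setdefault(dst, set()).add(src)

    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == radius:
            continue  # do not expand beyond the radius
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in distances:
                distances[nxt] = distances[node] + 1
                queue.append(nxt)
    return distances

# Hypothetical graph: task -> plan -> data-model -> schema, plus a note linking to the task.
graph = {
    "task.md": ["plan.md"],
    "plan.md": ["data-model.md"],
    "data-model.md": ["schema.md"],
    "notes.md": ["task.md"],
}
context = nodes_within(graph, "task.md", radius=2)
```

With a radius of 2, `schema.md` (three hops away) is left out of the agent's context, which is the point: the radius bounds how much is injected.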

Agents can build their own subgraphs, decomposing their tasks into small connected chunks of work. You can glance at the high-level structure and progress of these, and zoom in to the details of what matters most. For example, ask a Voicetree agent to divide its plan into nodes for data model, architecture, pure logic, edge logic, UI components, and integration. This lets you carefully track the path from planning to implementation for what matters most: the high-level changes and core logic.

Agents can then spawn and orchestrate their own parallel subagents to work through these dependency graphs. In Voicetree, subagents are just native terminals, so you have full transparency and control over them, unlike with other CLI agents.
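Working through a dependency graph in parallel amounts to grouping tasks into waves, where every task in a wave has all of its prerequisites finished. The sketch below uses the standard-library `graphlib.TopologicalSorter`; the dependency edges between the example plan nodes are invented for illustration, and this is not a description of how Voicetree schedules subagents internally.

```python
from graphlib import TopologicalSorter

def parallel_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; tasks in the same wave can run concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the plan contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are done
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unlocking its dependents
    return waves

# Each key maps to the set of tasks it depends on (hypothetical edges).
plan = {
    "data-model": set(),
    "architecture": {"data-model"},
    "pure-logic": {"data-model"},
    "edge-logic": {"pure-logic"},
    "ui-components": {"architecture"},
    "integration": {"edge-logic", "ui-components"},
}
waves = parallel_waves(plan)
```

In this toy plan an orchestrating agent could hand each wave to as many subagent terminals as it has tasks, then wait for the wave to finish before starting the next.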

As your project and context grow, the Voicetree approach scales. You use your brain's most efficient form of memory: remembering where things are. Each node can represent any concept at any level of abstraction. You can see and reason about the structure between these concepts more easily because it is laid out the way your brain represents them. This lets you externalise your working memory, freeing up cognitive capacity for the real problem-solving.


Voice Mode

Capture ideas hands-free with speech-to-graph.

Why speaking works: Speaking activates deliberate (System 2) thinking, because verbalizing forces you to think about what you are doing. Japanese train conductors use "pointing and calling" (shisa kanko) for the same reason, a practice reported to reduce errors by up to 85%. Speech also engages different brain regions than writing, with lower cognitive load for idea generation. Raw speech is usually messy and hard to store or retrieve, so Voicetree turns voice into a structured mindmap.

Backtracking without mental load: Go arbitrarily deep down a problem. The graph holds the chain of "why am I doing this?" so you don't have to.

Tangibility: Thought becomes visible and persistent. This isn't just documentation; making progress tangible is a prerequisite for flow states.


Development

Prerequisites: Node.js 18+, Python 3.13, uv

cd webapp && npm install && npm run electron  # App
uv sync && uv run pytest                      # Backend

License

BSL 1.1, converts to Apache 2.0 after 4 years. See LICENSE.

Contact

Questions? Join the Discord. Feedback is valuable - ping us with thoughts, criticisms, or feature requests.

Footnotes

  1. Chroma Research, "Context Rot: How Increasing Input Tokens Impacts LLM Performance" (July 2025). 30-60% performance gaps between focused (~300 token) and full (~113k token) prompts. https://research.trychroma.com/context-rot

About

VisionTree is a design-focused evolution of Voicetree that helps designers capture visual intent, provide rich screenshot and screen-recording context, and guide AI agents toward faithful execution. It bridges human design judgment and machine output by structuring visual context, turning vision into actionable understanding.


Languages

  • TypeScript 76.1%
  • Python 20.3%
  • Shell 1.4%
  • JavaScript 1.1%
  • CSS 0.9%
  • PowerShell 0.1%
  • Other 0.1%