Atnatewoss/codeatlas

CodeAtlas

Build a mental model of any codebase in minutes.

Python 3.12+ | Node 18+ | License: MIT


Go from zero to productive on any codebase in minutes: CodeAtlas autonomously explores, maps, and explains the full architecture.

Features

  • Autonomous Exploration - LLM-driven agent generates hypotheses, selects tools, executes them in parallel, and iterates until it understands the codebase.
  • BFS + Beam Search Hybrid - Broad exploration at each depth level with beam pruning that keeps only the top-K scoring branches, focusing compute on the most promising paths.
  • Code Graph Backend - Every symbol, call, import, and inheritance edge mapped via graphify.
  • Traced Citations - Every claim links back to specific files and lines (file.py:42) for verification.
  • Architecture Diagrams - Auto-generated Mermaid call graphs and data flow diagrams.
  • Real-Time Streaming - WebSocket-based so you see progress as it happens.
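
A citation in the file.py:42 form can be checked mechanically. The sketch below is illustrative only and is not taken from the CodeAtlas source: parse_citation and resolve_citation are hypothetical helper names, and it assumes cited paths are relative to the repository root and line numbers are 1-based.

```python
import re
from pathlib import Path

# Assumed citation shape: a path, a colon, then a 1-based line number.
CITATION_RE = re.compile(r"^(?P<path>.+):(?P<line>\d+)$")


def parse_citation(citation: str) -> tuple[str, int]:
    """Split 'file.py:42' into ('file.py', 42)."""
    match = CITATION_RE.match(citation.strip())
    if match is None:
        raise ValueError(f"not a file:line citation: {citation!r}")
    return match["path"], int(match["line"])


def resolve_citation(repo_root: Path, citation: str) -> str:
    """Return the cited source line, or raise if the line does not exist."""
    path, line_no = parse_citation(citation)
    lines = (repo_root / path).read_text(encoding="utf-8").splitlines()
    if not 1 <= line_no <= len(lines):
        raise IndexError(f"{path} has no line {line_no}")
    return lines[line_no - 1]
```

Calling resolve_citation(Path("."), "file.py:42") returns the text of line 42, which makes it straightforward to spot-check a claim against the code it cites.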

How It Works

CodeAtlas first builds a code graph of the repository with graphify. The graph is derived from the AST, with a node for every symbol and edges for calls, imports, and inheritance, so it captures the full structure of the codebase. This graph then drives a LangGraph-powered Tree-of-Thought workflow that explores the codebase, reasons about its architecture, and synthesizes the findings into a coherent mental model with call graphs, data flow diagrams, and cited evidence.

The workflow is a BFS + Beam Search hybrid: BFS controls depth, while beam search prunes low-value branches at each level and keeps only the top-K scoring thoughts.
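
A minimal sketch of the pruning step, assuming each thought carries a single numeric score. Thought and beam_prune are hypothetical names chosen for illustration; the 0.4 cutoff and the default beam width of 5 are the values documented in this README, but the real data model may differ.

```python
from dataclasses import dataclass


@dataclass
class Thought:
    """Hypothetical stand-in for one hypothesis in the search tree."""
    hypothesis: str
    score: float
    depth: int = 0


def beam_prune(thoughts: list[Thought], keep_top_k: int = 5,
               min_score: float = 0.4) -> list[Thought]:
    """Drop thoughts scoring below min_score, then keep the keep_top_k best."""
    survivors = [t for t in thoughts if t.score >= min_score]
    # Highest score first; sort() is stable, so ties keep their input order.
    survivors.sort(key=lambda t: t.score, reverse=True)
    return survivors[:keep_top_k]
```

With scores of 0.82, 0.35, 0.61, and 0.90 and keep_top_k=2, the 0.35 thought falls below the cutoff and the beam keeps the 0.90 and 0.82 thoughts, in that order.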

generate_thoughts → execute_batch → evaluate_batch → beam_prune_expand ──→ synthesize
                                                                  │
                                                                  ├──→ execute_batch (normal loop)
                                                                  └──→ generate_thoughts (re-generate when beam empty)

| Node | Description |
| --- | --- |
| generate_thoughts | LLM proposes 2–3 hypotheses with tool selections. On re-generation, avoids previously explored angles. |
| execute_batch | Runs each pending thought's tool against the repo in parallel (ThreadPoolExecutor). Collects outcomes and file paths. |
| evaluate_batch | Hybrid scorer: LLM evaluates relevance + evidence strength, computes source diversity from unique files touched. All evaluations run in parallel. |
| beam_prune_expand | Beam step: drops scores < 0.4, keeps top-K (keep_top_k), generates child thoughts (up to max_children). If beam empties before max_depth, routes back for fresh angles. Early-exits to synthesis when ≥70% of beam candidates are ready. |
| synthesize | Collects evidence from best branches, generates a Mermaid architecture diagram, produces a final answer with numbered citations (file:line), rejected-branch summary, and uncertainties. |
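
The parallel execution in execute_batch can be sketched with the standard library's ThreadPoolExecutor. This is an assumption-laden illustration rather than the project's code: the run_tool callable and the result dictionary shape are hypothetical, and the default of 4 workers mirrors the documented EXECUTION_WORKERS default.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def execute_batch(thoughts: list[Any], run_tool: Callable[[Any], Any],
                  max_workers: int = 4) -> list[dict]:
    """Run run_tool(thought) for every thought in parallel, preserving order."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front so the tools run concurrently.
        futures = [pool.submit(run_tool, thought) for thought in thoughts]
        for thought, future in zip(thoughts, futures):
            try:
                outcome, error = future.result(), None
            except Exception as exc:
                # One failing tool call should not sink the whole batch.
                outcome, error = None, str(exc)
            results.append({"thought": thought, "outcome": outcome, "error": error})
    return results
```

Collecting results in submission order keeps each outcome paired with its thought, which is convenient for a later scoring step.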

The state machine is built with LangGraph. The stack: FastAPI (backend), Next.js (frontend), GitHub Models (LLM, free tier).
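
The routing decision after beam_prune_expand can be sketched as a plain function of the kind a conditional edge in a state machine would call. The function name, its parameters, and the notion of a "ready" candidate are hypothetical. Going to synthesis once max_depth is reached is also an assumption; the README only states the re-generation and early-exit rules.

```python
def route_after_beam(beam_size: int, ready_count: int, depth: int,
                     max_depth: int = 3, ready_ratio: float = 0.7) -> str:
    """Pick the next node name after the beam step."""
    if beam_size == 0:
        # Beam emptied: ask for fresh angles unless the depth budget is spent.
        return "generate_thoughts" if depth < max_depth else "synthesize"
    if depth >= max_depth:
        # Assumed behaviour: stop exploring at the depth limit.
        return "synthesize"
    if ready_count / beam_size >= ready_ratio:
        # Early exit: enough of the beam is ready to be written up.
        return "synthesize"
    return "execute_batch"
```

For example, a beam of 4 with 3 ready candidates at depth 1 routes to synthesize (75% is above the 70% threshold), while the same beam with 1 ready candidate continues to execute_batch.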


Quick Start

# 1. Install dependencies
make setup                          # macOS / Linux
.\dev.ps1 setup                     # Windows

# 2. Configure your API key
cp apps/api/.env.example apps/api/.env
# Edit .env → set GITHUB_TOKEN=ghp_your_token_here

# 3. Run (API + Web in parallel)
make dev                            # macOS / Linux
.\dev.ps1 dev                       # Windows

# Run tests
make test                           # macOS / Linux
.\dev.ps1 test                      # Windows

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| GITHUB_TOKEN | - | GitHub PAT for API access |
| GENERATION_LLM_MODEL | gpt-4o-mini | Model for hypothesis generation |
| EVALUATION_LLM_MODEL | gpt-4o-mini | Model for scoring |
| SYNTHESIS_LLM_MODEL | gpt-4o-mini | Model for final synthesis |
| MAX_DEPTH | 3 | BFS depth limit |
| MAX_CHILDREN | 2 | Max child thoughts per parent |
| KEEP_TOP_K | 5 | Beam width |
| EXECUTION_WORKERS | 4 | Parallel tool call workers |
| EVALUATION_WORKERS | 2 | Parallel LLM evaluation workers |
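
As a rough illustration of how these variables and defaults fit together, the sketch below reads them from the environment. Settings and load_settings are hypothetical names, and the actual loader in apps/api may work differently (for example, through a settings library); only the variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container mirroring the documented variables."""
    github_token: str
    generation_llm_model: str
    evaluation_llm_model: str
    synthesis_llm_model: str
    max_depth: int
    max_children: int
    keep_top_k: int
    execution_workers: int
    evaluation_workers: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from env (os.environ by default), applying defaults."""
    env = os.environ if env is None else env
    token = env.get("GITHUB_TOKEN", "").strip()
    if not token:
        # GITHUB_TOKEN is the only variable without a documented default.
        raise RuntimeError("GITHUB_TOKEN is not set")
    return Settings(
        github_token=token,
        generation_llm_model=env.get("GENERATION_LLM_MODEL", "gpt-4o-mini"),
        evaluation_llm_model=env.get("EVALUATION_LLM_MODEL", "gpt-4o-mini"),
        synthesis_llm_model=env.get("SYNTHESIS_LLM_MODEL", "gpt-4o-mini"),
        max_depth=int(env.get("MAX_DEPTH", "3")),
        max_children=int(env.get("MAX_CHILDREN", "2")),
        keep_top_k=int(env.get("KEEP_TOP_K", "5")),
        execution_workers=int(env.get("EXECUTION_WORKERS", "4")),
        evaluation_workers=int(env.get("EVALUATION_WORKERS", "2")),
    )
```

Passing a plain dictionary instead of os.environ, as in load_settings({"GITHUB_TOKEN": "...", "KEEP_TOP_K": "8"}), is a convenient way to try a wider beam without touching the .env file.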

Roadmap

See ROADMAP.md for planned work including CLI, BYOK (bring your own LLM key), Slack/Discord bots, VS Code extension, GitHub Action, and more.

About

Deep Research for Open Source Software
