Codebase Assistant

A GitHub repository understanding assistant. Paste a repo URL, index the source files, then ask questions like "Where is authentication handled?" or "Explain the payment flow." The backend retrieves relevant chunks and answers with file references.

Live deployment:

App: https://d1femwt9slkevk.cloudfront.net/
Short URL: https://shorturl.at/t59If

Architecture

flowchart LR
    User[User Browser] --> CF[CloudFront CDN]
    CF --> S3[S3 Static Frontend]

    User --> APIGW[API Gateway HTTP API]

    APIGW --> ApiLambda[FastAPI Lambda]

    ApiLambda --> Chat[Chat Endpoint]
    ApiLambda --> RepoIndex[Repository Index Endpoint]
    ApiLambda --> JobStatus[Job Status Endpoint]

    RepoIndex --> Queue[(SQS Index Queue)]
    Queue --> IndexerLambda[Indexer Lambda Worker]
    Queue --> DLQ[(SQS Dead Letter Queue)]

    IndexerLambda --> GitHub[GitHub Repository]
    IndexerLambda --> OpenAI[OpenAI Embeddings API]
    IndexerLambda --> DB[(RDS PostgreSQL + pgvector)]

    Chat --> DB
    Chat --> OpenAIChat[OpenAI Chat API]
    JobStatus --> DB

    DB --> Chat
    OpenAIChat --> Answer[Semantic Codebase Answers<br/>+ File Citations]
    Chat --> Answer
    Answer --> User

Stack

Frontend: Next.js, Tailwind CSS
Backend: FastAPI
RAG: LangChain, OpenAI
Vector store: PostgreSQL + pgvector
Repository processing: GitPython

Project Layout

codebase-assistant/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   ├── db/
│   │   ├── models/
│   │   ├── services/
│   │   └── main.py
│   ├── .env.example
│   └── requirements.txt
├── frontend/
│   ├── app/
│   ├── components/
│   ├── .env.example
│   └── package.json
└── README.md

Backend Setup

Start Postgres with pgvector first:

docker compose up -d postgres

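Then create a virtual environment, install dependencies, and start the API. The commands use Windows syntax; on macOS/Linux, activate with source .venv/bin/activate and use cp instead of copy:

cd backend
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
copy .env.example .env
uvicorn app.main:app --reload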

Set OPENAI_API_KEY in backend/.env for OpenAI embeddings and LLM answers.

You can also choose Local Llama in the UI for a free/offline mode. Local Llama uses deterministic local hash embeddings for retrieval and calls an Ollama-compatible local server for answers:

ollama pull llama3.1
ollama serve
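
The README does not spell out how the local hash embeddings are computed. As a rough illustration of the general idea, the sketch below uses feature hashing: each token is hashed into one bucket of a fixed-size vector, so the same text always produces the same vector without any model or network call. The name hash_embedding, the 256-dimension size, and the use of SHA-256 are assumptions made for this example, not the project's actual implementation.

```python
import hashlib
import math
import re


def hash_embedding(text: str, dim: int = 256) -> list:
    """Map text to a fixed-size, L2-normalized vector via feature hashing."""
    vector = [0.0] * dim
    for token in re.findall(r"[A-Za-z0-9_]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        # The first 4 bytes pick the bucket; the fifth byte picks the sign.
        bucket = int.from_bytes(digest[:4], "big") % dim
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vector[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vector))
    # Text with no tokens yields the all-zero vector.
    return [v / norm for v in vector] if norm else vector
```

Because the vectors depend only on the tokens, retrieval in this mode matches on shared identifiers and words rather than on meaning, which is the usual trade-off of hash embeddings compared with a learned embedding model.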

Optional backend environment variables:

OPENAI_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_CHAT_MODEL=gpt-4o-mini
OLLAMA_BASE_URL=http://localhost:11434
LLAMA_CHAT_MODEL=llama3.1
# Alternative smaller model: LLAMA_CHAT_MODEL=llama3.2:1b
OLLAMA_TIMEOUT_SECONDS=300
DATABASE_URL=postgresql://codeassist:codeassist@localhost:5432/codeassist
STORAGE_DIR=storage
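
For orientation, here is a minimal sketch of how a backend could read these variables with fallbacks. It assumes the values listed above are the defaults, which the README does not state explicitly; the Settings class and load_settings function are hypothetical names, not taken from the project.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: Optional[str]
    openai_embedding_model: str
    openai_chat_model: str
    ollama_base_url: str
    llama_chat_model: str
    ollama_timeout_seconds: int
    database_url: str
    storage_dir: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, using the listed values as assumed defaults."""
    env = os.environ if env is None else env
    return Settings(
        # An empty or missing key is treated as "OpenAI not configured".
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        openai_embedding_model=env.get("OPENAI_EMBEDDING_MODEL", "text-embedding-3-small"),
        openai_chat_model=env.get("OPENAI_CHAT_MODEL", "gpt-4o-mini"),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        llama_chat_model=env.get("LLAMA_CHAT_MODEL", "llama3.1"),
        ollama_timeout_seconds=int(env.get("OLLAMA_TIMEOUT_SECONDS", "300")),
        database_url=env.get(
            "DATABASE_URL",
            "postgresql://codeassist:codeassist@localhost:5432/codeassist",
        ),
        storage_dir=env.get("STORAGE_DIR", "storage"),
    )
```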

If OPENAI_API_KEY is not set, OpenAI mode falls back to local hash embeddings and returns the relevant chunks instead of a synthesized answer. If Local Llama is selected but Ollama is not running, the app likewise returns the relevant chunks, along with setup guidance.
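
The fallback rules can be summarized as a small decision function. This is only a sketch of the behavior described in the paragraph above: the function name resolve_modes, the returned labels, and the provider string "llama" are assumptions (the README only shows "openai" as an ai_provider value).

```python
from typing import Optional


def resolve_modes(ai_provider: str, openai_api_key: Optional[str], ollama_running: bool) -> dict:
    """Return which embedding and answer modes apply for a request."""
    if ai_provider == "openai":
        if openai_api_key:
            return {"embeddings": "openai", "answer": "openai"}
        # No key: retrieve with local hash embeddings and return raw chunks.
        return {"embeddings": "local_hash", "answer": "chunks_only"}
    if ai_provider == "llama":  # hypothetical value for the Local Llama option
        answer = "ollama" if ollama_running else "chunks_with_setup_guidance"
        return {"embeddings": "local_hash", "answer": answer}
    raise ValueError(f"Unknown ai_provider: {ai_provider!r}")
```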

Backend runs at:

http://localhost:8000

Health check:

curl http://localhost:8000/health

Frontend Setup

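Install dependencies and start the dev server (Windows syntax shown; on macOS/Linux, use cp instead of copy):

cd frontend
npm install
copy .env.example .env.local
npm run dev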

Frontend runs at:

http://localhost:3000

Docker Compose

Run the full app with PostgreSQL/pgvector:

docker compose up --build

Services:

Frontend: http://localhost:3000
Backend: http://localhost:8000
Postgres/pgvector: localhost:5432

Serverless AWS Deployment

The project is also configured for this AWS stack:

Frontend: S3 + CloudFront
Backend: API Gateway HTTP API + Lambda
Indexing: SQS + Lambda worker
Database: RDS PostgreSQL + pgvector
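
To make the SQS + Lambda worker step more concrete, the sketch below shows one way an indexer handler could consume queue messages. The Records, body, and messageId fields are the standard shape of an SQS event delivered to Lambda, and batchItemFailures is Lambda's partial batch response format (it only takes effect when ReportBatchItemFailures is enabled on the event source mapping). The message body layout and the index_repo callable are assumptions; the project's actual worker in backend/app/workers/index_repo.py may differ.

```python
import json
from typing import Any, Callable, Dict


def handle_sqs_event(
    event: Dict[str, Any],
    index_repo: Callable[[str, str], None],
) -> Dict[str, Any]:
    """Process each SQS record and report only the failed messages back to Lambda."""
    failures = []
    for record in event.get("Records", []):
        try:
            # Assumed message body: {"repo_url": "...", "ai_provider": "..."}
            job = json.loads(record["body"])
            index_repo(job["repo_url"], job.get("ai_provider", "openai"))
        except Exception:
            # Failed messages are retried by SQS; once the queue's redrive
            # limit is exceeded they move to the dead letter queue.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Reporting failures per message means one bad repository does not force the whole batch to be retried.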

Serverless deployment files:

infra/serverless/template.yaml
backend/Dockerfile.lambda
backend/app/lambda_handler.py
backend/app/workers/index_repo.py
scripts/deploy-backend.ps1
scripts/deploy-frontend.ps1
docs/serverless-aws.md

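High-level deploy flow (PowerShell). Deploy the backend first, passing the database URL, the OpenAI API key, and the allowed frontend origin: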

.\scripts\deploy-backend.ps1 `
  -DatabaseUrl "postgresql://USER:PASSWORD@HOST:5432/DBNAME" `
  -OpenAiApiKey "sk-..." `
  -AllowedOrigins "https://d1femwt9slkevk.cloudfront.net"

Then deploy the static frontend, replacing the *_OUTPUT placeholders with the corresponding deployment outputs:

.\scripts\deploy-frontend.ps1 `
  -BucketName "FRONTEND_BUCKET_OUTPUT" `
  -DistributionId "CLOUDFRONT_DISTRIBUTION_ID_OUTPUT" `
  -ApiBaseUrl "API_URL_OUTPUT"

See docs/serverless-aws.md for the full process.

API

Index a repository

POST /repos/index
Content-Type: application/json

{
  "repo_url": "https://github.com/user/project",
  "ai_provider": "openai"
}

Response:

{
  "repo_id": "abc123",
  "repo_url": "https://github.com/user/project",
  "ai_provider": "openai",
  "files_indexed": 42,
  "chunks_indexed": 128,
  "status": "indexed"
}
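
A minimal client sketch for this endpoint, using only the Python standard library. The function names and the timeout value are assumptions for illustration; the request path, headers, and body fields follow the example above.

```python
import json
import urllib.request


def build_index_request(base_url: str, repo_url: str, ai_provider: str = "openai") -> urllib.request.Request:
    """Build the POST /repos/index request without sending it."""
    payload = json.dumps({"repo_url": repo_url, "ai_provider": ai_provider}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/repos/index",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_repository(base_url: str, repo_url: str, ai_provider: str = "openai", timeout: float = 600.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_index_request(base_url, repo_url, ai_provider)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage against a local backend would look like index_repository("http://localhost:8000", "https://github.com/user/project"), after which the returned repo_id can be passed to the chat endpoint.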

Ask a question

POST /chat
Content-Type: application/json

{
  "repo_id": "abc123",
  "question": "Where is authentication handled?",
  "ai_provider": "openai",
  "top_k": 6
}

The response includes an answer and the source chunks used to produce it, each with file path, language, line range, content, and score.
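
The README does not show the chat response body, so the sketch below assumes a shape: an "answer" string and a "sources" list whose items reuse the chunk metadata field names (file_path, language, start_line, end_line) plus a "score". Under that assumption, a client could render the answer with file citations like this:

```python
def format_answer(response: dict) -> str:
    """Render an answer followed by one citation line per source chunk."""
    lines = [response.get("answer", "").strip()]
    sources = response.get("sources", [])
    if sources:
        lines.append("")
        lines.append("Sources:")
    for source in sources:
        # Sources are kept in the order the server returned them.
        lines.append(
            f"- {source['file_path']}:{source['start_line']}-{source['end_line']} "
            f"({source['language']}, score {source['score']:.2f})"
        )
    return "\n".join(lines)
```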

Indexing Behavior

The loader ignores heavy or generated folders such as .git, node_modules, venv, dist, build, __pycache__, .next, and coverage.

Indexed extensions:

.py .js .ts .tsx .jsx .java .cpp .c .go .rs .md .yaml .yml .json .sql
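
The two rules above can be combined into a single filter. This is a sketch, not the project's loader: should_index is a hypothetical name, and the ignore set contains only the folders named above, which the README presents as examples rather than a complete list.

```python
from pathlib import PurePosixPath

IGNORED_DIRS = {
    ".git", "node_modules", "venv", "dist", "build",
    "__pycache__", ".next", "coverage",
}
INDEXED_EXTENSIONS = {
    ".py", ".js", ".ts", ".tsx", ".jsx", ".java", ".cpp", ".c",
    ".go", ".rs", ".md", ".yaml", ".yml", ".json", ".sql",
}


def should_index(relative_path: str) -> bool:
    """Return True if a repo-relative path passes both the folder and extension rules."""
    path = PurePosixPath(relative_path)
    # Check only the directory components, so a file named build.py is still indexed.
    if any(part in IGNORED_DIRS for part in path.parts[:-1]):
        return False
    return path.suffix.lower() in INDEXED_EXTENSIONS
```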

Chunk metadata:

{
  "repo_id": "abc123",
  "file_path": "src/auth/login.py",
  "language": "python",
  "start_line": 10,
  "end_line": 45,
  "content": "def login_user(...): ..."
}
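
As an illustration of how records with this shape could be produced, the sketch below splits a file into fixed-size, overlapping line windows. The chunking strategy, the window size, and the overlap are assumptions; the README does not describe the actual chunker, only that Tree-sitter chunking is a possible future step.

```python
def chunk_file(
    repo_id: str,
    file_path: str,
    text: str,
    language: str,
    max_lines: int = 40,
    overlap: int = 5,
) -> list:
    """Split text into overlapping line windows with 1-based, inclusive line ranges."""
    if not 0 <= overlap < max_lines:
        raise ValueError("overlap must be non-negative and smaller than max_lines")
    lines = text.splitlines()
    chunks = []
    start = 0
    while start < len(lines):
        end = min(start + max_lines, len(lines))
        chunks.append({
            "repo_id": repo_id,
            "file_path": file_path,
            "language": language,
            "start_line": start + 1,
            "end_line": end,
            "content": "\n".join(lines[start:end]),
        })
        if end == len(lines):
            break
        # Step forward, keeping `overlap` lines of context from the previous chunk.
        start += max_lines - overlap
    return chunks
```

Keeping start_line and end_line on every chunk is what lets an answer cite an exact line range in the original file.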

Notes

PostgreSQL with pgvector stores repos, files, chunks, metadata, and vectors.
Cloned repositories are stored under STORAGE_DIR/repos.
Tree-sitter chunking, background jobs, private repo auth, evals, and CI/CD deployment are natural next steps.
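
To illustrate what vector retrieval over the stored chunks amounts to, here is a pure-Python sketch that ranks chunks by cosine similarity to a query vector and keeps the top_k best. In the database itself this kind of ranking is typically done by ordering on pgvector's cosine distance operator (<=>); the function names and the "embedding" field below are assumptions for the example, not the project's schema.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_chunks(query_vector: Sequence[float], chunks: List[Dict], top_k: int = 6) -> List[Dict]:
    """Return the top_k chunks most similar to the query, each with a score."""
    scored = [(cosine_similarity(query_vector, chunk["embedding"]), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [{**chunk, "score": score} for score, chunk in scored[:top_k]]
```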
