v1.0.0 - Serverless Codebase Assistant
A production-ready MVP for semantically understanding GitHub repositories using Retrieval-Augmented Generation (RAG), vector search, and serverless AWS infrastructure.
Live Demo
https://d1femwt9slkevk.cloudfront.net/
Overview
Codebase Assistant allows users to:
- Paste a GitHub repository URL
- Automatically clone and index the repository
- Generate semantic embeddings for source code
- Ask natural language questions about the codebase
- Receive contextual answers with file citations
The system is built on an event-driven, serverless architecture on AWS.
Features
Repository Indexing
- GitHub repository cloning
- Shallow clone optimization
- Automatic re-indexing support
- Repository-scoped chunk management
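The cloning step could look roughly like the sketch below. It assumes the input is a public https://github.com/<owner>/<repo> URL. The function names (parse_github_url, shallow_clone_command, repo_id) and the lowercase "owner/repo" key used to scope chunks are hypothetical. The git flags --depth 1 and --single-branch are standard and fetch only the tip commit of the default branch, which is all an indexer needs.

```python
import re
import subprocess
from typing import List, Tuple

# Hypothetical pattern: https://github.com/<owner>/<repo>, optional ".git" and trailing slash.
GITHUB_URL = re.compile(r"^https://github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+?)(?:\.git)?/?$")


def parse_github_url(url: str) -> Tuple[str, str]:
    """Return (owner, repo), or raise ValueError for unsupported URLs."""
    match = GITHUB_URL.match(url.strip())
    if not match:
        raise ValueError(f"not a public GitHub repository URL: {url!r}")
    return match.group(1), match.group(2)


def shallow_clone_command(url: str, dest: str) -> List[str]:
    """Build a git command that fetches only the latest commit of the default branch."""
    owner, repo = parse_github_url(url)
    return [
        "git", "clone",
        "--depth", "1",     # skip history: only the tip commit is indexed
        "--single-branch",  # fetch just the default branch
        f"https://github.com/{owner}/{repo}.git",
        dest,
    ]


def repo_id(url: str) -> str:
    """Hypothetical key that scopes stored chunks to one repository."""
    owner, repo = parse_github_url(url)
    return f"{owner}/{repo}".lower()


def clone(url: str, dest: str) -> None:
    # check=True raises CalledProcessError if git exits non-zero.
    subprocess.run(shallow_clone_command(url, dest), check=True)
```

With a stable repository key, re-indexing can delete every chunk stored under that key before inserting fresh ones, so stale chunks from an older commit never mix with new ones.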
Intelligent Code Parsing
Supports indexing for:
- Python
- JavaScript / TypeScript
- TSX / JSX
- Java
- C / C++
- Go
- Rust
- SQL
- Markdown
- YAML / JSON
Ignores:
- node_modules
- .git
- build artifacts
- lock files
- binaries/media
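One way to express the include and ignore rules above is a single lookup that returns the language to index a file as, or None to skip it. The extension map, directory names, and lock-file names below are illustrative assumptions, not the project's actual configuration.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical extension map covering the languages listed above.
LANGUAGE_BY_EXTENSION = {
    ".py": "python",
    ".js": "javascript", ".jsx": "javascript",
    ".ts": "typescript", ".tsx": "typescript",
    ".java": "java",
    ".c": "c", ".h": "c",  # .h is ambiguous between C and C++; treated as C here
    ".cpp": "cpp", ".cc": "cpp", ".hpp": "cpp",
    ".go": "go",
    ".rs": "rust",
    ".sql": "sql",
    ".md": "markdown",
    ".yaml": "yaml", ".yml": "yaml",
    ".json": "json",
}

IGNORED_DIRS = {"node_modules", ".git", "dist", "build", "target", "__pycache__"}
LOCK_FILES = {"package-lock.json", "yarn.lock", "pnpm-lock.yaml", "poetry.lock", "Cargo.lock"}


def detect_language(path: str) -> Optional[str]:
    """Return the language to index a repository-relative path as, or None to skip it."""
    p = PurePosixPath(path)
    # Skip anything inside an ignored directory (vendored deps, VCS data, build output).
    if any(part in IGNORED_DIRS for part in p.parts[:-1]):
        return None
    # Lock files share extensions with real config (.json, .yaml), so check names first.
    if p.name in LOCK_FILES:
        return None
    # Binaries and media (.png, .zip, ...) have no entry in the map, so they are skipped.
    return LANGUAGE_BY_EXTENSION.get(p.suffix.lower())
```

An allow-list of extensions is a conservative choice: unknown file types are skipped by default instead of being embedded as noise.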
Semantic Chunking
- Function/class-aware chunking
- Metadata tracking
- File path + line number references
- Context-preserving segmentation
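For Python sources, function/class-aware chunking with line numbers can be sketched with the standard ast module. This covers Python only; how the project parses the other listed languages is not described here. The Chunk fields and the "module" kind for code between definitions are assumptions.

```python
import ast
from dataclasses import dataclass
from typing import List

DEFINITIONS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


@dataclass
class Chunk:
    path: str        # repository-relative path, used for citations
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    kind: str        # "function", "class", or "module"
    text: str


def chunk_python(path: str, source: str) -> List[Chunk]:
    """One chunk per top-level function/class; code between them becomes "module" chunks."""
    lines = source.splitlines()
    spans = []  # (start, end, kind) covering the file in order
    cursor = 1  # first line not yet assigned to a span
    for node in ast.parse(source).body:
        if not isinstance(node, DEFINITIONS):
            continue
        # Start at the first decorator so the chunk shows how the definition is wrapped.
        start = min([node.lineno] + [d.lineno for d in node.decorator_list])
        end = node.end_lineno  # requires Python 3.8+
        if start > cursor:
            spans.append((cursor, start - 1, "module"))
        kind = "class" if isinstance(node, ast.ClassDef) else "function"
        spans.append((start, end, kind))
        cursor = end + 1
    if cursor <= len(lines):
        spans.append((cursor, len(lines), "module"))

    chunks = []
    for start, end, kind in spans:
        # Trim surrounding blank lines so citations point at real code.
        while start <= end and not lines[start - 1].strip():
            start += 1
        while end >= start and not lines[end - 1].strip():
            end -= 1
        if start <= end:
            chunks.append(Chunk(path, start, end, kind, "\n".join(lines[start - 1:end])))
    return chunks
```

Keeping whole definitions together preserves context that fixed-size windows would split. A very large class may still need a second pass that splits it by method.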
Vector Search
- pgvector-powered similarity search
- Cosine similarity retrieval
- Semantic repository querying
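Cosine retrieval in pgvector uses the <=> operator, which returns cosine distance, so 1 - distance is the cosine similarity. The sketch below assumes a hypothetical chunks table with a vector column named embedding. It uses psycopg-style named placeholders, with the query vector passed as a pgvector text literal such as '[0.1,0.2,...]'. An in-memory equivalent is included so the ranking logic can be tested without a database.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical schema: chunks(repo_id, path, start_line, end_line, content, embedding vector(N)).
TOP_K_SQL = """
SELECT path, start_line, end_line, content,
       1 - (embedding <=> %(query)s::vector) AS similarity
FROM chunks
WHERE repo_id = %(repo_id)s
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s
"""


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """dot(a, b) / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], rows: Dict[str, Sequence[float]], k: int = 5) -> List[Tuple[str, float]]:
    """In-memory equivalent of TOP_K_SQL: rank chunk ids by cosine similarity, best first."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in rows.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Ordering by the distance expression itself, not the derived similarity, lets PostgreSQL use an approximate index such as CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops), if one is created.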
AI-Powered Responses
- OpenAI embeddings and answer generation
- Offline deterministic embedding fallback
- Source-aware contextual answers
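The README does not say how the offline embedding fallback works. One deterministic option is feature hashing, shown below as a stand-in, not as the project's method. It uses hashlib because Python's built-in hash() is randomized per process, which would break determinism across Lambda invocations. The dimension of 256 is an assumption and must match the declared size of the pgvector column that stores these vectors.

```python
import hashlib
import math
import re
from typing import List

# Identifiers and numbers; punctuation is ignored.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+")


def hashed_embedding(text: str, dim: int = 256) -> List[float]:
    """Deterministic bag-of-tokens embedding via signed feature hashing (no network)."""
    vec = [0.0] * dim
    for token in TOKEN.findall(text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim
        # A hash-derived sign lets colliding tokens partly cancel instead of always adding up.
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    # L2-normalize so cosine similarity behaves like it does for model embeddings.
    return [v / norm for v in vec] if norm else vec
```

Hashed vectors capture shared identifiers rather than meaning, so retrieval quality is lower than with model embeddings. The upside is that indexing and querying can run without an API key, which is useful for local development and tests.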
Frontend
- Next.js frontend
- Repository indexing workflow
- Interactive chat interface
- Source citation rendering
AWS Architecture
Frontend:
- S3
- CloudFront CDN
Backend:
- API Gateway
- AWS Lambda
- SQS queue workers
Database:
- PostgreSQL RDS
- pgvector extension
Infrastructure:
- AWS SAM
- CloudFormation
- Docker-based Lambda packaging
Architecture Flow
User → CloudFront → S3 Frontend → API Gateway → Lambda Backend → SQS → Indexer Lambda → PostgreSQL + pgvector
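The SQS → Indexer Lambda hop can be sketched as a handler that reads each record's JSON body and reports failures per message. The repo_url message field and the injected index_repository callable are hypothetical. The Records, body, and messageId keys follow the standard SQS event shape. The batchItemFailures response only takes effect when ReportBatchItemFailures is enabled on the event source mapping (in SAM, FunctionResponseTypes on the SQS event). Without it, any failure retries the whole batch.

```python
import json
from typing import Any, Callable, Dict, List


def make_indexer_handler(index_repository: Callable[[str], None]):
    """Build a Lambda handler for SQS events; index_repository is a hypothetical hook."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, List[Dict[str, str]]]:
        failures = []
        for record in event.get("Records", []):
            try:
                body = json.loads(record["body"])
                index_repository(body["repo_url"])  # hypothetical message field
            except Exception:
                # Mark only this message as failed so SQS redelivers it alone,
                # leaving the successfully indexed messages deleted.
                failures.append({"itemIdentifier": record["messageId"]})
        return {"batchItemFailures": failures}

    return handler
```

Injecting the indexing function keeps the handler testable without AWS. On the producer side, the API Lambda would enqueue a JSON message, for example with boto3's sqs.send_message(QueueUrl=..., MessageBody=...).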
Technical Highlights
- Serverless-first architecture
- Event-driven indexing pipeline
- Infrastructure-as-Code deployment
- Async repository processing
- Vector database integration
- RAG-based retrieval system
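The generation half of the RAG loop mainly involves packing the retrieved chunks into a prompt that labels each snippet with its file path and line range, so the model can cite sources. The sketch below assumes chunks arrive best-first as dicts with path, start_line, end_line, and content keys. The character budget and the instruction wording are illustrative, not the project's actual prompt.

```python
from typing import Dict, List


def build_prompt(question: str, chunks: List[Dict], max_chars: int = 12000) -> str:
    """Assemble a grounded prompt; each snippet is labeled [path:start-end]."""
    parts: List[str] = []
    used = 0
    for chunk in chunks:  # assumed sorted by similarity, best first
        label = f"{chunk['path']}:{chunk['start_line']}-{chunk['end_line']}"
        block = f"[{label}]\n{chunk['content']}\n"
        if used + len(block) > max_chars:
            break  # rough context budget; lower-ranked chunks are dropped first
        parts.append(block)
        used += len(block)
    context = "\n".join(parts)
    return (
        "Answer the question using only the code excerpts below. "
        "Cite sources as [path:start-end]. If the excerpts are insufficient, say so.\n\n"
        f"{context}\nQuestion: {question}\n"
    )
```

Because the labels use the same path and line fields stored with each chunk, citations in the answer can be mapped back to files for rendering in the chat interface.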
- Production-style AWS deployment
Planned Improvements
- GitHub OAuth
- Private repository support
- Agent workflows
- Code graph analysis
- Architecture diagram generation
- Streaming responses
- Multi-repository workspaces
- Incremental re-indexing
- Authentication and rate limiting
Tech Stack
Frontend:
- Next.js
- TypeScript
Backend:
- FastAPI
- Python
Infrastructure:
- AWS Lambda
- API Gateway
- SQS
- CloudFront
- S3
- RDS PostgreSQL
- pgvector
- AWS SAM
AI / Search:
- OpenAI API
- Vector embeddings
- Semantic retrieval
Deployment
Fully deployed on AWS using Infrastructure-as-Code via AWS SAM and CloudFormation.