╔══════════════════════════════════════════════════════════════════════════════╗
║ Building production AI systems that move beyond prototypes — ║
║ Graph-RAG engines, agentic pipelines, and full-stack AI platforms ║
║ with measurable reliability, end-to-end observability, and real impact. ║
╚══════════════════════════════════════════════════════════════════════════════╝
I'm Anupam Kumar — an AI Engineer and CS graduate from IIITM Gwalior with a focus on building intelligent systems that are reliable enough to run in production, not just impressive enough to demo.
My work sits at the intersection of three disciplines I care about equally: AI systems (retrieval, reasoning, agents), platform engineering (APIs, distributed services, containerized deployment), and ML infrastructure (evaluation pipelines, experiment tracking, observability). I'm most effective when all three are in scope at once — designing a system end-to-end rather than handing off at a boundary.
Over the past few years I've worked across Graph-RAG architectures, multi-agent reasoning workflows, computer vision pipelines, LLM evaluation infrastructure, and backend systems serving real production traffic. I have a habit of taking ideas from research papers — Graph-RAG, hybrid retrieval fusion, RAGAS-based evaluation — and engineering them into deployable systems with measurable correctness guarantees, proper failure handling, and the observability needed to trust them in production.
A few things I hold as non-negotiable in the systems I build:
- Nothing ships without evaluation. Every pipeline I build includes a benchmark loop — RAGAS, fast metrics, MLflow tracking — so quality is measurable, not assumed.
- Observability is part of the design, not an afterthought. Prometheus instrumentation, health/readiness endpoints, and structured logging go in from day one.
- Reliability means planning for failure. Cache corruption, LLM timeouts, cold-start latency spikes, connection pool exhaustion — these have all come up, and the systems I've built handle them gracefully.
I'm particularly interested in roles where the challenge is making AI systems dependable at scale — whether that means improving retrieval quality, reducing hallucination risk, tightening evaluation loops, building ML serving infrastructure, or designing the platform layer that other engineers build on top of.
🌐 Portfolio: anupai-portfolio.vercel.app · 📄 Open to: AI Engineering · Platform Engineering · MLOps · Backend Infrastructure
AI & LLM Systems
| Domain | Technologies |
|---|---|
| LLM Frameworks | LangChain, Groq, OpenRouter, HuggingFace Inference, Sentence Transformers |
| Retrieval | FAISS, BM25 (rank-bm25), RRF Fusion, Cross-Encoder Reranking |
| Vector Stores | FAISS, Pinecone, ChromaDB |
| Evaluation | RAGAS, MLflow, Fast Metrics, per-question artifact logging |
| Embeddings | BAAI/bge-base-en-v1.5, BAAI/bge-reranker-base, SentenceTransformers |
| Graph | Citation graphs, semantic graphs, hybrid BFS graph expansion |
| Training | PyTorch, LoRA/QLoRA fine-tuning, Char-CNN OCR models |
| Routing | Semantic OOD routing, domain centroid calibration, retrieval support probes |
Backend & Infrastructure
| Domain | Technologies |
|---|---|
| APIs | FastAPI, Uvicorn, Gunicorn, SSE streaming, JWT auth (Argon2), SlowAPI |
| Databases | PostgreSQL (SQLAlchemy, Alembic, NullPool/PgBouncer), SQLite, MongoDB |
| Caching | Redis (multi-layer: exact + semantic + retrieval + decomposition + intent) |
| Messaging | Celery, distributed task queues |
| Monitoring | Prometheus (ASGI middleware), Grafana dashboards, structured health/readiness |
| Experiment Tracking | MLflow (artifact logging, reproducible benchmarks, evaluation runs) |
DevOps & Deployment
| Domain | Technologies |
|---|---|
| Containers | Docker, Docker Compose, Nginx reverse proxy |
| Cloud | AWS, Render, Fly.io, VPS |
| CI/CD | GitHub Actions (in roadmap), systemd, cron scheduling |
| Security | JWT Bearer, Argon2 hashing, rate limiting, cookie management |
Automation & Frontend
| Domain | Technologies |
|---|---|
| Browser Automation | Playwright, Selenium (anti-detection, human-like simulation) |
| Scraping | BeautifulSoup, Scrapy, GROBID, pdfplumber |
| Frontend | React, Tailwind CSS, Vite, Streamlit |
| Data | pandas, NumPy, PDF/OCR pipelines |
April 2025 – December 2025 | Faridabad, Haryana (Remote)
Embedded in a product engineering team building AI-powered SaaS tools for enterprise clients across real estate, HR, and customer service verticals.
- Reduced query resolution time by 35% by engineering AI-powered enterprise chatbots with context-aware routing and domain-grounded retrieval, replacing static FAQ workflows with dynamic LLM-backed pipelines.
- Cut manual operational effort by 50–60% by designing end-to-end Agentic AI workflows — multi-step reasoning chains with tool use, structured output, and human-in-the-loop confirmation gates.
- Accelerated B2B SaaS delivery cycles by 2× by building modular, containerized service components (FastAPI + Docker + PostgreSQL) that could be configured per client without source-level changes.
- Achieved 95%+ accuracy on an automated real estate audit system that processed multi-document packages — reducing human review turnaround from days to minutes through LLM-based extraction and validation.
Most AI research assistants hallucinate citations, lose context on complex questions, and fall apart outside a notebook. AetherCV is built to solve exactly those failures — a fully deployed research engine that answers deep computer vision questions by finding and synthesizing evidence across 238 academic papers, with zero hallucinations measured across all benchmark runs.
The core idea: instead of plain vector search, AetherCV builds a knowledge graph of how papers cite and relate to each other. When you ask "How did DETR influence later transformer-based detectors?", the system doesn't just find similar text — it traverses citation lineages, surfaces method descendants, and assembles a grounded answer from the actual evidence trail.
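As a rough illustration of the traversal idea, here is a minimal hop-limited breadth-first expansion over a citation graph. The function name `expand_citations`, the adjacency-dict layout, and the toy edges are assumptions made for this sketch and are not taken from the AetherCV codebase.

```python
from collections import deque


def expand_citations(graph, seeds, max_hops=2):
    """Breadth-first expansion from seed papers along citation edges.

    `graph` maps a paper ID to the IDs of papers that cite it
    (a hypothetical layout chosen for this sketch).
    Returns {paper_id: hop_distance} for every paper within `max_hops`.
    """
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        if distance[paper] == max_hops:
            continue  # hop limit reached: keep the paper, do not expand it
        for neighbor in graph.get(paper, ()):
            if neighbor not in distance:  # first visit is the shortest path in BFS
                distance[neighbor] = distance[paper] + 1
                queue.append(neighbor)
    return distance


# Hand-written toy edges for illustration only, not the indexed corpus.
toy_graph = {
    "detr": ["deformable-detr", "conditional-detr"],
    "deformable-detr": ["dino"],
    "conditional-detr": ["dab-detr"],
    "dab-detr": ["dino"],
}
lineage = expand_citations(toy_graph, ["detr"], max_hops=2)
```

The hop limit is the simplest way to keep expansion bounded; a production system would likely also weight edges or cap the number of expanded nodes.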
Python 3.11 · FastAPI · FAISS · BM25 · Redis · PostgreSQL · Prometheus · Grafana · MLflow · Docker · Nginx
What it does:
- Accepts natural language research queries and routes them intelligently — simple factual questions go through fast retrieval, complex multi-paper comparisons go through a decomposition + graph expansion pipeline, and off-topic queries are rejected before any compute runs
- Retrieves relevant passages using both semantic (FAISS) and keyword (BM25) search, fuses results, reranks when needed, then expands context by traversing a hybrid citation + semantic graph
- Synthesizes grounded answers that cite only what was actually retrieved — no hallucinated paper IDs, no invented claims
- Tracks every benchmark run with MLflow and exposes live metrics through a Prometheus + Grafana observability stack
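The fusion step in the retrieval bullet above can be sketched with Reciprocal Rank Fusion, which the stack table lists. This is a minimal version under assumptions: the function name, the chunk IDs, and the constant `k=60` (a commonly used default) are illustrative, and the real pipeline may weight or truncate the lists differently.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Ties are broken by document ID so the
    output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top-3 results from a dense (FAISS) and a sparse (BM25) retriever.
dense_hits = ["c3", "c1", "c7"]
sparse_hits = ["c1", "c9", "c3"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Because RRF uses only ranks, it needs no score normalization between the cosine similarities of the dense retriever and the BM25 scores of the sparse one.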
Results that matter:
| What was measured | Result |
|---|---|
| Hallucination rate across all benchmark runs | 0% |
| Answers grounded in retrieved evidence | 100% |
| Context recall (RAGAS evaluation) | 0.94 / 1.0 |
| End-to-end latency (CPU-only, live traffic) | ~4.9s p99 |
| Cache hit rate on repeat/paraphrased queries | 100% |
| Papers in the knowledge base | 238 across 12,288 indexed chunks |
What makes it non-trivial to build: Five independent Redis cache layers (exact match, semantic/paraphrase, retrieval, decomposition, intent) mean the LLM is never called twice for equivalent queries. The graph expansion uses adaptive latency budgeting so traversal never becomes a tail-latency bottleneck. The router makes a multi-signal OOD decision (domain centroid, cluster similarity, retrieval support probe, and entity shape) rather than relying on a single threshold, which is why it has zero false-positive escapes in production. The whole system runs on CPU-only infrastructure with no GPU dependency.
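To make the lookup order concrete, here is a sketch of the first two cache layers named above (exact match, then semantic/paraphrase). Everything in it is an assumption for illustration: plain Python containers stand in for Redis, `toy_embed` stands in for a real embedding model, and the `0.9` similarity threshold is arbitrary.

```python
import hashlib
import math


def _normalize(query):
    """Lowercase and collapse whitespace so trivial variants share a key."""
    return " ".join(query.lower().split())


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class LayeredAnswerCache:
    """Exact-match layer first, then a similarity scan over cached embeddings."""

    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # callable: str -> list[float] (hypothetical)
        self.threshold = threshold  # assumed cut-off for a paraphrase hit
        self.exact = {}             # sha256(normalized query) -> answer
        self.semantic = []          # list of (embedding, answer) pairs

    def _key(self, query):
        return hashlib.sha256(_normalize(query).encode("utf-8")).hexdigest()

    def put(self, query, answer):
        self.exact[self._key(query)] = answer
        self.semantic.append((self.embed(query), answer))

    def get(self, query):
        """Return (layer, answer); layer is 'exact', 'semantic', or 'miss'."""
        answer = self.exact.get(self._key(query))
        if answer is not None:
            return "exact", answer
        vector = self.embed(query)
        best_score, best_answer = 0.0, None
        for cached_vector, cached_answer in self.semantic:
            score = _cosine(vector, cached_vector)
            if score > best_score:
                best_score, best_answer = score, cached_answer
        if best_answer is not None and best_score >= self.threshold:
            return "semantic", best_answer
        return "miss", None


VOCAB = ["detr", "influence", "detectors", "transformer", "yolo"]


def toy_embed(text):
    """Bag-of-words counts over a tiny vocabulary; a stand-in for a real model."""
    words = _normalize(text).replace("?", "").split()
    return [float(words.count(term)) for term in VOCAB]


cache = LayeredAnswerCache(toy_embed)
cache.put("How did DETR influence transformer detectors?", "answer-1")
exact_hit = cache.get("how did  DETR influence transformer detectors?")
paraphrase_hit = cache.get("DETR influence on transformer detectors")
miss = cache.get("What is YOLO?")
```

Checking the cheap exact layer before computing an embedding keeps repeat queries off the embedding model entirely; only a miss there pays for the similarity scan.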
Sales and growth teams spend hours manually finding, researching, and qualifying leads — most of which turn out to be a poor fit. LeadBoost SaaS replaces that manual loop with a multi-tenant platform that discovers leads through distributed web scraping, enriches each one with LLM-powered company and contact intelligence, and surfaces a scored, prioritized shortlist ready for outreach.
LangChain · Playwright · Celery · FastAPI · PostgreSQL · React · Docker
What it does:
- Runs distributed Playwright scrapers on Celery workers to discover leads at scale across multiple sources simultaneously — designed to scale horizontally as workloads grow
- Enriches each discovered lead through a LangChain + Groq pipeline: semantic entity extraction, company profiling, and contact qualification that goes well beyond simple field parsing
- Scores every lead with a normalized relevance score combining LLM semantic analysis and structured metadata signals, so teams work the highest-probability prospects first
- Serves everything through a JWT-authenticated, multi-tenant FastAPI backend with role-based access control and a React dashboard showing enrichment status, scores, and pipeline analytics in real time
Impact: Automated lead generation pipeline processing 1,000+ leads/day — fully containerized with Docker Compose for one-command deployment.
Classical Ayurvedic literature spans thousands of pages of Sanskrit manuscripts — rich with medical knowledge, but locked behind a script that standard OCR tools cannot reliably process. AyurGenix makes this corpus searchable and conversational: a RAG platform that lets practitioners and researchers ask natural language questions and receive answers grounded in specific source passages, with citations.
FastAPI · PyTorch · Pinecone · LangChain · Docker
What it does:
- Processes Sanskrit-script documents through a custom Char-CNN OCR model built specifically for the script's ligature-heavy glyphs and diacritical marks — handling degraded historical document quality that off-the-shelf OCR fails on entirely
- Indexes the extracted text into Pinecone for semantic retrieval, enabling sub-second search across 10,000+ pages of structured and unstructured classical texts
- Reranks candidate passages with a cross-encoder and generates citation-grounded responses — every answer traces back to a specific source location in the corpus
- Powers multi-turn consultations through a LLaMA-3 conversational layer with long-term memory, so context is maintained across an entire session rather than resetting each turn
Impact: Sub-second retrieval across a 10,000+ page corpus with traceable citations — making a historically inaccessible knowledge base practically usable for the first time.
Job hunting at scale means doing the same thing hundreds of times: search, filter, score, apply. TalentForge AI automates that entire loop — from discovering LinkedIn listings to scoring them against your resume with an LLM, filtering out poor fits, and submitting Easy Apply applications through browser automation — while keeping you in control of every step.
The design priority throughout was safety and reliability: every action is rate-limited, every application is logged with its reason, and no application is submitted until the listing has cleared an AI scoring gate against your resume.
Python 3.11 · Playwright · Streamlit · SQLite · Groq · LangChain · FastAPI
What it does:
- Scrapes LinkedIn for jobs matching your keywords, then scores each one against your full resume using an LLM — 70% semantic match weight, 30% keyword alignment — producing a normalized 0–1 relevance score per listing
- Applies automatically to jobs above your score threshold through headless Playwright browser automation with human-like behavior (randomized typing speed, mouse patterns, browser fingerprints)
- Tracks every job through a strict state machine, `DISCOVERED → SCORED → QUEUED → APPLIED / SKIPPED / FAILED`, with the reason for every skip or failure persisted to SQLite
- Surfaces everything in a real-time Streamlit dashboard: daily application trends, match score distribution, company breakdown, and a filterable application table with one-click CSV export
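The 70/30 scoring blend and the job state machine described above could look roughly like this. It is a sketch under assumptions: the state names and weights come from the text, but the transition table (for example, that a job can be skipped straight after scoring), the `Job` class, and the function names are hypothetical.

```python
from enum import Enum


class JobState(Enum):
    DISCOVERED = "DISCOVERED"
    SCORED = "SCORED"
    QUEUED = "QUEUED"
    APPLIED = "APPLIED"
    SKIPPED = "SKIPPED"
    FAILED = "FAILED"


# Assumed transition table; APPLIED, SKIPPED, and FAILED are terminal.
ALLOWED = {
    JobState.DISCOVERED: {JobState.SCORED},
    JobState.SCORED: {JobState.QUEUED, JobState.SKIPPED},
    JobState.QUEUED: {JobState.APPLIED, JobState.SKIPPED, JobState.FAILED},
    JobState.APPLIED: set(),
    JobState.SKIPPED: set(),
    JobState.FAILED: set(),
}


def relevance_score(semantic, keyword, w_semantic=0.7, w_keyword=0.3):
    """Blend two component scores in [0, 1] into one score in [0, 1]."""
    for value in (semantic, keyword):
        if not 0.0 <= value <= 1.0:
            raise ValueError("component scores must lie in [0, 1]")
    return w_semantic * semantic + w_keyword * keyword


class Job:
    def __init__(self, job_id):
        self.job_id = job_id
        self.state = JobState.DISCOVERED
        self.reason = None

    def transition(self, new_state, reason=None):
        """Move to `new_state`, rejecting illegal moves and reasonless skips."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        if new_state in (JobState.SKIPPED, JobState.FAILED) and not reason:
            raise ValueError("a skip or failure must carry a reason")
        self.state, self.reason = new_state, reason


job = Job("job-1")
job.transition(JobState.SCORED)
score = relevance_score(semantic=0.8, keyword=0.5)
if score < 0.75:  # hypothetical user-chosen threshold
    job.transition(JobState.SKIPPED, reason="score below threshold")
```

Keeping the allowed transitions in one table means an out-of-order update fails loudly instead of silently corrupting a job's history.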
Built-in safety that makes it production-usable:
- Hard daily application cap enforced at the database level — server restarts never double-count
- Dry-run mode runs the complete pipeline (discovery → scoring → filtering) without touching LinkedIn, so you can validate behavior before going live
- LLM provider fallback chain (Groq → OpenRouter → HuggingFace → keyword-only) means the pipeline never breaks entirely, even without API credentials
- Clean six-layer architecture (Controller → Service → Storage → Model → Platform → Interface) means adding Indeed or Glassdoor is a single new class implementing an abstract interface — no changes to orchestration logic
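One way to enforce the daily application cap at the database level, as described in the first bullet above, is to make the cap check part of the insert itself. The table layout, column names, and `record_application` helper below are assumptions for this sketch, not the project's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per submitted application.
SCHEMA = """
CREATE TABLE IF NOT EXISTS applications (
    job_id     TEXT PRIMARY KEY,
    applied_on TEXT NOT NULL
)
"""


def record_application(conn, job_id, day, daily_cap):
    """Insert the application only if fewer than `daily_cap` rows exist for `day`.

    The count and the insert happen in one SQL statement, so the cap is
    derived from persisted rows rather than from an in-memory counter.
    Returns True if the row was written, False if the cap was already reached.
    """
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT INTO applications (job_id, applied_on) "
            "SELECT ?, ? "
            "WHERE (SELECT COUNT(*) FROM applications WHERE applied_on = ?) < ?",
            (job_id, day, day, daily_cap),
        )
    return cursor.rowcount == 1


# In-memory database for the demo; a real deployment would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
results = [
    record_application(conn, f"job-{i}", "2025-01-01", daily_cap=2)
    for i in range(3)
]
next_day = record_application(conn, "job-9", "2025-01-02", daily_cap=2)
```

Because the count is read from the table on every insert, a process restart starts from the same persisted total instead of resetting to zero.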
Indian Institute of Information Technology and Management, Gwalior | B.Tech in Computer Science | Graduated 2025
I'm open to collaborating on ambitious AI engineering projects, discussing production RAG and agentic system design, or exploring full-time opportunities in AI engineering, platform engineering, or ML infrastructure.