ANUPAM KUMAR anupamkr1708

╔══════════════════════════════════════════════════════════════════════════════╗
║   Building production AI systems that move beyond prototypes —              ║
║   Graph-RAG engines, agentic pipelines, and full-stack AI platforms         ║
║   with measurable reliability, end-to-end observability, and real impact.   ║
╚══════════════════════════════════════════════════════════════════════════════╝

About Me

I'm Anupam Kumar — an AI Engineer and CS graduate from IIITM Gwalior with a focus on building intelligent systems that are reliable enough to run in production, not just impressive enough to demo.

My work sits at the intersection of three disciplines I care about equally: AI systems (retrieval, reasoning, agents), platform engineering (APIs, distributed services, containerized deployment), and ML infrastructure (evaluation pipelines, experiment tracking, observability). I'm most effective when all three are in scope at once — designing a system end-to-end rather than handing off at a boundary.

Over the past few years I've worked across Graph-RAG architectures, multi-agent reasoning workflows, computer vision pipelines, LLM evaluation infrastructure, and backend systems serving real production traffic. I have a habit of taking ideas from research papers — Graph-RAG, hybrid retrieval fusion, RAGAS-based evaluation — and engineering them into deployable systems with measurable correctness guarantees, proper failure handling, and the observability needed to trust them in production.

A few things I hold as non-negotiable in the systems I build:

Nothing ships without evaluation. Every pipeline I build includes a benchmark loop — RAGAS, fast metrics, MLflow tracking — so quality is measurable, not assumed.
Observability is part of the design, not an afterthought. Prometheus instrumentation, health/readiness endpoints, and structured logging go in from day one.
Reliability means planning for failure. Cache corruption, LLM timeouts, cold-start latency spikes, connection pool exhaustion — these have all come up, and the systems I've built handle them gracefully.
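One way to handle the cache-corruption case is to treat an entry that no longer decodes as a cache miss, so the request recomputes instead of crashing. This is a minimal sketch under two assumptions: values are JSON-serialized, and a plain dict stands in for Redis. `cache_get` and `cached_call` are hypothetical helpers, not code taken from the projects below.

```python
import json
from typing import Any, Callable, MutableMapping

_MISS = object()  # sentinel so a cached JSON null is not mistaken for a miss


def cache_get(store: MutableMapping[str, Any], key: str) -> Any:
    """Return the decoded value for `key`, or _MISS if absent or corrupted."""
    raw = store.get(key)
    if raw is None:
        return _MISS
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError):
        # Corrupted entry: evict it so the next write replaces it cleanly.
        store.pop(key, None)
        return _MISS


def cached_call(store: MutableMapping[str, Any], key: str,
                compute: Callable[[], Any]) -> Any:
    """Serve from cache when possible; otherwise compute, store, and return."""
    value = cache_get(store, key)
    if value is _MISS:
        value = compute()
        store[key] = json.dumps(value)
    return value
```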

I'm particularly interested in roles where the challenge is making AI systems dependable at scale — whether that means improving retrieval quality, reducing hallucination risk, tightening evaluation loops, building ML serving infrastructure, or designing the platform layer that other engineers build on top of.

🌐 Portfolio: anupai-portfolio.vercel.app · 📄 Open to: AI Engineering · Platform Engineering · MLOps · Backend Infrastructure

🧰 Technical Stack

AI & LLM Systems

Domain	Technologies
LLM Frameworks	LangChain, Groq, OpenRouter, HuggingFace Inference, Sentence Transformers
Retrieval	FAISS, BM25 (`rank-bm25`), RRF Fusion, Cross-Encoder Reranking
Vector Stores	FAISS, Pinecone, ChromaDB
Evaluation	RAGAS, MLflow, Fast Metrics, per-question artifact logging
Embeddings & Reranking	`BAAI/bge-base-en-v1.5` (embeddings), `BAAI/bge-reranker-base` (cross-encoder reranker), Sentence Transformers
Graph	Citation graphs, semantic graphs, hybrid BFS graph expansion
Training	PyTorch, LoRA/QLoRA fine-tuning, Char-CNN OCR models
Routing	Semantic OOD routing, domain centroid calibration, retrieval support probes

Backend & Infrastructure

Domain	Technologies
APIs	FastAPI, Uvicorn, Gunicorn, SSE streaming, JWT auth with Argon2 password hashing, SlowAPI
Databases	PostgreSQL (SQLAlchemy, Alembic, NullPool/PgBouncer), SQLite, MongoDB
Caching	Redis (multi-layer: exact + semantic + retrieval + decomposition + intent)
Messaging	Celery, distributed task queues
Monitoring	Prometheus (ASGI middleware), Grafana dashboards, structured health/readiness endpoints
Experiment Tracking	MLflow (artifact logging, reproducible benchmarks, evaluation runs)

DevOps & Deployment

Domain	Technologies
Containers	Docker, Docker Compose, Nginx reverse proxy
Cloud	AWS, Render, Fly.io, VPS
CI/CD	GitHub Actions (in roadmap), systemd, cron scheduling
Security	JWT Bearer, Argon2 hashing, rate limiting, cookie management

Automation & Frontend

Domain	Technologies
Browser Automation	Playwright, Selenium (anti-detection, human-like simulation)
Scraping	BeautifulSoup, Scrapy, GROBID, pdfplumber
Frontend	React, Tailwind CSS, Vite, Streamlit
Data	pandas, NumPy, PDF/OCR pipelines

💼 Experience

AI Engineer Intern — Yahweh Innovations

April 2025 – December 2025 | Faridabad, Haryana (Remote)

Embedded in a product engineering team building AI-powered SaaS tools for enterprise clients across real estate, HR, and customer service verticals.

Reduced query resolution time by 35% by engineering AI-powered enterprise chatbots with context-aware routing and domain-grounded retrieval, replacing static FAQ workflows with dynamic LLM-backed pipelines.
Cut manual operational effort by 50–60% by designing end-to-end Agentic AI workflows — multi-step reasoning chains with tool use, structured output, and human-in-the-loop confirmation gates.
Accelerated B2B SaaS delivery cycles by 2× by building modular, containerized service components (FastAPI + Docker + PostgreSQL) that could be configured per client without source-level changes.
Achieved 95%+ accuracy on an automated real estate audit system that processed multi-document packages — reducing human review turnaround from days to minutes through LLM-based extraction and validation.

🚀 Featured Projects

🔬 AetherCV — Production Graph-RAG Research Engine

Most AI research assistants hallucinate citations, lose context on complex questions, and fall apart outside a notebook. AetherCV is built to solve exactly those failures — a fully deployed research engine that answers deep computer vision questions by finding and synthesizing evidence across 238 academic papers, with zero hallucinations measured across all benchmark runs.

The core idea: instead of plain vector search, AetherCV builds a knowledge graph of how papers cite and relate to each other. When you ask "How did DETR influence later transformer-based detectors?", the system doesn't just find similar text — it traverses citation lineages, surfaces method descendants, and assembles a grounded answer from the actual evidence trail.
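The lineage traversal in the DETR example can be pictured as a breadth-first expansion outward from the papers that retrieval found first. This is a minimal sketch that assumes the citation and semantic edges are already loaded into an adjacency dict. `expand_graph`, the hop limit, and the fixed wall-clock budget are illustrative assumptions; the project describes an adaptive budget, and this sketch simplifies it to a fixed one.

```python
import time
from collections import deque
from typing import Callable, Dict, Iterable, List


def expand_graph(
    graph: Dict[str, List[str]],
    seeds: Iterable[str],
    max_hops: int = 2,
    budget_s: float = 0.05,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Breadth-first expansion from seed papers, bounded by hop count and time.

    `graph` maps a paper ID to linked paper IDs (citing papers or semantic
    neighbours). Returns paper IDs in the order they were reached.
    """
    deadline = clock() + budget_s
    order = list(dict.fromkeys(seeds))  # de-duplicate seeds, keep their order
    seen = set(order)
    frontier = deque((paper, 0) for paper in order)
    while frontier:
        if clock() >= deadline:
            break  # budget spent: return what has been gathered so far
        paper, depth = frontier.popleft()
        if depth >= max_hops:
            continue
        for neighbour in graph.get(paper, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                frontier.append((neighbour, depth + 1))
    return order
```

In this sketch, the expansion stops on the clock rather than on a node count. That choice keeps it from dominating tail latency: a slow traversal returns a smaller but still valid neighbourhood instead of stalling the request.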

Python 3.11 FastAPI FAISS BM25 Redis PostgreSQL Prometheus Grafana MLflow Docker Nginx

What it does:

Accepts natural language research queries and routes them by complexity: simple factual questions go through fast retrieval, complex multi-paper comparisons go through a decomposition + graph expansion pipeline, and off-topic queries are rejected before the LLM is ever called
Retrieves relevant passages using both semantic (FAISS) and keyword (BM25) search, fuses results, reranks when needed, then expands context by traversing a hybrid citation + semantic graph
Synthesizes grounded answers that cite only what was actually retrieved — no hallucinated paper IDs, no invented claims
Tracks every benchmark run with MLflow and exposes live metrics through a Prometheus + Grafana observability stack
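The retrieval step says only that FAISS and BM25 results are fused. Reciprocal Rank Fusion (RRF, listed in the stack above) is a common way to do this. It uses only rank positions, so the dense and sparse scores never need to be normalized against each other. This is a minimal sketch that assumes each retriever returns an ordered list of chunk IDs. `rrf_fuse` is hypothetical, and k = 60 is the conventional default rather than a documented AetherCV setting.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def rrf_fuse(
    rankings: Sequence[Sequence[str]], k: int = 60, top_n: Optional[int] = None
) -> List[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)).

    Ranks are 1-based; a document missing from a ranking contributes nothing
    for that ranking. Ties are broken by document ID for determinism.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused if top_n is None else fused[:top_n]


dense = ["c3", "c1", "c7"]   # e.g. FAISS nearest neighbours
sparse = ["c1", "c9", "c3"]  # e.g. BM25 top hits
print(rrf_fuse([dense, sparse]))  # ['c1', 'c3', 'c9', 'c7']
```

Chunk `c1` wins even though it ranks only second in the dense list, because it sits near the top of both lists.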

Results that matter:

What was measured	Result
Hallucination rate across all benchmark runs	0%
Answers grounded in retrieved evidence	100%
Context recall (RAGAS evaluation)	0.94 / 1.0
End-to-end latency (CPU-only, live traffic)	~4.9s p99
Cache hit rate on repeat/paraphrased queries	100%
Knowledge base size	238 papers, 12,288 indexed chunks
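The cache-hit row relies on layered lookups: an exact-match key first, then a semantic layer that catches paraphrases. This is a minimal sketch of just those two layers, with three assumptions. An in-process dict and list stand in for Redis. The caller supplies an `embed` function; in practice this would be a sentence-embedding model such as the `bge-base-en-v1.5` listed in the stack. The 0.92 similarity threshold is hypothetical. `TwoLayerCache` is an illustrative name.

```python
import hashlib
import math
from typing import Callable, List, Optional, Sequence, Tuple


def _exact_key(query: str) -> str:
    """Case- and whitespace-insensitive key for the exact-match layer."""
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TwoLayerCache:
    def __init__(self, embed: Callable[[str], Sequence[float]], threshold: float = 0.92):
        self.embed = embed
        self.threshold = threshold
        self.exact: dict = {}
        self.semantic: List[Tuple[Sequence[float], str]] = []

    def get(self, query: str) -> Tuple[Optional[str], str]:
        """Return (answer, layer) where layer is 'exact', 'semantic' or 'miss'."""
        key = _exact_key(query)
        if key in self.exact:
            return self.exact[key], "exact"
        vec = self.embed(query)
        best_sim, best_answer = 0.0, None
        for cached_vec, answer in self.semantic:  # linear scan; fine for a sketch
            sim = _cosine(vec, cached_vec)
            if sim > best_sim:
                best_sim, best_answer = sim, answer
        if best_answer is not None and best_sim >= self.threshold:
            return best_answer, "semantic"
        return None, "miss"

    def put(self, query: str, answer: str) -> None:
        self.exact[_exact_key(query)] = answer
        self.semantic.append((self.embed(query), answer))
```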

What makes it non-trivial to build: Seven independent Redis cache layers, including exact match, semantic/paraphrase, retrieval, decomposition, and intent, mean the LLM is never called twice for an exact or paraphrased repeat of an earlier query. The graph expansion uses adaptive latency budgeting so traversal never becomes a tail-latency bottleneck. The router makes a multi-signal out-of-domain (OOD) decision from domain-centroid similarity, cluster similarity, a retrieval support probe, and entity shape rather than a single threshold, and it has recorded zero false-positive escapes (off-topic queries accepted as in-domain) in production. The whole system runs on CPU-only infrastructure with no GPU dependency.
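The multi-signal router can be read as a vote rather than a single cut-off. Under a vote, one weak signal (for example, a short query that sits far from the domain centroid) cannot reject an otherwise on-topic question by itself. This is a minimal sketch that assumes each signal has already been computed upstream. The `RouteSignals` fields, the thresholds, and the two-vote rule are hypothetical placeholders, not AetherCV's calibrated values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteSignals:
    centroid_sim: float   # cosine similarity to the domain centroid
    cluster_sim: float    # best similarity to any topic-cluster centroid
    support_score: float  # top score from a cheap retrieval probe
    entity_shaped: bool   # query looks like a paper or method name


def is_in_domain(
    s: RouteSignals,
    centroid_min: float = 0.35,
    cluster_min: float = 0.45,
    support_min: float = 0.50,
    min_votes: int = 2,
) -> bool:
    """Accept the query when at least `min_votes` independent signals agree."""
    votes = sum(
        [
            s.centroid_sim >= centroid_min,
            s.cluster_sim >= cluster_min,
            s.support_score >= support_min,
            s.entity_shaped,
        ]
    )
    return votes >= min_votes
```

In this scheme, raising `min_votes` gives up recall on unusual but valid questions in exchange for fewer off-topic escapes.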

🚀 LeadBoost SaaS — AI Lead Intelligence Platform

Sales and growth teams spend hours manually finding, researching, and qualifying leads — most of which turn out to be a poor fit. LeadBoost SaaS replaces that manual loop with a multi-tenant platform that discovers leads through distributed web scraping, enriches each one with LLM-powered company and contact intelligence, and surfaces a scored, prioritized shortlist ready for outreach.

LangChain Playwright Celery FastAPI PostgreSQL React Docker

What it does:

Runs distributed Playwright scrapers on Celery workers to discover leads at scale across multiple sources simultaneously — designed to scale horizontally as workloads grow
Enriches each discovered lead through a LangChain + Groq pipeline: semantic entity extraction, company profiling, and contact qualification that goes well beyond simple field parsing
Scores every lead with a normalized relevance score combining LLM semantic analysis and structured metadata signals, so teams work the highest-probability prospects first
Serves everything through a JWT-authenticated, multi-tenant FastAPI backend with role-based access control and a React dashboard showing enrichment status, scores, and pipeline analytics in real time

Impact: Automated lead generation pipeline processing 1,000+ leads/day — fully containerized with Docker Compose for one-command deployment.

🌿 AyurGenix — Agentic RAG Platform for Ayurvedic Medicine

Classical Ayurvedic literature spans thousands of pages of Sanskrit manuscripts — rich with medical knowledge, but locked behind a script that standard OCR tools cannot reliably process. AyurGenix makes this corpus searchable and conversational: a RAG platform that lets practitioners and researchers ask natural language questions and receive answers grounded in specific source passages, with citations.

FastAPI PyTorch Pinecone LangChain Docker

What it does:

Processes Sanskrit-script documents through a custom Char-CNN OCR model built specifically for the script's ligature-heavy glyphs and diacritical marks — handling degraded historical document quality that off-the-shelf OCR fails on entirely
Indexes the extracted text into Pinecone for semantic retrieval, enabling sub-second search across 10,000+ pages of structured and unstructured classical texts
Reranks candidate passages with a cross-encoder and generates citation-grounded responses — every answer traces back to a specific source location in the corpus
Powers multi-turn consultations through a LLaMA-3 conversational layer with long-term memory, so context is maintained across an entire session rather than resetting each turn
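The rerank-then-cite step works in three moves: score every candidate against the query, keep the top few, and number them so each part of the answer can point back to a source location. This is a minimal sketch that assumes candidates arrive with their source and location already attached. `Passage`, `build_cited_context`, and the word-overlap scorer are hypothetical stand-ins. In a real pipeline, the `score` callable would wrap a cross-encoder, for example by batching `(query, passage)` pairs through `CrossEncoder.predict` from the sentence-transformers library.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Passage:
    text: str
    source: str    # which text in the corpus
    location: str  # where in that text, e.g. chapter.verse


def build_cited_context(
    query: str,
    candidates: Sequence[Passage],
    score: Callable[[str, str], float],
    top_k: int = 3,
) -> Tuple[str, List[Passage]]:
    """Rerank candidates by `score` and format them as numbered, citable context."""
    ranked = sorted(candidates, key=lambda p: score(query, p.text), reverse=True)[:top_k]
    lines = [
        f"[{i}] {p.text} ({p.source}, {p.location})"
        for i, p in enumerate(ranked, start=1)
    ]
    return "\n".join(lines), ranked


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score (stand-in for a cross-encoder): shared lowercase words."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))
```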

Impact: Sub-second retrieval across a 10,000+ page corpus with traceable citations, making a historically hard-to-search knowledge base practically usable.

🤖 TalentForge AI — Autonomous Job Application Platform

Job hunting at scale means doing the same thing hundreds of times: search, filter, score, apply. TalentForge AI automates that entire loop — from discovering LinkedIn listings to scoring them against your resume with an LLM, filtering out poor fits, and submitting Easy Apply applications through browser automation — while keeping you in control of every step.

The design priority throughout was safety and reliability: every action is rate-limited, every application is logged with its reason, and no application is submitted until the listing has cleared an AI scoring gate against your resume.

Python 3.11 Playwright Streamlit SQLite Groq LangChain FastAPI

What it does:

Scrapes LinkedIn for jobs matching your keywords, then scores each one against your full resume using an LLM — 70% semantic match weight, 30% keyword alignment — producing a normalized 0–1 relevance score per listing
Applies automatically to jobs above your score threshold through headless Playwright browser automation with human-like behavior (randomized typing speed, mouse patterns, browser fingerprints)
Tracks every job through a strict state machine — DISCOVERED → SCORED → QUEUED → APPLIED / SKIPPED / FAILED — with the reason for every skip or failure persisted to SQLite
Surfaces everything in a real-time Streamlit dashboard: daily application trends, match score distribution, company breakdown, and a filterable application table with one-click CSV export
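The 70/30 weighting and the job lifecycle can both be written down directly. This is a minimal sketch that assumes both scoring inputs are already normalized to [0, 1]. `relevance_score`, `JobState`, `ALLOWED`, and `transition` are hypothetical names. The transition table follows the DISCOVERED → SCORED → QUEUED → APPLIED / SKIPPED / FAILED chain as written, plus two assumed edges: a low-scoring listing skipped straight from SCORED, and a listing that fails during scoring.

```python
from enum import Enum
from typing import Optional, Tuple


def _clamp(x: float) -> float:
    return min(max(x, 0.0), 1.0)


def relevance_score(semantic: float, keyword: float,
                    w_semantic: float = 0.7, w_keyword: float = 0.3) -> float:
    """Blend two [0, 1] signals with the 70/30 weights; result clamped to [0, 1]."""
    return _clamp(w_semantic * _clamp(semantic) + w_keyword * _clamp(keyword))


class JobState(Enum):
    DISCOVERED = "discovered"
    SCORED = "scored"
    QUEUED = "queued"
    APPLIED = "applied"
    SKIPPED = "skipped"
    FAILED = "failed"


ALLOWED = {
    JobState.DISCOVERED: {JobState.SCORED, JobState.FAILED},  # FAILED edge assumed
    JobState.SCORED: {JobState.QUEUED, JobState.SKIPPED},     # SKIPPED edge assumed
    JobState.QUEUED: {JobState.APPLIED, JobState.SKIPPED, JobState.FAILED},
}
NEEDS_REASON = {JobState.SKIPPED, JobState.FAILED}


def transition(current: JobState, target: JobState,
               reason: Optional[str] = None) -> Tuple[JobState, Optional[str]]:
    """Validate a state change; skips and failures must carry a reason to persist."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    if target in NEEDS_REASON and not reason:
        raise ValueError(f"{target.name} requires a reason")
    return target, reason
```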

Built-in safety that makes it production-usable:

Hard daily application cap enforced at the database level — server restarts never double-count
Dry-run mode runs the complete pipeline (discovery → scoring → filtering) without submitting anything to LinkedIn, so you can validate behavior before going live
LLM provider fallback chain (Groq → OpenRouter → HuggingFace → keyword-only) means the pipeline never breaks entirely, even without API credentials
Clean six-layer architecture (Controller → Service → Storage → Model → Platform → Interface) means adding Indeed or Glassdoor is a single new class implementing an abstract interface — no changes to orchestration logic
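Two of these safeguards are small enough to sketch. For the daily cap, one way to get the restart-safe behavior described is to derive the count from rows already committed to SQLite, inside a write transaction; after a restart, the count is simply re-read. The fallback chain is an ordered loop over providers that ends in a keyword-only scorer. Both pieces are minimal sketches under stated assumptions: the `applications` table, `try_record_application`, `complete_with_fallback`, and the provider callables are hypothetical, not TalentForge's actual code.

```python
import logging
import sqlite3
from datetime import date
from typing import Callable, List, Optional, Tuple

log = logging.getLogger(__name__)

SCHEMA = """
CREATE TABLE IF NOT EXISTS applications (
    job_id     TEXT PRIMARY KEY,
    applied_on TEXT NOT NULL
)
"""


def try_record_application(conn: sqlite3.Connection, job_id: str,
                           daily_cap: int, day: Optional[date] = None) -> bool:
    """Record an application only if the day's committed count is under the cap.

    Expects a connection opened with isolation_level=None so that the explicit
    BEGIN IMMEDIATE below controls the transaction.
    """
    day_str = (day if day is not None else date.today()).isoformat()
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before counting
    try:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM applications WHERE applied_on = ?", (day_str,)
        ).fetchone()
        if count >= daily_cap:
            conn.execute("ROLLBACK")
            return False
        conn.execute(
            "INSERT INTO applications (job_id, applied_on) VALUES (?, ?)",
            (job_id, day_str),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise


def complete_with_fallback(prompt: str,
                           providers: List[Tuple[str, Callable[[str], str]]],
                           keyword_only: Callable[[str], str]) -> Tuple[str, str]:
    """Try each (name, call) provider in order; fall back to keyword-only scoring."""
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any provider error moves on to the next one
            log.warning("provider %s failed: %s", name, exc)
    return "keyword-only", keyword_only(prompt)
```

`COUNT(*)` reads only committed rows, so a restarted process sees the same tally as the one that stopped. There is no in-memory counter to lose.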

📊 GitHub Activity — Last 12 Months

🎓 Education

Indian Institute of Information Technology and Management, Gwalior
B.Tech in Computer Science | Graduated 2025

📫 Let's Connect

I'm open to collaborating on ambitious AI engineering projects, discussing production RAG and agentic system design, or exploring full-time opportunities in AI engineering, platform engineering, or ML infrastructure.

🌐 anupai-portfolio.vercel.app · 📧 anupsharma1708@gmail.com · 💼 LinkedIn · 🐙 @anupamkr1708
