Sanjay Krishna sanjaykmenon

Hi, I am Sanjay.

I work on AI systems for documents, search, and data. Most of my work is about turning messy files into answers people can check.

I write notes at sanjay.app and keep the longer project details there.

Current work

I have been working on document extraction systems that combine OCR, large language models, search, and review tools. OCR is the step that turns scanned pages into text. The rest of the system has to decide what the text means, store it in a useful shape, and show where the system may be wrong.

The work includes:

Reading PDFs and checking text quality page by page.
Comparing text sources: embedded PDF text against the OCR tools Tesseract, RapidOCR, and PaddleOCR.
Using large language models to extract structured data from long documents.
Designing schemas so the output can be tested and searched.
Building review flows for hard cases and model mistakes.
Running jobs locally, through hosted model APIs, and on cloud GPUs.
Tracking cost, speed, and repeatability across model choices.
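
As an illustration of the page-by-page text check in the list above, here is a minimal Python sketch. It assumes the page text has already been pulled out of the PDF as plain strings. The names `PageQuality`, `score_page`, and `score_document`, the punctuation set, and both thresholds are assumptions made for this example, not taken from any project described here.

```python
from dataclasses import dataclass

# Punctuation treated as "normal" in business documents (an assumed set).
COMMON_PUNCTUATION = set(".,;:!?'\"()-%$/")


@dataclass
class PageQuality:
    page_number: int
    char_count: int
    readable_ratio: float
    needs_ocr: bool


def score_page(page_number, text, min_chars=50, min_ratio=0.8):
    """Score one page of extracted text and flag it for OCR if it looks poor."""
    stripped = text.strip()
    char_count = len(stripped)
    if char_count == 0:
        # No embedded text at all, so the page is probably a scanned image.
        return PageQuality(page_number, 0, 0.0, True)
    readable = sum(
        1
        for ch in stripped
        if ch.isalnum() or ch.isspace() or ch in COMMON_PUNCTUATION
    )
    ratio = readable / char_count
    needs_ocr = char_count < min_chars or ratio < min_ratio
    return PageQuality(page_number, char_count, ratio, needs_ocr)


def score_document(pages):
    """Score every page, numbering pages from 1."""
    return [score_page(i, text) for i, text in enumerate(pages, start=1)]
```

Pages flagged with `needs_ocr` could then be routed to an OCR tool while the rest keep their embedded text. The thresholds would need tuning against a real test set.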

I care about the parts around the model. A useful system needs test sets, review screens, error logs, and a way to rerun the same job when the prompt or model changes.
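
One way to make reruns comparable is to give each job a key derived from everything that can change its output. The sketch below is an assumption about how that could look, not a description of an existing system: `run_key` and its parameters (`model`, `prompt`, `schema_version`) are hypothetical names chosen for illustration.

```python
import hashlib
import json


def run_key(document_bytes, model, prompt, schema_version):
    """Return a stable hex key for one extraction run.

    The key changes whenever the document, model, prompt, or schema changes,
    so results stored under it can be compared across reruns.
    """
    payload = {
        "document_sha256": hashlib.sha256(document_bytes).hexdigest(),
        "model": model,
        "prompt": prompt,
        "schema_version": schema_version,
    }
    # Sorted keys and fixed separators keep the serialized form deterministic.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Storing outputs, cost, and timing under this key means a changed prompt or model produces a new record instead of overwriting the old one.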

Technologies

Python, PostgreSQL, AWS, Docker, TypeScript, Next.js, OpenAI APIs, local models, vector search, dbt, Dagster, and Spark.

More

More notes and project writeups are at sanjay.app.
