sanjaykmenon/README.md

Hi, I am Sanjay.

I work on AI systems for documents, search, and data. Most of my work is about turning messy files into answers people can check.

I write notes at sanjay.app, where I also keep the longer project details.

Current work

I have been working on document extraction systems that combine OCR, large language models, search, and review tools. OCR is the step that turns scanned pages into text. The rest of the system has to decide what the text means, store it in a useful shape, and show where the system may be wrong.

The work includes:

  • Reading PDFs and checking text quality page by page.
  • Comparing text sources: the text layer embedded in a PDF against OCR tools such as Tesseract, RapidOCR, and PaddleOCR.
  • Using large language models to extract structured data from long documents.
  • Designing schemas so the output can be tested and searched.
  • Building review flows for hard cases and model mistakes.
  • Running jobs locally, through hosted model APIs, and on cloud GPUs.
  • Tracking cost, speed, and repeatability across model choices.
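
As a rough illustration of the first item in the list, a page-by-page text quality check could look something like the Python below. This is a minimal sketch and not the actual pipeline: the function names, the word-like pattern, and the 0.6 threshold are all assumptions made for the example. It scores each page by the share of tokens that look like ordinary words or numbers and flags low-scoring pages for OCR or review.

```python
import re

# Assumed heuristic: a token counts as "clean" if it is a plain word
# (optionally hyphenated or with an apostrophe) or a number.
_WORDLIKE = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*|\d+(?:[.,]\d+)*")
_EDGE_PUNCTUATION = ".,;:!?()[]\"'"


def page_text_quality(text: str) -> float:
    """Return the share of tokens on a page that look like words or numbers."""
    tokens = text.split()
    if not tokens:
        return 0.0  # an empty page has no usable text
    clean = sum(
        1 for token in tokens
        if _WORDLIKE.fullmatch(token.strip(_EDGE_PUNCTUATION))
    )
    return clean / len(tokens)


def pages_needing_ocr(pages: list[str], threshold: float = 0.6) -> list[int]:
    """Return 1-based page numbers whose quality score is below the threshold."""
    return [
        number
        for number, text in enumerate(pages, start=1)
        if page_text_quality(text) < threshold
    ]


if __name__ == "__main__":
    sample_pages = [
        "Invoice total: 1,250.00 due on receipt.",
        "",
        "x#@ ~~ ]]q 0/ %%",
    ]
    print(pages_needing_ocr(sample_pages))
```

A real check would likely need a language-aware word list and a threshold tuned on a labeled test set. The point here is only that each page gets a score that can be logged and compared across runs.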

I care about the parts around the model. A useful system needs test sets, review screens, error logs, and a way to rerun the same job when the prompt or model changes.

Technologies

Python, PostgreSQL, AWS, Docker, TypeScript, Next.js, OpenAI APIs, local models, vector search, dbt, Dagster, and Spark.

More

More notes and project writeups are at sanjay.app.
