OmprakashSahani/README.md

Omprakash Sahani

ML Systems Engineer · Distributed Infrastructure · Performance Engineering


Portfolio · LinkedIn · Email · GitHub

About

I build machine learning infrastructure from first principles, focusing on distributed training systems, transformer inference, observability, and performance engineering.

My work explores how communication overhead, memory scaling, synchronization cost, and inference latency shape real-world ML system behavior.


Core Infrastructure Projects

| Project | Focus |
| --- | --- |
| Atlas AI | Distributed AI infrastructure platform for transformer systems, inference optimization, observability, and performance engineering |
| Distributed Training Profiler | Systems profiler for communication overhead, scaling efficiency, memory bottlenecks, and ZeRO optimization analysis |
| Benchmark Guardian | Automated benchmark regression detection platform with GitHub App integration and performance intelligence workflows |
| Distributed Training Simulator | Data-parallel scaling simulation with all-reduce communication analysis |
| Autograd Engine | Reverse-mode autodiff engine with dynamic computation graphs and scaling analysis |
| ML Reproducibility Auditor | Systems-oriented auditor for reproducibility, engineering quality, and ML infrastructure signals |
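
For readers unfamiliar with the term, "reverse-mode autodiff with dynamic computation graphs" can be illustrated with a minimal sketch. This is not code from the autograd-engine repository: the `Value` class and its methods are hypothetical names chosen for illustration, and only addition and multiplication are covered. The graph is built as operations run, then gradients are propagated from the output back to the inputs in reverse topological order.

```python
class Value:
    """Hypothetical scalar node; records how it was produced so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Post-order DFS yields a topological order (parents before children).
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # seed: d(output)/d(output) = 1
        for node in reversed(order):
            node._backward()


x, y = Value(2.0), Value(3.0)
z = x * y + x      # graph is recorded while this expression executes
z.backward()       # x.grad is dz/dx = y + 1, y.grad is dz/dy = x
```

Accumulating with `+=` rather than assigning matters here because `x` feeds into `z` along two paths, and the contributions from both must be summed.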

Engineering Philosophy

  • Measure before optimizing
  • Treat memory as a first-class constraint
  • Make trade-offs explicit
  • Design reproducible and observable systems

Current Focus

  • Transformer inference systems
  • Distributed runtime behavior
  • Communication and synchronization overhead
  • Memory-aware ML infrastructure
  • Benchmark automation and regression analysis
  • Observability for ML systems

Technical Focus

| Area | Technologies |
| --- | --- |
| Languages | Python · C++ |
| Infrastructure | FastAPI · GitHub Apps · SQLite · CI/CD |
| ML Systems | Distributed Training · Autograd · Transformers |
| Performance | Profiling · Benchmarking · Memory Analysis |
| Systems | Multiprocessing · Synchronization · Communication |

Pinned

  1. atlas-ai Public

    Open ML systems platform for training, profiling, evaluating, and serving AI models.

    Python 1

  2. benchmark-guardian Public

    GitHub App for detecting benchmark regressions in pull requests.

    Python 1

  3. dist-training-profiler Public

    Distributed training profiler for analyzing compute, communication, memory, and scaling bottlenecks in ML training systems.

    Python 1

  4. ml-repro-audit Public

    ML Systems Reproducibility Auditor: analyzes GitHub repositories for reproducibility, benchmarking rigor, and distributed training design quality.

    Python 1

  5. autograd-engine Public

    A reverse-mode automatic differentiation engine built from scratch with benchmarks, gradient checks, and neural network training examples.

    Python

  6. distml-core Public

    A distributed training simulator for data-parallel learning, all-reduce, and communication vs computation trade-offs.

    Python
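
The communication versus computation trade-off named in the distml-core description can be illustrated with the standard ring all-reduce cost model. The sketch below is not the simulator's code: the function names and parameters are hypothetical, and it assumes a ring topology, uniform link bandwidth, and no overlap between communication and computation.

```python
def ring_allreduce_seconds(num_workers, grad_bytes, bandwidth_bytes_per_s, latency_s=0.0):
    """Estimated time for one ring all-reduce over `grad_bytes` of gradients.

    A ring all-reduce runs 2 * (N - 1) steps (reduce-scatter, then all-gather),
    and each step moves a chunk of grad_bytes / N per worker.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if num_workers == 1:
        return 0.0  # nothing to synchronize on a single worker
    steps = 2 * (num_workers - 1)
    chunk_bytes = grad_bytes / num_workers
    return steps * (latency_s + chunk_bytes / bandwidth_bytes_per_s)


def scaling_efficiency(num_workers, compute_s, grad_bytes, bandwidth_bytes_per_s, latency_s=0.0):
    """Fraction of each step spent computing, assuming communication is not overlapped."""
    comm_s = ring_allreduce_seconds(num_workers, grad_bytes, bandwidth_bytes_per_s, latency_s)
    return compute_s / (compute_s + comm_s)


# Illustrative numbers only: 4 GB of gradients, 1 GB/s links, 6 s of compute per step.
for n in (1, 2, 4, 8):
    eff = scaling_efficiency(n, compute_s=6.0, grad_bytes=4e9, bandwidth_bytes_per_s=1e9)
    print(f"{n} workers: efficiency {eff:.2f}")
```

Under this model the bandwidth term approaches 2 × grad_bytes / bandwidth as the worker count grows, so per-step communication time levels off for large gradients, while the latency term keeps growing linearly with the number of workers.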