Probabilistic modeling · Stochastic systems · Computational mathematics
I work at the intersection of applied probability, statistical learning, and computational methods — with a focus on building mathematically grounded models of complex, uncertainty-driven systems.
| Area | Focus |
|---|---|
| Stochastic Systems | Probability theory, mathematical statistics, discrete/continuous-time stochastic processes |
| Probabilistic Forecasting | Count process modeling, calibrated uncertainty quantification, extreme-event prediction |
| Statistical Learning | Inference under model misspecification, hybrid neural-statistical architectures |
| Computational Mathematics | Numerically stable, reproducible implementations of probabilistic models |
Long-term direction: quantitative modeling at the intersection of stochastic processes, probabilistic inference, and high-dimensional statistical learning — with applications in quantitative finance and risk-driven systems.
End-to-end probabilistic pipeline for spatial-temporal earthquake occurrence modeling.
Core contributions:
- Demonstrated via likelihood-ratio test with boundary correction ($p < 10^{-179}$) that the Poisson assumption is systematically violated in Central Asia seismic data (2010–2024)
- Designed EarthquakeNet: per-cell overdispersion estimation via spatial embeddings + MLP, replacing the standard global-α negative binomial assumption
- Walk-forward evaluation (2018–2023): 8.6% reduction in mean pinball deviation vs. NB-GLM baseline; 12.5% lower CRPS in the tail regime (Y ≥ 5)
- Full reproducible pipeline: data download → feature engineering → training → reporting, one-command rerun
Python PyTorch NumPy SciPy Stochastic Processes Count Models Spatial-Temporal
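
The boundary-corrected test and the pinball metric named in the list above can be sketched in a few lines. This is a minimal illustration, not the project's code. It assumes the standard result that when the null value of the dispersion parameter sits on the boundary (α = 0, i.e. Poisson), the likelihood-ratio statistic follows a 50:50 mixture of a point mass at zero and a χ²(1) distribution. The function names and the log-likelihood arguments are hypothetical.

```python
import math


def boundary_corrected_lrt_pvalue(loglik_poisson, loglik_negbin):
    """p-value for H0: alpha = 0 (Poisson) vs. H1: alpha > 0 (negative binomial).

    Assumes both log-likelihoods are maximized values on the same data.
    """
    # LR statistic; clipped at zero to guard against optimizer noise.
    lr_stat = max(0.0, 2.0 * (loglik_negbin - loglik_poisson))
    if lr_stat == 0.0:
        return 1.0
    # Survival function of chi-square with 1 d.o.f.: P(X > x) = erfc(sqrt(x / 2)).
    chi2_sf = math.erfc(math.sqrt(lr_stat / 2.0))
    # Boundary correction: halve the naive chi-square(1) p-value.
    return 0.5 * chi2_sf


def pinball_loss(y, q_pred, tau):
    """Pinball (quantile) loss of a predicted tau-quantile q_pred for outcome y."""
    diff = y - q_pred
    return max(tau * diff, (tau - 1.0) * diff)


def mean_pinball_loss(ys, q_preds, tau):
    """Average pinball loss over paired observations and quantile forecasts."""
    if len(ys) != len(q_preds) or not ys:
        raise ValueError("ys and q_preds must be non-empty and of equal length")
    return sum(pinball_loss(y, q, tau) for y, q in zip(ys, q_preds)) / len(ys)
```

Halving the χ²(1) tail probability makes the test less conservative than the naive version, which matters little at very small p-values but can change the decision near conventional thresholds.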
Research pipeline for predicting IELTS-style essay band scores (0–9).
Core contributions:
- Engineered 21 linguistic features (lexical diversity, syntactic complexity, coherence proxies) as a structured tabular representation of essay quality
- Designed a hybrid DeBERTa regressor jointly processing transformer embeddings and tabular features
- Topic-grouped cross-validation with out-of-fold evaluation to prevent topic-level data leakage — a common failure mode in AES benchmarks
- Benchmarked classical ML baselines (XGBoost, LightGBM, Ridge) against the neural model; full reproducible pipeline via single-command scripts
Python PyTorch HuggingFace DeBERTa XGBoost NLP Feature Engineering
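
Topic-grouped cross-validation, as described in the list above, can be illustrated with a small dependency-free sketch. It is not the project's implementation: the greedy balancing rule and the function names are assumptions made for illustration. In practice scikit-learn's `GroupKFold` covers the same idea.

```python
from collections import defaultdict


def topic_grouped_folds(topics, n_splits=5):
    """Assign sample indices to folds so that no topic spans two folds."""
    by_topic = defaultdict(list)
    for idx, topic in enumerate(topics):
        by_topic[topic].append(idx)
    if len(by_topic) < n_splits:
        raise ValueError("need at least as many distinct topics as folds")

    folds = [[] for _ in range(n_splits)]
    # Greedy balancing: place the largest topics first, each into the
    # currently smallest fold (ties broken by topic name for determinism).
    ordered = sorted(by_topic, key=lambda t: (-len(by_topic[t]), str(t)))
    for topic in ordered:
        smallest = min(range(n_splits), key=lambda k: len(folds[k]))
        folds[smallest].extend(by_topic[topic])
    return folds


def out_of_fold_splits(topics, n_splits=5):
    """Yield (train_indices, validation_indices) pairs, one per fold."""
    folds = topic_grouped_folds(topics, n_splits)
    for k, val_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield sorted(train_idx), sorted(val_idx)
```

Because every essay on a given prompt lands in the same fold, each out-of-fold prediction is made by a model that never saw that prompt during training.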
Modular asynchronous pipeline for automated long-form content generation and multimodal orchestration.
- Async data ingestion, LLM-driven narrative structuring, TTS synthesis, image generation, and video rendering
- Cost-controlled LLM orchestration (Claude / Gemini APIs) with config-driven reproducibility
Python Asyncio Claude API Gemini API Systems Design Multimodal AI
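
One way to picture cost-controlled async orchestration is a shared budget that every call must charge before it runs, combined with a concurrency cap. The sketch below is an assumption-laden illustration rather than the pipeline's code: `CostBudget`, `run_stage`, and the stand-in `fake_llm_call` are hypothetical names, no real Claude or Gemini API is called, and the flat per-call cost is a simplification.

```python
import asyncio


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spending past the configured limit."""


class CostBudget:
    """Tracks spending against a fixed limit (hypothetical helper)."""

    def __init__(self, limit_usd):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd):
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"charge of {cost_usd} would exceed limit {self.limit_usd}"
            )
        self.spent_usd += cost_usd


async def fake_llm_call(prompt):
    """Stand-in for a real provider call; yields control once, then returns."""
    await asyncio.sleep(0)
    return f"outline for: {prompt}"


async def run_stage(prompts, budget, cost_per_call, max_concurrency=2):
    """Run one pipeline stage with bounded concurrency and a spending cap."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(prompt):
        async with semaphore:
            # Reserve the cost before calling, so the cap is never overshot.
            budget.charge(cost_per_call)
            return await fake_llm_call(prompt)

    # gather preserves input order and propagates the first exception.
    return await asyncio.gather(*(one(p) for p in prompts))
```

Charging before the call rather than after keeps the limit a hard cap. A real pipeline would likely estimate cost from token counts and read the limit from its configuration.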
Mathematics:
Measure Theory · Probability Theory · Stochastic Processes · Statistics · Linear Algebra · Functional Analysis · Optimization · Numerical Methods
Engineering:
Languages:
Python · C++ · SQL · Bash · Java
ML / DL:
PyTorch · NumPy · SciPy · Pandas · HuggingFace Transformers · Scikit-learn · XGBoost · LightGBM · CatBoost
Tools:
Git · Docker · Linux · LaTeX
My work is grounded in a probability-first view of complex systems — where uncertainty is not treated as noise to be eliminated, but as a fundamental object of study.
Rather than relying on black-box or purely deterministic approaches, I aim to understand and model the generative mechanisms behind the data through rigorous probabilistic frameworks.
This connects:
- stochastic processes and time-evolving systems
- calibrated probabilistic inference and statistical learning
- numerically stable computational implementations
The goal is a consistent bridge between rigorous probability theory, computational mathematics, and modern data-driven modeling — applied to systems where getting the uncertainty right is as important as getting the point estimate right.