3rd-year CS Student @ HCMUS · Data Science & AI Automation · Building AI-ready data pipelines and end-to-end ML systems
I'm a 3rd-year Computer Science student at Ho Chi Minh City University of Science (HCMUS), focused on building practical AI and data systems that connect raw data, analytics, machine learning, and automation.
I enjoy working on projects that go beyond notebooks: collecting and cleaning data, designing feature pipelines, building APIs, creating dashboards, containerizing applications, and turning models into usable products.
- 🔭 Recent focus: AI-ready data pipelines, analytics agents, real-time monitoring systems, and ML applications
- 🌱 Currently learning: MLOps, CI/CD, data engineering fundamentals, LLM tool-calling, and scalable model serving
- 💡 Interested in: Applied ML, AI Automation, Data Science, NLP, Explainable AI, and production-oriented AI systems
- 🎯 Next goal: Land a Data Science / AI / Automation internship in 2026
An AI-assisted analytics platform that lets users upload tabular datasets, profile data quality, generate KPI dashboards, and ask natural-language questions grounded in their data.
- Built CSV/XLS/XLSX upload, dataset profiling, missing value checks, duplicate detection, column type inference, and metadata storage.
- Implemented semantic column mapping for business roles such as revenue, date, category, quantity, profit, campaign, and department.
- Developed deterministic Pandas-based analysis tools, exposed through an AI Copilot layer, for reliable question answering over data.
- Designed a production-style project structure with a FastAPI backend, a React dashboard, Docker Compose, tests, scripts, and documentation.
FastAPI · Pandas · React · SQLite · Docker · AI Copilot · Data Quality · Analytics Dashboard
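To make the semantic column mapping idea concrete, here is a minimal sketch that assigns business roles from column-name keywords. It is plain Python rather than the project's Pandas code, and `ROLE_KEYWORDS`, `tokenize`, and `map_columns` are hypothetical names; the actual platform may rely on different synonyms, value-based checks, or the AI Copilot itself.

```python
from __future__ import annotations

import re

# Hypothetical synonym lists per business role (illustrative, not the project's).
ROLE_KEYWORDS: dict[str, list[str]] = {
    "revenue": ["revenue", "sales", "amount", "turnover"],
    "date": ["date", "day", "month", "timestamp", "time"],
    "category": ["category", "segment", "type"],
    "quantity": ["qty", "quantity", "units"],
    "profit": ["profit", "margin"],
    "campaign": ["campaign", "promo", "promotion"],
    "department": ["department", "dept", "division"],
}


def tokenize(column_name: str) -> list[str]:
    """Split snake_case, kebab-case, spaced, and camelCase names into lowercase tokens."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", column_name)
    return [tok for tok in re.split(r"[^a-z0-9]+", spaced.lower()) if tok]


def map_columns(columns: list[str]) -> dict[str, str | None]:
    """Give each column the first role (in ROLE_KEYWORDS order) that shares a token with it."""
    mapping: dict[str, str | None] = {}
    for col in columns:
        tokens = set(tokenize(col))
        mapping[col] = next(
            (role for role, keywords in ROLE_KEYWORDS.items() if tokens & set(keywords)),
            None,  # no keyword matched: leave the column unmapped
        )
    return mapping
```

Roles are tried in dictionary order, so a name like `promo_revenue` resolves to `revenue` before `campaign`; a real mapper would need an explicit tie-breaking rule or a confidence score.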
A near real-time environmental intelligence system for monitoring AQI and weather data across Vietnam, combining data pipelines, analytics, forecasting, anomaly detection, and dashboard reporting.
- Designed an end-to-end data flow covering data collection, streaming, storage, analytics, forecasting, insight generation, and reporting.
- Built dashboard views for AQI monitoring, maps, alerts, forecast trends, province comparison, and environmental insights.
- Implemented a real-time-oriented architecture that combines API services, streaming, time-series storage, and frontend visualization.
- Focused on moving beyond passive monitoring by adding short-term prediction, anomaly detection, and explainable insights.
Python · FastAPI · React · Kafka · TimescaleDB · AQI Forecasting · Anomaly Detection
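One simple way to flag anomalies in an AQI stream is a rolling z-score against recent history. The sketch below is an assumption, not the system's documented method: `rolling_zscore_anomalies`, the 24-reading window, and the 3.0 threshold are illustrative choices.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=24, threshold=3.0):
    """Return (index, z-score) pairs for readings that deviate more than
    `threshold` standard deviations from the mean of the previous `window` readings."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]  # only past readings, never the current one
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue  # flat history: z-score is undefined, so skip this reading
        z = (values[i] - mu) / sigma
        if abs(z) > threshold:
            anomalies.append((i, round(z, 2)))
    return anomalies
```

Because each reading is compared only with readings before it, the check can run incrementally as new data arrives; strong daily cycles (for example rush-hour peaks) would call for a seasonal baseline instead of a plain rolling mean.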
A fashion e-commerce forecasting project focused on predicting daily Revenue and COGS (cost of goods sold) from multi-source business data.
- Processed multi-source datasets including orders, payments, products, customers, promotions, inventory, web traffic, shipping, returns, reviews, and geography.
- Built feature engineering workflows for time-series forecasting, designed to avoid data leakage.
- Conducted revenue trend analysis, seasonality analysis, promotion impact analysis, inventory risk analysis, and RFM customer segmentation.
- Compared forecasting approaches, including statistical baselines, LightGBM, XGBoost, CatBoost, and Prophet, and applied explainability techniques to interpret the models.
Pandas · Time Series · Feature Engineering · LightGBM · XGBoost · CatBoost · Prophet · SHAP
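Avoiding leakage in time-series feature engineering mostly comes down to two rules: every feature for day `t` is computed only from days before `t`, and train/test splits follow time order. A minimal stdlib sketch of those rules; `make_lag_features`, `time_split`, and the lag and window values are hypothetical, not taken from the project:

```python
from statistics import mean


def make_lag_features(revenue, lags=(1, 7), window=7):
    """Build one feature row per day t, using only revenue from days before t."""
    rows = []
    start = max(max(lags), window)  # first day with enough history for every feature
    for t in range(start, len(revenue)):
        row = {f"lag_{k}": revenue[t - k] for k in lags}
        # Rolling mean of the `window` days before t; revenue[t] itself is excluded.
        row[f"roll_mean_{window}"] = mean(revenue[t - window:t])
        row["target"] = revenue[t]
        rows.append(row)
    return rows


def time_split(rows, test_size):
    """Chronological split with no shuffling, so training never sees later days."""
    cut = len(rows) - test_size
    return rows[:cut], rows[cut:]
```

A common leak is computing the rolling mean over a window that includes day `t`, which hands the model part of its own target; slicing `revenue[t - window:t]` stops just before it.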
An end-to-end cardiovascular risk assessment system with ML inference, explainability, OCR, LLM-assisted report interpretation, and containerized deployment.
- Built a machine learning pipeline covering preprocessing, feature engineering, XGBoost training, cross-validation, and model evaluation.
- Integrated SHAP explainability to make predictions more interpretable.
- Developed a FastAPI backend and Streamlit frontend for interactive prediction and report generation.
- Added Docker-based deployment, API validation, logging, and a GitHub Actions CI/CD workflow.
XGBoost · SHAP · FastAPI · Streamlit · Docker · GitHub Actions · Gemini API · EasyOCR
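The profile mentions cross-validation without naming the scheme. For an imbalanced risk label, stratified k-fold is a common choice, and the sketch below assumes it; `stratified_kfold` is a hypothetical helper written for illustration (scikit-learn's `StratifiedKFold` is the usual choice in real code).

```python
from collections import defaultdict


def stratified_kfold(labels, k=5):
    """Yield (train_idx, val_idx) pairs in which every fold keeps roughly
    the same class proportions as the full label list."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    # Deal each class's indices round-robin across the k folds.
    for idxs in by_class.values():
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val
```

Indices are dealt in their original order here; shuffling within each class first (with a fixed seed) is usually preferred so that folds do not inherit any ordering in the raw data.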
An automated NLP pipeline for parsing LaTeX scientific papers and matching in-text citations with bibliography references.
- Built a one-command pipeline for paper parsing, reference extraction, candidate generation, feature engineering, model training, prediction, and evaluation.
- Combined TF-IDF, Jaccard similarity, Sentence-BERT embeddings, and Random Forest classification.
- Engineered 44 features and evaluated ranking performance using MRR and Recall@K.
- Improved citation matching performance over a TF-IDF heuristic baseline.
Python · NLP · Sentence-BERT · TF-IDF · Random Forest · LaTeX Parsing · Automation Pipeline
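Candidate scoring and ranking evaluation can be sketched in a few lines. This assumes whitespace tokenization for Jaccard similarity and exactly one gold reference per citation; `jaccard`, `rank_candidates`, `mrr`, and `recall_at_k` are illustrative helpers, not the pipeline's code, which also uses TF-IDF, Sentence-BERT, and a Random Forest over its 44 features.

```python
from __future__ import annotations


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two strings (lowercased whitespace tokens)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def rank_candidates(context: str, bib: dict[str, str]) -> list[str]:
    """Return bibliography keys sorted by Jaccard similarity to the citation context."""
    return sorted(bib, key=lambda key: jaccard(context, bib[key]), reverse=True)


def reciprocal_rank(ranked: list[str], gold: str) -> float:
    for pos, key in enumerate(ranked, start=1):
        if key == gold:
            return 1.0 / pos
    return 0.0  # gold reference was not retrieved at all


def mrr(rankings: list[list[str]], golds: list[str]) -> float:
    """Mean Reciprocal Rank: average of 1/rank of the gold reference."""
    return sum(reciprocal_rank(r, g) for r, g in zip(rankings, golds)) / len(golds)


def recall_at_k(rankings: list[list[str]], golds: list[str], k: int) -> float:
    """Fraction of citations whose gold reference appears in the top k."""
    hits = sum(1 for r, g in zip(rankings, golds) if g in r[:k])
    return hits / len(golds)
```

MRR rewards placing the correct reference near the top, while Recall@K only asks whether it made the shortlist, so reporting both shows whether errors are near misses or complete failures.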
A Vietnamese hotel recommendation system that matches user preferences with hotel features using content-based filtering, semantic search, and location-aware matching.
- Built a recommendation workflow using hotel attributes, facilities, location information, text features, and user preferences.
- Implemented semantic search and Named Entity Recognition to understand Vietnamese search queries.
- Processed and transformed hotel data for downstream recommendation tasks.
- Designed a user-friendly search experience for Vietnamese travelers.
Recommendation System · Semantic Search · NER · Content-Based Filtering · Pandas · Scikit-learn
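Content-based filtering with a location constraint can be illustrated with cosine similarity over binary facility vectors. The facility list, hotel records, and helper names below are made up for illustration; the actual system also uses semantic search and NER on Vietnamese queries, which this sketch omits.

```python
from __future__ import annotations

import math

# Hypothetical facility vocabulary and sample records, for illustration only.
FACILITIES = ["pool", "breakfast", "beach", "gym", "parking"]

SAMPLE_HOTELS = {
    "Sea Breeze": {"city": "Da Nang", "facilities": {"pool", "beach", "breakfast"}},
    "City Hub": {"city": "Ha Noi", "facilities": {"gym", "parking", "breakfast"}},
    "Sunrise": {"city": "Da Nang", "facilities": {"parking"}},
}


def to_vector(items: set[str]) -> list[float]:
    """One-hot encode a set of facilities against the FACILITIES vocabulary."""
    return [1.0 if f in items else 0.0 for f in FACILITIES]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(prefs: set[str], hotels: dict, city: str | None = None, top_n: int = 3) -> list[str]:
    """Rank hotels by cosine similarity between user preferences and hotel facilities."""
    user_vec = to_vector(prefs)
    scored = []
    for name, info in hotels.items():
        if city is not None and info["city"] != city:
            continue  # location-aware step: keep only hotels in the requested city
        scored.append((cosine(user_vec, to_vector(info["facilities"])), name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, then by name
    return [name for _, name in scored[:top_n]]
```

For example, `recommend({"beach", "pool"}, SAMPLE_HOTELS, city="Da Nang")` ranks Sea Breeze ahead of Sunrise and never returns the Ha Noi hotel. Filtering by location before scoring keeps the similarity step purely about preferences.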
- ⚙️ MLOps: experiment tracking, model registry, CI/CD, containerized model serving
- 🧠 LLM Engineering: tool calling, structured output, RAG, evaluation, and hallucination control
- Data Engineering Basics: data pipelines, validation, orchestration, warehouse/lakehouse concepts
- Production AI Systems: FastAPI, Docker, monitoring, scalable inference, and automation workflows
- 💼 Data Science / AI / Automation internship opportunities
- 🤝 Collaborating on ML, NLP, analytics, and AI automation projects
- 💬 Discussing ideas around applied ML, data pipelines, explainable AI, and production-ready AI systems
Feel free to reach out via LinkedIn or email!