CodeDaoVietNam/README.md

Hi, I'm Nguyễn Đức Tiến 👋

3rd-year CS Student @ HCMUS · Data Science & AI Automation · Building AI-ready data pipelines and end-to-end ML systems


🧠 About Me

I'm a 3rd-year Computer Science student at Ho Chi Minh City University of Science (HCMUS), focused on building practical AI and data systems that connect raw data, analytics, machine learning, and automation.

I enjoy working on projects that go beyond notebooks: collecting and cleaning data, designing feature pipelines, building APIs, creating dashboards, containerizing applications, and turning models into usable products.

  • 🔭 Recent focus: AI-ready data pipelines, analytics agents, real-time monitoring systems, and ML applications
  • 🌱 Currently learning: MLOps, CI/CD, data engineering fundamentals, LLM tool-calling, and scalable model serving
  • 💡 Interested in: Applied ML, AI Automation, Data Science, NLP, Explainable AI, and production-oriented AI systems
  • 🎯 Next goal: Land a Data Science / AI / Automation Internship in 2026

🛠️ Tech Stack

Programming & Data

Python · SQL · Pandas · NumPy · Plotly

Machine Learning & AI

Scikit-learn · XGBoost · LightGBM · PyTorch · HuggingFace · SHAP

Backend, Deployment & Automation

FastAPI · Docker · GitHub Actions · Streamlit · React

Databases & Tools

PostgreSQL · SQLite · Kafka · Google Colab


🚀 Featured Projects

An AI-assisted analytics platform that allows users to upload tabular datasets, profile data quality, generate KPI dashboards, and ask grounded natural-language questions over data.

  • Built CSV/XLS/XLSX upload, dataset profiling, missing value checks, duplicate detection, column type inference, and metadata storage.
  • Implemented semantic column mapping for business roles such as revenue, date, category, quantity, profit, campaign, and department.
  • Developed deterministic Pandas-based analysis tools with an AI Copilot layer for reliable data question answering.
  • Designed a production-style structure with FastAPI backend, React dashboard, Docker Compose, tests, scripts, and documentation.

FastAPI · Pandas · React · SQLite · Docker · AI Copilot · Data Quality · Analytics Dashboard
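To illustrate the kind of profiling described above (missing value checks, duplicate detection, column type inference), here is a minimal dependency-free sketch. It is not the project's code, which is Pandas-based; `MISSING`, `infer_type`, and `profile` are hypothetical names, and the sketch assumes rows arrive as dictionaries of strings, as they would from `csv.DictReader`.

```python
from collections import Counter

# Hypothetical set of markers treated as "missing" (an assumption, not from the project).
MISSING = {None, "", "NA", "N/A", "null"}


def infer_type(values):
    """Guess a column type from its non-missing string values."""
    present = [v for v in values if v not in MISSING]
    if not present:
        return "empty"
    # Try the strictest numeric type first, then fall back.
    for name, caster in (("integer", int), ("float", float)):
        try:
            for v in present:
                caster(v)
            return name
        except (TypeError, ValueError):
            continue
    return "text"


def profile(rows):
    """Summarize missing values, duplicate rows, and inferred column types."""
    columns = list(rows[0].keys()) if rows else []
    # Count identical rows; every repeat beyond the first is a duplicate.
    row_counts = Counter(tuple(row.get(c) for c in columns) for row in rows)
    return {
        "rows": len(rows),
        "duplicate_rows": sum(n - 1 for n in row_counts.values()),
        "missing": {c: sum(row.get(c) in MISSING for row in rows) for c in columns},
        "types": {c: infer_type([row.get(c) for row in rows]) for c in columns},
    }
```

A report like this could be stored as dataset metadata and shown to the user before any dashboard or question answering runs.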


A near real-time environmental intelligence system for monitoring AQI and weather data across Vietnam, combining data pipelines, analytics, forecasting, anomaly detection, and dashboard reporting.

  • Designed an end-to-end data flow covering data collection, streaming, storage, analytics, forecasting, insight generation, and reporting.
  • Built dashboard views for AQI monitoring, maps, alerts, forecast trends, province comparison, and environmental insights.
  • Integrated a realtime-oriented architecture using API services, streaming concepts, time-series storage, and frontend visualization.
  • Focused on moving beyond passive monitoring by adding short-term prediction, anomaly detection, and explainable insights.

Python · FastAPI · React · Kafka · TimescaleDB · AQI Forecasting · Anomaly Detection
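The README does not say which anomaly detection method the system uses. As one common baseline, a rolling z-score flags a reading that deviates strongly from the readings just before it. The sketch below is an assumption for illustration only; `rolling_zscore_anomalies`, `window`, and `threshold` are hypothetical names and defaults.

```python
from statistics import mean, pstdev


def rolling_zscore_anomalies(values, window=24, threshold=3.0):
    """Return indices whose value deviates strongly from the preceding window."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]  # past readings only, current one excluded
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            continue  # flat history: z-score is undefined, so skip this point
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up AQI readings with one obvious spike at index 6.
readings = [50, 52, 51, 49, 50, 51, 150, 50]
print(rolling_zscore_anomalies(readings, window=5, threshold=3.0))
```

Because the window only looks backwards, the same function could run on a stream as each new reading arrives.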


A fashion e-commerce forecasting project focused on predicting daily Revenue and COGS using multi-source business data.

  • Processed multi-source datasets including orders, payments, products, customers, promotions, inventory, web traffic, shipping, returns, reviews, and geography.
  • Built feature engineering workflows for time-series forecasting while avoiding data leakage.
  • Conducted revenue trend analysis, seasonality analysis, promotion impact analysis, inventory risk analysis, and RFM customer segmentation.
  • Compared multiple forecasting approaches, including statistical baselines, LightGBM, XGBoost, CatBoost, and Prophet, and applied explainability techniques to interpret the models.

Pandas · Time Series · Feature Engineering · LightGBM · XGBoost · CatBoost · Prophet · SHAP
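A typical way to avoid data leakage in time-series feature engineering is to build every feature for day t only from values strictly before t. The sketch below shows that idea with lag and rolling-mean features in plain Python. It is an illustrative assumption, not the project's pipeline; `add_lag_features` and its parameters are hypothetical.

```python
def add_lag_features(series, lags=(1, 7), roll_window=7):
    """Build one feature row per day t using only values before t."""
    rows = []
    # Skip the first days, where not enough history exists for every feature.
    start = max(max(lags), roll_window)
    for t in range(start, len(series)):
        past = series[t - roll_window:t]  # slice ends at t, so series[t] is excluded
        features = {f"lag_{k}": series[t - k] for k in lags}
        features[f"roll_mean_{roll_window}"] = sum(past) / roll_window
        rows.append({"t": t, "target": series[t], **features})
    return rows


# Made-up daily revenue values.
daily_revenue = [10, 20, 30, 40, 50]
for row in add_lag_features(daily_revenue, lags=(1, 2), roll_window=2):
    print(row)
```

The same rule applies to evaluation: splitting by time rather than at random keeps future days out of the training set.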


An end-to-end cardiovascular risk assessment system with ML inference, explainability, OCR, LLM-assisted report interpretation, and containerized deployment.

  • Built a machine learning pipeline using preprocessing, feature engineering, XGBoost training, cross-validation, and model evaluation.
  • Integrated SHAP explainability to make predictions more interpretable.
  • Developed a FastAPI backend and Streamlit frontend for interactive prediction and report generation.
  • Added Docker-based deployment, API validation, logging, and GitHub Actions CI/CD workflow.

XGBoost · SHAP · FastAPI · Streamlit · Docker · GitHub Actions · Gemini API · EasyOCR


An automated NLP pipeline for parsing LaTeX scientific papers and matching in-text citations with bibliography references.

  • Built a one-command pipeline for paper parsing, reference extraction, candidate generation, feature engineering, model training, prediction, and evaluation.
  • Combined TF-IDF, Jaccard similarity, Sentence-BERT embeddings, and Random Forest classification.
  • Engineered 44 features and evaluated ranking performance using MRR and Recall@K.
  • Improved citation matching performance over a TF-IDF heuristic baseline.

Python · NLP · Sentence-BERT · TF-IDF · Random Forest · LaTeX Parsing · Automation Pipeline
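For readers unfamiliar with the ranking metrics mentioned above, here is a minimal sketch of MRR and Recall@K. It assumes each citation has exactly one correct reference and that the model outputs a ranked candidate list per citation; the function names and sample data are hypothetical and not taken from the project.

```python
def mean_reciprocal_rank(ranked_lists, gold):
    """Average of 1/rank of the correct item; contributes 0 when it is absent."""
    total = 0.0
    for candidates, answer in zip(ranked_lists, gold):
        if answer in candidates:
            total += 1.0 / (candidates.index(answer) + 1)
    return total / len(gold) if gold else 0.0


def recall_at_k(ranked_lists, gold, k):
    """Fraction of queries whose correct item appears in the top k candidates."""
    hits = sum(answer in candidates[:k] for candidates, answer in zip(ranked_lists, gold))
    return hits / len(gold) if gold else 0.0


# Made-up rankings for three citations; the third correct reference is never retrieved.
ranked = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
gold = ["a", "y", "missing"]
print(mean_reciprocal_rank(ranked, gold))  # (1 + 1/2 + 0) / 3 = 0.5
print(recall_at_k(ranked, gold, k=1))      # 1 of 3 correct at rank 1
```

MRR rewards placing the correct reference near the top, while Recall@K only asks whether it appears in the first K candidates.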


A Vietnamese hotel recommendation system that matches user preferences with hotel features using content-based filtering, semantic search, and location-aware matching.

  • Built a recommendation workflow using hotel attributes, facilities, location information, text features, and user preferences.
  • Implemented semantic search and Named Entity Recognition to understand Vietnamese search queries.
  • Processed and transformed hotel data for downstream recommendation tasks.
  • Designed a user-friendly search experience for Vietnamese travelers.

Recommendation System · Semantic Search · NER · Content-Based Filtering · Pandas · Scikit-learn
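As a rough illustration of content-based, location-aware matching, the sketch below scores hotels by the Jaccard overlap between requested and offered facilities, after an optional city filter. This is a simplified stand-in, not the project's method (which also uses semantic search and NER); the field names `name`, `city`, and `facilities` and the sample hotels are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend(hotels, wanted_facilities, city=None, top_n=3):
    """Rank hotels by facility overlap, optionally restricted to one city."""
    pool = [h for h in hotels if city is None or h["city"] == city]
    scored = [(jaccard(h["facilities"], wanted_facilities), h["name"]) for h in pool]
    # Highest score first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]


# Made-up hotels for demonstration.
hotels = [
    {"name": "A", "city": "Da Nang", "facilities": ["pool", "wifi"]},
    {"name": "B", "city": "Da Nang", "facilities": ["wifi"]},
    {"name": "C", "city": "Hanoi", "facilities": ["pool", "wifi", "spa"]},
]
print(recommend(hotels, ["pool", "wifi"], city="Da Nang"))
```

In a fuller system, the exact-match city filter would likely be replaced by distance-based scoring, and facility overlap by text embeddings.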


📊 GitHub Stats


📚 Currently Learning

  • ⚙️ MLOps: experiment tracking, model registry, CI/CD, containerized model serving
  • 🧠 LLM Engineering: tool calling, structured output, RAG, evaluation, and hallucination control
  • 🏗️ Data Engineering Basics: data pipelines, validation, orchestration, warehouse/lakehouse concepts
  • 🚀 Production AI Systems: FastAPI, Docker, monitoring, scalable inference, and automation workflows

🀝 Open To

  • πŸ’Ό Data Science / AI / Automation internship opportunities
  • 🀝 Collaborating on ML, NLP, analytics, and AI automation projects
  • πŸ’¬ Discussing ideas around applied ML, data pipelines, explainable AI, and production-ready AI systems

Feel free to reach out via LinkedIn or email!



Pinned

  1. Parsing_And_Matching-References-ArXiv (Public)

    An automated end-to-end pipeline for knowledge extraction from LaTeX scientific articles. The system integrates hierarchical parsing and Citation Matching by leveraging a hybrid approach of Random …

    Jupyter Notebook

  2. British_Airway_Analysis (Public)

    This project aims to analyze flight review data from British Airways

    Jupyter Notebook

  3. Recommend_Hotel_Content_Based (Public)

    Smart Hotel Recommendation System tailored for Vietnamese Travelers. Powered by Hybrid AI (Semantic Search + NER) and Modern Web Technologies.

    Jupyter Notebook

  4. Cleveland_Heart_Disease_Diagnosis (Public)

    End-to-end cardiovascular risk assessment system using XGBoost, SHAP (Explainable AI), Google Gemini LLM, and EasyOCR. Built with a Microservices architecture (FastAPI + Streamlit), containerized w…

    Python

  5. REIS (Public)

    Realtime Environmental Intelligence System - Vietnam AQI Monitoring

    Jupyter Notebook · 1 star · 1 fork

  6. AI_DATA_ANALYST_AGENT (Public)

    AI Data Analyst Agent turns CSV/XLS/XLSX files into smart dashboards, data quality reports, charts, and AI-assisted answers. It uses FastAPI, Pandas, React, semantic mapping, custom metrics, and Ol…

    Jupyter Notebook