This repository collects assignments and mini-projects from my university courses in data science, machine learning, and analytics. Each folder holds a self-contained notebook or script: exploratory analysis, models, and sometimes dashboards or pipelines tied to real (or realistic) datasets.
```mermaid
flowchart LR
    subgraph themes [What you will find here]
        A[Clustering & anomaly detection]
        B[NLP & sentiment]
        C[Big data & pipelines]
        D[Regression & forecasting]
        E[Visualization & BI]
    end
    A --> Net[Network firewall logs]
    B --> Rev[Hotel reviews]
    C --> Mongo[Dask / MongoDB]
    D --> Fire[Weather & wildfire risk]
    E --> Asy[Asylum seekers — Tableau]
```
| Project | Focus | Stack (high level) |
|---|---|---|
| Anomaly detection in network security | Unsupervised learning on firewall-style traffic logs | pandas, scikit-learn (K-Means, DBSCAN, PCA), seaborn / matplotlib |
| Hotel reviews — sentiment & scraping | Text data, labeling, and optional web scraping | pandas, Kaggle data, Selenium (Agoda), classical ML in `hotels0.py` |
| Sentiment analysis (LSTM, PyTorch) | Deep learning for review sentiment | PyTorch, LSTM, plus optional Dask + MongoDB pipeline in the notebook |
| Wildfire area predictions | Regression on weather features + multi-region forecast API | pymongo, Weatherbit API, sklearn linear regression, Plotly treemap |
| Asylum seekers — multiview dashboard | Exploratory / policy-oriented visualization | Tableau (exported view below) |
Goal: Treat network log features (ports, bytes, packets, actions) as vectors, normalize and encode them, then cluster traffic to surface unusual groups.
What it does:
- Loads a CSV of log-like records (`log2.csv` in the original workflow), checks consistency (e.g. totals vs. components), and explores the distribution of actions.
- Applies MinMax scaling and one-hot encoding where needed, then uses K-Means (with elbow and PCA 2D plots) and DBSCAN for density-based clusters.
- Reports silhouette and Davies–Bouldin style diagnostics to compare clustering quality.
This is a typical unsupervised learning lab: you interpret clusters rather than predicting a single “attack” label from a static dataset.
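A minimal sketch of that pipeline on synthetic traffic-like data (column names, cluster counts, and DBSCAN parameters here are illustrative, not the actual `log2.csv` schema):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans, DBSCAN
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score, davies_bouldin_score

rng = np.random.default_rng(0)
# Synthetic "traffic": two dense groups plus a handful of scattered outliers
normal = rng.normal([500, 10], [50, 2], size=(200, 2))
heavy = rng.normal([5000, 200], [300, 20], size=(200, 2))
outliers = rng.uniform([0, 0], [10000, 400], size=(10, 2))
X = pd.DataFrame(np.vstack([normal, heavy, outliers]), columns=["bytes", "packets"])

# Normalize so no single feature dominates the distance metric
X_scaled = MinMaxScaler().fit_transform(X)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
print("silhouette:", round(silhouette_score(X_scaled, km.labels_), 3))
print("davies-bouldin:", round(davies_bouldin_score(X_scaled, km.labels_), 3))

# DBSCAN labels low-density points -1: candidate anomalies
db = DBSCAN(eps=0.05, min_samples=5).fit(X_scaled)
print("DBSCAN noise points:", int((db.labels_ == -1).sum()))

# Project to 2D for plotting (data is already 2D here; shown for completeness)
X_2d = PCA(n_components=2).fit_transform(X_scaled)
```

In the real lab you would iterate on `n_clusters` via the elbow plot and inspect the `-1` DBSCAN group rather than trusting fixed parameters.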
Goal: Work with large-scale review text: combine positive and negative review columns from a public hotel dataset, optionally augment with Agoda reviews scraped via Selenium, and build features for sentiment-related tasks.
What it does:
- Merges positive and negative reviews into one table with a binary positive / negative indicator.
- Demonstrates an end-to-end path from CSV discovery to targeted scraping (paths and drivers are environment-specific in the script—adjust before running).
Use this as a template for NLP data prep and weak supervision from ratings.
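A hedged sketch of the merge-and-label step, with a toy frame standing in for the real CSV (the column names follow the common public hotel-reviews schema, but verify them against the actual dataset):

```python
import pandas as pd

# Toy stand-in for the public hotel dataset, which stores positive and
# negative review text in separate columns of the same row
df = pd.DataFrame({
    "Positive_Review": ["Great location", "Lovely staff"],
    "Negative_Review": ["Noisy at night", "No Negative"],
})

# Stack both columns into one (text, label) table with a binary indicator
pos = df[["Positive_Review"]].rename(columns={"Positive_Review": "text"}).assign(label=1)
neg = df[["Negative_Review"]].rename(columns={"Negative_Review": "text"}).assign(label=0)
reviews = pd.concat([pos, neg], ignore_index=True)

# Drop the dataset's "no review" placeholder strings before modelling
reviews = reviews[~reviews["text"].isin(["No Negative", "No Positive"])]
```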
Goal: Classify sentiment on review text using a neural model instead of bag-of-words alone.
What it does:
- Trains an LSTM in PyTorch on review data.
- The notebook also sketches a big-data style path: load CSV with Dask, push to MongoDB with `dask-mongo`, and read back for analysis (connection strings in the notebook must be configured locally).
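A shape-check sketch of what such an LSTM classifier can look like in PyTorch (vocabulary size, dimensions, and the single-logit head are illustrative choices, not the notebook's actual hyperparameters):

```python
import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)  # one logit: positive vs. negative

    def forward(self, token_ids):
        emb = self.embed(token_ids)          # (batch, seq, embed_dim)
        _, (h_n, _) = self.lstm(emb)         # final hidden state: (1, batch, hidden_dim)
        return self.fc(h_n[-1]).squeeze(-1)  # (batch,) of logits

model = SentimentLSTM()
batch = torch.randint(0, 1000, (4, 12))  # 4 fake reviews, 12 token ids each
logits = model(batch)
```

Training would pair these logits with `nn.BCEWithLogitsLoss` against the binary labels built in the reviews project.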
Goal: Relate weather and environmental inputs to estimated fire area, then use a trained linear regression model to score forecast weather from an API across Australian regions.
What it does:
- Optionally pulls historical rows from MongoDB; merges with CSV-based training data.
- Calls the Weatherbit daily forecast API for several lat/long points (NSW, NT, QLD, SA, TAS, VIC, WA).
- Outputs predictions and a Plotly treemap of regional averages, with RMSE and R² in the title for a quick sanity check.
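A simplified sketch of the fit-then-score step, with synthetic weather features standing in for the real training data and API responses (feature names and coefficients are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(1)
# Fake history: columns are temperature, humidity, wind speed
X_train = rng.uniform([10, 20, 0], [40, 90, 30], size=(100, 3))
y_train = 2.0 * X_train[:, 0] - 0.5 * X_train[:, 1] + rng.normal(0, 2, 100)

model = LinearRegression().fit(X_train, y_train)
rmse = mean_squared_error(y_train, model.predict(X_train)) ** 0.5
r2 = r2_score(y_train, model.predict(X_train))
print(f"RMSE={rmse:.2f}  R²={r2:.3f}")  # shown in the treemap title

# Score "forecast" rows; in the real script these come from the Weatherbit API
forecast = np.array([[25.0, 60.0, 12.0], [33.0, 40.0, 5.0]])
predicted_area = model.predict(forecast)
```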
Note: API keys and Mongo URIs in scripts are placeholders or secrets—rotate keys and use environment variables before sharing or re-running publicly.
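One way to follow that advice is to read secrets from environment variables instead of hard-coding them; the variable names below are placeholders, not the scripts' actual configuration:

```python
import os

# Placeholders: pick names matching your own deployment
WEATHERBIT_KEY = os.environ.get("WEATHERBIT_API_KEY")  # None if unset
MONGO_URI = os.environ.get("MONGO_URI", "mongodb://localhost:27017")

if WEATHERBIT_KEY is None:
    print("Set WEATHERBIT_API_KEY before calling the forecast API.")
```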
A multiview Tableau workbook summarizes asylum-related indicators; the snapshot below is the image exported from that dashboard.
This piece is pure visual analytics: filters, small multiples, and composition to compare categories and trends at a glance.
- Python: Use Python 3.x with the packages each script imports (`pandas`, `scikit-learn`, `torch`, etc.). Notebooks expect Jupyter or VS Code.
- Data & secrets: Several projects assume local CSV files, API keys, or database URIs; configure those before execution.
- Browsers / Selenium: Hotel scraping requires a matching ChromeDriver and valid URL list.
Coursework artifacts are shared for portfolio and learning purposes. If you reuse ideas or code, cite the course context and adapt credentials, data paths, and dependencies to your own environment.
