DataStories-UniPi/MAT-clustering

Multiple Aspect Trajectory Clustering using Deep Learning Techniques

MATClus is a framework for learning fixed-length deep representations of Multiple Aspect Trajectories (MATs) that lead to the discovery of effective MAT clusters. The pipeline converts raw trajectory data into sliding-window behavior sequences, normalizes them, learns a compact embedding for each trajectory, and clusters the embeddings so that trajectories with similar movement patterns are grouped together.


What’s inside

The repository contains a full pipeline for trajectory processing and clustering:

1. Trajectory preprocessing

  • completeTrajectories()
    Converts raw trajectory records into differential trajectory components (distance, heading change, etc.).
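As a rough illustration of what "differential trajectory components" can look like, the sketch below derives the time step, travelled distance, and heading change between consecutive points. The function name `differential_components` and the `(t, x, y)` input layout are assumptions made for this example; they are not taken from the repository, and completeTrajectories() may compute additional or different components.

```python
import math

def differential_components(points):
    """Hypothetical helper: points is a list of (t, x, y) tuples sorted by t."""
    steps = []
    prev_heading = None
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        heading = math.atan2(dy, dx)  # direction of travel in radians
        if prev_heading is None:
            turn = 0.0  # no previous segment to compare against
        else:
            # Wrap the difference into [-pi, pi) so a small left or right
            # turn never shows up as an almost full rotation.
            turn = (heading - prev_heading + math.pi) % (2 * math.pi) - math.pi
        steps.append({
            "dt": t1 - t0,
            "distance": math.hypot(dx, dy),
            "heading_change": turn,
        })
        prev_heading = heading
    return steps

steps = differential_components([(0, 0.0, 0.0), (1, 3.0, 4.0), (2, 3.0, 8.0)])
```

Each trajectory of n points yields n - 1 steps, which is why the later stages work on segments rather than raw points.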

2. Spatiotemporal window analysis

  • stwindow_interval()
    Splits trajectories into sliding temporal windows and calculates spatial–temporal interaction scores.
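The windowing step can be pictured with the minimal sketch below, which cuts a time-sorted list of records into overlapping, half-open windows. The name `sliding_windows` and the `width` and `step` parameters are hypothetical; the actual window parameters and the spatial–temporal interaction scores computed by stwindow_interval() are not shown here.

```python
def sliding_windows(records, width, step):
    """Hypothetical helper: records is a list of (t, value) tuples sorted by t.

    Returns (start, end, items) triples for half-open windows [start, end).
    """
    if step <= 0 or width <= 0:
        raise ValueError("width and step must be positive")
    if not records:
        return []
    windows = []
    start = records[0][0]
    last = records[-1][0]
    while start <= last:
        end = start + width
        items = [r for r in records if start <= r[0] < end]
        windows.append((start, end, items))
        start += step  # step < width gives overlapping windows
    return windows

windows = sliding_windows([(0, "a"), (5, "b"), (10, "c"), (15, "d")], width=10, step=5)
```

With `step` smaller than `width`, a record can fall into several windows, which is what makes the windows "sliding" rather than disjoint.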

3. Feature engineering

  • computeFeas()
    Computes motion features such as:
      • movement rate
      • directional change
      • curvature change

4. Behavior sequence generation

  • generate_behavior_sequences()
    Converts trajectory features into window-based behavior sequences.

  • generate_normal_behavior_sequence()
    Applies quantile normalization to the behavior vectors.
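The README does not spell out the exact normalization scheme. One common reading of quantile normalization is a rank-based mapping of each feature column to its empirical quantile in [0, 1], sketched below. The function `quantile_normalize_column` is a hypothetical illustration and may differ from what generate_normal_behavior_sequence() does.

```python
def quantile_normalize_column(values):
    """Map each value to its empirical quantile in [0, 1].

    Tied values share the average of their ranks.
    """
    n = len(values)
    if n == 0:
        return []
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of equal values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return [r / (n - 1) for r in ranks]

normalized = quantile_normalize_column([10, 30, 20, 30])
```

A rank-based mapping of this kind makes features with very different scales comparable and limits the influence of outliers.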

5. Trajectory embeddings

  • trajectory2Vec_torch()
    Learns trajectory embeddings using a PyTorch LSTM autoencoder.

6. Clustering

The framework supports multiple clustering approaches:

  • matric_kmeans_clustering() – KMeans clustering
  • dbscan_clustering() – density-based clustering
  • spatiotemporal_hac() – hierarchical clustering
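For readers unfamiliar with KMeans, the sketch below shows plain Lloyd iterations on a handful of toy embedding vectors. It is a generic illustration in standard-library Python, not the repository's matric_kmeans_clustering(); the function name `kmeans`, the deterministic initialization from the first k vectors, and the fixed iteration count are all assumptions made for the example.

```python
import math

def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    # Deterministic initialization (an assumption): the first k vectors.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

toy_embeddings = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
labels, centroids = kmeans(toy_embeddings, k=2)
```

In the pipeline itself the inputs would be the learned trajectory embeddings rather than hand-written 2D points.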

7. Optional projection

Embeddings can be projected to 2D for clustering using:

  • PCA
  • t-SNE

Data model

The system expects trajectory data in a CSV-style file:


synthetic_vector.out

Each row represents a trajectory observation:

Field          Description
trajectory_id  ID of the trajectory
timestamp      observation time
x              x coordinate
y              y coordinate
rk             rotation / heading
kx             x component of movement vector
ky             y component of movement vector

Example:


trajectory_id, timestamp, x, y, rk, kx, ky
1, 10.0, 52.1, 13.2, 0.1, 0.02, 0.01
1, 20.0, 52.2, 13.3, 0.15, 0.03, 0.02
2, 11.0, 51.9, 13.0, 0.2, 0.01, 0.04


Intermediate artifacts

The pipeline stores intermediate results in the data directory:


simulated_data/
│
├─ sim_trajectories_complete
├─ sim_trajectories_feas
├─ sim_behavior_sequences
├─ sim_normal_behavior_sequences
├─ trajectory_distances
└─ encodings/
   └─ traj_vec_lstm_final_state_torch.pkl


Minimal usage

Example minimal pipeline:

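# Assumes the functions below are exported by the MATClus module.
from MATClus import (
    completeTrajectories,
    stwindow_interval,
    computeFeas,
    generate_behavior_sequences,
    generate_normal_behavior_sequence,
    trajectory2Vec_torch,
    matric_kmeans_clustering,
)

data_dir = "./simulated_data"

# Preprocessing and spatiotemporal window analysis
completeTrajectories(data_dir)
stwindow_interval(data_dir)

# Feature engineering
computeFeas(data_dir)

# Behavior sequences and their quantile-normalized form
generate_behavior_sequences(data_dir)
generate_normal_behavior_sequence(data_dir)

# Train the LSTM autoencoder and produce trajectory embeddings
trajectory2Vec_torch(data_dir)

# Cluster the embeddings with KMeans
result = matric_kmeans_clustering(
    data_dir,
    n_clusters=3
)

print(result["labels"])  # cluster assignment for each trajectory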

This will:

  1. Load trajectory data
  2. Generate behavioral windows
  3. Normalize features
  4. Train the LSTM autoencoder
  5. Produce trajectory embeddings
  6. Cluster trajectories

Output

Clustering functions return a dictionary:

{
  "method": "kmeans",
  "labels": array([...]),
  "coords": array([...]),   # 2D projection (optional)
  "embeddings": array([...]),
  "meta": {...}
}

  • labels → cluster assignment for each trajectory
  • coords → 2D projection for visualization
  • embeddings → learned trajectory vectors
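A typical next step is to turn the flat labels array into per-cluster groups. The sketch below does this with the standard library only. The helper `group_by_cluster` is hypothetical, and it assumes that labels are listed in the same order as the trajectories, which the README does not state explicitly.

```python
from collections import defaultdict

def group_by_cluster(labels, trajectory_ids=None):
    """Hypothetical helper: map each cluster label to its trajectory IDs."""
    if trajectory_ids is None:
        # Fall back to positional indices when no IDs are supplied.
        trajectory_ids = range(len(labels))
    clusters = defaultdict(list)
    for trajectory_id, label in zip(trajectory_ids, labels):
        clusters[int(label)].append(trajectory_id)
    return dict(clusters)

# Stand-in for result["labels"] from a clustering call.
groups = group_by_cluster([0, 1, 0, 2], ["a", "b", "c", "d"])
```

The `int(label)` conversion keeps the dictionary keys plain Python integers even when the labels come from a NumPy array.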

Synthetic dataset format

This dataset contains 2D trajectory points grouped into trajectories (tracks). Each row represents one observation (one point) along a trajectory with time, coordinates, rating, keyword coordinates and keywords.

File structure

  • One row = one timestep for one trajectory
  • Rows belonging to the same trajectory share the same trajectory_id
  • Within a trajectory, time is expected to be monotonic (typically increasing)

Columns (CSV)

Each line has 8 comma-separated fields:

  • trajectory_id (int) Identifier of the trajectory / track (e.g., 0).
  • time (int) Time index or frame number within the trajectory (e.g., 29, 54, 97).
  • x (float) X coordinate in the dataset’s coordinate system.
  • y (float) Y coordinate in the dataset’s coordinate system.
  • rating (float) Social rating.
  • kx (float) Keyword X coordinate.
  • ky (float) Keyword Y coordinate.
  • keyword (string) related keywords (e.g., industry, complex, …).

Example

trajectory_id,time,x,y,rating,kx,ky,keyword
0,29,34037.70761806609,9833.138607576757,8.4,3.206948757171631,34.005638122558594,industry
0,54,34782.25592839948,9270.427745288769,9.6,-9.44599723815918,16.407695770263672,complex
0,97,36103.710020125865,8309.520759541527,9.1,-19.66252326965332,-37.67576599121094,delicious
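The format above can be read with the standard csv module. The sketch below groups rows by trajectory_id and checks the monotonic-time expectation. The helper names `load_trajectories` and `is_time_monotonic` are hypothetical, and the code assumes exactly 8 fields per data row; it is not the loader used by the repository.

```python
import csv
import io
from collections import defaultdict

SAMPLE = """\
0,29,34037.70761806609,9833.138607576757,8.4,3.206948757171631,34.005638122558594,industry
0,54,34782.25592839948,9270.427745288769,9.6,-9.44599723815918,16.407695770263672,complex
0,97,36103.710020125865,8309.520759541527,9.1,-19.66252326965332,-37.67576599121094,delicious
"""

def load_trajectories(text):
    """Group parsed rows by trajectory_id, preserving file order."""
    trajectories = defaultdict(list)
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines and a header line, if the file has one.
        if not row or not row[0].strip().isdigit():
            continue
        tid, time, x, y, rating, kx, ky, keyword = (f.strip() for f in row)
        trajectories[int(tid)].append({
            "time": int(time),
            "x": float(x),
            "y": float(y),
            "rating": float(rating),
            "kx": float(kx),
            "ky": float(ky),
            "keyword": keyword,
        })
    return dict(trajectories)

def is_time_monotonic(points):
    """True if time never decreases along the trajectory."""
    times = [p["time"] for p in points]
    return all(a <= b for a, b in zip(times, times[1:]))

trajectories = load_trajectories(SAMPLE)
```

To read from disk instead, pass the file's contents, for example `load_trajectories(open(path).read())`, where `path` points to the dataset file.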

Citation

If you use this code or reproduce these results, please cite: F. Gryllakis, N. Pelekis, C. Doulkeridis and Y. Theodoridis, "Multiple Aspect Trajectory Clustering using Deep Learning Techniques", 2026. (Submitted for publication.)
