KhiopsLab/saxo

SAXO

Symbolic Aggregate approXimation Optimized with MODL

SAXO is a data-driven symbolic representation for time series. Unlike standard SAX, which relies on equal-width time intervals and assumes Gaussian-distributed values, SAXO optimizes both the time and the value discretization using a non-parametric Bayesian approach (MODL).
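For contrast, here is a minimal sketch of the standard SAX baseline mentioned above: z-normalize the series, average it over equal-width segments (piecewise aggregate approximation, PAA), then map each average to a letter using breakpoints that split a standard Gaussian into equiprobable regions. This is not part of the saxo package and `sax_baseline` is a hypothetical name; SAXO replaces both the equal-width segments and the Gaussian breakpoints with a discretization optimized by MODL.

```python
from statistics import NormalDist, mean, pstdev
from string import ascii_lowercase


def sax_baseline(series, n_segments, alphabet_size):
    """Standard SAX (hypothetical helper): z-normalization, PAA, Gaussian breakpoints."""
    if not 1 <= n_segments <= len(series):
        raise ValueError("n_segments must be between 1 and len(series)")
    if not 2 <= alphabet_size <= len(ascii_lowercase):
        raise ValueError("alphabet_size must be between 2 and 26")

    # 1. z-normalize (a constant series maps to all zeros).
    mu, sigma = mean(series), pstdev(series)
    z = [(v - mu) / sigma if sigma > 0 else 0.0 for v in series]

    # 2. PAA: average over equal-width segments (widths differ by at most 1).
    n = len(z)
    paa = [mean(z[k * n // n_segments:(k + 1) * n // n_segments]) for k in range(n_segments)]

    # 3. Breakpoints splitting N(0, 1) into alphabet_size equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]

    # 4. A segment's letter is given by how many breakpoints its mean exceeds.
    return [ascii_lowercase[sum(v > b for b in breakpoints)] for v in paa]


print(sax_baseline([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4, alphabet_size=4))
# ['a', 'b', 'c', 'd']
```

Because the segments and breakpoints are fixed in advance, SAX uses the same cut points for every dataset; this rigidity is what SAXO's data-driven discretization is meant to avoid.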

Install

Install the package from the main branch on GitHub:

pip install git+https://github.com/KhiopsLab/saxo.git@main

Requirements:

  • khiops

Usage

Compute the SAXO representation of a dataset of time series:

from aeon.datasets import load_gunpoint
from saxo.sklearn import SAXO

X, y = load_gunpoint()
saxo = SAXO(max_intervals=10, max_symbols=5).fit(X)
X_transformed = saxo.transform(X)
>>> X_transformed
array([['b', 'd', 'b', ..., 'b', 'a', 'b'],
       ['b', 'c', 'a', ..., 'b', 'a', 'b'],
       ['a', 'b', 'b', ..., 'a', 'd', 'b'],
       ...,
       ['a', 'b', 'b', ..., 'b', 'a', 'c'],
       ['c', 'a', 'd', ..., 'd', 'b', 'e'],
       ['c', 'a', 'e', ..., 'd', 'c', 'd']], shape=(200, 10), dtype=object)

Plot the SAXO time and value discretization:

from matplotlib import pyplot as plt
from saxo.viz import plot_saxo

fig, ax = plt.subplots(figsize=(5, 3), layout="constrained")
plot_saxo(saxo, [ax], X=X)
ax.set_xlim((0, X.shape[-1] - 1))
plt.show()

The SAXO representation can then be used with any scikit-learn estimator. For example, to visualize the dataset in 2D with t-SNE:

from sklearn.manifold import TSNE
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
import matplotlib.pyplot as plt

X_projected = make_pipeline(OneHotEncoder(sparse_output=False), PCA(n_components=10), TSNE()).fit_transform(X_transformed)
y = LabelEncoder().fit_transform(y)
plt.scatter(X_projected[:, 0], X_projected[:, 1], c=y)
plt.show()

To train a logistic regression classifier:

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import make_pipeline

clf = make_pipeline(OneHotEncoder(), LogisticRegression()).fit(X_transformed, y)
clf.score(X_transformed, y)  # accuracy on the training data (optimistic estimate)
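
To cluster the series with k-means and plot each cluster in its own color:

from sklearn.cluster import KMeans
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import make_pipeline
import matplotlib.pyplot as plt

y_pred = make_pipeline(OneHotEncoder(), KMeans(n_clusters=3)).fit_predict(X_transformed)
colors = ["red", "blue", "green"]
for i in range(3):
    # squeeze() drops the single-channel axis; transpose() gives one column per series
    plt.plot(X[y_pred == i].squeeze().transpose(), color=colors[i], alpha=0.01)
plt.show()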

SAXO can also be used for anomaly detection: each time series is scored by its distance to the typical time series associated with its SAXO representation.

y_pred_saxo = saxo.score_samples(X)
# total score per series; the series with the lowest total is taken as the most anomalous
ano_saxo = y_pred_saxo.sum(axis=1).argmin()

fig, ax = plt.subplots(figsize=(5, 3), layout="constrained")
plot_saxo(saxo, [ax], X=X)
ax.plot(X[ano_saxo].T, color="red")  # highlight the detected anomaly
ax.set_xlim((0, X.shape[-1] - 1))
plt.show()
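The idea behind this score can be illustrated with a simplified sketch, assuming the time intervals and the symbol of each series in each interval are already known. Here the typical value of an (interval, symbol) cell is taken as the mean of all points falling into it, and the distance is a squared Euclidean distance. These are illustrative choices only: the computation behind `saxo.score_samples` may differ, and `typical_values` and `anomaly_distances` are hypothetical helpers.

```python
from statistics import mean


def typical_values(X, symbols, bounds):
    """Mean value of every (interval, symbol) cell, pooled over all series.

    X: list of equal-length series; symbols[i][j]: symbol of series i in
    interval j; bounds: list of (start, end) index pairs, one per interval.
    """
    cells = {}
    for series, row in zip(X, symbols):
        for j, (start, end) in enumerate(bounds):
            cells.setdefault((j, row[j]), []).extend(series[start:end])
    return {cell: mean(values) for cell, values in cells.items()}


def anomaly_distances(X, symbols, bounds):
    """Squared Euclidean distance between each series and its typical series."""
    typical = typical_values(X, symbols, bounds)
    distances = []
    for series, row in zip(X, symbols):
        d = 0.0
        for j, (start, end) in enumerate(bounds):
            # The typical series is piecewise constant: one value per cell.
            t = typical[(j, row[j])]
            d += sum((v - t) ** 2 for v in series[start:end])
        distances.append(d)
    return distances


# Toy data (hypothetical): 4 series of length 4, 2 intervals, identical symbols.
X = [[0, 0, 1, 1]] * 3 + [[0, 0.8, 1, 1]]
symbols = [["a", "b"]] * 4
bounds = [(0, 2), (2, 4)]

distances = anomaly_distances(X, symbols, bounds)
most_anomalous = max(range(len(X)), key=distances.__getitem__)
print(most_anomalous)  # 3
```

In this sketch a larger distance means a more anomalous series, so the anomaly is found with an argmax, whereas the snippet above takes the argmin of the summed SAXO scores.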

References

SAXO representation

Alexis Bondu, Marc Boullé, and Benoît Grossin. "SAXO: An optimized data-driven symbolic representation of time series." International Joint Conference on Neural Networks (IJCNN). IEEE, 2013.

Alexis Bondu, Marc Boullé, and Antoine Cornuéjols. "Symbolic representation of time series: A hierarchical coclustering formalization." Advanced Analytics and Learning on Temporal Data (AALTD). Springer, 2015.

Anomaly detection with coclustering

Romain Guigourès. "Utilisation des modèles de co-clustering pour l'analyse exploratoire des données" [Use of co-clustering models for exploratory data analysis]. PhD thesis, Université Panthéon-Sorbonne-Paris I, 2013.

Development

Create a local conda environment with khiops (you can skip this step and install khiops-python with pip instead, but that option requires a system-wide khiops-core installation):

  • conda create -p .venv python=3.12
  • conda activate .venv
  • conda install -c conda-forge -c khiops-dev khiops=11.0.0.3

Formatting and linting are done with ruff, run as a pre-commit hook:

  • install the hook: pre-commit install
  • format and lint: pre-commit run --all-files (also runs automatically before each commit).

Run tests with uv: uv run pytest.
