
greatroboticslab/manufacturingAI


Investigating Stack Depth in xLSTM Architectures for Vibration Time Series Prediction

Official implementation of "Investigating Stack Depth in xLSTM Architectures for Vibration Time Series Prediction". All models were trained on MTSU's HPC cluster using DeepSpeed ZeRO Stage 2.


This study evaluates five xLSTM configurations ranging from 1 to 5 stacked mLSTM-sLSTM block pairs for one-step-ahead vibration displacement forecasting. Results reveal a strongly non-monotonic relationship between stack depth and predictive accuracy, with the 1-stack configuration achieving the best performance (R² = 0.9869) and the 4-stack configuration undergoing near-complete training collapse.

💥 News 💥

  • [05.01.2026] Repository released with full training code, DeepSpeed config, and SLURM job script.

Overview

Vibration time series prediction from industrial interferometer signals presents a challenging forecasting problem due to high-frequency content, large dynamic range, and intermittent spike artifacts. This repository investigates how stacking depth in xLSTM architectures affects forecasting performance on this domain.

Five architectures are evaluated, ranging from a single mLSTM-sLSTM block pair (1-stack) to five sequential block pairs (5-stack), with all other architectural and training parameters held constant. All experiments were run on two NVIDIA RTX A5000 GPUs using DeepSpeed ZeRO Stage 2 optimization and FP16 mixed precision.


Architecture

The xLSTM model is built from two complementary recurrent block types stacked to varying depths:

mLSTM Block: operates on a 16×16 matrix-valued memory tensor that is updated at each timestep via outer-product operations. Gate outputs are constrained to [0.05, 0.95] to prevent saturation, and the memory update is scaled by 0.1 to prevent exponential growth.
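A minimal NumPy forward-pass sketch of one mLSTM step is shown below. It assumes scalar input, forget, and output gates, 16-dimensional query/key/value projections of the 128-dimensional hidden state, a hard clip of the sigmoid gates into [0.05, 0.95], and a linear read-out back to 128 dimensions. The class name MLSTMCellSketch and all weight names are hypothetical; this is not the repository's PyTorch module, and it omits any normalizer state used in other mLSTM formulations.

```python
import numpy as np

GATE_MIN, GATE_MAX = 0.05, 0.95  # gate range stated in the README
UPDATE_SCALE = 0.1               # memory-update scale stated in the README


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class MLSTMCellSketch:
    """Forward-pass sketch of an mLSTM step with a mem_dim x mem_dim matrix memory (hypothetical)."""

    def __init__(self, hidden_size=128, mem_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_size)
        # Assumed projections: hidden -> query/key/value (mem_dim each) and 3 scalar gates (i, f, o)
        self.W_qkv = rng.normal(0.0, s, (3 * mem_dim, hidden_size))
        self.W_gate = rng.normal(0.0, s, (3, hidden_size))
        # Assumed read-out projection: mem_dim -> hidden_size
        self.W_out = rng.normal(0.0, 1.0 / np.sqrt(mem_dim), (hidden_size, mem_dim))
        self.mem_dim = mem_dim

    def step(self, x, C):
        q, k, v = np.split(self.W_qkv @ x, 3)
        # Sigmoid gates hard-clipped into [0.05, 0.95] so they never fully saturate
        i, f, o = np.clip(sigmoid(self.W_gate @ x), GATE_MIN, GATE_MAX)
        # Outer-product write v k^T, scaled by 0.1 to limit growth of the matrix memory
        C = f * C + UPDATE_SCALE * i * np.outer(v, k)
        # Read the memory with the query, gate it, and project back to hidden_size
        h = self.W_out @ (o * (C @ q))
        return h, C

    def forward(self, xs):
        """xs: (seq_len, hidden_size) -> hidden states (seq_len, hidden_size) and final memory."""
        C = np.zeros((self.mem_dim, self.mem_dim))
        hs = []
        for x in xs:
            h, C = self.step(x, C)
            hs.append(h)
        return np.stack(hs), C


xs = np.random.default_rng(1).normal(size=(30, 128))  # random stand-in for a projected input sequence
hs, C = MLSTMCellSketch().forward(xs)
print(hs.shape, C.shape)  # (30, 128) (16, 16)
```

Because the forget gate is capped at 0.95 and every write is scaled by 0.1, the memory in this sketch behaves like a leaky accumulator rather than growing without bound.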

sLSTM Block: follows a conventional gated recurrent structure with scalar memory cells, applying LayerNorm to the cell state before the tanh activation to keep activation distributions stable across long sequences.
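A matching NumPy sketch of the sLSTM step follows. It assumes standard sigmoid gates, one scalar memory cell per hidden unit, and a LayerNorm without learnable parameters applied to the cell state before the output tanh. SLSTMCellSketch, layer_norm, and the W/R/b weight layout are hypothetical names chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def layer_norm(x, eps=1e-5):
    # LayerNorm without learnable scale/shift (simplification for the sketch)
    return (x - x.mean()) / np.sqrt(x.var() + eps)


class SLSTMCellSketch:
    """Forward-pass sketch of a scalar-memory gated cell with LayerNorm on c_t (hypothetical)."""

    def __init__(self, hidden_size=128, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_size)
        # Input (W) and recurrent (R) weights for gates i, f, o and candidate z, stacked row-wise
        self.W = rng.normal(0.0, s, (4 * hidden_size, hidden_size))
        self.R = rng.normal(0.0, s, (4 * hidden_size, hidden_size))
        self.b = np.zeros(4 * hidden_size)
        self.hidden_size = hidden_size

    def step(self, x, h, c):
        i, f, o, z = np.split(self.W @ x + self.R @ h + self.b, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(z)       # one scalar memory cell per hidden unit
        h = o * np.tanh(layer_norm(c))   # normalize the cell state before the output tanh
        return h, c

    def forward(self, xs):
        h = np.zeros(self.hidden_size)
        c = np.zeros(self.hidden_size)
        hs = []
        for x in xs:
            h, c = self.step(x, h, c)
            hs.append(h)
        return np.stack(hs)


hs = SLSTMCellSketch().forward(np.random.default_rng(1).normal(size=(30, 128)))
print(hs.shape)  # (30, 128)
```

Normalizing c_t before the tanh keeps the tanh input near unit scale even if the raw cell state drifts over a long sequence.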

The stacking scheme for each configuration is: N mLSTM blocks followed by N sLSTM blocks, where N ∈ {1, 2, 3, 4, 5}. LayerNorm and Dropout (p = 0.1) are inserted between consecutive same-type blocks to stabilize gradient flow. Input sequences of 30 timesteps are projected from 1 dimension to a 128-dimensional hidden representation before being passed through the stacked blocks. The hidden state at the final timestep is projected to a single displacement prediction.
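The stacking rule can be written out as an ordered layer list. The hypothetical stack_plan helper below only enumerates layer names under the scheme described above (no weights), which makes it easy to check that LayerNorm and Dropout appear between same-type neighbours but not at the mLSTM-to-sLSTM boundary.

```python
def stack_plan(n, hidden=128, seq_len=30, dropout=0.1):
    """Ordered layer names for an N-stack model under the README's stacking rule (hypothetical helper)."""
    if not 1 <= n <= 5:
        raise ValueError("the study covers N = 1 to 5")
    layers = [f"Linear(1 -> {hidden}) applied to each of {seq_len} timesteps"]
    for block in ("mLSTM", "sLSTM"):
        for j in range(n):
            if j > 0:
                # LayerNorm + Dropout only between two consecutive blocks of the same type
                layers += [f"LayerNorm({hidden})", f"Dropout(p={dropout})"]
            layers.append(f"{block}Block({hidden})")
    layers.append(f"Linear({hidden} -> 1) on the final timestep's hidden state")
    return layers


for name in stack_plan(2):
    print(name)
```

Under this rule an N-stack contains 2N recurrent blocks and 2(N − 1) LayerNorm/Dropout pairs, so the 1-stack has no inter-block normalization at all and the 5-stack has 10 recurrent blocks.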


Data

Vibration displacement signals (in nm) were recorded from a lathe with an interferometer as a function of time (in ms). Raw signals exhibit two primary artifacts that must be corrected before training: a consistent downward drift and intermittent sharp spikes caused by sensor noise.

Figure 1. Representative raw displacement signal showing downward slope drift and a sharp spike artifact at approximately 185,000 ms.

Preprocessing steps applied before training:

  1. Linear regression detrending to remove measurement drift.
  2. Spike correction using a 3-standard-deviation threshold in 10,000 ms windows, with linear interpolation over detected outliers.
  3. Sequence construction with length 30 and stride 1 for one-step-ahead forecasting.
  4. RobustScaler normalization fit on the training set and applied to validation.
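Steps 1 and 2 can be sketched with NumPy as below. The helper names are hypothetical, and the sketch assumes that the 3σ test is measured from each window's mean, that windows are non-overlapping and aligned to the first timestamp, and that time values are sorted; the repository may differ on any of these points. The usage lines run on a synthetic signal, not the lathe data.

```python
import numpy as np


def detrend(t, y):
    """Step 1: least-squares line fit y ~ a*t + b, then subtract it."""
    a, b = np.polyfit(t, y, 1)
    return y - (a * t + b)


def correct_spikes(t, y, window_ms=10_000.0, k=3.0):
    """Step 2: flag |y - window mean| > k * window std, then linearly interpolate over flagged samples."""
    y = np.array(y, dtype=float)  # work on a copy
    bad = np.zeros(y.shape, dtype=bool)
    start = t[0]
    while start <= t[-1]:
        in_win = (t >= start) & (t < start + window_ms)
        if in_win.sum() > 1:
            seg = y[in_win]
            bad[in_win] = np.abs(seg - seg.mean()) > k * seg.std()
        start += window_ms
    good = ~bad
    # np.interp needs increasing x values; t is assumed sorted (time column)
    y[bad] = np.interp(t[bad], t[good], y[good])
    return y, bad


# Synthetic example (not the lathe data): drift + oscillation + one injected spike
t = np.arange(0.0, 40_000.0, 10.0)  # 4,000 samples, 10 ms apart
y = -0.05 * t + 100.0 * np.sin(2 * np.pi * t / 1000.0)
y[1850] += 5000.0
y_clean, flagged = correct_spikes(t, detrend(t, y))
print(flagged.sum())  # 1
```

Detrending first matters: with the drift still present, each window's standard deviation would be inflated by the slope and smaller spikes could slip under the 3σ threshold.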

The dataset comprises 22 CSV files for training (~7.6M sequences) and 1 CSV file for validation (~605K sequences).
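Steps 3 and 4 might look like the following, assuming the RobustScaler is fit on the raw training displacement values before windowing (the repository may scale after windowing instead). Under this stride-1 windowing, a file with N samples yields N − 30 input/target pairs. make_sequences and scale_train_val are hypothetical names; sliding_window_view returns a read-only view, so copy it before modifying it.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

SEQ_LEN = 30  # input length stated in the README


def make_sequences(y, seq_len=SEQ_LEN):
    """Stride-1 windows for one-step-ahead forecasting: X[i] = y[i:i+seq_len], target[i] = y[i+seq_len]."""
    X = np.lib.stride_tricks.sliding_window_view(y[:-1], seq_len)
    target = y[seq_len:]
    return X, target


def scale_train_val(train_y, val_y):
    """Fit RobustScaler (median / IQR) on training values only, then reuse it for validation."""
    scaler = RobustScaler()
    train_s = scaler.fit_transform(train_y.reshape(-1, 1)).ravel()
    val_s = scaler.transform(val_y.reshape(-1, 1)).ravel()
    return train_s, val_s, scaler


X, target = make_sequences(np.arange(100.0))
print(X.shape, target.shape)  # (70, 30) (70,)
```

Fitting the scaler on training data only keeps validation statistics out of training; the median/IQR centering and scaling also make it less sensitive to any residual spikes than a mean/std scaler.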


Results

Quantitative Comparison Across Stack Depths

| Configuration | R² | RMSE (nm) | MAE (nm) | Epochs | Train Time (h) |
|---|---|---|---|---|---|
| 1-Stack | 0.9869 | 74.41 | 38.88 | 35 | 3.16 |
| 2-Stack | 0.9828 | 85.33 | 44.93 | 29 | 15.29 |
| 3-Stack | 0.9650 | 121.60 | 60.26 | 13 | 3.38 |
| 4-Stack | 0.5286 | 446.25 | 65.00 | 43 | 45.65 |
| 5-Stack | 0.8796 | 225.54 | 72.81 | 13 | 17.54 |

Evaluated on 605,772 validation sequences. Training time and epoch count reflect the full training run including early stopping.
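The three metrics in the table can be computed as sketched below. This assumes they are reported in nm, i.e. after inverting the RobustScaler; regression_metrics is a hypothetical helper.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """R², RMSE and MAE in the units of the inputs (nm if predictions were inverse-scaled first)."""
    y_true = np.asarray(y_true, dtype=float)
    err = np.asarray(y_pred, dtype=float) - y_true
    mse = np.mean(err ** 2)
    return {
        "R2": float(1.0 - mse / np.var(y_true)),  # 1 - SS_res / SS_tot
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(err))),
    }


print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

Because R² = 1 − MSE / Var(y), each row of the table implies a variance for the validation targets. Assuming all rows share the same targets, every configuration gives roughly 4.2 × 10⁵ nm² (for example, 74.41² / (1 − 0.9869) ≈ 4.23 × 10⁵ nm² for the 1-stack), i.e. a target standard deviation of about 650 nm, so the reported R² and RMSE values are mutually consistent.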

The 4-stack configuration represents a qualitatively different failure mode: despite the longest training run (45.65 hours, 43 epochs), the model became trapped in a poor local minimum and generated large-magnitude hallucinated predictions on a subset of high-amplitude inputs. The 5-stack partially recovered (R² = 0.8796), suggesting a different convergence trajectory.


Temporal Tracking: Predicted vs. Actual Displacement

1-Stack: Near-perfect overlap between predicted and actual displacement across the full dataset. In the detail view, the model correctly captures rapid direction reversals, peak sharpness, and zero-crossings, with only marginal underestimation at the highest-amplitude spikes.

2-Stack: Produces visually similar tracking but with slightly more divergence at high-amplitude burst events, consistent with its higher RMSE.

3-Stack: Shows increasing divergence at peak-amplitude events. The detail view confirms growing phase misalignment in the 400–600 step window, consistent with the degraded R² = 0.9650.

4-Stack: Exhibits the most visually striking failure, a hallucinated spike near time step 120,000 reaching approximately 10,000 nm with no corresponding feature in the actual signal. The predicted axis range extends to ±10,000 nm, confirming that the model generates large-magnitude out-of-distribution predictions on a subset of inputs.

5-Stack: Shows a different failure signature, with a spurious deep negative spike near time step 120,000 and asymmetric clipping of the amplitude envelope. The detail view confirms that the predicted line has become noticeably smoother and less reactive than the actual signal.


Residual Analysis

All configurations that converged successfully exhibit an S-shaped nonlinear bias: residuals are increasingly negative at large negative predicted displacements and increasingly positive at large positive ones. This regression-toward-the-mean behavior is a structural property of MSE-trained xLSTM on this dataset and grows in magnitude with stack depth.
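One way to make this bias visible is to average residuals within bins of the predicted value. This sketch assumes residual = actual − predicted, which matches the sign pattern described above; binned_residuals is a hypothetical helper, and the synthetic data simply shrink every prediction by 20% toward zero.

```python
import numpy as np


def binned_residuals(y_true, y_pred, n_bins=10):
    """Mean residual (actual - predicted) within equal-count bins of the predicted value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    order = np.argsort(y_pred)
    centers, means = [], []
    for idx in np.array_split(order, n_bins):
        centers.append(y_pred[idx].mean())
        means.append(resid[idx].mean())
    return np.array(centers), np.array(means)


# Synthetic shrinkage toward the mean (not the paper's data): predictions are 80% of the truth
rng = np.random.default_rng(0)
y_true = rng.normal(0.0, 650.0, 10_000)
y_pred = 0.8 * y_true
centers, means = binned_residuals(y_true, y_pred)
```

A constant shrinkage factor yields bin means on a straight, upward-sloping line through the origin (negative residuals at negative predictions, positive at positive ones); curvature toward the ends of the predicted range, as in the S-shape described here, appears as departures from that straight line.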

1-Stack: Tightest residual distribution, with a near-zero mean of −2.24 nm.

2-Stack: Almost identical S-shaped pattern, with a negligible mean bias of −1.24 nm.

3-Stack: The S-curve deepens noticeably; residuals reach approximately −2,000 nm at the most negative predicted values. Mean bias increases to −10.92 nm.

4-Stack: Qualitatively different failure mode. For predicted displacements beyond approximately 4,000 nm, residuals plunge in a tight diagonal arc to below −15,000 nm. Mean bias is −11.63 nm.

5-Stack: Partially recovers, with a small positive mean bias of +4.90 nm, but shows a structured downward arc for predicted values in the 0 to −2,000 nm range, with residuals dropping to approximately −7,000 nm.


Repository Structure

.
├── xlstm_v4_deepspeed.py          # Main training script
├── run_xlstm_v4_deepspeed.slurm   # SLURM job submission script
├── ds_config.json                 # DeepSpeed ZeRO Stage 2 configuration
├── figures/                       # Result plots and architecture diagram
└── README.md

Installation

Prerequisites: Python 3.8+, PyTorch with CUDA, and a SLURM-managed cluster with NVIDIA GPUs.

Install dependencies:

pip install deepspeed torch numpy pandas scikit-learn matplotlib tqdm

Data Preparation

Place CSV files (two columns: time, displacement) into the following structure:

Data/
├── Train/    # 22 CSV files (~7.6M training sequences)
└── Test/     # 1 CSV file  (~605K validation sequences)

Update DATA_FOLDER in xlstm_v4_deepspeed.py to match your path.
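A loader for this layout might look like the following. It assumes each CSV has a header row, with time in the first column and displacement in the second; load_folder is a hypothetical helper, and DATA_FOLDER here only stands in for the setting of the same name in xlstm_v4_deepspeed.py.

```python
import glob
import os

import numpy as np
import pandas as pd


def load_folder(folder):
    """Read every CSV in `folder` as (time_ms, displacement_nm) arrays, sorted by filename.

    Assumes two numeric columns with a header row; pass header=None to read_csv if the files have none.
    """
    signals = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        df = pd.read_csv(path)
        t = df.iloc[:, 0].to_numpy(dtype=float)
        y = df.iloc[:, 1].to_numpy(dtype=float)
        signals.append((t, y))
    return signals


DATA_FOLDER = "Data"  # stand-in for the script's DATA_FOLDER setting
train_signals = load_folder(os.path.join(DATA_FOLDER, "Train"))
print(len(train_signals), "training files found")
```

Keeping each file as a separate signal means detrending, spike correction, and windowing can run per file, so no training sequence straddles the boundary between two recordings.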


Training

Configuring Stack Depth

Open xlstm_v4_deepspeed.py and set NUM_LAYERS to the desired stack depth (1–5):

NUM_LAYERS = 1   # ← Change this to 1, 2, 3, 4, or 5
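To sweep all five depths without hand-editing the script each time, a small helper can rewrite the assignment before each submission. set_num_layers is hypothetical and assumes NUM_LAYERS is assigned at the start of a line, as shown above.

```python
import re
from pathlib import Path


def set_num_layers(script_path, n):
    """Rewrite the `NUM_LAYERS = ...` assignment in the training script (hypothetical helper)."""
    if n not in range(1, 6):
        raise ValueError("the study covers stack depths 1 to 5")
    path = Path(script_path)
    src = path.read_text()
    # Replace only the integer literal on a line starting with `NUM_LAYERS =`, keeping any trailing comment
    new_src, count = re.subn(r"^(NUM_LAYERS\s*=\s*)\d+", rf"\g<1>{n}", src, count=1, flags=re.MULTILINE)
    if count != 1:
        raise RuntimeError("NUM_LAYERS assignment not found")
    path.write_text(new_src)
```

For example, call set_num_layers("xlstm_v4_deepspeed.py", 3) and then submit with sbatch run_xlstm_v4_deepspeed.slurm. The Python script is read when the job actually starts, not when it is submitted, so rewriting it while earlier jobs are still queued changes their depth too; keeping one copy of the script per depth avoids this.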

Running on a SLURM Cluster

Submit the job using the provided SLURM script:

sbatch run_xlstm_v4_deepspeed.slurm

The script launches DeepSpeed across 2 GPUs on the research-gpu partition:

deepspeed --num_gpus=2 xlstm_v4_deepspeed.py \
    --deepspeed \
    --deepspeed_config ds_config.json

DeepSpeed Configuration

The ds_config.json enables ZeRO Stage 2 with FP16 mixed precision:

{
  "train_batch_size": 256,
  "train_micro_batch_size_per_gpu": 128,
  "fp16": { "enabled": true, "initial_scale_power": 16 },
  "zero_optimization": { "stage": 2, "overlap_comm": true }
}
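DeepSpeed requires train_batch_size = train_micro_batch_size_per_gpu × gradient_accumulation_steps × number of GPUs, and derives the missing term when only two are given. A small sanity check (check_ds_batch is a hypothetical helper) confirms that the configuration above implies one accumulation step on 2 GPUs, and that initial_scale_power = 16 starts dynamic loss scaling at 2¹⁶.

```python
import json


def check_ds_batch(config, world_size):
    """Check train_batch_size == micro_batch_per_gpu * grad_accum_steps * world_size; return grad_accum_steps."""
    micro = config["train_micro_batch_size_per_gpu"]
    total = config["train_batch_size"]
    # gradient_accumulation_steps is not set in the config above; derive it the way DeepSpeed would
    gas = config.get("gradient_accumulation_steps", total // (micro * world_size))
    if micro * gas * world_size != total:
        raise ValueError(f"{micro} * {gas} * {world_size} != {total}")
    return gas


ds_config = json.loads("""{
  "train_batch_size": 256,
  "train_micro_batch_size_per_gpu": 128,
  "fp16": { "enabled": true, "initial_scale_power": 16 },
  "zero_optimization": { "stage": 2, "overlap_comm": true }
}""")
print(check_ds_batch(ds_config, world_size=2))        # 1
print(2 ** ds_config["fp16"]["initial_scale_power"])  # 65536
```

Running the same configuration on 4 GPUs would fail this check, because 128 × 4 already exceeds the global batch of 256; the micro-batch size or train_batch_size would need to change with the GPU count.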

Training Details

| Setting | Value |
|---|---|
| Optimizer | Adam, lr = 1×10⁻³, weight decay = 1×10⁻⁵ |
| LR Scheduler | ReduceLROnPlateau (patience = 3, factor = 0.5) |
| Gradient Clipping | max norm 1.0 |
| Early Stopping | patience = 10 epochs |
| Batch Size | 256 (128 per GPU) |
| Sequence Length | 30 timesteps |
| Hidden Size | 128 |
| Memory Dimension | 16×16 (mLSTM) |
| Dropout | 0.1 |
| Hardware | 2× NVIDIA RTX A5000 (24 GB each) |
| Precision | FP16 via DeepSpeed automatic (dynamic) loss scaling |
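The interaction between the LR scheduler and early stopping can be traced with a short simulation. This is a simplified pure-Python re-implementation, not PyTorch's ReduceLROnPlateau class: it treats only strictly lower losses as improvements (PyTorch also applies a small relative threshold by default) and assumes early stopping fires after 10 consecutive non-improving epochs. simulate_schedule is a hypothetical name.

```python
def simulate_schedule(val_losses, lr=1e-3, lr_patience=3, factor=0.5, stop_patience=10):
    """Replay per-epoch validation losses through plateau LR halving and early stopping.

    Returns a list of (epoch, lr_after_epoch) pairs, ending at the early-stopping epoch.
    """
    best = float("inf")
    bad_lr = 0    # non-improving epochs seen by the LR scheduler (reset after each reduction)
    bad_stop = 0  # non-improving epochs seen by early stopping
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, bad_lr, bad_stop = loss, 0, 0
        else:
            bad_lr += 1
            bad_stop += 1
        if bad_lr > lr_patience:  # like ReduceLROnPlateau: reduce once patience is exceeded
            lr *= factor
            bad_lr = 0
        history.append((epoch, lr))
        if bad_stop >= stop_patience:
            break
    return history


# A run that stops improving after epoch 1: LR halves at epochs 5 and 9, training stops at epoch 11
for epoch, lr in simulate_schedule([1.0] * 20):
    print(epoch, lr)
```

With patience = 3 the learning rate halves on the fourth consecutive non-improving epoch, so under these assumptions an early-stopped run sees exactly two halvings (to 0.25× the learning rate in effect at its last improvement) before the 10-epoch early-stopping patience runs out.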

Key Findings

  • The 1-stack configuration achieves the best performance across all metrics (R² = 0.9869, RMSE = 74.41 nm, MAE = 38.88 nm) and is the recommended default for this task.
  • Performance degrades progressively through 3-stack, collapses catastrophically at 4-stack (R² = 0.5286, 45.65 h training), then partially recovers at 5-stack.
  • The nonlinear S-shaped residual bias (regression toward the mean at extreme amplitudes) is a consistent structural property of MSE-trained xLSTM on high-dynamic-range vibration signals, and grows in magnitude with depth.
  • DeepSpeed ZeRO Stage 2 enabled stable training across all configurations but did not resolve fundamental optimization difficulties in deep recurrent stacks.

Citation

If you find this work useful, please consider citing:

@article{xlstm_stack_depth_vibration,
  title   = {Investigating Stack Depth in xLSTM Architectures for Vibration Time Series Prediction},
  year    = {2026}
}

