SpeechGraph

SpeechGraph is a computational pipeline for analyzing the structure of transcribed speech through directed lexical graphs. It extracts graph-theoretic and recurrence-based features from discourse and evaluates their association with impulsivity dimensions measured by the Barratt Impulsiveness Scale (BIS-11).

Pipeline

Parsing of orthographic transcripts into token sequences.
Graph construction: directed lexical transition graphs where each node is a unique token and each directed edge represents a transition between consecutive tokens within the same discourse segment.
Sliding-window metrics quantifying topology, connectivity, recurrence, and path structure of the graph.
Z-scores via permutation testing, comparing observed metrics against a null distribution generated by shuffling tokens within discourse segments.
Spearman correlations (simple and partial) between graph metrics and impulsivity dimensions.
Predictive regression using Monte Carlo cross-validation to assess how well graph metrics predict impulsivity scores.
Hyperparameter optimization with Optuna, exploring multiple regression algorithms and RFE-based feature selection.
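
The sliding-window step above can be illustrated with a minimal sketch. It is not the repository's code: the function names, the window size, the step, and the choice of metric (the number of distinct transitions in a window) are all assumptions made for the example.

```python
from typing import Callable, Iterator, List, Sequence


def sliding_windows(tokens: Sequence[str], size: int, step: int = 1) -> Iterator[List[str]]:
    """Yield consecutive windows of `size` tokens, advancing by `step`.

    A sequence shorter than `size` yields one truncated window.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    last_start = max(len(tokens) - size, 0)
    for start in range(0, last_start + 1, step):
        yield list(tokens[start:start + size])


def distinct_transitions(window: Sequence[str]) -> int:
    """Hypothetical metric: number of distinct directed edges in the window."""
    return len(set(zip(window, window[1:])))


def windowed_mean(tokens: Sequence[str],
                  metric: Callable[[Sequence[str]], float],
                  size: int,
                  step: int = 1) -> float:
    """Average a per-window metric over all sliding windows."""
    values = [metric(w) for w in sliding_windows(tokens, size, step)]
    return sum(values) / len(values)


tokens = ["a", "a", "a", "b"]
# Windows of size 3: ["a", "a", "a"] has 1 distinct edge, ["a", "a", "b"] has 2.
print(windowed_mean(tokens, distinct_transitions, size=3))  # 1.5
```

Averaging over fixed-size windows keeps a metric comparable across transcripts of different lengths, which is one common reason for windowing; whether the pipeline aggregates by mean or by another statistic is not stated here.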

Graph representation

Speech is represented as a weighted directed graph:

$$ G = (V, E) $$

where each vertex $v \in V$ is a unique lexical token and each directed edge $(v_i, v_{i+1}) \in E$ represents a transition between consecutive tokens within the same segment. Edges are weighted by their occurrence frequency:

$$ w(v_i, v_j) = \text{number of times the transition } v_i \rightarrow v_j \text{ occurs} $$
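
As a rough illustration of this definition, the sketch below builds $V$ and the weight function $w$ from a list of tokenized segments using only the standard library. The function name `build_transition_graph` and the input layout (one token list per segment) are assumptions for the example, not the project's actual API.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def build_transition_graph(segments: Iterable[List[str]]) -> Tuple[Set[str], Dict[Edge, int]]:
    """Return (V, w): the unique tokens and the transition counts.

    Transitions are counted only between consecutive tokens of the same
    segment, so no edge links the end of one segment to the start of the next.
    """
    vertices: Set[str] = set()
    weights: Counter = Counter()
    for tokens in segments:
        vertices.update(tokens)
        weights.update(zip(tokens, tokens[1:]))  # consecutive pairs (v_i, v_{i+1})
    return vertices, dict(weights)


segments = [["i", "went", "home"], ["i", "went", "out"]]
vertices, weights = build_transition_graph(segments)
print(sorted(vertices))          # ['home', 'i', 'out', 'went']
print(weights[("i", "went")])    # 2
print(("home", "i") in weights)  # False: segments are not joined
```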

Z-score via permutation

For an observed metric $x_{\text{obs}}$ and its null distribution $\{x^{(r)}\}_{r=1}^{R}$ generated by within-segment permutation:

$$ z_x = \frac{x_{\text{obs}} - \mu_{\text{rand}}}{\sigma_{\text{rand}}}, \qquad \mu_{\text{rand}} = \frac{1}{R}\sum_{r=1}^{R} x^{(r)}, \qquad \sigma_{\text{rand}}^2 = \frac{1}{R}\sum_{r=1}^{R} \left(x^{(r)} - \mu_{\text{rand}}\right)^2 $$
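
The formula can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions: the function names, the default of 1000 permutations, the seed handling, and the example metric (count of distinct edges) are hypothetical and are not taken from the repository. The population standard deviation (`statistics.pstdev`) is used to match the $1/R$ normalization above.

```python
import random
import statistics
from typing import Callable, List, Sequence


def distinct_edge_count(segments: Sequence[Sequence[str]]) -> int:
    """Hypothetical metric: number of distinct within-segment transitions."""
    edges = set()
    for tokens in segments:
        edges.update(zip(tokens, tokens[1:]))
    return len(edges)


def permutation_z_score(segments: Sequence[Sequence[str]],
                        metric: Callable[[Sequence[Sequence[str]]], float],
                        n_permutations: int = 1000,
                        seed: int = 0) -> float:
    """Z-score of the observed metric against a within-segment shuffle null."""
    rng = random.Random(seed)
    observed = metric(segments)
    null: List[float] = []
    for _ in range(n_permutations):
        shuffled = []
        for tokens in segments:
            copy = list(tokens)
            rng.shuffle(copy)  # tokens never cross segment boundaries
            shuffled.append(copy)
        null.append(metric(shuffled))
    mu = statistics.fmean(null)
    sigma = statistics.pstdev(null, mu)  # population SD, i.e. the 1/R form
    if sigma == 0:
        return float("nan")  # the null has no spread, so z is undefined
    return (observed - mu) / sigma


# A strictly alternating sequence has the fewest possible distinct edges,
# so its z-score against shuffled versions comes out negative.
z = permutation_z_score([["a", "b"] * 4], distinct_edge_count)
print(z < 0)  # True
```

Shuffling keeps the token frequencies of each segment fixed and destroys only the word order, so the z-score isolates the part of a metric that depends on sequential structure.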

Regression

For each target $y$ and design matrix $\mathbf{X}$ (selected features):

$$ \hat{y} = \beta_0 + \mathbf{X}\boldsymbol{\beta} $$

Performance is evaluated via:

$$ \text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|, \qquad R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} $$
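
The two scores, together with the repeated random train/test splitting that Monte Carlo cross-validation relies on, can be sketched as follows. The helper names, the 20% test fraction, and the seed are assumptions chosen for the example; the pipeline's actual split sizes and number of repetitions are not stated here.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE = (1/n) * sum |y_i - yhat_i|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot (raises ZeroDivisionError if y_true is constant)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def monte_carlo_splits(n_samples: int,
                       n_splits: int,
                       test_fraction: float = 0.2,
                       seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) from independent random shuffles."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_splits):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield indices[n_test:], indices[:n_test]


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
print(mean_absolute_error(y_true, y_pred))  # 0.5
print(round(r_squared(y_true, y_pred), 3))  # 0.2

for train_idx, test_idx in monte_carlo_splits(n_samples=10, n_splits=3):
    print(len(train_idx), len(test_idx))    # 8 2 on each of the three splits
```

Unlike k-fold cross-validation, the test sets of different repetitions may overlap, because each split is drawn independently. Note also that $R^2$ can be negative on held-out data when a model predicts worse than the mean of the test targets.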

Interactive dashboard

All results (cross-experiment comparisons, distributions, feature analysis, optimization, and SHAP analysis) are available on the live dashboard:

➡️ SpeechGraph Dashboard

