transformer-interpretability

Here are 7 public repositories matching this topic...

hila-chefer / Transformer-Explainability

[CVPR 2021] Official PyTorch implementation of "Transformer Interpretability Beyond Attention Visualization", a novel method to visualize classifications by Transformer-based networks.

deep-learning vit bert perturbation attention-visualization bert-model explainability attention-matrix vision-transformer transformer-interpretability visualize-classifications cvpr2021

Updated Jan 24, 2024
Jupyter Notebook

HeegyuKim / CurseFilter

Detect & filter Korean curse text using Hugging Face Transformers, KcBERT, and Transformer-Interpret

pytorch transformer bert xai huggingface streamlit transformer-interpretability streamlit-application kcbert

Updated May 13, 2022
Jupyter Notebook

fabthebest / AQLES

Probing quality-evaluative geometry in transformer hidden states. GPT-2 encodes quality better than BERT, with a negativity bias that mirrors human cognition.

nlp ml bert quality-assessment gpt-2 huggingface transformer-interpretability mechanistic-interpretability probing-classifiers

Updated Apr 7, 2026
Jupyter Notebook

AdityaSinghDevs / nanolens

Configurable character-level transformer training suite with a built-in mechanistic interpretability toolkit; scale to 150M+ parameters and beyond, no ceilings, only hardware limits. Inspect attention weights, hidden states, and head specialisation across all layers. Documented circuit findings included.

nlp deep-learning pytorch transformer gpt language-model attention-mechanism circuit-analysis interpretability character-level-language-model attention-visualization attention-heads transformer-interpretability mechanistic-interpretability residual-stream hidden-state-analysis

Updated Jun 5, 2026
Jupyter Notebook

RyoSpiralArchitect / spiral-hodge

Fourier, graph, Hodge, and signed-circulation probes for transformer hidden-state trajectories.

fourier-analysis transformer-interpretability mechanistic-interpretability hodge-decomposition

Updated May 19, 2026
Python

Seqev / spectral-gap-statement

The Spectral Gap-Statement: when the negative subspace of attention transport is a well-posed invariant

machine-learning attention optimal-transport transformer-interpretability spectral-theory wasserstein-geometry

Updated May 17, 2026
TeX

ugail / LLM-Cross-Replay-Decomposition

A diagnostic control paradigm for activation measurements in transformer language models. Cross-replay separates text-bound from architecture-bound components by replaying generated sequences through intact and perturbed model variants.

language-models pythia interpretability variance-decomposition linear-probing permutation-test detrended-fluctuation-analysis gpt-2 activation-analysis representational-similarity transformer-interpretability gpt-j centered-kernel-alignment mechanistic-interpretability attention-analysis cross-replay procrustes-distance

Updated May 17, 2026
Jupyter Notebook
