Evaluate interpretability methods for localizing and disentangling concepts in LLMs.
[SIGIR 2022] Source code and datasets for "Bias Mitigation for Evidence-aware Fake News Detection by Causal Intervention".
Causal Intervention on Modality-specific Biases for Medical Visual Question Answering
Demystifying Verbatim Memorization in Large Language Models
A causal intervention framework to learn robust and interpretable character representations inside subword-based language models
A framework for evaluating auto-interp pipelines, i.e., natural language explanations of neurons.
[EMNLP 2023] A Causal View of Entity Bias in (Large) Language Models