Layerwise diagnostic showing instruction-tuned LLMs settle on next-token predictions later.
reproducibility large-language-models instruction-tuning mechanistic-interpretability llm-interpretability model-diffing tuned-lens convergence-gap
-
Updated
May 8, 2026 - Python