Mechanistic Interpretability of In-Context Learning in Transformers

Mini T V

Authors

Mini T V, Sacred Heart College (Autonomous), Chalakudy, India

Keywords:

In-Context Learning, Transformer Models, Mechanistic Interpretability, Induction Heads, Meta-Learning, Attention Pattern Visualization

Abstract

Transformer models demonstrate remarkable in-context learning capabilities, adapting to novel tasks from a few examples without parameter updates. Despite widespread deployment, the internal mechanisms enabling this emergent behavior remain poorly understood. We present a comprehensive mechanistic analysis revealing that in-context learning emerges from discrete circuit structures, called induction heads, that form during a sharp phase transition in training. Through systematic ablation studies, attention pattern visualization, and activation space analysis across models from 125M to 52B parameters, we identify the architectural components responsible for in-context learning and characterize their formation dynamics. Our findings demonstrate that induction heads implement approximate Bayesian inference by maintaining task-relevant statistics in attention patterns, providing an algorithmic understanding of how transformers perform meta-learning. We validate these mechanisms across diverse tasks, including translation, arithmetic, and logical reasoning, revealing universal computational motifs underlying in-context learning. These insights enable targeted architectural modifications that enhance in-context learning efficiency by 3x while reducing computational requirements, with significant implications for model design, training efficiency, and interpretability research.
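As a rough illustration of the behavior usually attributed to induction heads, the sketch below implements an idealized prefix-match-and-copy rule in plain Python: it looks for the most recent earlier occurrence of the current token and predicts the token that followed it. This is a toy under stated assumptions, not the paper's method or a model of real attention weights; the function name `induction_predict` is hypothetical and introduced only for this example.

```python
from typing import Hashable, Optional, Sequence


def induction_predict(tokens: Sequence[Hashable]) -> Optional[Hashable]:
    """Toy prefix-match-and-copy rule (an assumption for illustration).

    Finds the most recent earlier occurrence of the last token and
    returns the token that came right after it, or None if the last
    token has not appeared before.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan earlier positions from most recent to oldest.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # "Copy" step: emit whatever followed the earlier match.
            return tokens[i + 1]
    return None


if __name__ == "__main__":
    # The pattern "A B ... A" leads the rule to predict "B".
    print(induction_predict(["A", "B", "C", "A"]))
```

A real induction head realizes this pattern softly through attention scores rather than an exact lookup, so the sketch only conveys the input-output behavior, not the mechanism inside the network.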

Author Biography

Mini T V, Sacred Heart College (Autonomous), Chalakudy, India

Associate Professor, Department of Computer Science
