Understanding attention in the mean-field
ORAL
Abstract
Attention is the mechanism driving progress in modern deep learning, and understanding how it works is a central question in machine learning. One avenue for tackling this problem is to understand how the geometry of the token vectors changes as they propagate deeper into the network. We analyze the resulting layer-to-layer recursion by means of a replica-symmetric ansatz, which provides a starting point for perturbation theory. Using simple derivations, we reproduce known facts such as the necessity of initializing near a trivial model and the importance of pre-layer norm.
* Stanford Graduate Fellowship, National Science Foundation (grant No. 211199).
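As a rough numerical illustration of the abstract's setup, and not the paper's actual derivation, the sketch below tracks how the geometry of token vectors (summarized by their mean pairwise cosine overlap) evolves as they pass through randomly initialized single-head attention blocks with pre-layer norm and residual connections. All names (`attention_block`, `mean_token_overlap`) and parameter choices are hypothetical and for illustration only.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each token vector to zero mean and unit variance across features.
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def attention_block(x, rng, d_model, sigma_w=1.0):
    # One randomly initialized single-head attention layer with a pre-LN residual
    # branch (no output projection or MLP, to keep the sketch minimal).
    scale = sigma_w / np.sqrt(d_model)
    Wq = rng.normal(0.0, scale, (d_model, d_model))
    Wk = rng.normal(0.0, scale, (d_model, d_model))
    Wv = rng.normal(0.0, scale, (d_model, d_model))

    h = layer_norm(x)                        # pre-layer norm
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    logits = q @ k.T / np.sqrt(d_model)
    attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return x + attn @ v                      # residual connection

def mean_token_overlap(x):
    # Average cosine similarity between distinct token vectors: a one-number
    # summary of how much the token geometry has collapsed toward one direction.
    u = x / np.linalg.norm(x, axis=-1, keepdims=True)
    g = u @ u.T
    n = g.shape[0]
    return (g.sum() - n) / (n * (n - 1))

rng = np.random.default_rng(0)
n_tokens, d_model, depth = 32, 128, 20
x = rng.normal(size=(n_tokens, d_model))

for layer in range(depth):
    x = attention_block(x, rng, d_model)
    print(f"layer {layer + 1:2d}: mean token overlap = {mean_token_overlap(x):.3f}")
```

Depending on the weight scale, the printed overlap typically drifts with depth, which is the kind of layer-to-layer change in token geometry that the mean-field recursion in the abstract is meant to capture analytically.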
Presenters
-
Aditya Cowsik
Stanford University
Authors
-
Aditya Cowsik
Stanford University
-
Surya Ganguli
Stanford University
-
Tamra Nebabu
Stanford University
-
Xiao-Liang Qi
Stanford University