Dynamic metastability in the self-attention model

Preprint / working paper, 2024

Abstract

We consider the self-attention model, an interacting particle system on the unit sphere which serves as a toy model for Transformers, the deep neural network architecture behind the recent successes of large language models. We prove the appearance of the dynamic metastability conjectured in [GLPR23]: although particles collapse to a single cluster in infinite time, they remain trapped near a configuration of several clusters for an exponentially long period of time. By leveraging a gradient flow interpretation of the system, we also connect our result to an overarching framework for the slow motion of gradient flows proposed by Otto and Reznikoff [OR07] in the context of coarsening and the Allen-Cahn equation. We finally probe the dynamics beyond the exponentially long period of metastability and illustrate that, under an appropriate time-rescaling, the energy reaches its global maximum in finite time and has a staircase profile, with trajectories manifesting saddle-to-saddle-like behavior, reminiscent of recent works on the analysis of training dynamics via gradient descent for two-layer neural networks.
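To make the object of study concrete: in the model of [GLPR23], n particles x_1(t), ..., x_n(t) evolve on the unit sphere S^{d-1} according to

    dx_i/dt = P_{x_i}( sum_j softmax_j( beta <x_i, x_j> ) x_j ),    P_x(v) = v - <x, v> x,

where P_x is the projection onto the tangent space of the sphere at x and beta > 0 is an inverse-temperature parameter. The following minimal sketch in Python with NumPy integrates this flow with projected Euler steps and records the interaction energy; the parameter values n, d, beta, the step size, and the exact normalization of the energy are illustrative assumptions, not choices made in the paper.

    import numpy as np

    def attention_velocity(X, beta):
        # Self-attention vector field on the sphere: each particle moves
        # toward the softmax-weighted average of all particles, projected
        # onto the tangent space at its current position.
        G = np.exp(beta * (X @ X.T))            # kernel e^{beta <x_i, x_j>}
        W = G / G.sum(axis=1, keepdims=True)    # row-wise softmax weights
        V = W @ X                               # attention output per particle
        dots = np.einsum('ij,ij->i', X, V)      # <x_i, v_i>
        return V - dots[:, None] * X            # P_{x_i}(v_i)

    def interaction_energy(X, beta):
        # Energy whose gradient ascent the (unnormalized) dynamics follows,
        # up to the softmax normalization; the 1/(2 beta n^2) prefactor is
        # an assumption made here for readability.
        n = X.shape[0]
        return np.exp(beta * (X @ X.T)).sum() / (2 * beta * n ** 2)

    def simulate(n=32, d=3, beta=9.0, dt=0.1, steps=20000, seed=0):
        rng = np.random.default_rng(seed)
        X = rng.standard_normal((n, d))
        X /= np.linalg.norm(X, axis=1, keepdims=True)   # start on S^{d-1}
        energies = []
        for _ in range(steps):
            X = X + dt * attention_velocity(X, beta)        # Euler step
            X /= np.linalg.norm(X, axis=1, keepdims=True)   # re-project onto sphere
            energies.append(interaction_energy(X, beta))
        return X, energies

    X_final, energies = simulate()

Plotting the recorded energies against time should show the trajectory lingering near multi-cluster configurations for long stretches before jumping, producing the plateaus-and-jumps staircase profile described in the abstract; the larger beta is, the longer the metastable plateaus last.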
Main file

GKPR24.pdf (915.39 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04731856, version 1 (15-10-2024)

Identifiers

  • HAL Id: hal-04731856, version 1

Cite

Borjan Geshkovski, Hugo Koubbi, Yury Polyanskiy, Philippe Rigollet. Dynamic metastability in the self-attention model. 2024. ⟨hal-04731856⟩