How Smooth Is Attention? – Apple Machine Learning Research

Self-attention and masked self-attention are at the heart of Transformers’ outstanding success. Still, our mathematical understanding of attention, in particular of its Lipschitz properties — which are key when it comes to analyzing robustness and expressive power — is incomplete. We provide a detailed study of the Lipschitz constant of self-attention in several practical scenarios, discussing the impact of the sequence length and layer normalization on the local Lipschitz constant of both unmasked and masked self-attention. In particular, we show that for inputs of length n in any compact set, the Lipschitz constant of self-attention is bounded by sqrt(n) up to a constant factor and that this bound is tight for reasonable sequence lengths. When the sequence length n is too large for the previous bound to be tight, which we refer to as the mean-field regime, we provide an upper bound and a matching lower bound which are independent of n. Our mean-field framework for masked self-attention is novel and of independent interest. Our experiments on pretrained and randomly initialized BERT and GPT-2 support our theoretical findings.
Figure 1: Regularity of the attention layer as a function of sequence length for different architectures.
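
To make the notion of a local Lipschitz constant concrete, here is a minimal NumPy sketch. It is not the authors' code. It assumes single-head dot-product self-attention of the form softmax(X Wq (X Wk)^T / sqrt(d_k)) X Wv, measures sequences with the Frobenius norm, and probes the layer with random finite-difference directions. The function names, the weight scaling, and the choice of unit-norm tokens as the compact set are all illustrative assumptions. The estimate is only a lower bound on the local Lipschitz constant at X (up to finite-difference error), not the constant itself.

```python
import numpy as np


def softmax(z, axis=-1):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(X, Wq, Wk, Wv, masked=False):
    """Single-head self-attention on a sequence X of shape (n, d).

    With masked=True, token i attends only to tokens j <= i (causal mask).
    """
    d_k = Wq.shape[1]
    scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d_k)
    if masked:
        n = X.shape[0]
        causal = np.tril(np.ones((n, n), dtype=bool))
        scores = np.where(causal, scores, -np.inf)
    return softmax(scores, axis=-1) @ (X @ Wv)


def local_lipschitz_lower_bound(f, X, n_dirs=64, eps=1e-5, seed=0):
    """Crude lower bound on the local Lipschitz constant of f at X.

    Takes the largest ratio ||f(X + eps*D) - f(X)||_F / eps over random
    unit-norm directions D. Random probing can miss the worst direction,
    so the true constant may be larger.
    """
    rng = np.random.default_rng(seed)
    fX = f(X)
    best = 0.0
    for _ in range(n_dirs):
        D = rng.standard_normal(X.shape)
        D /= np.linalg.norm(D)  # unit Frobenius norm
        ratio = np.linalg.norm(f(X + eps * D) - fX) / eps
        best = max(best, ratio)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    # Hypothetical random weights; a pretrained model would supply its own.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    for n in (4, 16, 64, 256):
        X = rng.standard_normal((n, d))
        X /= np.linalg.norm(X, axis=1, keepdims=True)  # tokens on the unit sphere
        est = local_lipschitz_lower_bound(
            lambda Z: self_attention(Z, Wq, Wk, Wv), X
        )
        print(f"n={n:4d}  estimate={est:.3f}  sqrt(n)={np.sqrt(n):.3f}")
```

Running the script prints the estimate next to sqrt(n) for a few sequence lengths. Because the inputs and directions are random, it will not in general reach the worst case described in the abstract. Replacing the random probing with a power iteration on the Jacobian would give a sharper estimate at a single point.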
