What is Google's new Infini-attention and what are its impacts on SEO and UX?

Recently presented in a research paper entitled “Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention”*, Infini-attention is a new technique from Google that allows its current artificial intelligence models to process massive amounts of data with infinitely long contexts. 


What is Infini-attention?

To fully understand the importance of this new technology, it helps to know that LLMs (Large Language Models) are limited in the amount of data they can process at once. In current models, memory use grows considerably and throughput drops as soon as the context gets longer and the computation becomes more complex. “Memory is the cornerstone of intelligence”, the researchers explain, so it is imperative to reduce the financial cost of serving long contexts. 
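As a rough, back-of-the-envelope sketch (our own illustration, not taken from the paper) of why this becomes costly: in standard attention every token attends to every other token, so the score matrix, and with it memory and compute, grows with the square of the context length.

```python
# Rough illustration (not from the paper): standard attention builds an n x n score
# matrix per head and per layer, so cost grows quadratically with context length.
def score_matrix_entries(n_tokens: int) -> int:
    # Number of entries in a single n x n attention score matrix.
    return n_tokens * n_tokens

for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9,} tokens -> {score_matrix_entries(n):,} score-matrix entries")
# At 1M tokens that is already 10^12 entries for one matrix, which is why serving
# longer and longer context models becomes financially costly.
```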

The researchers also emphasize that: 

  • “Transformer-based LLMs […] have limited contextual memory, due to the nature of the attention mechanism…” 
  • “Scaling LLMs to longer sequences (i.e. 1M tokens) is challenging with standard Transformer architectures, and serving longer and longer context models becomes financially costly.”
  • “Current transformer models are limited in their ability to process long sequences due to the quadratic increase in computational and memory costs. Infini-attention aims to solve this scalability problem.”

 

Characteristics of Infini-attention

Google's Infini-attention is ready to use: it fits easily into other LLM models, particularly those used by Google's main algorithm. Its main characteristics, illustrated with a short code sketch after the list, are: 

  • A compressive memory system, which compresses information as a long data sequence is processed. As new data comes in, the oldest data is compressed and summarized in order to optimize storage.
  • Long-term linear attention, which allows the model to use data that appeared earlier in the long sequence, for tasks where the relevant context is spread across a wide span of data. From a user perspective, this is like discussing a book in the context of all its chapters while still being able to explain the overall plot and the connections between chapters.
  • Local masked attention, which processes the nearby (localized) parts of the input data. This attention is very useful for answers that depend on the most recent parts of the data.
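To make these three mechanisms concrete, here is a minimal, single-head sketch in the spirit of the mechanism described in the paper. The dimensions, variable names and the simple linear memory update are our own illustrative choices, not Google's implementation: each segment is processed with local masked attention, which is combined through a learned gate with a retrieval from a fixed-size compressive memory that summarizes earlier segments.

```python
import numpy as np

d_k, d_v = 16, 16        # key/query and value dimensions (illustrative sizes)
segment_len = 8          # tokens processed per segment

def elu_plus_one(x):
    # ELU(x) + 1, the kind of nonlinearity used for linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_dot_attention(Q, K, V):
    # Standard masked (causal) scaled dot-product attention within one segment.
    scores = Q @ K.T / np.sqrt(d_k)
    scores[np.triu(np.ones(scores.shape, dtype=bool), k=1)] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def infini_attention_segment(Q, K, V, M, z, beta):
    """One segment: mix local attention with compressive-memory retrieval, then
    fold this segment's keys/values into the memory used by later segments."""
    sig_Q, sig_K = elu_plus_one(Q), elu_plus_one(K)

    # 1) Long-term retrieval from the compressive memory built on earlier segments.
    A_mem = (sig_Q @ M) / (sig_Q @ z[:, None] + 1e-6)

    # 2) Local masked attention over the current segment only.
    A_dot = causal_dot_attention(Q, K, V)

    # 3) A learned gate mixes long-term (memory) and local (dot-product) context.
    g = 1.0 / (1.0 + np.exp(-beta))
    A = g * A_mem + (1.0 - g) * A_dot

    # 4) Compress the current segment's keys/values into M and z: older data is
    #    summarized instead of being kept as a full, ever-growing KV cache.
    M = M + sig_K.T @ V
    z = z + sig_K.sum(axis=0)
    return A, M, z

# Stream several segments through the block: the memory footprint stays fixed
# no matter how many segments (how much earlier context) have been processed.
rng = np.random.default_rng(0)
M, z, beta = np.zeros((d_k, d_v)), np.zeros(d_k), 0.0
for _ in range(4):
    Q = rng.normal(size=(segment_len, d_k))
    K = rng.normal(size=(segment_len, d_k))
    V = rng.normal(size=(segment_len, d_v))
    out, M, z = infini_attention_segment(Q, K, V, M, z, beta)
print(out.shape, M.shape)  # (8, 16) and (16, 16): independent of total context length
```

The point of the sketch is that the memory M has a fixed size, so the cost of attending to “everything seen so far” no longer grows with the length of the input.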

 

Test results

The researchers explain that the Transformer problem can be solved by combining the features of Infini-attention (compression, long-term linear attention and local attention) with the standard, “vanilla” attention mechanism in a single Transformer block. As they point out, “Infini-attention incorporates compressive memory into the vanilla attention mechanism and integrates both masked local attention and long-term linear attention mechanisms into a single Transformer block”.

Three tests were carried out by the researchers:

  • Long-context language modeling and its perplexity score: the researchers report that as the training sequence length increases, the perplexity score (a measure of how well the model predicts the next token; lower is better) of the Infini-attention models drops, a first indicator that they perform better than the baseline models. 
  • The passkey: next come the results of the passkey retrieval test, i.e. the ability to find a hidden piece of text at the beginning, middle or end of a very long sequence, which confirm the quality of the Infini-attention models (a toy illustration of this test is shown below, after the results). 
  • Book summarization: the strength of the Infini-attention models is confirmed by the book summarization results, which outperform the main benchmarks and reach a new level of SOTA performance**.

“Our model outperforms previous best results and achieves a new SOTA on BookSum, processing the entire book text. […] There is a clear trend showing that with more text provided as input from books, our Infini-Transformers improve their summarization performance metric.”
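As a toy illustration of what the passkey test looks like (our own construction, inspired by the paper's description rather than its exact setup), a short secret is buried at a chosen position inside a long run of filler text, and the model is then asked to recall it:

```python
import random

def build_passkey_prompt(n_filler_lines: int, position: float) -> tuple[str, str]:
    # position: 0.0 hides the passkey at the beginning, 0.5 in the middle, 1.0 at the end.
    passkey = str(random.randint(10_000, 99_999))
    filler = ["The grass is green. The sky is blue. The sun is yellow."] * n_filler_lines
    filler.insert(int(position * n_filler_lines), f"The pass key is {passkey}. Remember it.")
    return "\n".join(filler) + "\nWhat is the pass key?", passkey

prompt, key = build_passkey_prompt(n_filler_lines=20_000, position=0.5)
print(len(prompt.split()), "words of distractor context; expected answer:", key)
```

A model that truly “leaves no context behind” should return the passkey regardless of whether it was hidden at the beginning, the middle or the end of the sequence.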

Infini-attention is a breakthrough in modeling both long- and short-term attention. Its “plug-and-play continual pre-training” and “long-context adaptation by design” mean it can be easily integrated into existing models.

 

Impacts for SEO and UX

Having understood all this, it is entirely legitimate to ask what the impacts for SEO and UX might be. Here are some first ideas:

  • Infini-attention can be integrated into Google's core algorithm fairly easily, so we may see it implemented quickly.
  • This technology could allow the engine to train its AI as new content is discovered on the web and to understand the importance of each new piece of content, whether it appears at the beginning, in the middle or at the end of a long sequence (on a particular subject, for example). It is therefore not surprising that the researchers talk about “infinitely long inputs”.
  • It matters for the engine from an EEAT*** point of view, since the engine wants to “leave no context behind”, that is to say, better judge the experience and expertise of an author on a specific subject based on all the information it has about them. 
  • From a user experience point of view, Infini-attention will allow the engine to adapt better to the needs of its users and therefore to keep them on its site as long as possible. Indeed, there would no longer be any need to read all the books on a subject if the AI could answer all our questions, including the most advanced ones, within a long and complex context. 

 

In a nutshell

The future of SEO lies in creating content that is relevant to the user experience, integrating expertise, but above all experience, and an ability to adapt to developments in AI.  

 

References:

* Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention (Google research paper, 2024) 

** A DNN (Deep Neural Network) can earn the SOTA (state-of-the-art) label based on its accuracy, speed or any other relevant metric: state-of-the-art models are the best currently available for a specific task.

*** EEAT (Experience, Expertise, Authoritativeness, and Trustworthiness) is part of “Google’s Search Quality Rater Guidelines”.

 

 

Rossitza Mavreau, Lead Traffic Manager SEO SEA Analytics at UX-Republic