Transformers as Universal Turing Machines: Explaining Through Lag Systems
Transformers have revolutionized the field of natural language processing (NLP) and beyond, powering models like GPT, BERT, and T5. But beyond their practical applications, there’s a fascinating theoretical aspect that positions Transformers as more than just effective language models. Some research suggests that Transformers can be considered Universal Turing Machines — in other words, they have the computational power to perform any task that a Turing machine can.
This bold claim rests on connecting Transformers to Lag systems, simplified models of computation that are known to be Turing-complete. In this blog, we’ll explore:
1. What Lag systems are, with examples.
2. Why Lag systems are equivalent to Universal Turing Machines.
3. How Transformers can simulate Lag systems, thereby proving their Turing-completeness.
1. What Are Lag Systems?
At first glance, Lag systems might sound obscure, but they are actually elegant, simplified computational models that bear a strong resemblance to Turing machines. The main difference lies in how they operate.
How Lag Systems Work:
A Lag system manipulates a memory string of symbols using a set of simple rewrite rules. Each rule inspects a fixed-length prefix (the first *m* symbols of the string); based on which rule matches that prefix, the system deletes the first symbol and appends the rule's production string to the end. The process then repeats on the shifted string, so symbols are consumed at the front while new ones are produced at the back. Computation halts when no rule matches the current prefix.
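To make this concrete, here is a minimal sketch of a Lag system interpreter in Python. It assumes the common formulation with monitor length *m* (the prefix size) and deletion number 1 (one symbol removed per step); the rule table and symbols in the example are hypothetical, chosen only to illustrate the mechanics.

```python
def lag_step(string, rules, m):
    """Apply one Lag-system step: look up the length-m prefix,
    delete the first symbol, and append the production."""
    production = rules[string[:m]]   # rule selected by the prefix
    return string[1:] + production   # shift forward, grow at the back

def run_lag(string, rules, m, steps):
    """Run at most `steps` steps; halt when no rule matches."""
    for _ in range(steps):
        if len(string) < m or string[:m] not in rules:
            break                    # no applicable rule: halt
        string = lag_step(string, rules, m)
    return string

# Hypothetical two-symbol system with monitor length m = 2:
rules = {
    "ab": "b",
    "ba": "ab",
    "bb": "a",
}
# "ab" -> "bb" -> "ba" -> "aab", then "aa" matches no rule, so it halts.
print(run_lag("ab", rules, m=2, steps=5))  # prints "aab"
```

Notice that the memory string can grow or shrink over time depending on the productions; this unbounded, rewritable string is what plays the role of a Turing machine's tape.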