One Lab in 1991 Laid the Four Pillars of Modern LLMs

Four of the core techniques powering every large language model today -- transformers, pre-training, distillation, and residual connections -- all came out of a single lab in Munich over a five-month stretch in 1991.

Jürgen Schmidhuber's team at the Technical University of Munich published the unnormalized linear Transformer on March 26, 1991. That's the T in ChatGPT. Today's quadratic-attention transformers get most of the hype, but the computational cost of the 1991 variant already scaled linearly with input length rather than quadratically. That efficiency advantage is still relevant.
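The linear-versus-quadratic distinction can be sketched in a few lines of plain Python. This is a minimal modern illustration, not the notation of the 1991 report: the function names and the toy vectors are assumptions made for the example. The "linear" version keeps one running matrix of summed outer products (a fast weight matrix) and does a fixed amount of work per step, while the "quadratic" version compares every query with all earlier keys and produces the same numbers.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def linear_attention(keys: List[Vector], values: List[Vector],
                     queries: List[Vector]) -> List[Vector]:
    """Causal, unnormalized attention with a running fast-weight matrix.

    Work per step depends only on the vector sizes, so the total cost
    grows linearly with the number of steps.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    fast_weights = [[0.0] * d_k for _ in range(d_v)]  # shape: d_v x d_k
    outputs = []
    for k, v, q in zip(keys, values, queries):
        # Add the outer product v k^T to the running matrix.
        for i in range(d_v):
            for j in range(d_k):
                fast_weights[i][j] += v[i] * k[j]
        # Read out by applying the matrix to the current query.
        outputs.append([dot(row, q) for row in fast_weights])
    return outputs


def quadratic_attention(keys: List[Vector], values: List[Vector],
                        queries: List[Vector]) -> List[Vector]:
    """Same result without softmax, computed pairwise.

    Each query is scored against every earlier key, so the total cost
    grows quadratically with the number of steps.
    """
    outputs = []
    for t, q in enumerate(queries):
        out = [0.0] * len(values[0])
        for s in range(t + 1):
            score = dot(keys[s], q)
            for i, v_i in enumerate(values[s]):
                out[i] += score * v_i
        outputs.append(out)
    return outputs


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
queries = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(linear_attention(keys, values, queries))
```

Both functions return the same outputs on this toy input; the difference is only in how the work grows as the sequence gets longer. Modern softmax attention adds a normalization step that this unnormalized sketch deliberately leaves out.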

Pre-Training and Distillation Came Seven Months Before the First Modern LLM Paper

On April 30, 1991, Schmidhuber's lab published two more techniques on the same day: unsupervised pre-training for deep neural networks (the P in ChatGPT) and neural network distillation. Both appeared in the same technical report series (UN0-UN2). Distillation became central to DeepSeek's 2025 'Sputnik' model, which compressed a teacher network into a smaller student while retaining most task performance. The 1991 work predates the well-known Hinton et al. distillation paper by over two decades.
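The teacher-to-student idea can be illustrated with a small loss function. The sketch below uses the temperature-softened formulation common in recent practice, which is an assumption made for illustration and not the setup of the 1991 report; the function names and the example logits are hypothetical. The student is penalized by the cross-entropy between the teacher's softened output distribution and its own.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; higher temperature gives a flatter output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float],
                      student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """Cross-entropy of the student's softened distribution against the teacher's."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(teacher_probs, student_probs))


teacher = [2.0, 1.0, 0.1]          # hypothetical teacher logits
untrained_student = [0.0, 0.0, 0.0]  # hypothetical student that knows nothing yet

# The loss is smallest when the student reproduces the teacher's distribution.
print(distillation_loss(teacher, teacher))
print(distillation_loss(teacher, untrained_student))
```

Minimizing this loss over many inputs pushes the smaller network toward the teacher's behavior, which is the sense in which the teacher is "compressed" into the student.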

Deep Residual Learning: The Missing Link that Made 1000-Layer Networks Possible

On June 15, 1991, Sepp Hochreiter's diploma thesis introduced deep residual learning with residual connections. That single idea became the core ingredient of Long Short-Term Memory (the most cited AI paper of the 20th century) and later of ResNet (the most cited scientific article of the 21st century). The Highway Net variant from 2015, which Schmidhuber notes was directly inspired by the 1991 residual connections, was 10 times deeper than any previous feedforward network. Residual learning is now baked into every LLM architecture.
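Why a residual connection helps with depth can be shown with a toy calculation. This is a simplified sketch under stated assumptions, not the analysis in the 1991 thesis: the layer here is a hypothetical linear map that scales its input by 0.01, and all names are made up for the example. Without the skip path the signal shrinks by that factor at every layer; with the skip path each layer computes x + f(x), so the input passes through largely intact.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def small_layer(x: Vector) -> Vector:
    """Hypothetical layer that scales its input by a small factor."""
    return [0.01 * x_i for x_i in x]


def residual_block(x: Vector, layer: Layer) -> Vector:
    """Residual connection: add the layer's output to its unchanged input."""
    return [x_i + f_i for x_i, f_i in zip(x, layer(x))]


def run_stack(x: Vector, layer: Layer, depth: int, residual: bool) -> Vector:
    """Apply the same layer `depth` times, with or without skip connections."""
    for _ in range(depth):
        x = residual_block(x, layer) if residual else layer(x)
    return x


signal = [1.0, -2.0]
plain_out = run_stack(signal, small_layer, depth=50, residual=False)
residual_out = run_stack(signal, small_layer, depth=50, residual=True)

print(plain_out)     # shrinks by a factor of 0.01 per layer
print(residual_out)  # stays on the order of the input
```

Because the toy layer is linear, the same scaling factors apply to gradients flowing backward through the stack, which is why the plain version becomes hard to train as depth grows.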

The First GAN Paper Arrived August 31, 1991

The lab's peer-reviewed publication on generative and adversarial networks predates Ian Goodfellow's 2014 GAN paper by 23 years. Schmidhuber frames it as part of a broader effort toward neural world models trained through artificial curiosity. That work fed directly into David Ha's 2018 world models paper at Google Brain and continues at Sakana AI with recursive self-improvement research.

Schmidhuber is blunt about the limitations: "In 1991, it was already totally obvious that LLM-like NNs alone are not enough to achieve AGI." His lab started working on planning with adaptive world models and meta-learning the same year. Munich was also home to Ernst Dickmanns's self-driving cars, which reached 175 km/h on the autobahn.

The full timeline is on Schmidhuber's site with citations to the original TU Munich technical reports. If you think the AI boom started with the 2017 Transformer paper, you owe it to yourself to read what happened in 1991 -- and ask why those techniques took thirty years to scale.

Source: Munich 1991: the Roots of the Current AI Boom
Domain: people.idsia.ch
