From Sequential Chains to Parallel Minds
An interactive journey into the neural network architectures that power Large Language Models. Explore the key innovations that moved the field from processing words one by one to understanding entire contexts at once.
Part 1: The Evolution of Sequence Models
Early models processed text sequentially, like reading a book one word at a time, which created a fundamental bottleneck. This section traces the progression from simple Recurrent Neural Networks (RNNs) to the more sophisticated Long Short-Term Memory (LSTM) networks, and highlights the core problem of capturing long-range dependencies that paved the way for the Transformer.
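To see the bottleneck concretely, here is a minimal sketch (assuming PyTorch is installed; the page itself ships no code) that measures how much gradient signal from the final output reaches the very first token of a 100-step sequence, for an RNN versus an LSTM:

```python
# A minimal sketch (assuming PyTorch) of the long-range bottleneck:
# how much gradient from the final output reaches the first token.
import torch
import torch.nn as nn

seq_len, d = 100, 32
x = torch.randn(1, seq_len, d, requires_grad=True)  # one toy sequence

for name, model in [("RNN", nn.RNN(d, d, batch_first=True)),
                    ("LSTM", nn.LSTM(d, d, batch_first=True))]:
    out, _ = model(x)
    out[:, -1].sum().backward()               # gradient of the last step
    grad_first = x.grad[:, 0].norm().item()   # ...w.r.t. the first token
    print(f"{name}: gradient reaching token 0 = {grad_first:.2e}")
    x.grad = None                             # reset for the next model
```

With random weights, the RNN's value typically comes out orders of magnitude smaller than the LSTM's: the vanishing-gradient problem that the chart below visualizes.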
Architectural Trade-offs
Select an architecture to see its characteristics and how it handles information over long sequences.
Visualizing Long-Range Dependencies
This chart illustrates how well each architecture retains information from earlier in a sequence. The Transformer's self-attention mechanism gives every token direct access to every other token in the context window, so its "memory" does not degrade with distance.
Part 2: A Deep Dive into the Transformer
The Transformer's power comes from a few core components that work together. Instead of recurrence, it uses **self-attention** to weigh the importance of all words in the input simultaneously. This section provides interactive visualizations to build an intuition for these key mechanisms.
The Core Mechanism: Scaled Dot-Product Attention
Self-attention allows the model to determine how important other words in a sentence are to the meaning of a specific word. Hover over a word below to see its "attention scores" relative to the other words. A darker shade means a higher score.
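The scores shaded in the demo come from the formula `Attention(Q, K, V) = softmax(QKᵀ / √d_k) V`, where Q, K, and V are query, key, and value vectors derived from the input words. Below is a minimal sketch (assuming NumPy; the page itself ships no code) of that computation:

```python
# A minimal sketch (assuming NumPy) of scaled dot-product attention,
# softmax(Q Kᵀ / √d_k) V, for a toy 5-token input.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # raw pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights                     # weighted sum of values

n, d_k = 5, 8                               # 5 tokens, 8-dim vectors
rng = np.random.default_rng(0)
x = rng.standard_normal((n, d_k))
Q = K = V = x                               # self-attention: one source
output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))                     # the scores the demo shades
```

Note that a real Transformer first projects the input through learned weight matrices to get distinct Q, K, and V; this sketch skips those projections to keep the core computation visible.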
Part 3: Building a Modern LLM
A powerful base model is not enough. Modern LLMs undergo a multi-stage training process to align their vast knowledge with human intent, making them helpful and safe. This pipeline transforms a raw text-completer into a sophisticated instruction-following assistant. Click on each stage to learn more; a schematic sketch of the full pipeline follows the stage list.
1. Pre-training
Building General Knowledge
2. Supervised Fine-Tuning
Learning to Follow Instructions
3. Reinforcement Learning
Aligning with Human Feedback
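To make the pipeline's shape concrete, here is a schematic Python sketch; every function name below is a hypothetical placeholder for illustration, not a real training API:

```python
# A schematic sketch of the three-stage pipeline. All names here are
# hypothetical placeholders, not a real library's functions.

def pretrain(model, text_corpus):
    # Stage 1: next-token prediction over raw text builds general knowledge.
    return model

def supervised_finetune(model, instruction_pairs):
    # Stage 2: curated (prompt, ideal response) pairs teach the
    # text-completer the instruction-following format.
    return model

def reinforcement_learning(model, preference_data):
    # Stage 3: a reward model fit to human preference rankings guides
    # policy optimization (commonly PPO) to align the model's behavior.
    return model

base = pretrain({"params": "randomly initialized"}, text_corpus=[])
sft = supervised_finetune(base, instruction_pairs=[])
assistant = reinforcement_learning(sft, preference_data=[])
```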