From Sequential Chains to Parallel Minds

An interactive journey into the neural network architectures that power Large Language Models. Explore the key innovations that took these models from processing words one by one to understanding entire contexts at once.

Part 1: The Evolution of Sequence Models

Early models processed text sequentially, like reading a book one word at a time. This created a fundamental bottleneck. This section explores the progression from simple Recurrent Neural Networks (RNNs) to the more sophisticated Long Short-Term Memory (LSTM) networks, and highlights the core problem—long-range dependencies—that paved the way for the Transformer.
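To make the sequential bottleneck concrete, here is a minimal, illustrative sketch of a vanilla RNN update in NumPy. The function name `rnn_step` and the toy sizes are assumptions chosen for the example, not code from any particular model; the point is simply that each step depends on the previous one, so information from early words must survive many repeated updates.

```python
import numpy as np

# Illustrative vanilla RNN step: the hidden state h is the model's only memory,
# and every word must be folded into it one step at a time.
def rnn_step(h, x, W_h, W_x, b):
    """One recurrent update: new hidden state from previous state and current input."""
    return np.tanh(W_h @ h + W_x @ x + b)

rng = np.random.default_rng(0)
hidden, embed, seq_len = 16, 8, 50

W_h = rng.normal(scale=0.3, size=(hidden, hidden))
W_x = rng.normal(scale=0.3, size=(hidden, embed))
b = np.zeros(hidden)

h = np.zeros(hidden)
for t in range(seq_len):              # strictly sequential: step t needs step t-1
    x_t = rng.normal(size=embed)      # stand-in for the t-th word embedding
    h = rnn_step(h, x_t, W_h, W_x, b)

# Information from the first word reaches step 50 only through 50 repeated
# squashing updates, which is why long-range dependencies tend to fade.
print(h.shape)  # (16,)
```

LSTMs add gating to slow this decay, but the step-by-step dependency itself remains.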

Architectural Trade-offs

Select an architecture to see its characteristics and how it handles information over long sequences.

Visualizing Long-Range Dependencies

This chart illustrates how well each architecture retains information from earlier in a sequence. Because self-attention connects every token to every other token directly, the Transformer's "memory" does not degrade with distance.

Part 2: A Deep Dive into the Transformer

The Transformer's power comes from a few core components that work together. Instead of recurrence, it uses **self-attention** to weigh the importance of all words in the input simultaneously. This section provides interactive visualizations to build an intuition for these key mechanisms.

The Core Mechanism: Scaled Dot-Product Attention

Self-attention allows the model to determine how important other words in a sentence are to the meaning of a specific word. Hover over a word below to see its "attention scores" relative to the other words. A darker shade means a higher score.

Attention(Q, K, V) = softmax(QKᵀ / √dₖ) V
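The formula maps directly to a few lines of NumPy. The sketch below is a toy, single-head version with made-up sizes (4 "words", dₖ = 3) and assumed helper names; real models derive Q, K, and V from the same token embeddings via learned projection matrices.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QKᵀ / √dₖ) V"""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # similarity of each query to every key
    weights = softmax(scores, axis=-1)        # each row sums to 1: the attention scores
    return weights @ V, weights

# Toy example: 4 tokens, dimension 3.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 3))
K = rng.normal(size=(4, 3))
V = rng.normal(size=(4, 3))

output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))   # row i: how strongly word i attends to every word
print(output.shape)       # (4, 3)
```

Row i of `weights` is exactly what the hover visualization above shades: how much word i attends to each of the other words.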

Part 3: Building a Modern LLM

A powerful base model is not enough. Modern LLMs undergo a multi-stage training process to align their vast knowledge with human intent, making them helpful and safe. This pipeline transforms a raw text-completer into a sophisticated instruction-following assistant. Click on each stage to learn more.

1. Pre-training

Building General Knowledge

2. Supervised Fine-Tuning

Learning to Follow Instructions

3. Reinforcement Learning

Aligning with Human Feedback
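For a rough sense of what the first two stages optimize, here is a hedged sketch of the next-token cross-entropy objective used in pre-training. The function name and toy sizes are assumptions for illustration only; supervised fine-tuning reuses the same kind of loss on curated instruction-response pairs, while the reinforcement-learning stage instead optimizes the model against a learned reward signal derived from human feedback.

```python
import numpy as np

def next_token_cross_entropy(logits, targets):
    """Average cross-entropy of predicting each next token from the tokens before it."""
    logits = logits - logits.max(axis=-1, keepdims=True)                    # stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Toy numbers: a 5-token sequence over a 10-word vocabulary.
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 10))       # model's prediction at each position
targets = rng.integers(0, 10, size=5)   # the actual next tokens

print(next_token_cross_entropy(logits, targets))
```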