From Sequential Chains to Parallel Minds

An interactive journey into the neural network architectures that power Large Language Models. Explore the key innovations that took these models from processing words one by one to understanding entire contexts at once.

Part 1: The Evolution of Sequence Models

Early models processed text sequentially, like reading a book one word at a time. This created a fundamental bottleneck. This section explores the progression from simple Recurrent Neural Networks (RNNs) to the more sophisticated Long Short-Term Memory (LSTM) networks, and highlights the core problem—long-range dependencies—that paved the way for the Transformer.
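To make the sequential bottleneck concrete, here is a minimal, illustrative sketch of a vanilla RNN update in NumPy. The function name `rnn_step` and the toy sizes are assumptions chosen for the example, not code from any particular model; the point is simply that each step depends on the previous one, so information from early words must survive many repeated updates.

```python
import numpy as np

# Illustrative vanilla RNN step: the hidden state h is the model's only memory,
# and every word must be folded into it one step at a time.
def rnn_step(h, x, W_h, W_x, b):
    """One recurrent update: new hidden state from previous state and current input."""
    return np.tanh(W_h @ h + W_x @ x + b)

rng = np.random.default_rng(0)
hidden, embed, seq_len = 16, 8, 50

W_h = rng.normal(scale=0.3, size=(hidden, hidden))
W_x = rng.normal(scale=0.3, size=(hidden, embed))
b = np.zeros(hidden)

h = np.zeros(hidden)
for t in range(seq_len):              # strictly sequential: step t needs step t-1
    x_t = rng.normal(size=embed)      # stand-in for the t-th word embedding
    h = rnn_step(h, x_t, W_h, W_x, b)

# Information from the first word reaches step 50 only through 50 repeated
# squashing updates, which is why long-range dependencies tend to fade.
print(h.shape)  # (16,)
```

LSTMs add gating to slow this decay, but the step-by-step dependency itself remains.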

Architectural Trade-offs

Select an architecture to see its characteristics and how it handles information over long sequences.

Visualizing Long-Range Dependencies

This chart illustrates how well each architecture retains information from earlier in a sequence. Because self-attention connects every token to every other token directly, the Transformer's "memory" does not degrade with distance.

Part 2: A Deep Dive into the Transformer

The Transformer's power comes from a few core components that work together. Instead of recurrence, it uses **self-attention** to weigh the importance of all words in the input simultaneously. This section provides interactive visualizations to build an intuition for these key mechanisms.

The Core Mechanism: Scaled Dot-Product Attention

Self-attention allows the model to determine how important other words in a sentence are to the meaning of a specific word. Hover over a word below to see its "attention scores" relative to the other words. A darker shade means a higher score.

Attention(Q, K, V) = softmax(QKᵀ / √dₖ) V
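The formula maps directly to a few lines of NumPy. The sketch below is a toy, single-head version with made-up sizes (4 "words", dₖ = 3) and assumed helper names; real models derive Q, K, and V from the same token embeddings via learned projection matrices.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QKᵀ / √dₖ) V"""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # similarity of each query to every key
    weights = softmax(scores, axis=-1)        # each row sums to 1: the attention scores
    return weights @ V, weights

# Toy example: 4 tokens, dimension 3.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 3))
K = rng.normal(size=(4, 3))
V = rng.normal(size=(4, 3))

output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))   # row i: how strongly word i attends to every word
print(output.shape)       # (4, 3)
```

Row i of `weights` is exactly what the hover visualization above shades: how much word i attends to each of the other words.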

Part 3: Building a Modern LLM

A powerful base model is not enough. Modern LLMs undergo a multi-stage training process to align their vast knowledge with human intent, making them helpful and safe. This pipeline transforms a raw text-completer into a sophisticated instruction-following assistant. Click on each stage to learn more.

1. Pre-training

Building General Knowledge

2. Supervised Fine-Tuning

Learning to Follow Instructions

3. Reinforcement Learning

Aligning with Human Feedback
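For a rough sense of what the first two stages optimize, here is a hedged sketch of the next-token cross-entropy objective used in pre-training. The function name and toy sizes are assumptions for illustration only; supervised fine-tuning reuses the same kind of loss on curated instruction-response pairs, while the reinforcement-learning stage instead optimizes the model against a learned reward signal derived from human feedback.

```python
import numpy as np

def next_token_cross_entropy(logits, targets):
    """Average cross-entropy of predicting each next token from the tokens before it."""
    logits = logits - logits.max(axis=-1, keepdims=True)                    # stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Toy numbers: a 5-token sequence over a 10-word vocabulary.
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 10))       # model's prediction at each position
targets = rng.integers(0, 10, size=5)   # the actual next tokens

print(next_token_cross_entropy(logits, targets))
```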