No math. No PhD. Just interactive demos that show you what's happening under the hood when you talk to AI.
1. What's a Token?
AI doesn't read words — it reads tokens. A token is usually a piece of a word, a whole word, or punctuation. Type anything below and watch it get split into tokens in real time.
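To get a feel for it, here's a toy splitter in Python. Real tokenizers use byte-pair encoding and break rarer words into sub-word pieces, so this is only a rough sketch of the idea, not how any production model actually tokenizes:

```python
import re

def toy_tokenize(text):
    """Split text into word-like chunks and punctuation.
    Real BPE tokenizers split rarer words into sub-word pieces;
    this regex is only a rough approximation of the concept."""
    return re.findall(r"\w+|[^\w\s]", text)

print(toy_tokenize("I love pizza!"))  # ['I', 'love', 'pizza', '!']
```

Notice that even this crude version treats punctuation as its own token — real tokenizers do the same, which is part of why token counts run higher than word counts.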
2. The Context Window: AI's Short-Term Memory
An LLM can only "remember" a fixed amount of text at once — its context window. Everything you send (your prompt, the conversation history, system instructions) has to fit inside it. When it fills up, older stuff gets forgotten.
Claude (Anthropic)
200,000 tokens — fits ~150,000 words (~300 pages)
GPT-4o (OpenAI)
128,000 tokens — fits ~96,000 words (~190 pages)
Gemini 1.5 (Google)
1,000,000 tokens — fits ~750,000 words (~1,500 pages)
GPT-3 (2020)
4,096 tokens — fits ~3,000 words (~6 pages)
Click a document type above to see how much of the context window it fills. Then switch models to compare.
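"Older stuff gets forgotten" usually means the app silently drops the oldest messages before sending. A minimal sketch of that trimming logic, with a made-up characters-per-token estimate standing in for a real tokenizer:

```python
def trim_to_window(messages, max_tokens, count_tokens):
    """Keep the most recent messages that fit inside the context window.
    `count_tokens` is a stand-in for a real tokenizer's length function."""
    kept, used = [], 0
    for msg in reversed(messages):      # walk newest-first
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break                       # everything older gets dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order

# crude stand-in: roughly 1 token per 4 characters
approx = lambda m: len(m) // 4 + 1
history = ["old message " * 50, "recent question?", "latest reply."]
print(trim_to_window(history, max_tokens=20, count_tokens=approx))
```

The old 600-character message doesn't fit in a 20-token budget, so only the two recent messages survive — exactly the "forgetting" the demo above visualizes.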
3. Attention: How AI Connects the Dots
When generating each word, the model doesn't treat all input equally. It pays attention to the most relevant parts. Click any word below to see what the model would focus on when generating from that position.
Click a word to see which other words the model pays attention to. Brighter = stronger attention.
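Under the hood, those highlights come from something like dot-product attention: score each word against the selected word, then squash the scores into weights that sum to 1. A toy version with invented 2-d "embeddings" (real models use hundreds of dimensions and many attention heads):

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Dot-product attention: how strongly the query token 'attends'
    to each key token. Vectors here are made up for illustration."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)

# toy 2-d 'embeddings' for ["The", "cat", "sat"]
vecs = {"The": [0.1, 0.0], "cat": [0.9, 0.3], "sat": [0.2, 0.8]}
w = attention_weights(vecs["sat"], [vecs["The"], vecs["cat"], vecs["sat"]])
print([round(x, 2) for x in w])   # three weights summing to 1.0
```

The bigger the dot product between two word vectors, the brighter that word would glow in the demo above.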
4. What Happens When You Hit Send
From the moment you press Enter to the moment you see a response — here's every step, in order.
1
Your text gets tokenized
Your message is split into tokens — pieces of words the model can process. "I love pizza" becomes something like ["I", " love", " pizza"]. This is the same process you saw in the tokenizer above.
2
Tokens become numbers (embeddings)
Each token gets converted into a long list of numbers (a vector) that captures its meaning. "King" and "queen" end up as nearby coordinates in this number space. It's how the model understands that words relate to each other.
3
Attention layers process context
The model runs your tokens through dozens of attention layers. Each layer figures out how tokens relate to each other — which words modify which, what "it" refers to, how the sentence structure works. This is the expensive part.
4
Model predicts the next token
After processing, the model outputs a probability for every possible next token. It might say: "the" (32%), "a" (18%), "my" (12%), "their" (8%)... It picks one (influenced by temperature) and that's the first token of the response.
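That pick is a weighted coin flip over the model's scores, and temperature controls how weighted. A minimal sketch — the token names and scores below are invented, and real models score a vocabulary of tens of thousands of tokens, not four:

```python
import math, random

def sample_next_token(logits, temperature=1.0):
    """Pick the next token from raw scores (logits).
    Low temperature sharpens the distribution toward the top choice;
    high temperature flattens it so underdogs get picked more often."""
    scaled = [score / max(temperature, 1e-6) for score in logits.values()]
    exps = [math.exp(s) for s in scaled]
    probs = [e / sum(exps) for e in exps]
    return random.choices(list(logits.keys()), weights=probs)[0]

logits = {"the": 2.0, "a": 1.4, "my": 1.0, "their": 0.6}
print(sample_next_token(logits, temperature=0.1))   # almost always "the"
print(sample_next_token(logits, temperature=1.0))   # more varied picks
```

This is the same dial you'll play with in the temperature demo below — it lives right here, at the moment of picking each token.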
5
Repeat — one token at a time
The predicted token gets added to the sequence, and the whole process runs again to predict the next token. This is why you see AI "typing" word by word — it's literally generating one token at a time. A 500-word response = ~665 prediction cycles.
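The loop itself is simple — the expensive part is that every iteration re-runs the model over the whole sequence. A sketch with a fake stand-in for the model (`fake_model` and the canned reply are invented for illustration):

```python
def generate(prompt_tokens, predict_next, max_new_tokens=20, stop="<end>"):
    """Autoregressive generation: feed the growing sequence back in
    to predict each new token. `predict_next` stands in for the model."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        nxt = predict_next(tokens)      # one full forward pass per token
        if nxt == stop:
            break
        tokens.append(nxt)
    return tokens

# stand-in 'model': echoes a canned reply one token at a time
reply = ["I", " love", " pizza", "<end>"]
fake_model = lambda toks: reply[len(toks) - 2] if len(toks) - 2 < len(reply) else "<end>"
print(generate(["User:", " hi"], fake_model))
```

Swap `fake_model` for a real forward pass and this loop is, structurally, what every chat backend runs while you watch the reply stream in.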
6
Tokens decode back into text
The generated token IDs get converted back into readable text and streamed to your screen. The model has no memory of this conversation unless you send the whole history again next time — it's stateless.
5. Temperature = Creativity Dial
Every AI model has a temperature setting (0.0 to 1.0) that controls randomness. At low temperature, the model always picks the most likely next word — safe, consistent, boring. At high temperature, it takes bigger creative risks — sometimes brilliant, sometimes nonsense. Move the slider and watch the same prompt produce wildly different outputs.
Prompt: "Write a one-sentence tagline for a new productivity app."
Focused ↔ Creative (slider at 0.5)
0.0 — always picks the safest word · 1.0 — rolls the dice
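You can see exactly what the slider does to the numbers: temperature divides the model's raw scores before they're turned into probabilities. The three scores below are invented, standing in for a "safe", an "okay", and a "wild" word choice:

```python
import math

def probs_at_temperature(logits, temperature):
    """Softmax with temperature: divide scores before exponentiating.
    Low temperature -> top choice dominates; high -> probabilities flatten."""
    scaled = [score / temperature for score in logits]
    exps = [math.exp(s) for s in scaled]
    return [round(e / sum(exps), 2) for e in exps]

scores = [2.0, 1.0, 0.5]                  # "safe", "okay", "wild"
print(probs_at_temperature(scores, 0.2))  # safe word dominates
print(probs_at_temperature(scores, 1.0))  # risk spreads to the others
```

At 0.2 the safest word takes nearly all the probability mass; at 1.0 the long shots get real chances — which is why the three runs above diverge as you push the slider right.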
Key Takeaways
1
LLMs are autocomplete on steroids
They predict the next token based on everything before it. That's the entire trick — it just works shockingly well at scale.
2
Context windows are short-term memory
The model forgets everything between conversations. If you want it to "remember," you have to send the context every time (or use RAG — that's the next lesson).
3
Structure your prompts because attention is literal
The model physically "attends" to different parts of your input. Clear structure (headers, XML tags, examples) gives it better anchors to focus on.
4
Hallucinations are confidence without knowledge
The model always produces a plausible-sounding next token, sampled from its probabilities. It has no built-in way to say "I don't know" — it will always generate something, even when it's wrong.