
AI Glossary

Plain-English definitions of AI terms you'll actually encounter. No PhD required.

44 terms
Agent
An AI system that can take actions autonomously — reading files, calling APIs, running code, making decisions — not just generating text. Agents do work; chatbots give answers.
API (Application Programming Interface)
A structured way for software to talk to other software. When you send a prompt to Claude through code instead of a chat window, you're using an API.
Attention
The mechanism that lets AI focus on relevant parts of the input when generating output. It's why AI can connect "the cat sat on the mat" to a question about cats, even if the question comes 1,000 words later.
Chain of Thought
A prompting technique where you ask AI to reason step by step instead of jumping to the answer. It improves accuracy on complex tasks by forcing the model to show its work.
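In code, the difference is just the prompt string. A made-up arithmetic example (the numbers and wording are illustrative, not a prescribed template):

```python
# Direct prompt: the model may jump straight to an answer.
direct = "What is 17% of 480?"

# Chain-of-thought prompt: the same question, plus instructions to reason first.
chain_of_thought = (
    "What is 17% of 480? "
    "Think step by step: first compute 10% of 480, then 7% of 480, "
    "then add the two, and only then state the final answer."
)
```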
Completion
The text that an AI model generates in response to a prompt. You send a prompt; you get back a completion.
Context Window
The maximum amount of text an AI can "see" at once — both your input and its output combined. Think of it as the model's working memory. Bigger windows let you feed more information.
Diffusion Model
An AI that generates images by starting with random noise and gradually refining it into a picture. Used by DALL-E, Midjourney, and Stable Diffusion.
Embedding
A way to convert text (or images) into a list of numbers that captures meaning. Similar concepts get similar numbers, which is how AI can understand that "dog" and "puppy" are related.
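A rough sketch of what "similar concepts get similar numbers" means: cosine similarity over made-up 3-number embeddings. (Real embeddings have hundreds or thousands of dimensions and come from a model, not by hand.)

```python
import math

def cosine_similarity(a, b):
    # Similarity of two embedding vectors: 1.0 = same direction, near 0 = unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional embeddings, invented for illustration.
dog = [0.9, 0.8, 0.1]
puppy = [0.85, 0.75, 0.15]
spreadsheet = [0.1, 0.2, 0.9]

cosine_similarity(dog, puppy)        # close to 1.0: related concepts
cosine_similarity(dog, spreadsheet)  # much lower: unrelated concepts
```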
Fine-Tuning
Training an existing AI model on your own specific data to make it better at a particular task. Like teaching a general-purpose chef to specialize in Italian cuisine.
Guardrails
Rules and constraints you set to keep AI output safe, on-topic, and within boundaries. "Do NOT modify existing tests" is a guardrail. They prevent AI from going rogue on your codebase.
Hallucination
When AI generates information that sounds confident but is factually wrong. It doesn't "know" it's making things up — it's pattern-matching, not fact-checking. Always verify critical claims.
Inference
The process of running a trained AI model to generate output. When you send a prompt and get a response, that's inference. Training teaches the model; inference uses it.
Latency
The time between sending a prompt and receiving the first token of the response. Lower latency means faster responses. Affected by model size, server load, and network distance.
LLM (Large Language Model)
A massive AI model trained on enormous amounts of text data. Claude, GPT, and Gemini are all LLMs. "Large" refers to the billions of parameters (adjustable weights) that make them work.
MCP (Model Context Protocol)
An open protocol that lets AI models connect to external tools and data sources in a standardized way. Like USB for AI — one protocol, many tools. Created by Anthropic.
Multimodal
An AI that can process multiple types of input — text, images, audio, video — not just words. A multimodal model can look at a photo and describe what's in it.
Parameters
The adjustable numbers inside a neural network that the model learns during training. More parameters generally means more capability, but also more compute cost. Claude has billions of them.
Prompt
The input you give to an AI model — the question, instruction, or context that tells it what to do. Better prompts get better outputs. This is the skill most people underinvest in.
Prompt Engineering
The art and science of crafting prompts that get the best results from AI. Includes techniques like role prompting, chain-of-thought, few-shot examples, and structured output.
Prompt Injection
A security attack where malicious instructions are hidden in data that AI processes. Like SQL injection, but for AI prompts. Important to understand if you're building AI-powered tools.
RAG (Retrieval-Augmented Generation)
A technique that gives AI access to external data (documents, databases) at query time. Instead of relying only on training data, the model retrieves relevant information first, then generates an answer grounded in facts.
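A minimal sketch of the retrieve-then-generate loop, with keyword overlap standing in for real embedding search (the documents and query are made up):

```python
import re

def retrieve(query, documents, top_k=1):
    # Toy retrieval: rank documents by how many words they share with the query.
    # Real RAG systems rank by embedding similarity instead.
    query_words = set(re.findall(r"\w+", query.lower()))
    def score(doc):
        return len(query_words & set(re.findall(r"\w+", doc.lower())))
    return sorted(documents, key=score, reverse=True)[:top_k]

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Office hours are 9 to 5 on weekdays.",
    "Shipping takes 3 to 5 business days within the US.",
]

query = "What is the refund policy?"
context = retrieve(query, documents)[0]

# The retrieved text is injected into the prompt, so the answer is grounded in it.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```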
Rate Limit
A cap on how many API requests you can make in a given time period. Prevents abuse and ensures fair access. If you hit one, you need to slow down or upgrade your plan.
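The standard way to handle one in code is exponential backoff: wait, retry, and double the wait each time. A sketch, with a hypothetical `RateLimitError` standing in for whatever 429 error your API client raises:

```python
import time

class RateLimitError(Exception):
    """Stand-in for the rate-limit (HTTP 429) error an API client would raise."""

def call_with_backoff(make_request, max_retries=5, base_delay=1.0):
    # Retry with exponential backoff: wait 1s, 2s, 4s, ... between attempts.
    delay = base_delay
    for attempt in range(max_retries):
        try:
            return make_request()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            time.sleep(delay)
            delay *= 2
```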
RLHF (Reinforcement Learning from Human Feedback)
A training technique where humans rate AI outputs, and the model learns to produce responses humans prefer. It's why modern AI feels more helpful and less robotic than earlier models.
SDK (Software Development Kit)
A collection of tools, libraries, and documentation that makes it easier to build with an API. The Anthropic SDK lets you use Claude in your Python or JavaScript code with a few lines.
Subagent
A secondary AI agent spawned by a primary agent to handle a specific subtask. Like delegating work to a team member. Multiple subagents can run in parallel on different parts of a project.
System Prompt
Hidden instructions given to an AI that set its behavior, personality, and rules before the user starts chatting. It's the difference between "generic AI" and "your custom agent."
Temperature
A setting that controls how random or creative AI output is. Low temperature (0.0-0.3) = focused and nearly deterministic. High temperature (0.7-1.0) = more creative and varied. Use low for code, high for brainstorming.
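Under the hood, temperature rescales the model's raw scores (logits) before they become probabilities. A sketch with made-up logits for three candidate tokens:

```python
import math

def apply_temperature(logits, temperature):
    # Softmax with temperature: lower temperature sharpens the distribution
    # (top token dominates), higher temperature flattens it (more variety).
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]               # hypothetical scores for three tokens
low = apply_temperature(logits, 0.2)   # top token takes nearly all probability
high = apply_temperature(logits, 1.5)  # probability spreads across tokens
```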
Token
The basic unit AI uses to process text. A token is roughly 3/4 of a word; "hamburger" might be two or three tokens, depending on the tokenizer. Tokens determine cost (you pay per token) and fit within the context window.
Tool Use
The ability of AI to call external functions — search the web, read files, run code, access APIs. This is what separates agents from chatbots. Without tools, AI can only talk. With tools, it can act.
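The loop is simpler than it sounds: the model names a tool and its arguments, your code runs it and sends the result back. A sketch with a made-up `get_weather` tool (real tool-call formats vary by API):

```python
def get_weather(city):
    # Stand-in for a real weather API call.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def handle_tool_call(tool_call):
    # In a real agent, `tool_call` is parsed from the model's response.
    func = TOOLS[tool_call["name"]]
    result = func(**tool_call["arguments"])
    # The result goes back to the model so it can continue the conversation.
    return {"role": "tool", "name": tool_call["name"], "content": result}

reply = handle_tool_call({"name": "get_weather", "arguments": {"city": "Lisbon"}})
```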
Transformer
The neural network architecture behind all modern LLMs. Introduced in the 2017 paper "Attention Is All You Need." Uses attention mechanisms to process text in parallel rather than word by word.
Vector Database
A database designed to store and search embeddings (number representations of text/images). Used in RAG systems to find relevant documents quickly. Examples: Pinecone, Weaviate, Chroma.
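At its core, a vector database does nearest-neighbor search. A brute-force sketch over toy 2-number embeddings (real databases use approximate indexes such as HNSW to stay fast at scale; the document IDs and vectors here are invented):

```python
import math

def top_k(query_vec, store, k=2):
    # Rank every stored vector by cosine similarity to the query vector.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(x * x for x in b)))
    ranked = sorted(store.items(), key=lambda item: cos(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy 2-D embeddings keyed by document ID.
store = {
    "refund-policy": [0.9, 0.1],
    "shipping-info": [0.7, 0.3],
    "office-hours": [0.1, 0.9],
}
results = top_k([0.95, 0.05], store, k=2)
```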
Weights
Synonym for parameters. The numerical values inside a neural network that determine how it processes input. Training adjusts the weights; inference uses them.
Zero-Shot
Asking AI to do something without giving it any examples. "Translate this to French" is zero-shot. Contrast with few-shot (providing examples first). Modern LLMs are surprisingly good at zero-shot tasks.
Few-Shot
A prompting technique where you provide 2-5 examples of the desired input/output format before asking AI to process new input. Dramatically improves consistency and accuracy for structured tasks.
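A sketch of a few-shot prompt builder for a sentiment-labeling task (the examples and the "Input:/Output:" format are illustrative; any consistent format works):

```python
def build_few_shot_prompt(examples, new_input):
    # Prepend labeled input/output pairs so the model copies the format.
    blocks = [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    blocks.append(f"Input: {new_input}\nOutput:")
    return "\n\n".join(blocks)

examples = [
    ("I love this product!", "positive"),
    ("Terrible customer service.", "negative"),
]
prompt = build_few_shot_prompt(examples, "The delivery was fast and painless.")
```

The trailing bare `Output:` invites the model to complete the pattern with just a label.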
Context Engineering
The practice of carefully curating what information you feed to AI — not just how you ask, but what data you include. More strategic than prompt engineering. The skill that separates casual users from power users.
Agentic Workflow
A process where AI operates with autonomy: planning steps, using tools, evaluating results, and iterating without human intervention at each step. The opposite of chat-mode, where every exchange requires human input.
Skill
A multi-step workflow an agent knows how to execute by chaining tools together. "Deploy a website" is a skill (read files → build → deploy → verify). "Read file" is a tool. Tools are individual actions; skills are choreography. A good agent has: personality, goals, tools, and skills.
pRAG (Personal RAG)
A RAG system built on your own knowledge base — blog posts, talks, documents, investor memos — so an AI can answer questions grounded in your actual expertise. Instead of generic AI answers, you get responses backed by your work. The new resume.
Chunking
Splitting documents into smaller pieces for storage in a vector database. Chunk size is a critical design choice: too big and you retrieve irrelevant context, too small and you lose meaning. Typical sizes: 200-1,000 tokens per chunk.
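A sketch of fixed-size chunking with overlap, counting words instead of tokens for simplicity (real systems count tokens and often split on sentence or paragraph boundaries):

```python
def chunk_words(words, chunk_size=200, overlap=50):
    # Overlapping chunks: each chunk repeats the last `overlap` words of the
    # previous one, so context isn't lost at chunk boundaries.
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break
    return chunks

words = [f"w{i}" for i in range(500)]
chunks = chunk_words(words, chunk_size=200, overlap=50)
```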
Grounding
Anchoring AI responses in real data (retrieved documents, search results, database records) rather than relying on training data alone. RAG is the most common grounding technique. Grounded answers cite sources; ungrounded answers hallucinate.
Human-in-the-Loop
A pattern where sensitive AI actions (send email, delete files, deploy code) require human approval before execution. The agent proposes, the human approves. Critical for building trust and preventing costly mistakes.
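A sketch of the approval gate, with made-up action names; anything on the sensitive list stops and asks, everything else runs straight through:

```python
SENSITIVE_ACTIONS = {"send_email", "delete_file", "deploy"}

def execute(action, run, ask_human=input):
    # Gate sensitive actions behind an explicit human yes/no.
    # `run` is the callable that actually performs the action.
    if action in SENSITIVE_ACTIONS:
        answer = ask_human(f"Agent wants to run '{action}'. Approve? [y/N] ")
        if answer.strip().lower() != "y":
            return "skipped: human declined"
    return run()
```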
MCP Server
A lightweight program that exposes tools to AI via the Model Context Protocol. One MCP server for Gmail means every AI app can send/read email. Build the server once; any MCP-compatible agent can use it.
Orchestrator
An AI agent that coordinates other agents, breaking complex tasks into subtasks and assigning them. Like a project manager for AI teams. In multi-agent systems, the orchestrator decides who does what.
Tokenization
The process of splitting text into tokens — the fundamental units an LLM processes. Different models use different tokenizers, so the same text can produce different token counts. "Strawberry" might be 1 token or 3, depending on the tokenizer.
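Two toy tokenizers make the point. Real tokenizers (BPE, WordPiece) learn their vocabulary from data rather than cutting at fixed lengths, but the counts differ in the same way:

```python
def word_tokenizer(text):
    # Splits on whitespace: one token per word.
    return text.split()

def subword_tokenizer(text, piece_len=4):
    # Naive fixed-length subword split, standing in for a learned tokenizer.
    tokens = []
    for word in text.split():
        tokens.extend(word[i:i + piece_len] for i in range(0, len(word), piece_len))
    return tokens

word_tokenizer("Strawberry")     # one token
subword_tokenizer("Strawberry")  # three tokens from the same text
```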