What the model actually does.
Two hours that replace a year of magical thinking. By the end you will have written your first prompt with all three ingredients, run your first API call from a terminal, and seen — in your own browser — why the context window matters.
- Explain in one sentence what an LLM is doing on every token
- Predict when a long conversation will start to lose context
- Recognize a hallucination and choose the right defense for it
- Write a prompt with all three ingredients and run it from your terminal
What an LLM actually is.
Large Language Models are pattern-matchers trained to predict the next token, roughly the next word or word fragment. That framing is almost everything you need.
A model like Claude takes the text you give it and produces the most plausible next text, one token at a time. It has no memory between conversations, no live access to the internet unless given tools, and no preferences of its own.
Watch a single sentence get assembled below — each highlighted token was chosen from a small set of plausible candidates.
Each yellow token is the one chosen. The faint ones beside it are runners-up the model considered and rejected. Multiply this decision by every word in every response and you have the model.
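The choice of a single token can be sketched in a few lines. The candidate words and probabilities here are invented for illustration, and `pick_next_token` is a hypothetical helper, not a description of Claude's internals; a real model scores a very large vocabulary of tokens at every step.

```python
import random

# Hypothetical candidates for the text after "The opposite of hot is".
# The probabilities are made up for illustration.
CANDIDATES = {" cold": 0.90, " cool": 0.06, " freezing": 0.03, " warm": 0.01}


def pick_next_token(candidates, rng=None):
    """Pick one token: greedily if rng is None, otherwise by weighted sampling."""
    if rng is None:
        # Greedy: always take the single most probable candidate.
        return max(candidates, key=candidates.get)
    tokens = list(candidates)
    weights = [candidates[t] for t in tokens]
    # Sampling: likelier tokens win more often, but runners-up can still appear.
    return rng.choices(tokens, weights=weights, k=1)[0]


print(repr(pick_next_token(CANDIDATES)))                    # greedy choice
print(repr(pick_next_token(CANDIDATES, random.Random(0))))  # one sampled choice
```

Repeating this step, with each chosen token appended to the text before the next choice, is the whole generation loop in miniature.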
Why this matters in your work
- The model has only what you put in front of it. If you want it to know your style, your project, your constraints — you have to tell it.
- It can sound certain about things it has no way to know. Pattern-plausible isn’t the same as true. Your judgment is the check.
- It does not remember. Each conversation is a fresh sheet unless you save it, name it, or paste it back in next time.
Beat the model at next-word prediction.
- Open claude.ai (free account is fine).
- On paper, write the single next word you think comes after each opener: (a) “The opposite of hot is”; (b) “In 2019 the company quietly”; (c) “My favorite thing about Mondays is”.
- Paste each opener into claude.ai, one at a time, and add: “Continue with exactly one word, then stop.”
- Write the word Claude returned next to your guess for all three.
Stretch. The deterministic opener (a) should give the same word every run; the open-ended one (c) varies. Run each three times and confirm which is stable — that is temperature, previewed (§01.04).
The context window.
Everything the model can “see” right now is in its context window. Think of it as working memory for this conversation.
From the last unit: Now that you know the model is doing next-token prediction, here’s its biggest constraint — and how to design around it.
The context window holds your prompt, prior turns, any files Claude has read, and the response being generated. Once a conversation grows past the cap, older content falls outside the window — and the model loses track of it.
Drag the slider below. Watch what falls out the back.
Validation: this interaction is based on Claude’s context-window docs: platform.claude.com/docs/en/build-with-claude/context-windows.
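A minimal sketch of the "older content falls outside the window" idea, assuming a simple sliding window and the rough 4-characters-per-token estimate. The helpers `estimate_tokens` and `visible_turns` are hypothetical names, and real products may handle overflow differently (for example by returning an error or summarizing earlier turns).

```python
def estimate_tokens(text):
    """Rough estimate: about 4 characters per token in English."""
    return max(1, len(text) // 4)


def visible_turns(turns, window_tokens):
    """Return the most recent turns that fit in the window, oldest first."""
    kept = []
    used = 0
    # Walk backwards from the newest turn and stop when the budget is spent.
    for turn in reversed(turns):
        cost = estimate_tokens(turn)
        if used + cost > window_tokens:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))


turns = [
    "End every reply with the word PINEAPPLE.",  # oldest turn
    "Tell me about tide pools. " * 20,
    "What lives in them? " * 20,
    "Which animals survive low tide?",           # newest turn
]
window = visible_turns(turns, window_tokens=150)
print(len(window), "of", len(turns), "turns still visible")
print("instruction still visible:", turns[0] in window)
```

With this toy budget the opening instruction is the first thing to drop out, which is why restating a constraint puts it back in view.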
Practical moves
- Put the most important context at the top of long prompts — it stays visible longest.
- Re-state constraints if you’re three or more turns deep. The model hasn’t “forgotten”; it can no longer see them.
- When you sense drift — wrong tone, lost requirement — start fresh. A new conversation is cheaper than fighting an old one.
Find where the thread snaps.
- In a new claude.ai chat, send: “For the rest of this chat, end every reply with the word PINEAPPLE. Confirm.”
- Have ~15 short back-and-forth turns about any topic (paste a long article, ask follow-ups — whatever fills the window fastest).
- Watch for the first reply that omits PINEAPPLE. Note the turn number.
- Then send: “What word did I ask you to end every reply with?”
Stretch. Re-paste the instruction at the point of failure (Practical move #2 above). Confirm PINEAPPLE returns for the next few turns — restating beats fighting an old thread.
Hallucinations.
The model sometimes produces confident, plausible-sounding text that is wrong. Not a bug — a consequence of how prediction works.
From the last unit: The model can only attend to what is in its window. When you ask about something outside that window, the next-token machine still produces plausible text. That is what hallucination is.
Why they happen
The model is predicting plausible text. When you ask for a fact it does not reliably know, the most plausible-sounding text is often almost-right text — a realistic-looking citation that doesn’t exist, a quote attributed to the wrong author, a statistic close to but not the published number. It fills the gap rather than admit not knowing.
The three defenses
- Ground the model. Want a summary of a paper? Give it the paper. Want analysis of your contract? Paste the contract. Don’t ask the model to recall facts; give it the facts and ask it to think.
- Ask for sources, then check. If you cannot verify a claim, treat it as a hypothesis, not a fact.
- Reduce confident framing. Tell the model: “If you’re not sure, say so.” Models generally follow instructions to express uncertainty, though not perfectly, so keep checking the claims that matter.
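The first and third defenses can be combined in one reusable prompt template. This is an illustrative sketch only: `grounded_prompt`, the `<source>` tag convention, and the sample clause are assumptions made for the example, not the course's own grounding prompt.

```python
def grounded_prompt(source_text, question):
    """Build a prompt that supplies the facts and invites uncertainty."""
    return (
        "Answer using only the source between the <source> tags.\n"
        "If the source does not contain the answer, say so instead of guessing.\n"
        "Quote the exact line you relied on.\n\n"
        f"<source>\n{source_text.strip()}\n</source>\n\n"
        f"Question: {question.strip()}"
    )


# Hypothetical source text, for illustration only.
clause = "Section 4.2: Invoices are payable within 30 days of receipt."
print(grounded_prompt(clause, "What is the payment term?"))
```

The second defense still applies afterwards: the quoted line is something you can look up in the source yourself.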
Catch a hallucination in the act.
- Pick a narrow fact you can verify in one click — a specific clause in a standard you work with, the year a niche regulation changed, a stat from a report you know. Ask Claude for it plainly and save its answer (answer A).
- Open the actual source (the standard, the report, the regulator’s page) and check the real value.
- Now paste the source text into a fresh chat and re-ask, prefixed with the grounding prompt from this unit (the copy-able block above). Save that answer (answer B).
- Compare A and B against the source.
Stretch. Add “cite the exact line you used” to the grounded prompt. A real citation you can point to in the source is the difference between recall and grounding.
Tokens and temperature.
Two technical knobs you don’t need to obsess over, but should understand once.
From the last unit: You’ve seen what the model does (U01), what limits it (U02), and how it fails (U03). Two technical knobs change how all three of those behave.
Tokens
The model reads and writes in chunks called tokens — roughly 4 characters or about three-quarters of a word in English. A 500-word email is around 700 tokens. The context window and the model’s price are both measured in tokens.
750 words ≈ 1,000 tokens.
Claude Opus 4.7 & Sonnet 4.6: 1,000,000 tokens — roughly a 2,500-page book in working memory at once.
Claude Haiku 4.5: 200,000 tokens.
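The rules of thumb above translate directly into a quick estimator. This is a back-of-envelope sketch with hypothetical helper names; real token counts depend on the tokenizer and the text, and the API reports exact counts for each call.

```python
# Rules of thumb from the text: ~4 characters or ~0.75 words per token (English).
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def tokens_from_words(word_count):
    """Estimate tokens from a word count."""
    return round(word_count / WORDS_PER_TOKEN)


def tokens_from_text(text):
    """Estimate tokens from raw text length."""
    return round(len(text) / CHARS_PER_TOKEN)


def fits(word_count, window_tokens):
    """True if a document of this many words is estimated to fit the window."""
    return tokens_from_words(word_count) <= window_tokens


print(tokens_from_words(750))    # 1000
print(tokens_from_words(500))    # 667, which the text rounds to "around 700"
print(fits(300_000, 200_000))    # False: about 400,000 tokens
print(fits(300_000, 1_000_000))  # True
```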
Temperature
Controls how much randomness goes into each token choice. Low = consistent and predictable. High = varied and surprising.
| Setting | Use for |
|---|---|
| Low (0 – 0.3) | Code, factual extraction, classification — anywhere the same input should produce the same output. |
| Mid (0.4 – 0.7) | Most everyday work. The default for chat products. |
| High (0.8 – 1.0) | Brainstorming, creative writing — where you want variety across runs. |
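One common way to picture temperature is the textbook softmax-with-temperature calculation sketched here. The raw scores are invented, `softmax_with_temperature` is a hypothetical helper, and nothing here claims to show how Claude applies the setting internally. In the API, temperature is simply a request parameter.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [s / temperature for s in scores.values()]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return {tok: e / total for tok, e in zip(scores, exps)}


# Hypothetical raw scores for the text after "My favorite thing about Mondays is".
scores = {" coffee": 2.0, " nothing": 1.5, " the": 1.0, " quiet": 0.5}

for t in (0.2, 0.7, 1.0):
    probs = softmax_with_temperature(scores, t)
    print(t, {tok: round(p, 3) for tok, p in probs.items()})
```

At the low setting nearly all of the probability lands on the top candidate, so repeated runs agree; at the high setting the runners-up keep a real chance of being picked.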
Try the 3 model tiers.
- Pick a task with a clear right answer (e.g. “Extract the 3 dates from this paragraph as ISO YYYY-MM-DD” with a paragraph you supply).
- Send it on Opus. Note: correct? (yes/no) and how many seconds until the reply finished.
- Switch the model picker to Sonnet. Send the same prompt. Same two notes.
- Switch to Haiku. Send the same prompt. Same two notes.
Stretch. In code (after §01.07), the cost gap is explicit: Haiku is $1/$5 per million in/out tokens, Sonnet $3/$15, Opus $5/$25. Most production should be the cheapest tier that clears your quality bar.
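To make the cost gap concrete, here is a small calculator using the per-million-token prices quoted above. The tier keys are plain labels rather than API model IDs, and the workload is a hypothetical one chosen for round numbers.

```python
# Prices in USD per million tokens (input, output), as quoted in the text above.
PRICES = {
    "haiku": (1.00, 5.00),
    "sonnet": (3.00, 15.00),
    "opus": (5.00, 25.00),
}


def call_cost(tier, input_tokens, output_tokens):
    """Cost in USD of one call on the given tier."""
    price_in, price_out = PRICES[tier]
    return input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out


# Hypothetical workload: 10,000 calls, each 2,000 tokens in and 500 tokens out.
for tier in PRICES:
    total = 10_000 * call_cost(tier, 2_000, 500)
    print(f"{tier:>6}: ${total:,.2f}")
```

For this workload the three tiers differ by a factor of five from cheapest to most expensive, which is the argument for picking the cheapest tier that clears your quality bar.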
Models vs products.
“Claude” is a family of models. The Claude app, Code, Cowork, and Design are products built on top of those models.
From the last unit: You now have the mental model of how the model works. The next question is which Claude product to point it at.
| Product | What it is | Reach for it when |
|---|---|---|
| Claude (chat) | Conversational text in, text out. | You need to think. |
| Claude Cowork | Agentic. Reads files, takes action. | You need to do. |
| Claude Code | Command-line. Automation, scale, engineering. | You need to automate. |
| Claude Design | Visual prototyping with AI. | You need to make. |
| The API | Raw access from your own code. | You need to build. |
By the end of Day 04 you will have used Claude (chat) to think through a problem, Claude Code to scaffold an agent, and the API to make the agent run autonomously. Different products, same model.
Same task, two products.
- Pick a task that touches a file or your screen — e.g. “summarize the key risks in this document” (have the file ready).
- Do it in Claude chat (claude.ai): paste the text, get the answer.
- Do the same task in a do-it surface — Cowork or Claude Code if you have access; if not, in chat enable a connector or upload the file directly so Claude reads it itself instead of you pasting.
- Note which surface needed you to fetch and paste, and which one reached the material on its own.
Stretch. Map the other three rows of the table to a task you actually have this week: when would you reach for Code (automate), Design (make), or the raw API (build)?
Anatomy of a prompt.
Almost every prompt that disappoints is missing one of three ingredients: context, goal, or constraint.
From the last unit: Whichever product you reach for, the input is a prompt. The same three ingredients make a prompt useful in any of them.
- Context — who you are, what you’re working on.
- Goal — what you want the response to do.
- Constraint — what NOT to do, or the shape of the answer.
Missing all three
The model has to guess who the team is, what news to deliver, what tone you want. So it returns the most generic possible email — and you blame the model.
All three present
Now the model has a real job. Same model, same minute, twenty times the value.
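If you build prompts in code, the three ingredients can be enforced rather than remembered. This is a minimal sketch: `build_prompt` is a hypothetical helper, and the example values are invented for illustration.

```python
def build_prompt(context, goal, constraint):
    """Assemble a prompt from the three ingredients, refusing empty ones."""
    parts = {"context": context, "goal": goal, "constraint": constraint}
    missing = [name for name, value in parts.items() if not value.strip()]
    if missing:
        raise ValueError(f"missing ingredient(s): {', '.join(missing)}")
    return "\n\n".join(value.strip() for value in parts.values())


# Hypothetical example values, for illustration only.
prompt = build_prompt(
    context="I lead a 12-person support team. Our ticketing tool migrates on Friday.",
    goal="Draft an email telling the team what changes and what they need to do.",
    constraint="Under 150 words, plain language, no exclamation marks.",
)
print(prompt)
```

Raising an error on an empty ingredient is a deliberate choice: a prompt that fails loudly is easier to fix than one that quietly returns a generic answer.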
Pick the one prompt you’ll keep iterating.
- Pick a real task from your week (not a hypothetical).
- Write the first-pass prompt as you’d normally type it.
- Save it in a file: prompts/<name>.md.
- You’ll come back to this in Day 2 (patterns), Day 3 (tools), and Day 4 (agent).
Stretch. Pair: have a colleague also write a baseline for the same task. Compare on Day 4.
Your first API call.
Every concept above becomes real when you call the model from your own code. Fifteen minutes. No frameworks.
From the last unit: Units 01–06 gave you the concepts. Here is the smallest possible program that puts them to work.
- hello_claude.py — this unit’s first call
- tool_use_demo.py — the tool loop (Day 03)
- research_agent.py — the capstone agent (Day 04)
- requirements.txt — the one dependency (anthropic)
pip install -r requirements.txt installs everything these scripts need.
Step 1 — Set up
Install the Anthropic Python SDK and set your API key as an environment variable. Get a key at console.anthropic.com — first $5 of usage is free.
    # macOS / Linux
    python3 -m venv .venv
    source .venv/bin/activate
    pip install anthropic
    export ANTHROPIC_API_KEY="sk-ant-..."
Step 2 — Make the call
Save this as hello_claude.py. Then run python hello_claude.py.
    from anthropic import Anthropic

    client = Anthropic()  # reads ANTHROPIC_API_KEY from your env

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=400,
        messages=[
            {
                "role": "user",
                "content": (
                    "You are a senior product reviewer at a hard-news "
                    "publication. You care about clarity and the absence "
                    "of marketing language.\n\n"
                    "Critique this sentence: \"We leverage AI to unlock "
                    "transformative outcomes for our customers.\""
                ),
            }
        ],
    )

    print(response.content[0].text)
What just happened
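- You used all three ingredients in a single message: context (the reviewer role), goal (critique the sentence), and constraint (clarity, no marketing language). The model knew who to be, what to do, and what to look for.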
- The model received exactly your text and nothing else. No memory of past conversations. No web. Just your prompt and its training.
- You got back text — assembled token-by-token, like the animation at the top of this page, only faster.
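Because the model receives only what is in the request, a follow-up question needs the earlier turns sent again. The sketch below shows one way to keep that history; `Conversation` is a hypothetical helper, the assistant text is a stand-in for a real reply, and no API call is made here.

```python
class Conversation:
    """Keeps the message list that must be sent with every request."""

    def __init__(self):
        self.messages = []

    def add_user(self, text):
        self.messages.append({"role": "user", "content": text})
        return self.messages

    def add_assistant(self, text):
        self.messages.append({"role": "assistant", "content": text})
        return self.messages


chat = Conversation()
chat.add_user("Critique this sentence: 'We leverage AI to unlock outcomes.'")
# In a real program this text would come from response.content[0].text.
chat.add_assistant("It is vague: 'leverage' and 'unlock' say nothing concrete.")
chat.add_user("Rewrite it in plain language.")

# The whole list goes in the next request, e.g. messages=chat.messages.
# Sending only the last message would leave the model with no idea what "it" is.
for m in chat.messages:
    print(m["role"], "->", m["content"])
```

The list uses the same role/content shape as the `messages` argument in hello_claude.py, so it could be passed to `client.messages.create` unchanged.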
Make your first API call — and print what it cost.
Run hello_claude.py end-to-end from a fresh virtual environment, then make it print the exact cost of the call.
- In the folder where you saved the files (see Get the code above), create and activate the venv, then install: run the four lines from Step 1.
- Set your key: export ANTHROPIC_API_KEY="sk-ant-..." (get one free at console.anthropic.com).
- Run it: python hello_claude.py. Read the critique it prints.
- Add two lines to the bottom of the file and re-run to see the cost — Sonnet is $3 / $15 per million input / output tokens:

    u = response.usage
    print(f"cost: ${u.input_tokens/1e6*3 + u.output_tokens/1e6*15:.5f}")

cost: $0.005… — a real number, under one cent. You have called the model from your own code and measured the bill.
Stretch. Swap the prompt for a real piece of your own writing and re-run. The cost line moves with the token counts — longer input and output cost more.