Today you walk out with an agent.
The full loop — Plan, Act, Observe, Reflect. Multi-step workflows. Reversibility. Safety. By the time you close your laptop, you will have a research agent running on your machine that turns a single sentence into a 500-word briefing on any topic you give it.
- Trace the four agent stages (Plan → Act → Observe → Reflect) in a real run
- Read agents/research_agent.py and explain what each line of the loop does
- Run the capstone research agent on a question from your week
- Sketch the four stages for one workflow you do every week, on paper
The agent loop.
Every agent — research bot, coding assistant, sales SDR, browser automator — runs the same four-stage cycle. Once you see it, every agent in the world is the same diagram.
- Plan: break down the goal
- Act: call a tool
- Observe: read the result
- Reflect: am I done?
What happens at each stage
Plan. The model breaks a goal into smaller steps it can actually execute. “Write a briefing on the EU AI Act” becomes: find recent articles, extract claims, identify points of disagreement, draft, critique, polish.
Act. The model picks the next step and calls a tool — search, read a file, run a command, query an API. Exactly the loop from Day 03.
Observe. The model reads what came back. The result is often not what was expected: empty arrays, errors, garbage data, contradictions.
Reflect. Is the goal met? Do I have what I need? If not, revise the plan. This is the stage most home-grown agents skip — and it is the stage that separates a stuck agent from a useful one.
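The four stages above can be sketched as a toy loop. This is a minimal illustration, not the code in research_agent.py: the plan is hard-coded rather than written by a model, and `fake_search`, `CANNED`, and `run_loop` are hypothetical names invented for the example.

```python
from typing import Callable

# Hypothetical canned results standing in for a real search tool.
CANNED = {"EU AI Act": ["Summary article on the EU AI Act"]}

def fake_search(query: str) -> list[str]:
    """Return canned results for an exact query, or an empty list."""
    return CANNED.get(query, [])

TOOLS: dict[str, Callable[[str], list[str]]] = {"search": fake_search}

def run_loop(goal: str, max_turns: int = 5) -> list[str]:
    """Run a toy Plan -> Act -> Observe -> Reflect loop and return its log."""
    log: list[str] = []
    # Plan: break the goal into steps (here, two queries to try in order).
    plan = [f"{goal} site:example.invalid", goal]
    log.append(f"[PLAN] {len(plan)} queries queued")
    notes: list[str] = []
    for _ in range(max_turns):
        if not plan:
            break
        query = plan.pop(0)
        # Act: call a tool.
        log.append(f"[ACT] search({query!r})")
        results = TOOLS["search"](query)
        # Observe: read what came back, even if it is empty.
        log.append(f"[OBSERVE] {len(results)} result(s)")
        notes.extend(results)
        # Reflect: stop if the goal is met, otherwise move to the next step.
        if notes:
            log.append("[REFLECT] goal met, stopping")
            break
        log.append("[REFLECT] nothing usable, trying the next step")
    return log

for line in run_loop("EU AI Act"):
    print(line)
```

The first query comes back empty, so Reflect sends the loop around again instead of stopping with nothing. That second pass is the part a loop without Reflect would never take.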
Name the four stages.
- Watch the ring above complete one full turn: Plan → Act → Observe → Reflect.
- Scroll to the trace in §04.02. For each row, write down which of the four stages it is.
- The agent you run in §04.03 prints these as the labels [PLAN], [ACT], [OBSERVE], [REFLECT] at the start of each turn: the same four, in the same order.
[PLAN], [ACT], [OBSERVE], [REFLECT]: the same strings the agent prints in §04.03. No fifth stage.
Stretch. Which stage do home-grown agents skip most? (Reflect.) Find the one row in the §04.02 trace where Reflect caught a mistake.
A real trace.
What the loop looks like in practice — a single research task, end to end.
From the last unit: You know the four stages. Watch them happen on a real research task — a handful of tool calls, one polished briefing, the loop printing its own thinking as it goes.
Here is the actual terminal output from one run — trimmed for length, but every line is a line the script really prints. The [goal] banner, the → tool calls, and the closing [done] line come from the loop itself; the [PLAN]/[ACT]/[OBSERVE]/[REFLECT] labels are the model narrating each turn, exactly as the system prompt instructs.
$ python research_agent.py "How is Phoenix planning for the 2027 Colorado River cuts?"
[goal] How is Phoenix planning for the 2027 Colorado River cuts?
[PLAN] I will search for recent reporting, read the two or three most authoritative sources, then draft and critique a briefing.
→ tool web_search({"query": "Phoenix 2027 Colorado River cuts plan"})
[OBSERVE] Five results. Arizona Republic and the state water-resources department look authoritative; two blog posts look low-signal.
[ACT] Reading the two strongest sources before I draft.
→ tool fetch_url({"url": "https://www.azcentral.com/..."})
→ tool fetch_url({"url": "https://new.azwater.gov/..."})
[REFLECT] Two sources agree on the CAP allocation figure; one is from 2024 and may be stale, so I will flag it rather than lean on it.
[ACT] Drafting three paragraphs, then critiquing my own draft.
[REFLECT] My skeptic caught one unsourced number. Removing it. Saving.
→ tool save_briefing({"title": "Phoenix and the 2027 Colorado River Cuts", "body": "..."})
Saved a 3-paragraph briefing on Phoenix water policy to briefings/2026-06-10-1432-phoenix-and-the-2027-colorado-river-cuts.md.
[done] agent halted naturally.
Four tool calls — one search, two reads, one save (the draft and critique happen inside a single model turn, no tool needed). No human in the loop until the end. This is the artifact you build in §04.03.
Read the trace.
- Find every → tool line in the transcript: those are the only moments the agent reached outside itself.
- Write the tool names in the order they fire.
- Mark which lines the loop printed ([goal], → tool, [done]) versus which the model wrote (the [PLAN]/[ACT]/[OBSERVE]/[REFLECT] lines).
web_search, fetch_url, fetch_url, save_briefing. The draft/critique step has no tool line, because it happens entirely inside one model turn.
Stretch. The system prompt says “read no more than 4 URLs.” This run read two. Why might the agent stop reading early even when it’s allowed more? (It judged it had enough at Reflect.)
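If you would rather check your answer with code than by eye, the exercise above can be sketched as a small parser. The helper names (`tool_names`, `who_printed`) are hypothetical, and the sample is a shortened copy of the transcript from this section, not fresh output.

```python
import re

# A shortened copy of the §04.02 transcript (model lines trimmed to a few).
TRACE = """\
[goal] How is Phoenix planning for the 2027 Colorado River cuts?
→ tool web_search({"query": "Phoenix 2027 Colorado River cuts plan"})
[ACT] Reading the two strongest sources before I draft.
→ tool fetch_url({"url": "https://www.azcentral.com/..."})
→ tool fetch_url({"url": "https://new.azwater.gov/..."})
[REFLECT] My skeptic caught one unsourced number. Removing it. Saving.
→ tool save_briefing({"title": "Phoenix and the 2027 Colorado River Cuts", "body": "..."})
[done] agent halted naturally.
"""

LOOP_PREFIXES = ("[goal]", "→ tool", "[done]")            # printed by the loop
STAGE_LABELS = ("[PLAN]", "[ACT]", "[OBSERVE]", "[REFLECT]")  # written by the model
TOOL_LINE = re.compile(r"^→ tool (\w+)\(")

def tool_names(trace: str) -> list[str]:
    """Return the tool names in the order their '→ tool' lines appear."""
    names = []
    for line in trace.splitlines():
        match = TOOL_LINE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names

def who_printed(line: str) -> str:
    """Classify one transcript line as 'loop', 'model', or 'other'."""
    line = line.strip()
    if line.startswith(LOOP_PREFIXES):
        return "loop"
    if line.startswith(STAGE_LABELS):
        return "model"
    return "other"

print(tool_names(TRACE))
```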
Build your research agent.
A working multi-step research agent in under 250 lines. Reads, searches, drafts, critiques, polishes. Saves the output. Lives on your machine.
From the last unit: You have seen the loop and the trace. The capstone is your own — ~250 lines of Python you read, modify, and run on a question of your choosing.
What it does
- Takes a research question as a CLI argument.
- Plans its sub-steps, then loops Plan → Act → Observe → Reflect.
- Calls web_search and fetch_url as needed (DuckDuckGo, no search key required).
- Drafts a 3-paragraph briefing.
- Critiques its own draft (Day 02’s Critique pattern, baked into the system prompt).
- Saves the polished briefing to briefings/ with a timestamped filename.
- Prints every [PLAN] / [ACT] / [OBSERVE] / [REFLECT] step to the terminal so you can watch it think.
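The timestamped filename is worth a closer look, because it is what keeps one run from overwriting another. Here is one way such a name could be built. It is a sketch that reproduces the filename shown in the §04.02 trace; `slugify` and `briefing_path` are hypothetical helpers, and the real script may do this differently.

```python
import re
from datetime import datetime
from pathlib import Path

BRIEFINGS_DIR = Path("briefings")

def slugify(title: str) -> str:
    """Lowercase the title and collapse anything non-alphanumeric into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def briefing_path(title: str, now: datetime) -> Path:
    """Build briefings/<YYYY-MM-DD-HHMM>-<slug>.md for the given moment."""
    stamp = now.strftime("%Y-%m-%d-%H%M")
    return BRIEFINGS_DIR / f"{stamp}-{slugify(title)}.md"

path = briefing_path(
    "Phoenix and the 2027 Colorado River Cuts",
    datetime(2026, 6, 10, 14, 32),
)
print(path.as_posix())
```

Because the minute is part of the name, two runs a few minutes apart land in two different files. Two runs inside the same minute would still collide, which is one reason to treat this as a sketch.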
Get the code
You don’t need to clone anything. Save these two files into a new folder — right-click each link and choose Save Link As…, or use the curl commands below.
- research_agent.py — the agent (the file you just watched run).
- requirements.txt — its one dependency, the anthropic SDK.
Prefer the terminal? The same two files live at the URLs below — curl them into a fresh folder:
# make a folder and pull both files into it
mkdir research-agent && cd research-agent
curl -O https://d154gd40skpa9c.cloudfront.net/agents/research_agent.py
curl -O https://d154gd40skpa9c.cloudfront.net/agents/requirements.txt
Set up and run it
From inside that folder, make a virtual environment, install the one dependency, set your key, and run. On modern macOS the venv is not optional — a bare pip install is blocked (see troubleshooting below).
# 1 · create and activate an isolated environment
python3 -m venv .venv
source .venv/bin/activate
# 2 · install the anthropic SDK
pip install -r requirements.txt
# 3 · set your key (from console.anthropic.com)
export ANTHROPIC_API_KEY="sk-ant-..."
# 4 · run it on a real question from your week
python research_agent.py "a question from your week"
A full run typically costs a few cents and finishes in well under a minute.
Run the capstone agent.
Run research_agent.py end-to-end on a real question from your week and find the briefing it writes to disk.
- Download both files (research_agent.py, requirements.txt) into a new folder.
- Run the four setup commands above (venv → install → export key → run). If you don’t have a question handy, use the Phoenix one from §04.02.
- Watch the terminal: you should see [goal], then [PLAN]/[ACT]/[OBSERVE]/[REFLECT] labels and → tool lines, ending in [done] agent halted naturally.
- Open the new briefings/ folder the run created next to the script.
A file named like 2026-06-10-1432-your-topic.md exists in briefings/. Open it: it is three paragraphs, and at least two sentences quote a fact followed by a source URL (the system prompt requires it).
Stretch. Run it twice on the same question. The two briefings get different timestamps and may cite different sources, because the web moved or the model picked different pages to read.
(1) ANTHROPIC_API_KEY is not set: you skipped the export, or opened a new terminal tab that didn’t inherit it; re-run the export line in the same shell.
(2) error: externally-managed-environment on pip install: modern macOS blocks installs into system Python; create and activate the .venv first (the first two lines above), then re-run pip install.
(3) The agent reports an empty search or “no results parsed”: DuckDuckGo occasionally returns nothing to a scripted request; just re-run, or rephrase the question with more distinctive words.
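The first two failures can be caught before the agent starts. The check below is a sketch of a preflight you could run yourself; `preflight` is a hypothetical helper, not part of research_agent.py. It relies on the fact that `sys.prefix` differs from `sys.base_prefix` when a venv is active.

```python
import os
import sys
from typing import Mapping

def preflight(
    env: Mapping[str, str] = os.environ,
    prefix: str = sys.prefix,
    base_prefix: str = sys.base_prefix,
) -> list[str]:
    """Return a list of setup problems; an empty list means ready to run."""
    problems = []
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set: re-run the export line in this shell")
    # Inside an activated venv, sys.prefix points at .venv, not the base install.
    if prefix == base_prefix:
        problems.append("no virtual environment is active: run `source .venv/bin/activate`")
    return problems

if __name__ == "__main__":
    for problem in preflight():
        print("fix first:", problem)
```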
Read the source
The file is written so each section maps to one part of the agent. Open research_agent.py in your editor and search for these four banner comments as you read:
- # === TOOL IMPLEMENTATIONS ===: the three tools (web_search, fetch_url, save_briefing) that run on your machine.
- # === TOOL DECLARATIONS ===: the JSON schemas the model actually sees. The descriptions are how it decides when to call each tool.
- # === SYSTEM PROMPT ===: the agent’s discipline: the four labels, the “search first, read second” rules, and Day 02’s Critique pattern.
- # === THE LOOP ===: _run(), the Plan → Act → Observe → Reflect cycle that ties it all together.
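To make "the JSON schemas the model actually sees" concrete, here is roughly what one declaration could look like. The `name` / `description` / `input_schema` shape is the tool format the Anthropic Messages API expects. The description wording and the `missing_args` helper are assumptions for illustration, not copied from research_agent.py.

```python
# A sketch of one tool declaration. The description text is hypothetical;
# the real file's wording will differ.
WEB_SEARCH_DECLARATION = {
    "name": "web_search",
    "description": (
        "Search the web for recent pages on a topic. "
        "Use this first, before reading any URL."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query."},
        },
        "required": ["query"],
    },
}

def missing_args(declaration: dict, args: dict) -> list[str]:
    """Return the required argument names that a tool call left out."""
    required = declaration["input_schema"].get("required", [])
    return [name for name in required if name not in args]

print(missing_args(WEB_SEARCH_DECLARATION, {"query": "Phoenix 2027 Colorado River cuts plan"}))
print(missing_args(WEB_SEARCH_DECLARATION, {}))
```

The model never sees your Python function, only this dictionary. That is why the description carries so much weight: it is the only hint the model gets about when the tool applies.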
# open the agent in your editor (macOS)
open research_agent.py
Make it yours.
- Open research_agent.py and find the # === SYSTEM PROMPT === block (the SYSTEM_PROMPT string).
- Make one change. Either tighten the source rule (change “Quote a specific fact, with its source URL, at least twice” to at least four times) or change the output beat: swap “Your output is three paragraphs, ~500 words” for “Your output is five bullet points, each with a source URL.”
- Save the file. Rerun on the same question you used before.
- Open the newest file in briefings/ and compare it to the previous one.
Stretch. Add a brand-new tool. Copy the save_briefing pattern into a save_to_notion stub (it can just print for now), declare it in TOOLS, and tell the system prompt when to use it.
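One possible starting point for the stretch is below. It is a stub under stated assumptions: `save_to_notion` only prints and never calls any Notion API, its arguments mirror the `title` and `body` that save_briefing receives in the trace, and the declaration text is invented for the example.

```python
def save_to_notion(title: str, body: str) -> str:
    """Hypothetical stub: report what would be saved, without sending anything."""
    word_count = len(body.split())
    print(f"[stub] would save {title!r} to Notion ({word_count} words)")
    # The returned string is what the model reads at Observe, so be honest in it.
    return f"Stub only: '{title}' was NOT sent to Notion."

# Hypothetical declaration; in the real file you would add this to TOOLS.
SAVE_TO_NOTION_DECLARATION = {
    "name": "save_to_notion",
    "description": (
        "Save a finished briefing to Notion. "
        "Use only after the briefing has been critiqued and polished."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "body": {"type": "string"},
        },
        "required": ["title", "body"],
    },
}

result = save_to_notion("Test briefing", "one two three")
print(result)
```

Returning a message that says nothing was sent matters here. If the stub claimed success, the model would report at Reflect that the briefing is in Notion when it is not.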
Reversibility, first.
A research agent reads. The next agent you build will write. The discipline that lets you ship a writing agent without losing sleep is reversibility.
| Default | Why it’s safer |
|---|---|
| Draft, don’t send | You see the message before it hits the inbox. |
| Copy, don’t move | The source file is still there if the move is wrong. |
| Append, don’t overwrite | Yesterday’s version is recoverable. |
| Dry-run, then run | The agent prints what it would do, then waits for go-ahead. |
| Scope to a folder, not ~ | Blast radius is one directory, not your whole machine. |
| Log every tool call | When something goes sideways, you can read the trace. |
Most of these are one-line changes in the tool implementation. They are the difference between an agent that is useful and an agent that is dangerous.
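Two of the rows, "copy, don't move" and "dry-run, then run", can be sketched together in a single tool. `archive_file` is a hypothetical tool invented for this example. It defaults to a dry run, and even the real run leaves the source file in place.

```python
import shutil
from pathlib import Path

def archive_file(src: Path, dest_dir: Path, dry_run: bool = True) -> str:
    """Copy src into dest_dir. With dry_run=True, only describe the action."""
    target = dest_dir / src.name
    if dry_run:
        # Dry-run, then run: say what would happen and touch nothing.
        return f"would copy {src} -> {target}"
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Copy, don't move: the source file is still there if this was a mistake.
    shutil.copy2(src, target)
    return f"copied {src} -> {target}"
```

Making `dry_run=True` the default means the agent has to ask for the real action explicitly. Forgetting the flag costs a wasted turn, not a lost file.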
Audit your agent for reversibility.
- Open research_agent.py at def save_briefing (under # === TOOL IMPLEMENTATIONS ===).
- Confirm two reversibility defaults are already there: it writes a new timestamped filename every run (append, never overwrite), and it only ever writes inside briefings/ (scoped folder, not ~).
- Find the third default in the loop: every tool call is printed as a → tool line (log every tool call). Note the line that does it.
- Now imagine swapping save_briefing for send_email. Write one sentence: which row of the table would you apply, and what is the one-line change?
You found all three defaults (timestamped filename, briefings/ scope, → tool logging), and your send_email sentence names a specific row (“draft, don’t send”) and a concrete change (write a .eml draft instead of calling the send API).
Stretch. Make save_briefing refuse to write anywhere except briefings/: add a one-line check that the resolved path is inside BRIEFINGS_DIR and rerun to confirm normal runs still pass.
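For the stretch, the check could look something like this. It is a sketch, assuming `BRIEFINGS_DIR` is a `pathlib.Path` and Python 3.9 or later (for `Path.is_relative_to`). The helper name `is_inside_briefings` is hypothetical.

```python
from pathlib import Path

BRIEFINGS_DIR = Path("briefings")

def is_inside_briefings(candidate: Path, base: Path = BRIEFINGS_DIR) -> bool:
    """True only if candidate, once resolved, sits inside the base folder."""
    # resolve() collapses any '..' segments, so 'briefings/../x' cannot sneak out.
    return candidate.resolve().is_relative_to(base.resolve())

print(is_inside_briefings(Path("briefings/2026-06-10-1432-your-topic.md")))
print(is_inside_briefings(Path("briefings/../secrets.md")))
```

Inside save_briefing, the one-line version would raise an error when this returns False, before any file is opened.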
Where the craft goes from here.
You have the loop. You have a working agent. The next frontiers are about how agents compose — with each other, with the rest of your stack, with the world.
From the last unit: You can build agents. You can run them. The frontiers from here are about how agents compose — with each other, with your stack, with the world.
The Model Context Protocol (MCP)
A standard for connecting agents to tools and data. Instead of writing a read_file tool inside every agent, you write an MCP server once and any agent — Claude, Cowork, Code — can use it. Same shape as today, but reusable across apps.
Multi-agent systems
One agent plans. Another writes. A third critiques. A fourth posts. They share state through messages or a shared filesystem. The Capable Series ships the foundations; the Council Day extension explores when this matters and when it is over-engineering.
Evaluation
How do you know your agent is getting better? You write a suite of test cases — questions with known good answers — and run your agent against them on every change. The agent equivalent of unit tests. Most teams skip this and regret it.
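A suite like that can start very small. The sketch below scores an agent by checking that each answer mentions the expected terms. The cases, `stub_agent`, and `run_suite` are all hypothetical, and keyword matching is a deliberately crude stand-in for whatever grading you eventually settle on.

```python
from typing import Callable

# Hypothetical test cases: a question plus terms a good answer must mention.
CASES = [
    {"question": "What is the capital of France?", "must_mention": ["Paris"]},
    {"question": "What is 2 + 2?", "must_mention": ["4"]},
]

def stub_agent(question: str) -> str:
    """Stand-in for a real agent: always gives the same answer."""
    return "Paris is the capital of France."

def run_suite(agent: Callable[[str], str], cases: list[dict]) -> float:
    """Return the fraction of cases whose answer mentions every required term."""
    passed = 0
    for case in cases:
        answer = agent(case["question"]).lower()
        if all(term.lower() in answer for term in case["must_mention"]):
            passed += 1
    return passed / len(cases)

print(f"pass rate: {run_suite(stub_agent, CASES):.0%}")
```

Run the same suite before and after every prompt or tool change. A pass rate that drops tells you which change to look at first.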
Production deployment
The script you ran today is the shape. The production version adds persistence (a database), observability (every step logged), authentication, rate limits, retries, and a UI. None of it changes the core loop.
Your commitment, by Monday
The loop only pays off if it leaves the page. Before you close your laptop, turn one workflow from your week into a written plan with a date on it.
Schedule one agent into your week.
- Pick one workflow you repeat every week: the meeting brief, the weekly report, the customer triage.
- Write its four stages — Plan, Act, Observe, Reflect — in one line each, and list the tools the agent would need.
- Name its trigger: the event or time that should kick it off (“every Friday 4pm,” “each new support email”).
- Put its first run on your real calendar this week. Write today’s date and the run date at the top of the note.
Stretch. Re-open research_agent.py and decide which of its three tools your workflow reuses unchanged, and which one tool you’d have to write.
From curious to capable.