Pradhya Day 04 · Council · 5 units · ~30 min reading + ~70 min hands-on

Today you walk out with an agent.

The full loop — Plan, Act, Observe, Reflect. Multi-step workflows. Reversibility. Safety. By the time you close your laptop, you will have a research agent running on your machine that turns a single sentence into a 500-word briefing on any topic you give it.

By the end of Day 04 · Council
  • Trace the four agent stages (Plan → Act → Observe → Reflect) in a real run
  • Read research_agent.py and explain what each line of the loop does
  • Run the capstone research agent on a question from your week
  • Sketch the four stages for one workflow you do every week, on paper
§ 04.01 · Unit 24

The agent loop.

Every agent — research bot, coding assistant, sales SDR, browser automator — runs the same four-stage cycle. Once you see it, every agent in the world is the same diagram.

  • Plan: break down the goal
  • Act: call a tool
  • Observe: read the result
  • Reflect: am I done?

What happens at each stage

Plan. The model breaks a goal into smaller steps it can actually execute. “Write a briefing on the EU AI Act” becomes: find recent articles, extract claims, identify points of disagreement, draft, critique, polish.

Act. The model picks the next step and calls a tool — search, read a file, run a command, query an API. Exactly the loop from Day 03.

Observe. The model reads what came back. The result is rarely what was expected. Empty arrays, errors, garbage data, contradictions.

Reflect. Is the goal met? Do I have what I need? If not, revise the plan. This is the stage most home-grown agents skip — and it is the stage that separates a stuck agent from a useful one.

Why the loop. The model has no memory between turns, no access to the world without tools, and a tendency to declare victory too early. The four stages are the corrective for each of those weaknesses, in order.
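The four stages can be sketched in a few lines of Python. This is a toy under stated assumptions, not the capstone code: fake_search is a hypothetical stand-in tool that pretends long queries match nothing, and the "model" is replaced by two hard-coded rules so the sketch runs without an API key. What it shows is the shape: Reflect decides whether to stop or revise the plan.

```python
def fake_search(query: str) -> list[str]:
    """Hypothetical tool: pretends that queries longer than three words match nothing."""
    if len(query.split()) > 3:
        return []
    return [f"https://example.com/{query.replace(' ', '-')}"]


def run_loop(goal: str, max_turns: int = 5) -> list[str]:
    """Run Plan -> Act -> Observe -> Reflect until a search succeeds or turns run out."""
    log = []
    query = goal  # Plan: start with the goal itself as the search query.
    log.append(f"[PLAN] search for: {query}")
    for _ in range(max_turns):
        # Act: call the tool.
        log.append(f"[ACT] fake_search({query!r})")
        results = fake_search(query)
        # Observe: read what came back, even if it is empty.
        log.append(f"[OBSERVE] {len(results)} result(s)")
        # Reflect: stop if the goal is met, otherwise revise the plan.
        if results:
            log.append("[REFLECT] goal met, stopping")
            return log
        query = " ".join(query.split()[:-1])  # drop the last word and retry
        log.append(f"[REFLECT] nothing usable, retrying with: {query}")
    log.append("[REFLECT] turn budget spent, stopping")
    return log


for line in run_loop("phoenix water cuts 2027 plan"):
    print(line)
```

The max_turns cap is the other half of Reflect: an agent that can never declare itself done still needs a way to stop.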

Name the four stages.

You’ll do
Watch the loop animation, then label each step of the §04.02 trace with the stage it belongs to.
Steps
  1. Watch the ring above complete one full turn: Plan → Act → Observe → Reflect.
  2. Scroll to the trace in §04.02. For each row, write down which of the four stages it is.
  3. The agent you run in §04.03 prints these as the labels [PLAN], [ACT], [OBSERVE], [REFLECT] at the start of each turn — the same four, in the same order.
Verify
Your four labels are exactly [PLAN], [ACT], [OBSERVE], [REFLECT] — the same strings the agent prints in §04.03. No fifth stage.

Stretch. Which stage do home-grown agents skip most? (Reflect.) Find the one row in the §04.02 trace where Reflect caught a mistake.

§ 04.02 · Unit 25

A real trace.

What the loop looks like in practice — a single research task, end to end.

From the last unit: You know the four stages. Watch them happen on a real research task — a handful of tool calls, one polished briefing, the loop printing its own thinking as it goes.

[Figure: the trace, left to right, each dot a model turn: plan → search → observe → read ×2 → reflect → draft → save. One search, two reads, a draft it critiques, one saved briefing.]

Here is the actual terminal output from one run — trimmed for length, but every line is a line the script really prints. The [goal] banner, the → tool calls, and the closing [done] line come from the loop itself; the [PLAN]/[ACT]/[OBSERVE]/[REFLECT] labels are the model narrating each turn, exactly as the system prompt instructs.

$ python research_agent.py "How is Phoenix planning for the 2027 Colorado River cuts?"

[goal] How is Phoenix planning for the 2027 Colorado River cuts?

[PLAN] I will search for recent reporting, read the two or three most
authoritative sources, then draft and critique a briefing.
  → tool web_search({"query": "Phoenix 2027 Colorado River cuts plan"})
[OBSERVE] Five results. Arizona Republic and the state water-resources
department look authoritative; two blog posts look low-signal.
[ACT] Reading the two strongest sources before I draft.
  → tool fetch_url({"url": "https://www.azcentral.com/..."})
  → tool fetch_url({"url": "https://new.azwater.gov/..."})
[REFLECT] Two sources agree on the CAP allocation figure; one is from 2024
and may be stale, so I will flag it rather than lean on it.
[ACT] Drafting three paragraphs, then critiquing my own draft.
[REFLECT] My skeptic caught one unsourced number. Removing it. Saving.
  → tool save_briefing({"title": "Phoenix and the 2027 Colorado River Cuts", "body": "..."})
Saved a 3-paragraph briefing on Phoenix water policy to
briefings/2026-06-10-1432-phoenix-and-the-2027-colorado-river-cuts.md.

[done] agent halted naturally.

Four tool calls — one search, two reads, one save (the draft and critique happen inside a single model turn, no tool needed). No human in the loop until the end. This is the artifact you build in §04.03.
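If you would rather check that count mechanically, the sketch below pulls the tool names and stage labels out of the transcript with two regular expressions. The transcript text is copied from the run above (elisions left as printed); the helper names tool_calls and stage_counts are illustrative, not part of research_agent.py.

```python
import re
from collections import Counter

# Excerpt of the run above, copied as printed (elisions kept as-is).
TRANSCRIPT = """\
[PLAN] I will search for recent reporting, read the two or three most
authoritative sources, then draft and critique a briefing.
  → tool web_search({"query": "Phoenix 2027 Colorado River cuts plan"})
[OBSERVE] Five results. Arizona Republic and the state water-resources
department look authoritative; two blog posts look low-signal.
[ACT] Reading the two strongest sources before I draft.
  → tool fetch_url({"url": "https://www.azcentral.com/..."})
  → tool fetch_url({"url": "https://new.azwater.gov/..."})
[REFLECT] Two sources agree on the CAP allocation figure; one is from 2024
and may be stale, so I will flag it rather than lean on it.
[ACT] Drafting three paragraphs, then critiquing my own draft.
[REFLECT] My skeptic caught one unsourced number. Removing it. Saving.
  → tool save_briefing({"title": "Phoenix and the 2027 Colorado River Cuts", "body": "..."})
"""

# Lines the loop prints when the agent reaches outside itself.
TOOL_LINE = re.compile(r"^\s*→ tool (\w+)\(", re.MULTILINE)
# Lines the model writes to narrate a turn.
STAGE_LINE = re.compile(r"^\[(PLAN|ACT|OBSERVE|REFLECT)\]", re.MULTILINE)


def tool_calls(transcript: str) -> list[str]:
    """Return tool names in the order they fired."""
    return TOOL_LINE.findall(transcript)


def stage_counts(transcript: str) -> Counter:
    """Count how many times each stage label appears."""
    return Counter(STAGE_LINE.findall(transcript))


print(tool_calls(TRANSCRIPT))
print(dict(stage_counts(TRANSCRIPT)))
```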

Read the trace.

You’ll do
Without scrolling away, count the tool calls in the transcript above and name them in order.
Steps
  1. Find every → tool line in the transcript — those are the only moments the agent reached outside itself.
  2. Write the tool names in the order they fire.
  3. Mark which lines the loop printed ([goal], → tool, [done]) versus which the model wrote (the [PLAN]/[ACT]/[OBSERVE]/[REFLECT] lines).
Verify
Your list is exactly four, in this order: web_search, fetch_url, fetch_url, save_briefing — and the draft/critique step has no tool line, because it happens entirely inside one model turn.

Stretch. The system prompt says “read no more than 4 URLs.” This run read two. Why might the agent stop reading early even when it’s allowed more? (It judged it had enough at Reflect.)

§ 04.03 · Hands-on · 40 min · the capstone

Build your research agent.

A working multi-step research agent in under 250 lines. Searches, reads, drafts, critiques, polishes. Saves the output. Lives on your machine.

From the last unit: You have seen the loop and the trace. The capstone is your own — ~250 lines of Python you read, modify, and run on a question of your choosing.

[Figure: the capstone pipeline, what your script does: question → search → fetch → draft → briefing.md. One file, runs on your laptop, output saved to disk.]

What it does

  1. Takes a research question as a CLI argument.
  2. Plans its sub-steps, then loops Plan → Act → Observe → Reflect.
  3. Calls web_search and fetch_url as needed (DuckDuckGo, no search key required).
  4. Drafts a 3-paragraph briefing.
  5. Critiques its own draft (Day 02’s Critique pattern, baked into the system prompt).
  6. Saves the polished briefing to briefings/ with a timestamped filename.
  7. Prints every [PLAN] / [ACT] / [OBSERVE] / [REFLECT] step to the terminal so you can watch it think.
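Step 6's timestamped filename can be sketched as follows. This assumes the pattern visible in the example path from the §04.02 trace (date, 24-hour time, then a lowercase hyphenated title). The helper name briefing_path is hypothetical, and the real script may build the name differently.

```python
import re
from datetime import datetime
from pathlib import Path

BRIEFINGS_DIR = Path("briefings")


def briefing_path(title: str, now: datetime) -> Path:
    """Build briefings/<YYYY-MM-DD-HHMM>-<slug>.md from a title and a timestamp."""
    # Lowercase the title and collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return BRIEFINGS_DIR / f"{now:%Y-%m-%d-%H%M}-{slug}.md"


print(briefing_path("Phoenix and the 2027 Colorado River Cuts", datetime(2026, 6, 10, 14, 32)))
```

Because the timestamp leads the name, a second run a minute later lands in a new file rather than on top of the old one.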

Get the code

You don’t need to clone anything. Save two files, research_agent.py and requirements.txt, into a new folder: download each from the URLs below in your browser, or pull them with curl:

# make a folder and pull both files into it
mkdir research-agent && cd research-agent
curl -O https://d154gd40skpa9c.cloudfront.net/agents/research_agent.py
curl -O https://d154gd40skpa9c.cloudfront.net/agents/requirements.txt

Set up and run it

From inside that folder, make a virtual environment, install the one dependency, set your key, and run. On many current setups, including Homebrew Python on macOS, the venv is not optional: a bare pip install is blocked (see troubleshooting below).

# 1 · create and activate an isolated environment
python3 -m venv .venv
source .venv/bin/activate
# 2 · install the anthropic SDK
pip install -r requirements.txt
# 3 · set your key (from console.anthropic.com)
export ANTHROPIC_API_KEY="sk-ant-..."
# 4 · run it on a real question from your week
python research_agent.py "a question from your week"

A full run typically costs a few cents and finishes in well under a minute.

Run the capstone agent.

You’ll do
Run research_agent.py end-to-end on a real question from your week and find the briefing it writes to disk.
Steps
  1. Download both files (research_agent.py, requirements.txt) into a new folder.
  2. Run the four setup commands above (venv → install → export key → run). If you don’t have a question handy, use the Phoenix one from §04.02.
  3. Watch the terminal: you should see [goal], then [PLAN]/[ACT]/[OBSERVE]/[REFLECT] labels and → tool lines, ending in [done] agent halted naturally.
  4. Open the new briefings/ folder the run created next to the script.
Verify
A file named like 2026-06-10-1432-your-topic.md exists in briefings/. Open it: it is three paragraphs, and at least two sentences quote a fact followed by a source URL (the system prompt requires it).

Stretch. Run it twice on the same question. The two briefings get different timestamps and may cite different sources — the web moved, or the model picked different pages to read.

If it doesn’t run. Three failures cover almost every first run.
  1. ANTHROPIC_API_KEY is not set. You skipped the export, or opened a new terminal tab that didn’t inherit it; re-run the export line in the same shell.
  2. error: externally-managed-environment on pip install. Your Python is marked as externally managed (Homebrew Python on macOS does this) and blocks installs outside a virtual environment; create and activate the .venv first (the first two lines above), then re-run pip install.
  3. The agent reports an empty search or “no results parsed.” DuckDuckGo occasionally returns nothing to a scripted request; just re-run, or rephrase the question with more distinctive words.
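The first two failures can be caught before the agent starts. The sketch below is a hypothetical preflight helper, not something research_agent.py is known to contain: it checks for the key in the environment and uses the standard sys.prefix versus sys.base_prefix comparison to detect an active virtual environment.

```python
import os
import sys


def preflight(env=None, in_venv=None) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready to run."""
    env = os.environ if env is None else env
    if in_venv is None:
        # Inside a venv, sys.prefix points at the venv and differs from the base install.
        in_venv = sys.prefix != sys.base_prefix
    problems = []
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set: re-run the export line in this shell")
    if not in_venv:
        problems.append("no virtual environment active: run `source .venv/bin/activate`")
    return problems


for problem in preflight():
    print("[preflight]", problem)
```

The env and in_venv parameters exist only so the check can be exercised without touching your real shell.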

Read the source

The file is written so each section maps to one part of the agent. Open research_agent.py in your editor and search for these four banner comments as you read:

  • # === TOOL IMPLEMENTATIONS === — the three tools (web_search, fetch_url, save_briefing) that run on your machine.
  • # === TOOL DECLARATIONS === — the JSON schemas the model actually sees. The descriptions are how it decides when to call each tool.
  • # === SYSTEM PROMPT === — the agent’s discipline: the four labels, the “search first, read second” rules, and Day 02’s Critique pattern.
  • # === THE LOOP === marks _run(), the Plan → Act → Observe → Reflect cycle that ties it all together.

# open the agent in your editor (macOS)
open research_agent.py
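The TOOL DECLARATIONS block is easier to read once you know the shape to expect. The sketch below shows one declaration in the name / description / input_schema form that Anthropic's Messages API uses for tools. The description wording and the missing_inputs helper are illustrative assumptions, not copied from research_agent.py.

```python
# One tool declaration: this dictionary is all the model sees of the tool.
WEB_SEARCH_DECLARATION = {
    "name": "web_search",
    # Assumed wording; the description is how the model decides when to call the tool.
    "description": "Search the web and return result titles and URLs. Use this before reading any page.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search terms, a few distinctive words."},
        },
        "required": ["query"],
    },
}


def missing_inputs(declaration: dict, tool_input: dict) -> list[str]:
    """List required inputs the model left out of a tool call (hypothetical helper)."""
    required = declaration["input_schema"].get("required", [])
    return [key for key in required if key not in tool_input]


print(missing_inputs(WEB_SEARCH_DECLARATION, {"query": "Phoenix 2027 Colorado River cuts plan"}))
print(missing_inputs(WEB_SEARCH_DECLARATION, {}))
```

Note that the implementation never appears here: the model chooses tools from the description and schema alone, which is why vague descriptions produce vague tool use.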

Make it yours.

You’ll do
Change one rule in the system prompt, rerun on the same question, and see the output change shape.
Steps
  1. Open research_agent.py and find the # === SYSTEM PROMPT === block (the SYSTEM_PROMPT string).
  2. Make one change. Either tighten the source rule — change “Quote a specific fact, with its source URL, at least twice” to at least four times — or change the output beat: swap “Your output is three paragraphs, ~500 words” for “Your output is five bullet points, each with a source URL.”
  3. Save the file. Rerun on the same question you used before.
  4. Open the newest file in briefings/ and compare it to the previous one.
Verify
The new briefing visibly reflects your edit — four+ source URLs instead of two, or five bullets instead of three paragraphs. Same agent, same loop, different output because you changed the instructions, not the code.

Stretch. Add a brand-new tool. Copy the save_briefing pattern into a save_to_notion stub (it can just print for now), declare it in TOOLS, and tell the system prompt when to use it.
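One possible starting point for that stretch is sketched below. Everything here is an assumption about shape rather than a copy of the script: the stub only prints, TOOL_FUNCTIONS and call_tool are hypothetical names for however the loop maps a tool name to a Python function, and the declaration entry would be appended to the script's own TOOLS list.

```python
def save_to_notion(title: str, body: str) -> str:
    """Stub: prints instead of calling Notion, and says so in what the model observes."""
    print(f"[stub] would save {title!r} to Notion ({len(body)} chars)")
    return f"Stub only: {title!r} was NOT sent to Notion."


# Declaration the model would see (assumed wording).
SAVE_TO_NOTION_DECLARATION = {
    "name": "save_to_notion",
    "description": "Save the finished briefing to Notion. Call only after save_briefing has succeeded.",
    "input_schema": {
        "type": "object",
        "properties": {"title": {"type": "string"}, "body": {"type": "string"}},
        "required": ["title", "body"],
    },
}

# Hypothetical dispatch table: tool name -> local function.
TOOL_FUNCTIONS = {"save_to_notion": save_to_notion}


def call_tool(name: str, tool_input: dict) -> str:
    """Run a tool by name and return the text the model will read at Observe."""
    fn = TOOL_FUNCTIONS.get(name)
    if fn is None:
        return f"Unknown tool: {name}"
    return fn(**tool_input)


print(call_tool("save_to_notion", {"title": "Demo", "body": "three paragraphs"}))
```

Returning an honest "stub only" message matters: whatever the tool returns is what the agent believes happened.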

§ 04.04 · Unit 27

Reversibility, first.

A research agent reads. The next agent you build will write. The discipline that lets you ship a writing agent without losing sleep is reversibility.

[Figure: two columns. Dangerous: send, delete, no undo, production. Reversible: draft, copy, append, human in the loop; your agent does these, for now. Prefer the reversible side: the cost of one extra approval click is nothing. Caption: Draft, not send · copy, not move · the agent earns trust slowly.]
Default | Why it’s safer
Draft, don’t send | You see the message before it hits the inbox.
Copy, don’t move | The source file is still there if the move is wrong.
Append, don’t overwrite | Yesterday’s version is recoverable.
Dry-run, then run | The agent prints what it would do, then waits for go-ahead.
Scope to a folder, not ~ | Blast radius is one directory, not your whole machine.
Log every tool call | When something goes sideways, you can read the trace.

Most of these are one-line changes in the tool implementation. They are the difference between an agent that is useful and an agent that is dangerous.
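To make "one-line changes" concrete, here is a minimal sketch of a hypothetical file-archiving tool that applies three rows of the table at once: it copies instead of moving, refuses to overwrite, and defaults to a dry run. The function name archive_copy and its parameters are illustrative and do not come from research_agent.py.

```python
import shutil
from pathlib import Path


def archive_copy(src: Path, dest_dir: Path, dry_run: bool = True) -> str:
    """Copy src into dest_dir, never moving or overwriting; report what happened."""
    target = dest_dir / src.name
    # Never overwrite: an existing file at the target stops the tool.
    if target.exists():
        return f"refused: {target} already exists"
    # Dry-run, then run: the default only describes the action.
    if dry_run:
        return f"dry run: would copy {src} to {target}"
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Copy, don't move: the source file stays where it was.
    shutil.copy2(src, target)
    return f"copied {src} to {target}"
```

A call such as archive_copy(Path("notes.txt"), Path("archive")) only describes the copy; passing dry_run=False is the explicit go-ahead.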

Audit your agent for reversibility.

You’ll do
Check the three tools in the agent you built against this table, and name the one guardrail that would make a write-capable version safe.
Steps
  1. Open research_agent.py at def save_briefing (under # === TOOL IMPLEMENTATIONS ===).
  2. Confirm two reversibility defaults are already there: it writes a new timestamped filename every run (append, never overwrite), and it only ever writes inside briefings/ (scoped folder, not ~).
  3. Find the third default in the loop: every tool call is printed as a → tool line (log every tool call). Note the line that does it.
  4. Now imagine swapping save_briefing for send_email. Write one sentence: which row of the table would you apply, and what is the one-line change?
Verify
You can point to the exact lines for all three existing defaults (new-file write, briefings/ scope, → tool logging), and your send_email sentence names a specific row (“draft, don’t send”) and a concrete change (write a .eml draft instead of calling the send API).
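The "write a .eml draft" change could look like the sketch below, which uses only Python's standard email package. The names draft_email and DRAFTS_DIR and the draft-001.eml numbering are assumptions for illustration. Nothing here touches a mail server, which is the point.

```python
from email.message import EmailMessage
from pathlib import Path

DRAFTS_DIR = Path("drafts")  # hypothetical folder, in the spirit of briefings/


def draft_email(to: str, subject: str, body: str, drafts_dir: Path = DRAFTS_DIR) -> Path:
    """Write the message to a new .eml file for a human to review and send."""
    msg = EmailMessage()
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)

    drafts_dir.mkdir(parents=True, exist_ok=True)
    # Pick the first unused draft number so an earlier draft is never overwritten.
    n = 1
    while (path := drafts_dir / f"draft-{n:03d}.eml").exists():
        n += 1
    path.write_bytes(msg.as_bytes())
    return path
```

Most mail clients can open a .eml file, so the approval step is a double-click followed by Send.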

Stretch. Make save_briefing refuse to write anywhere except briefings/ — add a one-line check that the resolved path is inside BRIEFINGS_DIR and rerun to confirm normal runs still pass.
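One way to write that check is sketched below, assuming BRIEFINGS_DIR is a pathlib.Path as the name suggests. The helper safe_briefing_path is hypothetical, and Path.is_relative_to needs Python 3.9 or newer.

```python
from pathlib import Path

BRIEFINGS_DIR = Path("briefings")


def safe_briefing_path(filename: str, base: Path = BRIEFINGS_DIR) -> Path:
    """Resolve filename under base and refuse anything that escapes it."""
    base = base.resolve()
    # resolve() collapses ".." segments, so traversal tricks show up here.
    target = (base / filename).resolve()
    if not target.is_relative_to(base):
        raise ValueError(f"refusing to write outside {base}: {target}")
    return target


print(safe_briefing_path("2026-06-10-1432-your-topic.md").name)
```

Resolving before comparing is what makes the check hold: a filename like ../notes.md or an absolute path only looks harmless until it is resolved.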

§ 04.05 · Unit 28

Where the craft goes from here.

You have the loop. You have a working agent. The next frontiers are about how agents compose — with each other, with the rest of your stack, with the world.

From the last unit: You can build agents. You can run them. The frontiers from here are about how agents compose — with each other, with your stack, with the world.

[Figure: four frontiers branching from you: MCP, multi-agent, evaluation, production. All of them start from the loop you built.]

The Model Context Protocol (MCP)

A standard for connecting agents to tools and data. Instead of writing a read_file tool inside every agent, you write an MCP server once and any agent — Claude, Cowork, Code — can use it. Same shape as today, but reusable across apps.

Multi-agent systems

One agent plans. Another writes. A third critiques. A fourth posts. They share state through messages or a shared filesystem. The Capable Series ships the foundations; the Council Day extension explores when this matters and when it is over-engineering.

Evaluation

How do you know your agent is getting better? You write a suite of test cases — questions with known good answers — and run your agent against them on every change. The agent equivalent of unit tests. Most teams skip this and regret it.
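A test suite for an agent can start very small. The sketch below is a minimal harness under loud assumptions: the cases are invented for illustration, the agent is any function from question to answer text, and "known good" is reduced to a keyword check, which is a crude proxy that a real suite would replace with something stricter.

```python
from typing import Callable

# Hypothetical test cases: a question plus terms a good answer must mention.
CASES = [
    {"question": "What are the four agent stages?",
     "must_mention": ["plan", "act", "observe", "reflect"]},
    {"question": "Which tools does the research agent have?",
     "must_mention": ["web_search", "fetch_url", "save_briefing"]},
]


def evaluate(agent: Callable[[str], str], cases: list[dict]) -> float:
    """Return the fraction of cases whose answer mentions every required term."""
    passed = 0
    for case in cases:
        answer = agent(case["question"]).lower()
        if all(term.lower() in answer for term in case["must_mention"]):
            passed += 1
    return passed / len(cases)


def fake_agent(question: str) -> str:
    """Stand-in agent that gives the same answer to everything."""
    return "Plan, Act, Observe, Reflect."


print(f"pass rate: {evaluate(fake_agent, CASES):.0%}")
```

Run it after every change to the system prompt or the tools; a pass rate that drops is the signal the change made things worse.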

Production deployment

The script you ran today is the shape. The production version adds persistence (a database), observability (every step logged), authentication, rate limits, retries, and a UI. None of it changes the core loop.
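Retries are a good example of an addition that wraps the loop without changing it. The sketch below is a generic exponential-backoff helper, not code from research_agent.py; with_retries and its parameters are illustrative names, and the sleep argument exists only so the waiting can be swapped out in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on any exception with a delay that doubles each time."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the real error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"[retry] attempt {attempt} failed ({exc}); waiting {delay}s")
            sleep(delay)
    raise ValueError("attempts must be at least 1")
```

A tool call inside the loop would then become something like with_retries(lambda: fetch_url(url)), leaving Plan, Act, Observe, and Reflect exactly as they were.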


Your commitment, by Monday

The loop only pays off if it leaves the page. Before you close your laptop, turn one workflow from your week into a written plan with a date on it.

Schedule one agent into your week.

You’ll do
Pick one recurring workflow, sketch its four stages, and commit it to a calendar slot — on paper or in a note.
Steps
  1. Pick one workflow you repeat every week: the meeting brief, the weekly report, the customer triage.
  2. Write its four stages — Plan, Act, Observe, Reflect — in one line each, and list the tools the agent would need.
  3. Name its trigger: the event or time that should kick it off (“every Friday 4pm,” “each new support email”).
  4. Put its first run on your real calendar this week. Write today’s date and the run date at the top of the note.
Verify
A dated note exists (today’s date at the top) that names three things: the workflow, its trigger, and its first scheduled run — and that run is a real entry on your calendar, not just an intention.

Stretch. Re-open research_agent.py and decide which of its three tools your workflow reuses unchanged, and which one tool you’d have to write.

The reference for the week. Reading this won’t make you better. The note you just dated, and the calendar slot it points to, will.

From curious to capable.