WEEK 5 · AI FRIDAY · MAY 15, 2026

Under the Hood.

Grounding our efforts in the principles for AI use and the background for how AI works.

Preston Magouirk · DC CAP Enterprise AI Leadership Pilot · The mechanics behind four weeks of reps.

The setup

Terms thrown around in the AI conversation. Mostly nowhere else.

Context. Tokens. Memory. Models. Effort. None of these show up in a comparable way anywhere outside this room. So they feel mysterious, even when you have been using them daily for four weeks.

What is context?
What are tokens?
What is memory?
What's model drift?
Which model do I use?
What is hallucination?
How do I verify?
What's effort?

These are not theoretical anymore. You have been experiencing them. Today we put names on what you have already lived.

The principles you've been operating

Four weeks of reps.
Same scaffold.

The 4Ds are how we operate. The 3As are the shape your delegation takes. The mechanics underneath them are why those moves work — or fail.

Delegation
Choosing the modality.
Description
Naming the work precisely.
Discernment
Judging what's worth building.
Diligence
Verifying before it ships.
Automation
Claude runs, you review.
Augmentation
You and Claude think together.
Agency
Claude operates inside your guardrails.

You have done the reps. The next eight slides put a real-world parallel beside each mechanic so the names stick.

Q1 · Description

What is a context window?

A real work scenario

An executive prepping for a board meeting.

An executive walks into a board meeting with whatever they reviewed before the door closed. The financials they pulled. The director questions from last quarter. The funder dossiers their team sent over. Whatever they didn't open isn't available when a board member asks a curveball.

What's loaded into the prep window decides what they can answer in the room. Everything else is a best guess.

Universal across executive practice. Studied directly in decision research on information constraints (Kahneman, Klein, and others).

In your Claude work

The context window is the same constraint.

The window holds what you've pasted, what Claude has said back, and any reference files loaded. Whatever's not in the window can't shape the answer.
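
A toy sketch of that constraint, under loud assumptions: word counts stand in for tokens, the budget is invented, and `fit_to_window` is a hypothetical helper. Real products handle a full window in their own ways; this only shows why early material can fall out of view.

```python
def fit_to_window(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose combined size fits the budget."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):
        size = len(message.split())  # assumption: words as a crude token proxy
        if used + size > budget:
            break  # everything older than this point is outside the window
        kept.append(message)
        used += size
    return list(reversed(kept))


# Hypothetical chat: the voice rules were pasted first, then two requests.
chat = [
    "voice rules for parent emails",
    "draft the deadline reminder",
    "make it shorter",
]
window = fit_to_window(chat, budget=7)
print(window)
```

With a budget of 7, only the two most recent messages fit. The voice rules are no longer in the window, so they cannot shape the next answer.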

Treat each chat like the prep window for one decision. Load names, dates, voice rules, audience, format. When the task ends, close the chat. For knowledge that should persist across meetings, use a Project.

The takeaway: Description quality is bounded by what sits in the window.

Q2 · Description

Why does context matter?

A real work scenario

A doctor without your history.

A patient walks in with chest pain. With no medical history, that symptom could be heartburn, anxiety, a heart attack, or a dozen other things. The doctor guesses.

Add history — cardiac stent six months ago. Add family — father died of MI at 52. Add behavior — ran six miles before symptoms started. Each piece of context narrows the diagnosis to the one that fits.

In a classic outpatient study, the patient history alone was enough to reach the diagnosis in most cases, ahead of physical examination and laboratory tests. (Hampton et al., BMJ 1975; replicated repeatedly since.)

In your Claude work

Claude is the doctor.

"Write me an email to a parent" gets you the average of the internet. Add the named family, the specific deadline, the cycle date, the voice rules — and Claude narrows to the answer that fits DC CAP.

If you keep retyping the same context every week, that context belongs in a Project. The doctor doesn't ask for your history every time. They have the chart.

The takeaway: Empty context, generic answer. Rich context, sharp answer.

Q3 · Internal & external memory

Memory comes in two layers.

A real work scenario

An attorney joining a new case.

External memory: law school, the bar exam, fifteen years of practice. Fixed. Brought to every case.

Internal memory: the case file — complaint, depositions, discovery, prior motions. Variable. Different in every matter.

Strong external + empty case file = malpractice. Empty external + full file = rookie. Real lawyers run both.

Standard legal practice. Explicit in CLE curricula on case preparation.

In your Claude work

Same two-layer architecture.

Training data is Claude's external memory. What the model learned from its training data, up to its training cutoff. Fixed.

Context is Claude's internal memory. What's loaded in this chat right now. Your data, voice rules, files. You decide what goes in.

Empty context = Claude guessing from generic training. Asking about events after the cutoff = Claude inventing.

The takeaway: External memory is what the model knows about the world. Internal memory is what you've told it about your work. Both layers, every time.

Q3 · Precision & recall

Two ways memory can fail.

A real work scenario

The spam filter in your inbox.

Precision: of the emails marked as spam, what percent are actually spam? Low precision = important emails in the spam folder. False positives.

Recall: of all the spam coming in, what percent does the filter catch? Low recall = spam piling up in the inbox. False negatives.

Push precision up, recall drops. Push recall up, precision drops. Every filter trades them off.
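
The two definitions above as arithmetic. A minimal sketch; the counts are a made-up week of email, not real data, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Return (precision, recall) for a spam filter.

    true_pos:  spam correctly flagged
    false_pos: real email wrongly flagged as spam
    false_neg: spam the filter missed
    """
    flagged = true_pos + false_pos      # everything in the spam folder
    actual_spam = true_pos + false_neg  # all the spam that came in
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual_spam if actual_spam else 0.0
    return precision, recall


# Hypothetical week: 90 spam caught, 10 real emails flagged, 30 spam missed.
precision, recall = precision_recall(true_pos=90, false_pos=10, false_neg=30)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.90 recall=0.75
```

Here 10 of every 100 flagged emails were real (a precision problem), and 30 of 120 spam messages got through (a recall problem).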

Standard information retrieval metrics. Salton & McGill, Introduction to Modern Information Retrieval (1983).

In your Claude work

Two failure modes, two fixes.

Precision failure = hallucination. The model confidently "remembers" something and is wrong. The Mata case was a precision failure — six fake citations asserted as real.

Recall failure = forgetting what you told it earlier. The model should have used something from earlier and didn't. Drift, mechanically.

Verify catches precision failures. Re-anchoring catches recall failures.

The takeaway: Precision and recall are the two ways memory lets you down. Both have fixes — once you can name which one you're seeing.

Q4 · Diligence signal

What is model drift?

A real work scenario

Tay, 2016.

Microsoft launched a chatbot named Tay on Twitter on March 23, 2016. Tay was designed to learn from conversation. Within 16 hours, trolls had steered her into producing racist and offensive content. Microsoft pulled her offline.

A quieter, slower version of the same pattern shows up in every long chat. The longer a conversation runs, the more recent inputs dominate the original instructions. Newer signals crowd out older ones.

Microsoft Tay launch and shutdown, March 23–24, 2016. Widely covered; case study in every responsible-AI curriculum since.

In your Claude work

The chat that quietly drifted.

You correct Claude on something at message 5. By message 30, the same error is back. Tone shifts. Earlier rules dilute. That's drift.

The fix: start fresh when the trail feels off. Re-anchor inside long chats with a reminder. A Project loads the foundational context fresh every time you open a new conversation inside it.
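
One rough way to picture the dilution, with invented numbers. Assumption: share of the text is only a stand-in for influence, since models do not weigh words this simply. `instruction_share` is a hypothetical helper.

```python
def instruction_share(instruction_words: int, words_per_message: int, messages: int) -> float:
    """Fraction of the conversation taken up by the original instructions."""
    total = instruction_words + words_per_message * messages
    return instruction_words / total


# Assumed sizes: 200 words of instructions, about 150 words per message.
for count in (5, 30):
    share = instruction_share(200, 150, count)
    print(f"after {count} messages: {share:.1%} of the chat is your original instructions")
```

Under these assumptions the instructions go from about a fifth of the chat at message 5 to under a twentieth at message 30. A reminder inside the chat, or a fresh chat, puts them back near the top.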

The takeaway: When the answer thins, that's a Diligence signal — not a Claude failure.

Q5 · Delegation

What model should I use?

A hospital matches case to clinician. A specialist for the complex procedure. A general practitioner for the routine visit. A triage nurse for the initial intake. Same building. Different jobs. Claude works the same way.

Specialist · careful

Opus 4.6

Deep multi-step reasoning. Slowest. Most expensive.

Highest-stakes drafting. Board memos. Funder LOIs. Cross-document audits. Anything where being wrong is costly. Don't use it for routine emails or simple summaries.

General practitioner · balanced

Sonnet 4.6

The DC CAP default. Strong reasoning, everyday speed.

Counselor emails. Partnership updates. Briefings. First drafts of grant narratives. Most Projects and most Skills. If you don't know which to pick, this is the answer.

Triage nurse · fast

Haiku 4.5

Built for speed and volume.

Quick lookups. Classifications. Pulling structured data out of unstructured text. Don't use it for anything signed by a person or anything that needs voice.
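
The three tiers above, sketched as a decision rule. The `Task` fields and `pick_model` are hypothetical, and the returned strings are the display names from this slide, not API identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    high_stakes: bool     # being wrong is costly (board memo, funder LOI)
    needs_voice: bool     # signed by a person, or tone matters
    bulk_or_lookup: bool  # quick lookup, classification, data extraction


def pick_model(task: Task) -> str:
    """Match the level of care to the case."""
    if task.high_stakes:
        return "Opus 4.6"    # specialist
    if task.bulk_or_lookup and not task.needs_voice:
        return "Haiku 4.5"   # triage nurse
    return "Sonnet 4.6"      # general practitioner, the default


print(pick_model(Task(high_stakes=False, needs_voice=True, bulk_or_lookup=False)))
```

The default falls through to Sonnet, which matches the rule of thumb: if you don't know which to pick, pick the general practitioner.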

The takeaway: Model selection is a Delegation decision. The right level of care for the case — not the most expensive option in the building.

Q6 · Diligence

What is hallucination?

A real work scenario

Mata v. Avianca, 2023.

Two New York lawyers used ChatGPT to write a personal injury brief. ChatGPT cited six prior cases — including Varghese v. China Southern Airlines — with fabricated judge names, quotes, and citation numbers.

Asked if the cases were real, ChatGPT said yes and pointed to "Westlaw and LexisNexis." None existed. Judge Castel sanctioned the lawyers $5,000. He called the analysis "gibberish."

Mata v. Avianca, Inc., No. 1:22-cv-01461 (S.D.N.Y. 2023).

In your Claude work

Pattern completion, not knowledge.

Hallucination is when a model confidently states something untrue. A fabricated quote. A made-up statistic. A paper that doesn't exist.

It happens because LLMs are pattern completers. When they don't know, they generate text that sounds like the right kind of answer. Names, numbers, dates, citations are highest-risk.

The takeaway: The lawyers asked ChatGPT to verify itself. ChatGPT said yes. Don't ask the model to grade its own homework.

Q6 · Diligence — what it costs

Two more, with the bill attached.

Google · February 2023

Bard's first demo.

In its public launch ad, Bard was asked what JWST discoveries to tell a 9-year-old. Bard said JWST "took the very first pictures" of a planet outside our solar system.

Wrong. The first exoplanet image came from the European Southern Observatory's VLT in 2004. Reuters caught it the day of Google's Paris AI event.

Alphabet stock dropped 7.7%. ~$100 billion in market cap erased in 24 hours.
CNN Business, NPR, Reuters (Feb 8–9, 2023).

Air Canada · February 2024

The bereavement-fare chatbot.

Jake Moffatt asked Air Canada's chatbot about bereavement fares after his grandmother died. The chatbot told him he could apply for a refund within 90 days. The actual policy did not allow bereavement requests after travel was completed.

The BC Civil Resolution Tribunal sided with Moffatt: a company is liable for what its chatbot says. "Air Canada is responsible for all the information on its website."

Air Canada was ordered to pay $812.02. Tribunal rulings are not binding precedent, but the case is widely cited on chatbot liability.
Moffatt v. Air Canada, 2024 BCCRT 149.

The takeaway: Hallucination has a market price, a legal price, and a reputational price. Verify before you let it ship.

Q7 · Diligence

How do I verify and validate?

A real work scenario

The pre-flight checklist + the flight plan.

Before every takeoff, a commercial pilot runs two checks. The pre-flight checklist verifies every part works — engines, controls, fuel, instruments. The flight plan validates the route fits the mission — weather, weight, fuel reserves.

A plane in perfect condition flying into a thunderstorm is verified, not validated. A flawless plan in a plane with a broken altimeter is validated, not verified. Both mandatory under FAA rules.

14 CFR § 91.103 (Preflight action); 14 CFR § 91.213 (Inoperative instruments and equipment).

In your Claude work

Two gates. Both required.

Verify — are the facts true? Names, numbers, dates, citations. Cross-check anything material. Don't ask the model to grade itself.

Validate — does it fit the work? Right audience, right format, right tone. Read it as the audience before you send.

For high-stakes work, run the adversarial-audit Skill from Week 4 — or hand it to a teammate.
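
A small sketch of the Verify gate as a checklist builder. It only lists the figures a person should cross-check against a source; it does not verify anything itself. The patterns and the `verification_checklist` name are illustrative assumptions, and names and citations would still need a human read.

```python
import re

# Assumed patterns for three high-risk kinds of claim.
CHECK_PATTERNS = {
    "dollar amount": r"\$\d[\d,]*(?:\.\d+)?",
    "percentage": r"\d+(?:\.\d+)?%",
    "year": r"\b(?:19|20)\d{2}\b",
}


def verification_checklist(draft: str) -> list[tuple[str, str]]:
    """List every figure in the draft that a person should check by hand."""
    items: list[tuple[str, str]] = []
    for label, pattern in CHECK_PATTERNS.items():
        for match in re.findall(pattern, draft):
            items.append((label, match))
    return items


draft = "Alphabet stock dropped 7.7% in 2023. Air Canada was ordered to pay $812.02."
checklist = verification_checklist(draft)
for label, value in checklist:
    print(f"[ ] {label}: {value}")
```

Each printed line is a box for a human to tick after checking the original source, not a verdict from the model.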

The takeaway: Diligence is two moves, not one. Design the gates into the workflow.

Q8 · Description

How do I write a better prompt?

A real work scenario

Briefing a designer to build a brochure.

Vague: "Make us a brochure." The designer makes the choices you didn't make. Revisions take six rounds. The project misses its deadline.

Sharp: "Trifold brochure, 8.5×11 folded to three panels. Audience: high school juniors. Reading level: grade 9. Logo at 1″ minimum. Five program features bulleted. A CTA at the bottom. Photos from our approved set. First draft Friday, two revision rounds, final by the 28th."

One round. The vendor delivers what was needed.

Universal across knowledge work involving creative or specialist vendors. The first hour of a project decides whether the next hundred go to plan.

In your Claude work

Five parts.

1. State the work, not the task.
2. Anchor in concrete facts — named partners, real numbers, real dates.
3. Tell Claude the audience and the format.
4. Iterate in dialogue. Show Claude what's wrong; don't start over.
5. Save what works.
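
Parts 1 through 3 of the list above can be sketched as a reusable template. `build_prompt` and its field names are hypothetical, and the example values are placeholders. Parts 4 and 5 happen after the first answer comes back.

```python
def build_prompt(work: str, facts: list[str], audience: str, output_format: str) -> str:
    """Assemble the work, the concrete facts, the audience, and the format."""
    fact_lines = "\n".join(f"- {fact}" for fact in facts)
    return (
        f"The work: {work}\n"
        f"Facts to anchor on:\n{fact_lines}\n"
        f"Audience: {audience}\n"
        f"Format: {output_format}"
    )


# Placeholder values, for illustration only.
prompt = build_prompt(
    work="Remind families about an upcoming deadline",
    facts=["Deadline: the 28th", "First draft due Friday"],
    audience="Parents of high school juniors",
    output_format="Short email, grade 9 reading level",
)
print(prompt)
```

Saving a template like this once is the start of part 5: the structure carries over, and only the facts change.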

Every great prompt is a SKILL.md waiting to be written. The more reps you put in on the five-part structure, the less you have to rewrite it next time.

The takeaway: Description is the most valuable move in this whole pilot.

The compound

Eight mechanics. Two effects.

When we get these right, two things happen at once. Work quality goes up. The error surface drops. Every new tool we pick up from here — Skills, automations, agents — amplifies both effects.

01
Better chats
Focused windows. The right model. Drift caught early. Verification before send.
02
Better tasks
Five-part prompts. Anchored in facts. Named audience. Iteration replaces restart.
03
Better Projects
Reference files do the memory. Voice carries. Twelve runs inherit what you built once.
04
Better Skills
Procedures with verify-and-validate gates inside them. Anyone on the team runs the same move.

The new tools scale the quality when the basics are solid. They scale the errors when they're not. Same mechanism, two directions.

Your turn

Bring an example.

Walk us through a moment from this week. Pick one of the eight. The reasoning matters more than the polish.

01
Name the mechanic.

Which of the eight did you wrestle with this week — context, memory, drift, model choice, hallucination, verify, validate, or the prompt itself?

02
Show the move.

What did you do? What changed when you got it right — or what showed up when you didn't?

03
Pass it forward.

What's in your capstone scope now? Which Skill or Project are you building this week, and what would a teammate need from it to run it cold?