Why your assistant gets it wrong with a straight face — and the verification discipline that catches it before it ships.
Preston Magouirk · DC CAP Enterprise AI Leadership Pilot · Diligence in depth.
Week 5 named the mechanics. Today we go deeper on one — Claude getting things confidently wrong — and build the stack newer users can run on every draft. Eight questions you've all asked once you've watched a hallucination land.
You have all seen a confident wrong answer. Today we put names on what to do about it.
External memory (what the model absorbed in training) is strong on patterns and weak on specifics. Five danger zones are where the pattern fails most reliably: names, numbers, dates, citations, and your world. They are also where the verification stack lives.
The closer the question gets to your specific world, the further it lands from Claude's strong patterns. Your world is the highest-risk zone of all — it never appeared in training data.
When eyewitnesses describe a crime, they don't replay a recording. They reconstruct what likely happened from fragments and patterns. Elizabeth Loftus's classic work showed witnesses inserting details, such as remembering a yield sign where the scene had shown a stop sign, because the pattern of a traffic accident filled the gaps.
The witnesses weren't lying. They were generating. And the gaps filled in the most plausible-sounding way, with full subjective confidence.
Claude predicts the next word that fits the pattern of an answer. There's no database query happening underneath. When the pattern is strong — "the capital of France is" — the prediction is reliable. When the pattern is weak — "the most-cited paper on nonprofit AI adoption is" — the model generates something that looks like an answer.
A plausible title. A plausible author. A plausible journal. None of which need to exist.
The takeaway: Every confident answer is a completion. Same mechanism the witness uses — at machine speed and machine scale.
Between 2009 and 2011, Death Valley National Park logged a cluster of incidents — drivers following GPS routes onto closed roads, dry lakebeds, and dead-end canyons. People died. Park rangers started calling it "death by GPS." The National Park Service eventually posted warning signs at the boundaries.
The GPS didn't say "I'm 30% sure this turn is right." It said "Turn right in 200 feet" in the same confident tone whether the road existed or not. The interface had one register — certain — regardless of what was actually underneath.
Claude's prose carries no reliable signal of how sure it is. There's no quiet "I'm 30% confident" threaded through every sentence. Training data is mostly written by humans who sound confident, so the model does too. A fabricated case citation reads the same as a real one: to the model, both are answers shaped like answers.
The fix is to ask: "For each claim in this draft, mark high / medium / low confidence and tell me why." The hedging the model wouldn't volunteer comes out when you ask.
The takeaway: Confidence is the model's default register. Calibration is something you have to request.
A simultaneous interpreter carries roughly 95% of general meaning across a fast speech. But the 5% loss isn't evenly spread. Studies of UN-level interpreters (Gerver in the 1970s, Christoffels and others since) show the failure mode is specific. Names, numbers, and dates are where interpreters trip. Even at the highest level, those three categories produce the most errors and the most "I'll come back to that" pauses.
Why those three? They carry no surrounding pattern. A name is itself or it isn't. A number has no synonym. Specifics have nothing to recover from.
Names, numbers, dates, citations, your world: that's where Claude's pattern-matching fails most reliably. The interpreter's three weak spots are three of yours. Citations sit alongside as a related risk: a paper title and a citation number are just more specifics.
Your world is the deepest weakness. Claude wasn't trained on your partner list, scholar count, or renewal calendar. It pattern-matches to the nearest thing — "education nonprofit in DC" — and invents plausible specifics.
The takeaway: Specifics carry no pattern to recover from. That's why the five danger zones are where every verification pass starts.
Mock-jury research has shown a stable pattern for decades. Jurors recall opening arguments and closing arguments at high fidelity. Middle testimony — even when it carries the decisive evidence — fades. Reid Hastie's mock-jury studies found jurors' final verdicts correlated more strongly with opening framing than with witness specifics, in part because the decisive witnesses landed in the attention-weakest part of the trial.
The mechanism is serial position. The first and last items in any sequence get more attention than the middle — Bennet Murdock named the effect in 1962 and it has replicated under hundreds of conditions since.
Loading a 50-page PDF doesn't mean Claude reads it evenly. Attention degrades through long contexts. The "Lost in the Middle" finding (Liu et al., 2023) showed models score highest on questions answered at the start or end of a long document — and noticeably worse on questions answered by the middle.
Three plays: excerpt the three pages that matter rather than dumping the PDF; surface key facts twice (top and bottom of long context); make Claude quote before answering. If it can't quote it cleanly, the answer isn't grounded.
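The third play, quote before answering, can be checked mechanically. What follows is a minimal sketch in Python, assuming you have the source text and Claude's quoted passage as plain strings. The function names and the sample sentence are invented for illustration; this is not a Claude feature.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so PDF line breaks don't cause false misses."""
    return re.sub(r"\s+", " ", text).strip().lower()

def quote_is_grounded(quote: str, source: str) -> bool:
    """True only if the quote appears verbatim (after normalization) in the source."""
    q = normalize(quote)
    return bool(q) and q in normalize(source)

# Invented sample text, standing in for an excerpt you loaded into the chat.
source = "The renewal deadline is\nMarch 3. Partners must confirm   in writing."

print(quote_is_grounded("renewal deadline is March 3", source))   # True
print(quote_is_grounded("renewal deadline is March 13", source))  # False
```

A quote that fails this check is not proof the answer is wrong. It is the signal to open the source before the draft moves on.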
The takeaway: Sharper context produces sharper answers. The middle is where errors hide.
Frederic Bartlett gave English subjects a Native American folktale — "The War of the Ghosts" — and asked each to retell it from memory after a delay, then to retell their retelling. Each pass drifted further from the original. Specifics dropped out. Concepts that didn't fit European cultural patterns — canoes, ghosts, hunting magic — got smoothed into "boats," "phantoms," "fishing."
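Bartlett called this repeated reproduction when one person retold the story over time, and serial reproduction when the story passed from person to person. Each retelling pulled the story toward the reteller's prior expectations. Drift wasn't error. It was pattern completion running on people.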
By message 30, the rule you set at message 2 is gone. Attention weighting tends to favor recent tokens — that's the architecture. And the model's own outputs become its future inputs, so a small wobble in message 10 feeds the wobble in message 15. By message 30, the wobbles have compounded into a different voice and a different set of working assumptions.
Three plays: re-anchor every 8–10 messages; start fresh more often (one task per chat); move durable rules into a Project's reference files — internal memory beats re-anchoring every time.
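The re-anchoring cadence can be pictured as a simple rule: every so many messages, the standing rules get restated. Here is a minimal sketch in Python, assuming a hypothetical list of user turns and an interval of 8 (the low end of the 8–10 range above). The function name and the reminder wording are illustrative only.

```python
def with_reanchors(rules: str, user_turns: list[str], every: int = 8) -> list[str]:
    """Prefix every `every`-th user turn (starting with the first) with the standing rules."""
    out: list[str] = []
    for i, turn in enumerate(user_turns):
        if i % every == 0:
            out.append(f"Reminder of standing rules: {rules}\n\n{turn}")
        else:
            out.append(turn)
    return out

# Hypothetical 20-message chat: reminders land on messages 1, 9, and 17.
turns = [f"message {n}" for n in range(1, 21)]
anchored = with_reanchors("Plain language. No invented numbers.", turns)
print(sum(t.startswith("Reminder") for t in anchored))
```

In practice the same cadence is run by hand: paste the rules back in when the count comes up, or start a fresh chat.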
The takeaway: Drift is built into the architecture. Build the workflow that fights it.
In 2008, Atul Gawande led the development of a 19-item checklist for surgical teams. Three pause-points: before anesthesia, before incision, before the patient leaves. The checks themselves were obvious — verify the patient, confirm the procedure, count the instruments.
The result, published in the New England Journal of Medicine: a 36% drop in major complications and a 47% drop in deaths across eight hospitals from Tanzania to Toronto. The discipline was the small fast check that catches the obvious mistake.
Three checks. Scan for the danger zones — names, numbers, dates, citations, your-world specifics. Ask Claude for the claim table — "list every factual claim, your confidence, the source." Open one source. Just one. The one you check is worth more than the ten you assume.
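The danger-zone scan is a reading habit first, but a rough first pass can be sketched in code. The Python below is a minimal illustration, assuming crude regular expressions for four of the five zones. The patterns, the function name, and the sample sentence are all invented for this sketch. They will over-match and under-match on real drafts, and no pattern can flag your-world specifics.

```python
import re

# Illustrative patterns only: crude heuristics, not a reliable detector.
DANGER_PATTERNS = {
    "names": r"\b[A-Z][a-z]+ [A-Z][a-z]+\b",   # two capitalized words in a row
    "numbers": r"\b\d+(?:\.\d+)?%?",           # integers, decimals, percentages
    "dates": r"\b(?:19|20)\d{2}\b",            # four-digit years only
    "citations": r"\b[A-Z][a-z]+ et al\.",     # "Surname et al."
}

def danger_zone_scan(draft: str) -> dict[str, list[str]]:
    """Return every span in the draft that matches a danger-zone pattern, by zone."""
    return {zone: re.findall(pattern, draft) for zone, pattern in DANGER_PATTERNS.items()}

# Invented sentence, not a real claim.
draft = "In 2021, Jordan Lee reported that 42% of partners renewed (Smith et al.)."
for zone, hits in danger_zone_scan(draft).items():
    print(zone, hits)
```

Every hit is a candidate for the claim table. The scan only tells you where to look; opening the source is still the check.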
Manual verification is the floor. The checks always run; heavier gates stack on top for higher stakes.
The takeaway: A checklist works because you run it every time.
A judge cannot rule on a fact that isn't entered into evidence. A witness who testifies but isn't sworn doesn't count. A document referenced but not introduced is invisible. The record is what the court can use. Everything outside the record might as well not exist.
The rule governs what's available for use. The same fact — out of the record, useless; in the record, decisive.
Most hallucinations happen when Claude is recalling. Loading the source is the single highest-impact move in the stack: it lets the answer be retrieved from internal memory (the context you supplied) instead of generated from external memory (training).
Three patterns: paste the source inline; constrain the answer to the source ("if a fact isn't in the document, say 'not in source' instead of guessing"); two-pass self-audit ("for every claim, mark whether you traced it to a source I gave you or generated it from training").
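The three patterns can be folded into reusable prompt text. This is a minimal sketch in Python, assuming the source is available as a plain string. The `<source>` tag wrapper, the function name, and the sample inputs are illustrative choices; the instruction wording is taken from the patterns above.

```python
def grounded_prompt(question: str, source: str) -> str:
    """Build a prompt that pastes the source inline and constrains the answer to it."""
    return (
        "Answer using only the document between the <source> tags.\n"
        "If a fact isn't in the document, say 'not in source' instead of guessing.\n\n"
        f"<source>\n{source}\n</source>\n\n"
        f"Question: {question}"
    )

# Second pass, sent after the first answer comes back.
AUDIT_PROMPT = (
    "For every claim in your answer, mark whether you traced it to a source "
    "I gave you or generated it from training."
)

# Invented inputs for illustration.
prompt = grounded_prompt("When is the renewal deadline?", "The renewal deadline is March 3.")
print(prompt)
print(AUDIT_PROMPT)
```

Keeping the wording in one place means every teammate sends the same constraint, instead of each person improvising it.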
The takeaway: Grounded answers are retrievals dressed as generations. Move the fact from training to context, and the model can find it.
A gate is a checkpoint where work doesn't pass until specific checks have run. Three gates, light to heavy. The cadence you pick is what your team's defaults become — for every artifact, every week.
"List every name, number, date, and citation you intend to use, with the source. Wait for me to confirm before drafting."
Forces the danger zones open before the model commits to a draft. Runs in about a minute. Default for every Tier 4 (public) artifact.
One chat drafts. A second, fresh chat audits the draft cold.
The audit chat has no investment in the draft and catches what the drafting chat protected. Closest thing to a peer in one person's hands. Default for Tier 3 internal strategy.
The adversarial-audit Skill from Week 4. Convergent flags are the verdict.
Two lenses landing on the same line for different reasons = the line is the problem. Reserved for board, funder, external publication. Tier 2 sensitive partner work earns this gate.
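The convergence rule is simple enough to state as code: keep only the lines both audits flagged. Here is a minimal sketch in Python, assuming each audit's flags are recorded as a mapping from line number to reason. That data shape, the function name, and the sample flags are hypothetical; how the Week 4 Skill reports its flags may differ.

```python
def convergent_flags(
    audit_a: dict[int, str], audit_b: dict[int, str]
) -> dict[int, tuple[str, str]]:
    """Return the lines flagged by both audits, paired with each audit's reason."""
    shared = sorted(audit_a.keys() & audit_b.keys())
    return {line: (audit_a[line], audit_b[line]) for line in shared}

# Invented flags from two independent audit passes over the same draft.
lens_one = {3: "number has no source", 7: "tone too casual"}
lens_two = {3: "date conflicts with loaded document", 9: "name unverified"}

print(convergent_flags(lens_one, lens_two))
```

Lines flagged by only one lens still deserve a look. The shared lines are where to start.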
The takeaway: Manual checks are what you do. Gates are what your team inherits.
A recap note for pilot teammates who couldn't make Friday's session, summarizing the Moffatt v. Air Canada case from Week 5. Generated from recall on the left. The same paragraph after the danger-zone scan, claim table, and source-loaded rewrite on the right. Every v2 fact is in the public ruling — go check it yourself.
Five highlighted spans — year, name and role, court, settlement amount, citation scope — all flagged by the danger-zone scan and confessed as low confidence in the claim table.
Sources loaded: the actual BCCRT decision and the Week 5 deck slide with the verified facts. Five fixes in three minutes. Same draft, grounded in the public record.
The compound: The diligence stack — danger-zone scan, claim table, source-loaded rewrite — is what every leader at this table can run on every draft, every week. Q8's heavier gates stack on top for higher stakes.
One artifact you wrote with Claude this week — counselor email, partner update, internal memo, LinkedIn post. The simpler the better. We walk through one together.
Which of the five danger zones — names, numbers, dates, citations, your world — did the stack catch on your draft? Which one is your blindspot?
What did the claim table reveal that the draft hid? What was different about the prompt that produced the sourced rewrite?
The verification cadence for your capstone build is ______ gate, run by ______, before ______ ships. Where does it sit in your scope?