DC CAP Scholars

Co-Designing Organizational Intelligence

DC CAP Enterprise AI Leadership Pilot

April 6 – June 10, 2026

65 Days · 10 Leaders · 6 Units · 5 Tasks

No technical experience required. This is a learning environment where questions are welcomed, mistakes are expected, and every participant brings expertise that matters. You are here because your leadership perspective shapes how DC CAP uses AI responsibly.

Purpose

This pilot produces leaders who use AI effectively, govern it responsibly, and build the unit-level environments that scale to the full organization. This initiative is funded by a $600K KPMG grant and represents DC CAP's position at the leading edge of nonprofit AI adoption.

Pilot Goals

AI Fluency

Advance all participants toward regular AI use in daily workflows, measured by pre/post assessment across five fluency dimensions. At least 70% of participants demonstrate three or more observable fluency behaviors by Day 60.

Psychological Safety

Create conditions where participants experiment openly, share failures productively, and build confidence through practice. Weekly confidence scores trend upward. Facilitation transitions from leader-led to champion-facilitated.

Shared Learning

Build a peer learning ecosystem where knowledge transfers through champion networks and distributed facilitation. Each unit produces at least one champion candidate. Participants teach techniques to peers.

Productivity

Document measurable time savings and workflow improvements. Each participant captures at least one before/after comparison. Minimum two workflow redesigns per unit, six total across the pilot.

Innovation

Move beyond doing the same things faster toward building fundamentally new capabilities. At least one unit demonstrates an AI-enabled capability that could not exist at current quality or scale without it.

Governance

All participants complete governance orientation and demonstrate responsible data classification. Zero Tier 1 or Tier 2 data incidents. AI-assisted outputs carry clear human ownership and review checkpoints.

Full goal detail and measurement framework: KPI Framework | Governance goals: AI Governance Framework

What Success Looks Like

By Week 4

100% of participants have attempted Claude on a real work task

By Week 8

Each participant has identified at least one workflow where Claude saves time or improves quality

By June 10

Measurable fluency growth on pre/post assessment; 3+ reusable workflow templates documented; capstone presentations completed

What This Investment Produces

Funded by a $600K KPMG AI Innovation Grant. Designed for replication.

Organizational Capacity Gains
Governance Infrastructure

4-tier data classification, acceptable use policy, incident response protocol, and security configuration. Built once, applies to every AI tool the organization adopts.

Leadership Fluency

10 organizational leaders across 6 functional units trained through the Anthropic 4D framework (Delegation, Description, Discernment, Diligence) with measurable pre/post assessment.

Reusable Workflows

24+ custom Cowork Skills encoding institutional knowledge into repeatable AI-assisted processes. They cover student outreach, partner communications, data analysis, and governance QA.

Measurement System

KPI framework with SCALE/PAUSE/PIVOT decision thresholds, weekly pulse data, and pre/post fluency assessment. Evidence base for board reporting and funder accountability.

Replication Blueprint
What the Grant Funded

Claude Enterprise licenses for 10 pilot seats, Innovation Hub staff capacity for program design and technical architecture, interactive training platform development, custom skill engineering, and a dedicated analytics pipeline for pilot measurement.

What Transfers to Any Nonprofit

The governance framework, the 4D competency model (developed by Anthropic), the phased adoption sequence, the facilitation-to-champion handoff model, and the KPI structure. These are the hardest parts to build from scratch. DC CAP built them, tested them, and is documenting the results.

95% of enterprise AI pilots fail. The primary reasons: no governance structure, no measurement discipline, and no plan for how adoption scales beyond early adopters. This pilot was designed to address all three.

Pilot Timeline

Day 0 of 65
Phase 1
Foundation
Apr 6 – Apr 25
Key Deliverables
• Onboarding & Prerequisites
• Governance & Delegation Fundamentals
Phase 2
Application
Apr 28 – May 16
Key Deliverables
• Description, Discernment & Diligence in Practice
• Capstone Project Design
Phase 3
Mastery & Capstone
May 19 – Jun 10
Key Deliverables
• Team Project Delivery
• Capstone Presentations
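
The phase dates above can be turned into a small date lookup. This is a minimal sketch, not part of the pilot tooling: the names `pilot_day` and `current_phase` are hypothetical, and it assumes each phase includes both its start and end date, so a date that falls between two phases returns no phase.

```python
from datetime import date
from typing import Optional

# Dates copied from the timeline above.
PILOT_START = date(2026, 4, 6)
PILOT_END = date(2026, 6, 10)
PHASES = [
    ("Foundation", date(2026, 4, 6), date(2026, 4, 25)),
    ("Application", date(2026, 4, 28), date(2026, 5, 16)),
    ("Mastery & Capstone", date(2026, 5, 19), date(2026, 6, 10)),
]


def pilot_day(today: date) -> int:
    """Days elapsed since launch, clamped to the 0..65 range."""
    total = (PILOT_END - PILOT_START).days
    elapsed = (today - PILOT_START).days
    return max(0, min(elapsed, total))


def current_phase(today: date) -> Optional[str]:
    """Name of the phase containing `today`, or None between phases."""
    for name, start, end in PHASES:
        if start <= today <= end:
            return name
    return None


if __name__ == "__main__":
    sample = date(2026, 5, 1)
    print(f"Day {pilot_day(sample)} of {(PILOT_END - PILOT_START).days}")
    print(current_phase(sample))
```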

Getting Started

Onboarding runs the week of April 6–10. Complete these prerequisites at your own pace, then join lunchtime office hours to discuss, collaborate, and get started.

Total: ~1.5 hours at your own pace

1. Take the Pre-Launch Assessment

Captures your baseline across 5 constructs: AI orientation, learning orientation, current AI use, AI knowledge, and applied skills. Takes about 8 minutes. Complete this first so we have your starting point before you dive in.

~8 min

2. Review Your Start Here Guide and AI Governance Framework

Your operational foundation. One guide covers how Claude works at DC CAP — interfaces, file management, Skills, and responsible use. The other covers data handling, approved use cases, and organizational guardrails. Review both before moving to Step 3.

~30 min

3. Complete Two Courses

Two free Anthropic Academy courses. Claude 101 teaches the basics of working with Claude: features, prompts, and navigation. AI Fluency for Nonprofits applies the 4D framework to mission-driven work. Both earn certificates. Required before your first Claude session.

~1 hour

Verify each course separately. Paste your certificate URL or enter the name shown on your certificate.

🏆
Ready to go!
Account provisioning is now available.

Lunchtime Office Hours

Once you've completed your prerequisites, jump into our lunchtime office hours with Angela Cammack and Preston Magouirk. These are open, collaborative sessions where pilot leaders discuss the process, ask questions, showcase what they're thinking about and creating, and learn from each other. Bring your ideas, your questions, and your first experiments. This is a leadership pilot — the energy comes from the room.

Account provisioning is conditioned on completing all three prerequisites.

Your Capstone: Build Something Your Team Will Use

By Week 8, each leader delivers a working Claude Project configured for their team — with at least one documented workflow and a governance configuration that fits their unit's data. First, you become a fluent user. Then, you build the tool your team inherits.

Weeks 1–3: Build Your Fluency

Try Claude on real deliverables from your actual week. Build confidence through practice, notice what works and what doesn't, and start seeing where AI fits your team's work.

Weeks 4–5: Identify and Design

Develop judgment through peer exchange. Build your first Claude Project. Notice how your work connects to colleagues' — that awareness shapes what your team inherits.

Weeks 6–7: Peer Review

Present your project design to the cohort. Cross-unit reviewers evaluate governance compliance, workflow completeness, and team readiness. The cohort's collective judgment makes every project stronger.

Week 8: Ship It

Finalize your Claude Project, workflow documentation, and governance configuration. Present at the capstone session. Your team inherits what you built.

AI Fridays

Starting Week 2

Begins post-break

35 Minutes

Structured session time

Optional, Highly Encouraged

We are stronger when learning together.

Check your calendar for the recurring invite. Week 1 is asynchronous onboarding; AI Fridays begin Week 2. Come when you can — each session builds on the last, and we are stronger when learning together.

Session Structure

Opening (5 min)

One question from this week's challenge: "What did you try? What happened?"

Core (20 min)

Share what you noticed: patterns, surprises, connections to your team's work. Peer exchange and cross-unit learning.

Close (10 min)

Governance pulse check + preview of next week's challenge.

Discussion Prompts by Week

WEEK 2 — DELEGATION

Personal output (H1) — What is something you did with AI this week to strengthen your own work output? What worked, what did you have to fix?
Delegation — Was it Automation or Augmentation? How did you decide? Where did you hesitate?
Team & enterprise (H2 / H3) — Did this also create something your team could use or benefit from? Did it make possible an organization-wide skill or capability DC CAP doesn't have today? If not, could it have — and how?

WEEK 3 — DESCRIPTION

What context did you share with Claude this week that changed what came back? What did you know that Claude didn't?
Where did you spend the most time — setting up the task or fixing the output? What does that tell you?
Did anyone notice a connection between what they shared and how useful Claude was?

WEEK 4 — BUILD WORTH KEEPING

The filter — Walk us through your five scores. Which criterion almost killed the build, and why did you build anyway?
The customization — What outside source did you load? Where did it raise the floor on what Claude could do?
The hand-off — Where did the teammate get confused? What does that tell you about what's still tribal knowledge?

WEEK 5 — ITERATE & INHERIT

What broke — Where did your teammate stumble, and what did v2 cut, sharpen, or rename?
The capstone seed — Is this build the capstone, or a piece of something bigger?
Inheritance circle — Who should and shouldn't have access to your build, and what governance decision did that surface?

WEEK 6 — DILIGENCE (Part 1)

Walk us through your output trace: what did Claude produce, what did you change, and why?
Has anyone had a moment where they almost sent something without reviewing it? What happened?

WEEK 7 — DILIGENCE (Part 2) + PEER REVIEW

Present your capstone draft. Peer reviewers: Is the governance config specific enough? Is the workflow complete? Would you know how to follow it?
What cross-unit connections do you see?

WEEK 8 — CAPSTONE

Capstone presentations (3 slides max). After each: What would you steal from this project for your unit?
Closing round: What's one thing you know now that you didn't know 8 weeks ago?

Complete Your Pre-Assessment First

Resources below unlock after you complete the Pre-Launch Assessment. This ensures we capture your genuine starting point before any content exposure. Open Assessment →

Pilot Resources

What you need, when you need it.

Week 1
Foundation
Available

Start Here Reference

Interfaces, files, skills, governance, responsible use

View Reference →
Available

Kickoff Deck

The frameworks, the why, and the 60-day roadmap

View Deck →
Available

User Guide

Comprehensive guide to Chat, Cowork, Projects, Skills, and Connectors

View Guide →
Week 2
Delegation — Automation and Augmentation

Every time you hand work to Claude, you're making a delegation decision. Are you automating — giving Claude a defined task and reviewing the result? Or augmenting — thinking alongside Claude to shape something together? Start with the work your team does every week and notice which mode fits.

Leadership Lens — Three Horizons

Every leader in this pilot is climbing the same ladder:

H1 · Personal gain
Faster, better drafts for the work on your own plate.
H2 · Team gain
Patterns your team inherits — a Project, a workflow, a shared move.
H3 · Enterprise gain
Moves that scale across DC CAP and become how we operate.

This week is H1 by design. Get reps on your own deliverables. H2 arrives in Week 5 when you design your Claude Project. H3 is the capstone — what your team ships becomes a pattern the enterprise can adopt.

Available

AI Fluency Framework

The 4D model, three modalities (Automation, Augmentation, Agency), and the competency-to-modality matrix

View Framework →
Available

AI Governance Framework

Data classification is the first delegation filter. Know what you can and can't hand to Claude.

View Framework →
Worked Example — One Task, Three Horizons

Task: Prepare DC CAP's quarterly funder update for KPMG.

H1 · Personal (this week)

Paste this quarter's three data points — retention at 85%, the April 6 pilot launch, the new partnership — into Claude: "Draft a one-page KPMG update. Professional tone. Include a subject line." You review, edit, send. The draft is faster. The judgment stays yours.

Delegation: Automation. Your review is where judgment enters.

H2 · Team (within 4–6 weeks)

You notice every funder update your team writes follows the same shape. You turn the prompt into a Claude Project: system instructions that name KPMG's priorities, the quarterly cadence, DC CAP's tone, and a link to the latest retention data. Anyone on the development team now produces a funder-ready first draft in ten minutes.

Delegation: Augmentation. Your judgment shapes the Project so the team inherits a pattern, not a one-off.

H3 · Enterprise (this pilot cycle)

Three units surface the same pattern — development, academic affairs, operations. DC CAP stands up one external-update engine: a shared Claude Project with governance-tier rules, voice calibration, and required review steps. The whole organization produces on-brand external communications at speed.

Delegation: Agency. Claude operates inside guardrails leadership sets. This is what the capstone climbs toward.

Takeaway: The same task lives at three horizons. Week 2 is H1 reps. The H2 moment is when your own work starts looking like a pattern. Notice it — that's the capstone signal.

This Week's Challenge

Take a real deliverable from your week and run it through Claude. Then bring three answers to Friday:

  1. Personal output (H1): What is something you did with AI this week to strengthen your own work output?
  2. Delegation: Was this Automation or Augmentation — and how did you decide?
  3. Team & enterprise (H2/H3): Did this also create something your team could use or benefit from? Did it make possible an organization-wide skill or capability DC CAP doesn't have today? If not, could it have — and how?

Governance Pulse

Before you start, check: what data tier does this task involve? If Tier 1 or 2, stop and consult the framework. Leader scope: what tier does most of your team's recurring work sit in — and where's the biggest gain if you handle that tier well?
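
The pulse check above is a simple gate, and it can be written down as one. This is a minimal sketch under stated assumptions: the function name `governance_pulse` is hypothetical, and the source only says to stop for Tier 1 or 2. What Tier 3 and Tier 4 permit is defined by the AI Governance Framework, so "proceed" here means "proceed under the framework's normal rules", not "anything goes".

```python
# Tiers that require a stop, per the Governance Pulse above.
STOP_TIERS = {1, 2}
# The pilot uses a 4-tier data classification.
VALID_TIERS = {1, 2, 3, 4}


def governance_pulse(tier: int) -> str:
    """Return the action for a task, given the data tier it involves."""
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown data tier: {tier}")
    if tier in STOP_TIERS:
        return "stop: consult the AI Governance Framework"
    # Assumption: Tier 3 and 4 work continues under the framework's rules.
    return "proceed"


if __name__ == "__main__":
    for tier in sorted(VALID_TIERS):
        print(tier, governance_pulse(tier))
```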

Week 3
From Reps to Patterns

Week 2 was reps. By now you've seen Claude help with real work — and you've also seen drafts that fell flat. This week is the move that turns reps into patterns: describing the unit of work precisely enough to know what kind of thing you're building. A one-shot? A Project? A Skill? An automation? Each one reuses something different — the context, the move, or the trigger. Naming what's reusable is how your H1 reps become H2 inheritance.

Leadership Lens — The Four Containers

Every Claude move you make falls into one of four containers. The container is defined by what gets reused:

One-shot
Nothing reusable. You won't do this again — and you can say why in one sentence.
Project
The context is reused. Same body of knowledge, many different asks, for weeks.
Skill (the artifact, not the human capability)
The move is reused. A short written procedure — when to trigger it, what good looks like, what to avoid — that anyone on the team can run.
Automation
The trigger is reused. Runs on a cadence or an event — no human has to remember.

The Description sub-skill: you can't pick the right container until you describe the unit of work precisely. What is the work? Who's it for? How often does it happen? What does good look like? Description is the design step that comes before container choice.
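
The container rule above is a lookup keyed by what gets reused, with Description as a precondition. The sketch below illustrates that ordering. All names in it (`WorkDescription`, `Reusable`, `pick_container`) are hypothetical, and it assumes a description counts as complete once all four questions have a non-empty answer.

```python
from dataclasses import dataclass
from enum import Enum


class Reusable(Enum):
    NOTHING = "nothing"
    CONTEXT = "context"
    MOVE = "move"
    TRIGGER = "trigger"


# The container is defined by what gets reused.
CONTAINER_FOR = {
    Reusable.NOTHING: "One-shot",
    Reusable.CONTEXT: "Project",
    Reusable.MOVE: "Skill",
    Reusable.TRIGGER: "Automation",
}


@dataclass
class WorkDescription:
    what: str             # What is the work?
    audience: str         # Who's it for?
    frequency: str        # How often does it happen?
    good_looks_like: str  # What does good look like?


def pick_container(desc: WorkDescription, reusable: Reusable) -> str:
    """Refuse to pick a container until the work is fully described."""
    missing = [name for name, value in vars(desc).items() if not value.strip()]
    if missing:
        raise ValueError(f"describe the work first; missing: {missing}")
    return CONTAINER_FOR[reusable]


if __name__ == "__main__":
    desc = WorkDescription(
        what="Partner email in DC CAP voice",
        audience="University partners",
        frequency="Weekly",
        good_looks_like="Warm, professional, names the relationship",
    )
    print(pick_container(desc, Reusable.MOVE))
```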

Available

Container Decision Worksheet

A four-step decision tool for this week's challenge. Describe the work, name what's reusable, pick the container, draft the artifact. Print or fill in the browser.

Start Worksheet →
Available

AI Fluency Framework

The 4D model and three modalities. Container choice maps directly onto Automation, Augmentation, and Agency.

View Framework →
Available

AI Governance Framework

Containers bake governance in. A Project's reference files carry the data tier of whatever you load. Know the rules before you build.

View Framework →
Worked Example — One Job, Four Containers

The job: Stay in genuine, on-brand contact with DC CAP's 13 university partners.

One-shot

"Draft a note to Georgetown's interim dean explaining why our timeline shifted this cycle." The dean is interim. The timeline shift was a one-time event. You'll never write this exact note again.

What's reused: nothing. The discipline: name why it's a one-shot, so you don't accidentally rebuild it next week.

Project — "Partner Engagement"

Stand up a Claude Project loaded with the partner contact list, MOU terms, every email sent in the last year, the DC CAP voice guide, and current cycle dates. Now any partner-related ask runs against the same body of knowledge: a draft email, a meeting brief, "what did Howard say in February?", a side-by-side of how three partners responded to the same ask.

What's reused: the context, across many different tasks, for weeks.

Skill — "Draft a Partner Email in DC CAP Voice"

Write the move down once. When to trigger it (any partner email, any topic). What good looks like (warm, professional, names the relationship, never asks twice). What to avoid (generic openers, formal closers that read cold). Now any teammate can run the same move for any partner — they describe the partner and the topic inline; the procedure stays consistent.

What's reused: the move, across all 13 partners and every topic.

Automation — "Quiet Partner Watch"

Every Monday morning, surface partners DC CAP hasn't touched in 30 days, with a one-line suggested reach-out for each. The trigger fires on its own. The reach-outs still pass through a human — but no one has to remember to look.

What's reused: the trigger. The work runs on a cadence, not on memory.
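
The selection step behind a watch like this is small. The sketch below is one possible shape, not how DC CAP has built it: the function names and the contact dates are hypothetical, and it assumes "hasn't touched in 30 days" means the last contact was 30 or more days ago. Drafting the suggested reach-out and routing it to a human are left out.

```python
from datetime import date, timedelta


def is_monday(today: date) -> bool:
    """The watch runs on a weekly cadence; Monday is weekday 0."""
    return today.weekday() == 0


def quiet_partners(last_contact: dict[str, date], today: date,
                   quiet_days: int = 30) -> list[str]:
    """Partners whose last contact was `quiet_days` or more days ago."""
    cutoff = today - timedelta(days=quiet_days)
    return sorted(name for name, last in last_contact.items() if last <= cutoff)


if __name__ == "__main__":
    today = date(2026, 5, 4)
    # Hypothetical contact log; real data would come from the partner tracker.
    log = {
        "Georgetown": date(2026, 4, 30),
        "Howard": date(2026, 3, 20),
    }
    if is_monday(today):
        for partner in quiet_partners(log, today):
            print(f"Quiet for 30+ days: {partner} (human review before any reach-out)")
```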

Takeaway: Same job, four shapes. The Project is the workspace. The Skill is the move. The Automation is the trigger. The one-shot is the deliberate exception. Picking the wrong container is how teams end up rebuilding the same prompt every Tuesday.

This Week's Challenge — Container Design

Pick one Claude move from your last two weeks. Answer in three steps:

  1. Describe the work. What's the unit? Who's it for? How often does it happen?
  2. Name the reusable thing. Context, move, trigger, or nothing?
  3. Build the matching artifact. Project → set up the workspace with system prompt and reference files. Skill → write a short SKILL.md (when to trigger, what good looks like, what to avoid). Automation → sketch trigger, input, output, human checkpoint. One-shot → write the one sentence that says why it's a one-shot.

Bring the artifact to AI Friday. Friday's question: whose Skill should we steal? Whose Project should be a Skill? What did someone automate that the rest of us are still doing by hand?

Governance Pulse

Containers bake data classification in. The reference files in your Project carry their tier with them — anyone with Project access has access to that data. Before you build, ask: what tier is the most sensitive thing I'm about to load? Who should and shouldn't be in this Project? Leader scope: if your team builds three Projects this month, that's three governance decisions you own.
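
One way to answer "what tier is the most sensitive thing I'm about to load?" is to treat the Project as carrying the tier of its most sensitive file. A minimal sketch, assuming lower tier numbers mean more sensitive data (consistent with the Tier 1 and Tier 2 rules in this pilot). The function name and file names are hypothetical.

```python
def project_tier(file_tiers: dict[str, int]) -> int:
    """A Project is as sensitive as its most sensitive reference file."""
    if not file_tiers:
        raise ValueError("no reference files loaded")
    # Assumption: Tier 1 is the most sensitive, so the minimum wins.
    return min(file_tiers.values())


if __name__ == "__main__":
    files = {"voice_guide.md": 4, "partner_contacts.csv": 2}
    print(f"Everyone with Project access can reach Tier {project_tier(files)} data")
```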

Weekly Reflection
Confidence level (1-5)
Week 4
Build Worth Keeping

Week 3 named four containers. Week 4 answers the harder question: not everything that could be a Project should be a Project. This week you decide what passes the bar — then you build one team-level asset that earns its keep. The sharper your customization, the more your build replaces tribal knowledge instead of duplicating ChatGPT.

Leadership Lens — The Build Filter

Five questions decide whether the work in front of you deserves a Project or Skill. If three or more are clearly yes, build it. If not, put it down — pick a different candidate.

01 · Frequency
Does this work happen often enough to justify the build? Work that comes up less than once a month is a one-shot, not a Skill.
02 · Variability
Does the move repeat with new inputs? If every instance is genuinely different, lean Project. If the procedure is identical, lean Skill.
03 · Inheritance
Will the team actually use this — or am I building a private convenience? Private is fine; just don't pretend it's a team asset.
04 · Stakes
What does getting this wrong cost? Higher stakes earn more customization investment and tighter governance.
05 · Substitution
Does a better tool already do this? Don't build a Claude version of Salesforce. Use Salesforce.

The Discernment cut: the wrong build wastes the team's time. The deliberate "no" is just as much of a leadership move as the build itself.
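
The filter's scoring rule is easy to state in code. This is a sketch, not the worksheet itself. It assumes each criterion is scored as a pass (True means the answer favors building), which matters for Substitution: a pass there means no better tool already does the job. The names `CRITERIA` and `build_decision` are hypothetical.

```python
CRITERIA = ("frequency", "variability", "inheritance", "stakes", "substitution")
PASS_BAR = 3  # three or more clear passes means build


def build_decision(scores: dict[str, bool]) -> str:
    """Apply the three-of-five rule to a scored candidate."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"score every criterion; missing: {missing}")
    passes = sum(1 for c in CRITERIA if scores[c])
    return "build" if passes >= PASS_BAR else "pick a different candidate"


if __name__ == "__main__":
    candidate = {
        "frequency": True,
        "variability": True,
        "inheritance": False,
        "stakes": True,
        "substitution": False,
    }
    print(build_decision(candidate))
```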

The Customization Play — Make it the team's, not Claude's

A team build is only worth keeping if it carries something Claude doesn't already know. Three vectors decide depth:

Org context
DC CAP voice, ground-truth metrics, partner list, the rules of how we talk to families. Encoded in custom instructions or SKILL.md.
Outside authority
Research papers, federal datasets, peer-org playbooks, brand guides, Anthropic docs. Loaded as reference files or cited inside the procedure.
Workflow shape
When to trigger. What good looks like. What to avoid. Where the human checkpoint sits. The procedural spine.
Available

Build Filter Worksheet

Score your candidate against the five criteria. Three "yes" or more, you build. Print or fill in the browser.

Run the Filter →
Available

Three Builds You Can Steal

Three real anatomies — a team Project, a team Skill, and a deeply customized Skill — with the actual specs. Steal the structure for your own.

Open Gallery →
Available

AI Governance Framework

Custom instructions and reference files carry data tiers. Verify the tier of every file you load before the team inherits it.

View Framework →
Worked Example — How Deep Customization Goes

The candidate: a Skill that drafts and edits anything Preston will sign — board memos, donor briefs, LinkedIn posts, conference remarks — in his voice.

Without customization

Generic AI prose. Banned words sprinkled through every paragraph. Em-dash chains. Hedged closes. Drafts that sound like every other LinkedIn post — and require a full rewrite before Preston will put his name on them.

Three sources loaded

Org context (DC CAP voice rules; the explicit banned-word list; the accessibility-and-impact framing). Outside authority (a curated .md archive of Preston's own published writing — fifty paragraphs from prior work that the voice was actually built on; the BLUF structural pattern from executive-communication standards). Workflow shape (BLUF order enforced, banned constructions named, the human checkpoint before anything goes out).

What the team inherits

First-pass drafts arrive with the voice intact. Light edits replace structural rewrites. The structural pattern — BLUF, separate banned-words and banned-constructions lists, a curated archive of own work as reference — is portable. Any DC CAP author can lift the shape and fill it with their own voice. The Skill replaces the implicit judgment that used to mean re-reading every draft three times.

Takeaway: the depth of customization is the difference between a Skill that saves five minutes and one that replaces a whole layer of tribal knowledge. See three full anatomies →

All three builds in full — steal the structure

Three real Preston-owned skill files in active use at DC CAP — the SKILL.md is the part you can copy. Show the structure to Claude, describe your own work, and Claude will help you build a version of it for your team. Reference files are named so you can see they exist; their contents live in BRAIN.

Build 01 · One skill inside a 5-agent system · Reuses context

dev-office-director

Compiles the week's grant prospecting activity into a 5–7 slide branded briefing deck for Eric. Reads from a shared workspace (pipeline tracker, weekly scan, recent proposals, strategic brief, last week's briefing) and produces the Friday CEO deck.

This is one of five agents. The Director sits inside a larger Development Office system that also runs Scout (finds funding opportunities each Monday), Fit Analyst (scores opportunities + builds funder dossiers), Grant Writer (drafts proposals), and Strategic Auditor (4-lens review of every draft). Each agent is its own SKILL.md with strict context boundaries — no agent reads everything. The system runs on a Mon 8am scan / Fri 3pm briefing cadence with a shared .learn/ memory so the whole system gets smarter over time.

Container: Skill (1 of 5 in the Dev Office agent system)
Owner: Preston (BRAIN-original)
Inheritance: Eric · weekly
Highest tier: Tier 2 (writes briefings on live pipeline)

SKILL.md (condensed from the real 173-line file — copy this structure and Claude can help you adapt it to your team's weekly artifact)

---
name: dev-office-director
description: Development Director agent for DC CAP's Grant Prospecting
  & Proposal Pipeline — compiles the week's activity into a branded
  CEO briefing deck for Eric. Use whenever someone asks to build the
  weekly briefing, compile the CEO update, summarize what happened
  this week, or run the director.
---

# Development Director

You compile the week's development activity into a clear, branded
briefing deck for Eric. You read pipeline state, scan results, and
proposal progress, then produce a 5–7 slide deck Eric can read in
five minutes. You don't search for grants, assess fit, write
proposals, or audit drafts.

## Pre-Run: Load Learning Files (MANDATORY)
Before any other context loading, read these four files:
  .learn/errors.md      — past failure patterns to screen against
  .learn/canonical.md   — single source of truth for every DC CAP figure
  .learn/glossary.md    — load-bearing phrases to use verbatim
  .learn/lessons.md     — durable lessons from prior runs

These files are the durable memory of the Development Office. Reading
them at session start prevents recurring errors. Do not skip.

## Context Loading
Read these files (and only these):
  1. pipeline.md                 — living pipeline tracker
  2. scans/YYYY-WNN.md           — this week's scan (most recent)
  3. proposals/ (last 7 days)    — recent proposal activity
  4. strategic_brief.md          — pipeline table only
  5. briefings/ (last)           — for tracking week-over-week movement

Do NOT read strategy.md, preston.md, org_intelligence, or scan_criteria.md.

## Skills to Invoke
- dccap-brand                  → colors, typography, logo (CEO palette)
- pptx                         → build the actual file
- preston-writing              → MANDATORY voice check before saving
- data-interpreter             → metric framing for CEO audience
- checking-communications      → final policy/voice/brand pass

## Briefing Deck Structure (5–7 slides, skip if empty)
  1. Title — week of [Mon] – [Fri], DC CAP branded
  2. Executive Summary — BLUF: what happened → so what → decision needed
  3. New Opportunities — Funder | Opportunity | Lane | Deadline | Amount
  4. Fit Assessment — Funder | Tier | Rationale (highlight Pursue tier)
  5. Pipeline Status — full pipeline + week-over-week movement
  6. Proposals in Progress — Funder | Status | Version | Findings | Next
  7. Recommended Actions — numbered, specific, owner-named

## What You Do Not Do
- Search for grants (Scout's job)
- Assess fit (Analyst's job)
- Write proposals (Writer's job)
- Audit drafts (Auditor's job)
- Editorialize beyond the data — report, recommend, stop

## Verification Gate (MANDATORY — Gate 4)
No Tier 4–5 claim reaches Eric without an explicit flag.
1. Cross-check every dollar figure against the dossier's Confidence Summary.
   Unconfirmed figures get [unconfirmed] in the deck OR get omitted.
2. Cross-check every deadline. Unverified deadlines get [unconfirmed].
3. Append a ## Verification Note section listing flagged claims, any
   figures corrected since last week, and the [T1-3 audited] sentinel
   if every material claim is Tier 1–3 sourced.
4. The PostToolUse hook blocks saves missing the sentinel.

## When You're Done
Report: "Weekly briefing complete for Week [NN]. Deck saved.
[N] new opportunities, [M] assessed, [K] proposals in flight.
Gate 4: [T1-3 audited / X claims flagged]. [X] actions for Eric."
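
The "read these files (and only these)" boundary in the SKILL.md above can be pictured as an allowlist. The sketch below is an illustration, not how the agent enforces it. It assumes the `YYYY-WNN` scan name uses the ISO week number, which is consistent with the W18 briefing for the week of April 27, 2026, and the glob patterns are a hypothetical encoding of the file list. It does not model the finer limits (last 7 days of proposals, last briefing only, pipeline table only).

```python
from datetime import date
from fnmatch import fnmatchcase

# Hypothetical glob encoding of the Pre-Run and Context Loading lists.
ALLOWED = (
    ".learn/*.md",
    "pipeline.md",
    "scans/*.md",
    "proposals/*",
    "strategic_brief.md",
    "briefings/*",
)
# Files the SKILL.md names as off-limits.
DENIED = ("strategy.md", "preston.md", "org_intelligence*", "scan_criteria.md")


def scan_path(day: date) -> str:
    """Scan file for the week containing `day`, assuming ISO week numbering."""
    iso = day.isocalendar()
    return f"scans/{iso[0]}-W{iso[1]:02d}.md"


def may_read(path: str) -> bool:
    """Deny wins over allow; anything not listed is out of bounds."""
    if any(fnmatchcase(path, pattern) for pattern in DENIED):
        return False
    return any(fnmatchcase(path, pattern) for pattern in ALLOWED)


if __name__ == "__main__":
    print(scan_path(date(2026, 5, 1)))
    for path in ("pipeline.md", "strategy.md", "notes/random.md"):
        print(path, may_read(path))
```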

Filled example — what the Director's actual output looks like (W18 briefing, funder names sanitized)

# Development Office Weekly Briefing
# Week of April 27 – May 1, 2026 | W18 Friday Operating Briefing
# Read time: 4 minutes

## Bottom Line Up Front

- Pursue-tier pipeline holds at 4 funders. Confirmed ceiling ~$1.2M.
  No new Pursue-tier opportunities this week. July 1 submission
  window is 61 days out.
- W18 scan added 3 Watch prospects. [Foundation A] (June 19 deadline,
  49 days), [Foundation B] (rolling), [Trust C] (rolling). All Tier 5
  sourced. Tier 2 verification required before actioning any.
- [Pursue Funder X] cold intro must go out today. No reply by
  mid-May = July 1 application window closes.
- Two Eric actions remain open from W17. [Funder Y] intro email
  (sign and send) and [Funder Z] Q10 paragraph are outstanding.

## Pursue Tier — Full Status

| # | Funder        | Fit    | Status                    | Deadline                | Ask                                       | Next Action                  |
|---|---------------|--------|---------------------------|-------------------------|-------------------------------------------|------------------------------|
| 1 | [Foundation 1]| 37/40  | In Review                 | July 1 (61d)            | $1M                                       | Q10 paragraph from Eric      |
| 2 | [Foundation 2]| 35.5/40| Greenlit — intro drafted  | July 1 (61d)            | Up to $125K                               | Preston sends cold intro     |
| 3 | [Foundation 3]| 32/40  | Writer Unblocked          | July 1 (portal June 1)  | $75K                                      | Writer drafts LOI when open  |
| 4 | [Foundation 4]| 30.5/40| Greenlit — intro ready    | Rolling                 | Est. $25K–$250K [UNCONFIRMED — T5 only]   | Eric reviews and sends       |

Confirmed Pursue-tier ceiling: ~$1.2M
(Foundation 1 $1M + Foundation 2 up to $125K + Foundation 3 $75K.
Foundation 4 excluded — dollar range is Tier 5 only.)

## Verification Note
All material claims Tier 1-3 sourced. [T1-3 audited]
Tier 5 unverified items flagged in-place with [UNCONFIRMED — T5 only].

Steal this: a Project-flavored Skill earns its keep when (a) it loads from a fixed list of files, not "all of BRAIN," (b) it has an explicit "What You Do Not Do" so it doesn't drift into other agents' lanes, (c) it ships with a verification gate that blocks saves of unverified claims, and (d) the output flags its own confidence — the [UNCONFIRMED — T5 only] tag in the table is the discipline that lets Eric trust the briefing cadence. The boundaries are the design.
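
The confidence discipline in the W18 example reduces to two checks: only confirmed figures count toward the ceiling, and a deck without the audit sentinel does not save. The sketch below reproduces the arithmetic from the table using its upper-bound amounts. The `Ask` class and function names are hypothetical, and the real gate runs as a PostToolUse hook whose implementation is not shown in the source.

```python
from dataclasses import dataclass

SENTINEL = "[T1-3 audited]"


@dataclass
class Ask:
    funder: str
    amount: int      # upper bound of the ask, in dollars
    confirmed: bool  # False when the figure is Tier 5 sourced only


def confirmed_ceiling(asks: list[Ask]) -> int:
    """Sum only the figures that passed verification."""
    return sum(a.amount for a in asks if a.confirmed)


def save_allowed(deck_text: str) -> bool:
    """Mirror of the gate: saves missing the sentinel are blocked."""
    return SENTINEL in deck_text


# Figures from the W18 Pursue-tier table above.
PURSUE = [
    Ask("[Foundation 1]", 1_000_000, True),
    Ask("[Foundation 2]", 125_000, True),
    Ask("[Foundation 3]", 75_000, True),
    Ask("[Foundation 4]", 250_000, False),  # excluded: Tier 5 only
]

if __name__ == "__main__":
    print(f"Confirmed ceiling: ${confirmed_ceiling(PURSUE):,}")
    print(save_allowed("All material claims Tier 1-3 sourced. [T1-3 audited]"))
```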

Build 02 · Skill · Reuses the move

adversarial-audit

Multi-agent adversarial review for any artifact — grants, strategy docs, board materials, presentations, emails, thought leadership. Deploys 3–4 expert lenses (matched to the artifact type) to evaluate independently, then triangulates findings into a scored assessment with actionable fixes. Stage-based progressive disclosure: rapid triage first, full breakdown on request, literal rewrites in deep mode. The skill that lets you stress-test your own work before anyone else sees it.

Container: Skill
Owner: Preston (BRAIN-original, v1.8)
Inheritance: Anyone at DC CAP shipping a deliverable that needs a stress-test
Highest tier: Tier 4 (the structure is shareable)
Eval-tested: 79% with-skill vs 33–56% baseline

SKILL.md (condensed from the real 147-line file — copy this structure and Claude can help you adapt the panels to your team's artifacts)

---
name: adversarial-audit-cowork
description: Multi-agent adversarial review for nonprofit and education
  teams — deploys 3-4 expert lenses to evaluate any artifact (grants,
  strategy docs, presentations, emails, thought leadership), then
  triangulates findings into scored assessment with actionable fixes.
  Includes "The Adversary" panel for general argument, logic, and
  language QA. Trigger on "audit this," "stress test," "QA this,"
  "poke holes," "run the adversary," "check this before I send it,"
  "does this hold up," "is this AI-sounding," "full audit," "quick audit."
---

# THE ADVERSARY — Adversarial Audit
You are an expert multi-lens auditor. When triggered, run this protocol exactly.

## STEP 1: INTAKE
Read the artifact silently. Then ask, before anything else:
  1. Purpose & audience: What is this for, and who reads it?
  2. Stakes & deadline: How high are the stakes? Any deadline?
  3. Anything I should know? (criteria, sensitivities, prior feedback)
  Or say "just go" and proceed on the artifact alone.

If "just go" → proceed, note missing context briefly at top of output.
If answered → use answers to select panel and mode. Then go. No follow-ups.

## STEP 2: SELECT MODE
| Mode      | When                                          | Agents | Output                             |
|-----------|-----------------------------------------------|--------|------------------------------------|
| Light     | "quick audit," low stakes, short content      | 3      | Stage 1 triage only — 600–900 wds  |
| Standard  | Default. Reports, applications, strategy.     | 3–4    | Stage 1 → ask → Stage 2            |
| Full      | "full audit," high stakes (>$100K, board)     | 4      | Stage 1 → Stage 2 → offer Stage 3  |

## STEP 3: SELECT PANEL (match artifact to panel)

Grant / LOI / Funding Proposal:
  - Program Officer        → "Would I advance this or decline it?"
  - Evidence Reviewer      → "Can I verify every claim?"
  - Budget Analyst         → "Does the budget prove they can do this?"
  - Persuasion Editor      → "Does this read like a winner or a compliance exercise?"

Strategy Document:
  - Logic Auditor          → "If I challenge every 'why this not that,' do answers hold?"
  - Evidence Checker       → "What's the evidence this strategy will work?"
  - Operator               → "Can they actually pull this off with what they have?"
  - Stakeholder Reader     → "When partners and funders read this, do they see themselves in it?"

Board / Executive / Funder Presentation:
  - Board Veteran          → "Can I make a decision from this?"
  - Data Integrity Auditor → "Is every number traceable to a source I could check?"
  - Strategy Translator    → "Does this tell a coherent story?"
  - Ask Evaluator          → "Am I ready to say yes, or do I need more?"

Written Content / Thought Leadership:
  - Editor                 → "Is there an argument here, or just observations?"
  - Fact-Checker           → "Which claims survive a challenge?"
  - Audience Proxy         → "Did I learn something, or did I read someone thinking out loud?"
  - Originality Critic     → "Have I read this before under a different byline?"

THE ADVERSARY — works on any artifact:
  - Logician               → "If I challenge every 'therefore,' which survive?"
  - Claim Auditor          → "Which claims is the author hoping I won't check?"
  - Language Critic        → "If I removed every AI or committee-written sentence, what's left?"
  - Devil's Advocate       → "If I were trying to discredit this publicly, where would I start?"

(Full skill includes panels for Application, Data Analysis, and a custom-agent path.)

## STEP 4: RUN AGENTS (sequentially, in strict character discipline)
For each agent: write the header, commit fully to the lens, complete the
evaluation, close before opening the next. Agents do NOT see each other's
outputs. Each produces:
  - Verdict (one sentence answering the driving question)
  - Score (X/10 with one-sentence justification)
  - Top 3 strengths + top 3 weaknesses (each with direct quote + specific fix)
  - Critical flags — anything that would cause rejection or embarrassment

## STEP 5: PRESENT — PROGRESSIVE DISCLOSURE

Stage 1 — Rapid Triage (always first):
  Verdict: [safe to send / needs work / do not send yet]
  Score: [X/10 composite]
  Critical flags: [bulleted; "None" if clean]

  Light mode + no critical flags → end here. Two specific fixes. Done.
  Standard mode → ask: "Want the full breakdown, or is this enough?"
  Full mode → proceed to Stage 2.

Stage 2 — Full Synthesis:
  1. Scorecard table (Agent | Score | Rationale | Critical flag Y/N)
  2. Where agents agreed (label "Convergent (N agents)")
  3. Where agents disagreed (each position + your read)
  4. Action items ranked: Critical → Recommended → Polish
  5. Residual risk — what this audit can't catch

Stage 3 — Deep Dive (Full mode, on request only):
  Per-agent full evaluations + literal rewrites for requested weaknesses.
  Show the improved version. Don't describe it.

## OPERATING RULES
1. Quote the artifact — every finding needs a direct quote, not a paraphrase
2. No cheerleading — if it scores 9/10, justify it rigorously
3. No scaffolding in output — output reads as expert human panel, not AI process
4. Verify before citing — never fabricate org facts; flag uncertainty
5. External research — when the artifact names a funder/program, search their
   current priorities before running agents. Surface contradictions as critical flags
6. Full mode = literal rewrites, not "this could be stronger"
7. One intake, then go — make reasonable assumptions, note them, proceed

Filled example — what Stage 1 Rapid Triage actually looks like (sample run on a hypothetical Q3 strategy memo, Standard mode)

Audit run: "Q3 Strategy Update" memo
Mode: Standard | Panel: Strategy Document (4 agents)
Intake: Audience = senior leadership; stakes = mid; deadline = Friday.

────────────────────────────────────────────────────────
Stage 1 — Rapid Triage
────────────────────────────────────────────────────────

Verdict: Needs work — convergent flag on the central growth claim.
Score: 6.5/10 composite
       (Logic Auditor 7 · Evidence Checker 5 · Operator 6 · Stakeholder Reader 8)

Critical flags:
• CONVERGENT (2 agents): Logic Auditor + Evidence Checker independently
  challenged "we expect 30% growth in Q4." Logic Auditor: "the 'therefore'
  doesn't follow from Q1–Q2 trend." Evidence Checker: "no source, no
  precedent, no model." When two lenses flag the same sentence cold, the
  sentence is the problem.

• Operator: the August timeline assumes hiring two roles. No hiring plan
  attached. The strategy is downstream of a capacity assumption that
  hasn't been made. Not fatal — but the dependency needs to be named.

• Stakeholder Reader: borderline. Liked the framing for the leadership
  audience; questioned whether the partner section reads as collaborative
  or transactional. Flag for tone review, not a blocker.

Two specific fixes before going further:
  1. Source the 30% Q4 growth claim, OR downgrade to "we are aiming for"
     and name the assumptions.
  2. Add a one-line capacity dependency under the August milestone.

Want the full breakdown, or is this enough to work from?

Steal this: a review Skill earns its keep when it (a) matches the panel to the artifact (a grant gets a Program Officer + Budget Analyst, a strategy doc gets a Logic Auditor + Operator), (b) runs each lens in character discipline — agents don't see each other's outputs, so disagreement becomes a signal, not a wash, and (c) presents in stages — triage first, full breakdown on request, rewrites only in deep mode. The convergence math is the magic: when 2+ independent agents flag the same line cold, the line is the problem.
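The convergence math can be sketched in Python. Two assumptions are labeled here and in the code: the composite is taken to be the plain average of the agent scores (the sample run's 6.5 is consistent with that, but the SKILL.md excerpt does not state the formula), and the `flags` dictionary is a hypothetical way of recording which sentence or section each agent flagged.

```python
from collections import defaultdict
from statistics import mean

# Scores from the sample Stage 1 triage above.
scores = {
    "Logic Auditor": 7,
    "Evidence Checker": 5,
    "Operator": 6,
    "Stakeholder Reader": 8,
}

# Hypothetical record of what each agent flagged, keyed by agent.
flags = {
    "Logic Auditor": ["we expect 30% growth in Q4"],
    "Evidence Checker": ["we expect 30% growth in Q4"],
    "Operator": ["August milestone"],
    "Stakeholder Reader": ["partner section"],
}


def composite(agent_scores):
    """Assumption: the composite is the plain average of the agent scores."""
    return round(mean(agent_scores.values()), 1)


def convergent(agent_flags, threshold=2):
    """Return items flagged by at least `threshold` different agents."""
    by_item = defaultdict(set)
    for agent, items in agent_flags.items():
        for item in items:
            by_item[item].add(agent)
    return {item: sorted(agents)
            for item, agents in by_item.items()
            if len(agents) >= threshold}


print(composite(scores))   # 6.5
print(convergent(flags))
```

Only the growth claim comes back as convergent, because it is the only item two independent lenses flagged.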

Build 03 · Deeply customized Skill · Reuses the move with depth

preston-writing

The voice engine for everything Preston signs — board memos, LinkedIn posts, donor briefs, policy comments, conference remarks. Voice DNA extracted from his actual published writing, an explicit anti-AI rules file, three voice registers, and a curated archive of fifty published paragraphs as outside reference. Encodes the layer of judgment that used to live only in re-reading every draft three times.

Container: Skill (273 lines) · Owner: Preston (BRAIN-original, v2.4) · Inheritance: Pattern usable by any DC CAP author · Highest tier: Tier 3 (sample drafts)

SKILL.md — frontmatter + Voice DNA spine (condensed from the real 273-line file)

---
name: preston-writing
description: >
  Preston Magouirk's writing voice, style, and anti-AI rules for all
  content creation. Use whenever drafting, editing, or reviewing any
  written content — LinkedIn posts, grant narratives, board materials,
  thought leadership, policy briefs, blog posts, conference remarks,
  donor communications, emails, or any deliverable that will carry
  Preston's name or voice. Trigger on any content creation task.
---

# Preston Writing
Two jobs: (1) Match Preston's voice. (2) Eliminate AI-generated patterns.

## Voice DNA (extracted from Preston's actual published writing)

### How Preston Opens
Lead with a concrete fact, outcome, or situation. Never a question,
never a sweeping claim, never throat-clearing.
✓ "DC CAP had a breakthrough year in 2025."
✗ "In today's rapidly changing education landscape..."

### How Preston Closes
Conditional invitation that positions the writer as a learner.
Pattern: "If you [share this context], I'd [welcome/love to hear]..."

### Pronouns
"We" is the default for actions and outcomes. "I" is reserved for
personal reflection or learning. Roughly 4:1 we-to-I ratio in org content.

### How Preston Handles Data
Numbers bare, no adjectives. Signature phrase: "For context."
✓ "75-95% earned degrees. For context, ~37% of D.C. students do."
✗ "An impressive 75-95% — a staggering improvement..."

### The Deflation Principle
Where AI inflates, Preston deflates. Where AI declares, Preston invites.
Diagnostic: if a sentence would sound good in a TED talk, rewrite it.

### Transitions
Zero explicit transition words. No However, Moreover, Furthermore,
Additionally, That said. Transition by starting a new paragraph with
a new concrete fact.

## Anti-AI Rules (the short version)
Read reference/anti_ai_rules.md for the complete list. Quick scan:

Forbidden phrases: "doing the heavy lifting," "the real question is,"
"here's the thing nobody is talking about," "it's not about X, it's
about Y," "load-bearing" (metaphorical).

Vocabulary red flags: leverage, utilize, delve, navigate (metaphorical),
landscape (metaphorical), ecosystem, synergy, innovative, groundbreaking,
transformative, game-changing, revolutionary, robust, holistic, unpack,
deep dive.

Structure red flags: question openers, "In today's..." openers,
"As someone who..." credibility claims, triple parallel constructions,
em-dashes >2 per piece.

## Three Voice Registers
1. External Research — policy briefs, research reports, partner evals
2. Internal Strategy — strategic plans, board materials, vision docs
3. Leadership Reflection — LinkedIn, thought leadership, conference remarks

## Forbidden Content
- No partner or system disparagement (use "preparation gaps" not "fails")
- No emergency/crisis language (DC CAP is academic support, not crisis services)
- Frame around accessibility and impact

Reference files (the SKILL.md points to these; team copies the structure, builds their own)

Lives at BRAIN/skills/skills/preston-writing/reference/ and holds four guide files (voice_style_guide.md, anti_ai_rules.md, email_style_guide.md, source_paths.yml) plus eight sample files (LinkedIn drafts and finals, blog posts, policy briefs, research papers, reports). The SKILL.md routes to whichever sample matches the deliverable type. Data tiers: mostly Tier 4; drafts in progress are Tier 3.

Steal this: depth means every failure mode you have ever fixed by hand has a named home in the file. Voice rules go in their own list. Anti-AI rules go in their own file. Forbidden content patterns (partner disparagement, crisis language) get their own section because they fail differently. Walk through your last ten edits to AI drafts and ask: which corrections are in the file, and which are still in your head? The corrections still in your head are the next layer to encode.
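Part of the quick scan in the SKILL.md above is mechanical enough to run as a pre-check before a human read. The Python sketch below is an illustration, not the contents of anti_ai_rules.md: the word lists are a subset copied from the condensed excerpt, and the function name `quick_scan` is hypothetical. It only catches exact word matches, so it supplements the voice review rather than replacing it.

```python
import re

# Subsets copied from the condensed SKILL.md above, not the full rules file.
RED_FLAG_WORDS = ["leverage", "utilize", "delve", "synergy",
                  "robust", "holistic", "unpack"]
TRANSITIONS = ["However", "Moreover", "Furthermore", "Additionally", "That said"]
MAX_EM_DASHES = 2  # "em-dashes >2 per piece" is a structure red flag


def quick_scan(draft):
    """Return a list of plain-language findings for a draft."""
    findings = []
    for word in RED_FLAG_WORDS:
        if re.search(rf"\b{re.escape(word)}\b", draft, flags=re.IGNORECASE):
            findings.append(f"vocabulary red flag: {word}")
    for phrase in TRANSITIONS:
        # Case-sensitive on purpose: these are flagged as sentence openers.
        if re.search(rf"\b{re.escape(phrase)}\b", draft):
            findings.append(f"explicit transition: {phrase}")
    dashes = draft.count("\u2014")  # the em-dash character
    if dashes > MAX_EM_DASHES:
        findings.append(f"em-dashes: {dashes} (limit {MAX_EM_DASHES})")
    return findings


print(quick_scan("However, we leverage robust partnerships."))
print(quick_scan("DC CAP had a breakthrough year in 2025."))  # []
```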

Want the full side-by-side comparison and the cost/value table? Open the gallery →

This Week's Challenge — Build Worth Keeping

Take last week's container candidate. Three steps:

  1. Run the Build Filter. Score the five criteria. If three or more are clearly yes, you build.
  2. Build the v1. Project or Skill. Load at least one outside source — a research doc, a peer-org playbook, a verified data table, the brand guide, an Anthropic skill — and write it into the procedure.
  3. Hand it to a teammate. Watch them run it cold. Where it confuses them is where your customization is thin.

Bring the build to AI Friday. Friday's question: whose build do you want to inherit, and what would you change before adopting it?

Governance Pulse

Outside sources have data tiers too. Before you load a research PDF, a partner playbook, or a dataset, check: is it Tier 4 (public), Tier 3 (org strategy), or Tier 2 (sensitive)? Anyone with access to your build inherits access to its references. Leader scope: the build you ship this week is a governance decision, not just a productivity decision.
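One way to make the inheritance rule concrete is to treat a build as carrying the tier of its most sensitive reference. The Python sketch below assumes that a lower tier number means more sensitive data, which matches Tier 2 being described as sensitive and Tier 4 as public. The file names and the function `build_tier` are hypothetical.

```python
# Tier labels as described above. Assumption: lower number = more sensitive.
TIER_LABELS = {2: "sensitive", 3: "org strategy", 4: "public"}


def build_tier(reference_tiers):
    """A build is only as shareable as its most sensitive reference."""
    return min(reference_tiers.values())


# Hypothetical reference files loaded into a build.
references = {
    "brand_guide.pdf": 4,
    "strategic_plan_draft.docx": 3,
    "peer_org_playbook.pdf": 4,
}

tier = build_tier(references)
print(f"Treat this build as Tier {tier} ({TIER_LABELS[tier]})")
```

A single Tier 3 file is enough to make the whole build Tier 3 for everyone who inherits it.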

Weekly Reflection
Confidence level (1-5)

Week 5
Under the Hood

Four weeks of reps. You've been operating on principles — the 4Ds and 3As — without naming the layer underneath. Week 5 is that layer. Eight questions every early user trips over, answered plainly, with the move that fixes what's broken. Same principles, sturdier reasons. Better chats. Better tasks. Better Projects. Better Skills. Better outcomes.

Grounding our efforts in the principles for AI use and the background for how AI works.

The Principles You've Been Operating

Delegation
Choosing the modality.
Description
Naming the work precisely.
Discernment
Judging what's worth building.
Diligence
Verifying before it ships.
Automation — Claude runs, you review.
Augmentation — you and Claude think together.
Agency — Claude operates inside your guardrails.

The 4Ds are how we operate. The 3As are the shape your delegation takes. The eight questions below are why those moves work — or fail.

Part 1 of 3

Context & Memory

Q1 Description

What is a context window?

The plain answer. The context window is Claude's working memory for one conversation. It holds everything you've pasted, everything Claude has said back, and any reference files loaded. It has a ceiling — not a target.

Where it breaks. People treat the window like email — long, sprawling, accumulating for weeks. Past a certain point, things on page three get missed.

The move. Treat each chat like a focused work session. One task, clean window. When the task ends, the chat ends. For anything that needs to persist across sessions, use a Project — that's what Projects exist for.

Description quality is bounded by what sits in this window.

Q2 Description

Why does context matter?

The plain answer. Context is everything you load into the window before you ask. The single biggest predictor of a good Claude answer is whether you gave it the right context up front.

Where it breaks. "Write me an email to a parent" gets the average of the internet. "Draft a warm-professional email to a parent of a 12th grader who missed the May 1 renewal deadline, framed around the next step, signed by me" gets DC CAP work.

The move. Load names, numbers, dates, voice rules, audience, and format every time. Or load it once into a Project and stop retyping. Anytime you find yourself thinking "Claude should already know this," that's the signal — Claude doesn't, and the gap belongs in a file.

Empty context, generic answer. Rich context, sharp answer.

Q3 Description carried forward

Why does memory matter?

The plain answer. Claude doesn't remember you across chats. Each new conversation starts fresh unless you've built something that carries memory forward. Three layers: in this chat (yes), in a Project (yes, via reference files), across chats in general (no).

Two kinds of memory — and why the difference matters

External memory

What Claude absorbed from training data — vast but fuzzy. Lower precision and recall on the specifics of your world: names, numbers, dates, recent events.

This is where most hallucination comes from.

Internal memory

What's in the context window right now — your prompt, the chat history, the Project's reference files, anything you paste in.

High precision, high recall. Claude can quote it back verbatim.

The leader move: when accuracy matters, load the fact. Don't ask Claude to recall it. Internal memory beats external memory every time.

Where it breaks. The months-long chat. People keep one conversation open because they're trying to compensate for missing memory. The chat degrades long before the month is up. Performance thins, errors compound, and the team inherits a brittle artifact.

The move. Stop making one chat remember everything. Memory belongs in files. A Project carries context across tasks. A Skill carries a procedure across runs. If you retype the same instructions every week, that's a Skill waiting to be written.

Every Skill you write is Description carried forward.

Part 2 of 3

Model Behavior

Q4 Diligence signal

What is model drift?

The plain answer. Two meanings. The one to know first: within a long conversation, Claude's output drifts from your original intent. Early instructions get diluted. Tone shifts. Errors compound. (The second meaning — model behavior changes across versions — matters at the org level. DC CAP pins Opus 4.6 and Sonnet 4.6 as current defaults.)

Where it breaks. You correct Claude on something at message 5. By message 30, the same error is back. Newer signals in the window crowd out older ones.

The move. When the trail feels off, start fresh. Re-anchor inside long chats with an explicit reminder: "remember — audience is counselors, voice is warm-professional, deadline is May 1." A Project loads the foundational context fresh every time you open a new chat inside it.

When the answer thins, that's a Diligence signal — not a Claude failure.

Q5 Delegation

What model should I use?

The plain answer. Claude comes in three sizes. Each handles a different kind of job: fast, balanced, careful.

Opus 4.6 — careful
Highest-stakes drafting. Deep multi-step reasoning. Board and funder work where being wrong is costly.
Sonnet 4.6 — balanced
The DC CAP default. Strong reasoning, fast enough for everyday work. Handles roughly 80% of what you do.
Haiku 4.5 — fast
Quick lookups. Classifications. Light summaries. The wrong tool for anything judgment-heavy.

Where it breaks. Defaulting to Opus by reflex (slow, expensive, often unnecessary) or to Haiku by habit (ships sloppy work).

The move. Start with Sonnet. Upgrade to Opus when stakes warrant the depth. Drop to Haiku when speed matters more than judgment. Counselor email → Sonnet. Gates LOI → Opus. Summarize a 10-page PDF → Haiku or Sonnet. Voice scan on a board memo → Sonnet with thinking on.

Open the Model Picker →

Model selection is a Delegation decision, not a vanity choice.
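The routing rule above (start with Sonnet, move up for stakes, move down for speed) is simple enough to write as a function. This is a sketch under stated assumptions: `pick_model` and its two parameters are hypothetical, and the returned strings are display names from this page, not API model identifiers.

```python
def pick_model(high_stakes=False, judgment_heavy=True):
    """Start with Sonnet; move up for stakes, down for speed."""
    if high_stakes:
        return "Opus 4.6"    # board and funder work where being wrong is costly
    if not judgment_heavy:
        return "Haiku 4.5"   # quick lookups, classifications, light summaries
    return "Sonnet 4.6"      # the everyday default


print(pick_model())                      # counselor email -> Sonnet 4.6
print(pick_model(high_stakes=True))      # funder LOI -> Opus 4.6
print(pick_model(judgment_heavy=False))  # quick classification -> Haiku 4.5
```

Stakes are checked first, so a high-stakes task never drops to Haiku even when speed is tempting.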

Part 3 of 3

Trust & Output Quality

Q6 Diligence

What is hallucination, and how can I mitigate it?

The plain answer. Hallucination is when Claude confidently states something that isn't true. A fabricated quote. A made-up statistic. A paper that doesn't exist. A person who never said the thing.

Why it happens. Large language models are pattern completers. When they don't know, they generate text that sounds like the right kind of answer. Without grounding, they invent.

The move — four-part discipline:

  1. Load the source. If you ask about a partner, paste the partner's actual website, bio, or past emails. Don't ask Claude to recall.
  2. Ask for citations. "For each claim, name the source." Hallucinated sources are easier to spot than hallucinated facts.
  3. Treat names, numbers, and dates as highest-risk. These are where hallucination hides.
  4. Cross-check anything that lands in front of a funder, board, or partner.

This is the discipline behind the dev-office-director's [T1-3 audited] sentinel from Week 4.

Q7 Diligence

How can I verify and validate?

The plain answer. Two different checks. Both required before anything ships.

Verify — are the facts true? Names, numbers, dates, citations, claims.
Validate — does it fit the work? Right audience, right format, right tone, right move.

Where it breaks. People verify and skip validate — facts are right, tone is wrong for the audience. Or validate and skip verify — it reads great, the numbers are made up.

The move. Design verification into the workflow, not bolted on at the end. Pull at least one external check for any high-stakes claim. Read the output as the audience before you send. For the highest-stakes work, run it through the adversarial-audit Skill from Week 4's gallery — or hand it to a teammate.

Diligence is two moves, not one.

Q8 Description

How can I have better prompts, outcomes, and deliverables?

The plain answer. A great prompt has five parts.

  1. State the work, not the task. (Description)
  2. Anchor in concrete facts — named partners, real numbers, real dates. (Description + Diligence)
  3. Tell Claude the audience and the format. (Description)
  4. Iterate in dialogue. Show Claude what's wrong; don't start over. (Augmentation)
  5. Save what works. (Description carried forward — Project or Skill)

Where it breaks. "Write me an email" gets nothing useful. "Make a deck about our pilot" gets nothing useful. Vague in, vague out.

Before / After — same task, different prompt

VAGUE

"Write a follow-up email to a counselor."

SHARP

"Draft a warm-professional follow-up to Ms. Reyes at McKinley Tech HS. We met last Friday about a Class of 2026 applicant who missed the May 1 renewal deadline. Audience: counselor with a heavy caseload, mid-stress. Format: 3 short paragraphs, signed Preston. Facts: renewal window reopens 7/15, our office hours M–F 9–5, my direct line is on file. Goal: confirm the next step without blame, invite her to forward to the family."

The move. Every great prompt is a SKILL.md waiting to be written. The more reps you put in on the five-part structure, the less you have to rewrite it the next time.

Description is the most valuable move in this whole pilot.
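For anyone comfortable with a little Python, the first three parts of the prompt structure can be captured as a reusable template, which is one form of "save what works." The class name `PromptSpec` and its field names are hypothetical; the sample values are taken from the SHARP example above.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the parts of a prompt worth saving."""
    work: str                      # part 1: the work, stated precisely
    audience: str                  # part 3: who reads it
    fmt: str                       # part 3: the format
    facts: list = field(default_factory=list)  # part 2: concrete facts
    goal: str = ""

    def render(self):
        """Assemble the parts into one prompt string."""
        parts = [self.work, f"Audience: {self.audience}.", f"Format: {self.fmt}."]
        if self.facts:
            parts.append("Facts: " + "; ".join(self.facts) + ".")
        if self.goal:
            parts.append(f"Goal: {self.goal}.")
        return " ".join(parts)


spec = PromptSpec(
    work="Draft a warm-professional follow-up to Ms. Reyes at McKinley Tech HS.",
    audience="counselor with a heavy caseload, mid-stress",
    fmt="3 short paragraphs, signed Preston",
    facts=["renewal window reopens 7/15", "office hours M-F 9-5"],
    goal="confirm the next step without blame",
)
print(spec.render())
```

Part 4 (iterate in dialogue) stays a human move; the template only guarantees the first draft starts from full context.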

New

Week 5 Deck — Under the Hood

The AI Friday session deck. Each mechanic paired with a real work scenario outside DC CAP — from Mata v. Avianca and Air Canada to attorney case files, hospital triage, and spam filters.

Open Deck →

New

Model Picker (Nonprofit Tier)

Opus, Sonnet, Haiku — with named DC CAP example tasks for each. Printable one-pager.

Open Picker →

Available

AI Fluency Framework

The 4Ds matrix anchor. How Delegation, Description, Discernment, and Diligence interact across Automation, Augmentation, and Agency.

View Framework →

Available

AI Glossary

Plain-language definitions for context window, tokens, model, memory, and the rest of the terminology in this week.

View Glossary →

Worked Example — One Task, All Eight Questions Answered

The task: draft the quarterly counselor partnership update for DC CAP's 28 high school counselor partners.

Q1–Q2 · Context window + Context

A fresh chat, not a six-week running thread. Loaded into the window: the partner list (28 names + schools), DC CAP voice rules, this cycle's calendar, the prior quarter's update for tone reference.

Q3 · Memory

This sits inside a Project — "Counselor Partnerships" — so the next twelve quarterly updates inherit the partner list, voice rules, and cycle context. No retyping.

Q4 · Drift watch

Fresh chat inside the Project. Re-anchor mid-conversation: "remember — counselors with heavy caseloads, warm-professional tone, signed by Preston."

Q5 · Model

Sonnet 4.6. Stakes don't earn Opus — this is recurring partnership comms, not a board memo. Haiku won't hold the voice.

Q6 · Hallucination check

Every partner name verified against the live partner list. Any quoted counselor reply pasted in directly so Claude doesn't fabricate. Cycle dates checked against the cycle calendar.

Q7 · Verify + validate

Verify: names, dates, claims. Validate: tone reads warm-professional, format matches prior updates, the ask at the end is actionable.

Q8 · Prompt

"Draft the quarterly partnership update to our 28 high school counselors. Audience: counselors with heavy caseloads, mid-cycle. Format: 4 short paragraphs + a bulleted action list of three asks. Facts: the data table I pasted above, the cycle calendar in this Project, the prior quarter's update at the bottom for tone reference. Voice: warm-professional, signed Preston. Iterate with me on the opening before drafting the full thing."

Takeaway: the same partnership update you might have written in Week 2 with a one-shot prompt now has a back-end stack underneath. Same output, sturdier reasons. And the next time this update needs to go out, the Project does most of the work — because the answers to Q1, Q2, and Q3 are already in files.

Governance Pulse — Model Selection IS a Governance Decision

Defaulting everything to Opus burns nonprofit-tier compute and slows the team. Defaulting to Haiku ships AI-flavored slop. Sonnet is the responsible default; Opus is reserved for the highest-stakes drafting; Haiku is for lookups and classifications.

Leader scope: as your team builds Projects and Skills, you're picking the model for everyone who runs them. Pick on purpose.

Weekly Reflection
Confidence level (1-5)

Week 6
Getting It Right When Your Assistant Is Often Confidently Wrong

Week 5 named the mechanics. Week 6 deepens one of them — Claude getting things confidently wrong — and builds the verification discipline that catches it before it ships. Eight questions every newer user asks once they've watched a hallucination land. Plain-answer structure. Heavier focus on what to do, in what order, with what tools.

Hallucination is what the model does by design. Verification is what you do by design.

Where Confident-Wrong Hides — The Five Danger Zones

Names
People, programs, partners.
Numbers
Counts, percentages, dollars.
Dates
Deadlines, events, history.
Citations
Papers, quotes, sources.
Your World
DC CAP partners, scholars, internal facts.

External memory is fuzzy on specifics. These five are where you spot-check first — every time, before anything ships.

Part 1 of 3

Why Claude Gets It Wrong

Q1 Diligence

Why does Claude hallucinate in the first place?

The plain answer. Claude predicts the next word that fits the pattern of an answer. There's no database query happening underneath. When the pattern is strong — "the capital of France is" — the prediction is reliable. When the pattern is weak — "the most-cited paper on nonprofit AI adoption is" — the model generates something that looks like an answer. A plausible title. A plausible author. A plausible journal. None of which need to exist.

Where it breaks. People treat Claude like a search engine. Search engines retrieve. Claude generates. A retrieved fact is either there or it isn't. A generated fact is there whether or not it's true.

The move. Build the felt sense: every Claude answer is a generation. Hold one question in your head — "is the pattern strong enough here for the answer to be right?" Pattern is strong on broad, well-repeated things (definitions, well-known events, common code). Pattern is weak on specifics, recent events, and anything inside your own organization.

Real-world parallel — Mata v. Avianca, June 2023

Two attorneys filed a federal brief citing six prior cases. None of the six existed. The model (ChatGPT, in this case) had generated the kind of thing the prompt asked for: plausible case names, plausible courts, plausible citation numbers. The lawyers never opened a single one. The court fined them and the case became the canonical hallucination story. The lesson is simpler than "AI is bad": the model completed the pattern, and no one checked.

Every confident answer is a completion.

Q2 Diligence

Why does Claude sound so confident when it's wrong?

The plain answer. Claude has no internal sense of how sure it is. There's no quiet "I'm 30% confident" threaded through every sentence. The training data is mostly written by humans who sound confident, so Claude does too. A fabricated case citation reads the same as a real one because, to the model, both are answers shaped like answers.

Where it breaks. Newer users use tone as a quality signal. A confident answer feels true. An uncertain one feels weak. With Claude, that intuition inverts: the most confident-sounding answer is sometimes the one with the least grounding underneath it.

The move. Two reps to rebuild the reflex:

  1. Ask Claude what it's least sure of. Drop this into any answer that matters: "For each claim in this draft, mark high / medium / low confidence and tell me why." The hedging the model wouldn't volunteer comes out when you ask.
  2. Trust the shape of the claim over the tone of the sentence. A specific name + a specific date + a specific number is the highest-risk shape — and the most confident-sounding. Treat that shape as the signal to verify.

Confidence is the model's default register. Calibration is something you ask for.

Q3 Diligence

Where is Claude most likely to be wrong?

The plain answer. Week 5 named external memory: what Claude absorbed from training data, vast but fuzzy. External memory is strong on patterns and weak on specifics. The five danger zones at the top of this week (names, numbers, dates, citations, your world) are exactly where specifics live. So those are the five places to check first, every time.

Why your world is the highest-risk of all. Claude wasn't trained on your partner list, your scholar count, your renewal calendar, or your team's voice. When you ask, it pattern-matches to the nearest thing it knows — "education nonprofit in DC" — and invents plausible specifics. Plausible is the dangerous register: not obviously wrong, easy to miss, sometimes embarrassingly close to right.

The move. Two halves — one before the question, one after.

  1. Pre-empt. Load the source before you ask. If the question is about a DC CAP partner, paste the partner's actual page, the most recent email exchange, or the data row. Don't ask Claude to recall.
  2. Spot-check. After the answer, scan the riskiest claim shapes. Names, numbers, dates, citations, your-world specifics — those go on a one-line check list every time.

The closer the question gets to your specific world, the further it lands from Claude's strong patterns.
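Three of the five danger zones have a recognizable shape on the page, so a rough first pass can be automated. The Python sketch below is an illustration with hypothetical names (`PATTERNS`, `spot_check_list`) and deliberately simple patterns. It will over-flag and under-flag, and it cannot see names or your-world specifics at all, so those still need a human read.

```python
import re

# Checked in this order so "May 1" is reported as a date, not as the number 1.
PATTERNS = {
    "date": r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.? \d{1,2}\b",
    "citation": r"\bet al\.|\(\d{4}\)",
    "number": r"\$?\d+(?:,\d{3})*(?:\.\d+)?%?",
}


def spot_check_list(text):
    """Return {zone: [matches]} for the claim shapes worth checking first."""
    found = {}
    remaining = text
    for zone, pattern in PATTERNS.items():
        matches = re.findall(pattern, remaining)
        if matches:
            found[zone] = matches
        # Blank out what was matched so later patterns do not count it again.
        remaining = re.sub(pattern, " ", remaining)
    return found


sample = "According to Liu et al. (2023), 37% of students renewed by May 1."
print(spot_check_list(sample))
```

The output is the one-line check list: each item on it gets opened against a source before the draft ships.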

Part 2 of 3

Where It Compounds

Q4 Description

What is context overload, and why does loading more sometimes make Claude worse?

The plain answer. Loading a 50-page PDF doesn't mean Claude reads all 50 pages equally. Attention degrades through long contexts. The first and last pages stay sharp. The middle thins. The "Lost in the Middle" study (Liu et al., 2023) measured this on multi-document question answering and key-value retrieval: models scored highest when the relevant info sat at the start or end of a long context, and noticeably worse when it sat in the middle.

Where it breaks. "I gave Claude the whole report and asked about page 14. The answer was wrong." Right — page 14 is the middle. The model pattern-completed something that fit the question because the middle was hazy. The fix lives in the context.

The move. Three plays, in order of value:

  1. Excerpt instead of dumping. Paste the three pages that matter. Drop the rest. Sharper context, sharper answer.
  2. Surface the key facts twice — at the top and the bottom. Anchor what matters at the high-attention zones.
  3. Make Claude quote before answering. "Quote the exact sentence you're basing this on, then answer." If the model can't quote it cleanly, the answer isn't grounded — and you've caught the gap.

Sharper context produces sharper answers.

Q5 Description carried forward

Why do my earlier rules fade as the chat goes on?

The plain answer. Week 5 named the symptom: by message 30, the rule you set at message 2 is gone. Here's the mechanic. Attention weighting tends to favor recent tokens — that's how the architecture works. And the model's own outputs become its future inputs, so any small wobble in message 10 feeds the wobble in message 15. By message 30, the wobbles have compounded into a different voice and a different set of working assumptions.

Where it breaks. A chat that started precise ends generic. A draft that started with three specific constraints ends with two general ones. People notice the result ("Claude got worse") and rarely the cause (the early rules got crowded out by the conversation that followed).

The move. Three plays you can run today:

  1. Re-anchor every 8–10 messages. Just paste: "Remember — audience is counselors, voice is warm-professional, deadline is May 1. Confirm." Forces attention back to the original rules.
  2. Start fresh more often. Long chats aren't a virtue. One task per chat keeps drift from compounding.
  3. Move the rules out of the chat. Anything you keep re-anchoring belongs in a Project's reference files. External memory beats re-anchoring every time.
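
The re-anchor cadence in play 1 is simple enough to sketch in a few lines. This is an illustration only: the function names are hypothetical, and the interval of 8 is one assumed point in the 8–10 range above.

```python
# The rules from the example above; swap in your own.
RULES = "audience is counselors, voice is warm-professional, deadline is May 1"


def reanchor_due(message_count, every=8):
    """True when the chat has reached a multiple of `every` messages."""
    return message_count > 0 and message_count % every == 0


def reanchor_message(rules=RULES):
    """The one-line reminder to paste back into the chat."""
    return f"Remember: {rules}. Confirm."


# In a 30-message chat, these are the points where a reminder is due.
due_points = [n for n in range(1, 31) if reanchor_due(n)]
print(due_points)
print(reanchor_message())
```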

Drift is built into the architecture. Build the workflow that fights it.

Part 3 of 3

How to Verify — The Diligence Stack

Q6 Diligence

What manual checks should I always run?

The plain answer. Three checks. Ninety seconds total. Every time.

  1. Scan for the danger zones. Names, numbers, dates, citations, your-world specifics. Mark each one for spot-check.
  2. Ask Claude for the claim table. After the draft: "List every factual claim in the draft above as a table: claim, your confidence (high/medium/low), source. Be honest about confidence." Surfaces what the draft buried.
  3. Open one source. Just one. If Claude said "according to the May counselor report," open it. The single source you actually check is worth more than the ten you assume.
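
A rough first pass at the danger-zone scan can be automated. The sketch below is an assumption-heavy illustration: the patterns are crude, they will miss some items and over-flag others, and "your-world specifics" cannot be caught by a pattern at all. It builds a spot-check list, not a verdict.

```python
import re

# Crude, illustrative patterns (assumptions, not a vetted rule set).
DANGER_PATTERNS = {
    "number": r"\$?\d+(?:,\d{3})*(?:\.\d+)?%?",
    "date": r"\b(?:19|20)\d{2}\b",
    "name": r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b",
    "citation": r"(?i)\b(?:according to|cited in|per the)\b",
}


def danger_zone_scan(text):
    """Return {zone: [matches]} for every pattern that finds something."""
    hits = {}
    for zone, pattern in DANGER_PATTERNS.items():
        found = re.findall(pattern, text)
        if found:
            hits[zone] = found
    return hits


draft = ("In 2023, attorney David Moffatt argued that the airline owed "
         "approximately $2,400. The ruling has been cited in a dozen cases.")
hits = danger_zone_scan(draft)
for zone, matches in hits.items():
    print(zone, matches)
```

Every match still needs a human decision: the scan only marks where to look.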

Where it breaks. Two failure modes. Most common: skipping all three. The draft looks good and people send it. Less common but counterproductive: running heavy verification on everything. Manual checks are the floor. They run fast on every artifact; heavier gates stack on top for higher stakes.

The move. Make the ninety seconds a habit — small enough to run every time. The three checks always run. Q7 and Q8 are how the discipline scales up from there.

Manual verification is the floor. Heavier gates stack from there.

Q7 Description

How do I ground Claude in real sources instead of recall?

The plain answer. Most hallucinations happen when Claude is recalling. Loading the source, so the answer is retrieved from external memory instead of generated from internal memory, is the single highest-impact move in the verification stack.

Three grounding patterns — easiest to most disciplined

1. Paste the source inline

Before the question, paste the partner email, the policy page, the data row. Twenty seconds. Sharply reduces hallucination on those specifics.

2. Constrain the answer to the source

Add a one-line guardrail: "Only use facts from the document above. If a fact isn't there, say 'not in source' instead of guessing." Gives Claude permission to say it doesn't know — without permission, the model fills the gap.

3. Two-pass self-audit

First prompt drafts. Second prompt: "For every factual claim in your draft, mark whether you can trace it to a source I gave you, or whether you generated it from training. List the generated ones." The model will surprise you: it can often tell the difference, but doesn't volunteer it.

Where it breaks. People paste the source and still ask the recall question. ("Here's the partnership agreement — what year did we sign with Georgetown?") If the source has the answer, ask Claude to quote it. If it doesn't, paste the source that does.
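
When Claude does return a quote, checking that the quote really appears in the pasted source is a mechanical step. A minimal sketch follows, assuming a plain-text source. The function names and the sample agreement text are hypothetical, invented only for illustration.

```python
import re


def _normalize(text):
    """Straighten curly quotes, collapse whitespace, and lowercase."""
    text = text.replace("\u2019", "'").replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_is_in_source(quote, source):
    """True if the quoted sentence appears verbatim (after normalizing) in the source."""
    needle = _normalize(quote)
    return bool(needle) and needle in _normalize(source)


# Hypothetical sample text, not a real DC CAP document.
source = (
    "Section 1. The parties agree to meet quarterly.\n"
    "Section 2. Either party may end the agreement with 30 days notice."
)
print(quote_is_in_source("Either party may end the agreement with 30 days notice", source))
print(quote_is_in_source("The parties agree to meet monthly", source))
```

A quote that fails this check is a signal to paste a better source or reword the question, not proof that the answer is wrong.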

Grounded answers are retrievals dressed as generations.

Q8 Diligence carried forward

How do I build verification gates into the workflow itself?

The plain answer. Manual checks (Q6) and source grounding (Q7) are moves you make on one draft. Verification gates are the cadence those moves live inside — what your team inherits every time, every artifact. A gate is a checkpoint where work doesn't pass until specific checks have run.

The three gates — light, standard, full

Light — pre-send check
A one-paragraph prompt run before the draft: "List every name, number, date, and citation you intend to use, with the source. Wait for me to confirm before drafting." Forces the danger zones open before the model commits.
Standard — two-window method
One chat drafts. A second, fresh chat audits the draft. The audit chat has no investment in the draft and catches what the drafting chat protected. Closest thing to a peer in one person's hands.
Full — adversarial-audit Skill
Four lenses, run blind to each other. Convergent flags — two lenses landing on the same line for different reasons — are the verdict. Reserved for board, funder, and external-publication stakes.
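
The convergent-flag idea in the full gate can be sketched as a grouping step: collect each lens's flags, then keep only the lines where two or more lenses landed for different reasons. The lens names and flags below are hypothetical and are not taken from the adversarial-audit Skill itself.

```python
from collections import defaultdict


def convergent_flags(flags):
    """Keep lines flagged by at least two different lenses for different reasons.

    `flags` is a list of (lens, line_number, reason) tuples.
    """
    by_line = defaultdict(dict)
    for lens, line_no, reason in flags:
        by_line[line_no][lens] = reason
    return {
        line: lenses
        for line, lenses in by_line.items()
        if len(lenses) >= 2 and len(set(lenses.values())) >= 2
    }


# Hypothetical lens names and flags, for illustration only.
flags = [
    ("fact-check", 3, "dollar amount has no source"),
    ("legal", 3, "liability claim is stated too broadly"),
    ("tone", 5, "reads as defensive"),
    ("fact-check", 7, "citation count is unsupported"),
]
result = convergent_flags(flags)
print(result)
```

Single-lens flags are not discarded in practice. They simply rank below the convergent ones when time is short.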

Where it breaks. Two failure modes. Skipping the light gate because it "sounds basic" — light is what catches the obvious hallucinations on every draft. Running full on everything because it feels thorough — full takes ten minutes per artifact, and the team that runs full on everything stops running it on anything.

The team move. A verification gate is a leader decision. The cadence you pick is what your team's defaults become. Skip the gate on Tier 2 work and you've moved the exposure from one draft to a thousand. Open the Audit Picker →

Manual checks are what you do. Gates are what your team inherits.


Week 6 Deck — Confident Wrong

The AI Friday session deck. Each mechanic paired with a real-world parallel — eyewitness reconstruction, death by GPS, UN interpreters, jury attention, Bartlett's serial reproduction, the WHO surgical checklist, evidence in the record. Ends with a live stack run that catches five hallucinations in one paragraph.

Open Deck →

Audit Picker

The three gates (light, standard, full) mapped to artifact type and data tier. The Model Picker's sibling. Printable.

Open Picker →

AI Glossary

Plain-language definitions for hallucination, attention, context window, external memory, and the rest of this week's terms.

View Glossary →

Three Builds You Can Steal

Week 4's gallery. The adversarial-audit Skill is Build 02 — the full SKILL.md is the structure behind the Full gate in Q8.

Open Gallery →

Worked Example: One Draft, Five Planted Hallucinations, the Stack Catches All Five

Worked example with hallucinations planted against the public record. Every v2 fact is verifiable in Moffatt v. Air Canada, 2024 BCCRT 149 — go check it yourself after.

The artifact: a short recap note for pilot teammates who couldn't make Friday's session, summarizing the standout case from Week 5. Internal, mid-stakes — standard gate.

Draft v1 — what Claude generated from recall

For teammates who couldn't make Friday's session: the standout case we walked through was Moffatt v. Air Canada. In 2023, attorney David Moffatt argued before the Canadian Supreme Court that Air Canada's chatbot had hallucinated a bereavement-fare policy. The court ordered the airline to refund approximately $2,400 and held that Canadian companies are liable for AI-generated statements on their websites. The ruling has been cited in at least a dozen subsequent cases.

Step 1 — Danger-zone scan (Q3 + Q6)

Flagged on a 30-second read:

  • Name + role: "attorney David Moffatt" — recalled. The Week 5 deck named the plaintiff as a passenger. Verify.
  • Date: "2023" — recalled, not pasted. Verify.
  • Court: "Canadian Supreme Court" — recalled. Verify against the actual ruling.
  • Number: "approximately $2,400" — recalled, not pasted. Verify.
  • Citation: "cited in at least a dozen subsequent cases" — no source given. Verify.

Step 2 — Claim table (Q6)

Prompt: "List every factual claim in the draft above as a table —
         claim, your confidence (high/medium/low), source. Be honest."

Claim                                          | Confidence | Source
-----------------------------------------------|------------|------------------
Case discussed Friday was Moffatt v. Air Canada| HIGH       | Week 5 deck
"2023" ruling                                  | LOW        | Generated
"Attorney David Moffatt"                       | LOW        | Generated
Argued before "Canadian Supreme Court"         | LOW        | Generated
Refund of "approximately $2,400"               | LOW        | Generated
Air Canada bound by chatbot's hallucinated     | HIGH       | Public ruling
  bereavement policy                           |            |
Cited in "at least a dozen subsequent cases"   | LOW        | Generated

Notes: Five of seven claims are LOW confidence or generated. Only the
topic (Moffatt was Friday's case) and the underlying principle (Air
Canada held liable for chatbot output) are HIGH and grounded.
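
If the claim table comes back as pipe-separated text, tallying it takes a few lines. This sketch re-types the table above with the wrapped row joined onto one line. The parsing helper is an assumption about the output format, which Claude does not guarantee.

```python
from collections import Counter

# The claim table from Step 2, one row per line (wrapped row joined).
TABLE = """\
Case discussed Friday was Moffatt v. Air Canada | HIGH | Week 5 deck
"2023" ruling | LOW | Generated
"Attorney David Moffatt" | LOW | Generated
Argued before "Canadian Supreme Court" | LOW | Generated
Refund of "approximately $2,400" | LOW | Generated
Air Canada bound by chatbot's hallucinated bereavement policy | HIGH | Public ruling
Cited in "at least a dozen subsequent cases" | LOW | Generated
"""


def parse_claims(table):
    """Split each 'claim | confidence | source' row into a dict."""
    rows = []
    for line in table.strip().splitlines():
        claim, confidence, source = (cell.strip() for cell in line.split("|"))
        rows.append({"claim": claim, "confidence": confidence, "source": source})
    return rows


claims = parse_claims(TABLE)
tally = Counter(row["confidence"] for row in claims)
to_verify = [row["claim"] for row in claims if row["confidence"] != "HIGH"]
print(tally)
print(len(to_verify), "of", len(claims), "claims need a source check")
```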

Step 3 — Source-loaded rewrite (Q7)

Pasted into the chat: the actual Moffatt v. Air Canada, 2024 BCCRT 149 decision (Jake Moffatt, self-represented passenger; British Columbia Civil Resolution Tribunal; $812.02 award) and the Week 5 deck slide with the verified facts.

New prompt: "Rewrite the paragraph using only facts from the BCCRT decision and the Week 5 deck slide I pasted above. If a fact is not in either source, replace it with [verify] instead of guessing."

Draft v2 — what shipped

For teammates who couldn't make Friday's session: the standout case we walked through was Moffatt v. Air Canada, 2024 BCCRT 149. In early 2024, passenger Jake Moffatt represented himself before the British Columbia Civil Resolution Tribunal after Air Canada's chatbot fabricated a bereavement-fare policy and the airline refused to honor it. The Tribunal ordered Air Canada to pay Moffatt $812.02 and held that the airline is responsible for the information its chatbot generates. The ruling is being cited as Canadian precedent on AI liability [verify scope].

What the stack caught: the year (2023 → 2024), the name and role (attorney David → passenger Jake, self-represented), the court (Canadian Supreme Court → BC Civil Resolution Tribunal), the amount (~$2,400 → $812.02), and the citation-scope claim (generated → bracketed for follow-up). Five fixes from three minutes of verification against the public record.

For higher-stakes drafts going to a board or funder, Q8's full gate (the adversarial-audit Skill) layers four expert lenses on top of this. The Audit Picker linked above maps the gates to artifact types.

This Week's Challenge — Run the Stack on One Real Draft

Pick one draft you wrote with Claude this week — a counselor email, a partner update, an internal memo, a LinkedIn post. Run the Q6 ninety-second pass: danger-zone scan, claim table, open one source.

Bring to Friday: the artifact (sanitized if needed), the claim table Claude produced, and the one fact the stack caught that you didn't see yourself. Friday's question: which of the five danger zones is your blindspot?

Capstone Bridge — Name Your Verification Cadence

One sentence in your capstone scope. Fill the blanks:

"The verification cadence for this build is ______ gate (light / standard / full), run by ______, before ______ ships."

That sentence is the Diligence stub for Section 1. Week 7's peer review pulls from it directly.

Governance Pulse — Verification Cadence by Data Tier

Tier 4 (public) work runs Q6 manual checks every time. Tier 3 (internal strategy) layers in Q7 source-grounding by default. Tier 2 (sensitive partner or operational) requires Q8's standard or full gate, with the verdict in the record. Tier 1 (FERPA/PII) doesn't touch Claude at all. The data tier sets the verification floor; heavier gates can stack on top.
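
The tier rules above amount to a lookup table. A minimal sketch follows, assuming the checks are cumulative (each tier keeps the checks of the tier below it), which the text implies but does not state outright. The names are hypothetical.

```python
# Verification floor by data tier. Cumulative stacking is an assumption.
VERIFICATION_FLOOR = {
    4: ["Q6 manual checks"],
    3: ["Q6 manual checks", "Q7 source grounding"],
    2: ["Q6 manual checks", "Q7 source grounding",
        "Q8 standard or full gate, verdict in the record"],
}


def verification_floor(tier):
    """Return the minimum checks for a data tier, or refuse for Tier 1."""
    if tier == 1:
        raise ValueError("Tier 1 (FERPA/PII) data does not go into Claude at all.")
    if tier not in VERIFICATION_FLOOR:
        raise ValueError(f"Unknown data tier: {tier}")
    return VERIFICATION_FLOOR[tier]


for tier in (4, 3, 2):
    print(tier, verification_floor(tier))
```

This is the floor only. A leader can always assign a heavier gate than the tier requires.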

Leader scope: the cadence you pick for your team's build is the cadence every artifact inherits. Picking "light gate, run by the author" is a real decision with a real shape. Skipping the decision means the team picks for itself, one artifact at a time.

Weekly Reflection

Confidence level (1-5)

Reference Library: Available Anytime

AI Fluency Framework

The full 4D framework document

View Framework →

Key Terminology

Definitions and vocabulary reference

View Terms →

Plain-English Glossary

Every pilot term in everyday language

View Glossary →

Anthropic Academy

Free courses from the team behind Claude

Open Academy →

Permissions & Quick Reference

Quick Ref

What You Can Safely Do

You CAN:

  • Draft emails and memos
  • Summarize documents
  • Brainstorm ideas
  • Analyze anonymized data
  • Create templates
  • Prepare meeting agendas

Check First:

If your task involves student names, financial details, or partner-specific terms → consult the governance framework

Always:

Review every output before sending. Claude drafts; you own what goes out.

The Daily Briefing

Small, digestible content to build your AI fluency — one day at a time.

Your AI Field Guide

Curated voices, tools, and case studies to keep you sharp between sessions.

Voices in AI for Work

Practical AI thinking for knowledge workers and leaders

📘 One Useful Thing
🎬 Jeff Su
🎙️ Nathaniel Whittemore (AI Daily Brief)

Bleeding Edge Updates

Stay current on the frontier of AI development

🔬 Anthropic Research Blog
📰 The Rundown AI
📬 The Neuron
🎧 Latent Space Podcast
🎧 Hard Fork (NYT)

Claude Skills & Reference

Official Anthropic documentation for getting the most from Claude

📖 Prompt Engineering Guide
📋 Claude Model Overview
🛡️ Anthropic Safety & Governance
🎓 Anthropic Learning Hub

Ideas & Inspiration

How education, nonprofits, and philanthropy are deploying AI now

🎓 EDUCAUSE AI Library
🏛️ Gates Foundation on AI
🤝 Stanford Social Innovation Review: Tech
📊 NTEN (Nonprofit Technology Network)
📐 Brookings AI Policy