Pilot KPI Framework

Three tiers of measurement for one deployment story.

Overview

Most organizations measure the wrong things: only 23% can accurately measure AI ROI, even though 89% have deployed AI tools. This framework gives DC CAP three layers of evidence: what people do (engagement), how well they do it (proficiency), and what it produces (impact). Together, these layers tell the June board briefing story: the investment is working, here's the proof, and here's what we need for Q1.

Tier 1

Engagement

Are people using it?

Tracked weekly starting Week 1

Tier 2

Proficiency

Are they getting better?

Tracked biweekly starting Week 3

Tier 3

Impact

Is it producing results?

Tracked at Day 45 and Day 60

Tier 1: Engagement KPIs

Engagement measures adoption and basic usage patterns. These metrics track whether participants are actually using Claude and how frequently.

Monthly Active Users

Target: 7+ of the 9-person pilot cohort (~80%) active by Day 60.

Data source: Claude Enterprise admin panel.

Prompts per Participant per Week

Baseline: Established Week 1.

Target: 15+ by Week 6.

Segmentation: Light (1–5 prompts/week), Moderate (6–19), Heavy (20+).

Active User Segmentation

Track distribution across Light/Moderate/Heavy each week.

Success = the cohort shifting rightward over time (more users moving into heavier usage bands).

How We Track: Data source is the Claude Enterprise admin panel usage logs. Preston pulls weekly reports and shares them with unit leads.
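For teams that want to script the weekly pull, here is a minimal sketch of the bucketing and active-user math. The participant-to-prompt-count mapping is an illustrative assumption; the actual admin-panel export format will differ.

```python
from collections import Counter

# Segmentation bands from this framework: Light (1-5), Moderate (6-19), Heavy (20+).
def usage_band(prompts_per_week: int) -> str:
    if prompts_per_week >= 20:
        return "Heavy"
    if prompts_per_week >= 6:
        return "Moderate"
    if prompts_per_week >= 1:
        return "Light"
    return "Inactive"

def weekly_snapshot(weekly_prompts: dict[str, int], cohort_size: int = 9) -> dict:
    """Summarize one week of usage. `weekly_prompts` maps participant -> prompt
    count (an assumed structure; adapt to the real admin-panel export)."""
    bands = Counter(usage_band(n) for n in weekly_prompts.values())
    active = sum(1 for n in weekly_prompts.values() if n >= 1)
    return {
        "active_users": active,
        "active_pct": round(100 * active / cohort_size, 1),
        "distribution": dict(bands),
    }

# Example: a hypothetical Week 4 pull for the 9-person cohort.
week4 = {"p1": 22, "p2": 14, "p3": 3, "p4": 0, "p5": 8, "p6": 18, "p7": 2, "p8": 25, "p9": 6}
print(weekly_snapshot(week4))
# {'active_users': 8, 'active_pct': 88.9, 'distribution': {'Heavy': 2, 'Moderate': 4, 'Light': 2, 'Inactive': 1}}
```

Tracking the `distribution` dict week over week is what makes the "rightward shift" visible: the Light count should shrink while Moderate and Heavy grow.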

Tier 2: Proficiency KPIs

Proficiency measures skill development and effective use patterns. These metrics track whether participants are learning to use Claude better over time.

Iteration Frequency

Self-report: "How many times this week did you revise a prompt or push back on Claude's output?"

Scale: Never / Once / A few times / Most sessions / Every session.

Early warning: "Never" for 2+ consecutive weeks triggers 1:1 coaching.

Observable Fluency Behaviors

Target: 70%+ of participants demonstrate 3+ behaviors by Day 60.

Behaviors tracked: Iteration on drafts, clarifying goals, questioning model reasoning, identifying missing context, specifying output formats, providing examples, fact-checking outputs.
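If unit leads record the checklist as a count of distinct behaviors per participant, the 70%/3+ target can be checked mechanically. A minimal sketch, assuming that data shape:

```python
FLUENCY_TARGET = 0.70  # 70%+ of participants showing 3+ behaviors by Day 60

def meets_fluency_target(behavior_counts: dict[str, int]) -> bool:
    """behavior_counts maps participant -> number of distinct behaviors observed
    (assumed structure; the checklist could also be stored per-behavior)."""
    qualifying = sum(1 for n in behavior_counts.values() if n >= 3)
    return qualifying / len(behavior_counts) >= FLUENCY_TARGET

print(meets_fluency_target({"p1": 4, "p2": 3, "p3": 1, "p4": 5, "p5": 3,
                            "p6": 2, "p7": 6, "p8": 3, "p9": 0}))
# False: 6 of 9 participants qualify (~67%), just under the 70% bar
```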

Session Depth

Metric: Average conversation length increasing over time.

Data: Enterprise admin data if available, cross-referenced with self-report.

How We Track: Weekly self-report (30 seconds per person) + enterprise admin data cross-reference. Added to weekly check-in protocol starting Week 1.
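The "Never for 2+ consecutive weeks" coaching trigger above is easy to automate once the weekly self-reports are in one place. A sketch, assuming each participant's answers are kept as a chronological list (oldest first):

```python
def needs_coaching(responses: list[str], streak_len: int = 2) -> bool:
    """True if the most recent `streak_len` check-ins were all 'Never'."""
    return len(responses) >= streak_len and all(
        r == "Never" for r in responses[-streak_len:]
    )

reports = {
    "p3": ["Once", "Never", "Never"],          # flagged: two straight "Never" weeks
    "p5": ["Never", "A few times", "Never"],   # not flagged: the streak was broken
}
flagged = [name for name, history in reports.items() if needs_coaching(history)]
print(flagged)  # ['p3']
```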

Tier 3: Impact KPIs

Impact measures actual value delivery. These metrics track whether AI adoption is redesigning workflows, saving time, improving quality, and freeing capacity for high-value work.

Documented Workflow Redesigns

Target: Minimum 2 per unit (6 total).

Each redesign documents the old process, the new AI-integrated process, a governance check, and the measured outcome.

Time Savings Evidence

Target: Each participant documents at least 1 task with before/after time comparison.

Aggregated into "total hours redirected per week" metric.
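A sketch of that aggregation, assuming each documented task records how often it recurs and the per-run time before and after. The field names are illustrative, not the actual template fields:

```python
from dataclasses import dataclass

@dataclass
class TimeComparison:
    task: str
    runs_per_week: int      # how often the task recurs
    before_hours: float     # manual time per run
    after_hours: float      # AI-assisted time per run

def hours_redirected(entries: list[TimeComparison]) -> float:
    """Sum per-run savings weighted by how often each task recurs."""
    return sum(e.runs_per_week * (e.before_hours - e.after_hours) for e in entries)

entries = [
    TimeComparison("weekly status memo", 1, 3.0, 1.0),
    TimeComparison("intake summaries", 5, 0.5, 0.2),
]
print(f"{hours_redirected(entries):.1f} hours/week redirected")  # 3.5 hours/week redirected
```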

Quality Improvement Evidence

Target: At least 3 examples of AI-assisted outputs rated higher quality than previous manual outputs.

Peer-assessed using a standard rubric.

Mission Connection

Target: At least 1 example per unit of AI-freed time redirected to high-value student interaction.

Demonstrates impact on the core mission, not just efficiency.

How We Track: Workflow redesign templates deployed Week 3. Each participant completes at least one. These templates become board briefing evidence.

Success Thresholds & Decisions

At Day 45 and Day 60, data across all three tiers informs a go/pause/pivot decision for Q1 planning. Success is not just high numbers; it's evidence that the pilot can scale responsibly.

SCALE

• 70%+ monthly active users
• 60%+ demonstrating iteration
• 3+ documented workflow redesigns

Recommendation: Full Q1 rollout with cohort-based structure.

PAUSE

• 40–70% active users
• Proficiency plateau
• Budget/bandwidth constraints

Recommendation: Extend pilot 30 days. Diagnose barriers before scaling.

PIVOT

• Below 40% active users
• Leadership misalignment
• Governance incidents

Recommendation: Restructure approach before investing further.
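Expressed as code, the three gates above reduce to a short rule check. This sketch models the qualitative criteria (leadership alignment, governance incidents) as booleans and omits the budget/bandwidth and proficiency-plateau conditions, which are judgment calls rather than thresholds:

```python
def pilot_decision(active_pct: float, iterating_pct: float, redesigns: int,
                   governance_incident: bool = False,
                   leadership_aligned: bool = True) -> str:
    """Apply the Day 45 / Day 60 gates. Thresholds come from the lists above;
    the boolean inputs are simplifications of qualitative assessments."""
    if governance_incident or not leadership_aligned or active_pct < 40:
        return "PIVOT: restructure approach before investing further"
    if active_pct >= 70 and iterating_pct >= 60 and redesigns >= 3:
        return "SCALE: full Q1 rollout with cohort-based structure"
    return "PAUSE: extend pilot 30 days and diagnose barriers"

print(pilot_decision(active_pct=78, iterating_pct=65, redesigns=4))
# SCALE: full Q1 rollout with cohort-based structure
```

Anything that clears the SCALE bar on numbers but trips a governance or alignment flag still lands in PIVOT, which matches the intent that the decision is about scaling responsibly, not just hitting targets.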

Tracking Calendar

Key milestones for data collection and review across the 9-week pilot window.

Week 1
Apr 7–11
Tier 1 baseline established — First week of usage data collected.
Tier 2 self-report added — Iteration frequency question integrated into weekly check-in.
Week 3
Apr 21–25
First Tier 2 biweekly pull — Iteration patterns and fluency behaviors assessed.
Workflow redesign templates deployed — Tier 3 tracking begins.
Week 5
May 5–9
Tier 3 first check — Early workflow redesigns in progress, time savings anecdotes collected.
Week 7
May 19–23
Tier 2 + Tier 3 second check — Proficiency trends, workflow redesigns consolidated, quality examples gathered.
Week 9
Jun 2–5
Final data pull all tiers — Complete usage, proficiency, and impact data compiled.
Board narrative compiled — Evidence organized for June board briefing.