What You Can Learn
Primer isn’t a dashboard you glance at — it’s an intelligence layer that answers specific questions about how your organization uses AI coding tools. This guide explains what you can learn at each level, how to interpret the metrics, and what actions to take.
The Three Levels
Primer provides insights at three levels, each targeting a different audience:
| Level | Audience | Core Question |
|---|---|---|
| Organization | VP Eng, Directors | Is our AI investment working? |
| Team | Team Leads, Managers | How do I help my team get better? |
| Individual | Engineers | How do I improve my own practice? |
Every metric in Primer exists to answer a question that couldn’t be answered before AI tools generated session data.
Organization Level
Adoption & ROI
The question: Are we getting value from our AI tool investment?
Primer computes:
- Adoption rate: percentage of engineers who used an AI tool in the period. Adoption typically starts at 30-40% and plateaus around 80-90%. If you’re stuck below 60%, look at the AI Maturity page to find which teams lag.
- Sessions per engineer per day: a proxy for integration depth. Early adopters average 1-2/day; power users hit 4-6. Below 1/day often means the tool isn’t integrated into daily workflow yet.
- ROI ratio: estimated time saved per dollar spent (configurable via `PRIMER_MINUTES_PER_SESSION`). This isn’t precise, but it is directionally useful for justifying budget.
- Cost per successful outcome: total spend divided by sessions that achieved their goal. This is the number to optimize, not raw cost.
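The four KPIs above reduce to a few ratios over session records. The sketch below is a minimal illustration, not Primer’s implementation: the `Session` record, the `org_kpis` function, and its parameters are hypothetical. It assumes `minutes_per_session` is the per-session time estimate you would set via `PRIMER_MINUTES_PER_SESSION`, that every session is credited with it, and that sessions per day are averaged over active engineers only.

```python
from dataclasses import dataclass


@dataclass
class Session:
    engineer: str      # hypothetical record shape, not Primer's schema
    cost_usd: float
    succeeded: bool    # did the session achieve its goal?


def org_kpis(sessions, headcount, working_days, minutes_per_session):
    """Hypothetical org-level KPI calculation over one reporting period."""
    active = {s.engineer for s in sessions}
    total_cost = sum(s.cost_usd for s in sessions)
    successes = sum(1 for s in sessions if s.succeeded)

    # Adoption: share of all engineers with at least one session in the period.
    adoption_rate = len(active) / headcount if headcount else 0.0

    # Integration depth (assumption: averaged over active engineers only).
    if active and working_days:
        sessions_per_engineer_day = len(sessions) / (len(active) * working_days)
    else:
        sessions_per_engineer_day = 0.0

    # ROI: estimated minutes saved per dollar. Assumes every session is credited
    # with minutes_per_session (the value PRIMER_MINUTES_PER_SESSION configures).
    minutes_saved = len(sessions) * minutes_per_session
    roi_minutes_per_dollar = minutes_saved / total_cost if total_cost else None

    # The number to optimize: spend divided by sessions that met their goal.
    cost_per_success = total_cost / successes if successes else None

    return {
        "adoption_rate": adoption_rate,
        "sessions_per_engineer_day": sessions_per_engineer_day,
        "roi_minutes_per_dollar": roi_minutes_per_dollar,
        "cost_per_success": cost_per_success,
    }
```

For example, three sessions costing $6 in total, run by two of four engineers over two working days with 15 minutes credited per session and two successes, give 50% adoption, 0.75 sessions per active engineer per day, 7.5 minutes saved per dollar, and $3.00 per successful outcome.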
Track adoption rate over time, not as a snapshot. A team at 60% adoption trending upward is healthier than one at 85% and flat — the first is still learning, the second may have stalled.
Friction Analysis
The question: What systemic issues are making AI tools less effective across the org?
Primer classifies friction into types:
| Friction Type | What It Means | Typical Fix |
|---|---|---|
| `permission_denied` | Tool tried to access something it couldn’t | Update `.claude/settings.json` allowed directories |
| `context_limit` | Session hit token limit | Break tasks into smaller sessions; use project-scoped instructions |
| `timeout` | Tool or command timed out | Investigate slow test suites or builds |
| `edit_conflict` | Tool’s edit was rejected or conflicted | Improve CLAUDE.md with project conventions |
| `tool_error` | An MCP tool or integration failed | Check tool configuration and availability |
The key insight: Primer doesn’t just count friction — it scores impact. A friction type that occurs often but doesn’t affect success rates is noise. A type that occurs less often but correlates with 40% lower success rates is worth fixing immediately.
Look at the Friction Impact chart on the dashboard. The x-axis is frequency, the y-axis is success rate penalty. Items in the upper-right quadrant are your highest-priority fixes.
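One simple way to score impact is to compare the success rate of sessions that hit a friction type with the success rate of sessions that did not. This is a hedged sketch: the `(friction_types, succeeded)` input shape, the use of an absolute percentage-point penalty, and the quadrant cutoffs are illustrative assumptions, not Primer’s actual scoring model.

```python
def friction_impact(sessions):
    """sessions: list of (set_of_friction_types, succeeded) pairs.

    Returns {friction_type: (frequency, penalty)} where frequency is the share
    of sessions that hit the type and penalty is the success rate without it
    minus the success rate with it (positive = the friction hurts outcomes).
    """
    n = len(sessions)
    all_types = set().union(*(types for types, _ in sessions))
    impact = {}
    for t in all_types:
        hit = [ok for types, ok in sessions if t in types]
        clean = [ok for types, ok in sessions if t not in types]
        rate_hit = sum(hit) / len(hit)
        rate_clean = sum(clean) / len(clean) if clean else rate_hit
        impact[t] = (len(hit) / n, rate_clean - rate_hit)
    return impact


def priority(frequency, penalty, freq_cut=0.10, penalty_cut=0.15):
    """Place a friction type on the frequency-by-penalty chart (hypothetical cutoffs)."""
    frequent, harmful = frequency >= freq_cut, penalty >= penalty_cut
    if frequent and harmful:
        return "fix now"            # upper-right quadrant
    if harmful:
        return "rare but costly"
    if frequent:
        return "noise"              # common, but outcomes barely change
    return "monitor"
```

The penalty is a correlation, not proof of cause: a hard task can both trigger friction and fail for unrelated reasons, so treat high-penalty types as the first place to investigate.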
Cost Intelligence
The question: Are we spending efficiently?
Beyond raw spend tracking, Primer provides:
- Cache savings analysis — How much money prompt caching is saving and how much more it could save. If your org-wide cache hit rate is below 40%, engineers may not be using project context files effectively.
- API vs. subscription modeling — For each engineer, Primer estimates whether they’d be cheaper on API billing, Pro ($20/mo), or Max ($100/mo). The aggregate recommendation shows potential monthly savings.
- 30-day cost forecast — Linear regression on daily costs with confidence bands. Useful for budget planning and catching trends before they become problems.
- Budget burn-rate alerts — Set monthly budgets per team. Primer calculates daily burn rate and projects whether you’ll exceed the budget, alerting before it happens.
Team Level
Engineer Benchmarking
The question: How do my engineers compare, and who needs help?
The Engineers leaderboard ranks by multiple dimensions:
- Success rate — percentage of sessions that achieve their goal. Team median is typically 55-70%. Engineers consistently below 50% likely need configuration help or training.
- Cost efficiency — total cost divided by sessions. High cost with high success is fine. High cost with low success means the engineer is spending more but getting less.
- Leverage score — a 0-100 composite measuring how sophisticatedly someone uses AI tools (tool diversity, orchestration use, cache efficiency). Higher leverage correlates with higher success rates.
- PR impact — number of PRs linked to AI sessions, and merge rate of those PRs.
Don’t use these metrics punitively. An engineer with a low success rate but high session count is trying hard — they need guidance, not criticism. An engineer with zero sessions needs a different conversation about whether they’ve adopted at all.
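A leaderboard along these dimensions is a sort plus a threshold. The sketch below is hypothetical (the `EngineerStats` fields and `leaderboard` function are not Primer’s API). It uses the 50% success-rate floor mentioned above as the coaching flag and keeps engineers with zero sessions out of the ranking, since they need a different conversation.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class EngineerStats:
    name: str
    sessions: int
    successes: int
    total_cost_usd: float
    prs_linked: int   # PRs linked to AI sessions
    prs_merged: int

    @property
    def success_rate(self):
        return self.successes / self.sessions if self.sessions else 0.0

    @property
    def cost_per_session(self):
        return self.total_cost_usd / self.sessions if self.sessions else 0.0

    @property
    def pr_merge_rate(self):
        return self.prs_merged / self.prs_linked if self.prs_linked else None


def leaderboard(stats, help_threshold=0.50):
    """Rank active engineers by success rate and flag those below the threshold."""
    active = [s for s in stats if s.sessions > 0]
    team_median = median(s.success_rate for s in active) if active else None
    rows = [
        {
            "name": s.name,
            "success_rate": s.success_rate,
            "cost_per_session": s.cost_per_session,
            "pr_merge_rate": s.pr_merge_rate,
            "needs_help": s.success_rate < help_threshold,
        }
        for s in sorted(active, key=lambda s: s.success_rate, reverse=True)
    ]
    not_adopted = [s.name for s in stats if s.sessions == 0]
    return team_median, rows, not_adopted
```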
Onboarding & Ramp-Up
The question: How quickly are new hires becoming effective with AI tools?
Primer’s cohort analysis groups engineers by their first session date and tracks how their metrics evolve week-over-week. You can see:
- Time to team median: how many weeks until a new hire’s success rate matches the team median
- Ramp-up patterns: whether new hires plateau early or continue improving
- Configuration gaps: new hires often have default configs; Primer’s config optimization flags when someone’s setup differs significantly from high-performers
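Time to team median can be computed per new hire from weekly success rates counted from their first session. In this hypothetical sketch, a hire reaches the median only after holding it for `sustain` consecutive weeks, an assumption added so that one lucky week doesn’t count; cohorts are keyed by the ISO week of the first session.

```python
from datetime import date


def cohort_key(first_session: date) -> str:
    """Group new hires by the ISO week of their first AI session."""
    year, week, _ = first_session.isocalendar()
    return f"{year}-W{week:02d}"


def weeks_to_team_median(weekly_success, team_median, sustain=2):
    """weekly_success[i] is the success rate in week i + 1 after the first session.

    Returns the week number at which the hire first reaches team_median and then
    stays at or above it for `sustain` consecutive weeks, or None if never.
    """
    streak = 0
    for week, rate in enumerate(weekly_success, start=1):
        streak = streak + 1 if rate >= team_median else 0
        if streak == sustain:
            return week - sustain + 1
    return None
```

A hire whose weekly rates are 30%, 50%, 62%, 58%, 65%, 70% against a 60% median is counted as reaching it in week 5: week 3 touches the median but does not hold it.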
Code Quality Impact
The question: Does AI-assisted code meet our quality bar?
Primer links sessions to GitHub PRs and compares:
| Metric | Claude-Assisted | Non-Claude | What It Means |
|---|---|---|---|
| Merge rate | Higher/lower? | Baseline | Are AI PRs being accepted? |
| Review comments | More/fewer? | Baseline | Do AI PRs need more review? |
| Time to merge | Faster/slower? | Baseline | Does AI speed up delivery? |
| Code volume | Larger/smaller? | Baseline | Are AI sessions more ambitious? |
These comparisons answer the skeptic’s question: “Is the AI actually helping or just creating more work for reviewers?”
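The comparison table amounts to computing the same four aggregates for AI-linked and non-linked PRs. The `PullRequest` fields below are a hypothetical shape for GitHub data that has already been linked to sessions; using the median for time to merge, so that one stale PR does not dominate, is a design choice of this sketch.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Optional


@dataclass
class PullRequest:
    ai_assisted: bool                # linked to at least one AI session
    merged: bool
    review_comments: int
    hours_to_merge: Optional[float]  # None when the PR was not merged
    lines_changed: int


def summarize(prs):
    merged = [p for p in prs if p.merged]
    return {
        "merge_rate": len(merged) / len(prs),
        "avg_review_comments": mean(p.review_comments for p in prs),
        "median_hours_to_merge": median(p.hours_to_merge for p in merged) if merged else None,
        "avg_lines_changed": mean(p.lines_changed for p in prs),
    }


def compare(prs):
    """Side-by-side metrics; assumes both groups are non-empty."""
    ai = [p for p in prs if p.ai_assisted]
    baseline = [p for p in prs if not p.ai_assisted]
    return {"ai_assisted": summarize(ai), "baseline": summarize(baseline)}
```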
Primer also tracks automated review findings from bots like BugBot (cursor[bot]). For each PR, it parses review comments to extract:
- Severity breakdown — how many high, medium, and low severity issues were flagged
- Fix rate — percentage of findings resolved before merge
- Average findings per PR — trend over time to see if AI-assisted code is improving
This gives you a concrete quality signal beyond merge rates: are AI-assisted PRs generating fewer (or more) automated review issues than the baseline?
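Once review comments have been parsed into findings, the three metrics are straightforward counts. This sketch assumes the parsing step has already happened (it does not parse bot comment text), and the `Finding` shape and severity labels are hypothetical. PRs with zero findings must still count in the denominator of the per-PR average, which is why the function takes the full set of PR IDs for the group.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Finding:
    pr_id: int
    severity: str                # assumed labels: "high", "medium", "low"
    resolved_before_merge: bool


def review_finding_stats(findings, pr_ids):
    """Aggregate bot findings for one group of PRs (e.g. AI-assisted PRs)."""
    pr_ids = set(pr_ids)
    group = [f for f in findings if f.pr_id in pr_ids]
    fixed = sum(f.resolved_before_merge for f in group)
    return {
        "severity": dict(Counter(f.severity for f in group)),
        "fix_rate": fixed / len(group) if group else None,
        "avg_findings_per_pr": len(group) / len(pr_ids) if pr_ids else 0.0,
    }
```

Run it once with the AI-assisted PR IDs and once with the baseline IDs to get the comparison described above.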
Individual Level
Personal Trajectory
The question: Am I getting better at using AI tools?
Each engineer’s profile shows weekly sparklines for:
- Success rate trend — are you succeeding more often over time?
- Session volume — are you using the tool more or less?
- Cost trend — is your cost per session stable, rising, or falling?
- Tool diversity — are you expanding your tool usage or stuck on the basics?
The goal is upward trends in success rate and tool diversity, with stable or declining cost per session.
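Each sparkline is a per-week aggregate over an engineer’s sessions. A minimal sketch, assuming a hypothetical `Session` record with a date, outcome, cost, and set of tool names, bucketed by ISO week:

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Session:
    day: date
    succeeded: bool
    cost_usd: float
    tools: set = field(default_factory=set)  # tool names used in the session


def weekly_trajectory(sessions):
    """One point per ISO week: success rate, volume, cost per session, tool diversity."""
    weeks = defaultdict(list)
    for s in sessions:
        year, week, _ = s.day.isocalendar()
        weeks[(year, week)].append(s)

    series = []
    for (year, week) in sorted(weeks):
        batch = weeks[(year, week)]
        n = len(batch)
        series.append({
            "week": f"{year}-W{week:02d}",
            "success_rate": sum(s.succeeded for s in batch) / n,
            "sessions": n,
            "cost_per_session": sum(s.cost_usd for s in batch) / n,
            "distinct_tools": len(set().union(*(s.tools for s in batch))),
        })
    return series
```

Weeks with no sessions are simply absent from the series; a chart would render them as gaps or as zero volume.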
Friction Breakdown
The question: What specific problems keep disrupting my sessions?
Your profile shows your top friction types ranked by frequency and impact. Common patterns:
- Heavy on `context_limit`? You’re probably trying to do too much in single sessions. Break work into focused tasks.
- Lots of `permission_denied`? Your Claude settings need updating; you’re hitting guardrails that slow you down.
- Frequent `edit_conflict`? Your project’s CLAUDE.md may need better conventions about code style and patterns.
- `tool_error` spikes? An MCP server or integration is unreliable; check your tooling setup.
AI-Generated Insights
The question: What’s the big picture of my usage patterns?
Primer uses Claude to generate narrative reports about your patterns:
- At a Glance — What’s working, what’s hindering, quick wins
- Impressive Things You Did — Highlights from your sessions
- Where Things Go Wrong — Friction pattern analysis
- New Usage Patterns — Emerging behaviors worth building on
- On the Horizon — Suggestions for what to try next
These narratives synthesize hundreds of data points into actionable stories.
MCP Sidecar
The question: How am I doing right now?
Engineers can query their own data mid-session via the MCP sidecar in Claude Code:
- “What’s my success rate this week?”
- “What friction am I hitting most?”
- “How do I compare to my team?”
- “What recommendations do you have for me?”
This closes the loop: instead of checking a dashboard after the fact, engineers get feedback while they can still act on it.
How to Think About These Metrics
Leverage vs. Effectiveness: The Two Dimensions of AI Maturity
Primer measures AI tool usage along two independent dimensions:
Leverage (0-100) measures how sophisticatedly an engineer uses AI tools. It has three sub-scores:
| Sub-Score | What It Measures | Factors |
|---|---|---|
| Tool Mastery (33%) | Breadth of tool usage | Tool diversity (Shannon entropy) + category spread (5 categories) |
| Orchestration Depth (33%) | Sophistication of delegation | Orchestration+skill ratio + agent team detection (bonus) |
| Efficiency (33%) | Resource intelligence | Cache hit rate + model diversity (cost tier coverage) |
The model diversity factor rewards engineers who choose cost-appropriate models — using Haiku for quick lookups, Sonnet for standard coding, and Opus for complex architecture rather than defaulting to the most powerful model for everything.
The agent team detection factor rewards multi-agent orchestration (TeamCreate, SendMessage, Agent coordination). This is a bonus: engineers who don’t use teams aren’t penalized; those who do get an uplift.
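The Leverage structure can be sketched as three 0-100 sub-scores averaged with equal weight. Everything below the level of the table is an assumption for illustration: the normalizations (entropy divided by `log(n_tools)`, an orchestration ratio that earns full marks at `ratio_target`, a flat `team_bonus` capped at 100, and tier coverage out of three cost tiers) and all function names are hypothetical, not Primer’s formula.

```python
import math


def tool_mastery(tool_counts, tool_category, n_tools=20, n_categories=5):
    """Shannon-entropy tool diversity plus category spread, each 0-1, scaled to 0-100."""
    total = sum(tool_counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in tool_counts.values() if c > 0)
    diversity = min(entropy / math.log(n_tools), 1.0)  # normalized by max possible entropy
    categories = {tool_category[t] for t, c in tool_counts.items() if c > 0 and t in tool_category}
    spread = min(len(categories) / n_categories, 1.0)
    return 100.0 * (diversity + spread) / 2


def orchestration_depth(tool_counts, orchestration_tools, uses_agent_teams,
                        ratio_target=0.25, team_bonus=15.0):
    """Share of calls that delegate (orchestration or skills), plus a bonus for agent teams."""
    total = sum(tool_counts.values())
    ratio = sum(tool_counts.get(t, 0) for t in orchestration_tools) / total if total else 0.0
    base = 100.0 * min(ratio / ratio_target, 1.0)
    # Bonus only: engineers who never use agent teams are not penalized.
    return min(base + (team_bonus if uses_agent_teams else 0.0), 100.0)


def efficiency(cache_hit_rate, model_tiers_used, n_tiers=3):
    """Cache hit rate (0-1) plus coverage of cost tiers such as haiku/sonnet/opus."""
    coverage = min(len(set(model_tiers_used)) / n_tiers, 1.0)
    return 100.0 * (cache_hit_rate + coverage) / 2


def leverage_score(mastery, orchestration, efficiency_score):
    """Equal-weight average of the three sub-scores."""
    return (mastery + orchestration + efficiency_score) / 3
```

Normalizing entropy by a fixed `n_tools` rather than by the number of tools an engineer actually used keeps someone who splits work evenly across two tools from scoring the same diversity as someone who uses twenty.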
Effectiveness (0-100) measures outcomes, meaning how well an engineer’s AI tool usage is actually working:
| Factor | Weight | What It Measures |
|---|---|---|
| Success rate | 40% | Percentage of sessions that achieve their goal |
| Cost efficiency | 30% | Cost per success vs. team median (lower = better) |
| Session health | 30% | Composite health score |
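With the weights from the table, Effectiveness is a weighted sum of three 0-100 components. The table only says that cost efficiency compares cost per success with the team median; mapping that ratio onto 0-100 as `median / own`, capped at 100, is a hypothetical choice here, as is taking session health as an already-computed 0-100 score.

```python
def cost_efficiency_score(cost_per_success, team_median_cost_per_success):
    """100 at or below the team median; halves each time cost per success doubles."""
    if cost_per_success <= 0:
        return 100.0
    return 100.0 * min(team_median_cost_per_success / cost_per_success, 1.0)


def effectiveness_score(success_rate, cost_per_success, team_median_cost, session_health):
    """success_rate in 0-1; session_health in 0-100. Weights 40/30/30 from the table."""
    return (0.40 * 100.0 * success_rate
            + 0.30 * cost_efficiency_score(cost_per_success, team_median_cost)
            + 0.30 * session_health)
```

An engineer with no successful sessions can pass `float("inf")` as cost per success, which scores 0 on the cost component.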
Together, Leverage and Effectiveness create four diagnostic quadrants:
| | High Effectiveness | Low Effectiveness |
|---|---|---|
| High Leverage | Mastery — advanced tooling, strong outcomes | Experimenting — sophisticated but outcomes need work |
| Low Leverage | Efficient basics — ships results, could level up | Needs support — both tool usage and outcomes lag |
The AI Maturity page shows both scores per engineer with a quadrant view. Use this to target coaching: high-leverage/low-effectiveness engineers need better problem framing, not more tool training. Low-leverage/high-effectiveness engineers are ready to learn advanced patterns.
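Assigning a quadrant needs a cutoff on each axis. This sketch uses 50 on both scores as a hypothetical default; team medians would be an equally reasonable choice. The coaching hints restate the guidance above.

```python
COACHING = {
    "Mastery": "Share their patterns with the team.",
    "Experimenting": "Coach problem framing, not more tool training.",
    "Efficient basics": "Ready to learn advanced patterns.",
    "Needs support": "Start with configuration help and training.",
}


def maturity_quadrant(leverage, effectiveness, cutoff=50.0):
    """Map a (Leverage, Effectiveness) pair onto the four diagnostic quadrants."""
    high_leverage = leverage >= cutoff
    high_effectiveness = effectiveness >= cutoff
    if high_leverage and high_effectiveness:
        return "Mastery"
    if high_leverage:
        return "Experimenting"
    if high_effectiveness:
        return "Efficient basics"
    return "Needs support"
```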
Leading vs. Lagging Indicators
| Type | Metrics | Why |
|---|---|---|
| Leading | Adoption rate, leverage score, model diversity, tool diversity | Predict future outcomes |
| Lagging | Effectiveness score, success rate, cost per outcome, PR merge rate | Confirm past outcomes |
Focus on leading indicators for planning and lagging indicators for validation.
Healthy Ranges
These are rough benchmarks from typical engineering organizations:
| Metric | Concerning | Healthy | Excellent |
|---|---|---|---|
| Adoption rate | < 40% | 60-85% | > 85% |
| Success rate | < 45% | 55-70% | > 75% |
| Sessions/engineer/day | < 0.5 | 1-3 | 3-6 |
| Leverage score | < 25 | 35-55 | > 60 |
| Effectiveness score | < 40 | 50-70 | > 75 |
| Cache hit rate | < 20% | 30-50% | > 50% |
| Model diversity | 1 model | 2-3 models | 3+ across tiers |
| Cost per success | Trending up | Stable | Trending down |
These benchmarks will vary by organization, team size, and project complexity. Use them as starting points, then establish your own baselines.
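The numeric rows of the table can be encoded as thresholds so a report or alert can label a value automatically. This is a sketch under stated assumptions: the metric keys are hypothetical, rates are expressed as fractions, values that fall in the table’s gaps (for example 40-60% adoption) are labeled "in between", and the two non-numeric rows (model diversity and the cost per success trend) are left out.

```python
# metric: (concerning below, healthy low, healthy high, excellent above)
BANDS = {
    "adoption_rate": (0.40, 0.60, 0.85, 0.85),
    "success_rate": (0.45, 0.55, 0.70, 0.75),
    "sessions_per_engineer_day": (0.5, 1.0, 3.0, 3.0),
    "leverage_score": (25, 35, 55, 60),
    "effectiveness_score": (40, 50, 70, 75),
    "cache_hit_rate": (0.20, 0.30, 0.50, 0.50),
}


def classify(metric, value):
    """Label a metric value using the rough benchmark bands above."""
    concerning_below, healthy_low, healthy_high, excellent_above = BANDS[metric]
    if value < concerning_below:
        return "concerning"
    if value > excellent_above:
        return "excellent"
    if healthy_low <= value <= healthy_high:
        return "healthy"
    return "in between"
```

Once you have your own baselines, replace the tuples in `BANDS` rather than the logic.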
Actions by Signal
| You See | What It Means | What to Do |
|---|---|---|
| High adoption, low success | Engineers are using the tool but struggling | Invest in CLAUDE.md, shared prompts, and training |
| Low adoption, high success | A few people love it, most haven’t tried | Pair programming, demos, and reduce setup friction |
| Rising costs, stable success | Usage is growing but efficiency isn’t | Review cache optimization and session scope |
| Friction spike on one team | Something changed in their environment | Check recent infra changes, permission updates, or tooling |
| Leverage score plateau | Engineers stopped exploring new capabilities | Share patterns from high-leverage users, run workshops |
| High leverage, low effectiveness | Engineers know the tools but outcomes lag | Focus on problem framing, CLAUDE.md quality, and session scope |
| Low model diversity | Engineers defaulting to one model for everything | Train on model selection — Haiku for lookups, Sonnet for coding, Opus for architecture |
| New hire ramp-up > 4 weeks | Onboarding for AI tools needs work | Create team-specific getting-started guides |
Where to Find These Insights
Each insight area has a dedicated page in the Primer dashboard:
| Page | What It Shows | Key Questions Answered |
|---|---|---|
| Organization (`/dashboard`) | KPIs, activity trends, outcomes, alerts, deep-dive navigation | How is our AI investment performing overall? |
| Friction (`/friction`) | Friction trends, impact scoring, project-level breakdown | What systemic issues hurt AI effectiveness most? |
| Code Quality (`/quality`) | PR metrics, Claude-vs-non-Claude comparison, automated review findings, engineer rankings | Are AI-assisted PRs better? |
| Sessions (`/sessions`) | Browse tab + Insights tab with health, satisfaction, goals, cache | How are sessions performing in aggregate? |
| AI Maturity (`/maturity`) | Leverage + Effectiveness scores, tool categories, Tools tab, model diversity, agent teams | How mature is our AI usage? How effective are engineers? |
| Growth (`/growth`) | Onboarding cohorts, shared patterns, team skill gaps | How fast are new hires ramping up? What patterns should we share? |
| FinOps (`/finops`) | Cost tracking, cache analytics, subscription modeling, budgets | Are we spending efficiently? |
| Synthesis (`/synthesis`) | AI-generated narrative reports at org, team, and engineer scope | What’s the big picture story? |
| Engineers (`/engineers`) | Leaderboard with success rates, costs, leverage scores | Who needs help? Who’s excelling? |
| Engineer Profile (`/engineers/:id`) | Weekly trajectory, friction breakdown, strengths, AI-generated insights | How is this individual doing? |
Next Steps
- Installation — Get Primer running in 5 minutes
- FinOps & Cost Management — Deep dive into cost optimization
- Alert Thresholds — Set up automated anomaly detection