
FinOps & Cost Management

Overview

The FinOps module helps engineering leaders understand and optimize AI coding tool costs. It covers four areas:

  1. Cache Analytics — measure prompt caching efficiency and savings per model and engineer
  2. Cost Modeling — compare API-key billing vs subscription plans to find the cheapest option per engineer
  3. Forecasting — project future spend using linear regression on historical daily costs
  4. Budgets — set monthly or quarterly spend limits with automatic status tracking

All endpoints live under /api/v1/finops/ and respect role-based access control.

Cache Analytics

The cache analytics endpoint aggregates prompt cache usage across your organization and computes savings using per-model pricing deltas.

curl http://localhost:8000/api/v1/finops/cache \
  -H "x-admin-key: your-admin-key"

Key metrics returned:

Field                               Description
cache_hit_rate                      Ratio of cache read tokens to total input tokens
cache_savings_estimate              Dollar savings from cached vs uncached input pricing
model_cache_breakdown               Per-model cache tokens, hit rates, and savings
engineer_cache_breakdown            Per-engineer hit rates, savings, and potential upside
total_potential_additional_savings  Estimated savings if below-average engineers matched the team average

The per-engineer breakdown identifies engineers with low cache hit rates and estimates how much the team could save if they improved to the team average. This is useful for targeted coaching.
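The arithmetic behind the hit rate and the "potential upside" figure can be sketched as follows. This is an illustration only, not Primer's implementation: the EngineerUsage type, the token-weighted definition of the team average, and the flat saving_per_cached_token value are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class EngineerUsage:
    name: str
    cache_read_tokens: int
    total_input_tokens: int  # assumed to already include cache read tokens


def cache_hit_rate(usage: EngineerUsage) -> float:
    """Ratio of cache read tokens to total input tokens (0.0 with no input)."""
    if usage.total_input_tokens == 0:
        return 0.0
    return usage.cache_read_tokens / usage.total_input_tokens


def potential_additional_savings(usages, saving_per_cached_token: float) -> float:
    """Savings if every below-average engineer matched the team hit rate.

    Assumption: the team average is token-weighted, and each extra cached
    token saves a flat amount (uncached input price minus cache read price).
    """
    total_input = sum(u.total_input_tokens for u in usages)
    if total_input == 0:
        return 0.0
    team_rate = sum(u.cache_read_tokens for u in usages) / total_input
    extra = 0.0
    for u in usages:
        gap = team_rate - cache_hit_rate(u)
        if gap > 0:  # only below-average engineers contribute upside
            extra += gap * u.total_input_tokens * saving_per_cached_token
    return extra
```

In this sketch, engineers at or above the team rate contribute nothing, so the total reflects only the gap that coaching could close.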

Improving cache hit rates

Cache hit rates improve when engineers reuse conversation context effectively — longer sessions with incremental edits cache better than many short sessions. The per-engineer breakdown helps identify who might benefit from workflow adjustments.

Scoping

All FinOps endpoints support optional query parameters for filtering:

Parameter   Type      Description
team_id     string    Filter to a specific team (admin only)
start_date  datetime  Start of date range
end_date    datetime  End of date range

Team leads automatically see only their team’s data. Engineers see only their own data.
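As a rough illustration of how these filters combine into a request URL, the helper below builds a query string from the three parameters. The function name build_finops_url is hypothetical, and serializing datetimes as ISO 8601 strings is an assumption; check the API's accepted datetime format before relying on it.

```python
from datetime import datetime
from typing import Optional
from urllib.parse import urlencode


def build_finops_url(
    base_url: str,
    endpoint: str,
    team_id: Optional[str] = None,
    start_date: Optional[datetime] = None,
    end_date: Optional[datetime] = None,
) -> str:
    """Build a FinOps URL, including only the filters that were provided."""
    params = {}
    if team_id is not None:
        params["team_id"] = team_id
    if start_date is not None:
        # Assumption: the API accepts ISO 8601 datetimes.
        params["start_date"] = start_date.isoformat()
    if end_date is not None:
        params["end_date"] = end_date.isoformat()
    url = f"{base_url.rstrip('/')}/api/v1/finops/{endpoint}"
    return f"{url}?{urlencode(params)}" if params else url


# Example: cache analytics for January 2025
print(build_finops_url(
    "http://localhost:8000", "cache",
    start_date=datetime(2025, 1, 1), end_date=datetime(2025, 1, 31),
))
```

urlencode percent-encodes the colons in the timestamps, so the resulting URL can be passed to curl or an HTTP client unchanged.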

Cost Modeling

Compares each engineer’s actual API spend against Claude subscription plan tiers to find the optimal allocation.

curl http://localhost:8000/api/v1/finops/cost-modeling \
  -H "x-admin-key: your-admin-key"

Access control

Cost modeling requires the admin or team_lead role. Regular engineers cannot access this endpoint.

Plan tiers evaluated:

Plan            Monthly Cost        Best For
API Key         $0 (pay per token)  Light users spending under $20/mo
Claude Pro      $20/mo              Moderate users ($20–100/mo API equivalent)
Claude Max 5x   $100/mo             Heavy users ($100–200/mo API equivalent)
Claude Max 20x  $200/mo             Power users spending over $200/mo

The response includes per-engineer recommendations, an allocation summary showing how many engineers belong to each tier, and the total monthly savings from switching to optimal plans.

Response fields:

Field                       Description
engineers[]                 Per-engineer: monthly API cost, recommended plan, savings
allocation[]                How many engineers per tier and total tier cost
total_api_cost_monthly      What the org currently spends on API billing
total_optimal_cost_monthly  What the org would spend with optimal plan assignment
total_savings_monthly       total_api_cost_monthly minus total_optimal_cost_monthly
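The tier table and the summary fields can be tied together with a small sketch. It is not Primer's algorithm: the function names are hypothetical, and treating each range's lower bound as inclusive (so exactly $100/mo maps to Claude Max 5x) is an assumption, since the table does not say how boundary values are handled.

```python
from collections import Counter

# (plan name, monthly plan cost, lower bound of API-equivalent spend)
# Ordered from the highest tier down so the first match wins.
PLANS = [
    ("Claude Max 20x", 200.0, 200.0),
    ("Claude Max 5x", 100.0, 100.0),
    ("Claude Pro", 20.0, 20.0),
]


def recommend_plan(monthly_api_cost: float):
    """Return (plan name, monthly cost under that plan) for one engineer."""
    for name, plan_cost, lower_bound in PLANS:
        if monthly_api_cost >= lower_bound:  # assumption: lower bound inclusive
            return name, plan_cost
    # Under $20/mo: staying on pay-per-token API billing is cheapest.
    return "API Key", monthly_api_cost


def summarize(api_costs: dict) -> dict:
    """Build per-engineer recommendations plus org-level totals."""
    engineers = []
    for engineer, cost in api_costs.items():
        plan, plan_cost = recommend_plan(cost)
        engineers.append({
            "engineer": engineer,
            "monthly_api_cost": cost,
            "recommended_plan": plan,
            "plan_cost": plan_cost,
            "savings": cost - plan_cost,
        })
    total_api = sum(api_costs.values())
    total_optimal = sum(e["plan_cost"] for e in engineers)
    return {
        "engineers": engineers,
        "allocation": Counter(e["recommended_plan"] for e in engineers),
        "total_api_cost_monthly": total_api,
        "total_optimal_cost_monthly": total_optimal,
        "total_savings_monthly": total_api - total_optimal,
    }
```

For example, an engineer with $450/mo of API-equivalent usage lands on Claude Max 20x in this sketch and accounts for $250 of the monthly savings.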

Forecasting

Projects future daily costs using linear regression on historical spending data.

curl "http://localhost:8000/api/v1/finops/forecast?forecast_days=30" \
  -H "x-admin-key: your-admin-key"

The forecast_days parameter controls how far ahead to project (1–365 days, default 30). The response includes:

Field               Description
historical[]        Daily cost and session count for the observed period
forecast[]          Projected daily cost with upper/lower confidence bounds
monthly_projection  Estimated total spend for the next 30 days
trend_direction     "increasing", "decreasing", or "stable"

The confidence band is derived from the residual standard deviation of the linear fit. A wider band means higher day-to-day cost variance.

Minimum data requirement

Forecasting requires at least 2 days of historical data to fit a regression line. With fewer data points, the endpoint returns historical data only with no forecast.
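To make the method concrete, here is a minimal sketch of a least-squares forecast with a band based on the residual standard deviation. It is an approximation of the idea described above, not the endpoint's code: the 1.96 multiplier, the n - 2 degrees of freedom, and clamping negative values to zero are all assumptions.

```python
import math


def fit_line(costs):
    """Ordinary least squares fit of cost against day index 0..n-1."""
    n = len(costs)
    mean_x = (n - 1) / 2
    mean_y = sum(costs) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(costs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(costs, forecast_days=30, z=1.96):
    """Project daily costs; return [] when fewer than 2 days are available."""
    n = len(costs)
    if n < 2:
        return []
    slope, intercept = fit_line(costs)
    residuals = [y - (intercept + slope * x) for x, y in enumerate(costs)]
    # Assumption: residual std uses n - 2 degrees of freedom (at least 1).
    sigma = math.sqrt(sum(r * r for r in residuals) / max(n - 2, 1))
    projected = []
    for day in range(n, n + forecast_days):
        cost = max(intercept + slope * day, 0.0)  # assumption: no negative costs
        projected.append({
            "day_index": day,
            "projected_cost": cost,
            "lower": max(cost - z * sigma, 0.0),
            "upper": cost + z * sigma,
        })
    return projected
```

With perfectly linear history the residuals are zero and the band collapses onto the trend line; noisier history widens it.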

Budgets

Create monthly or quarterly spend budgets per team or org-wide. Primer tracks actual spend against the budget and computes status automatically.

Create a budget

curl -X POST http://localhost:8000/api/v1/finops/budgets \
  -H "x-admin-key: your-admin-key" \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "550e8400-e29b-41d4-a716-446655440000",
    "name": "Backend Team Monthly",
    "amount": 2000.00,
    "period": "monthly",
    "alert_threshold_pct": 80
  }'

Omit team_id to create an org-wide budget that tracks total spend across all teams.

List budgets

curl http://localhost:8000/api/v1/finops/budgets \
  -H "x-admin-key: your-admin-key"

Each budget in the response includes computed status fields:

Field                    Description
current_spend            Actual spend so far in the current period
burn_rate_daily          Average daily spend rate
projected_end_of_period  Projected total spend by period end
pct_used                 Percentage of budget consumed
status                   "on_track", "warning", or "over_budget"

The status is derived by comparing pct_used against alert_threshold_pct and 100%:

  • on_track — spend is below the alert threshold
  • warning — spend exceeds alert_threshold_pct but is under 100%
  • over_budget — spend exceeds the budget amount
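The computed fields can be approximated with a few lines of arithmetic. The sketch below is illustrative rather than Primer's implementation: the function name and its days_elapsed and days_in_period arguments are hypothetical, the projection assumes a constant burn rate, and treating spend exactly at the threshold as "warning" is an assumption because the rules above do not cover that boundary.

```python
def budget_status(current_spend, amount, alert_threshold_pct,
                  days_elapsed, days_in_period):
    """Compute status fields for one budget (amount is assumed to be > 0)."""
    pct_used = 100.0 * current_spend / amount
    burn_rate_daily = current_spend / days_elapsed if days_elapsed > 0 else 0.0
    # Assumption: spend continues at the average daily rate observed so far.
    projected_end_of_period = burn_rate_daily * days_in_period

    if pct_used > 100.0:
        status = "over_budget"
    elif pct_used >= alert_threshold_pct:  # assumption: threshold is inclusive
        status = "warning"
    else:
        status = "on_track"

    return {
        "current_spend": current_spend,
        "burn_rate_daily": burn_rate_daily,
        "projected_end_of_period": projected_end_of_period,
        "pct_used": pct_used,
        "status": status,
    }


# Example: $1,700 spent of a $2,000 monthly budget, 17 days into a 30-day period
print(budget_status(1700.0, 2000.0, 80, days_elapsed=17, days_in_period=30))
```

Note that a budget can be in "warning" while its projection already exceeds the amount; in this sketch the status reflects spend to date, not the projection.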

Update a budget

curl -X PATCH "http://localhost:8000/api/v1/finops/budgets/<budget-id>" \
  -H "x-admin-key: your-admin-key" \
  -H "Content-Type: application/json" \
  -d '{"amount": 2500.00}'

Delete a budget

curl -X DELETE "http://localhost:8000/api/v1/finops/budgets/<budget-id>" \
  -H "x-admin-key: your-admin-key"

Team lead permissions

Team leads can create, update, and delete budgets for their own team only. Attempting to modify another team’s budget returns a 403 error.

Dashboard

The FinOps page in the Primer dashboard provides five tabs:

Tab       Content
Overview  Total spend, cost per session, cost per success, cache savings KPIs
Cache     Savings by model chart, daily savings trend, per-engineer efficiency table
Modeling  Per-engineer plan recommendations, allocation summary, total savings
Forecast  Historical cost chart with projected trend line and confidence band
Budgets   Budget cards with progress bars, burn rates, and status indicators

Navigate to FinOps in the sidebar or click “View FinOps” from the main dashboard overview.

Pricing

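Cost calculations use per-model token pricing. A session's model name is resolved by longest-prefix matching against the pricing table, so the most specific matching entry wins. Primer ships with pricing for: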

  • Anthropic: Claude Opus 4.6, Opus 4.5, Opus 4, Sonnet 4 (covers 4/4.5/4.6), Sonnet 3.5, Haiku 4.5, Haiku 3.5
  • OpenAI: GPT-5.3 Codex, GPT-5.2, GPT-5, GPT-5 Mini, GPT-4o, GPT-4o Mini, GPT-4.1, GPT-4.1 Mini, GPT-4.1 Nano, o3, o3-mini, o4-mini, o1, o1-mini, Codex Mini
  • Google: Gemini 3.1 Pro, Gemini 3.0 Pro, Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.0 Flash, Gemini 1.5 Pro, Gemini 1.5 Flash

Each model has four price points: input, output, cache read, and cache creation tokens. To add or update pricing for a model, edit MODEL_PRICING in src/primer/common/pricing.py (backend) and the corresponding pricing data in frontend/src/lib/utils.ts (frontend), keeping the two in sync.

Unknown models

If a session uses a model not in the pricing table, Primer falls back to Claude Sonnet 4 pricing. Add new model entries as they become available to ensure accurate cost tracking.
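Longest-prefix matching with a fallback can be sketched as below. Everything concrete here is hypothetical: the table keys, the ModelPrice type, and the placeholder numbers are invented for illustration and are not the contents of MODEL_PRICING or real prices.

```python
from typing import NamedTuple


class ModelPrice(NamedTuple):
    input: float
    output: float
    cache_read: float
    cache_creation: float


# Hypothetical keys with placeholder values (NOT real prices).
PRICING = {
    "claude-opus-4": ModelPrice(1.0, 2.0, 0.1, 1.5),
    "claude-opus-4-5": ModelPrice(1.1, 2.1, 0.11, 1.6),
    "claude-sonnet-4": ModelPrice(0.5, 1.0, 0.05, 0.75),
}
FALLBACK_KEY = "claude-sonnet-4"  # the document's stated fallback model


def lookup_price(model_name: str) -> ModelPrice:
    """Return pricing for the longest table key that prefixes model_name."""
    matches = [key for key in PRICING if model_name.startswith(key)]
    if not matches:
        # Unknown model: fall back to Sonnet 4 pricing.
        return PRICING[FALLBACK_KEY]
    return PRICING[max(matches, key=len)]


# "claude-opus-4-5-example" matches both opus keys; the longer one wins.
print(lookup_price("claude-opus-4-5-example"))
# A single "claude-sonnet-4" entry also covers suffixed variants.
print(lookup_price("claude-sonnet-4-6"))
# No key matches, so the fallback applies.
print(lookup_price("some-new-model"))
```

This is why one Sonnet 4 entry can cover the 4, 4.5, and 4.6 variants, while a model with its own, longer key takes precedence over a shorter shared prefix.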