FinOps & Cost Management

Overview

The FinOps module helps engineering leaders understand and optimize AI coding tool costs. It covers four areas:

Cache Analytics — measure prompt caching efficiency and savings per model and engineer
Cost Modeling — compare API-key billing vs subscription plans to find the cheapest option per engineer
Forecasting — project future spend using linear regression on historical daily costs
Budgets — set monthly or quarterly spend limits with automatic status tracking

All endpoints live under `/api/v1/finops/` and respect role-based access control.

Cache Analytics

The cache analytics endpoint aggregates prompt cache usage across your organization and computes savings using per-model pricing deltas.

curl http://localhost:8000/api/v1/finops/cache \
  -H "x-admin-key: your-admin-key"

Key metrics returned:

Field	Description
`cache_hit_rate`	Ratio of cache read tokens to total input tokens
`cache_savings_estimate`	Dollar savings from cached vs uncached input pricing
`model_cache_breakdown`	Per-model cache tokens, hit rates, and savings
`engineer_cache_breakdown`	Per-engineer hit rates, savings, and potential upside
`total_potential_additional_savings`	Estimated savings if below-average engineers matched the team average

The per-engineer breakdown identifies engineers with low cache hit rates and estimates how much the team could save if they improved to the team average. This is useful for targeted coaching.
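
The arithmetic behind the hit rate and the "potential upside" figure can be sketched as follows. This is an illustration only, not Primer's implementation: the `EngineerUsage` type, the function names, the token-weighted team average, and the assumption that total input tokens include cached tokens are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EngineerUsage:
    name: str
    input_tokens: int        # total input tokens, cached and uncached (assumed)
    cache_read_tokens: int   # input tokens served from the prompt cache


def hit_rate(u: EngineerUsage) -> float:
    """Ratio of cache read tokens to total input tokens."""
    return u.cache_read_tokens / u.input_tokens if u.input_tokens else 0.0


def potential_additional_savings(engineers, input_price, cache_read_price):
    """Estimate savings if below-average engineers matched the team rate.

    Prices are dollars per token. The saving per cached token is the delta
    between uncached input pricing and cache read pricing.
    """
    delta = input_price - cache_read_price
    total_input = sum(e.input_tokens for e in engineers)
    total_cached = sum(e.cache_read_tokens for e in engineers)
    # Token-weighted team average (an assumption; a mean of rates also works).
    team_rate = total_cached / total_input if total_input else 0.0

    upside = 0.0
    for e in engineers:
        gap = team_rate - hit_rate(e)
        if gap > 0:
            # Extra tokens that would have been cache reads at the team rate.
            upside += gap * e.input_tokens * delta
    return team_rate, upside
```

Engineers at or above the team rate contribute nothing to the total, which matches the description of `total_potential_additional_savings` as an estimate for below-average engineers only.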

Improving cache hit rates

Cache hit rates improve when engineers reuse conversation context effectively — longer sessions with incremental edits cache better than many short sessions. The per-engineer breakdown helps identify who might benefit from workflow adjustments.

Scoping

All FinOps endpoints support optional query parameters for filtering:

Parameter	Type	Description
`team_id`	string	Filter to a specific team (admin only)
`start_date`	datetime	Start of date range
`end_date`	datetime	End of date range

Team leads automatically see only their team’s data. Engineers see only their own data.
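
A small helper for building scoped request URLs might look like the sketch below. The `finops_url` function is hypothetical, and serializing datetimes as ISO 8601 strings is an assumption; check the API schema for the exact format the server accepts.

```python
from datetime import datetime
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000/api/v1/finops"


def finops_url(endpoint, team_id=None, start_date=None, end_date=None):
    """Build a FinOps URL, including only the filters that were supplied."""
    params = {}
    if team_id is not None:
        params["team_id"] = team_id
    if start_date is not None:
        params["start_date"] = start_date.isoformat()  # ISO 8601 (assumed)
    if end_date is not None:
        params["end_date"] = end_date.isoformat()
    query = urlencode(params)
    return f"{BASE_URL}/{endpoint}" + (f"?{query}" if query else "")


# Example: cache analytics for January 2025.
url = finops_url(
    "cache",
    start_date=datetime(2025, 1, 1),
    end_date=datetime(2025, 1, 31),
)
```

Send the resulting URL with the same `x-admin-key` header used in the curl examples.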

Cost Modeling

The cost modeling endpoint compares each engineer’s actual API spend against Claude subscription plan tiers and recommends the cheapest plan for each engineer.

curl http://localhost:8000/api/v1/finops/cost-modeling \
  -H "x-admin-key: your-admin-key"

Access control

Cost modeling requires the `admin` or `team_lead` role. Regular engineers cannot access this endpoint.

Plan tiers evaluated:

Plan	Monthly Cost	Best For
API Key	$0 (pay per token)	Light users spending under $20/mo
Claude Pro	$20/mo	Moderate users ($20–100/mo API equivalent)
Claude Max 5x	$100/mo	Heavy users ($100–200/mo API equivalent)
Claude Max 20x	$200/mo	Power users spending over $200/mo

The response includes per-engineer recommendations, an allocation summary showing how many engineers belong to each tier, and the total monthly savings from switching to optimal plans.

Response fields:

Field	Description
`engineers[]`	Per-engineer: monthly API cost, recommended plan, savings
`allocation[]`	How many engineers per tier and total tier cost
`total_api_cost_monthly`	What the org currently spends on API billing
`total_optimal_cost_monthly`	What the org would spend with optimal plan assignment
`total_savings_monthly`	Difference between `total_api_cost_monthly` and `total_optimal_cost_monthly`
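
The tier table and response totals can be illustrated with a short sketch. The thresholds are transcribed from the "Best For" column; how spend exactly on a boundary ($20, $100, $200) is assigned is not specified, so the choice here (the higher tier) is an assumption, as are the function names.

```python
from collections import Counter

# (plan, monthly price, exclusive upper bound of API-equivalent spend),
# transcribed from the plan tier table. Boundary handling is an assumption.
PLAN_TIERS = [
    ("API Key", 0.0, 20.0),
    ("Claude Pro", 20.0, 100.0),
    ("Claude Max 5x", 100.0, 200.0),
    ("Claude Max 20x", 200.0, float("inf")),
]


def recommend_plan(monthly_api_cost):
    """Return (plan, optimal monthly cost, monthly savings) for one engineer."""
    for name, price, upper in PLAN_TIERS:
        if monthly_api_cost < upper:
            break
    # API Key users keep paying per token, so their cost does not change.
    optimal = monthly_api_cost if name == "API Key" else price
    return name, optimal, monthly_api_cost - optimal


def summarize(costs_by_engineer):
    """Aggregate per-engineer recommendations into org-level totals."""
    engineers = {e: recommend_plan(c) for e, c in costs_by_engineer.items()}
    total_api = sum(costs_by_engineer.values())
    total_optimal = sum(optimal for _, optimal, _ in engineers.values())
    return {
        "engineers": engineers,
        "allocation": Counter(plan for plan, _, _ in engineers.values()),
        "total_api_cost_monthly": total_api,
        "total_optimal_cost_monthly": total_optimal,
        "total_savings_monthly": total_api - total_optimal,
    }
```

For example, an engineer with $60/mo of API-equivalent usage lands in the Claude Pro tier, for an estimated saving of $40/mo.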

Forecasting

The forecast endpoint projects future daily costs using linear regression on historical spending data.

curl "http://localhost:8000/api/v1/finops/forecast?forecast_days=30" \
  -H "x-admin-key: your-admin-key"

The `forecast_days` parameter controls how far ahead to project (1–365 days, default 30). The response includes:

Field	Description
`historical[]`	Daily cost and session count for the observed period
`forecast[]`	Projected daily cost with upper/lower confidence bounds
`monthly_projection`	Estimated total spend for the next 30 days
`trend_direction`	`"increasing"`, `"decreasing"`, or `"stable"`

The confidence band is derived from the residual standard deviation of the linear fit. A wider band means higher day-to-day cost variance.

Minimum data requirement

Forecasting requires at least 2 days of historical data to fit a regression line. With fewer than 2 data points, the endpoint returns only the historical data and no forecast.
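
As a rough sketch of the technique described above, the following fits an ordinary least-squares line to daily costs and widens each projected point by a multiple of the residual standard deviation. The function name, the output keys, the band multiplier `z`, and clamping at zero are assumptions made for illustration; Primer's actual computation may differ.

```python
import math


def forecast_costs(daily_costs, forecast_days=30, z=1.96):
    """Project daily costs with a least-squares line and a residual band."""
    n = len(daily_costs)
    if n < 2:
        return []  # a line cannot be fitted to fewer than 2 points

    xs = range(n)  # day index 0..n-1
    mean_x = (n - 1) / 2
    mean_y = sum(daily_costs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_costs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual standard deviation of the fit (population form, divides by n).
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, daily_costs)]
    resid_std = math.sqrt(sum(r * r for r in residuals) / n)

    forecast = []
    for x in range(n, n + forecast_days):
        cost = max(0.0, intercept + slope * x)  # costs cannot be negative
        forecast.append({
            "day_index": x,
            "cost": cost,
            "lower": max(0.0, cost - z * resid_std),
            "upper": cost + z * resid_std,
        })
    return forecast
```

With perfectly linear history the residuals are zero and the band collapses onto the trend line; noisier history produces a proportionally wider band.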

Budgets

Create monthly or quarterly spend budgets per team or org-wide. Primer tracks actual spend against the budget and computes status automatically.

Create a budget

curl -X POST http://localhost:8000/api/v1/finops/budgets \
  -H "x-admin-key: your-admin-key" \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "550e8400-e29b-41d4-a716-446655440000",
    "name": "Backend Team Monthly",
    "amount": 2000.00,
    "period": "monthly",
    "alert_threshold_pct": 80
  }'

Omit `team_id` to create an org-wide budget that tracks total spend across all teams.

List budgets

curl http://localhost:8000/api/v1/finops/budgets \
  -H "x-admin-key: your-admin-key"

Each budget in the response includes computed status fields:

Field	Description
`current_spend`	Actual spend so far in the current period
`burn_rate_daily`	Average daily spend rate
`projected_end_of_period`	Projected total spend by period end
`pct_used`	Percentage of budget consumed
`status`	`"on_track"`, `"warning"`, or `"over_budget"`

The `status` field takes one of three values:

on_track — spend is below the alert threshold
warning — spend exceeds alert_threshold_pct but is under 100%
over_budget — spend exceeds the budget amount
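
The computed fields and status rules above can be expressed as a short sketch. The function name and its `days_elapsed` / `days_in_period` inputs are hypothetical, and the behavior when spend sits exactly on a threshold is an assumption (treated here as not yet exceeded).

```python
def budget_status(amount, current_spend, days_elapsed, days_in_period,
                  alert_threshold_pct):
    """Derive the computed budget fields from spend so far in the period."""
    pct_used = current_spend * 100 / amount if amount else 0.0
    burn_rate_daily = current_spend / days_elapsed if days_elapsed else 0.0
    # Linear extrapolation of the average burn rate to the period end.
    projected_end_of_period = burn_rate_daily * days_in_period

    if current_spend > amount:
        status = "over_budget"
    elif pct_used > alert_threshold_pct:
        status = "warning"
    else:
        status = "on_track"

    return {
        "current_spend": current_spend,
        "burn_rate_daily": burn_rate_daily,
        "projected_end_of_period": projected_end_of_period,
        "pct_used": pct_used,
        "status": status,
    }
```

For a $2,000 monthly budget with $1,700 spent after 20 of 30 days and an 80% threshold, this yields 85% used, an $85 daily burn rate, a $2,550 projection, and a `warning` status.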

Update a budget

curl -X PATCH http://localhost:8000/api/v1/finops/budgets/<budget-id> \
  -H "x-admin-key: your-admin-key" \
  -H "Content-Type: application/json" \
  -d '{"amount": 2500.00}'

Delete a budget

curl -X DELETE http://localhost:8000/api/v1/finops/budgets/<budget-id> \
  -H "x-admin-key: your-admin-key"

Team lead permissions

Team leads can create, update, and delete budgets for their own team only. Attempting to modify another team’s budget returns a 403 error.

Dashboard

The FinOps page in the Primer dashboard provides five tabs:

Tab	Content
Overview	Total spend, cost per session, cost per success, cache savings KPIs
Cache	Savings by model chart, daily savings trend, per-engineer efficiency table
Modeling	Per-engineer plan recommendations, allocation summary, total savings
Forecast	Historical cost chart with projected trend line and confidence band
Budgets	Budget cards with progress bars, burn rates, and status indicators

Navigate to FinOps in the sidebar or click “View FinOps” from the main dashboard overview.

Pricing

Cost calculations use per-model token pricing with longest-prefix matching. Primer ships with pricing for:

Anthropic: Claude Opus 4.6, Opus 4.5, Opus 4, Sonnet 4 (covers 4/4.5/4.6), Sonnet 3.5, Haiku 4.5, Haiku 3.5
OpenAI: GPT-5.3 Codex, GPT-5.2, GPT-5, GPT-5 Mini, GPT-4o, GPT-4o Mini, GPT-4.1, GPT-4.1 Mini, GPT-4.1 Nano, o3, o3-mini, o4-mini, o1, o1-mini, Codex Mini
Google: Gemini 3.1 Pro, Gemini 3.0 Pro, Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.0 Flash, Gemini 1.5 Pro, Gemini 1.5 Flash

Each model has four per-token prices: input, output, cache read, and cache creation. To add or update pricing for a model, edit `MODEL_PRICING` in both `src/primer/common/pricing.py` (backend) and `frontend/src/lib/utils.ts` (frontend).

Unknown models

If a session uses a model not in the pricing table, Primer falls back to Claude Sonnet 4 pricing. Add new model entries as they become available to ensure accurate cost tracking.
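
Longest-prefix matching with a fallback can be sketched as below. The table layout, the model ID strings, and every price are placeholders invented for illustration (prices are per million tokens); they are not the contents of Primer's `MODEL_PRICING`.

```python
# Placeholder table: keys and prices are illustrative, not real pricing.
MODEL_PRICING = {
    "claude-sonnet-4": {"input": 1.0, "output": 2.0,
                        "cache_read": 0.1, "cache_creation": 1.5},
    "claude-opus-4": {"input": 4.0, "output": 8.0,
                      "cache_read": 0.4, "cache_creation": 6.0},
    "claude-opus-4-5": {"input": 3.0, "output": 6.0,
                        "cache_read": 0.3, "cache_creation": 4.5},
}
FALLBACK_MODEL = "claude-sonnet-4"


def lookup_pricing(model):
    """Pick the entry whose key is the longest prefix of the model ID."""
    matches = [key for key in MODEL_PRICING if model.startswith(key)]
    key = max(matches, key=len) if matches else FALLBACK_MODEL
    return MODEL_PRICING[key]


def session_cost(model, input_tokens, output_tokens,
                 cache_read_tokens=0, cache_creation_tokens=0):
    """Dollar cost of one session from its four token counts."""
    p = lookup_pricing(model)
    total = (input_tokens * p["input"]
             + output_tokens * p["output"]
             + cache_read_tokens * p["cache_read"]
             + cache_creation_tokens * p["cache_creation"])
    return total / 1_000_000  # prices are per million tokens
```

Because the longest matching key wins, a versioned ID such as `claude-opus-4-5-20251101` resolves to the more specific `claude-opus-4-5` entry rather than `claude-opus-4`, while a single `claude-sonnet-4` entry can cover later point releases that share the prefix.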