AI Cost Analysis

Development & Testing Costs

Primary model: Claude Sonnet 4.6 ($3/MTok input, $15/MTok output)

Additional costs:

Eval scoring (Haiku 4.5): ~86 cases x ~500 tokens/judgment = ~43K tokens = $0.05
Multiple eval runs during development: ~10 runs x $0.05 = $0.50
Estimated total dev spend: ~$2.00

Model	Avg Input	Avg Output	Cost/Chat
Sonnet 4.6 (default)	4,128 tok	200 tok	$0.0154
Haiku 4.5	4,128 tok	200 tok	$0.0051
Opus 4.6	4,128 tok	200 tok	$0.0769

Queries per user per day: 5 (portfolio check, performance, transactions, market data, misc)
Avg tokens per query: 4,328 (4,128 input + 200 output) — observed from production data
Tool call frequency: 1.9 tool calls/chat average (observed)
Verification overhead: Negligible (deterministic, no extra LLM calls)
Cache warming overhead: warmPortfolioCache runs after activity/account writes — Redis + BullMQ job, zero LLM tokens, up to 30s added latency per write operation
Model mix: 90% Sonnet 4.6, 8% Haiku 4.5, 2% Opus 4.6
Blended cost per chat: $0.0154 x 0.90 + $0.0051 x 0.08 + $0.0769 x 0.02 = $0.0157

Scale	Users	Chats/Month	Token Volume	Monthly Cost
100 users	100	15,000	64.9M tokens	$235
1,000 users	1,000	150,000	649M tokens	$2,355
10,000 users	10,000	1,500,000	6.49B tokens	$23,550
100,000 users	100,000	15,000,000	64.9B tokens	$235,500

Strategy	Estimated Savings	Trade-off
Default to Haiku 4.5 for simple queries	60-70%	Slightly less nuanced responses
Prompt caching (repeated system prompt)	30-40%	Requires API support
Response caching for market data	10-20%	Staleness window
Reduce system prompt size	15-25%	Less detailed agent behavior

Scale	AI Cost	Infra	Total/Month	Cost/User/Month
100 users	$235	$42	$277	$2.77
1,000 users	$2,355	$42	$2,397	$2.40
10,000 users	$23,550	$100*	$23,650	$2.37
100,000 users	$235,500	$500*	$236,000	$2.36

*Infrastructure scales with traffic — estimated for higher tiers.

Cost is tracked live via GET /api/v1/agent/metrics:

{
  "cost": {
    "totalUsd": 0.0226,
    "avgPerChatUsd": 0.0226
  }
}

Computed per request using model-specific pricing (Sonnet/Haiku/Opus rates) applied to actual prompt and completion token counts.