Agent Architecture Document
Domain & Use Cases
Domain: Personal finance portfolio management (Ghostfolio fork)
Problems solved: Ghostfolio's existing UI requires manual navigation across multiple pages to understand portfolio state. The agent provides a conversational interface that synthesizes holdings, performance, market data, and transaction history into coherent natural language — and can execute write operations (account management, activity logging, watchlist, tags) with native tool approval gates.
Target users: Self-directed investors tracking multi-asset portfolios (stocks, ETFs, crypto) who want quick portfolio insights without clicking through dashboards.
Agent Architecture
Framework & Stack
| Layer | Choice | Rationale |
|---|---|---|
| Runtime | NestJS (TypeScript) | Native to Ghostfolio codebase |
| Agent framework | Vercel AI SDK v6 (ToolLoopAgent) | Native TS, streaming SSE, built-in tool dispatch |
| LLM | Claude Sonnet 4.6 (default) | Strong function calling, structured output, 200K context |
| Model options | Haiku 4.5 ($0.005/chat), Sonnet 4.6 ($0.015/chat), Opus 4.6 ($0.077/chat) | User-selectable per session |
| Schemas | Zod v4 | Required by AI SDK v6 inputSchema |
| Database | Prisma + Postgres | Shared with Ghostfolio, plus agent-specific tables |
| Cache warming | warmPortfolioCache helper | Redis + BullMQ (PortfolioSnapshotService); ensures portfolio reads reflect recent writes |
Request Flow
User message
│
▼
POST /api/v1/agent/chat (JWT auth)
Body: { messages: UIMessage[], toolHistory?, model?, approvedActions? }
│
▼
ToolLoopAgent created → pipeAgentUIStreamToResponse()
│
├─► prepareStep()
│ ├─ Injects current date into system prompt
│ ├─ All 10 tools available from step 1 (activity_manage auto-resolves accountId)
│ └─ Loads contextual SKILL.md files based on tool history
│
├─► LLM reasoning → tool selection
│ └─ Up to 10 steps (stopWhen: stepCountIs(10))
│
├─► Tool execution (try/catch per tool)
│ └─ Returns structured JSON to LLM for synthesis
│
├─► Approval gate (write tools only)
│ ├─ needsApproval() evaluates per invocation
│ ├─ Skips: list actions, previously-approved signatures, SKIP_APPROVAL env
│ └─ If required: stream pauses → client shows approval card → resumes on approve/deny
│
├─► Post-write cache warming (activity_manage, account_manage)
│ └─ warmPortfolioCache: clear Redis → drain stale jobs → enqueue HIGH priority → await (30s timeout)
│
├─► SSE stream to client (UIMessage protocol)
│ └─ Events: text-delta, tool-input-start, tool-input-available, tool-approval-request, tool-output-available, finish
│
└─► onFinish callback
  ├─ Verification pipeline (3 systems)
  ├─ Metrics recording (in-memory + Postgres)
  └─ Structured log: chat_complete
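The loop in the diagram can be sketched without the SDK. The block below is a simplified, synchronous stand-in and not the AI SDK's ToolLoopAgent: `decide` is a hypothetical placeholder for the LLM's tool selection, and `runToolLoop`, `ToolResult`, and `ToolFn` are names invented for illustration. It shows the two properties the flow relies on: a hard cap on steps, and a try/catch around each tool so that failures return to the model as structured data.

```typescript
// Simplified stand-in for the agent loop; not the AI SDK implementation.
type ToolResult =
  | { ok: true; tool: string; data: unknown }
  | { ok: false; tool: string; error: string };

type ToolFn = (input: unknown) => unknown;

// Hypothetical LLM stand-in: returns the next tool call, or null when done.
type Decide = (history: ToolResult[]) => { tool: string; input: unknown } | null;

function runToolLoop(
  decide: Decide,
  tools: Record<string, ToolFn>,
  maxSteps = 10, // mirrors stopWhen: stepCountIs(10)
): ToolResult[] {
  const history: ToolResult[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const call = decide(history);
    if (call === null) break; // the model chose to answer instead of calling a tool
    const fn = tools[call.tool];
    try {
      if (!fn) throw new Error(`unknown tool: ${call.tool}`);
      history.push({ ok: true, tool: call.tool, data: fn(call.input) });
    } catch (err) {
      // A failing tool is reported back as structured data, not rethrown.
      const message = err instanceof Error ? err.message : String(err);
      history.push({ ok: false, tool: call.tool, error: message });
    }
  }
  return history;
}
```

A `decide` function that always requests another tool stops after ten results, and a tool that throws yields an `ok: false` entry instead of aborting the request.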
Tool Design
10 tools organized into read (6) and write (4) categories:
| Tool | Type | Purpose |
|---|---|---|
| portfolio_analysis | Read | Holdings, allocations, total value, account breakdown |
| portfolio_performance | Read | Returns, net performance, chart data (downsampled to ~20 points) |
| holdings_lookup | Read | Deep dive on single position: dividends, fees, sectors, countries |
| market_data | Read | Live quotes for 1-10 symbols (FMP + CoinGecko) |
| symbol_search | Read | Disambiguate crypto vs stock, find correct data source |
| transaction_history | Read | Buys, sells, dividends, fees, deposits, withdrawals |
| account_manage | Write | CRUD accounts + transfers between accounts |
| activity_manage | Write | CRUD transactions (BUY/SELL/DIVIDEND/FEE/INTEREST/LIABILITY) |
| watchlist_manage | Write | Add/remove/list watchlist items |
| tag_manage | Write | CRUD tags for transaction organization |
Auto-resolution: activity_manage auto-resolves accountId when omitted on creates — matches accounts by asset type keywords (crypto → "crypto"/"wallet" accounts, stocks → "stock"/"brokerage" accounts) with fallback to highest-activity account. No tool gating; all 10 tools available from step 1.
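A minimal sketch of the auto-resolution rule described above, assuming a reduced account shape. `AccountSummary`, `AssetKind`, and `resolveAccountId` are hypothetical names, and the matching is done on the account name, which is an assumption; only the keywords and the highest-activity fallback come from the text.

```typescript
// Hypothetical, reduced account shape; the real Ghostfolio model has more fields.
interface AccountSummary {
  id: string;
  name: string;
  activityCount: number;
}

type AssetKind = "crypto" | "stock";

// Keywords taken from the description above.
const ACCOUNT_KEYWORDS: Record<AssetKind, string[]> = {
  crypto: ["crypto", "wallet"],
  stock: ["stock", "brokerage"],
};

function resolveAccountId(kind: AssetKind, accounts: AccountSummary[]): string | undefined {
  if (accounts.length === 0) return undefined;
  const keywords = ACCOUNT_KEYWORDS[kind];
  // First account whose name contains one of the keywords for this asset kind.
  const match = accounts.find((a) =>
    keywords.some((k) => a.name.toLowerCase().includes(k)),
  );
  if (match) return match.id;
  // Fallback: the account with the most recorded activities.
  return accounts.reduce((best, a) => (a.activityCount > best.activityCount ? a : best)).id;
}
```

With accounts named "Main Brokerage", "Cold Wallet", and "Savings", a crypto purchase resolves to "Cold Wallet"; if neither keyword account existed, the call would fall back to whichever account has the most activity.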
Approval gates: All 4 write tools define needsApproval — a function-based gate evaluated per invocation. Read-only actions (list) and previously-approved action signatures are auto-skipped. SKIP_APPROVAL=true env var disables all gates (used in evals). Action signatures follow the pattern tool_name:action:identifier (e.g., activity_manage:create:AAPL).
| Tool | Approval Rule |
|---|---|
| activity_manage | create and delete only (not update), unless signature in approvedActions |
| account_manage | Everything except list, unless signature in approvedActions |
| tag_manage | Everything except list |
| watchlist_manage | Everything except list |
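The approval rules in the table can be expressed as one pure function. This is a sketch under assumptions, not the project's code: `WriteCall`, `actionSignature`, and the `skipApproval` parameter (standing in for the SKIP_APPROVAL env var) are illustrative, and it follows the table literally by consulting approvedActions only for activity_manage and account_manage.

```typescript
type WriteTool = "activity_manage" | "account_manage" | "tag_manage" | "watchlist_manage";

// Hypothetical shape of a single write-tool invocation.
interface WriteCall {
  tool: WriteTool;
  action: string; // e.g. "create", "update", "delete", "list"
  identifier: string; // e.g. a ticker symbol or account name
}

// Signature pattern from the text: tool_name:action:identifier
function actionSignature(call: WriteCall): string {
  return `${call.tool}:${call.action}:${call.identifier}`;
}

function needsApproval(
  call: WriteCall,
  approvedActions: string[] = [],
  skipApproval = false, // stands in for SKIP_APPROVAL=true
): boolean {
  if (skipApproval) return false;
  if (call.action === "list") return false; // read-only action
  // activity_manage gates create and delete only.
  if (call.tool === "activity_manage" && call.action !== "create" && call.action !== "delete") {
    return false;
  }
  // Per the table, only these two tools honor previously approved signatures.
  const honorsApproved = call.tool === "activity_manage" || call.tool === "account_manage";
  if (honorsApproved && approvedActions.includes(actionSignature(call))) return false;
  return true;
}
```

Keeping the gate a pure function of the call and the approved signatures makes it straightforward to unit test each row of the table independently of the streaming layer.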
Cache warming: After every write operation in activity_manage and account_manage, warmPortfolioCache() clears stale Redis portfolio snapshots, drains in-flight BullMQ jobs, enqueues a fresh computation at HIGH priority, and awaits completion with a 30s timeout, so that subsequent read tools return up-to-date portfolio data. The helper relies on the injected PortfolioSnapshotService, RedisCacheService, and UserService.
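The clear, drain, enqueue, await sequence might look like the following. This is a sketch, not the actual helper: the `WarmDeps` interface and its method names are hypothetical stand-ins for the injected services, and treating a timeout as a non-fatal outcome is an assumption.

```typescript
// Hypothetical dependency surface; the real services expose different methods.
interface WarmDeps {
  clearSnapshots(userId: string): Promise<void>; // clear cached portfolio snapshots
  drainStaleJobs(userId: string): Promise<void>; // drop stale queued or in-flight jobs
  enqueueHighPriority(userId: string): Promise<void>; // resolves once the fresh snapshot is computed
}

// Resolves to "timed_out" if `work` has not settled within `ms` milliseconds.
async function withTimeout(work: Promise<void>, ms: number): Promise<"warmed" | "timed_out"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timed_out">((resolve) => {
    timer = setTimeout(() => resolve("timed_out"), ms);
  });
  try {
    return await Promise.race([work.then(() => "warmed" as const), timeout]);
  } finally {
    clearTimeout(timer); // avoid leaving a live timer behind
  }
}

async function warmPortfolioCache(
  userId: string,
  deps: WarmDeps,
  timeoutMs = 30_000, // the 30s timeout from the text
): Promise<"warmed" | "timed_out"> {
  await deps.clearSnapshots(userId);
  await deps.drainStaleJobs(userId);
  // Assumption: a slow recomputation is reported, not thrown, so the write still succeeds.
  return withTimeout(deps.enqueueHighPriority(userId), timeoutMs);
}
```

Errors raised by the recomputation itself still propagate; only the timeout is converted into a return value.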
Skill injection: Contextual SKILL.md documents (transaction workflow, market data patterns) are injected into the system prompt based on which tools have been used, providing the LLM with domain-specific guidance without bloating every request.
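One way to picture skill selection is a lookup from tool history to skill documents, combined with the date injection from prepareStep. The mapping, the file paths, and the function names below are all hypothetical; the document only states that skills are chosen based on which tools have been used.

```typescript
// Hypothetical mapping from tool name to the skill file that documents it.
const SKILL_FOR_TOOL: Record<string, string> = {
  activity_manage: "skills/transaction-workflow/SKILL.md",
  market_data: "skills/market-data/SKILL.md",
  symbol_search: "skills/market-data/SKILL.md",
};

// Returns each relevant skill path once, in first-use order.
function selectSkills(toolHistory: string[]): string[] {
  const selected = new Set<string>();
  for (const tool of toolHistory) {
    const skill = SKILL_FOR_TOOL[tool];
    if (skill) selected.add(skill);
  }
  return Array.from(selected);
}

// Appends the current date and any loaded skill text to the base prompt.
function buildSystemPrompt(base: string, today: Date, skillDocs: string[]): string {
  const date = today.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return [base, `Current date: ${date}`, ...skillDocs].join("\n\n");
}
```

A conversation that has only used read tools without a mapped skill gets the base prompt plus the date, which is how this approach keeps unrelated guidance out of most requests.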
Verification Strategy
Three deterministic systems run on every response in the onFinish callback:
| System | What It Checks | Weight in Composite |
|---|---|---|
| Output Validation | Min/max length, numeric data present when tools called, disclaimer on forward-looking language, write claims backed by write tool calls | 0.3 |
| Hallucination Detection | Ticker symbols in response exist in tool data, dollar amounts match within 5% or $1, holding claims reference actual tool results | 0.3 |
| Confidence Score | Composite 0-1: tool success rate (0.3), step efficiency (0.1), output validity (0.3), hallucination score (0.3) | — |
Why deterministic: The checks make no additional LLM calls, so they add no model latency or cost. They cross-reference the response text against the actual tool result data and catch the most common financial hallucination patterns (phantom tickers, fabricated dollar amounts) with high precision.
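The individual checks are simple enough to sketch. The functions below illustrate the amount tolerance, the ticker cross-reference, and the composite weights from the table; the function names, the input shapes, and the convention that a higher hallucinationScore means fewer unsupported claims are assumptions.

```typescript
// A claimed dollar amount matches when it is within $1 or within 5% of the tool value.
function amountMatches(claimed: number, actual: number): boolean {
  const diff = Math.abs(claimed - actual);
  return diff <= 1 || diff <= Math.abs(actual) * 0.05;
}

// Tickers mentioned in the response that never appeared in tool results.
function phantomTickers(mentioned: string[], fromTools: string[]): string[] {
  const known = new Set(fromTools.map((t) => t.toUpperCase()));
  return mentioned.filter((t) => !known.has(t.toUpperCase()));
}

// Hypothetical input shape; every field is expected in the range 0..1.
interface ConfidenceInputs {
  toolSuccessRate: number;
  stepEfficiency: number;
  outputValidity: number;
  hallucinationScore: number; // assumed: higher means fewer unsupported claims
}

// Weights from the table: 0.3 / 0.1 / 0.3 / 0.3.
function confidenceScore(c: ConfidenceInputs): number {
  return (
    0.3 * c.toolSuccessRate +
    0.1 * c.stepEfficiency +
    0.3 * c.outputValidity +
    0.3 * c.hallucinationScore
  );
}
```

Under this tolerance a response quoting $104 for a $100 holding passes, while $110 is flagged; small values such as $0.50 against $1.20 pass through the $1 absolute allowance.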
Persistence: Results stored on AgentChatLog (verificationScore, verificationResult) and exposed via GET /agent/verification/:requestId.
Eval Results
Framework: evalite — dedicated eval runner with scoring, UI, and CI integration.
| Suite | Cases | Pass Rate | Scorers |
|---|---|---|---|
| Golden Set | 19 | 100% | GoldenCheck (deterministic binary) |
| Scenarios | 67 | 91% | ToolCallAccuracy (F1), HasResponse, ResponseQuality (LLM-judged) |
| Total | 86 | | |
Category breakdown: single-tool (10), multi-tool (10), ambiguous (6), account management (8), activity management (10), watchlist (4), tag (4), multi-step write (4), adversarial write (4), edge cases (7), golden routing (7), golden structural (4), golden behavioral (2), golden guardrails (4).
CI gate: GitHub Actions runs the golden set on every push to main against the deployed Render instance. Threshold: 100%.
Remaining 9% scenario gap: On ambiguous queries the agent calls more tools than the expected set (thorough, not wrong). Write-operation approval is now handled by native needsApproval gates, which evals bypass via SKIP_APPROVAL=true.
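To see why extra tool calls cost points, consider a set-based F1 over tool names. This is a sketch of the general technique, assuming the scorer compares the set of expected tools with the set of called tools; the actual ToolCallAccuracy implementation is not shown in this document and may differ.

```typescript
// Set-based F1 between expected and called tool names (assumed scoring scheme).
function toolCallF1(expected: string[], called: string[]): number {
  const exp = new Set(expected);
  const got = new Set(called);
  if (exp.size === 0 && got.size === 0) return 1; // nothing expected, nothing called
  let hits = 0;
  got.forEach((t) => {
    if (exp.has(t)) hits++;
  });
  if (hits === 0) return 0;
  const precision = hits / got.size; // share of called tools that were expected
  const recall = hits / exp.size; // share of expected tools that were called
  return (2 * precision * recall) / (precision + recall);
}
```

If a case expects only portfolio_analysis and the agent also calls market_data, precision is 0.5 and recall is 1, giving an F1 of about 0.67 even though the answer may be correct.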
Observability Setup
| Capability | Implementation |
|---|---|
| Trace logging | Structured JSON: chat_start, step_finish (tool names, token usage), verification, chat_complete |
| Latency tracking | Total latencyMs per request, persisted to Postgres |
| Token usage | prompt/completion/total per request, averaged in metrics summary |
| Cost tracking | Model-specific pricing (Sonnet $3/$15, Haiku $1/$5, Opus $15/$75 per MTok), cost.totalUsd and cost.avgPerChatUsd in metrics |
| Error tracking | Errors recorded with sanitized messages (DB URLs, API keys redacted) |
| User feedback | Thumbs up/down per message, unique per (requestId, userId), satisfaction rate in metrics |
| Metrics endpoint | GET /agent/metrics?since=1h — summary, feedback, recent chats |
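For the error-tracking row, message sanitization can be done with a small list of redaction patterns. The patterns and the `sanitizeErrorMessage` name below are illustrative assumptions; the project's actual redaction rules are not shown in this document.

```typescript
// Illustrative patterns only; not the project's actual redaction rules.
const REDACTIONS: Array<[RegExp, string]> = [
  // Connection strings such as postgresql://user:pass@host:5432/db
  [/\b(?:postgres|postgresql|redis):\/\/[^\s"']+/gi, "[REDACTED_URL]"],
  // Tokens that look like "sk-..." API keys
  [/\bsk-[A-Za-z0-9_-]{8,}/g, "[REDACTED_KEY]"],
];

// Applies every redaction pattern in order and returns the cleaned message.
function sanitizeErrorMessage(message: string): string {
  return REDACTIONS.reduce((msg, [pattern, label]) => msg.replace(pattern, label), message);
}
```

Running the sanitizer before an error is logged or persisted keeps credentials out of both the structured logs and the Postgres error records.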
Key insight: Avg $0.015/chat with Sonnet 4.6, ~5.9s latency, ~1.9 steps per query. Portfolio analysis and market data are the most-used tools.
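The per-chat cost follows directly from token counts and the per-MTok prices listed in the cost-tracking row. The sketch below uses those prices; the `chatCostUsd` name and the token counts in the usage note are hypothetical.

```typescript
// USD per million tokens (input, output), as listed in the cost-tracking row.
const PRICING = {
  haiku: { input: 1, output: 5 },
  sonnet: { input: 3, output: 15 },
  opus: { input: 15, output: 75 },
} as const;

type ModelTier = keyof typeof PRICING;

// Cost of one chat in USD given its prompt and completion token counts.
function chatCostUsd(model: ModelTier, promptTokens: number, completionTokens: number): number {
  const p = PRICING[model];
  return (promptTokens * p.input + completionTokens * p.output) / 1_000_000;
}
```

As an illustration with made-up token counts, a Sonnet chat with 3,000 prompt tokens and 400 completion tokens costs (3,000 × 3 + 400 × 15) / 1,000,000 = $0.015, which is in line with the reported average.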
Open Source Contribution
Eval dataset: 86-case structured JSON dataset (evals/dataset.json) covering tool routing, multi-tool chaining, write operations, adversarial inputs, and edge cases for a financial AI agent. Each case includes input query, expected tools, expected behavioral assertions, and category labels. Published in the repository for reuse by other Ghostfolio agent implementations.
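A consumer of the dataset might model each case as follows. The field names and the sample case are assumptions made for illustration; only the four kinds of information (input query, expected tools, behavioral assertions, category labels) come from the description above, and the real evals/dataset.json schema may differ.

```typescript
// Field names are assumptions; check evals/dataset.json for the real schema.
interface EvalCase {
  input: string; // the user query
  expectedTools: string[]; // tools the agent should call
  assertions: string[]; // expected behavioral assertions
  categories: string[]; // category labels
}

// Runtime guard for data loaded from JSON.
function isEvalCase(value: unknown): value is EvalCase {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const isStringArray = (x: unknown): boolean =>
    Array.isArray(x) && x.every((item) => typeof item === "string");
  return (
    typeof v.input === "string" &&
    isStringArray(v.expectedTools) &&
    isStringArray(v.assertions) &&
    isStringArray(v.categories)
  );
}

// Hypothetical example case, not copied from the dataset.
const sampleCase: EvalCase = {
  input: "What is my portfolio worth?",
  expectedTools: ["portfolio_analysis"],
  assertions: ["mentions total value"],
  categories: ["single-tool"],
};
```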