Agent Architecture Document

Domain & Use Cases

Domain: Personal finance portfolio management (Ghostfolio fork)

Problems solved: Ghostfolio's existing UI requires manual navigation across multiple pages to understand portfolio state. The agent provides a conversational interface that synthesizes holdings, performance, market data, and transaction history into natural-language answers. It can also execute write operations (account management, activity logging, watchlist, tags), each behind a native tool approval gate.

Target users: Self-directed investors tracking multi-asset portfolios (stocks, ETFs, crypto) who want quick portfolio insights without clicking through dashboards.

Agent Architecture

Framework & Stack

| Layer | Choice | Rationale |
| --- | --- | --- |
| Runtime | NestJS (TypeScript) | Native to Ghostfolio codebase |
| Agent framework | Vercel AI SDK v6 (ToolLoopAgent) | Native TS, streaming SSE, built-in tool dispatch |
| LLM | Claude Sonnet 4.6 (default) | Strong function calling, structured output, 200K context |
| Model options | Haiku 4.5 ($0.005/chat), Sonnet 4.6 ($0.015/chat), Opus 4.6 ($0.077/chat) | User-selectable per session |
| Schemas | Zod v4 | Required by AI SDK v6 inputSchema |
| Database | Prisma + Postgres | Shared with Ghostfolio, plus agent-specific tables |
| Cache warming | warmPortfolioCache helper | Redis + BullMQ (PortfolioSnapshotService); ensures portfolio reads reflect recent writes |

Request Flow

User message
    │
    ▼
POST /api/v1/agent/chat (JWT auth)
    Body: { messages: UIMessage[], toolHistory?, model?, approvedActions? }
    │
    ▼
ToolLoopAgent created → pipeAgentUIStreamToResponse()
    │
    ├─► prepareStep()
    │     ├─ Injects current date into system prompt
    │     ├─ All 10 tools available from step 1 (activity_manage auto-resolves accountId)
    │     └─ Loads contextual SKILL.md files based on tool history
    │
    ├─► LLM reasoning → tool selection
    │     └─ Up to 10 steps (stopWhen: stepCountIs(10))
    │
    ├─► Tool execution (try/catch per tool)
    │     └─ Returns structured JSON to LLM for synthesis
    │
    ├─► Approval gate (write tools only)
    │     ├─ needsApproval() evaluates per invocation
    │     ├─ Skips: list actions, previously-approved signatures, SKIP_APPROVAL env
    │     └─ If required: stream pauses → client shows approval card → resumes on approve/deny
    │
    ├─► Post-write cache warming (activity_manage, account_manage)
    │     └─ warmPortfolioCache: clear Redis → drain stale jobs → enqueue HIGH priority → await (30s timeout)
    │
    ├─► SSE stream to client (UIMessage protocol)
    │     └─ Events: text-delta, tool-input-start, tool-input-available, tool-approval-request, tool-output-available, finish
    │
    └─► onFinish callback
          ├─ Verification pipeline (3 systems)
          ├─ Metrics recording (in-memory + Postgres)
          └─ Structured log: chat_complete

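The request body in the flow above could be typed roughly as follows. This is a minimal sketch: only the four field names and the endpoint path come from the diagram. The field types, the simplified `ChatMessage` stand-in for the SDK's UIMessage, the `Bearer` header scheme, and the `buildChatRequest` helper are assumptions for illustration.

```typescript
// Simplified stand-in for the AI SDK's UIMessage (assumption, not the full type).
interface ChatMessage {
  id: string;
  role: "user" | "assistant";
  parts: { type: "text"; text: string }[];
}

// Field names come from the request flow; the field types are assumed.
interface AgentChatBody {
  messages: ChatMessage[];
  toolHistory?: string[]; // assumed: names of tools used earlier in the session
  model?: string; // assumed: a model identifier string
  approvedActions?: string[]; // assumed: signatures like "activity_manage:create:AAPL"
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds arguments suitable for fetch(url, init).
function buildChatRequest(jwt: string, body: AgentChatBody): ChatRequest {
  return {
    url: "/api/v1/agent/chat",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${jwt}`, // assumed header scheme for the JWT
      },
      body: JSON.stringify(body),
    },
  };
}
```

A client would pass `url` and `init` to `fetch` and then read the SSE stream from the response.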
Tool Design

10 tools organized into read (6) and write (4) categories:

| Tool | Type | Purpose |
| --- | --- | --- |
| portfolio_analysis | Read | Holdings, allocations, total value, account breakdown |
| portfolio_performance | Read | Returns, net performance, chart data (downsampled to ~20 points) |
| holdings_lookup | Read | Deep dive on single position: dividends, fees, sectors, countries |
| market_data | Read | Live quotes for 1-10 symbols (FMP + CoinGecko) |
| symbol_search | Read | Disambiguate crypto vs stock, find correct data source |
| transaction_history | Read | Buys, sells, dividends, fees, deposits, withdrawals |
| account_manage | Write | CRUD accounts + transfers between accounts |
| activity_manage | Write | CRUD transactions (BUY/SELL/DIVIDEND/FEE/INTEREST/LIABILITY) |
| watchlist_manage | Write | Add/remove/list watchlist items |
| tag_manage | Write | CRUD tags for transaction organization |
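The table says portfolio_performance downsamples chart data to roughly 20 points but does not say how. One simple way to do it, shown here as an assumption rather than the actual method, is evenly spaced sampling that always keeps the first and last point. The `downsample` name is hypothetical.

```typescript
// Evenly spaced downsampling that keeps the first and last point.
// Assumption: the real tool may use a different strategy (e.g. bucket averages).
function downsample<T>(points: T[], target = 20): T[] {
  // Nothing to reduce, or a target too small to keep both endpoints: return a copy.
  if (target < 2 || points.length <= target) {
    return points.slice();
  }
  const step = (points.length - 1) / (target - 1);
  const sampled: T[] = [];
  for (let i = 0; i < target; i++) {
    sampled.push(points[Math.round(i * step)]);
  }
  return sampled;
}
```

Keeping the endpoints means the period's start and end values stay exact, which matters when the LLM quotes a return over the whole range.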

Auto-resolution: activity_manage auto-resolves accountId when omitted on creates — matches accounts by asset type keywords (crypto → "crypto"/"wallet" accounts, stocks → "stock"/"brokerage" accounts) with fallback to highest-activity account. No tool gating; all 10 tools available from step 1.
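The auto-resolution rule above can be sketched as a keyword match with a fallback. The keyword lists and the highest-activity fallback come from the description; the `AccountSummary` shape, the `AssetKind` type, and the `resolveAccountId` name are assumptions for illustration.

```typescript
// Hypothetical account shape; the real Ghostfolio account model has more fields.
interface AccountSummary {
  id: string;
  name: string;
  activityCount: number;
}

type AssetKind = "crypto" | "stock";

// Keywords taken from the description above.
const ACCOUNT_KEYWORDS: Record<AssetKind, string[]> = {
  crypto: ["crypto", "wallet"],
  stock: ["stock", "brokerage"],
};

function resolveAccountId(kind: AssetKind, accounts: AccountSummary[]): string | undefined {
  if (accounts.length === 0) {
    return undefined;
  }
  const keywords = ACCOUNT_KEYWORDS[kind];
  // First account whose name contains one of the keywords (case-insensitive).
  const match = accounts.find((account) =>
    keywords.some((keyword) => account.name.toLowerCase().includes(keyword)),
  );
  if (match) {
    return match.id;
  }
  // Fallback: the account with the most recorded activities.
  return accounts.reduce((best, account) =>
    account.activityCount > best.activityCount ? account : best,
  ).id;
}
```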

Approval gates: All 4 write tools define needsApproval — a function-based gate evaluated per invocation. Read-only actions (list) and previously-approved action signatures are auto-skipped. SKIP_APPROVAL=true env var disables all gates (used in evals). Action signatures follow the pattern tool_name:action:identifier (e.g., activity_manage:create:AAPL).

| Tool | Approval Rule |
| --- | --- |
| activity_manage | create and delete only (not update), unless signature in approvedActions |
| account_manage | Everything except list, unless signature in approvedActions |
| tag_manage | Everything except list |
| watchlist_manage | Everything except list |
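The approval rules can be expressed as a small pure function. This is a standalone sketch, not the AI SDK's own needsApproval signature: the `WriteInput` shape and the function parameters are assumptions, and the approved-signature skip is applied to all four tools, following the general statement in the approval-gates paragraph.

```typescript
// Hypothetical input shape: each write tool is assumed to take an action and an identifier.
interface WriteInput {
  action: string;
  identifier: string;
}

// Signature pattern from the text: tool_name:action:identifier.
function actionSignature(tool: string, input: WriteInput): string {
  return `${tool}:${input.action}:${input.identifier}`;
}

// Per-tool rule deciding whether an action is approval-worthy at all.
const APPROVAL_RULES: Record<string, (action: string) => boolean> = {
  activity_manage: (action) => action === "create" || action === "delete",
  account_manage: (action) => action !== "list",
  tag_manage: (action) => action !== "list",
  watchlist_manage: (action) => action !== "list",
};

function needsApproval(
  tool: string,
  input: WriteInput,
  approvedActions: string[],
  skipApproval = false, // stands in for the SKIP_APPROVAL env var
): boolean {
  if (skipApproval) {
    return false;
  }
  const rule = APPROVAL_RULES[tool];
  if (rule === undefined || !rule(input.action)) {
    return false; // read tools and exempt actions never pause the stream
  }
  // Previously approved signatures are not asked about again.
  return !approvedActions.includes(actionSignature(tool, input));
}
```

For example, `needsApproval("activity_manage", { action: "create", identifier: "AAPL" }, [])` is true, and becomes false once `"activity_manage:create:AAPL"` is in `approvedActions`.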

Cache warming: After every write operation in activity_manage and account_manage, warmPortfolioCache() runs. It clears stale Redis portfolio snapshots, drains in-flight BullMQ jobs, enqueues a fresh computation at HIGH priority, and awaits completion with a 30s timeout, so that subsequent read tools return up-to-date portfolio data. Its dependencies (PortfolioSnapshotService, RedisCacheService, and UserService) are injected.
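The clear, drain, enqueue, await sequence can be sketched with the three steps abstracted behind an interface. The `WarmDeps` interface and its method names are hypothetical stand-ins for the real Redis and BullMQ services, and returning a status on timeout (rather than throwing) is an assumption.

```typescript
// Hypothetical dependency interface; the real helper talks to Redis and BullMQ.
interface WarmDeps {
  clearSnapshots(userId: string): Promise<void>;
  drainStaleJobs(userId: string): Promise<void>;
  // Enqueues a HIGH priority snapshot job and resolves when it completes.
  enqueueHighPriorityAndWait(userId: string): Promise<void>;
}

// Resolves with the work's value, or "timeout" if it takes longer than ms.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T | "timeout"> {
  return new Promise<T | "timeout">((resolve, reject) => {
    const timer = setTimeout(() => resolve("timeout"), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

async function warmPortfolioCacheSketch(
  userId: string,
  deps: WarmDeps,
  timeoutMs = 30_000,
): Promise<"warmed" | "timeout"> {
  await deps.clearSnapshots(userId);
  await deps.drainStaleJobs(userId);
  const result = await withTimeout(deps.enqueueHighPriorityAndWait(userId), timeoutMs);
  // Assumption: on timeout the write still succeeds; reads may briefly be stale.
  return result === "timeout" ? "timeout" : "warmed";
}
```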

Skill injection: Contextual SKILL.md documents (transaction workflow, market data patterns) are injected into the system prompt based on which tools have been used, providing the LLM with domain-specific guidance without bloating every request.
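A sketch of how skill selection might work inside prepareStep, assuming a static tool-to-skill map. The skill names, the mapping, and both function names are hypothetical; only the idea of choosing SKILL.md documents from tool history and injecting the current date comes from the text.

```typescript
// Hypothetical mapping from tool name to the skill document it triggers.
const SKILL_FOR_TOOL: Record<string, string> = {
  activity_manage: "transaction-workflow",
  market_data: "market-data-patterns",
  symbol_search: "market-data-patterns",
};

// Returns each triggered skill once, in order of first use.
function selectSkills(toolHistory: string[]): string[] {
  const skills: string[] = [];
  for (const tool of toolHistory) {
    const skill = SKILL_FOR_TOOL[tool];
    if (skill !== undefined && !skills.includes(skill)) {
      skills.push(skill);
    }
  }
  return skills;
}

// Assembles the system prompt: base text, current date, then any relevant skill docs.
function buildSystemPrompt(
  basePrompt: string,
  today: string,
  toolHistory: string[],
  skillDocs: Record<string, string>, // skill name -> SKILL.md contents
): string {
  const sections = [basePrompt, `Current date: ${today}`];
  for (const skill of selectSkills(toolHistory)) {
    const doc = skillDocs[skill];
    if (doc !== undefined) {
      sections.push(doc);
    }
  }
  return sections.join("\n\n");
}
```

With an empty tool history the prompt contains only the base text and the date, which is what keeps early requests small.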

Verification Strategy

Three deterministic systems run on every response in the onFinish callback:

| System | What It Checks | Weight in Composite |
| --- | --- | --- |
| Output Validation | Min/max length, numeric data present when tools called, disclaimer on forward-looking language, write claims backed by write tool calls | 0.3 |
| Hallucination Detection | Ticker symbols in response exist in tool data, dollar amounts match within 5% or $1, holding claims reference actual tool results | 0.3 |
| Confidence Score | Composite 0-1: tool success rate (0.3), step efficiency (0.1), output validity (0.3), hallucination score (0.3) | n/a |
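The composite is a weighted sum of four components using the weights listed above. In this sketch each component is assumed to be already normalized to the range 0 to 1 with 1 as the best value; how step efficiency and the other components are derived is not specified here, so they are taken as inputs.

```typescript
// Each component is assumed to be normalized to [0, 1], where 1 is best.
interface ConfidenceInputs {
  toolSuccessRate: number;
  stepEfficiency: number;
  outputValidity: number;
  hallucinationScore: number;
}

// Weights from the table above; they sum to 1.0.
const CONFIDENCE_WEIGHTS: ConfidenceInputs = {
  toolSuccessRate: 0.3,
  stepEfficiency: 0.1,
  outputValidity: 0.3,
  hallucinationScore: 0.3,
};

function clamp01(value: number): number {
  return Math.min(1, Math.max(0, value));
}

function confidenceScore(inputs: ConfidenceInputs): number {
  return (
    CONFIDENCE_WEIGHTS.toolSuccessRate * clamp01(inputs.toolSuccessRate) +
    CONFIDENCE_WEIGHTS.stepEfficiency * clamp01(inputs.stepEfficiency) +
    CONFIDENCE_WEIGHTS.outputValidity * clamp01(inputs.outputValidity) +
    CONFIDENCE_WEIGHTS.hallucinationScore * clamp01(inputs.hallucinationScore)
  );
}
```

Because step efficiency carries only 0.1, a response that takes many steps but is valid and grounded still scores at least 0.9, while a single failed hallucination check can cost up to 0.3.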

Why deterministic: The checks make no additional LLM calls, so they add no model latency or token cost. They cross-reference the response text against the actual tool result data, and catch the most common financial hallucination patterns (phantom tickers, fabricated dollar amounts) with high precision.
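The dollar-amount check is a good illustration of the deterministic approach. This sketch extracts dollar figures from the response with a regular expression and flags any that match no tool value within 5% or $1. The regex, the function names, and measuring the 5% relative to the tool value are assumptions.

```typescript
// A claim matches if it is within $1, or within 5% of the tool-reported value.
// Assumption: the 5% tolerance is measured relative to the tool value.
function amountMatches(claimed: number, actual: number): boolean {
  const diff = Math.abs(claimed - actual);
  return diff <= 1 || diff <= Math.abs(actual) * 0.05;
}

// Pulls figures like "$35" or "$1,250.50" out of free text.
function extractDollarAmounts(text: string): number[] {
  const matches = text.match(/\$\d[\d,]*(?:\.\d+)?/g) ?? [];
  return matches.map((m) => Number(m.slice(1).replace(/,/g, "")));
}

// Returns the amounts in the response that no tool result supports.
function unsupportedAmounts(response: string, toolAmounts: number[]): number[] {
  return extractDollarAmounts(response).filter(
    (claimed) => !toolAmounts.some((actual) => amountMatches(claimed, actual)),
  );
}
```

A non-empty result from `unsupportedAmounts` would lower the hallucination score; the ticker check can follow the same pattern with a set-membership test against symbols seen in tool data.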

Persistence: Results stored on AgentChatLog (verificationScore, verificationResult) and exposed via GET /agent/verification/:requestId.

Eval Results

Framework: evalite — dedicated eval runner with scoring, UI, and CI integration.

| Suite | Cases | Pass Rate | Scorers |
| --- | --- | --- | --- |
| Golden Set | 19 | 100% | GoldenCheck (deterministic binary) |
| Scenarios | 67 | 91% | ToolCallAccuracy (F1), HasResponse, ResponseQuality (LLM-judged) |
| Total | 86 | | |
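ToolCallAccuracy is described as an F1 score over tool calls. A minimal sketch of that metric, assuming it compares the set of expected tool names with the set of tools actually called (the `toolCallF1` name and the set-based comparison are assumptions):

```typescript
// Removes duplicate tool names while preserving order.
function unique(values: string[]): string[] {
  return values.filter((value, index) => values.indexOf(value) === index);
}

// F1 over tool names: harmonic mean of precision and recall.
function toolCallF1(expected: string[], actual: string[]): number {
  const expectedSet = unique(expected);
  const actualSet = unique(actual);
  if (expectedSet.length === 0 && actualSet.length === 0) {
    return 1; // nothing expected, nothing called
  }
  const truePositives = actualSet.filter((tool) => expectedSet.includes(tool)).length;
  if (truePositives === 0) {
    return 0;
  }
  const precision = truePositives / actualSet.length;
  const recall = truePositives / expectedSet.length;
  return (2 * precision * recall) / (precision + recall);
}
```

Under this definition an agent that calls one extra tool beyond a single expected one gets precision 0.5 and recall 1, for an F1 of about 0.67, so extra-but-harmless tool calls reduce the score even when the answer is correct.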

Category breakdown: single-tool (10), multi-tool (10), ambiguous (6), account management (8), activity management (10), watchlist (4), tag (4), multi-step write (4), adversarial write (4), edge cases (7), golden routing (7), golden structural (4), golden behavioral (2), golden guardrails (4).

CI gate: GitHub Actions runs the golden set on every push to main, against the deployed Render instance. Threshold: 100%.

Remaining 9% scenario gap: Agent calls extra tools on ambiguous queries (thorough, not wrong). Write-operation approval is now handled by native needsApproval gates (bypassed in evals via SKIP_APPROVAL=true).

Observability Setup

| Capability | Implementation |
| --- | --- |
| Trace logging | Structured JSON: chat_start, step_finish (tool names, token usage), verification, chat_complete |
| Latency tracking | Total latencyMs per request, persisted to Postgres |
| Token usage | prompt/completion/total per request, averaged in metrics summary |
| Cost tracking | Model-specific pricing (Sonnet $3/$15, Haiku $1/$5, Opus $15/$75 per MTok), cost.totalUsd and cost.avgPerChatUsd in metrics |
| Error tracking | Errors recorded with sanitized messages (DB URLs, API keys redacted) |
| User feedback | Thumbs up/down per message, unique per (requestId, userId), satisfaction rate in metrics |
| Metrics endpoint | GET /agent/metrics?since=1h (summary, feedback, recent chats) |
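Per-chat cost follows directly from token counts and the per-MTok prices in the table. This sketch assumes the first price in each pair is for prompt (input) tokens and the second for completion (output) tokens; the type and function names are hypothetical.

```typescript
type ModelTier = "haiku" | "sonnet" | "opus";

// USD per million tokens, from the cost tracking row above.
// Assumption: first figure is the input price, second is the output price.
const PRICING_PER_MTOK: Record<ModelTier, { input: number; output: number }> = {
  haiku: { input: 1, output: 5 },
  sonnet: { input: 3, output: 15 },
  opus: { input: 15, output: 75 },
};

function chatCostUsd(model: ModelTier, promptTokens: number, completionTokens: number): number {
  const price = PRICING_PER_MTOK[model];
  return (promptTokens * price.input + completionTokens * price.output) / 1_000_000;
}

// Average cost across chats, analogous to cost.avgPerChatUsd.
function averageCostUsd(costs: number[]): number {
  if (costs.length === 0) {
    return 0;
  }
  return costs.reduce((sum, cost) => sum + cost, 0) / costs.length;
}
```

As an illustration with made-up token counts, a Sonnet chat with 3,000 prompt tokens and 400 completion tokens costs 3000 × $3 / 1M + 400 × $15 / 1M = $0.009 + $0.006 = $0.015.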

Key insight: Avg $0.015/chat with Sonnet 4.6, ~5.9s latency, ~1.9 steps per query. Portfolio analysis and market data are the most-used tools.

Open Source Contribution

Eval dataset: 86-case structured JSON dataset (evals/dataset.json) covering tool routing, multi-tool chaining, write operations, adversarial inputs, and edge cases for a financial AI agent. Each case includes input query, expected tools, expected behavioral assertions, and category labels. Published in the repository for reuse by other Ghostfolio agent implementations.