# Agent Architecture Document

## Domain & Use Cases

**Domain**: Personal finance portfolio management (Ghostfolio fork)

**Problems solved**: Ghostfolio's existing UI requires manual navigation across multiple pages to understand portfolio state. The agent provides a conversational interface that synthesizes holdings, performance, market data, and transaction history into coherent natural language — and can execute write operations (account management, activity logging, watchlist, tags) with native tool approval gates.

**Target users**: Self-directed investors tracking multi-asset portfolios (stocks, ETFs, crypto) who want quick portfolio insights without clicking through dashboards.

## Agent Architecture

### Framework & Stack

| Layer           | Choice                                                                    | Rationale                                                                                   |
| --------------- | ------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------- |
| Runtime         | NestJS (TypeScript)                                                       | Native to Ghostfolio codebase                                                               |
| Agent framework | Vercel AI SDK v6 (`ToolLoopAgent`)                                        | Native TS, streaming SSE, built-in tool dispatch                                            |
| LLM             | Claude Sonnet 4.6 (default)                                               | Strong function calling, structured output, 200K context                                    |
| Model options   | Haiku 4.5 ($0.005/chat), Sonnet 4.6 ($0.015/chat), Opus 4.6 ($0.077/chat) | User-selectable per session                                                                 |
| Schemas         | Zod v4                                                                    | Required by AI SDK v6 `inputSchema`                                                         |
| Database        | Prisma + Postgres                                                         | Shared with Ghostfolio, plus agent-specific tables                                          |
| Cache warming   | `warmPortfolioCache` helper                                               | Redis + BullMQ (`PortfolioSnapshotService`) — ensures portfolio reads reflect recent writes |

### Request Flow

```
User message
    │
    ▼
POST /api/v1/agent/chat (JWT auth)
    Body: { messages: UIMessage[], toolHistory?, model?, approvedActions? }
    │
    ▼
ToolLoopAgent created → pipeAgentUIStreamToResponse()
    │
    ├─► prepareStep()
    │     ├─ Injects current date into system prompt
    │     ├─ All 10 tools available from step 1 (activity_manage auto-resolves accountId)
    │     └─ Loads contextual SKILL.md files based on tool history
    │
    ├─► LLM reasoning → tool selection
    │     └─ Up to 10 steps (stopWhen: stepCountIs(10))
    │
    ├─► Tool execution (try/catch per tool)
    │     └─ Returns structured JSON to LLM for synthesis
    │
    ├─► Approval gate (write tools only)
    │     ├─ needsApproval() evaluates per invocation
    │     ├─ Skips: list actions, previously-approved signatures, SKIP_APPROVAL env
    │     └─ If required: stream pauses → client shows approval card → resumes on approve/deny
    │
    ├─► Post-write cache warming (activity_manage, account_manage)
    │     └─ warmPortfolioCache: clear Redis → drain stale jobs → enqueue HIGH priority → await (30s timeout)
    │
    ├─► SSE stream to client (UIMessage protocol)
    │     └─ Events: text-delta, tool-input-start, tool-input-available, tool-approval-request, tool-output-available, finish
    │
    └─► onFinish callback
          ├─ Verification pipeline (3 systems)
          ├─ Metrics recording (in-memory + Postgres)
          └─ Structured log: chat_complete
```

### Tool Design

10 tools organized into read (6) and write (4) categories:

| Tool                    | Type  | Purpose                                                           |
| ----------------------- | ----- | ----------------------------------------------------------------- |
| `portfolio_analysis`    | Read  | Holdings, allocations, total value, account breakdown             |
| `portfolio_performance` | Read  | Returns, net performance, chart data (downsampled to ~20 points)  |
| `holdings_lookup`       | Read  | Deep dive on single position: dividends, fees, sectors, countries |
| `market_data`           | Read  | Live quotes for 1-10 symbols (FMP + CoinGecko)                    |
| `symbol_search`         | Read  | Disambiguate crypto vs stock, find correct data source            |
| `transaction_history`   | Read  | Buys, sells, dividends, fees, deposits, withdrawals               |
| `account_manage`        | Write | CRUD accounts + transfers between accounts                        |
| `activity_manage`       | Write | CRUD transactions (BUY/SELL/DIVIDEND/FEE/INTEREST/LIABILITY)      |
| `watchlist_manage`      | Write | Add/remove/list watchlist items                                   |
| `tag_manage`            | Write | CRUD tags for transaction organization                            |

**Auto-resolution**: `activity_manage` auto-resolves `accountId` when omitted on creates — matches accounts by asset type keywords (crypto → "crypto"/"wallet" accounts, stocks → "stock"/"brokerage" accounts) with fallback to highest-activity account. No tool gating; all 10 tools available from step 1.

**Approval gates**: All 4 write tools define `needsApproval` — a function-based gate evaluated per invocation. Read-only actions (`list`) and previously-approved action signatures are auto-skipped. `SKIP_APPROVAL=true` env var disables all gates (used in evals). Action signatures follow the pattern `tool_name:action:identifier` (e.g., `activity_manage:create:AAPL`).

| Tool               | Approval Rule                                                                    |
| ------------------ | -------------------------------------------------------------------------------- |
| `activity_manage`  | `create` and `delete` only (not `update`), unless signature in `approvedActions` |
| `account_manage`   | Everything except `list`, unless signature in `approvedActions`                  |
| `tag_manage`       | Everything except `list`                                                         |
| `watchlist_manage` | Everything except `list`                                                         |

**Cache warming**: After every write operation in `activity_manage` and `account_manage`, `warmPortfolioCache()` runs — clears stale Redis portfolio snapshots, drains in-flight BullMQ jobs, enqueues fresh computation at HIGH priority, and awaits completion with a 30s timeout. Ensures subsequent read tools return up-to-date portfolio data. Injected via `PortfolioSnapshotService`, `RedisCacheService`, and `UserService`.

**Skill injection**: Contextual SKILL.md documents (transaction workflow, market data patterns) are injected into the system prompt based on which tools have been used, providing the LLM with domain-specific guidance without bloating every request.

## Verification Strategy

Three deterministic systems run on every response in the `onFinish` callback:

| System                      | What It Checks                                                                                                                          | Weight in Composite |
| --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- | ------------------- |
| **Output Validation**       | Min/max length, numeric data present when tools called, disclaimer on forward-looking language, write claims backed by write tool calls | 0.3                 |
| **Hallucination Detection** | Ticker symbols in response exist in tool data, dollar amounts match within 5% or $1, holding claims reference actual tool results       | 0.3                 |
| **Confidence Score**        | Composite 0-1: tool success rate (0.3), step efficiency (0.1), output validity (0.3), hallucination score (0.3)                         | —                   |

**Why deterministic**: No additional LLM calls = zero latency overhead and zero cost. Cross-references response text against actual tool result data. Catches the most common financial hallucination patterns (phantom tickers, fabricated dollar amounts) with high precision.

**Persistence**: Results stored on `AgentChatLog` (`verificationScore`, `verificationResult`) and exposed via `GET /agent/verification/:requestId`.

## Eval Results

**Framework**: evalite — dedicated eval runner with scoring, UI, and CI integration.

| Suite      | Cases  | Pass Rate | Scorers                                                          |
| ---------- | ------ | --------- | ---------------------------------------------------------------- |
| Golden Set | 19     | **100%**  | GoldenCheck (deterministic binary)                               |
| Scenarios  | 67     | **91%**   | ToolCallAccuracy (F1), HasResponse, ResponseQuality (LLM-judged) |
| **Total**  | **86** |           |                                                                  |

**Category breakdown**: single-tool (10), multi-tool (10), ambiguous (6), account management (8), activity management (10), watchlist (4), tag (4), multi-step write (4), adversarial write (4), edge cases (7), golden routing (7), golden structural (4), golden behavioral (2), golden guardrails (4).

**CI gate**: GitHub Actions runs golden set on every push to main. Threshold: 100%. Hits deployed Render instance.

**Remaining 9% scenario gap**: Agent calls extra tools on ambiguous queries (thorough, not wrong). Write-operation approval is now handled by native `needsApproval` gates (bypassed in evals via `SKIP_APPROVAL=true`).

## Observability Setup

| Capability           | Implementation                                                                                                                  |
| -------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| **Trace logging**    | Structured JSON: `chat_start`, `step_finish` (tool names, token usage), `verification`, `chat_complete`                         |
| **Latency tracking** | Total `latencyMs` per request, persisted to Postgres                                                                            |
| **Token usage**      | prompt/completion/total per request, averaged in metrics summary                                                                |
| **Cost tracking**    | Model-specific pricing (Sonnet $3/$15, Haiku $1/$5, Opus $15/$75 per MTok), `cost.totalUsd` and `cost.avgPerChatUsd` in metrics |
| **Error tracking**   | Errors recorded with sanitized messages (DB URLs, API keys redacted)                                                            |
| **User feedback**    | Thumbs up/down per message, unique per (requestId, userId), satisfaction rate in metrics                                        |
| **Metrics endpoint** | `GET /agent/metrics?since=1h` — summary, feedback, recent chats                                                                 |

**Key insight**: Avg $0.015/chat with Sonnet 4.6, ~5.9s latency, ~1.9 steps per query. Portfolio analysis and market data are the most-used tools.

## Open Source Contribution

**Eval dataset**: 86-case structured JSON dataset (`evals/dataset.json`) covering tool routing, multi-tool chaining, write operations, adversarial inputs, and edge cases for a financial AI agent. Each case includes input query, expected tools, expected behavioral assertions, and category labels. Published in the repository for reuse by other Ghostfolio agent implementations.