Ghostfolio AI Agent — Architecture Documentation
Domain & Use Cases
Domain: Personal Finance + Real Estate Portfolio Management
Problem Solved: Most people manage investments and real estate in completely separate places. A portfolio app tracks stocks. A spreadsheet tracks property equity. Neither talks to the other. No single tool answers: "Given everything I own, am I on track to retire? Can I afford to buy more real estate? What does my financial picture actually look like?"
Target Customer: Working professionals aged 28–45 who have started investing in Ghostfolio and own or are planning to own real estate. They want to understand their complete financial picture — investments + property equity — and run scenarios on major life decisions (job offers, buying property, having children, retiring earlier).
Specific user this was built for: A 32-year-old software engineer who uses Ghostfolio to track their investments and is trying to figure out if their $94k portfolio can fund a down payment, whether to accept a job offer in Seattle, and what their retirement looks like if they start buying rental properties — all in one conversation without switching between 8 different tools.
Use Cases:
- Track real estate equity alongside investment portfolio
- Run "what if I buy a house every 2 years for 10 years" retirement scenarios
- Ask whether a job offer in another city is financially worth it after cost of living
- Understand total net worth across all asset classes (stocks + real estate)
- Check if savings rate is on track vs peers (Federal Reserve SCF 2022 data)
- Plan family finances including childcare cost impact by city
- Analyze equity options (keep / cash-out refi / rental property)
Agent Architecture
Framework: LangGraph (Python)
LLM: Claude Sonnet 4.5 (Anthropic model ID claude-sonnet-4-5-20251001)
Backend: FastAPI
Database: SQLite (properties.db — stateful CRUD) + Ghostfolio PostgreSQL
Observability: LangSmith
Deployment: Railway
Why LangGraph
Chosen over plain LangChain because the agent requires stateful multi-step reasoning: classify intent → select tool → execute → verify → format. LangGraph's explicit state machine makes every step debuggable and testable. The graph has clear nodes and edges rather than an opaque chain.
Graph Architecture
User Message
↓
classify_node (keyword matching → intent category string)
↓
_route_after_classify (maps intent string → executor node)
↓
[Tool Executor Node] (calls appropriate tool function, returns structured result)
↓
verify_node (confidence scoring + domain constraint check)
↓
format_node (LLM synthesizes tool result into natural language response)
↓
Response to User
State Schema (AgentState)
{
"user_query": str,
"messages": list[BaseMessage], # full conversation history
"query_type": str,
"portfolio_snapshot": dict,
"tool_results": list[dict],
"pending_verifications": list,
"confidence_score": float,
"verification_outcome": str,
"awaiting_confirmation": bool,
"confirmation_payload": dict | None,
"pending_write": dict | None,
"bearer_token": str | None,
"final_response": str | None,
"citations": list[str],
"error": str | None,
}
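The schema above maps directly onto a typed dict. A minimal sketch, with `list[dict]` standing in for `list[BaseMessage]` so the example stays dependency-free:

```python
from typing import Optional, TypedDict

class AgentState(TypedDict):
    """Sketch of the AgentState schema listed above.
    list[dict] substitutes for list[BaseMessage] in this example."""
    user_query: str
    messages: list[dict]              # full conversation history
    query_type: str
    portfolio_snapshot: dict
    tool_results: list[dict]
    pending_verifications: list
    confidence_score: float
    verification_outcome: str
    awaiting_confirmation: bool
    confirmation_payload: Optional[dict]
    pending_write: Optional[dict]
    bearer_token: Optional[str]
    final_response: Optional[str]
    citations: list[str]
    error: Optional[str]

# An empty initial state for a new conversation turn.
initial: AgentState = {
    "user_query": "What is my net worth?",
    "messages": [], "query_type": "", "portfolio_snapshot": {},
    "tool_results": [], "pending_verifications": [],
    "confidence_score": 0.0, "verification_outcome": "",
    "awaiting_confirmation": False, "confirmation_payload": None,
    "pending_write": None, "bearer_token": None,
    "final_response": None, "citations": [], "error": None,
}
```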
Tool Registry
16 Tools Built Across 9 Files:
| Tool | File | Purpose |
|---|---|---|
| portfolio_analysis | portfolio.py | Live Ghostfolio holdings, allocation, performance |
| compliance_check | portfolio.py | Concentration risk, regulatory flags |
| tax_estimate | portfolio.py | Tax liability estimation |
| market_data | market_data.py | Live stock prices via Yahoo Finance |
| add_property | property_tracker.py | CRUD — create property record |
| get_properties | property_tracker.py | CRUD — read all properties |
| update_property | property_tracker.py | CRUD — update property values |
| remove_property | property_tracker.py | CRUD — delete property record |
| analyze_equity_options | property_tracker.py | Home equity scenario analysis |
| get_total_net_worth | property_tracker.py | Portfolio + real estate combined |
| calculate_relocation_runway | relocation_runway.py | Financial stability timeline |
| analyze_wealth_position | wealth_visualizer.py | Fed Reserve peer comparison |
| simulate_real_estate_strategy | realestate_strategy.py | Buy-hold retirement projection |
| plan_family_finances | family_planner.py | Childcare cost impact |
| analyze_life_decision | life_decision_advisor.py | Job offer, relocation decisions |
| calculate_down_payment_power | wealth_bridge.py | Portfolio to home purchase |
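A registry like this is typically a dict mapping tool names to functions, which the executor node dispatches against. The sketch below uses tool names from the table; the signatures and return shapes are assumptions:

```python
# Illustrative tool registry and dispatch. The two stub tools return
# placeholder data; real tools query Ghostfolio / SQLite.

def get_total_net_worth(**kwargs) -> dict:
    return {"tool": "get_total_net_worth", "net_worth_usd": 0.0}

def portfolio_analysis(**kwargs) -> dict:
    return {"tool": "portfolio_analysis", "holdings": []}

TOOL_REGISTRY = {
    "get_total_net_worth": get_total_net_worth,
    "portfolio_analysis": portfolio_analysis,
    # ...the real registry holds all 16 tools
}

def execute_tool(name: str, **kwargs) -> dict:
    """Executor node helper: look up and call a registered tool."""
    if name not in TOOL_REGISTRY:
        return {"error": f"unknown tool: {name}"}
    return TOOL_REGISTRY[name](**kwargs)
```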
Latency Notes
Single-tool queries average 5–10 seconds due to Claude Sonnet response generation time.
The classify step (keyword matching) adds <10ms. Tool execution adds 50–200ms. The majority
of latency is LLM synthesis. Streaming responses (/chat/steps, /chat/stream) are
implemented to improve perceived performance. A startup warmup pre-establishes the LLM
connection to reduce cold-start latency on the first request.
Verification Strategy
Three verification systems implemented:
1. Confidence Scoring: Every /chat response includes a confidence score between 0.0 and 1.0. The score is based on tool success, data source reliability, and query type. Responses with confidence below 0.80 are returned to the client with verified=false.
2. Source Attribution (Citation Enforcement): The system prompt enforces a citation rule: every factual claim must name its data source. Portfolio data cites "Ghostfolio live data". Real estate projections cite user-provided assumptions. Federal Reserve data is cited by name. The LLM cannot return a number without its source.
3. Domain Constraint Check: A pre-return scan runs on every financial response, checking for high-risk phrases ("guaranteed return", "you should buy", "risk-free"). Responses containing these phrases without appropriate disclaimers are flagged. Every financial projection includes "not financial advice" language.
Note on plan vs delivery: The pre-search described a fact-check node with tool_result_id tagging. The implemented approach achieves the same goal differently: citation enforcement is in the system prompt rather than a separate node, which proved more reliable in practice because it cannot be bypassed by the routing logic.
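The domain constraint scan can be sketched as a simple phrase check. The phrase list and disclaimer string below come from the examples above; the exact function name and return shape are assumptions (modeled on the verification_details object shown later in this document):

```python
# Sketch of the pre-return domain constraint scan.
HIGH_RISK_PHRASES = ["guaranteed return", "you should buy", "risk-free"]
DISCLAIMER = "not financial advice"

def domain_constraint_check(response: str) -> dict:
    """Flag high-risk phrases that appear without a disclaimer."""
    text = response.lower()
    flags = [p for p in HIGH_RISK_PHRASES if p in text]
    has_disclaimer = DISCLAIMER in text
    return {
        "passed": not flags or has_disclaimer,
        "flags": flags,
        "has_disclaimer": has_disclaimer,
    }
```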
Human-in-the-Loop (Implemented)
Write operations (buy, sell, add transaction, add cash) use an awaiting_confirmation flow. When the user expresses a write intent (e.g. "buy 10 shares of AAPL"), the write_prepare node builds a confirmation payload and sets awaiting_confirmation=True. The user sees a summary and must reply "yes" or "confirm" to proceed. Only then does write_execute run the actual Ghostfolio API call. This prevents accidental trades.
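The confirmation flow can be sketched as two steps: prepare (build the payload, pause) and handle-reply (release or cancel the pending write). Function names and the payload fields below are illustrative, not the actual implementation:

```python
# Sketch of the human-in-the-loop write confirmation flow.
CONFIRM_WORDS = {"yes", "confirm"}

def write_prepare(state: dict, intent: dict) -> dict:
    """Build a confirmation payload and pause before any Ghostfolio write."""
    payload = {
        "action": intent["action"],      # e.g. "buy"
        "symbol": intent["symbol"],      # e.g. "AAPL"
        "quantity": intent["quantity"],  # e.g. 10
    }
    return {**state, "awaiting_confirmation": True,
            "confirmation_payload": payload}

def handle_reply(state: dict, reply: str) -> dict:
    """Only an explicit 'yes'/'confirm' releases the pending write."""
    if not state.get("awaiting_confirmation"):
        return state
    if reply.strip().lower() in CONFIRM_WORDS:
        return {**state, "awaiting_confirmation": False,
                "pending_write": state["confirmation_payload"]}
    # Anything else cancels the pending write.
    return {**state, "awaiting_confirmation": False,
            "confirmation_payload": None}
```

Keeping the pause in graph state (rather than a blocking prompt) means the confirmation survives across HTTP requests.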
Eval Results
Test Suite: 183 test cases across 10 test files
Pass Rate: 100% (183/183)
Test Categories
| Category | Count | Description |
|---|---|---|
| Happy path | 20 | Normal successful user flows |
| Edge cases | 12 | Zero values, boundary inputs, missing data |
| Adversarial | 12 | SQL injection, extreme values, bad inputs |
| Multi-step | 12 | Chained tool calls, stateful CRUD flows |
| Portfolio logic | 60 | Compliance, tax, categorization, helpers |
| Property CRUD | 13 | Full property lifecycle |
| Real estate | 8 | Listing search, compare, feature flag |
| Strategy | 7 | Simulation correctness |
| Relocation | 5 | Runway calculations |
| Wealth bridge | 8 | COL comparison, net worth |
| Wealth visualizer | 6 | Fed Reserve benchmarks |
Performance Targets
| Metric | Target | Status |
|---|---|---|
| Single-tool queries | < 5s | ✅ avg ~3–4s |
| Multi-step chains | < 15s | ✅ avg ~8–12s |
| Tool success rate | > 95% | ✅ |
| Eval pass rate | > 80% | ✅ 100% |
Observability Setup
LangSmith Tracing
Every request generates a LangSmith trace showing the full execution graph:
input → classify → tool call → verify → format → output
Environment variables:
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=<key>
LANGCHAIN_PROJECT=agentforce
Dashboard: smith.langchain.com
Per-Response Observability
Every /chat response includes:
{
"latency_ms": 3241,
"tokens": {
"input": 1200,
"output": 400,
"total": 1600,
"estimated_cost_usd": 0.0096
},
"confidence": 0.95,
"verified": true,
"trace_id": "uuid-here",
"timestamp": "2026-02-27T03:45:00Z",
"tool": "property_tracker",
"tools_used": ["property_tracker"],
"verification_details": {
"passed": true,
"flags": [],
"has_disclaimer": true
}
}
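The estimated_cost_usd field is consistent with simple per-token pricing. A sketch assuming Claude Sonnet list prices of $3 per million input tokens and $15 per million output tokens (the actual rates used by the agent are not shown in this document, so verify against current Anthropic pricing):

```python
# Assumed pricing in USD per token -- an assumption, not the agent's config.
INPUT_RATE = 3.00 / 1_000_000    # $3 per million input tokens
OUTPUT_RATE = 15.00 / 1_000_000  # $15 per million output tokens

def estimated_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost from token counts."""
    return round(input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE, 4)

print(estimated_cost_usd(1200, 400))  # → 0.0096, matching the sample response
```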
/metrics Endpoint
GET /metrics returns aggregate session metrics:
{
"total_requests": 47,
"avg_latency_ms": 3890,
"successful_tool_calls": 44,
"failed_tool_calls": 3,
"tool_success_rate_pct": 93.6,
"recent_errors": [],
"last_updated": "2026-02-27T03:45:00Z"
}
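The tool_success_rate_pct figure follows directly from the success/failure counters. A sketch of the aggregation, with field names taken from the response above (the real endpoint tracks counters incrementally rather than taking them as arguments):

```python
# Sketch of /metrics aggregation from session counters.
def aggregate_metrics(successes: int, failures: int,
                      latencies_ms: list[int]) -> dict:
    total_calls = successes + failures
    return {
        "total_requests": len(latencies_ms),
        "avg_latency_ms": (round(sum(latencies_ms) / len(latencies_ms))
                           if latencies_ms else 0),
        "successful_tool_calls": successes,
        "failed_tool_calls": failures,
        "tool_success_rate_pct": (round(100 * successes / total_calls, 1)
                                  if total_calls else 0.0),
    }

print(aggregate_metrics(44, 3, [3890])["tool_success_rate_pct"])  # → 93.6
```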
Additional Endpoints
| Endpoint | Purpose |
|---|---|
| GET /health | Agent + Ghostfolio reachability check |
| GET /metrics | Aggregate session metrics |
| GET /costs | Estimated Anthropic API cost tracker |
| GET /feedback/summary | 👍/👎 approval rate across all sessions |
| GET /real-estate/log | Tool invocation log (last 50) |
Open Source Contribution
Contribution Type: Public Eval Dataset
What was delivered: 183 test cases for finance AI agents — released publicly on GitHub as the first eval dataset for agents built on Ghostfolio.
Note on plan vs delivery: The pre-search planned an npm package and Hugging Face dataset release. During development, the eval dataset approach was chosen instead because it provides more direct value to developers forking Ghostfolio — they can run the test suite immediately without installing a package. The dataset is MIT licensed and accepts contributions.
Location: github.com/lakshmipunukollu-ai/ghostfolio/tree/submission/final/agent/evals
Documentation: agent/evals/EVAL_DATASET_README.md
How to Run
# Clone and setup
git clone https://github.com/lakshmipunukollu-ai/ghostfolio-agent-priya
cd ghostfolio-agent-priya
git checkout feature/complete-showcase
# Start Ghostfolio (portfolio backend)
docker-compose up -d
npm install && npm run build
npm run start:server & # API server: http://localhost:3333
npm run start:client & # Angular UI: http://localhost:4200
# Start AI agent
cd agent
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload --port 8000
# Run eval suite
python -m pytest evals/ -v
# → 183 passed in ~30s
# Access
# Portfolio UI: http://localhost:4200
# Agent API: http://localhost:8000
# Agent health: http://localhost:8000/health
# Agent metrics: http://localhost:8000/metrics
# LangSmith: https://smith.langchain.com (project: agentforce)
Deployed Application
Production URL: https://ghostfolio-agent-production.up.railway.app
The agent is deployed on the Railway free tier. The Angular UI is served separately by Ghostfolio's Nx/Angular build pipeline.