Ghostfolio AI Agent — Architecture Documentation
Domain & Use Cases
Domain: Personal Finance + Real Estate Portfolio Management
Problem Solved: Most people manage investments and real estate in completely separate places. A portfolio app tracks stocks. A spreadsheet tracks property equity. Neither talks to the other. No single tool answers: "Given everything I own, am I on track to retire? Can I afford to buy more real estate? What does my financial picture actually look like?"
Target Customer: Working professionals aged 28–45 who have started investing in Ghostfolio and own or are planning to own real estate. They want to understand their complete financial picture — investments + property equity — and run scenarios on major life decisions (job offers, buying property, having children, retiring earlier).
Specific user this was built for: A 32-year-old software engineer who uses Ghostfolio to track their investments and is trying to figure out if their $94k portfolio can fund a down payment, whether to accept a job offer in Seattle, and what their retirement looks like if they start buying rental properties — all in one conversation without switching between 8 different tools.
Use Cases:
- Track real estate equity alongside investment portfolio
- Run "what if I buy a house every 2 years for 10 years" retirement scenarios
- Ask whether a job offer in another city is financially worth it after cost of living
- Understand total net worth across all asset classes (stocks + real estate)
- Check if savings rate is on track vs peers (Federal Reserve SCF 2022 data)
- Plan family finances including childcare cost impact by city
- Analyze equity options (keep / cash-out refi / rental property)
Agent Architecture
Framework: LangGraph (Python)
LLM: Claude Sonnet 4.5 (Anthropic model ID claude-sonnet-4-5-20251001)
Backend: FastAPI
Database: SQLite (properties.db — stateful CRUD) + Ghostfolio PostgreSQL
Observability: LangSmith
Deployment: Railway
Why LangGraph
Chosen over plain LangChain because the agent requires stateful multi-step reasoning: classify intent → select tool → execute → verify → format. LangGraph's explicit state machine makes every step debuggable and testable. The graph has clear nodes and edges rather than an opaque chain.
Graph Architecture
User Message
↓
classify_node (keyword matching → intent category string)
↓
_route_after_classify (maps intent string → executor node)
↓
[Tool Executor Node] (calls appropriate tool function, returns structured result)
↓
verify_node (confidence scoring + domain constraint check)
↓
format_node (LLM synthesizes tool result into natural language response)
↓
Response to User
State Schema (AgentState)
{
"user_query": str,
"messages": list[BaseMessage], # full conversation history
"query_type": str,
"portfolio_snapshot": dict,
"tool_results": list[dict],
"pending_verifications": list,
"confidence_score": float,
"verification_outcome": str,
"awaiting_confirmation": bool,
"confirmation_payload": dict | None,
"pending_write": dict | None,
"bearer_token": str | None,
"final_response": str | None,
"citations": list[str],
"error": str | None,
}
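The schema above maps directly onto a typed dict. A minimal sketch, with `list[dict]` standing in for `list[BaseMessage]` so the example stays dependency-free:

```python
from typing import Optional, TypedDict

class AgentState(TypedDict):
    """Sketch of the AgentState schema listed above.
    list[dict] substitutes for list[BaseMessage] in this example."""
    user_query: str
    messages: list[dict]              # full conversation history
    query_type: str
    portfolio_snapshot: dict
    tool_results: list[dict]
    pending_verifications: list
    confidence_score: float
    verification_outcome: str
    awaiting_confirmation: bool
    confirmation_payload: Optional[dict]
    pending_write: Optional[dict]
    bearer_token: Optional[str]
    final_response: Optional[str]
    citations: list[str]
    error: Optional[str]

# An empty initial state for a new conversation turn.
initial: AgentState = {
    "user_query": "What is my net worth?",
    "messages": [], "query_type": "", "portfolio_snapshot": {},
    "tool_results": [], "pending_verifications": [],
    "confidence_score": 0.0, "verification_outcome": "",
    "awaiting_confirmation": False, "confirmation_payload": None,
    "pending_write": None, "bearer_token": None,
    "final_response": None, "citations": [], "error": None,
}
```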
Tool Registry
16 Tools Built Across 9 Files:
| Tool | File | Purpose |
|---|---|---|
| portfolio_analysis | portfolio.py | Live Ghostfolio holdings, allocation, performance |
| compliance_check | portfolio.py | Concentration risk, regulatory flags |
| tax_estimate | portfolio.py | Tax liability estimation |
| market_data | market_data.py | Live stock prices via Yahoo Finance |
| add_property | property_tracker.py | CRUD — create property record |
| get_properties | property_tracker.py | CRUD — read all properties |
| update_property | property_tracker.py | CRUD — update property values |
| remove_property | property_tracker.py | CRUD — delete property record |
| analyze_equity_options | property_tracker.py | Home equity scenario analysis |
| get_total_net_worth | property_tracker.py | Portfolio + real estate combined |
| calculate_relocation_runway | relocation_runway.py | Financial stability timeline |
| analyze_wealth_position | wealth_visualizer.py | Fed Reserve peer comparison |
| simulate_real_estate_strategy | realestate_strategy.py | Buy-hold retirement projection |
| plan_family_finances | family_planner.py | Childcare cost impact |
| analyze_life_decision | life_decision_advisor.py | Job offer, relocation decisions |
| calculate_down_payment_power | wealth_bridge.py | Portfolio to home purchase |
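A registry like this is typically a dict mapping tool names to functions, which the executor node dispatches against. The sketch below uses tool names from the table; the signatures and return shapes are assumptions:

```python
# Illustrative tool registry and dispatch. The two stub tools return
# placeholder data; real tools query Ghostfolio / SQLite.

def get_total_net_worth(**kwargs) -> dict:
    return {"tool": "get_total_net_worth", "net_worth_usd": 0.0}

def portfolio_analysis(**kwargs) -> dict:
    return {"tool": "portfolio_analysis", "holdings": []}

TOOL_REGISTRY = {
    "get_total_net_worth": get_total_net_worth,
    "portfolio_analysis": portfolio_analysis,
    # ...the real registry holds all 16 tools
}

def execute_tool(name: str, **kwargs) -> dict:
    """Executor node helper: look up and call a registered tool."""
    if name not in TOOL_REGISTRY:
        return {"error": f"unknown tool: {name}"}
    return TOOL_REGISTRY[name](**kwargs)
```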
Latency Notes
Single-tool queries average 5–10 seconds due to Claude Sonnet response generation time.
The classify step (keyword matching) adds <10ms. Tool execution adds 50–200ms. The majority
of latency is LLM synthesis. Streaming responses (/chat/steps, /chat/stream) are
implemented to improve perceived performance. A startup warmup pre-establishes the LLM
connection to reduce cold-start latency on the first request.
Verification Strategy
Three verification systems implemented:
1. Confidence Scoring: Every /chat response includes a confidence score between 0.0 and 1.0. The score is based on tool success, data source reliability, and query type. Responses with confidence below 0.80 are returned to the client with verified=false.
2. Source Attribution (Citation Enforcement): The system prompt enforces a citation rule: every factual claim must name its data source. Portfolio data cites "Ghostfolio live data". Real estate projections cite user-provided assumptions. Federal Reserve data is cited by name. The LLM cannot return a number without its source.
3. Domain Constraint Check: A pre-return scan runs on every financial response, checking for high-risk phrases ("guaranteed return", "you should buy", "risk-free"). Responses containing these phrases without appropriate disclaimers are flagged. Every financial projection includes "not financial advice" language.
Note on plan vs delivery: The pre-search described a fact-check node with tool_result_id tagging. The implemented approach achieves the same goal differently: citation enforcement is in the system prompt rather than a separate node, which proved more reliable in practice because it cannot be bypassed by the routing logic.
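The domain constraint scan can be sketched as a simple phrase check. The phrase list and disclaimer string below come from the examples above; the exact function name and return shape are assumptions (modeled on the verification_details object shown later in this document):

```python
# Sketch of the pre-return domain constraint scan.
HIGH_RISK_PHRASES = ["guaranteed return", "you should buy", "risk-free"]
DISCLAIMER = "not financial advice"

def domain_constraint_check(response: str) -> dict:
    """Flag high-risk phrases that appear without a disclaimer."""
    text = response.lower()
    flags = [p for p in HIGH_RISK_PHRASES if p in text]
    has_disclaimer = DISCLAIMER in text
    return {
        "passed": not flags or has_disclaimer,
        "flags": flags,
        "has_disclaimer": has_disclaimer,
    }
```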
Human-in-the-Loop (Implemented)
Write operations (buy, sell, add transaction, add cash) use an awaiting_confirmation flow. When the user expresses a write intent (e.g. "buy 10 shares of AAPL"), the write_prepare node builds a confirmation payload and sets awaiting_confirmation=True. The user sees a summary and must reply "yes" or "confirm" to proceed. Only then does write_execute run the actual Ghostfolio API call. This prevents accidental trades.
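The confirmation flow can be sketched as two steps: prepare (build the payload, pause) and handle-reply (release or cancel the pending write). Function names and the payload fields below are illustrative, not the actual implementation:

```python
# Sketch of the human-in-the-loop write confirmation flow.
CONFIRM_WORDS = {"yes", "confirm"}

def write_prepare(state: dict, intent: dict) -> dict:
    """Build a confirmation payload and pause before any Ghostfolio write."""
    payload = {
        "action": intent["action"],      # e.g. "buy"
        "symbol": intent["symbol"],      # e.g. "AAPL"
        "quantity": intent["quantity"],  # e.g. 10
    }
    return {**state, "awaiting_confirmation": True,
            "confirmation_payload": payload}

def handle_reply(state: dict, reply: str) -> dict:
    """Only an explicit 'yes'/'confirm' releases the pending write."""
    if not state.get("awaiting_confirmation"):
        return state
    if reply.strip().lower() in CONFIRM_WORDS:
        return {**state, "awaiting_confirmation": False,
                "pending_write": state["confirmation_payload"]}
    # Anything else cancels the pending write.
    return {**state, "awaiting_confirmation": False,
            "confirmation_payload": None}
```

Keeping the pause in graph state (rather than a blocking prompt) means the confirmation survives across HTTP requests.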
Eval Results
Test Suite: 183 test cases across 10 test files
Pass Rate: 100% (183/183)
Test Categories
| Category | Count | Description |
|---|---|---|
| Happy path | 20 | Normal successful user flows |
| Edge cases | 12 | Zero values, boundary inputs, missing data |
| Adversarial | 12 | SQL injection, extreme values, bad inputs |
| Multi-step | 12 | Chained tool calls, stateful CRUD flows |
| Portfolio logic | 60 | Compliance, tax, categorization, helpers |
| Property CRUD | 13 | Full property lifecycle |
| Real estate | 8 | Listing search, compare, feature flag |
| Strategy | 7 | Simulation correctness |
| Relocation | 5 | Runway calculations |
| Wealth bridge | 8 | COL comparison, net worth |
| Wealth visualizer | 6 | Fed Reserve benchmarks |
Performance Targets
| Metric | Target | Status |
|---|---|---|
| Single-tool queries | < 5s | ✅ avg ~3–4s |
| Multi-step chains | < 15s | ✅ avg ~8–12s |
| Tool success rate | > 95% | ✅ |
| Eval pass rate | > 80% | ✅ 100% |
Observability Setup
LangSmith Tracing
Every request generates a LangSmith trace showing the full execution graph:
input → classify → tool call → verify → format → output
Environment variables:
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=<key>
LANGCHAIN_PROJECT=agentforce
Dashboard: smith.langchain.com
Per-Response Observability
Every /chat response includes:
{
"latency_ms": 3241,
"tokens": {
"input": 1200,
"output": 400,
"total": 1600,
"estimated_cost_usd": 0.0096
},
"confidence": 0.95,
"verified": true,
"trace_id": "uuid-here",
"timestamp": "2026-02-27T03:45:00Z",
"tool": "property_tracker",
"tools_used": ["property_tracker"],
"verification_details": {
"passed": true,
"flags": [],
"has_disclaimer": true
}
}
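The estimated_cost_usd field is consistent with simple per-token pricing. A sketch assuming Claude Sonnet list prices of $3 per million input tokens and $15 per million output tokens (the actual rates used by the agent are not shown in this document, so verify against current Anthropic pricing):

```python
# Assumed pricing in USD per token -- an assumption, not the agent's config.
INPUT_RATE = 3.00 / 1_000_000    # $3 per million input tokens
OUTPUT_RATE = 15.00 / 1_000_000  # $15 per million output tokens

def estimated_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost from token counts."""
    return round(input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE, 4)

print(estimated_cost_usd(1200, 400))  # → 0.0096, matching the sample response
```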
/metrics Endpoint
GET /metrics returns aggregate session metrics:
{
"total_requests": 47,
"avg_latency_ms": 3890,
"successful_tool_calls": 44,
"failed_tool_calls": 3,
"tool_success_rate_pct": 93.6,
"recent_errors": [],
"last_updated": "2026-02-27T03:45:00Z"
}
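The tool_success_rate_pct figure follows directly from the success/failure counters. A sketch of the aggregation, with field names taken from the response above (the real endpoint tracks counters incrementally rather than taking them as arguments):

```python
# Sketch of /metrics aggregation from session counters.
def aggregate_metrics(successes: int, failures: int,
                      latencies_ms: list[int]) -> dict:
    total_calls = successes + failures
    return {
        "total_requests": len(latencies_ms),
        "avg_latency_ms": (round(sum(latencies_ms) / len(latencies_ms))
                           if latencies_ms else 0),
        "successful_tool_calls": successes,
        "failed_tool_calls": failures,
        "tool_success_rate_pct": (round(100 * successes / total_calls, 1)
                                  if total_calls else 0.0),
    }

print(aggregate_metrics(44, 3, [3890])["tool_success_rate_pct"])  # → 93.6
```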
Additional Endpoints
| Endpoint | Purpose |
|---|---|
| GET /health | Agent + Ghostfolio reachability check |
| GET /metrics | Aggregate session metrics |
| GET /costs | Estimated Anthropic API cost tracker |
| GET /feedback/summary | 👍/👎 approval rate across all sessions |
| GET /real-estate/log | Tool invocation log (last 50) |
Open Source Contribution
Contribution Type: Public Eval Dataset
What was delivered: 183 test cases for finance AI agents — released publicly on GitHub as the first eval dataset for agents built on Ghostfolio.
Note on plan vs delivery: The pre-search planned an npm package and Hugging Face dataset release. During development, the eval dataset approach was chosen instead because it provides more direct value to developers forking Ghostfolio — they can run the test suite immediately without installing a package. The dataset is MIT licensed and accepts contributions.
Location: github.com/lakshmipunukollu-ai/ghostfolio/tree/submission/final/agent/evals
Documentation: agent/evals/EVAL_DATASET_README.md
How to Run
# Clone and setup
git clone https://github.com/lakshmipunukollu-ai/ghostfolio-agent-priya
cd ghostfolio-agent-priya
git checkout feature/complete-showcase
# Start Ghostfolio (portfolio backend)
docker-compose up -d
npm install && npm run build
npm run start:server & # API server: http://localhost:3333
npm run start:client & # Angular UI: http://localhost:4200
# Start AI agent
cd agent
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload --port 8000
# Run eval suite
python -m pytest evals/ -v
# → 183 passed in ~30s
# Access
# Portfolio UI: http://localhost:4200
# Agent API: http://localhost:8000
# Agent health: http://localhost:8000/health
# Agent metrics: http://localhost:8000/metrics
# LangSmith: https://smith.langchain.com (project: agentforce)
Deployed Application
Production URL: https://ghostfolio-agent-production.up.railway.app
The agent is deployed on the Railway free tier. The Angular UI is served separately by Ghostfolio's Nx/Angular build pipeline.