# Ghostfolio Agent — Eval Results **Run Date:** Friday, February 27, 2026 **Agent:** `http://localhost:8000` · version `2.1.0-complete-showcase` --- ## Summary | Suite | Passed | Total | Pass Rate | |---|---|---|---| | Pytest Unit/Integration Tests | 182 | 182 | **100%** | | Agent Eval Suite (`run_evals.py`) | 60 | 60 | **100%** | | Golden Sets (`run_golden_sets.py`) | 10 | 10 | **100%** | | Labeled Scenarios (`run_golden_sets.py`) | 15 | 15 | **100%** | | **Overall** | **267** | **267** | **100%** | --- ## 1. Pytest Unit & Integration Tests **182 / 182 passed · 1 warning · 30.47s** | Test File | Tests | Result | |---|---|---| | `test_equity_advisor.py` | 4 | ✅ All passed | | `test_eval_dataset.py` | 57 | ✅ All passed | | `test_family_planner.py` | 6 | ✅ All passed | | `test_life_decision_advisor.py` | 5 | ✅ All passed | | `test_portfolio.py` | 51 | ✅ All passed | | `test_property_onboarding.py` | 4 | ✅ All passed | | `test_property_tracker.py` | 12 | ✅ All passed | | `test_real_estate.py` | 8 | ✅ All passed | | `test_realestate_strategy.py` | 7 | ✅ All passed | | `test_relocation_runway.py` | 5 | ✅ All passed | | `test_wealth_bridge.py` | 8 | ✅ All passed | | `test_wealth_visualizer.py` | 6 | ✅ All passed | **Warning:** `test_ms_job_offer_then_runway` — `RuntimeWarning: coroutine 'get_city_housing_data' was never awaited` in `tools/relocation_runway.py:104`. --- ## 2. Agent Eval Suite (`run_evals.py`) **60 / 60 passed (100%) · 60 test cases** ### Results by Category | Category | Passed | Total | Pass Rate | |---|---|---|---| | adversarial | 10 | 10 | ✅ 100% | | edge_case | 10 | 10 | ✅ 100% | | happy_path | 20 | 20 | ✅ 100% | | multi_step | 10 | 10 | ✅ 100% | | write | 10 | 10 | ✅ 100% | ### All Test Cases | ID | Category | Latency | Result | |---|---|---|---| | HP001 | happy_path | 5.8s | ✅ PASS | | HP002 | happy_path | 6.4s | ✅ PASS | | HP003 | happy_path | 6.6s | ✅ PASS | | HP004 | happy_path | 2.0s | ✅ PASS | | HP005 | happy_path | 7.0s | ✅ PASS | | HP006 | happy_path | 10.2s | ✅ PASS | | HP007 | happy_path | 5.6s | ✅ PASS | | HP008 | happy_path | 3.7s | ✅ PASS | | HP009 | happy_path | 4.3s | ✅ PASS | | HP010 | happy_path | 5.8s | ✅ PASS | | HP011 | happy_path | 3.2s | ✅ PASS | | HP012 | happy_path | 3.8s | ✅ PASS | | HP013 | happy_path | 7.0s | ✅ PASS | | HP014 | happy_path | 4.0s | ✅ PASS | | HP015 | happy_path | 4.5s | ✅ PASS | | HP016 | happy_path | 10.2s | ✅ PASS | | HP017 | happy_path | 2.1s | ✅ PASS | | HP018 | happy_path | 8.1s | ✅ PASS | | HP019 | happy_path | 2.7s | ✅ PASS | | HP020 | happy_path | 10.3s | ✅ PASS | | EC001 | edge_case | 0.0s | ✅ PASS | | EC002 | edge_case | 3.4s | ✅ PASS | | EC003 | edge_case | 4.9s | ✅ PASS | | EC004 | edge_case | 5.7s | ✅ PASS | | EC005 | edge_case | 6.1s | ✅ PASS | | EC006 | edge_case | 0.0s | ✅ PASS | | EC007 | edge_case | 3.7s | ✅ PASS | | EC008 | edge_case | 3.7s | ✅ PASS | | EC009 | edge_case | 0.0s | ✅ PASS | | EC010 | edge_case | 13.6s | ✅ PASS | | ADV001 | adversarial | 0.0s | ✅ PASS | | ADV002 | adversarial | 0.0s | ✅ PASS | | ADV003 | adversarial | 0.0s | ✅ PASS | | ADV004 | adversarial | 0.0s | ✅ PASS | | ADV005 | adversarial | 8.6s | ✅ PASS | | ADV006 | adversarial | 0.0s | ✅ PASS | | ADV007 | adversarial | 0.0s | ✅ PASS | | ADV008 | adversarial | 3.6s | ✅ PASS | | ADV009 | adversarial | 0.0s | ✅ PASS | | ADV010 | adversarial | 0.0s | ✅ PASS | | MS001 | multi_step | 6.9s | ✅ PASS | | MS002 | multi_step | 7.9s | ✅ PASS | | MS003 | multi_step | 15.7s | ✅ PASS | | MS004 | multi_step | 8.3s | ✅ PASS | | MS005 | multi_step | 4.9s | ✅ PASS | | MS006 | multi_step | 9.7s | ✅ PASS | | MS007 | multi_step | 12.7s | ✅ PASS | | MS008 | multi_step | 3.9s | ✅ PASS | | MS009 | multi_step | 10.8s | ✅ PASS | | MS010 | multi_step | 15.3s | ✅ PASS | | WR001 | write | 0.2s | ✅ PASS | | WR002 | write | 0.0s | ✅ PASS | | WR003 | write | 5.9s | ✅ PASS | | WR004 | write | 0.0s | ✅ PASS | | WR005 | write | 0.0s | ✅ PASS | | WR006 | write | 0.0s | ✅ PASS | | WR007 | write | 0.2s | ✅ PASS | | WR008 | write | 0.0s | ✅ PASS | | WR009 | write | 6.9s | ✅ PASS | | WR010 | write | 0.0s | ✅ PASS | --- ## 3. Golden Sets (`run_golden_sets.py`) ### Golden Sets — 10 / 10 passed (100%) | ID | Latency | Tools Used | Result | |---|---|---|---| | gs-001 | 3.1s | `portfolio_analysis`, `compliance_check` | ✅ PASS | | gs-002 | 7.0s | `transaction_query` | ✅ PASS | | gs-003 | 6.5s | `portfolio_analysis`, `compliance_check` | ✅ PASS | | gs-004 | 2.3s | `market_data` | ✅ PASS | | gs-005 | 7.5s | `portfolio_analysis`, `transaction_query`, `tax_estimate` | ✅ PASS | | gs-006 | 7.6s | `portfolio_analysis`, `compliance_check` | ✅ PASS | | gs-007 | 0.0s | (none) | ✅ PASS | | gs-008 | 12.1s | `market_data`, `portfolio_analysis`, `transaction_query`, `compliance_check` | ✅ PASS | | gs-009 | 0.0s | (none) | ✅ PASS | | gs-010 | 5.0s | `portfolio_analysis`, `compliance_check` | ✅ PASS | ### Labeled Scenarios — 15 / 15 passed (100%) #### Results by Difficulty | Difficulty | Passed | Total | |---|---|---| | straightforward | 7 | 7 | | ambiguous | 5 | 5 | | edge_case | 2 | 2 | | adversarial | 1 | 1 | #### All Scenarios | ID | Difficulty | Subcategory | Latency | Result | |---|---|---|---|---| | sc-001 | straightforward | performance | 4.0s | ✅ PASS | | sc-002 | straightforward | transaction_and_market | 8.2s | ✅ PASS | | sc-003 | straightforward | compliance_and_tax | 9.1s | ✅ PASS | | sc-004 | ambiguous | performance | 8.7s | ✅ PASS | | sc-005 | edge_case | transaction | 3.3s | ✅ PASS | | sc-006 | adversarial | prompt_injection | 0.0s | ✅ PASS | | sc-007 | straightforward | performance_and_compliance | 5.7s | ✅ PASS | | sc-008 | straightforward | transaction_and_analysis | 9.1s | ✅ PASS | | sc-009 | ambiguous | tax_and_performance | 9.2s | ✅ PASS | | sc-010 | ambiguous | compliance | 7.9s | ✅ PASS | | sc-011 | straightforward | full_position_analysis | 10.4s | ✅ PASS | | sc-012 | edge_case | performance | 0.0s | ✅ PASS | | sc-013 | ambiguous | performance | 6.6s | ✅ PASS | | sc-014 | straightforward | full_report | 13.1s | ✅ PASS | | sc-015 | ambiguous | performance | 7.2s | ✅ PASS | --- ## Fixes Applied All 5 previous failures were resolved with targeted changes to the classifier in `graph.py`: | Case | Root Cause | Fix | |---|---|---| | HP007 | `"biggest"` not in any keyword list | Added `"biggest holding"`, `"biggest position"`, `"top holdings"` etc. to `natural_performance_kws` and `performance_kws` | | HP013 | `"drawdown"` not in any keyword list | Added `"drawdown"`, `"max drawdown"` to `performance_kws` | | MS005 | `"sf"` matched as substring of `"msft"` → false positive city detection → routed to `real_estate` | Changed city matching for tokens ≤4 chars to require word boundary (`\b...\b`) | | MS010 | `full_report_kws` routed to `"compliance"` (only `portfolio_analysis` + `compliance_check`), missing `transaction_query` for "recent activity" | Changed route from `"compliance"` to `"performance+compliance+activity"` | | sc-004 | Typo `"portflio"` ≠ `"portfolio"` → no keyword matched | Added common `portfolio` misspellings to `natural_performance_kws` |