community: add AI agent eval dataset (183 test cases)

Contributes a community eval dataset for testing AI agents built on the Ghostfolio API. Includes 183 test cases covering portfolio analysis, market data, tax estimation, real estate, life decisions, and adversarial inputs.

Full implementation available at: https://github.com/lakshmipunukollu-ai/ghostfolio-agent-priya

Contributed by Gauntlet AI bounty submission.

Made-with: Cursor
1 month ago · 018af2c09c
2 changed files with 133 additions and 0 deletions
--- a/community/agent-evals/EVAL_SUMMARY.md
+++ b/community/agent-evals/EVAL_SUMMARY.md
@@ -0,0 +1,72 @@
 # Ghostfolio AI Agent — Eval Test Dataset
 A community-contributed evaluation dataset for
 testing AI agents built on top of the Ghostfolio API.
 ## Overview
 183 test cases covering the full Ghostfolio API
 surface, contributed as part of a Gauntlet AI
 bounty submission (March 2026).
 ## Categories
 | Category | Count | Description |
 |----------|-------|-------------|
 | Happy Path | 20 | Standard queries that should always work |
 | Edge Cases | 14 | Boundary conditions, empty data, typos |
 | Adversarial | 14 | Prompt injection, nonsense, abuse attempts |
 | Multi-Step | 13 | Conversations spanning multiple tool calls |
 | Domain Specific | 122 | Portfolio, market, property, tax, relocation |
 ## Query Types Covered
 - Portfolio analysis (holdings, allocation, performance)
 - Market data (live stock prices, YTD returns)
 - Tax estimation (capital gains, tax-loss harvesting)
 - Risk & compliance (concentration, diversification)
 - Transaction history (trade history, recent activity)
 - Real estate (property CRUD, equity analysis)
 - Life decisions (job offers, relocation, retirement)
 - Wealth gap analysis (Fed SCF benchmarking)
 ## How to Use
 These test cases can be used to evaluate any
 AI agent built on top of the Ghostfolio API.
 Each test case includes:
 - Input query (natural language)
 - Expected tool call
 - Expected response characteristics
 - Verification criteria
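
As a rough illustration, the four parts of a test case listed above could be held in a small record type. This is only a sketch: the class name, the field names, and the example characteristics are assumptions made for illustration, not the dataset's actual schema. The query and tool pair is taken from the routing table in this file.

```python
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    """One eval case. Field names are hypothetical, not the dataset's schema."""

    query: str                      # input query in natural language
    expected_tool: str              # tool the agent is expected to call
    expected_characteristics: list[str] = field(default_factory=list)
    criteria: str = ""              # how a reviewer or assertion verifies the response


# Example values for the last two fields are invented for illustration.
case = EvalCase(
    query="what is the price of NVDA",
    expected_tool="market_data",
    expected_characteristics=["mentions NVDA", "includes a price"],
    criteria="response contains a numeric price",
)
```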
 ## Full Dataset
 The full test dataset and its code are available in the
 `agent/evals/` directory of:
 https://github.com/lakshmipunukollu-ai/ghostfolio-agent-priya
 ## Routing Test Cases
 | Query | Expected Tool |
 |-------|--------------|
 | how is my portfolio | portfolio_analysis |
 | check apple for me | market_data |
 | how many shares of APPL do I have | portfolio_analysis |
 | can I afford to retire | wealth_gap |
 | tell me about my house | property_tracker |
 | what can this do | capabilities |
 | give me a full portfolio summary | portfolio_analysis |
 | is 180k in seattle a raise from 120k in austin | relocation_runway |
 | estimate my capital gains | tax_estimate |
 | am I over concentrated | compliance_check |
 | what is the price of NVDA | market_data |
 | show me my transactions | transaction_query |
 | how much is AAPL | market_data |
 | can I afford a house | wealth_down_payment |
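
One way to use the routing table is to feed each query to the agent's router and collect the mismatches. The sketch below assumes the router can be exposed as a plain function from query string to tool name. `routing_failures` and `keyword_router` are hypothetical names, and the keyword router is a toy stand-in, not the agent from the linked repository. Only four rows of the table are included to keep the example short.

```python
from typing import Callable

# A few (query, expected tool) pairs copied from the routing table.
ROUTING_CASES = [
    ("how is my portfolio", "portfolio_analysis"),
    ("check apple for me", "market_data"),
    ("estimate my capital gains", "tax_estimate"),
    ("show me my transactions", "transaction_query"),
]


def routing_failures(
    route: Callable[[str], str],
    cases: list[tuple[str, str]],
) -> list[tuple[str, str, str]]:
    """Return (query, expected, actual) for every case the router gets wrong."""
    failures = []
    for query, expected in cases:
        actual = route(query)
        if actual != expected:
            failures.append((query, expected, actual))
    return failures


def keyword_router(query: str) -> str:
    """Toy stand-in router (hypothetical); replace with the agent under test."""
    q = query.lower()
    if "capital gains" in q:
        return "tax_estimate"
    if "transactions" in q:
        return "transaction_query"
    if "portfolio" in q:
        return "portfolio_analysis"
    return "market_data"


failures = routing_failures(keyword_router, ROUTING_CASES)
```

An empty `failures` list means every listed query was routed to the expected tool. Returning the mismatches rather than a single pass rate makes it easier to see which queries regress.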
 ## Contributing
 If you build an AI agent on Ghostfolio and want
 to contribute additional test cases, please open
 a PR adding them to this directory.
--- a/community/agent-evals/README.md
+++ b/community/agent-evals/README.md
@@ -0,0 +1,61 @@
 # Finance AI Agent — Public Eval Dataset
 183 test cases for AI agents built on personal finance and portfolio management software.
 The dataset is built on top of Ghostfolio, an open source wealth management platform,
 and is released publicly as a resource for developers building finance AI agents.
 ## Test Categories
 | Category | Count |
 |----------|-------|
 | Happy Path | 20 |
 | Edge Cases | 14 |
 | Adversarial | 14 |
 | Multi-Step | 13 |
 | Other | 122 |
 | **Total** | **183** |
 ## How To Run
 ```bash
 git clone https://github.com/lakshmipunukollu-ai/ghostfolio
 cd ghostfolio
 git checkout submission/final
 pip install -r agent/requirements.txt
 python -m pytest agent/evals/ -v
 ```
 ## Test Structure
 Every test in `test_eval_dataset.py` follows this pattern:
 ```python
 # TYPE: happy_path | edge_case | adversarial | multi_step
 # INPUT: what is being tested
 # EXPECTED: what the tool should return
 # CRITERIA: the specific assertion
 def test_name():
    from tools.tool_name import function_name
    result = function_name(params)
    assert "key" in result
 ```
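
For illustration, a filled-in instance of the template might look like the following. The `market_data` function here is a hypothetical stub defined inline so the example is self-contained. The real tools live in the linked repository and their signatures and return values may differ.

```python
def market_data(symbol: str) -> dict:
    """Hypothetical stub standing in for a real tool; returns a fixed shape."""
    return {"symbol": symbol.upper(), "price": 0.0}


# TYPE: happy_path
# INPUT: ticker symbol "NVDA"
# EXPECTED: a dict describing the quote
# CRITERIA: the result includes a "price" key
def test_market_data_returns_price():
    result = market_data("NVDA")
    assert "price" in result
```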
 ## Results
 - Tests: 183
 - Pass rate: 100%
 - Runtime: ~30 seconds
 ## Contribute
 Submit a PR with new test cases.
 Follow the TYPE/INPUT/EXPECTED/CRITERIA pattern.
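
Before opening a PR, contributors could check that each new test carries all four header comments. The checker below is a minimal sketch and is not part of the dataset. The function name is hypothetical, and the allowed `TYPE` values simply mirror the four shown in the template above, so the actual dataset may use others.

```python
import re

REQUIRED_TAGS = ("TYPE", "INPUT", "EXPECTED", "CRITERIA")
# Assumption: only the four values shown in the template are valid.
ALLOWED_TYPES = {"happy_path", "edge_case", "adversarial", "multi_step"}

# Matches lines such as "# TYPE: happy_path"; only spaces and tabs are
# allowed around the tag so a match never spills onto the next line.
TAG_LINE = re.compile(r"^#[ \t]*(\w+):[ \t]*(.*)$", flags=re.MULTILINE)


def check_header(comment_block: str) -> list[str]:
    """Return a list of problems found in a test's header comments."""
    tags = {name: value.strip() for name, value in TAG_LINE.findall(comment_block)}
    problems = [f"missing {tag}" for tag in REQUIRED_TAGS if not tags.get(tag)]
    type_value = tags.get("TYPE", "")
    if type_value and type_value not in ALLOWED_TYPES:
        problems.append(f"unknown TYPE: {type_value}")
    return problems
```

For example, `check_header("# TYPE: smoke\n# INPUT: x")` reports the two missing tags and the unrecognized type, while a header with all four tags filled in returns an empty list.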
 ## License
 MIT
 ## Author
 Priya Lakshmipunukollu — AgentForge, February 2026
 https://github.com/lakshmipunukollu-ai/ghostfolio