diff --git a/community/agent-evals/EVAL_SUMMARY.md b/community/agent-evals/EVAL_SUMMARY.md
new file mode 100644
index 000000000..1f9591a86
--- /dev/null
+++ b/community/agent-evals/EVAL_SUMMARY.md
@@ -0,0 +1,72 @@
+# Ghostfolio AI Agent — Eval Test Dataset
+
+A community-contributed evaluation dataset for
+testing AI agents built on top of the Ghostfolio API.
+
+## Overview
+
+183 test cases covering the full Ghostfolio API
+surface, contributed by the Gauntlet AI bounty
+submission (March 2026).
+
+## Categories
+
+| Category | Count | Description |
+|----------|-------|-------------|
+| Happy Path | 20 | Standard queries that should always work |
+| Edge Cases | 14 | Boundary conditions, empty data, typos |
+| Adversarial | 14 | Prompt injection, nonsense, abuse attempts |
+| Multi-Step | 13 | Conversations spanning multiple tool calls |
+| Domain Specific | 122 | Portfolio, market, property, tax, relocation |
+
+## Query Types Covered
+
+- Portfolio analysis (holdings, allocation, performance)
+- Market data (live stock prices, YTD returns)
+- Tax estimation (capital gains, tax-loss harvesting)
+- Risk & compliance (concentration, diversification)
+- Transaction history (trade history, recent activity)
+- Real estate (property CRUD, equity analysis)
+- Life decisions (job offers, relocation, retirement)
+- Wealth gap analysis (Fed SCF benchmarking)
+
+## How to Use
+
+These test cases can be used to evaluate any
+AI agent built on top of the Ghostfolio API.
+Each test case includes:
+- Input query (natural language)
+- Expected tool called
+- Expected response characteristics
+- Verification criteria
+
+## Full Dataset
+
+The full test dataset with code is available at:
+https://github.com/lakshmipunukollu-ai/ghostfolio-agent-priya
+in the `agent/evals/` directory.
+
+## Routing Test Cases
+
+| Query | Expected Tool |
+|-------|--------------|
+| how is my portfolio | portfolio_analysis |
+| check apple for me | market_data |
+| how many shares of APPL do I have | portfolio_analysis |
+| can I afford to retire | wealth_gap |
+| tell me about my house | property_tracker |
+| what can this do | capabilities |
+| give me a full portfolio summary | portfolio_analysis |
+| is 180k in seattle a raise from 120k in austin | relocation_runway |
+| estimate my capital gains | tax_estimate |
+| am I over concentrated | compliance_check |
+| what is the price of NVDA | market_data |
+| show me my transactions | transaction_query |
+| how much is AAPL | market_data |
+| can I afford a house | wealth_down_payment |
+
+## Contributing
+
+If you build an AI agent on Ghostfolio and want
+to contribute additional test cases, please open
+a PR adding them to this directory.
diff --git a/community/agent-evals/README.md b/community/agent-evals/README.md
new file mode 100644
index 000000000..293d4ea97
--- /dev/null
+++ b/community/agent-evals/README.md
@@ -0,0 +1,61 @@
+# Finance AI Agent — Public Eval Dataset
+
+183 test cases for AI agents built on personal finance and portfolio management software.
+
+Built on top of Ghostfolio — an open source wealth management platform.
+
+Released publicly as a resource for developers building finance AI agents.
+
+## Test Categories
+
+| Category | Count |
+|----------|-------|
+| Happy Path | 20 |
+| Edge Cases | 14 |
+| Adversarial | 14 |
+| Multi-Step | 13 |
+| Other | 122 |
+| **Total** | **183** |
+
+## How To Run
+```bash
+git clone https://github.com/lakshmipunukollu-ai/ghostfolio
+cd ghostfolio
+git checkout submission/final
+pip install -r agent/requirements.txt
+python -m pytest agent/evals/ -v
+```
+
+## Test Structure
+
+Every test in test_eval_dataset.py follows:
+```python
+# TYPE: happy_path | edge_case | adversarial | multi_step
+# INPUT: what is being tested
+# EXPECTED: what the tool should return
+# CRITERIA: the specific assertion
+def test_name():
+    from tools.tool_name import function_name
+    result = function_name(params)
+    assert "key" in result
+```
+
+## Results
+
+- Tests: 183
+- Pass rate: 100%
+- Runtime: ~30 seconds
+
+## Contribute
+
+Submit a PR with new test cases.
+Follow the TYPE/INPUT/EXPECTED/CRITERIA pattern.
+
+## License
+
+MIT
+
+## Author
+
+Priya Lakshmipunukollu — AgentForge, February 2026
+https://github.com/lakshmipunukollu-ai/ghostfolio
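
Note on the routing table in EVAL_SUMMARY.md: the query/expected-tool pairs could be checked against any agent with a small table-driven helper. The following is only a minimal sketch and is not part of this patch or of the linked repository. `routing_failures` and `stub_route` are hypothetical names introduced for illustration; the real agent's routing entry point is not shown in this diff, so a lookup-table stub stands in for it. Only a subset of the table rows is copied.

```python
from typing import Callable, List, Tuple

# Query / expected-tool pairs copied from the "Routing Test Cases" table
# in EVAL_SUMMARY.md (subset only).
ROUTING_CASES: List[Tuple[str, str]] = [
    ("how is my portfolio", "portfolio_analysis"),
    ("check apple for me", "market_data"),
    ("what is the price of NVDA", "market_data"),
    ("show me my transactions", "transaction_query"),
    ("estimate my capital gains", "tax_estimate"),
]


def routing_failures(
    route: Callable[[str], str],
    cases: List[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (query, expected, actual) for every case the router gets wrong."""
    failures = []
    for query, expected in cases:
        actual = route(query)
        if actual != expected:
            failures.append((query, expected, actual))
    return failures


def stub_route(query: str) -> str:
    """Hypothetical stand-in router: a plain lookup table.

    This is NOT the agent's real routing logic; replace it with a call
    into the agent under test. Unknown queries fall back to "capabilities".
    """
    return dict(ROUTING_CASES).get(query, "capabilities")


if __name__ == "__main__":
    # Print one line per misrouted query; prints nothing if all cases pass.
    for query, expected, actual in routing_failures(stub_route, ROUTING_CASES):
        print(f"{query!r}: expected {expected}, got {actual}")
```

Returning the list of failures, instead of asserting inside the loop, lets a caller report every misrouted query in one run; a pytest test could simply assert that the returned list is empty.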