
community: add AI agent eval dataset (183 test cases)

Contributes a community eval dataset for testing
AI agents built on the Ghostfolio API. Includes
183 test cases covering portfolio analysis,
market data, tax estimation, real estate,
life decisions, and adversarial inputs.

Full implementation available at:
https://github.com/lakshmipunukollu-ai/ghostfolio-agent-priya

Contributed as part of a Gauntlet AI bounty submission.

Made-with: Cursor
pull/6453/head
Priyanka Punukollu 1 month ago
parent
commit
018af2c09c
  1. 72
      community/agent-evals/EVAL_SUMMARY.md
  2. 61
      community/agent-evals/README.md

72
community/agent-evals/EVAL_SUMMARY.md

@ -0,0 +1,72 @@
# Ghostfolio AI Agent — Eval Test Dataset
A community-contributed evaluation dataset for
testing AI agents built on top of the Ghostfolio API.
## Overview
183 test cases covering the full Ghostfolio API
surface, contributed as part of a Gauntlet AI bounty
submission (March 2026).
## Categories
| Category | Count | Description |
|----------|-------|-------------|
| Happy Path | 20 | Standard queries that should always work |
| Edge Cases | 14 | Boundary conditions, empty data, typos |
| Adversarial | 14 | Prompt injection, nonsense, abuse attempts |
| Multi-Step | 13 | Conversations spanning multiple tool calls |
| Domain Specific | 122 | Portfolio, market, property, tax, relocation |
## Query Types Covered
- Portfolio analysis (holdings, allocation, performance)
- Market data (live stock prices, YTD returns)
- Tax estimation (capital gains, tax-loss harvesting)
- Risk & compliance (concentration, diversification)
- Transaction history (trade history, recent activity)
- Real estate (property CRUD, equity analysis)
- Life decisions (job offers, relocation, retirement)
- Wealth gap analysis (Fed SCF benchmarking)
## How to Use
These test cases can be used to evaluate any
AI agent built on top of the Ghostfolio API.
Each test case includes:
- Input query (natural language)
- Expected tool call
- Expected response characteristics
- Verification criteria
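
As a rough illustration only, the four fields listed above could be held in a small record. The class name `EvalCase` and its field names are hypothetical and are not taken from the dataset. The query and tool come from the Routing Test Cases section, while the trait and criterion strings are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    """Hypothetical container for one eval case; all names are illustrative."""

    query: str                   # input query (natural language)
    expected_tool: str           # tool the agent is expected to call
    response_traits: tuple = ()  # expected response characteristics
    criteria: tuple = ()         # verification criteria, as plain text


# Query and tool are from the routing table; the other values are placeholders.
case = EvalCase(
    query="what is the price of NVDA",
    expected_tool="market_data",
    response_traits=("mentions NVDA",),
    criteria=("called tool equals expected_tool",),
)
```

A frozen dataclass keeps each case immutable, so a test cannot accidentally modify shared fixtures.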
## Full Dataset
The full test dataset with code is available at:
https://github.com/lakshmipunukollu-ai/ghostfolio-agent-priya
in the `agent/evals/` directory.
## Routing Test Cases
| Query | Expected Tool |
|-------|--------------|
| how is my portfolio | portfolio_analysis |
| check apple for me | market_data |
| how many shares of APPL do I have | portfolio_analysis |
| can I afford to retire | wealth_gap |
| tell me about my house | property_tracker |
| what can this do | capabilities |
| give me a full portfolio summary | portfolio_analysis |
| is 180k in seattle a raise from 120k in austin | relocation_runway |
| estimate my capital gains | tax_estimate |
| am I over concentrated | compliance_check |
| what is the price of NVDA | market_data |
| show me my transactions | transaction_query |
| how much is AAPL | market_data |
| can I afford a house | wealth_down_payment |
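
A minimal sketch of how rows from the table above might be checked against a router. The helper `routing_failures` and the keyword-based `toy_route` are hypothetical stand-ins written for this example. They are not part of the dataset or of Ghostfolio, and a real agent would replace `toy_route`.

```python
from typing import Callable

# A few (query, expected tool) pairs copied from the routing table.
ROUTING_CASES = [
    ("how is my portfolio", "portfolio_analysis"),
    ("what is the price of NVDA", "market_data"),
    ("estimate my capital gains", "tax_estimate"),
    ("show me my transactions", "transaction_query"),
]


def routing_failures(route: Callable[[str], str], cases=ROUTING_CASES):
    """Return (query, expected, actual) for every case the router gets wrong."""
    failures = []
    for query, expected in cases:
        actual = route(query)
        if actual != expected:
            failures.append((query, expected, actual))
    return failures


def toy_route(query: str) -> str:
    """Hypothetical keyword router standing in for a real agent."""
    q = query.lower()
    if "price" in q:
        return "market_data"
    if "capital gains" in q:
        return "tax_estimate"
    if "transactions" in q:
        return "transaction_query"
    return "portfolio_analysis"
```

Calling `routing_failures(toy_route)` returns an empty list when every sampled query is routed to its expected tool. Any mismatch is reported with the query, the expected tool, and the tool actually chosen.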
## Contributing
If you build an AI agent on Ghostfolio and want
to contribute additional test cases, please open
a PR adding them to this directory.

61
community/agent-evals/README.md

@ -0,0 +1,61 @@
# Finance AI Agent — Public Eval Dataset
183 test cases for AI agents built on personal finance and portfolio management software.
The dataset is built on top of Ghostfolio, an open source wealth management platform,
and is released publicly as a resource for developers building finance AI agents.
## Test Categories
| Category | Count |
|----------|-------|
| Happy Path | 20 |
| Edge Cases | 14 |
| Adversarial | 14 |
| Multi-Step | 13 |
| Other | 122 |
| **Total** | **183** |
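
As a quick consistency check, the per-category counts in the table can be summed and compared with the stated total. The dictionary name `CATEGORY_COUNTS` is illustrative only.

```python
# Counts copied from the Test Categories table.
CATEGORY_COUNTS = {
    "Happy Path": 20,
    "Edge Cases": 14,
    "Adversarial": 14,
    "Multi-Step": 13,
    "Other": 122,
}

total = sum(CATEGORY_COUNTS.values())
assert total == 183  # matches the Total row
```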
## How To Run
```bash
git clone https://github.com/lakshmipunukollu-ai/ghostfolio
cd ghostfolio
git checkout submission/final
pip install -r agent/requirements.txt
python -m pytest agent/evals/ -v
```
## Test Structure
Every test in `test_eval_dataset.py` follows this pattern:
```python
# TYPE: happy_path | edge_case | adversarial | multi_step
# INPUT: what is being tested
# EXPECTED: what the tool should return
# CRITERIA: the specific assertion
def test_name():
    from tools.tool_name import function_name
    result = function_name(params)
    assert "key" in result
```
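
For illustration, a filled-in test following the same pattern might look like the sketch below. The `market_data` function here is a hypothetical stub defined inline so the example runs on its own. The real tool functions live in the linked repository and may have different names, parameters, and return shapes.

```python
# Hypothetical stand-in for a tool function (assumption, not the real tool).
def market_data(symbol: str) -> dict:
    if not symbol or not symbol.isalpha():
        return {"error": f"invalid symbol: {symbol!r}"}
    return {"symbol": symbol.upper(), "price": None}  # no live lookup here


# TYPE: edge_case
# INPUT: empty ticker symbol
# EXPECTED: an error payload rather than an exception
# CRITERIA: result contains an "error" key
def test_market_data_empty_symbol():
    result = market_data("")
    assert "error" in result
```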
## Results
- Tests: 183
- Pass rate: 100%
- Runtime: ~30 seconds
## Contribute
Submit a PR with new test cases.
Follow the TYPE/INPUT/EXPECTED/CRITERIA pattern.
## License
MIT
## Author
Priya Lakshmipunukollu — AgentForge, February 2026
https://github.com/lakshmipunukollu-ai/ghostfolio