From 2e571afe52805e6e4f9d61ab66c35c8ab90afbd3 Mon Sep 17 00:00:00 2001
From: Priyanka Punukollu
Date: Sun, 1 Mar 2026 15:28:23 -0600
Subject: [PATCH] =?UTF-8?q?docs:=20fix=20test=20count=20to=20183=20and=20a?=
 =?UTF-8?q?dd=20open=20source=20contribution=20section=20=E2=80=94=20eval?=
 =?UTF-8?q?=20dataset=20of=20183=20cases=20is=20open=20source=20contributi?=
 =?UTF-8?q?on?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Made-with: Cursor
---
 BOUNTY.md | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/BOUNTY.md b/BOUNTY.md
index db434e2c7..02e32c712 100644
--- a/BOUNTY.md
+++ b/BOUNTY.md
@@ -191,10 +191,28 @@ clear disclaimers about what is a projection vs. a prediction.
 
 ## Eval & Verification
 
-- **182 tests** (100% pass rate) covering portfolio logic, CRUD flows, strategy simulation,
+- **183 tests** (100% pass rate) covering portfolio logic, CRUD flows, strategy simulation,
   edge cases, adversarial inputs, and multi-step chains
 - **3 verification systems:** confidence scoring, source attribution (citation enforcement),
   and domain constraint check (no guaranteed-return language)
 - **LangSmith tracing** active — every request traced at `smith.langchain.com`
 - All tool failures return structured error codes (e.g., `PROPERTY_TRACKER_NOT_FOUND`)
 - Conversation history maintained across all turns via `AgentState.messages`
+
+---
+
+## Open Source Contribution
+
+### Eval Test Dataset (183 cases)
+The agent ships with 183 open-source evaluation
+test cases in `agent/evals/` covering:
+- Happy path queries (portfolio, market, property)
+- Edge cases (typos, ambiguous queries, empty data)
+- Adversarial inputs (prompt injection, nonsense)
+- Multi-step conversations (add property then analyze)
+
+These tests are freely available for any developer
+building an AI agent on top of Ghostfolio. They
+cover the full Ghostfolio API surface and can be
+used as a benchmark for future Ghostfolio agents.
+See `agent/evals/EVAL_DATASET_README.md` for details.