# Code Review — AI Agent Requirement Closure

- Date: 2026-02-24
- Scope: Ghostfolio finance agent requirement closure (`docs/requirements.md`)
- Status: ✅ Core technical requirements complete (local verification gate passed, including the strict live-latency check)
## Summary
The previously open requirement gaps are closed in code and tests:
- Eval framework expanded to 50+ deterministic cases with category minimum checks.
- LangSmith observability integrated for chat traces and eval-suite tracing.
- User feedback capture implemented end-to-end (API + persistence + UI actions).
- Local verification gate completed without pushing to `main`.
- Reply-quality guardrail and eval slice added.
- Live model/network latency gate added; passing strict targets.
## What Changed
### 1) Eval Dataset Expansion (50+)

- Dataset now exports 53 cases:
  - happy_path: 23
  - edge_case: 10
  - adversarial: 10
  - multi_step: 10
- Category assertions are enforced in `mvp-eval.runner.spec.ts`.
- Dataset organization uses category files under `apps/api/src/app/endpoints/ai/evals/dataset/`.
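A minimal sketch of what such a category gate might look like. The type names and the minimum counts below are illustrative assumptions, not the actual contents of `mvp-eval.runner.spec.ts`:

```typescript
// Hypothetical category-minimum gate: count cases per category and report
// any category that falls below its required minimum.
type EvalCategory = 'happy_path' | 'edge_case' | 'adversarial' | 'multi_step';

interface EvalCase {
  id: string;
  category: EvalCategory;
  prompt: string;
}

// Illustrative minimums loosely mirroring the 23/10/10/10 split reported above.
const CATEGORY_MINIMUMS: Record<EvalCategory, number> = {
  happy_path: 20,
  edge_case: 10,
  adversarial: 10,
  multi_step: 10
};

function countByCategory(cases: EvalCase[]): Record<EvalCategory, number> {
  const counts: Record<EvalCategory, number> = {
    happy_path: 0,
    edge_case: 0,
    adversarial: 0,
    multi_step: 0
  };
  for (const aCase of cases) {
    counts[aCase.category] += 1;
  }
  return counts;
}

// Returns the categories that violate their minimum; an empty array passes.
function violatedCategories(cases: EvalCase[]): EvalCategory[] {
  const counts = countByCategory(cases);
  return (Object.keys(CATEGORY_MINIMUMS) as EvalCategory[]).filter(
    (category) => counts[category] < CATEGORY_MINIMUMS[category]
  );
}
```

In a Jest spec this would typically be a single assertion that `violatedCategories(dataset)` is empty, so a shrinking dataset fails loudly with the offending categories named.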
### 2) Observability Integration

- Chat observability in API:
  - `apps/api/src/app/endpoints/ai/ai-observability.service.ts`
  - `apps/api/src/app/endpoints/ai/ai.service.ts`
- Captures:
  - latency (total + breakdown)
  - token estimates
  - tool trace metadata
  - failure traces
- LangSmith wiring is environment-gated and supports `LANGSMITH_*` and `LANGCHAIN_*` variables.
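The environment gating described above could be sketched as follows. The variable names checked here follow common LangSmith conventions (`LANGSMITH_TRACING`, `LANGCHAIN_TRACING_V2`) but are assumptions; the trace shape is likewise illustrative:

```typescript
type Env = Record<string, string | undefined>;

// Tracing is a no-op unless explicitly enabled via environment variables.
// Variable names are assumed, not read from the actual service.
function isTracingEnabled(env: Env): boolean {
  return (
    env['LANGSMITH_TRACING'] === 'true' ||
    env['LANGCHAIN_TRACING_V2'] === 'true'
  );
}

// Illustrative shape of a captured chat trace (latency, tokens, tools).
interface ChatTrace {
  totalLatencyMs: number;
  latencyBreakdownMs: Record<string, number>; // e.g. { plan: 120, llm: 2900 }
  estimatedTokens: number;
  toolTrace: string[]; // tool names in execution order
  error?: string; // set only for failure traces
}

// Returns the trace when tracing is enabled, null when gated off.
// The real service would forward the trace to LangSmith instead.
function recordTrace(trace: ChatTrace, env: Env): ChatTrace | null {
  if (!isTracingEnabled(env)) {
    return null; // disabled: no export, no logging
  }
  return trace;
}
```

Gating at this single choke point keeps the no-tracing path free of network calls, which matters for the deterministic eval runs.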
### 3) Feedback Loop (Thumbs Up/Down)

- API DTO + endpoint:
  - `apps/api/src/app/endpoints/ai/ai-chat-feedback.dto.ts`
  - `POST /api/v1/ai/chat/feedback`
- Persistence + telemetry:
  - feedback saved in Redis with a TTL
  - feedback events traced/logged through the observability service
- UI action wiring in `apps/client/src/app/pages/portfolio/analysis/ai-chat-panel/`:
  - users can mark assistant responses as `Helpful` or `Needs work`
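A hypothetical shape for the feedback payload and its Redis key/TTL. The field names, rating values, key format, and 30-day TTL are all illustrative assumptions, not the actual DTO in `ai-chat-feedback.dto.ts`:

```typescript
// Assumed rating values mapping to the "Helpful" / "Needs work" UI actions.
type FeedbackRating = 'helpful' | 'needs_work';

interface AiChatFeedback {
  messageId: string;
  rating: FeedbackRating;
  comment?: string;
}

// Runtime validation of an untrusted request body (stand-in for the DTO's
// class-validator decorators in the real endpoint).
function isValidFeedback(body: unknown): body is AiChatFeedback {
  if (typeof body !== 'object' || body === null) {
    return false;
  }
  const b = body as Record<string, unknown>;
  return (
    typeof b.messageId === 'string' &&
    b.messageId.length > 0 &&
    (b.rating === 'helpful' || b.rating === 'needs_work')
  );
}

// Illustrative Redis key layout and TTL for a stored feedback event.
function feedbackKey(userId: string, messageId: string): string {
  return `ai:chat:feedback:${userId}:${messageId}`;
}

const FEEDBACK_TTL_IN_SECONDS = 30 * 24 * 60 * 60; // assumed 30-day retention
```

Keying per user and message keeps writes idempotent: re-clicking a thumb overwrites the previous rating instead of appending a duplicate event.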
### 4) Reply Quality Guardrail

- Quality heuristics added:
  - anti-disclaimer filtering
  - actionability checks for invest/rebalance intents
  - numeric-evidence checks for quantitative prompts
- New verification check in responses: `response_quality`
- New quality eval suite: `apps/api/src/app/endpoints/ai/evals/ai-quality-eval.spec.ts`
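A sketch of how the disclaimer and numeric-evidence heuristics might be composed. The patterns below are illustrative; the production rules in the quality eval suite will differ:

```typescript
// Assumed anti-disclaimer filter: replies that deflect with boilerplate
// instead of answering are treated as low quality.
const DISCLAIMER_PATTERN =
  /\b(as an ai|i cannot provide financial advice|consult a (financial )?advisor)\b/i;

// Numeric-evidence check: a quantitative answer should cite at least one
// concrete number (amount, percentage, count, ...).
function hasNumericEvidence(reply: string): boolean {
  return /\d/.test(reply);
}

// Combined guardrail feeding the hypothetical `response_quality` check.
function isLowQualityReply(
  reply: string,
  isQuantitativePrompt: boolean
): boolean {
  if (DISCLAIMER_PATTERN.test(reply)) {
    return true;
  }
  if (isQuantitativePrompt && !hasNumericEvidence(reply)) {
    return true;
  }
  return false;
}
```

Running the same predicate both inline (as the `response_quality` verification check) and over the eval dataset keeps the guardrail and its eval slice from drifting apart.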
### 5) Live Latency Gate

- New benchmark suite: `apps/api/src/app/endpoints/ai/evals/ai-live-latency.spec.ts`
- Commands:
  - `npm run test:ai:live-latency`
  - `npm run test:ai:live-latency:strict`
- Latest strict run:
  - single-tool p95: 3514 ms (target < 5000 ms)
  - multi-step p95: 3505 ms (target < 15000 ms)
- Tail-latency guardrail: `AI_AGENT_LLM_TIMEOUT_IN_MS` (default `3500`) with a deterministic fallback.
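The p95 figures above can be computed in a few lines. The nearest-rank method and the gate thresholds mirrored here are assumptions about the benchmark suite, not its actual code:

```typescript
// Nearest-rank percentile: smallest sample such that at least p% of
// samples are <= it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) {
    throw new Error('no samples');
  }
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Strict gate with the thresholds reported above (<5s single-tool,
// <15s multi-step), applied to observed round-trip times.
function passesStrictGate(
  singleToolMs: number[],
  multiStepMs: number[]
): boolean {
  return (
    percentile(singleToolMs, 95) < 5000 &&
    percentile(multiStepMs, 95) < 15000
  );
}
```

Gating on p95 rather than the mean is what makes the `AI_AGENT_LLM_TIMEOUT_IN_MS` fallback matter: a single slow model call would otherwise blow the tail without moving the average much.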
### 6) Eval Quality Metrics (Tracked)

- `hallucinationRate` added to the eval suite result with threshold gate `<= 0.05`.
- `verificationAccuracy` added to the eval suite result with threshold gate `>= 0.9`.
- Both metrics are asserted in `mvp-eval.runner.spec.ts`.
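The two metrics and their gates reduce to simple ratios over per-case results. The result shape and field names below are assumptions based on the description, not the actual `mvp-eval.runner.ts` types:

```typescript
// Assumed per-case outcome emitted by the eval runner.
interface EvalCaseResult {
  hallucinated: boolean; // answer contained unsupported claims
  verificationCorrect: boolean; // verification checks matched expectations
}

function hallucinationRate(results: EvalCaseResult[]): number {
  if (results.length === 0) {
    return 0;
  }
  return results.filter((r) => r.hallucinated).length / results.length;
}

function verificationAccuracy(results: EvalCaseResult[]): number {
  if (results.length === 0) {
    return 0;
  }
  return results.filter((r) => r.verificationCorrect).length / results.length;
}

// The threshold gates stated above: <= 5% hallucinations, >= 90% accuracy.
function passesMetricGates(results: EvalCaseResult[]): boolean {
  return (
    hallucinationRate(results) <= 0.05 &&
    verificationAccuracy(results) >= 0.9
  );
}
```

With 53 cases, the 5% gate tolerates at most two hallucinated answers per run, so one regressed case already consumes most of the budget.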
## Verification Results

Commands run locally:

- `npm run test:ai`
- `npm run test:mvp-eval`
- `npm run test:ai:quality`
- `npm run test:ai:performance`
- `npm run test:ai:live-latency:strict`
- `npx nx run api:lint`
- `npx dotenv-cli -e .env.example -- npx jest apps/client/src/app/pages/portfolio/analysis/ai-chat-panel/ai-chat-panel.component.spec.ts --config apps/client/jest.config.ts`
Results:

- `test:ai`: passed (9 suites, 40 tests)
- `test:mvp-eval`: passed (category gate + pass-rate gate)
- `test:ai:quality`: passed (reply-quality eval slice)
- `test:ai:performance`: passed (service-level p95 gate)
- `test:ai:live-latency:strict`: passed (real model/network p95 gate)
- `api:lint`: passed (existing workspace warnings remain non-blocking)
- client chat panel spec: passed (4 tests, including the feedback flow)
## Requirement Mapping (Technical Scope)

| Requirement | Status | Evidence |
|---|---|---|
| 5+ required tools | ✅ | `determineToolPlan()` + 5 tool executors in AI endpoint |
| 50+ eval cases + category mix | ✅ | `mvp-eval.dataset.ts` + `evals/dataset/*` + category assertions in spec |
| Observability (trace, latency, token) | ✅ | `ai-observability.service.ts`, `ai.service.ts`, `mvp-eval.runner.ts` |
| User feedback mechanism | ✅ | `/ai/chat/feedback`, Redis write, UI buttons |
| Verification/guardrails in output | ✅ | verification checks + confidence + citations + `response_quality` in response contract |
| Strict latency targets (<5s / <15s) | ✅ | `test:ai:live-latency:strict` evidence in this review |
| Hallucination-rate tracking (<5%) | ✅ | `mvp-eval.runner.ts` metric + `mvp-eval.runner.spec.ts` threshold assertion |
| Verification-accuracy tracking (>90%) | ✅ | `mvp-eval.runner.ts` metric + `mvp-eval.runner.spec.ts` threshold assertion |
## Remaining Non-Code Submission Items

These are still manual deliverables outside local code/test closure:

- Demo video (3-5 min)
- Social post (X/LinkedIn)
- Final PDF packaging of submission docs