
Code Review — AI Agent Requirement Closure

Date: 2026-02-24
Scope: Ghostfolio finance agent requirement closure (docs/requirements.md)
Status: Core technical requirements complete (local verification gate passed, including strict live-latency check)

Summary

The previously open requirement gaps are closed in code and tests:

  1. Eval framework expanded to 50+ deterministic cases with category minimum checks.
  2. LangSmith observability integrated for chat traces and eval-suite tracing.
  3. User feedback capture implemented end-to-end (API + persistence + UI actions).
  4. Local verification gate completed without pushing to main.
  5. Reply quality guardrail and eval slice added.
  6. Live model/network latency gate added and passing strict targets.

What Changed

1) Eval Dataset Expansion (50+)

  • Dataset now exports 53 cases:
    • happy_path: 23
    • edge_case: 10
    • adversarial: 10
    • multi_step: 10
  • Category assertions are enforced in mvp-eval.runner.spec.ts (see the sketch after this list).
  • Dataset organization uses category files under:
    • apps/api/src/app/endpoints/ai/evals/dataset/
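
For reviewers, the category gate amounts to Jest assertions of the following shape. This is a minimal sketch: `evalDataset`, the `EvalCase` shape, and the exact per-category minimums are illustrative stand-ins, not the real exports.

```typescript
// Sketch of the category-minimum gate (names and minimums are illustrative).
interface EvalCase {
  category: 'happy_path' | 'edge_case' | 'adversarial' | 'multi_step';
  prompt: string;
}

declare const evalDataset: EvalCase[]; // stands in for the real dataset export

describe('eval dataset category mix', () => {
  const countByCategory = (category: EvalCase['category']) =>
    evalDataset.filter((evalCase) => evalCase.category === category).length;

  it('exports 50+ cases with the required category minimums', () => {
    expect(evalDataset.length).toBeGreaterThanOrEqual(50);
    expect(countByCategory('happy_path')).toBeGreaterThanOrEqual(20);
    expect(countByCategory('edge_case')).toBeGreaterThanOrEqual(10);
    expect(countByCategory('adversarial')).toBeGreaterThanOrEqual(10);
    expect(countByCategory('multi_step')).toBeGreaterThanOrEqual(10);
  });
});
```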

2) Observability Integration

  • Chat observability in API:
    • apps/api/src/app/endpoints/ai/ai-observability.service.ts
    • apps/api/src/app/endpoints/ai/ai.service.ts
  • Captures:
    • latency (total + breakdown)
    • token estimates
    • tool trace metadata
    • failure traces
  • LangSmith wiring is environment-gated and supports both LANGSMITH_* and LANGCHAIN_* variables (see the sketch below).
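
The gating pattern looks roughly like the following sketch. The helper name is illustrative; the environment variables are the standard LangSmith/LangChain ones, and tracing stays off unless they are set.

```typescript
import { Client } from 'langsmith';

// Sketch of environment-gated LangSmith client creation. When the env vars
// are absent, the observability service falls back to local logging only.
function createLangSmithClient(): Client | undefined {
  const tracingEnabled =
    process.env.LANGSMITH_TRACING === 'true' ||
    process.env.LANGCHAIN_TRACING_V2 === 'true';
  const apiKey =
    process.env.LANGSMITH_API_KEY ?? process.env.LANGCHAIN_API_KEY;

  if (!tracingEnabled || !apiKey) {
    return undefined; // no-op path: nothing leaves the process
  }

  return new Client({ apiKey });
}
```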

3) Feedback Loop (Thumbs Up/Down)

  • API DTO + endpoint:
    • apps/api/src/app/endpoints/ai/ai-chat-feedback.dto.ts
    • POST /api/v1/ai/chat/feedback
  • Persistence + telemetry:
    • feedback saved in Redis with TTL (see the sketch after this list)
    • feedback event traced/logged through observability service
  • UI action wiring:
    • apps/client/src/app/pages/portfolio/analysis/ai-chat-panel/
    • user can mark assistant responses as "Helpful" or "Needs work"

4) Reply Quality Guardrail

  • Quality heuristics added (see the sketch after this list):
    • anti-disclaimer filtering
    • actionability checks for invest/rebalance intent
    • numeric evidence checks for quantitative prompts
  • New verification check in responses:
    • response_quality
  • New quality eval suite:
    • apps/api/src/app/endpoints/ai/evals/ai-quality-eval.spec.ts
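
In shape, the heuristics are plain string checks along these lines. The patterns, intent labels, and failure reasons below are illustrative, not the exact production rules.

```typescript
// Sketch of the reply-quality heuristics (patterns and intents illustrative).
const DISCLAIMER_PATTERN =
  /as an ai|i cannot provide financial advice|consult a (financial )?advisor/i;

const ACTION_VERB_PATTERN = /\b(rebalance|shift|increase|reduce|buy|sell)\b/i;

const NUMERIC_EVIDENCE_PATTERN = /\d+(\.\d+)?\s*(%|USD|EUR)?/;

export function checkResponseQuality(
  reply: string,
  intent: 'invest' | 'rebalance' | 'quantitative' | 'other'
): { passed: boolean; reasons: string[] } {
  const reasons: string[] = [];

  if (DISCLAIMER_PATTERN.test(reply)) {
    reasons.push('boilerplate disclaimer detected');
  }

  if (
    (intent === 'invest' || intent === 'rebalance') &&
    !ACTION_VERB_PATTERN.test(reply)
  ) {
    reasons.push('no actionable recommendation for invest/rebalance intent');
  }

  if (intent === 'quantitative' && !NUMERIC_EVIDENCE_PATTERN.test(reply)) {
    reasons.push('no numeric evidence for a quantitative prompt');
  }

  return { passed: reasons.length === 0, reasons };
}
```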

5) Live Latency Gate

  • New benchmark suite:
    • apps/api/src/app/endpoints/ai/evals/ai-live-latency.spec.ts
  • Commands:
    • npm run test:ai:live-latency
    • npm run test:ai:live-latency:strict
  • Latest strict run:
    • single-tool p95: 3514ms (< 5000ms)
    • multi-step p95: 3505ms (< 15000ms)
  • Tail-latency guardrail (see the sketch below):
    • AI_AGENT_LLM_TIMEOUT_IN_MS (default: 3500 ms) with a deterministic fallback.
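
The guardrail pattern is a race between the model call and a timer, resolving to a deterministic reply when the budget is exceeded. A minimal sketch, with illustrative function names:

```typescript
// Sketch of the tail-latency guardrail: race the LLM call against a timeout
// and fall back to a deterministic reply instead of blocking the request.
const LLM_TIMEOUT_IN_MS = Number(
  process.env.AI_AGENT_LLM_TIMEOUT_IN_MS ?? 3500
);

async function completeWithFallback(
  llmCall: () => Promise<string>,
  deterministicFallback: () => string
): Promise<string> {
  let timer: NodeJS.Timeout | undefined;

  const timeout = new Promise<string>((resolve) => {
    timer = setTimeout(
      () => resolve(deterministicFallback()),
      LLM_TIMEOUT_IN_MS
    );
  });

  try {
    return await Promise.race([llmCall(), timeout]);
  } finally {
    clearTimeout(timer); // avoid leaking the timer when the LLM wins the race
  }
}
```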

6) Eval Quality Metrics (Tracked)

  • hallucinationRate added to eval suite result with threshold gate <= 0.05.
  • verificationAccuracy added to eval suite result with threshold gate >= 0.9.
  • Both metrics are asserted in mvp-eval.runner.spec.ts (see the sketch below).
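
The metric computation reduces to ratios over the eval results, asserted against the two gates. A sketch with illustrative field names (the runner's real result shape may differ):

```typescript
// Sketch of how the tracked metrics gate the suite (names illustrative).
interface EvalResult {
  hallucinated: boolean;
  verificationCorrect: boolean;
}

declare const evalResults: EvalResult[]; // produced by the eval runner

function summarizeMetrics(results: EvalResult[]) {
  const total = results.length;

  return {
    hallucinationRate:
      results.filter((result) => result.hallucinated).length / total,
    verificationAccuracy:
      results.filter((result) => result.verificationCorrect).length / total
  };
}

it('meets the tracked quality thresholds', () => {
  const { hallucinationRate, verificationAccuracy } =
    summarizeMetrics(evalResults);

  expect(hallucinationRate).toBeLessThanOrEqual(0.05);
  expect(verificationAccuracy).toBeGreaterThanOrEqual(0.9);
});
```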

Verification Results

Commands run locally:

npm run test:ai
npm run test:mvp-eval
npm run test:ai:quality
npm run test:ai:performance
npm run test:ai:live-latency:strict
npx nx run api:lint
npx dotenv-cli -e .env.example -- npx jest apps/client/src/app/pages/portfolio/analysis/ai-chat-panel/ai-chat-panel.component.spec.ts --config apps/client/jest.config.ts

Results:

  • test:ai: passed (9 suites, 40 tests)
  • test:mvp-eval: passed (category gate + pass-rate gate)
  • test:ai:quality: passed (reply-quality eval slice)
  • test:ai:performance: passed (service-level p95 gate)
  • test:ai:live-latency:strict: passed (real model/network p95 gate)
  • api:lint: passed (existing workspace warnings remain non-blocking)
  • client chat panel spec: passed (4 tests, including feedback flow)

Requirement Mapping (Technical Scope)

| Requirement | Status | Evidence |
| --- | --- | --- |
| 5+ required tools | Closed | determineToolPlan() + 5 tool executors in AI endpoint |
| 50+ eval cases + category mix | Closed | mvp-eval.dataset.ts + evals/dataset/* + category assertions in spec |
| Observability (trace, latency, token) | Closed | ai-observability.service.ts, ai.service.ts, mvp-eval.runner.ts |
| User feedback mechanism | Closed | /ai/chat/feedback, Redis write, UI buttons |
| Verification/guardrails in output | Closed | verification checks + confidence + citations + response_quality in response contract |
| Strict latency targets (<5s / <15s) | Closed | test:ai:live-latency:strict evidence in this review |
| Hallucination-rate tracking (<5%) | Closed | mvp-eval.runner.ts metric + mvp-eval.runner.spec.ts threshold assertion |
| Verification-accuracy tracking (>90%) | Closed | mvp-eval.runner.ts metric + mvp-eval.runner.spec.ts threshold assertion |

Remaining Non-Code Submission Items

These remain manual deliverables outside the local code/test closure:

  • Demo video (3-5 min)
  • Social post (X/LinkedIn)
  • Final PDF packaging of submission docs