Critical Requirements Status

Date: 2026-02-24
Scope: docs/requirements.md + docs/PRESEARCH.md critical gates

1) Core Technical Requirements

| Requirement | Status | Evidence |
| --- | --- | --- |
| Agent responds to natural-language finance queries | Complete | POST /api/v1/ai/chat in apps/api/src/app/endpoints/ai/ai.controller.ts |
| 5+ functional tools | Complete | portfolio_analysis, risk_assessment, market_data_lookup, rebalance_plan, stress_test in ai.service.ts and helper modules |
| Tool calls return structured results | Complete | AiAgentChatResponse shape with toolCalls, citations, verification, confidence (see the sketch below this table) |
| Conversation memory across turns | Complete | Redis-backed memory in ai-agent.chat.helpers.ts (AI_AGENT_MEMORY_MAX_TURNS, TTL) |
| Graceful error handling | Complete | Tool-level catch and fallback response in ai.service.ts / buildAnswer() |
| 3+ verification checks | Complete | numerical_consistency, market_data_coverage, tool_execution, citation_coverage, output_completeness, response_quality, rebalance_coverage, stress_test_coherence |
| Eval dataset 50+ with required category distribution | Complete | 53 total in apps/api/src/app/endpoints/ai/evals/dataset/* with category gate in mvp-eval.runner.spec.ts |
| Observability (trace + latency + token + errors + eval traces) | Complete | ai-observability.service.ts + eval tracing in mvp-eval.runner.ts (LangSmith env-gated) |
| User feedback mechanism | Complete | POST /api/v1/ai/chat/feedback, AiFeedbackService, UI feedback buttons in ai-chat-panel |
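
For orientation, here is a minimal TypeScript sketch of what the structured response can look like. Only toolCalls, citations, verification, and confidence are field names confirmed by the table above; every nested type and field below is an illustrative assumption, not the actual AiAgentChatResponse definition in the codebase.

```typescript
// Sketch only: nested field names are assumptions, not the real definition.
interface AiAgentToolCall {
  tool: string; // e.g. "portfolio_analysis"
  input: Record<string, unknown>;
  output: Record<string, unknown>;
  status: 'ok' | 'error';
}

interface AiAgentVerificationCheck {
  name: string; // e.g. "numerical_consistency"
  passed: boolean;
  details?: string;
}

interface AiAgentChatResponse {
  answer: string;
  toolCalls: AiAgentToolCall[];
  citations: string[];
  verification: AiAgentVerificationCheck[];
  confidence: number; // 0..1
}
```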

2) Performance Evidence

Service-level latency regression gate (deterministic, mocked providers)

Command:

npm run test:ai:performance

Observed p95 (2026-02-24):

  • Single-tool query p95: 0.64ms (target <5000ms)
  • Multi-step query p95: 0.22ms (target <15000ms)

Notes:

  • This benchmark validates application orchestration performance and guards future refactors.
  • It uses mocked providers and isolates app-side overhead.
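
As a rough illustration of what this gate measures, a minimal sketch of a p95 assertion over repeated mocked calls follows. The service handle, prompt, sample size, and helper names are assumptions; the actual spec behind npm run test:ai:performance may be organised differently.

```typescript
// Sketch of a p95 latency regression assertion (providers assumed to be mocked).
// The service handle and prompt are illustrative assumptions.
declare const aiService: { chat(input: { prompt: string }): Promise<unknown> };

function p95(durationsMs: number[]): number {
  const sorted = [...durationsMs].sort((a, b) => a - b);
  return sorted[Math.max(Math.ceil(sorted.length * 0.95) - 1, 0)];
}

it('keeps single-tool orchestration p95 under budget', async () => {
  const durations: number[] = [];
  for (let i = 0; i < 50; i++) {
    const start = Date.now();
    await aiService.chat({ prompt: 'Assess the risk of my portfolio' });
    durations.push(Date.now() - start);
  }
  expect(p95(durations)).toBeLessThan(5000); // single-tool target from above
});
```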

Live model/network latency gate (env-backed, strict target mode)

Commands:

npm run test:ai:live-latency
npm run test:ai:live-latency:strict

Observed strict p95 (2026-02-24):

  • Single-tool query p95: 3514ms (target <5000ms)
  • Multi-step query p95: 3505ms (target <15000ms)

Notes:

  • Uses real provider keys from .env (z_ai_glm_api_key / minimax_api_key).
  • Guardrail AI_AGENT_LLM_TIMEOUT_IN_MS (default 3500 ms) bounds tail latency and triggers a deterministic fallback when the provider response exceeds the budget.
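
A minimal sketch of this kind of timeout-plus-fallback guardrail, assuming a Promise.race shape; the real code in ai.service.ts and the chat helpers may be structured differently.

```typescript
// Illustrative timeout guardrail: race the provider call against a deterministic fallback.
const DEFAULT_LLM_TIMEOUT_IN_MS = 3500;

async function withLlmTimeout<T>(
  providerCall: Promise<T>,
  deterministicFallback: () => T,
  timeoutInMs = Number(process.env.AI_AGENT_LLM_TIMEOUT_IN_MS ?? DEFAULT_LLM_TIMEOUT_IN_MS)
): Promise<T> {
  let timer: NodeJS.Timeout | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(deterministicFallback()), timeoutInMs);
  });
  try {
    // Whichever settles first wins: the provider response or the fallback.
    return await Promise.race([providerCall, timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```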

Required command gate (current)

npm run test:ai
npm run test:mvp-eval
npm run test:ai:quality
npm run test:ai:performance
npm run test:ai:live-latency:strict
npx nx run api:lint

All pass locally.

Eval quality target tracking

  • Hallucination-rate heuristic is tracked in mvp-eval.runner.ts and asserted in mvp-eval.runner.spec.ts with threshold <= 0.05.
  • Verification-accuracy metric is tracked in mvp-eval.runner.ts and asserted in mvp-eval.runner.spec.ts with threshold >= 0.9.
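
A hypothetical sketch of these threshold assertions; runMvpEvalSuite and the report field names are assumptions, and only the metrics and thresholds mirror the bullets above.

```typescript
// Sketch of the quality-target assertions; helper and field names are assumptions.
declare function runMvpEvalSuite(): Promise<{
  hallucinationRate: number;
  verificationAccuracy: number;
}>;

it('meets the eval quality targets', async () => {
  const report = await runMvpEvalSuite();
  expect(report.hallucinationRate).toBeLessThanOrEqual(0.05); // hallucination-rate heuristic
  expect(report.verificationAccuracy).toBeGreaterThanOrEqual(0.9); // verification accuracy
});
```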

3) File Size Constraint (~500 LOC)

Current AI endpoint surface stays within the target:

  • ai.service.ts: 470 LOC
  • ai-agent.chat.helpers.ts: 436 LOC
  • ai-agent.verification.helpers.ts: 102 LOC
  • mvp-eval.runner.ts: 450 LOC
  • ai-observability.service.ts: 443 LOC

Refactor status:

  • No refactor is currently required to satisfy the file-size rule.

4) Remaining Final Submission Items

The following items are still outstanding for final submission:

  • Demo video (3-5 min)
  • Social post with @GauntletAI
  • Open-source release link (local scaffold complete at tools/evals/finance-agent-evals/, external publish/PR link still pending)

Open-source scaffold verification commands:

npm run evals:package:check
npm run evals:package:pack

5) AI Reply Quality

Current state:

  • Deterministic response-quality heuristics are implemented (response_quality verification check).
  • Generic disclaimer answers and low-information answers are filtered by reliability gating in buildAnswer() (see the sketch after this list).
  • Quality eval slice is active via apps/api/src/app/endpoints/ai/evals/ai-quality-eval.spec.ts.
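
A minimal sketch of what such reliability gating can look like; the patterns, length thresholds, and tool-call condition are assumptions, not the actual heuristics in buildAnswer().

```typescript
// Illustrative reliability gate; the real heuristics in buildAnswer() may differ.
const GENERIC_DISCLAIMER_PATTERNS = [
  /not (a )?financial advice/i,
  /consult a (financial )?(advisor|professional)/i,
];

function isLowInformationAnswer(answer: string, successfulToolCalls: number): boolean {
  const text = answer.trim();
  const disclaimerOnly =
    text.length < 280 && GENERIC_DISCLAIMER_PATTERNS.some((pattern) => pattern.test(text));
  const tooShort = text.length < 40;
  // Answers not backed by at least one successful tool call are treated as unreliable here.
  return disclaimerOnly || tooShort || successfulToolCalls === 0;
}
```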

Recommendation:

  • Keep adding real failing prompts to the quality eval cases and tune the prompt policy in buildAnswer() with deterministic assertions.