# Critical Requirements Status

Date: 2026-02-24  
Scope: `docs/requirements.md` + `docs/PRESEARCH.md` critical gates

## 1) Core Technical Requirements

| Requirement | Status | Evidence |
| --- | --- | --- |
| Agent responds to natural-language finance queries | Complete | `POST /api/v1/ai/chat` in `apps/api/src/app/endpoints/ai/ai.controller.ts` |
| 5+ functional tools | Complete | `portfolio_analysis`, `risk_assessment`, `market_data_lookup`, `rebalance_plan`, `stress_test` in `ai.service.ts` and helper modules |
| Tool calls return structured results | Complete | `AiAgentChatResponse` shape with `toolCalls`, `citations`, `verification`, `confidence` |
| Conversation memory across turns | Complete | Redis-backed memory in `ai-agent.chat.helpers.ts` (`AI_AGENT_MEMORY_MAX_TURNS`, TTL) |
| Graceful error handling | Complete | Tool-level catch and fallback response in `ai.service.ts` / `buildAnswer()` |
| 3+ verification checks | Complete | `numerical_consistency`, `market_data_coverage`, `tool_execution`, `citation_coverage`, `output_completeness`, `response_quality`, `rebalance_coverage`, `stress_test_coherence` |
| Eval dataset 50+ with required category distribution | Complete | 53 total in `apps/api/src/app/endpoints/ai/evals/dataset/*` with category gate in `mvp-eval.runner.spec.ts` |
| Observability (trace + latency + token + errors + eval traces) | Complete | `ai-observability.service.ts` + eval tracing in `mvp-eval.runner.ts` (LangSmith env-gated) |
| User feedback mechanism | Complete | `POST /api/v1/ai/chat/feedback`, `AiFeedbackService`, UI feedback buttons in `ai-chat-panel` |

## 2) Performance Evidence

### Service-level latency regression gate (deterministic, mocked providers)

Command:

```bash
npm run test:ai:performance
```

Observed p95 (2026-02-24):

- Single-tool query p95: `0.64ms` (target `<5000ms`)
- Multi-step query p95: `0.22ms` (target `<15000ms`)

Notes:

- This benchmark validates application orchestration performance and guards future refactors.
- It uses mocked providers and isolates app-side overhead.

### Live model/network latency gate (env-backed, strict target mode)

Commands:

```bash
npm run test:ai:live-latency
npm run test:ai:live-latency:strict
```

Observed strict p95 (2026-02-24):

- Single-tool query p95: `3514ms` (target `<5000ms`)
- Multi-step query p95: `3505ms` (target `<15000ms`)

Notes:

- Uses real provider keys from `.env` (`z_ai_glm_api_key` / `minimax_api_key`).
- Guardrail `AI_AGENT_LLM_TIMEOUT_IN_MS` (default `3500`) bounds tail latency and triggers deterministic fallback when provider response exceeds budget.

### Required command gate (current)

```bash
npm run test:ai
npm run test:mvp-eval
npm run test:ai:quality
npm run test:ai:performance
npm run test:ai:live-latency:strict
npx nx run api:lint
```

All pass locally.

### Eval quality target tracking

- Hallucination-rate heuristic is tracked in `mvp-eval.runner.ts` and asserted in `mvp-eval.runner.spec.ts` with threshold `<= 0.05`.
- Verification-accuracy metric is tracked in `mvp-eval.runner.ts` and asserted in `mvp-eval.runner.spec.ts` with threshold `>= 0.9`.

## 3) File Size Constraint (~500 LOC)

Current AI endpoint surface stays within the target:

- `ai.service.ts`: 470 LOC
- `ai-agent.chat.helpers.ts`: 436 LOC
- `ai-agent.verification.helpers.ts`: 102 LOC
- `mvp-eval.runner.ts`: 450 LOC
- `ai-observability.service.ts`: 443 LOC

Refactor requirement now:

- No mandatory refactor required to satisfy the file-size rule.

## 4) Remaining Final Submission Items

These are still outstanding at submission level:

- Demo video (3-5 min)
- Social post with `@GauntletAI`
- Open-source package npm publish (local scaffold complete at `tools/evals/finance-agent-evals/`, upstream PRs opened)

External contribution links:

- https://github.com/openai/evals/pull/1625
- https://github.com/langchain-ai/langchain/pull/35421

Open-source scaffold verification commands:

```bash
npm run evals:package:check
npm run evals:package:pack
```

## 5) AI Reply Quality

Current state:

- Deterministic response-quality heuristics are implemented (`response_quality` verification check).
- Generic disclaimer answers and low-information answers are filtered by reliability gating in `buildAnswer()`.
- Quality eval slice is active via `apps/api/src/app/endpoints/ai/evals/ai-quality-eval.spec.ts`.

Recommendation:

- Keep adding real failing prompts into quality eval cases and tune prompt policy in `buildAnswer()` with deterministic assertions.