# Todo Updated: 2026-02-24 - [x] Verify current repository state and missing required files - [x] Create `docs/adr/` for architecture decisions - [x] Save `Tasks.md` at repository root - [x] Populate `docs/tasks/tasks.md` - [x] Create `tasks/improvements.md` - [x] Create `tasks/lessons.md` - [x] Confirm files exist on disk - [x] Kick off MVP slice after Presearch refresh (this session) # Tasks Last updated: 2026-02-24 ## Active Tickets | ID | Feature | Status | Tests | PR / Commit | | --- | --- | --- | --- | --- | | T-001 | Presearch package and architecture direction | Complete | Doc review checklist | Local docs update | | T-002 | ADR foundation in `docs/adr/` | Complete | ADR template and first ADR review | Local docs update | | T-003 | Agent MVP tool 1: `portfolio_analysis` | Complete | `apps/api/src/app/endpoints/ai/ai.service.spec.ts` | Planned | | T-004 | Agent memory and response formatter | Complete | `apps/api/src/app/endpoints/ai/ai.service.spec.ts` | Planned | | T-005 | Eval dataset baseline (MVP 5-10) | Complete | `apps/api/src/app/endpoints/ai/evals/mvp-eval.runner.spec.ts` | Planned | | T-006 | Full eval dataset (50+) | Complete | `apps/api/src/app/endpoints/ai/evals/mvp-eval.runner.spec.ts` | Local implementation | | T-007 | Observability wiring (LangSmith traces and metrics) | Complete | `apps/api/src/app/endpoints/ai/ai.service.spec.ts`, `apps/api/src/app/endpoints/ai/ai-feedback.service.spec.ts`, `apps/api/src/app/endpoints/ai/evals/mvp-eval.runner.spec.ts` | Local implementation | | T-008 | Deployment and submission bundle | Complete | `npm run test:ai` + Railway healthcheck + submission docs checklist | `2b6506de8` | | T-009 | Open source eval framework contribution | In Review | `@ghostfolio/finance-agent-evals` package scaffold + dataset export + smoke/pack checks | openai/evals PR #1625 + langchain PR #35421 | ## Notes - Canonical project requirements live in `docs/requirements.md`. - Architecture decisions live in `docs/adr/`. - Detailed task board mirror lives in `docs/tasks/tasks.md`. ## MVP Start (Finance Agent on Ghostfolio) - [x] Inspect existing AI endpoint and integration points (`ai.controller.ts`, `ai.service.ts`, portfolio and data-provider services) - [x] Add `POST /api/v1/ai/chat` endpoint with request validation - [x] Implement 3 MVP tools in AI service - [x] Add Redis-backed session memory for conversation continuity - [x] Add verification checks and structured output formatter (citations, confidence, verification details) - [x] Add targeted API unit tests for tool selection and response contract - [x] Run lint and API tests - [x] Share required and optional `.env` keys for local MVP run ## Session Plan (2026-02-23) - [x] Refresh `docs/PRESEARCH.md` with source-backed framework and eval notes - [x] Add root `Tasks.md` mirror for submission checklist compliance - [x] Add AI chat service tests (tool execution, memory, verification, confidence) - [x] Add MVP runbook snippet for local execution and API invocation - [x] Execute focused verification (lint/test on touched surface) - [x] Update ticket status and evidence links ## Session Plan (2026-02-23, UI + Deploy + Test Data) - [x] Add client chat integration method for `POST /api/v1/ai/chat` - [x] Build MVP chat interface in portfolio analysis page - [x] Add focused frontend tests for chat request and response rendering - [x] Verify AI test suite + eval suite after UI changes - [x] Validate Railway API key and project visibility - [x] Install/use Railway CLI and initialize or link project configuration - [x] Add local data setup path and seed script command for MVP testing - [x] Run final verification commands and capture evidence ## Session Plan (2026-02-23, Visibility + Test URL) - [x] Diagnose why AI panel and activities are not visible in local UI - [x] Remove analysis page visibility gate that hides AI panel for non-experimental users - [x] Seed MVP activities for all local users to make testing deterministic - [x] Update `docs/CODE-REVIEW.md` with exact local test URLs and validation steps - [x] Run focused tests for touched files and record verification evidence - [x] Capture lesson for UI discoverability and test-path communication ## Session Plan (2026-02-23, Publish via CLI) - [x] Validate Railway CLI availability and auth path - [x] Switch to supported Railway CLI version (`@railway/cli`) - [x] Link current repository to configured Railway project/service - [x] Trigger production deployment from local repository - [x] Return deployed URL and post-deploy health check evidence ## Session Plan (2026-02-23, Seed Expansion) - [x] Expand AI MVP seed dataset with more symbols and transactions - [x] Add a second account per user for diversification scenarios - [x] Run seeding command and verify row counts and sample orders - [x] Share exact seeded coverage for local and deploy testing ## Session Plan (2026-02-23, Submission Bundle Completion) - [x] Switch Railway service from source-build deploys to GHCR image deploys - [x] Trigger redeploy and verify production health endpoint on image-based deploy - [x] Create 1-page AI development log document for submission - [x] Create AI cost analysis document with 100/1K/10K/100K projections - [x] Push submission documents and deployment updates to `origin/main` ## Session Plan (2026-02-23, Railway Crash Recovery) - [x] Reproduce Railway start-command failure locally - [x] Correct Railway start command to built API entrypoint - [x] Verify fixed command resolves module-not-found crash - [x] Update task tracker evidence for deploy follow-up ## Session Plan (2026-02-23, AI Chat Intent Recovery) - [x] Diagnose why allocation/invest prompts return memory-only fallback - [x] Expand tool-intent routing for invest/allocate/rebalance prompts - [x] Improve deterministic fallback answer content for allocation guidance - [x] Normalize risk concentration math for leveraged/liability portfolios - [x] Run focused AI test suite and eval regression checks ## Session Plan (2026-02-24, LangSmith Relevance Gate) - [x] Add deterministic investment-relevance expectations to MVP eval dataset - [x] Add direct eval case for the prompt "Where should I invest?" - [x] Add runnable LangSmith eval script for full suite + investment subset summary - [x] Run LangSmith eval command and capture pass/fail evidence ## Session Plan (2026-02-23, Railway Latency + Redis Auth Fix) - [x] Reproduce production slowness and capture health endpoint latency - [x] Identify Redis AUTH error spam source from cache URL construction - [x] Fix Redis cache URL to avoid credentials when password is empty - [x] Correct `railway.toml` start command for Docker runtime (`node main.js`) - [x] Redeploy and verify logs + latency improvements in production ## Session Plan (2026-02-23, Core Features Expansion) - [x] Run focused AI verification gate before feature work (`npm run test:ai`, `nx run api:lint`) - [x] Expand agent toolset from 3 to 5 meaningful finance tools - [x] Add deterministic tests for new tool planning and orchestration - [x] Extend MVP eval dataset with coverage for new tools - [x] Run focused AI regression suite and push to `origin/main` ## Session Plan (2026-02-23, Full Requirements Closure - Local) - [x] Expand eval dataset to 50+ cases with required category coverage (happy/edge/adversarial/multi-step) - [x] Add LangSmith observability integration for AI chat traces and key metrics - [x] Add/adjust tests to validate observability payload and expanded eval pass gate - [x] Update submission docs to reflect 5-tool architecture and 50+ eval status - [x] Run local verification (`npm run test:ai`, `npm run test:mvp-eval`, `nx run api:lint`) without pushing ## Session Plan (2026-02-24, Requirement Closure Execution) - [x] Expand eval dataset to at least 50 deterministic test cases with explicit category tags and category-level assertions. - [x] Wire `AiObservabilityService` into `AiService.chat` and capture total latency, tool latency, LLM latency, error traces, and token estimates. - [x] Integrate optional LangSmith eval run upload path in eval runner with environment-based gating. - [x] Update AI endpoint tests for observability payload and updated eval thresholds. - [x] Update `.env.example`, `docs/LOCAL-TESTING.md`, `Tasks.md`, and `docs/tasks/tasks.md` to reflect LangSmith setup and new eval baseline. - [x] Run focused verification and record outcomes. ## Session Plan (2026-02-24, Quality Lift to 9+) - [x] Fix AI service typing regression and ensure extended AI quality/performance suites compile and pass. - [x] Make observability non-blocking on the request path and harden env defaults to prevent accidental tracing overhead. - [x] Improve chat panel quality for theming consistency, i18n coverage, and accessibility semantics. - [x] Expand AI verification gate scripts to include quality/performance/feedback suites. - [x] Re-run verification (`test:ai`, `test:mvp-eval`, `api:lint`, targeted client tests) and record outcomes. - [x] Add deterministic performance regression test gate for single-tool and multi-step latency targets. ## Session Plan (2026-02-24, Live Latency + Reply Quality Hardening) - [x] Add environment-gated live latency benchmark test that exercises real LLM network calls and records p95 for single-tool and multi-step prompts. - [x] Add deterministic reply-quality eval checks (clarity/actionability/anti-disclaimer guardrails) on representative prompts. - [x] Add npm script(s) for the new benchmark/eval paths and document how to run locally. - [x] Run focused verification (`test:ai`, `test:mvp-eval`, new quality and live latency commands) and capture evidence. - [x] Update critical requirements and presearch docs with latest evidence and any remaining gaps. ## Session Plan (2026-02-24, Remaining Gap Closure) - [x] Add explicit eval metrics for hallucination rate and verification accuracy. - [x] Add open-source eval package scaffold with dataset artifact and framework-agnostic runner. - [x] Add condensed architecture summary document derived from `docs/MVP-VERIFICATION.md`. - [x] Re-run focused verification and capture updated evidence. ## Session Plan (2026-02-24, Tool Gating + Routing Hardening) - [x] Replace planner unknown-intent fallback with no-tool route (`[]`) to prevent deterministic over-tooling. - [x] Add deterministic policy gate at executor boundary to enforce route decisions (`direct|tools|clarify`) and tool allowlist filtering. - [x] Emit policy metrics in runtime output (`blocked_by_policy`, `block_reason`, `forced_direct`) via verification checks and observability logging. - [x] Add/adjust unit tests for planner fallback, policy enforcement, and no-tool execution path. - [x] Run focused verification (`npm run test:ai`, `npm run test:mvp-eval`) and capture evidence. ## Session Plan (2026-02-24, OSS Publish + External PRs) - [x] Confirm in-repo open-source tool package is committed and documented for direct repo consumption. - [x] Create high-signal upstream PR in `openai/evals` fork with finance-agent eval dataset integration docs/template. - [x] Create high-signal upstream PR in `langchain` fork with a focused docs contribution for finance-agent eval interoperability. - [x] Push fork branches and open PRs against upstream repositories. - [x] Update `Tasks.md` and plan artifact with PR links and current status. ## Session Plan (2026-02-24, Chat Persistence + Simple Query Handling) - [x] Review current chat panel storage behavior and AI policy direct-route behavior. - [x] Implement localStorage-backed chat session/message persistence with bounded history in `ai-chat-panel.component.ts`. - [x] Extend direct no-tool query handling for simple assistant capability/help prompts in `ai-agent.policy.utils.ts`. - [x] Add or update unit tests for chat persistence and policy simple-query routing. - [x] Run focused verification on touched frontend/backend AI suites and update task tracking artifacts. ## Session Plan (2026-02-24, Per-LLM LangSmith Invocation Tracing) - [x] Audit current AI provider call path and verify where LangSmith/LangChain tracing is missing. - [x] Add explicit per-provider LLM invocation tracing hooks before/after each `generateText` provider call. - [x] Thread query/session/user context into LLM invocation tracing payloads for easier LangSmith filtering. - [x] Update AI and observability unit tests to assert LLM invocation trace behavior and keep provider fallback behavior stable. - [x] Run focused verification for touched AI suites and update task tracking notes. ## Session Plan (2026-02-24, LangChain Wrapper + Arithmetic Direct Reply Correction) - [x] Enforce provider invocation through LangChain runnable wrapper in `AiService.generateText`. - [x] Fix no-tool arithmetic prompts to return computed deterministic replies instead of capability fallback text. - [x] Add/update unit tests for arithmetic direct replies and provider tracing/fallback behavior. - [x] Run focused verification (`test:ai` and `api:lint`) and update tracker notes. ## Verification Notes - `nx run api:lint` completed successfully (existing workspace warnings only). - Full `nx test api` currently fails in pre-existing portfolio calculator suites unrelated to AI endpoint changes. - Focused MVP verification passed: - `npm run test:ai` - `npm run test:mvp-eval` - `npm run hostinger:check` - `npx dotenv-cli -e .env.example -- npx jest apps/client/src/app/pages/portfolio/analysis/ai-chat-panel/ai-chat-panel.component.spec.ts --config apps/client/jest.config.ts` - `npm run railway:check` - `npm run railway:setup` - `npm run database:seed:ai-mvp` - `npx nx run client:build:development-en` - `npx nx run client:lint` - `npx dotenv-cli -e .env -- npx -y @railway/cli@latest up --detach` - `npx dotenv-cli -e .env -- npx -y @railway/cli@latest service status` - `curl -i https://ghostfolio-api-production.up.railway.app/api/v1/health` - Railway crash recovery verification (local): - `node main.js` fails with `MODULE_NOT_FOUND` for `/ghostfolio/main.js` (old command) - `node dist/apps/api/main.js` starts successfully - `curl -fsS http://127.0.0.1:3333/api/v1/health` returns `{"status":"OK"}` - Railway crash recovery verification (production): - `npx dotenv-cli -e .env -- npx -y @railway/cli@latest up --detach` - `npx dotenv-cli -e .env -- npx -y @railway/cli@latest service status` reached `Status: SUCCESS` on deployment `4f26063a-97e5-43dd-b2dd-360e9e12a951` - `curl -i https://ghostfolio-api-production.up.railway.app/api/v1/health` returned `HTTP/2 200` with `{"status":"OK"}` - AI chat intent recovery verification: - `npx dotenv-cli -e .env.example -- npx jest apps/api/src/app/endpoints/ai/ai-agent.utils.spec.ts apps/api/src/app/endpoints/ai/ai.service.spec.ts --config apps/api/jest.config.ts` - `npm run test:ai` (passed) - LangSmith relevance gate verification: - `npm run test:mvp-eval` (passes with the new investment relevance checks) - `npm run test:ai` (6/6 suites, 34/34 tests) - `npm run test:ai:langsmith` -> `Overall suite: 53/53 passed (100.0%)`, `Investment relevance subset: 25/25 passed (100.0%)` - Full requirements closure verification (local, 2026-02-24): - `npm run test:mvp-eval` (passes with 50+ eval cases and category minimums) - `npm run test:ai` (7 suites passed, includes reply quality and timeout fallback assertions) - `npm run test:ai:performance` (service-level p95 regression gate for `<5s` / `<15s` targets) - `npm run test:ai:quality` (reply-quality eval slice passed) - `npm run test:ai:live-latency` (env-backed live benchmark passed with strict targets enabled) - `npm run test:ai:live-latency:strict` (single-tool p95 `3514ms`, multi-step p95 `3505ms`, both within thresholds) - `npx nx run api:lint` (passed with existing non-blocking workspace warnings) - Remaining-gap closure verification (local, 2026-02-24): - `npm run test:ai` (9/9 suites, 40/40 tests) - `npm run test:mvp-eval` (includes hallucination-rate and verification-accuracy assertions) - `npm run test:ai:quality` (3/3 tests) - `npm run test:ai:performance` (p95 under service-level targets) - `npm run test:ai:live-latency:strict` (real model/network strict targets pass) - `(cd tools/evals/finance-agent-evals && npm run check)` (package scaffold smoke test pass) - `(cd tools/evals/finance-agent-evals && npm run pack:dry-run)` (packaging dry run pass) - Railway latency + Redis auth fix verification (production): - `railway up --service ghostfolio-api --detach` produced successful deployment `d7f73e4a-0a11-4c06-b066-3cbe58368094` - `railway logs -s ghostfolio-api -d d7f73e4a-0a11-4c06-b066-3cbe58368094 -n 800 | rg "ERR AUTH|Redis health check failed"` returned no matches - `curl` probes improved from ~1.8-2.2s TTFB to ~0.16-0.47s on `/api/v1/health` - `/en/accounts` now serves in ~0.27-0.42s TTFB in repeated probes - Quality lift verification (local, 2026-02-24): - `npm run test:ai` (9 suites passed, includes new `ai-observability.service.spec.ts` and deterministic performance gate) - `npx dotenv-cli -e .env.example -- npx jest apps/client/src/app/pages/portfolio/analysis/ai-chat-panel/ai-chat-panel.component.spec.ts --config apps/client/jest.config.ts` (4/4 tests passed) - `npx nx run api:lint` (passes with existing workspace warnings) - `npx nx run client:lint` (passes with existing workspace warnings) - Tool gating + routing hardening verification (local, 2026-02-24): - `npx jest apps/api/src/app/endpoints/ai/ai-agent.utils.spec.ts apps/api/src/app/endpoints/ai/ai.service.spec.ts --config apps/api/jest.config.ts` (passes after policy-gating assertion updates) - `npm run test:ai` (9/9 suites, 44/44 tests) - `npm run test:mvp-eval` (pass rate threshold test still passes) - `npx nx run api:lint` (passes with existing workspace warnings) - Chat persistence + simple query handling verification (local, 2026-02-24): - `npx jest apps/client/src/app/pages/portfolio/analysis/ai-chat-panel/ai-chat-panel.component.spec.ts --config apps/client/jest.config.ts` (7/7 tests passed) - `npx jest apps/api/src/app/endpoints/ai/ai-agent.utils.spec.ts apps/api/src/app/endpoints/ai/ai.service.spec.ts --config apps/api/jest.config.ts` (31/31 tests passed) - `npx nx run api:lint` (passes with existing workspace warnings) - `npx nx run client:lint` (passes with existing workspace warnings) - Per-LLM LangSmith invocation tracing verification (local + deploy config, 2026-02-24): - `npx jest apps/api/src/app/endpoints/ai/ai-observability.service.spec.ts apps/api/src/app/endpoints/ai/ai.service.spec.ts apps/api/src/app/endpoints/ai/evals/mvp-eval.runner.spec.ts apps/api/src/app/endpoints/ai/ai-performance.spec.ts apps/api/src/app/endpoints/ai/evals/ai-quality-eval.spec.ts apps/api/src/app/endpoints/ai/evals/ai-live-latency.spec.ts --config apps/api/jest.config.ts` (5/5 suites passed, live-latency suite skipped by env gate) - `npx nx run api:lint` (passes with existing workspace warnings) - `railway variable set -s ghostfolio-api --skip-deploys LANGCHAIN_API_KEY=... LANGSMITH_API_KEY=... LANGCHAIN_TRACING_V2=true LANGSMITH_TRACING=true LANGSMITH_PROJECT=ghostfolio-ai-agent` - `railway variable list -s ghostfolio-api --kv` confirms: `LANGCHAIN_API_KEY`, `LANGSMITH_API_KEY`, `LANGCHAIN_TRACING_V2`, `LANGSMITH_TRACING`, `LANGSMITH_PROJECT` - LangChain wrapper + arithmetic direct reply verification (local, 2026-02-24): - `npx jest apps/api/src/app/endpoints/ai/ai-agent.utils.spec.ts apps/api/src/app/endpoints/ai/ai.service.spec.ts apps/api/src/app/endpoints/ai/ai-observability.service.spec.ts --config apps/api/jest.config.ts` (36/36 tests passed) - `npm run test:ai` (9/9 suites passed, 49/49 tests) - `npx nx run api:lint --verbose` (passes with existing workspace warnings)