You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 
 
 
 

18 KiB

Todo

Updated: 2026-02-24

  • Verify current repository state and missing required files
  • Create docs/adr/ for architecture decisions
  • Save Tasks.md at repository root
  • Populate docs/tasks/tasks.md
  • Create tasks/improvements.md
  • Create tasks/lessons.md
  • Confirm files exist on disk
  • Kick off MVP slice after Presearch refresh (this session)

Tasks

Last updated: 2026-02-24

Active Tickets

ID Feature Status Tests PR / Commit
T-001 Presearch package and architecture direction Complete Doc review checklist Local docs update
T-002 ADR foundation in docs/adr/ Complete ADR template and first ADR review Local docs update
T-003 Agent MVP tool 1: portfolio_analysis Complete apps/api/src/app/endpoints/ai/ai.service.spec.ts Planned
T-004 Agent memory and response formatter Complete apps/api/src/app/endpoints/ai/ai.service.spec.ts Planned
T-005 Eval dataset baseline (MVP 5-10) Complete apps/api/src/app/endpoints/ai/evals/mvp-eval.runner.spec.ts Planned
T-006 Full eval dataset (50+) Complete apps/api/src/app/endpoints/ai/evals/mvp-eval.runner.spec.ts Local implementation
T-007 Observability wiring (LangSmith traces and metrics) Complete apps/api/src/app/endpoints/ai/ai.service.spec.ts, apps/api/src/app/endpoints/ai/ai-feedback.service.spec.ts, apps/api/src/app/endpoints/ai/evals/mvp-eval.runner.spec.ts Local implementation
T-008 Deployment and submission bundle Complete npm run test:ai + Railway healthcheck + submission docs checklist 2b6506de8
T-009 Open source eval framework contribution In Review @ghostfolio/finance-agent-evals package scaffold + dataset export + smoke/pack checks openai/evals PR #1625 + langchain PR #35421

Notes

  • Canonical project requirements live in docs/requirements.md.
  • Architecture decisions live in docs/adr/.
  • Detailed task board mirror lives in docs/tasks/tasks.md.

MVP Start (Finance Agent on Ghostfolio)

  • Inspect existing AI endpoint and integration points (ai.controller.ts, ai.service.ts, portfolio and data-provider services)
  • Add POST /api/v1/ai/chat endpoint with request validation
  • Implement 3 MVP tools in AI service
  • Add Redis-backed session memory for conversation continuity
  • Add verification checks and structured output formatter (citations, confidence, verification details)
  • Add targeted API unit tests for tool selection and response contract
  • Run lint and API tests
  • Share required and optional .env keys for local MVP run

Session Plan (2026-02-23)

  • Refresh docs/PRESEARCH.md with source-backed framework and eval notes
  • Add root Tasks.md mirror for submission checklist compliance
  • Add AI chat service tests (tool execution, memory, verification, confidence)
  • Add MVP runbook snippet for local execution and API invocation
  • Execute focused verification (lint/test on touched surface)
  • Update ticket status and evidence links

Session Plan (2026-02-23, UI + Deploy + Test Data)

  • Add client chat integration method for POST /api/v1/ai/chat
  • Build MVP chat interface in portfolio analysis page
  • Add focused frontend tests for chat request and response rendering
  • Verify AI test suite + eval suite after UI changes
  • Validate Railway API key and project visibility
  • Install/use Railway CLI and initialize or link project configuration
  • Add local data setup path and seed script command for MVP testing
  • Run final verification commands and capture evidence

Session Plan (2026-02-23, Visibility + Test URL)

  • Diagnose why AI panel and activities are not visible in local UI
  • Remove analysis page visibility gate that hides AI panel for non-experimental users
  • Seed MVP activities for all local users to make testing deterministic
  • Update docs/CODE-REVIEW.md with exact local test URLs and validation steps
  • Run focused tests for touched files and record verification evidence
  • Capture lesson for UI discoverability and test-path communication

Session Plan (2026-02-23, Publish via CLI)

  • Validate Railway CLI availability and auth path
  • Switch to supported Railway CLI version (@railway/cli)
  • Link current repository to configured Railway project/service
  • Trigger production deployment from local repository
  • Return deployed URL and post-deploy health check evidence

Session Plan (2026-02-23, Seed Expansion)

  • Expand AI MVP seed dataset with more symbols and transactions
  • Add a second account per user for diversification scenarios
  • Run seeding command and verify row counts and sample orders
  • Share exact seeded coverage for local and deploy testing

Session Plan (2026-02-23, Submission Bundle Completion)

  • Switch Railway service from source-build deploys to GHCR image deploys
  • Trigger redeploy and verify production health endpoint on image-based deploy
  • Create 1-page AI development log document for submission
  • Create AI cost analysis document with 100/1K/10K/100K projections
  • Push submission documents and deployment updates to origin/main

Session Plan (2026-02-23, Railway Crash Recovery)

  • Reproduce Railway start-command failure locally
  • Correct Railway start command to built API entrypoint
  • Verify fixed command resolves module-not-found crash
  • Update task tracker evidence for deploy follow-up

Session Plan (2026-02-23, AI Chat Intent Recovery)

  • Diagnose why allocation/invest prompts return memory-only fallback
  • Expand tool-intent routing for invest/allocate/rebalance prompts
  • Improve deterministic fallback answer content for allocation guidance
  • Normalize risk concentration math for leveraged/liability portfolios
  • Run focused AI test suite and eval regression checks

Session Plan (2026-02-24, LangSmith Relevance Gate)

  • Add deterministic investment-relevance expectations to MVP eval dataset
  • Add direct eval case for the prompt "Where should I invest?"
  • Add runnable LangSmith eval script for full suite + investment subset summary
  • Run LangSmith eval command and capture pass/fail evidence

Session Plan (2026-02-23, Railway Latency + Redis Auth Fix)

  • Reproduce production slowness and capture health endpoint latency
  • Identify Redis AUTH error spam source from cache URL construction
  • Fix Redis cache URL to avoid credentials when password is empty
  • Correct railway.toml start command for Docker runtime (node main.js)
  • Redeploy and verify logs + latency improvements in production

Session Plan (2026-02-23, Core Features Expansion)

  • Run focused AI verification gate before feature work (npm run test:ai, nx run api:lint)
  • Expand agent toolset from 3 to 5 meaningful finance tools
  • Add deterministic tests for new tool planning and orchestration
  • Extend MVP eval dataset with coverage for new tools
  • Run focused AI regression suite and push to origin/main

Session Plan (2026-02-23, Full Requirements Closure - Local)

  • Expand eval dataset to 50+ cases with required category coverage (happy/edge/adversarial/multi-step)
  • Add LangSmith observability integration for AI chat traces and key metrics
  • Add/adjust tests to validate observability payload and expanded eval pass gate
  • Update submission docs to reflect 5-tool architecture and 50+ eval status
  • Run local verification (npm run test:ai, npm run test:mvp-eval, nx run api:lint) without pushing

Session Plan (2026-02-24, Requirement Closure Execution)

  • Expand eval dataset to at least 50 deterministic test cases with explicit category tags and category-level assertions.
  • Wire AiObservabilityService into AiService.chat and capture total latency, tool latency, LLM latency, error traces, and token estimates.
  • Integrate optional LangSmith eval run upload path in eval runner with environment-based gating.
  • Update AI endpoint tests for observability payload and updated eval thresholds.
  • Update .env.example, docs/LOCAL-TESTING.md, Tasks.md, and docs/tasks/tasks.md to reflect LangSmith setup and new eval baseline.
  • Run focused verification and record outcomes.

Session Plan (2026-02-24, Quality Lift to 9+)

  • Fix AI service typing regression and ensure extended AI quality/performance suites compile and pass.
  • Make observability non-blocking on the request path and harden env defaults to prevent accidental tracing overhead.
  • Improve chat panel quality for theming consistency, i18n coverage, and accessibility semantics.
  • Expand AI verification gate scripts to include quality/performance/feedback suites.
  • Re-run verification (test:ai, test:mvp-eval, api:lint, targeted client tests) and record outcomes.
  • Add deterministic performance regression test gate for single-tool and multi-step latency targets.

Session Plan (2026-02-24, Live Latency + Reply Quality Hardening)

  • Add environment-gated live latency benchmark test that exercises real LLM network calls and records p95 for single-tool and multi-step prompts.
  • Add deterministic reply-quality eval checks (clarity/actionability/anti-disclaimer guardrails) on representative prompts.
  • Add npm script(s) for the new benchmark/eval paths and document how to run locally.
  • Run focused verification (test:ai, test:mvp-eval, new quality and live latency commands) and capture evidence.
  • Update critical requirements and presearch docs with latest evidence and any remaining gaps.

Session Plan (2026-02-24, Remaining Gap Closure)

  • Add explicit eval metrics for hallucination rate and verification accuracy.
  • Add open-source eval package scaffold with dataset artifact and framework-agnostic runner.
  • Add condensed architecture summary document derived from docs/MVP-VERIFICATION.md.
  • Re-run focused verification and capture updated evidence.

Session Plan (2026-02-24, Tool Gating + Routing Hardening)

  • Replace planner unknown-intent fallback with no-tool route ([]) to prevent deterministic over-tooling.
  • Add deterministic policy gate at executor boundary to enforce route decisions (direct|tools|clarify) and tool allowlist filtering.
  • Emit policy metrics in runtime output (blocked_by_policy, block_reason, forced_direct) via verification checks and observability logging.
  • Add/adjust unit tests for planner fallback, policy enforcement, and no-tool execution path.
  • Run focused verification (npm run test:ai, npm run test:mvp-eval) and capture evidence.

Session Plan (2026-02-24, OSS Publish + External PRs)

  • Confirm in-repo open-source tool package is committed and documented for direct repo consumption.
  • Create high-signal upstream PR in openai/evals fork with finance-agent eval dataset integration docs/template.
  • Create high-signal upstream PR in langchain fork with a focused docs contribution for finance-agent eval interoperability.
  • Push fork branches and open PRs against upstream repositories.
  • Update Tasks.md and plan artifact with PR links and current status.

Session Plan (2026-02-24, Chat Persistence + Simple Query Handling)

  • Review current chat panel storage behavior and AI policy direct-route behavior.
  • Implement localStorage-backed chat session/message persistence with bounded history in ai-chat-panel.component.ts.
  • Extend direct no-tool query handling for simple assistant capability/help prompts in ai-agent.policy.utils.ts.
  • Add or update unit tests for chat persistence and policy simple-query routing.
  • Run focused verification on touched frontend/backend AI suites and update task tracking artifacts.

Session Plan (2026-02-24, Per-LLM LangSmith Invocation Tracing)

  • Audit current AI provider call path and verify where LangSmith/LangChain tracing is missing.
  • Add explicit per-provider LLM invocation tracing hooks before/after each generateText provider call.
  • Thread query/session/user context into LLM invocation tracing payloads for easier LangSmith filtering.
  • Update AI and observability unit tests to assert LLM invocation trace behavior and keep provider fallback behavior stable.
  • Run focused verification for touched AI suites and update task tracking notes.

Verification Notes

  • nx run api:lint completed successfully (existing workspace warnings only).
  • Full nx test api currently fails in pre-existing portfolio calculator suites unrelated to AI endpoint changes.
  • Focused MVP verification passed:
    • npm run test:ai
    • npm run test:mvp-eval
    • npm run hostinger:check
    • npx dotenv-cli -e .env.example -- npx jest apps/client/src/app/pages/portfolio/analysis/ai-chat-panel/ai-chat-panel.component.spec.ts --config apps/client/jest.config.ts
    • npm run railway:check
    • npm run railway:setup
    • npm run database:seed:ai-mvp
    • npx nx run client:build:development-en
    • npx nx run client:lint
    • npx dotenv-cli -e .env -- npx -y @railway/cli@latest up --detach
    • npx dotenv-cli -e .env -- npx -y @railway/cli@latest service status
    • curl -i https://ghostfolio-api-production.up.railway.app/api/v1/health
  • Railway crash recovery verification (local):
    • node main.js fails with MODULE_NOT_FOUND for /ghostfolio/main.js (old command)
    • node dist/apps/api/main.js starts successfully
    • curl -fsS http://127.0.0.1:3333/api/v1/health returns {"status":"OK"}
  • Railway crash recovery verification (production):
    • npx dotenv-cli -e .env -- npx -y @railway/cli@latest up --detach
    • npx dotenv-cli -e .env -- npx -y @railway/cli@latest service status reached Status: SUCCESS on deployment 4f26063a-97e5-43dd-b2dd-360e9e12a951
    • curl -i https://ghostfolio-api-production.up.railway.app/api/v1/health returned HTTP/2 200 with {"status":"OK"}
  • AI chat intent recovery verification:
    • npx dotenv-cli -e .env.example -- npx jest apps/api/src/app/endpoints/ai/ai-agent.utils.spec.ts apps/api/src/app/endpoints/ai/ai.service.spec.ts --config apps/api/jest.config.ts
    • npm run test:ai (passed)
  • LangSmith relevance gate verification:
    • npm run test:mvp-eval (passes with the new investment relevance checks)
    • npm run test:ai (6/6 suites, 34/34 tests)
    • npm run test:ai:langsmith -> Overall suite: 53/53 passed (100.0%), Investment relevance subset: 25/25 passed (100.0%)
  • Full requirements closure verification (local, 2026-02-24):
    • npm run test:mvp-eval (passes with 50+ eval cases and category minimums)
    • npm run test:ai (7 suites passed, includes reply quality and timeout fallback assertions)
    • npm run test:ai:performance (service-level p95 regression gate for <5s / <15s targets)
    • npm run test:ai:quality (reply-quality eval slice passed)
    • npm run test:ai:live-latency (env-backed live benchmark passed with strict targets enabled)
    • npm run test:ai:live-latency:strict (single-tool p95 3514ms, multi-step p95 3505ms, both within thresholds)
    • npx nx run api:lint (passed with existing non-blocking workspace warnings)
  • Remaining-gap closure verification (local, 2026-02-24):
    • npm run test:ai (9/9 suites, 40/40 tests)
    • npm run test:mvp-eval (includes hallucination-rate and verification-accuracy assertions)
    • npm run test:ai:quality (3/3 tests)
    • npm run test:ai:performance (p95 under service-level targets)
    • npm run test:ai:live-latency:strict (real model/network strict targets pass)
    • (cd tools/evals/finance-agent-evals && npm run check) (package scaffold smoke test pass)
    • (cd tools/evals/finance-agent-evals && npm run pack:dry-run) (packaging dry run pass)
  • Railway latency + Redis auth fix verification (production):
    • railway up --service ghostfolio-api --detach produced successful deployment d7f73e4a-0a11-4c06-b066-3cbe58368094
    • railway logs -s ghostfolio-api -d d7f73e4a-0a11-4c06-b066-3cbe58368094 -n 800 | rg "ERR AUTH|Redis health check failed" returned no matches
    • curl probes improved from ~1.8-2.2s TTFB to ~0.16-0.47s on /api/v1/health
    • /en/accounts now serves in ~0.27-0.42s TTFB in repeated probes
  • Quality lift verification (local, 2026-02-24):
    • npm run test:ai (9 suites passed, includes new ai-observability.service.spec.ts and deterministic performance gate)
    • npx dotenv-cli -e .env.example -- npx jest apps/client/src/app/pages/portfolio/analysis/ai-chat-panel/ai-chat-panel.component.spec.ts --config apps/client/jest.config.ts (4/4 tests passed)
    • npx nx run api:lint (passes with existing workspace warnings)
    • npx nx run client:lint (passes with existing workspace warnings)
  • Tool gating + routing hardening verification (local, 2026-02-24):
    • npx jest apps/api/src/app/endpoints/ai/ai-agent.utils.spec.ts apps/api/src/app/endpoints/ai/ai.service.spec.ts --config apps/api/jest.config.ts (passes after policy-gating assertion updates)
    • npm run test:ai (9/9 suites, 44/44 tests)
    • npm run test:mvp-eval (pass rate threshold test still passes)
    • npx nx run api:lint (passes with existing workspace warnings)
  • Chat persistence + simple query handling verification (local, 2026-02-24):
    • npx jest apps/client/src/app/pages/portfolio/analysis/ai-chat-panel/ai-chat-panel.component.spec.ts --config apps/client/jest.config.ts (7/7 tests passed)
    • npx jest apps/api/src/app/endpoints/ai/ai-agent.utils.spec.ts apps/api/src/app/endpoints/ai/ai.service.spec.ts --config apps/api/jest.config.ts (31/31 tests passed)
    • npx nx run api:lint (passes with existing workspace warnings)
    • npx nx run client:lint (passes with existing workspace warnings)
  • Per-LLM LangSmith invocation tracing verification (local + deploy config, 2026-02-24):
    • npx jest apps/api/src/app/endpoints/ai/ai-observability.service.spec.ts apps/api/src/app/endpoints/ai/ai.service.spec.ts apps/api/src/app/endpoints/ai/evals/mvp-eval.runner.spec.ts apps/api/src/app/endpoints/ai/ai-performance.spec.ts apps/api/src/app/endpoints/ai/evals/ai-quality-eval.spec.ts apps/api/src/app/endpoints/ai/evals/ai-live-latency.spec.ts --config apps/api/jest.config.ts (5/5 suites passed, live-latency suite skipped by env gate)
    • npx nx run api:lint (passes with existing workspace warnings)
    • railway variable set -s ghostfolio-api --skip-deploys LANGCHAIN_API_KEY=... LANGSMITH_API_KEY=... LANGCHAIN_TRACING_V2=true LANGSMITH_TRACING=true LANGSMITH_PROJECT=ghostfolio-ai-agent
    • railway variable list -s ghostfolio-api --kv confirms: LANGCHAIN_API_KEY, LANGSMITH_API_KEY, LANGCHAIN_TRACING_V2, LANGSMITH_TRACING, LANGSMITH_PROJECT