Todo

Updated: 2026-02-24

Verify current repository state and identify missing required files
Create docs/adr/ for architecture decisions
Save Tasks.md at repository root
Populate docs/tasks/tasks.md
Create tasks/improvements.md
Create tasks/lessons.md
Confirm files exist on disk
Kick off MVP slice after Presearch refresh (this session)

Tasks

Last updated: 2026-02-24

Active Tickets

ID	Feature	Status	Tests	PR / Commit
T-001	Presearch package and architecture direction	Complete	Doc review checklist	Local docs update
T-002	ADR foundation in `docs/adr/`	Complete	ADR template and first ADR review	Local docs update
T-003	Agent MVP tool 1: `portfolio_analysis`	Complete	`apps/api/src/app/endpoints/ai/ai.service.spec.ts`	Planned
T-004	Agent memory and response formatter	Complete	`apps/api/src/app/endpoints/ai/ai.service.spec.ts`	Planned
T-005	Eval dataset baseline (MVP 5-10)	Complete	`apps/api/src/app/endpoints/ai/evals/mvp-eval.runner.spec.ts`	Planned
T-006	Full eval dataset (50+)	Complete	`apps/api/src/app/endpoints/ai/evals/mvp-eval.runner.spec.ts`	Local implementation
T-007	Observability wiring (LangSmith traces and metrics)	Complete	`apps/api/src/app/endpoints/ai/ai.service.spec.ts`, `apps/api/src/app/endpoints/ai/ai-feedback.service.spec.ts`, `apps/api/src/app/endpoints/ai/evals/mvp-eval.runner.spec.ts`	Local implementation
T-008	Deployment and submission bundle	Complete	`npm run test:ai` + Railway healthcheck + submission docs checklist	`2b6506de8`
T-009	Open source eval framework contribution	In Review	`@ghostfolio/finance-agent-evals` package scaffold + dataset export + smoke/pack checks	openai/evals PR #1625 + langchain PR #35421

Notes

Canonical project requirements live in docs/requirements.md.
Architecture decisions live in docs/adr/.
Detailed task board mirror lives in docs/tasks/tasks.md.

MVP Start (Finance Agent on Ghostfolio)

Inspect existing AI endpoint and integration points (ai.controller.ts, ai.service.ts, portfolio and data-provider services)
Add POST /api/v1/ai/chat endpoint with request validation
Implement 3 MVP tools in AI service
Add Redis-backed session memory for conversation continuity
Add verification checks and structured output formatter (citations, confidence, verification details)
Add targeted API unit tests for tool selection and response contract
Run lint and API tests
Share required and optional .env keys for local MVP run

Session Plan (2026-02-23)

Refresh docs/PRESEARCH.md with source-backed framework and eval notes
Add root Tasks.md mirror for submission checklist compliance
Add AI chat service tests (tool execution, memory, verification, confidence)
Add MVP runbook snippet for local execution and API invocation
Execute focused verification (lint/test on touched surface)
Update ticket status and evidence links

Session Plan (2026-02-23, UI + Deploy + Test Data)

Add client chat integration method for POST /api/v1/ai/chat
Build MVP chat interface in portfolio analysis page
Add focused frontend tests for chat request and response rendering
Verify AI test suite + eval suite after UI changes
Validate Railway API key and project visibility
Install/use Railway CLI and initialize or link project configuration
Add local data setup path and seed script command for MVP testing
Run final verification commands and capture evidence

Session Plan (2026-02-23, Visibility + Test URL)

Diagnose why AI panel and activities are not visible in local UI
Remove analysis page visibility gate that hides AI panel for non-experimental users
Seed MVP activities for all local users to make testing deterministic
Update docs/CODE-REVIEW.md with exact local test URLs and validation steps
Run focused tests for touched files and record verification evidence
Capture lesson for UI discoverability and test-path communication

Session Plan (2026-02-23, Publish via CLI)

Validate Railway CLI availability and auth path
Switch to supported Railway CLI version (@railway/cli)
Link current repository to configured Railway project/service
Trigger production deployment from local repository
Return deployed URL and post-deploy health check evidence

Session Plan (2026-02-23, Seed Expansion)

Expand AI MVP seed dataset with more symbols and transactions
Add a second account per user for diversification scenarios
Run seeding command and verify row counts and sample orders
Share exact seeded coverage for local and deploy testing

Session Plan (2026-02-23, Submission Bundle Completion)

Switch Railway service from source-build deploys to GHCR image deploys
Trigger redeploy and verify production health endpoint on image-based deploy
Create 1-page AI development log document for submission
Create AI cost analysis document with 100/1K/10K/100K projections
Push submission documents and deployment updates to origin/main
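
The 100/1K/10K/100K projection above comes down to per-query token arithmetic. A minimal sketch follows. Every input (queries per user, token counts, prices) is a placeholder assumption for illustration, not a measured value or a real provider price, and `projectMonthlyCost` is a hypothetical helper name.

```typescript
// All fields are placeholder assumptions, not measured values or real prices.
interface CostAssumptions {
  queriesPerUserPerMonth: number;
  inputTokensPerQuery: number;
  outputTokensPerQuery: number;
  inputPricePerMillionTokens: number;
  outputPricePerMillionTokens: number;
}

// User tiers named in the submission requirement.
const USER_TIERS = [100, 1000, 10000, 100000];

// Monthly LLM cost for a given user count under the stated assumptions.
function projectMonthlyCost(users: number, a: CostAssumptions): number {
  const costPerQuery =
    (a.inputTokensPerQuery * a.inputPricePerMillionTokens +
      a.outputTokensPerQuery * a.outputPricePerMillionTokens) /
    1000000;

  return users * a.queriesPerUserPerMonth * costPerQuery;
}

// One row per tier, ready to paste into a cost table.
function projectAllTiers(
  a: CostAssumptions
): { users: number; monthlyCost: number }[] {
  return USER_TIERS.map((users) => ({
    users,
    monthlyCost: projectMonthlyCost(users, a)
  }));
}
```

Because cost scales linearly in this model, each tier is 10x the previous one. Caching or tool-only answers would break that linearity and need a separate term.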

Session Plan (2026-02-23, Railway Crash Recovery)

Reproduce Railway start-command failure locally
Correct Railway start command to built API entrypoint
Verify fixed command resolves module-not-found crash
Update task tracker evidence for deploy follow-up

Session Plan (2026-02-23, AI Chat Intent Recovery)

Diagnose why allocation/invest prompts return memory-only fallback
Expand tool-intent routing for invest/allocate/rebalance prompts
Improve deterministic fallback answer content for allocation guidance
Normalize risk concentration math for leveraged/liability portfolios
Run focused AI test suite and eval regression checks
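
The concentration normalization item above can be illustrated with a small sketch. It assumes shares are measured against gross exposure (the sum of absolute values), which is one way to keep a liability or leveraged position from pushing a single share above 100%. The `Holding` shape and `computeConcentration` name are hypothetical and may not match the service code.

```typescript
// Hypothetical holding shape; the real service types may differ.
interface Holding {
  symbol: string;
  valueInBaseCurrency: number; // negative for liabilities or short exposure
}

interface ConcentrationShare {
  symbol: string;
  share: number; // fraction of gross exposure, always within [0, 1]
}

// Assumption: normalize by gross exposure (sum of absolute values) so that
// negative balances cannot shrink the denominator and inflate shares.
function computeConcentration(holdings: Holding[]): ConcentrationShare[] {
  const grossExposure = holdings.reduce(
    (sum, holding) => sum + Math.abs(holding.valueInBaseCurrency),
    0
  );

  if (grossExposure === 0) {
    return holdings.map(({ symbol }) => ({ symbol, share: 0 }));
  }

  return holdings
    .map(({ symbol, valueInBaseCurrency }) => ({
      symbol,
      share: Math.abs(valueInBaseCurrency) / grossExposure
    }))
    .sort((a, b) => b.share - a.share); // largest concentration first
}
```

With a naive net-value denominator, a 6000 position next to a 2000 liability and a 2000 position would report 100% concentration (6000 / 6000). Against gross exposure it reports 60%.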

Session Plan (2026-02-24, LangSmith Relevance Gate)

Add deterministic investment-relevance expectations to MVP eval dataset
Add direct eval case for the prompt "Where should I invest?"
Add runnable LangSmith eval script for full suite + investment subset summary
Run LangSmith eval command and capture pass/fail evidence

Session Plan (2026-02-23, Railway Latency + Redis Auth Fix)

Reproduce production slowness and capture health endpoint latency
Identify Redis AUTH error spam source from cache URL construction
Fix Redis cache URL to avoid credentials when password is empty
Correct railway.toml start command for Docker runtime (node main.js)
Redeploy and verify logs + latency improvements in production
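
The Redis URL fix above can be sketched as follows. This is an illustration under assumptions, not the repository's actual code: `buildRedisUrl` and the option names are hypothetical. The idea is to emit the credentials segment only when a non-empty password is configured.

```typescript
// Hypothetical option names; the real configuration keys may differ.
interface RedisConnectionOptions {
  host: string;
  port: number;
  password?: string;
  db?: number;
}

function buildRedisUrl({
  host,
  port,
  password,
  db = 0
}: RedisConnectionOptions): string {
  // An empty or missing password produces no credentials segment at all.
  // Emitting "redis://:@host" instead can cause a client to attempt AUTH
  // against a server that has no password configured.
  const credentials = password ? `:${encodeURIComponent(password)}@` : '';

  return `redis://${credentials}${host}:${port}/${db}`;
}
```

Encoding the password keeps characters such as `@` or `/` from corrupting the URL structure.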

Session Plan (2026-02-23, Core Features Expansion)

Run focused AI verification gate before feature work (npm run test:ai, nx run api:lint)
Expand agent toolset from 3 to 5 meaningful finance tools
Add deterministic tests for new tool planning and orchestration
Extend MVP eval dataset with coverage for new tools
Run focused AI regression suite and push to origin/main

Session Plan (2026-02-23, Full Requirements Closure - Local)

Expand eval dataset to 50+ cases with required category coverage (happy/edge/adversarial/multi-step)
Add LangSmith observability integration for AI chat traces and key metrics
Add/adjust tests to validate observability payload and expanded eval pass gate
Update submission docs to reflect 5-tool architecture and 50+ eval status
Run local verification (npm run test:ai, npm run test:mvp-eval, nx run api:lint) without pushing

Session Plan (2026-02-24, Requirement Closure Execution)

Expand eval dataset to at least 50 deterministic test cases with explicit category tags and category-level assertions.
Wire AiObservabilityService into AiService.chat and capture total latency, tool latency, LLM latency, error traces, and token estimates.
Integrate optional LangSmith eval run upload path in eval runner with environment-based gating.
Update AI endpoint tests for observability payload and updated eval thresholds.
Update .env.example, docs/LOCAL-TESTING.md, Tasks.md, and docs/tasks/tasks.md to reflect LangSmith setup and new eval baseline.
Run focused verification and record outcomes.
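
The category-level assertions described above could look like the sketch below. The category names mirror the happy/edge/adversarial/multi-step split from the tracker. The `EvalCase` shape, the `findCoverageGaps` helper, and any minimum values passed to it are assumptions for illustration.

```typescript
type EvalCategory = 'happy_path' | 'edge_case' | 'adversarial' | 'multi_step';

// Hypothetical minimal case shape; real eval cases carry more fields.
interface EvalCase {
  id: string;
  category: EvalCategory;
}

const CATEGORIES: EvalCategory[] = [
  'happy_path',
  'edge_case',
  'adversarial',
  'multi_step'
];

function countByCategory(cases: EvalCase[]): Record<EvalCategory, number> {
  const counts: Record<EvalCategory, number> = {
    happy_path: 0,
    edge_case: 0,
    adversarial: 0,
    multi_step: 0
  };

  for (const { category } of cases) {
    counts[category] += 1;
  }

  return counts;
}

// Returns one message per unmet minimum; an empty array means full coverage.
function findCoverageGaps(
  cases: EvalCase[],
  minimumTotal: number,
  minimumPerCategory: Record<EvalCategory, number>
): string[] {
  const gaps: string[] = [];
  const counts = countByCategory(cases);

  if (cases.length < minimumTotal) {
    gaps.push(`total: ${cases.length} < ${minimumTotal}`);
  }

  for (const category of CATEGORIES) {
    if (counts[category] < minimumPerCategory[category]) {
      gaps.push(
        `${category}: ${counts[category]} < ${minimumPerCategory[category]}`
      );
    }
  }

  return gaps;
}
```

In a Jest suite, asserting that the returned array is empty gives a failure message that names the exact category that fell short.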

Session Plan (2026-02-24, Quality Lift to 9+)

Fix AI service typing regression and ensure extended AI quality/performance suites compile and pass.
Make observability non-blocking on the request path and harden env defaults to prevent accidental tracing overhead.
Improve chat panel quality for theming consistency, i18n coverage, and accessibility semantics.
Expand AI verification gate scripts to include quality/performance/feedback suites.
Re-run verification (test:ai, test:mvp-eval, api:lint, targeted client tests) and record outcomes.
Add deterministic performance regression test gate for single-tool and multi-step latency targets.
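
Making observability non-blocking on the request path usually means the trace upload is started but never awaited, and its failures are swallowed. A minimal sketch under that assumption follows. `recordTraceNonBlocking` and `TraceSink` are hypothetical names, not the actual `AiObservabilityService` API.

```typescript
type TracePayload = Record<string, unknown>;

// Hypothetical sink, for example a function that uploads a trace.
type TraceSink = (payload: TracePayload) => Promise<void>;

function recordTraceNonBlocking(
  sink: TraceSink,
  payload: TracePayload,
  onError: (error: unknown) => void = () => undefined
): void {
  // Deliberately not awaited: the caller returns immediately.
  // Starting from Promise.resolve() defers the sink to a microtask and
  // routes both synchronous throws and rejections into the catch handler.
  void Promise.resolve()
    .then(() => sink(payload))
    .catch(onError);
}
```

The request handler calls this after computing its response, so a slow or failing tracing backend adds no latency and cannot fail the chat request.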

Session Plan (2026-02-24, Live Latency + Reply Quality Hardening)

Add environment-gated live latency benchmark test that exercises real LLM network calls and records p95 for single-tool and multi-step prompts.
Add deterministic reply-quality eval checks (clarity/actionability/anti-disclaimer guardrails) on representative prompts.
Add npm script(s) for the new benchmark/eval paths and document how to run locally.
Run focused verification (test:ai, test:mvp-eval, new quality and live latency commands) and capture evidence.
Update critical requirements and presearch docs with latest evidence and any remaining gaps.
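
The p95 figures recorded by the latency benchmark can be computed with the nearest-rank method, sketched below. The 5000 ms and 15000 ms targets match the <5s single-tool and <15s multi-step targets noted in this tracker. The function names are hypothetical, and the benchmark itself may use a different percentile definition.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error('percentile requires at least one sample');
  }

  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);

  return sorted[Math.max(rank, 1) - 1];
}

// Latency targets in milliseconds, taken from the tracker's stated goals.
const P95_TARGET_IN_MS = { singleTool: 5000, multiStep: 15000 };

function meetsP95Target(
  latenciesInMs: number[],
  targetInMs: number
): boolean {
  return percentile(latenciesInMs, 95) < targetInMs;
}
```

Nearest-rank never interpolates, so the reported p95 is always a latency that was actually observed.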

Session Plan (2026-02-24, Remaining Gap Closure)

Add explicit eval metrics for hallucination rate and verification accuracy.
Add open-source eval package scaffold with dataset artifact and framework-agnostic runner.
Add condensed architecture summary document derived from docs/MVP-VERIFICATION.md.
Re-run focused verification and capture updated evidence.
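
One plausible definition of the two metrics above is sketched here. It assumes hallucination rate is the fraction of eval results flagged as containing unsupported claims, and verification accuracy is the fraction of results whose verification outcome matched the expected outcome. The `EvalResult` fields are hypothetical, and the actual eval runner may define these metrics differently.

```typescript
// Hypothetical per-case result shape.
interface EvalResult {
  id: string;
  hallucinationFlagged: boolean;
  expectedVerificationPass: boolean;
  actualVerificationPass: boolean;
}

interface EvalMetrics {
  hallucinationRate: number; // lower is better, within [0, 1]
  verificationAccuracy: number; // higher is better, within [0, 1]
}

function computeEvalMetrics(results: EvalResult[]): EvalMetrics {
  if (results.length === 0) {
    throw new Error('metrics require at least one eval result');
  }

  const flagged = results.filter((r) => r.hallucinationFlagged).length;
  const matched = results.filter(
    (r) => r.expectedVerificationPass === r.actualVerificationPass
  ).length;

  return {
    hallucinationRate: flagged / results.length,
    verificationAccuracy: matched / results.length
  };
}
```

A test gate can then assert an upper bound on `hallucinationRate` and a lower bound on `verificationAccuracy` alongside the existing pass-rate threshold.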

Session Plan (2026-02-24, Tool Gating + Routing Hardening)

Replace planner unknown-intent fallback with no-tool route ([]) to prevent deterministic over-tooling.
Add deterministic policy gate at executor boundary to enforce route decisions (direct|tools|clarify) and tool allowlist filtering.
Emit policy metrics in runtime output (blocked_by_policy, block_reason, forced_direct) via verification checks and observability logging.
Add/adjust unit tests for planner fallback, policy enforcement, and no-tool execution path.
Run focused verification (npm run test:ai, npm run test:mvp-eval) and capture evidence.
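
The policy gate described above can be sketched as a pure function at the executor boundary. The route values (`direct|tools|clarify`) and the metric names (`blocked_by_policy`, `block_reason`, `forced_direct`) come from this plan. The function name, input shape, and the specific `block_reason` strings are assumptions for illustration.

```typescript
type Route = 'direct' | 'tools' | 'clarify';

interface PolicyInput {
  route: Route;
  plannedTools: string[];
  allowedTools: string[];
}

interface PolicyDecision {
  route: Route;
  toolsToExecute: string[];
  blocked_by_policy: boolean;
  block_reason: string; // hypothetical values; 'none' when nothing was blocked
  forced_direct: boolean;
}

function applyToolPolicy({
  route,
  plannedTools,
  allowedTools
}: PolicyInput): PolicyDecision {
  // Non-tool routes never execute tools, even if the planner proposed some.
  if (route !== 'tools') {
    const blocked = plannedTools.length > 0;

    return {
      route,
      toolsToExecute: [],
      blocked_by_policy: blocked,
      block_reason: blocked ? `route_${route}` : 'none',
      forced_direct: false
    };
  }

  // Allowlist filtering: drop any planned tool that is not permitted.
  const toolsToExecute = plannedTools.filter((tool) =>
    allowedTools.includes(tool)
  );

  // A tools route with nothing left to run degrades to a direct answer.
  if (toolsToExecute.length === 0) {
    const blocked = plannedTools.length > 0;

    return {
      route: 'direct',
      toolsToExecute,
      blocked_by_policy: blocked,
      block_reason: blocked ? 'tool_not_allowed' : 'no_tools_planned',
      forced_direct: true
    };
  }

  const blocked = toolsToExecute.length < plannedTools.length;

  return {
    route,
    toolsToExecute,
    blocked_by_policy: blocked,
    block_reason: blocked ? 'tool_not_allowed' : 'none',
    forced_direct: false
  };
}
```

Keeping the gate free of side effects makes each branch straightforward to cover in unit tests, and the returned fields can be forwarded as-is to verification checks and observability logging.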

Session Plan (2026-02-24, OSS Publish + External PRs)

Confirm in-repo open-source tool package is committed and documented for direct repo consumption.
Create high-signal upstream PR in openai/evals fork with finance-agent eval dataset integration docs/template.
Create high-signal upstream PR in langchain fork with a focused docs contribution for finance-agent eval interoperability.
Push fork branches and open PRs against upstream repositories.
Update Tasks.md and plan artifact with PR links and current status.

Session Plan (2026-02-24, Chat Persistence + Simple Query Handling)

Review current chat panel storage behavior and AI policy direct-route behavior.
Implement localStorage-backed chat session/message persistence with bounded history in ai-chat-panel.component.ts.
Extend direct no-tool query handling for simple assistant capability/help prompts in ai-agent.policy.utils.ts.
Add or update unit tests for chat persistence and policy simple-query routing.
Run focused verification on touched frontend/backend AI suites and update task tracking artifacts.
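
Bounded chat persistence as planned above can be sketched against a small storage interface, so the same logic works with the browser's `localStorage` and with an in-memory stand-in for tests. The storage key, the message shape, and the limit of 50 are assumptions, not values taken from `ai-chat-panel.component.ts`.

```typescript
// Subset of the Web Storage API that this sketch needs.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical message shape.
interface StoredChatMessage {
  role: 'user' | 'assistant';
  content: string;
  createdAt: number;
}

const STORAGE_KEY = 'ai-chat-history'; // hypothetical key
const MAX_MESSAGES = 50; // hypothetical bound

function loadMessages(storage: KeyValueStorage): StoredChatMessage[] {
  try {
    const parsed = JSON.parse(storage.getItem(STORAGE_KEY) ?? '[]');
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    // Corrupt or non-JSON data falls back to an empty history.
    return [];
  }
}

function appendMessage(
  storage: KeyValueStorage,
  message: StoredChatMessage,
  maxMessages: number = MAX_MESSAGES
): StoredChatMessage[] {
  // Keep only the most recent entries so storage cannot grow without bound.
  const bounded = [...loadMessages(storage), message].slice(-maxMessages);
  storage.setItem(STORAGE_KEY, JSON.stringify(bounded));
  return bounded;
}

// In-memory stand-in for tests and non-browser environments.
class InMemoryStorage implements KeyValueStorage {
  private readonly items = new Map<string, string>();

  getItem(key: string): string | null {
    return this.items.get(key) ?? null;
  }

  setItem(key: string, value: string): void {
    this.items.set(key, value);
  }
}
```

In the browser, `window.localStorage` satisfies `KeyValueStorage` structurally and can be passed in directly.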

Session Plan (2026-02-24, Per-LLM LangSmith Invocation Tracing)

Audit current AI provider call path and verify where LangSmith/LangChain tracing is missing.
Add explicit per-provider LLM invocation tracing hooks before/after each generateText provider call.
Thread query/session/user context into LLM invocation tracing payloads for easier LangSmith filtering.
Update AI and observability unit tests to assert LLM invocation trace behavior and keep provider fallback behavior stable.
Run focused verification for touched AI suites and update task tracking notes.
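
Per-provider invocation tracing with stable fallback behavior can be sketched as a loop that records one trace per attempt. All names below are hypothetical. In particular, `LlmProvider.generate` stands in for the real provider call and does not reflect the actual `generateText` signature, and `recordTrace` stands in for whatever forwards traces to LangSmith.

```typescript
// Hypothetical provider abstraction; the real call signature differs.
interface LlmProvider {
  name: string;
  generate: (prompt: string) => Promise<string>;
}

// Context threaded into each trace to make filtering easier.
interface TraceContext {
  query: string;
  sessionId: string;
  userId: string;
}

interface InvocationTrace extends TraceContext {
  provider: string;
  durationInMs: number;
  status: 'success' | 'error';
  errorMessage?: string;
}

async function generateWithTracing(
  providers: LlmProvider[],
  prompt: string,
  context: TraceContext,
  recordTrace: (trace: InvocationTrace) => void
): Promise<string> {
  for (const provider of providers) {
    const startedAt = Date.now();

    try {
      const text = await provider.generate(prompt);

      recordTrace({
        ...context,
        provider: provider.name,
        durationInMs: Date.now() - startedAt,
        status: 'success'
      });

      return text;
    } catch (error) {
      // Record the failure, then fall through to the next provider.
      recordTrace({
        ...context,
        provider: provider.name,
        durationInMs: Date.now() - startedAt,
        status: 'error',
        errorMessage: error instanceof Error ? error.message : String(error)
      });
    }
  }

  throw new Error('All LLM providers failed');
}
```

Each attempt yields exactly one trace, so a fallback from one provider to the next shows up as an error trace followed by a success trace for the same session.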

Verification Notes

nx run api:lint completed successfully (existing workspace warnings only).
Full nx test api currently fails in pre-existing portfolio calculator suites unrelated to AI endpoint changes.
Focused MVP verification passed:
- npm run test:ai
- npm run test:mvp-eval
- npm run hostinger:check
- npx dotenv-cli -e .env.example -- npx jest apps/client/src/app/pages/portfolio/analysis/ai-chat-panel/ai-chat-panel.component.spec.ts --config apps/client/jest.config.ts
- npm run railway:check
- npm run railway:setup
- npm run database:seed:ai-mvp
- npx nx run client:build:development-en
- npx nx run client:lint
- npx dotenv-cli -e .env -- npx -y @railway/cli@latest up --detach
- npx dotenv-cli -e .env -- npx -y @railway/cli@latest service status
- curl -i https://ghostfolio-api-production.up.railway.app/api/v1/health
Railway crash recovery verification (local):
- node main.js fails with MODULE_NOT_FOUND for /ghostfolio/main.js (old command)
- node dist/apps/api/main.js starts successfully
- curl -fsS http://127.0.0.1:3333/api/v1/health returns {"status":"OK"}
Railway crash recovery verification (production):
- npx dotenv-cli -e .env -- npx -y @railway/cli@latest up --detach
- npx dotenv-cli -e .env -- npx -y @railway/cli@latest service status reached Status: SUCCESS on deployment 4f26063a-97e5-43dd-b2dd-360e9e12a951
- curl -i https://ghostfolio-api-production.up.railway.app/api/v1/health returned HTTP/2 200 with {"status":"OK"}
AI chat intent recovery verification:
- npx dotenv-cli -e .env.example -- npx jest apps/api/src/app/endpoints/ai/ai-agent.utils.spec.ts apps/api/src/app/endpoints/ai/ai.service.spec.ts --config apps/api/jest.config.ts
- npm run test:ai (passed)
LangSmith relevance gate verification:
- npm run test:mvp-eval (passes with the new investment relevance checks)
- npm run test:ai (6/6 suites, 34/34 tests)
- npm run test:ai:langsmith -> Overall suite: 53/53 passed (100.0%), Investment relevance subset: 25/25 passed (100.0%)
Full requirements closure verification (local, 2026-02-24):
- npm run test:mvp-eval (passes with 50+ eval cases and category minimums)
- npm run test:ai (7 suites passed, includes reply quality and timeout fallback assertions)
- npm run test:ai:performance (service-level p95 regression gate for <5s / <15s targets)
- npm run test:ai:quality (reply-quality eval slice passed)
- npm run test:ai:live-latency (env-backed live benchmark passed with strict targets enabled)
- npm run test:ai:live-latency:strict (single-tool p95 3514ms, multi-step p95 3505ms, both within thresholds)
- npx nx run api:lint (passed with existing non-blocking workspace warnings)
Remaining-gap closure verification (local, 2026-02-24):
- npm run test:ai (9/9 suites, 40/40 tests)
- npm run test:mvp-eval (includes hallucination-rate and verification-accuracy assertions)
- npm run test:ai:quality (3/3 tests)
- npm run test:ai:performance (p95 under service-level targets)
- npm run test:ai:live-latency:strict (real model/network strict targets pass)
- (cd tools/evals/finance-agent-evals && npm run check) (package scaffold smoke test pass)
- (cd tools/evals/finance-agent-evals && npm run pack:dry-run) (packaging dry run pass)
Railway latency + Redis auth fix verification (production):
- railway up --service ghostfolio-api --detach produced successful deployment d7f73e4a-0a11-4c06-b066-3cbe58368094
- railway logs -s ghostfolio-api -d d7f73e4a-0a11-4c06-b066-3cbe58368094 -n 800 | rg "ERR AUTH|Redis health check failed" returned no matches
- TTFB on /api/v1/health improved from ~1.8-2.2s to ~0.16-0.47s in repeated curl probes
- /en/accounts now serves in ~0.27-0.42s TTFB in repeated probes
Quality lift verification (local, 2026-02-24):
- npm run test:ai (9 suites passed, includes new ai-observability.service.spec.ts and deterministic performance gate)
- npx dotenv-cli -e .env.example -- npx jest apps/client/src/app/pages/portfolio/analysis/ai-chat-panel/ai-chat-panel.component.spec.ts --config apps/client/jest.config.ts (4/4 tests passed)
- npx nx run api:lint (passes with existing workspace warnings)
- npx nx run client:lint (passes with existing workspace warnings)
Tool gating + routing hardening verification (local, 2026-02-24):
- npx jest apps/api/src/app/endpoints/ai/ai-agent.utils.spec.ts apps/api/src/app/endpoints/ai/ai.service.spec.ts --config apps/api/jest.config.ts (passes after policy-gating assertion updates)
- npm run test:ai (9/9 suites, 44/44 tests)
- npm run test:mvp-eval (pass rate threshold test still passes)
- npx nx run api:lint (passes with existing workspace warnings)
Chat persistence + simple query handling verification (local, 2026-02-24):
- npx jest apps/client/src/app/pages/portfolio/analysis/ai-chat-panel/ai-chat-panel.component.spec.ts --config apps/client/jest.config.ts (7/7 tests passed)
- npx jest apps/api/src/app/endpoints/ai/ai-agent.utils.spec.ts apps/api/src/app/endpoints/ai/ai.service.spec.ts --config apps/api/jest.config.ts (31/31 tests passed)
- npx nx run api:lint (passes with existing workspace warnings)
- npx nx run client:lint (passes with existing workspace warnings)
Per-LLM LangSmith invocation tracing verification (local + deploy config, 2026-02-24):
- npx jest apps/api/src/app/endpoints/ai/ai-observability.service.spec.ts apps/api/src/app/endpoints/ai/ai.service.spec.ts apps/api/src/app/endpoints/ai/evals/mvp-eval.runner.spec.ts apps/api/src/app/endpoints/ai/ai-performance.spec.ts apps/api/src/app/endpoints/ai/evals/ai-quality-eval.spec.ts apps/api/src/app/endpoints/ai/evals/ai-live-latency.spec.ts --config apps/api/jest.config.ts (5/5 suites passed, live-latency suite skipped by env gate)
- npx nx run api:lint (passes with existing workspace warnings)
- railway variable set -s ghostfolio-api --skip-deploys LANGCHAIN_API_KEY=... LANGSMITH_API_KEY=... LANGCHAIN_TRACING_V2=true LANGSMITH_TRACING=true LANGSMITH_PROJECT=ghostfolio-ai-agent
- railway variable list -s ghostfolio-api --kv confirms: LANGCHAIN_API_KEY, LANGSMITH_API_KEY, LANGCHAIN_TRACING_V2, LANGSMITH_TRACING, LANGSMITH_PROJECT
