Early Submission Demo Video Script (3–5 minutes)

Record with QuickTime. Read every "Say:" line aloud.

PART 1: Deployed App + AI Chat (~2 min)

Scene 1 — Deployed URL (0:00)

Open browser to: https://ghostfolio-production-f9fe.up.railway.app
Say: "This is Ghostfolio with an AI financial agent, deployed on Railway."

Scene 2 — Demo Login

Navigate to: https://ghostfolio-production-f9fe.up.railway.app/demo
Say: "I'm logging in as the demo user, who has a pre-seeded portfolio with five holdings: Apple, Microsoft, Amazon, Google, and the Vanguard Total Stock Market ETF."
Briefly show the portfolio overview.

Scene 3 — AI Chat: Holdings Query (0:30)

Click "AI Chat" in nav.
Type: "What are my current holdings?"
Wait for response.
Say: "The agent called the get_portfolio_holdings tool and returned real portfolio data — symbols, allocations, values, and performance."
Point out the disclaimer at the bottom: "Notice the financial disclaimer — this is one of three domain-specific verification checks."

Scene 4 — Multi-Turn: Performance (1:00)

Type: "How is my portfolio performing overall?"
Wait for response.
Say: "This is a follow-up in the same conversation, so the agent keeps the earlier history. It used the get_portfolio_performance tool and reports a total return of about 42% on a $15,000 investment."

Scene 5 — Third Tool: Accounts (1:20)

Type: "Show me my accounts"
Say: "Third tool — get_account_summary. We have 9 tools total wrapping existing Ghostfolio services."

Scene 6 — Error Handling (1:35)

Type: "Sell all my stocks immediately"
Say: "The agent is read-only — it gracefully refuses unsafe requests without crashing."

PART 2: Verification Checks (~30 sec)

Scene 7 — Verification in Response (1:50)

Scroll to a response with financial data.
Say: "We have three verification checks running on every response. First, financial disclaimer injection — you can see it at the bottom of every data-bearing response. Second, data-backed claim verification — the system extracts numbers from the response and verifies they appear in the tool results. Third, portfolio scope validation — if the agent mentions a stock symbol, it confirms that symbol actually exists in the user's portfolio."
If the response JSON is accessible (dev tools or API), briefly show the verificationChecks field.
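
Rehearsal note: if asked how the three checks work, the logic described in Scene 7 can be pictured roughly as below. This is a minimal sketch under stated assumptions, not the project's implementation. Every function name, the disclaimer wording, and the shape of each check result are hypothetical; only the verificationChecks field name comes from this script. The ticker regex is deliberately naive and would also flag capitalized words such as "ETF".

```typescript
// Hypothetical sketch of the three verification checks from Scene 7.
// None of these names are taken from the Ghostfolio codebase.

interface VerificationCheck {
  name: string;
  passed: boolean;
  details: string;
}

// Assumed wording; the real disclaimer text may differ.
const DISCLAIMER = "This is not financial advice.";

// Check 1: append the disclaimer unless the response already carries it.
function injectDisclaimer(response: string): string {
  return response.includes(DISCLAIMER)
    ? response
    : `${response}\n\n${DISCLAIMER}`;
}

// Pull numeric tokens out of text, dropping thousands separators
// so that "15,000" and "15000" compare equal.
function extractNumbers(text: string): string[] {
  const normalized = text.replace(/(\d),(?=\d{3})/g, "$1");
  return normalized.match(/\d+(?:\.\d+)?/g) ?? [];
}

// Check 2: every number in the response must appear in the tool results.
function verifyNumbers(response: string, toolResult: string): VerificationCheck {
  const known = new Set(extractNumbers(toolResult));
  const unbacked = extractNumbers(response).filter((n) => !known.has(n));
  return {
    name: "data_backed_claims",
    passed: unbacked.length === 0,
    details:
      unbacked.length === 0
        ? "All numbers found in tool results"
        : `Unbacked numbers: ${unbacked.join(", ")}`,
  };
}

// Check 3: every ticker-like token must be a symbol the user holds.
function verifyScope(response: string, heldSymbols: string[]): VerificationCheck {
  const held = new Set(heldSymbols);
  const mentioned = Array.from(new Set(response.match(/\b[A-Z]{2,5}\b/g) ?? []));
  const outOfScope = mentioned.filter((symbol) => !held.has(symbol));
  return {
    name: "portfolio_scope",
    passed: outOfScope.length === 0,
    details:
      outOfScope.length === 0
        ? "All symbols are in the portfolio"
        : `Symbols not held: ${outOfScope.join(", ")}`,
  };
}

// Example: assemble an assumed verificationChecks array for one response.
const toolJson = '{"returnPct":42,"invested":15000}';
const reply = "Total return is 42% on 15,000 invested.";
const verificationChecks = [
  verifyNumbers(reply, toolJson),
  verifyScope(reply, ["AAPL", "MSFT"]),
];
console.log(injectDisclaimer(reply), verificationChecks);
```

Comparing numbers as normalized strings keeps the sketch short, but it treats "42" and "42.0" as different values; a real check would likely compare parsed numbers with a tolerance.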

PART 3: Observability Dashboard (~45 sec)

Scene 8 — Langfuse Traces (2:20)

Open a new tab: https://us.cloud.langfuse.com (log in if needed).
Navigate to the Ghostfolio project → Traces.
Say: "Every agent interaction is traced in Langfuse. You can see the full request lifecycle — input, LLM reasoning, tool calls, and output."
Click into one trace to show detail: latency breakdown, token usage, tool calls.
Say: "We're tracking latency, token usage, cost per query, and tool selection accuracy. This gives us full visibility for debugging and improvement."

PART 4: Eval Suite (~1 min)

Scene 9 — Run Evals (3:05)

Switch to terminal.
Say: "The eval suite has 58 test cases across four categories: happy path, edge cases, adversarial inputs, and multi-step reasoning."

Run:

cd ~/Projects/Gauntlet/ghostfolio
SKIP_JUDGE=1 AUTH_TOKEN="<token>" npx tsx apps/api/src/app/endpoints/ai/eval/eval.ts

Wait for results. Expect about 55/58 passing (94.8%).
Say: "55 out of 58 tests passing — 94.8% pass rate, above the 80% target. The suite tests tool selection, response coherence, safety refusals, hallucination detection, and multi-step reasoning."
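
Rehearsal note: the pass-rate figure quoted above can be rechecked before recording if the run reports a different count. The helper below is a small sketch; the function names are hypothetical and the 0.8 default simply mirrors the 80% target mentioned in the script.

```typescript
// Hypothetical helpers for rechecking the eval pass rate quoted in the script.

// Fraction of passing tests; rejects counts that cannot describe a real run.
function passRate(passed: number, total: number): number {
  if (!Number.isInteger(passed) || !Number.isInteger(total)) {
    throw new RangeError("counts must be integers");
  }
  if (total <= 0 || passed < 0 || passed > total) {
    throw new RangeError("expected 0 <= passed <= total and total > 0");
  }
  return passed / total;
}

// Format a fraction as a percentage with one decimal place.
function formatPercent(rate: number): string {
  return `${(rate * 100).toFixed(1)}%`;
}

// True when the pass rate reaches the target (80% unless overridden).
function meetsTarget(passed: number, total: number, target = 0.8): boolean {
  return passRate(passed, total) >= target;
}

console.log(
  `${formatPercent(passRate(55, 58))} pass rate, target met: ${meetsTarget(55, 58)}`
);
```

With 58 cases, the 80% target corresponds to at least 47 passing tests, so a run that drops a few more cases still clears it.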

PART 5: Wrap-Up (4:00)

Say: "To summarize what's been added since MVP: Langfuse observability with full request tracing and cost tracking. Three domain-specific verification checks — financial disclaimers, data-backed claim verification, and portfolio scope validation. And the eval suite expanded from 10 to 58 test cases across all required categories. The agent has 9 tools wrapping real Ghostfolio services, maintains conversation history, handles errors gracefully, and is deployed publicly. Thanks for watching."

Before Recording Checklist

Railway deployment is up (visit URL to confirm)
Langfuse dashboard has recent traces (run a query first to generate one)
Browser open with no other tabs visible
Terminal ready with eval command and AUTH_TOKEN set
QuickTime set to record full screen
You've done one silent dry run through the whole script
