mirror of https://github.com/ghostfolio/ghostfolio
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
4.5 KiB
4.5 KiB
Early Submission Demo Video Script (3–5 minutes)
Record with QuickTime. Read callouts aloud.
PART 1: Deployed App + AI Chat (~2 min)
Scene 1 — Deployed URL (0:00)
- Open browser to:
https://ghostfolio-production-f9fe.up.railway.app - Say: "This is Ghostfolio with an AI financial agent, deployed on Railway."
Scene 2 — Demo Login (0:15)
- Navigate to:
https://ghostfolio-production-f9fe.up.railway.app/demo - Say: "Logging in as the demo user — pre-seeded portfolio with 5 holdings: Apple, Microsoft, Amazon, Google, and Vanguard Total Stock Market ETF."
- Briefly show the portfolio overview.
Scene 3 — AI Chat: Holdings Query (0:30)
- Click "AI Chat" in nav.
- Type: "What are my current holdings?"
- Wait for response.
- Say: "The agent called the
get_portfolio_holdingstool and returned real portfolio data — symbols, allocations, values, and performance." - Point out the disclaimer at the bottom: "Notice the financial disclaimer — this is one of three domain-specific verification checks."
Scene 4 — Multi-Turn: Performance (1:00)
- Type: "How is my portfolio performing overall?"
- Wait for response.
- Say: "This is a follow-up in the same conversation — conversation history is maintained. The agent used the
get_portfolio_performancetool — total return of about 42% on a $15,000 investment."
Scene 5 — Third Tool: Accounts (1:20)
- Type: "Show me my accounts"
- Say: "Third tool —
get_account_summary. We have 9 tools total wrapping existing Ghostfolio services."
Scene 6 — Error Handling (1:35)
- Type: "Sell all my stocks immediately"
- Say: "The agent is read-only — it gracefully refuses unsafe requests without crashing."
PART 2: Verification Checks (~30 sec)
Scene 7 — Verification in Response (1:50)
- Scroll to a response with financial data.
- Say: "We have three verification checks running on every response. First, financial disclaimer injection — you can see it at the bottom of every data-bearing response. Second, data-backed claim verification — the system extracts numbers from the response and verifies they appear in the tool results. Third, portfolio scope validation — if the agent mentions a stock symbol, it confirms that symbol actually exists in the user's portfolio."
- If the response JSON is accessible (dev tools or API), briefly show the
verificationChecksfield.
PART 3: Observability Dashboard (~45 sec)
Scene 8 — Langfuse Traces (2:20)
- Open a new tab:
https://us.cloud.langfuse.com(log in if needed). - Navigate to the Ghostfolio project → Traces.
- Say: "Every agent interaction is traced in Langfuse. You can see the full request lifecycle — input, LLM reasoning, tool calls, and output."
- Click into one trace to show detail: latency breakdown, token usage, tool calls.
- Say: "We're tracking latency, token usage, cost per query, and tool selection accuracy. This gives us full visibility for debugging and improvement."
PART 4: Eval Suite (~1 min)
Scene 9 — Run Evals (3:05)
- Switch to terminal.
- Say: "The eval suite has 58 test cases across four categories: happy path, edge cases, adversarial inputs, and multi-step reasoning."
- Run:
cd ~/Projects/Gauntlet/ghostfolio SKIP_JUDGE=1 AUTH_TOKEN="<token>" npx tsx apps/api/src/app/endpoints/ai/eval/eval.ts - Wait for results. Should show ~55/58 passing (94.8%).
- Say: "55 out of 58 tests passing — 94.8% pass rate, above the 80% target. The suite tests tool selection, response coherence, safety refusals, hallucination detection, and multi-step reasoning."
PART 5: Wrap-Up (4:00)
Say: "To summarize what's been added since MVP: Langfuse observability with full request tracing and cost tracking. Three domain-specific verification checks — financial disclaimers, data-backed claim verification, and portfolio scope validation. And the eval suite expanded from 10 to 58 test cases across all required categories. The agent has 9 tools wrapping real Ghostfolio services, maintains conversation history, handles errors gracefully, and is deployed publicly. Thanks for watching."
Before Recording Checklist
- Railway deployment is up (visit URL to confirm)
- Langfuse dashboard has recent traces (run a query first to generate one)
- Browser open with no other tabs visible
- Terminal ready with eval command and AUTH_TOKEN set
- QuickTime set to record full screen
- You've done one silent dry run through the whole script