mirror of https://github.com/ghostfolio/ghostfolio
---
description: Product strategy, vision, and accountability for the Ghostfolio CSV Import Auditor reasoning agent
globs:
alwaysApply: true
---

# Ghostfolio CSV Import Auditor: Product Strategy

## Product Vision

For Ghostfolio users managing personal investment portfolios who need reliable, safe, and verifiable transaction imports, the CSV Import Auditor is a deterministic reasoning agent embedded in Ghostfolio that validates, audits, previews, and safely commits broker CSV imports.

**Core Purpose**: Eliminate financial data corruption during portfolio imports.

**Differentiation**: Unlike manual CSV imports or loosely validated parsers, this product guarantees schema correctness, duplicate prevention, accounting invariants, deterministic state verification, and transactional safety before any database mutation.

## Insights

### Market

- Ghostfolio serves thousands of personal portfolio users
- CSV import reliability is a real friction point
- Imports mutate persistent financial state, so the trust requirement is high
- Most portfolio tools provide basic parsers without deterministic validation, transactional state verification, or idempotency protection

### Trends

- Movement toward AI-assisted financial tooling
- Demand for explainability and auditability
- Increased distrust in opaque AI automation
- Shift toward deterministic + AI hybrid systems

### Customer Personas

**Primary (DIY Investor)**: Imports broker exports, has low tolerance for data corruption, and wants a preview and confirmation before commit.

**Secondary (Power User)**: Imports from multiple brokers and needs a repeatable, idempotent workflow.

**Core need**: Correctness > speed.

## Challenges

### Technical

- Broker CSV variability
- Deterministic schema validation
- Duplicate detection accuracy
- Safe transactional DB writes
- State mutation protection

### Customer Pain Points

- Fear of corrupting portfolio data
- Confusing CSV error messages
- Duplicate transactions after re-import
- Lack of preview transparency

### GTM Risks

- PR acceptance into Ghostfolio OSS repo
- Must integrate without architectural disruption
- Must demonstrate reliability via eval data

### Compliance

- Financial data integrity expectations
- Audit logging required
- Must prevent partial writes

## Architecture & Approach

### Pipeline (6 Tools)

```
parseCSV → mapBrokerFields → validateTransactions → detectDuplicates → previewImportReport → commitImport
```

- Custom deterministic orchestrator (no LangChain/LangGraph/CrewAI)
- Strict Zod schemas, deterministic outputs, explicit error handling
- Only `commitImport` mutates state

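The pipeline above can be sketched as a fixed, ordered loop. This is a minimal illustration under stated assumptions, not the project's actual orchestrator: the six tool names come from the pipeline, while `Tool`, `ToolResult`, `RunReport`, `runPipeline`, and the `userApproved` flag are hypothetical names introduced here. Real tools would validate their inputs and outputs with Zod schemas, which this sketch omits to stay dependency-free.

```typescript
// Tool names come from the pipeline above; every other name here is hypothetical.
type ToolName =
  | "parseCSV"
  | "mapBrokerFields"
  | "validateTransactions"
  | "detectDuplicates"
  | "previewImportReport"
  | "commitImport";

interface ToolResult {
  ok: boolean;
  errors: string[];
}

// A tool reads and extends a shared context and reports success or failure.
type Tool = (context: Map<string, unknown>) => ToolResult;

// The order is fixed in code, so every run takes the same path.
const PIPELINE: readonly ToolName[] = [
  "parseCSV",
  "mapBrokerFields",
  "validateTransactions",
  "detectDuplicates",
  "previewImportReport",
  "commitImport",
];

interface RunReport {
  executed: ToolName[];
  committed: boolean;
  errors: string[];
}

function runPipeline(
  tools: Record<ToolName, Tool>,
  userApproved: boolean,
): RunReport {
  const context = new Map<string, unknown>();
  const executed: ToolName[] = [];

  for (const name of PIPELINE) {
    // commitImport is the only mutating step, so it waits for explicit approval.
    if (name === "commitImport" && !userApproved) {
      return { executed, committed: false, errors: [] };
    }
    const result = tools[name](context);
    executed.push(name);
    if (!result.ok) {
      // Fail fast: no later tool runs, so a failed check can never reach the commit.
      return { executed, committed: false, errors: result.errors };
    }
  }
  return { executed, committed: true, errors: [] };
}
```

Because the loop stops at the first failing tool and `commitImport` is last, a run that fails validation or lacks approval ends with `committed: false` and no state change.
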
### Verification Layer (Non-Negotiable)

No commit happens unless every one of the following checks passes deterministically:

1. Schema validation (required fields, types, formats)
2. Numeric invariants (quantity >= 0, commission >= 0, price logic)
3. Currency consistency
4. Duplicate prevention (idempotency)
5. Deterministic state signature
6. Transactional DB boundary

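As a rough sketch of how checks 2 through 5 might compose into a single all-or-nothing gate, consider the following. Several things here are assumptions rather than facts from this document: the `TransactionRow` field names, the reading of "price logic" as `unitPrice >= 0`, the reading of "currency consistency" as one currency per symbol within a batch, and the reading of "state signature" as an order-independent fingerprint of known transaction keys. Schema validation (check 1) would be handled by Zod, and the transactional boundary (check 6) needs a database, so neither appears.

```typescript
// Hypothetical row shape; the field names are assumptions, not Ghostfolio's import DTO.
interface TransactionRow {
  date: string; // expected as YYYY-MM-DD
  symbol: string;
  type: "BUY" | "SELL";
  quantity: number;
  unitPrice: number;
  commission: number;
  currency: string;
}

// Check 2: numeric invariants for a single row (unitPrice >= 0 is an assumed rule).
function checkNumericInvariants(row: TransactionRow, index: number): string[] {
  const errors: string[] = [];
  const fields: Array<[string, number]> = [
    ["quantity", row.quantity],
    ["unitPrice", row.unitPrice],
    ["commission", row.commission],
  ];
  for (const [name, value] of fields) {
    if (!Number.isFinite(value) || value < 0) {
      errors.push(`row ${index}: ${name} must be a finite number >= 0`);
    }
  }
  return errors;
}

// Check 4: identical rows always produce the same key, so re-imports are detectable.
function duplicateKey(row: TransactionRow): string {
  return [
    row.date,
    row.symbol,
    row.type,
    row.quantity,
    row.unitPrice,
    row.commission,
    row.currency,
  ].join("|");
}

// Check 5 (one interpretation): an order-independent fingerprint of known keys.
// Uses an FNV-1a-style hash for brevity; it is not cryptographically secure.
function stateSignature(keys: Iterable<string>): string {
  const canonical = Array.from(keys).sort().join("\n");
  let hash = 0x811c9dc5;
  for (let i = 0; i < canonical.length; i++) {
    hash ^= canonical.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

interface Verdict {
  canCommit: boolean;
  errors: string[];
}

function verifyBatch(
  rows: TransactionRow[],
  existingKeys: ReadonlySet<string>,
): Verdict {
  const errors: string[] = [];
  const currencyBySymbol = new Map<string, string>();
  const seen = new Set<string>(existingKeys);

  rows.forEach((row, index) => {
    errors.push(...checkNumericInvariants(row, index));

    // Check 3 (one interpretation): a symbol keeps one currency within the batch.
    const knownCurrency = currencyBySymbol.get(row.symbol);
    if (knownCurrency === undefined) {
      currencyBySymbol.set(row.symbol, row.currency);
    } else if (knownCurrency !== row.currency) {
      errors.push(`row ${index}: ${row.symbol} mixes ${knownCurrency} and ${row.currency}`);
    }

    // Duplicates are checked against stored keys and earlier rows of this batch.
    const key = duplicateKey(row);
    if (seen.has(key)) errors.push(`row ${index}: duplicate transaction`);
    seen.add(key);
  });

  // The gate is all-or-nothing: a single error blocks the commit.
  return { canCommit: errors.length === 0, errors };
}
```

One possible use of `stateSignature` is to compute it when the preview is generated and again just before commit, and to refuse the commit if the two values differ. A production version would likely swap in a cryptographic hash.
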
### Observability (Langfuse)

Each import emits one trace (`agent.import_audit`) with one span per tool, recording `toolCountPlanned` vs `toolCountExecuted`, latency per tool, token usage, commit status, and validation metrics.

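To make the shape of that trace concrete, here is a small in-memory sketch. It deliberately does not call the Langfuse SDK; `ImportTrace`, `ToolSpan`, `PlannedTool`, and `traceImport` are hypothetical names, and the `completed` flag is only a stand-in for commit status. Token usage and validation metrics are left out because the sketch makes no LLM calls.

```typescript
// In-memory stand-in for a trace; a real integration would send this to Langfuse.
interface ToolSpan {
  tool: string;
  latencyMs: number;
  ok: boolean;
}

interface ImportTrace {
  name: string;
  toolCountPlanned: number;
  toolCountExecuted: number;
  spans: ToolSpan[];
  completed: boolean;
}

interface PlannedTool {
  name: string;
  run: () => boolean; // returns true on success
}

function traceImport(
  planned: PlannedTool[],
  now: () => number = Date.now, // injectable clock keeps tests deterministic
): ImportTrace {
  const spans: ToolSpan[] = [];

  for (const tool of planned) {
    const startedAt = now();
    const ok = tool.run();
    spans.push({ tool: tool.name, latencyMs: now() - startedAt, ok });
    if (!ok) break; // tools after a failure are planned but never executed
  }

  const completed =
    spans.length === planned.length && spans.every((span) => span.ok);

  return {
    name: "agent.import_audit",
    toolCountPlanned: planned.length,
    toolCountExecuted: spans.length,
    spans,
    completed,
  };
}
```

A gap between `toolCountPlanned` and `toolCountExecuted` then shows at a glance that a run stopped early, and the last span identifies the tool that failed.
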
### Evaluation Strategy

**Layer 1 (Deterministic, Primary)**: 50+ CSV test cases covering schema and invariant enforcement and idempotency validation, gated in CI.

**Layer 2 (LLM-as-Judge, Secondary)**: Scores explanation clarity, logical completeness, issue detection quality, and hallucination presence. The judge never controls commit.

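The deterministic layer could be driven by a runner along these lines. This is a sketch only: `EvalCase`, `EvalSummary`, `runDeterministicEvals`, and the `requiredPassRate` parameter are hypothetical, and the `audit` callback stands in for the whole pipeline up to (but not including) the commit. Each case is audited twice so that a non-repeatable result counts as a failure.

```typescript
// Hypothetical eval case: a CSV fixture and whether the auditor should accept it.
interface EvalCase {
  name: string;
  csv: string;
  expectAccepted: boolean;
}

interface EvalSummary {
  total: number;
  passed: number;
  failedCases: string[];
  gatePassed: boolean;
}

function runDeterministicEvals(
  cases: EvalCase[],
  audit: (csv: string) => boolean,
  requiredPassRate = 1, // assumed default: every deterministic case must pass
): EvalSummary {
  const failedCases: string[] = [];

  for (const testCase of cases) {
    // Run twice: the same input must give the same verdict both times.
    const first = audit(testCase.csv);
    const second = audit(testCase.csv);
    const repeatable = first === second;
    if (!repeatable || first !== testCase.expectAccepted) {
      failedCases.push(testCase.name);
    }
  }

  const passed = cases.length - failedCases.length;
  // An empty suite proves nothing, so it is treated as a 0% pass rate.
  const passRate = cases.length === 0 ? 0 : passed / cases.length;

  return {
    total: cases.length,
    passed,
    failedCases,
    gatePassed: passRate >= requiredPassRate,
  };
}
```

In CI, a job could fail the build whenever `gatePassed` is false and print `failedCases` so the offending fixtures are easy to find.
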
## Do's and Don'ts

### Do

- Deterministic before AI
- Verification before mutation
- Audit everything
- Eval-gated release

### Don't

- Auto-commit without user approval
- Run per-row LLM inference
- Allow silent partial writes
- Hide tool side effects

## Accountability

### North Star Metric

**Zero corrupted imports.**

### Success Metrics

| Metric | Target |
|---|---|
| Tool success rate | >95% |
| Deterministic validation pass rate | 100% |
| Eval pass rate | >80% |
| Duplicate detection accuracy | >95% |
| Small CSV latency | <2s |
| Large CSV latency | <15s |
| Hallucination rate (preview) | <5% |

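The targets in the table can also be encoded as data so a report or CI step can compare them against measured values. The sketch below is illustrative: the metric names and thresholds are copied from the table, while `Target`, `meets`, and `missedTargets` are hypothetical helpers, percentages are assumed to be expressed on a 0 to 100 scale, and latencies in seconds.

```typescript
type Comparator = ">" | "<" | "=";

interface Target {
  metric: string;
  comparator: Comparator;
  value: number;
  unit: "%" | "s";
}

// Thresholds copied from the Success Metrics table.
const TARGETS: Target[] = [
  { metric: "Tool success rate", comparator: ">", value: 95, unit: "%" },
  { metric: "Deterministic validation pass rate", comparator: "=", value: 100, unit: "%" },
  { metric: "Eval pass rate", comparator: ">", value: 80, unit: "%" },
  { metric: "Duplicate detection accuracy", comparator: ">", value: 95, unit: "%" },
  { metric: "Small CSV latency", comparator: "<", value: 2, unit: "s" },
  { metric: "Large CSV latency", comparator: "<", value: 15, unit: "s" },
  { metric: "Hallucination rate (preview)", comparator: "<", value: 5, unit: "%" },
];

function meets(target: Target, measured: number): boolean {
  switch (target.comparator) {
    case ">":
      return measured > target.value;
    case "<":
      return measured < target.value;
    case "=":
      return measured === target.value;
  }
}

// Returns the names of metrics that are missing or outside their target.
function missedTargets(measured: Record<string, number>): string[] {
  const missed: string[] = [];
  for (const target of TARGETS) {
    const value = measured[target.metric];
    if (value === undefined || !meets(target, value)) {
      missed.push(target.metric);
    }
  }
  return missed;
}
```

Treating a missing measurement as a miss keeps the check conservative: a metric that was never collected cannot silently count as met.
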
### Operational Metrics

- Import failure reasons categorized
- State signature verification rate
- Commit rollback occurrences
- Mapping confidence scores

## Strategic Positioning

This is NOT an AI CSV parser. This is:

- A **deterministic financial safety layer**
- A **structured reasoning agent**
- An **audit-first import system**
- A **verification-gated mutation controller**
- An **eval-driven OSS contribution**