---
description: Ghostfolio CSV Import Auditor project specification and requirements
globs:
alwaysApply: true
---

# Ghostfolio CSV Import Auditor — Project Spec

## What We're Building

A **CSV Import Auditor Reasoning Agent** for the Ghostfolio finance platform that ensures safe, deterministic, and verifiable transaction imports.

## Use Cases

1. Parsing broker CSV exports
2. Mapping broker-specific fields to Ghostfolio schema
3. Validating transaction correctness
4. Detecting duplicates and conflicts
5. Generating a structured preview report
6. Safely committing transactions to the database

## Architecture

- **Backend**: TypeScript + NestJS (Ghostfolio native)
- **Database**: PostgreSQL (Railway)
- **Validation**: Zod schemas
- **Agent Orchestration**: Custom lightweight controller (no LangChain/LangGraph/CrewAI)
- **LLM**: Reasoning-capable model with tool/function calling
- **Observability**: Langfuse
- **Evaluation**: Deterministic + LLM-as-Judge (dual-layer)
- **CI**: Eval gating
- **Deployment**: Docker-compatible Ghostfolio backend

## Agent Pipeline (6 Tools)

```
CSV → parseCSV → mapBrokerFields → validateTransactions → detectDuplicates → previewImportReport → commitImport
```

Each tool has: strict Zod input/output schemas, deterministic behavior, explicit error handling, no hidden side effects. All tools are pure except `commitImport()`.

## Verification Requirements (Non-Negotiable)

- Schema validation (required fields, types, formats)
- Accounting invariants (quantity >= 0, commission >= 0, price logic)
- Currency consistency
- Duplicate prevention (idempotency)
- Deterministic normalization
- Transactional database commits
- Before/after state signature validation
- No commit occurs unless ALL deterministic checks pass

## Human-in-the-Loop

- Always generate `previewImportReport()` before commit
- Require explicit user confirmation
- Never auto-commit without approval

## LLM Usage (Cost-Efficient)

- Parsing and validation are deterministic code (no LLM)
- LLM used ONLY for: broker column mapping suggestions + human-readable preview explanation
- Maximum 1-2 LLM calls per import
- No per-row inference

## Performance Targets

- Small CSV: < 2 seconds
- Large CSV: < 10-15 seconds
- Progress feedback required
- Dozens of concurrent imports supported

## Evaluation Requirements

### Layer 1: Deterministic (Primary)
- Schema correctness, numeric invariants, duplicate detection, idempotency, state signature comparison
- Minimum 50 test CSV cases, pass/fail enforced in CI

### Layer 2: LLM-as-a-Judge (Secondary)
- Mapping explanation clarity (1-5), logical completeness (1-5), issue detection completeness (1-5), hallucination presence (0/1), overall quality (1-10)
- Judge does NOT control commit, only scores language outputs

## Observability (Langfuse)

Per import: one trace (`agent.import_audit`), one span per tool, latency per tool, token usage, `toolCountPlanned` vs `toolCountExecuted`, validation result, duplicate detection metrics, commit success/failure.

## Security

- CSV treated as data only
- Prompt injection prevention
- No arbitrary CSV content in prompts
- API keys secured server-side
- Audit logging enforced

## Testing Strategy

- Unit tests per tool
- Integration tests for full pipeline
- Adversarial malformed CSV tests
- Regression test suite in CI
- Deterministic eval dataset

## Deadlines

| Checkpoint | Deadline | Focus |
|---|---|---|
| Pre-Search | 2 hours after receiving | Architecture, Plan |
| MVP | Tuesday (24 hours) | Basic agent with tool use |
| Early Submission | Friday (4 days) | Eval framework + observability |
| Final | Sunday (7 days) | Production-ready + open source |

## MVP Checklist

- Agent responds to natural language queries in finance domain
- At least 3 functional tools the agent can invoke
- Tool calls execute successfully and return structured results
- Agent synthesizes tool results into coherent responses
- Conversation history maintained across turns
- Basic error handling (graceful failure, not crashes)
- At least one domain-specific verification check
- Simple evaluation: 5+ test cases with expected outcomes
- Deployed and publicly accessible