diff --git a/.github/agents/copilot-instructions.md b/.github/agents/copilot-instructions.md index 89c495b5e..f3100a9b4 100644 --- a/.github/agents/copilot-instructions.md +++ b/.github/agents/copilot-instructions.md @@ -28,9 +28,9 @@ TypeScript 5.9.2, Node.js ≥22.18.0: Follow standard conventions ## Recent Changes - 004-k1-scan-import: Added TypeScript 5.9.2, Node.js ≥ 22.18.0 + NestJS 11.x (backend), Angular 21.x (frontend), Prisma 6.x (ORM), pdf-parse (PDF text), @azure/ai-form-recognizer (cloud OCR), tesseract.js (local OCR fallback) +- 004-k1-scan-import: Added TypeScript 5.9.2, Node.js ≥ 22.18.0 + NestJS 11.x (backend), Angular 21.x (frontend), Prisma 6.x (ORM), pdf-parse (PDF text), @azure/ai-form-recognizer (cloud OCR), tesseract.js (local OCR fallback) - 003-portfolio-performance-views: Added TypeScript 5.9.2, Node.js >= 22.18.0 + Angular 21.1.1, NestJS 11.1.14, Angular Material 21.1.1, Prisma 6.19.0, big.js, date-fns 4.1.0 -- 001-family-office-transform: Added TypeScript 5.9.2, Node.js ≥22.18.0 + NestJS 11.1.14 (API), Angular 21.1.1 + Angular Material 21.1.1 (client), Prisma 6.19.0 (ORM), Nx 22.5.3 (monorepo), big.js (decimal math), date-fns 4.1.0, chart.js 4.5.1, Bull 4.16.5 (job queues), Redis (caching), yahoo-finance2 3.13.2 diff --git a/specs/004-k1-scan-import/contracts/k1-import-api.md b/specs/004-k1-scan-import/contracts/k1-import-api.md index eb9451548..f9eaedaf2 100644 --- a/specs/004-k1-scan-import/contracts/k1-import-api.md +++ b/specs/004-k1-scan-import/contracts/k1-import-api.md @@ -1,6 +1,6 @@ # API Contracts: K-1 Import -**Phase 1 Output** | **Date**: 2026-03-18 +**Phase 1 Output** | **Date**: 2026-03-18 | **Updated**: 2026-03-18 (post-clarification) ## Base Path @@ -136,7 +136,8 @@ Submit user-verified/edited extraction data. 
Transitions status from EXTRACTED t "numericValue": 52340, "confidence": 0.95, "confidenceLevel": "HIGH", - "isUserEdited": false + "isUserEdited": false, + "isReviewed": true }, { "boxNumber": "11", @@ -146,7 +147,19 @@ Submit user-verified/edited extraction data. Transitions status from EXTRACTED t "numericValue": 8200, "confidence": 0.72, "confidenceLevel": "MEDIUM", - "isUserEdited": true + "isUserEdited": true, + "isReviewed": true + } + ], + "unmappedItems": [ + { + "rawLabel": "State tax adjustment", + "rawValue": "$1,200", + "numericValue": 1200, + "confidence": 0.65, + "pageNumber": 3, + "resolution": "discarded", + "assignedBoxNumber": null } ] } @@ -160,6 +173,8 @@ Submit user-verified/edited extraction data. Transitions status from EXTRACTED t | ------ | ----------------------------------------------- | | 400 | Import session is not in EXTRACTED status | | 400 | Fields array is empty | +| 400 | Medium/low-confidence fields not all reviewed (isReviewed must be true) | +| 400 | Unmapped items not all resolved (each must be 'assigned' or 'discarded') | | 404 | Import session not found | --- @@ -379,3 +394,132 @@ Reset a partnership's cell mappings to IRS defaults (deletes all custom mappings | `partnershipId` | `string` | Yes | Partnership UUID | **Response**: `200 OK` + +--- + +## Aggregation Rule Endpoints + +### GET /api/v1/cell-mapping/aggregation-rules + +Get aggregation rules for a partnership (with global defaults for partnerships without custom rules). 
+ +**Permission**: `readKDocument` + +**Query Parameters**: + +| Param | Type | Required | Description | +| --------------- | -------- | -------- | ---------------------------------------------- | +| `partnershipId` | `string` | No | Partnership UUID (omit for global defaults) | + +**Response**: `200 OK` + +```json +[ + { + "id": "uuid", + "partnershipId": null, + "name": "Total Ordinary Income", + "operation": "SUM", + "sourceCells": ["1"], + "sortOrder": 1 + }, + { + "id": "uuid", + "partnershipId": null, + "name": "Total Capital Gains", + "operation": "SUM", + "sourceCells": ["8", "9a", "9b", "9c", "10"], + "sortOrder": 2 + }, + { + "id": "uuid", + "partnershipId": null, + "name": "Total Deductions", + "operation": "SUM", + "sourceCells": ["12", "13"], + "sortOrder": 3 + } +] +``` + +--- + +### PUT /api/v1/cell-mapping/aggregation-rules + +Create or update aggregation rules for a partnership. + +**Permission**: `updateKDocument` + +**Request**: `application/json` + +```json +{ + "partnershipId": "uuid", + "rules": [ + { + "name": "Income Summary", + "operation": "SUM", + "sourceCells": ["1", "2", "3", "4b", "5", "6a", "7"] + }, + { + "name": "Total Capital Gains", + "operation": "SUM", + "sourceCells": ["8", "9a", "10"] + } + ] +} +``` + +**Response**: `200 OK` — Updated rules array + +**Errors**: + +| Status | Condition | +| ------ | ---------------------------------------------------- | +| 400 | Source cell box number not found in cell mappings | +| 400 | Duplicate rule name for the same partnership | +| 400 | Empty sourceCells array | + +--- + +### GET /api/v1/cell-mapping/aggregation-rules/compute + +Compute aggregation values for a specific KDocument's data. Returns the dynamically calculated totals. 
+ +**Permission**: `readKDocument` + +**Query Parameters**: + +| Param | Type | Required | Description | +| --------------- | -------- | -------- | ---------------------------------------------- | +| `kDocumentId` | `string` | Yes | KDocument UUID to compute aggregates for | +| `partnershipId` | `string` | No | Override which partnership's rules to use | + +**Response**: `200 OK` + +```json +[ + { + "ruleId": "uuid", + "name": "Income Summary", + "operation": "SUM", + "sourceCells": ["1", "2", "3", "4b", "5", "6a", "7"], + "computedValue": 187520.00, + "breakdown": { + "1": 52340, + "2": 35000, + "3": 0, + "4b": 15000, + "5": 8200, + "6a": 72980, + "7": 4000 + } + } +] +``` + +**Errors**: + +| Status | Condition | +| ------ | --------------------------- | +| 404 | KDocument not found | diff --git a/specs/004-k1-scan-import/data-model.md b/specs/004-k1-scan-import/data-model.md index 419cef2db..94c0c04ac 100644 --- a/specs/004-k1-scan-import/data-model.md +++ b/specs/004-k1-scan-import/data-model.md @@ -1,10 +1,10 @@ # Data Model: K-1 PDF Scan Import -**Phase 1 Output** | **Date**: 2026-03-18 +**Phase 1 Output** | **Date**: 2026-03-18 | **Updated**: 2026-03-18 (post-clarification) ## Overview -This feature adds 2 new Prisma models and 1 new enum to support K-1 PDF scanning, import session tracking, and cell mapping configuration. It extends the existing models from spec 001-family-office-transform (KDocument, Distribution, Document, PartnershipMembership) with automatic creation from scanned data. +This feature adds 3 new Prisma models and 1 new enum to support K-1 PDF scanning, import session tracking, cell mapping configuration, and aggregation rules. It extends the existing models from spec 001-family-office-transform (KDocument, Distribution, Document, PartnershipMembership) with automatic creation from scanned data. 
### Entity Relationship Diagram (Conceptual) @@ -18,9 +18,12 @@ User (existing) │ └── [K-1 allocations computed at confirm time] ├── KDocument[] (existing from 001) │ └── Distribution[] (auto-created from Box 19, existing from 001) - └── CellMapping[] (new model, per-partnership overrides) + ├── CellMapping[] (new model, per-partnership overrides) + └── CellAggregationRule[] (new model, per-partnership or global) + └── [computed totals derived dynamically from raw box values] Global CellMapping (partnershipId = null) ── IRS default box definitions +Global CellAggregationRule (partnershipId = null) ── default summary rules ``` ## New Enum @@ -93,16 +96,40 @@ A configuration defining how K-1 box numbers map to labels. Supports a global IR **Unique constraint**: `@@unique([partnershipId, boxNumber])` — one mapping per box per partnership (or per box globally when partnershipId is null). +### CellAggregationRule + +A named rule that combines multiple K-1 cells into a computed summary value. Computed totals are NOT stored — they are derived dynamically from raw box values each time they are displayed (FR-039). 
+ +| Field | Type | Constraints | Description | +| --------------- | ---------- | -------------------------------------- | --------------------------------------------------------------- | +| `id` | `String` | PK, UUID, auto-generated | Unique identifier | +| `partnershipId` | `String?` | FK → Partnership.id, optional, indexed | Partnership this rule applies to (null = global default) | +| `name` | `String` | Required | Display name (e.g., "Income Summary", "Total Capital Gains") | +| `operation` | `String` | Required, Default: "SUM" | Aggregation operation (SUM for V1; future: AVG, MIN, MAX) | +| `sourceCells` | `Json` | Required | Array of box numbers to aggregate (e.g., ["1", "2", "3"]) | +| `sortOrder` | `Int` | Required | Display order in the aggregation summary section | +| `createdAt` | `DateTime` | Default: now() | Creation timestamp | +| `updatedAt` | `DateTime` | Auto-updated | Last modification timestamp | + +**Relations**: + +- `partnership` → `Partnership?` (many-to-one, optional, cascade delete) + +**Unique constraint**: `@@unique([partnershipId, name])` — one rule per name per partnership (or globally). + +**Note**: No `computedValue` column. Totals are always computed on-the-fly from the KDocument's raw box values using the `sourceCells` array and `operation`. This ensures summaries auto-update when underlying values change (e.g., estimated→final K-1 transition). 
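Reviewer sketch for the `CellAggregationRule` note above: one way the on-the-fly computation could look. This is a minimal illustration, not the project's `k1-aggregation.service.ts`. The `computeAggregation` helper and the `AggregationRule`/`AggregationResult` types are hypothetical names; only `name`, `operation`, `sourceCells`, `computedValue`, and `breakdown` come from the spec. It uses plain `number` addition for brevity, whereas a real implementation would probably use the repo's `big.js` dependency for decimal math.

```typescript
// Hypothetical shapes; field names follow the CellAggregationRule table and the
// /aggregation-rules/compute response, everything else is an assumption.
interface AggregationRule {
  name: string;
  operation: "SUM"; // SUM is the only operation defined for V1
  sourceCells: string[];
}

interface AggregationResult extends AggregationRule {
  computedValue: number;
  breakdown: Record<string, number>;
}

// Derives a total from the raw box values each time it is called.
// Nothing is persisted, so edits to box values are reflected on the next call.
function computeAggregation(
  rule: AggregationRule,
  boxValues: Record<string, number | null | undefined>
): AggregationResult {
  const breakdown: Record<string, number> = {};
  let total = 0;
  for (const box of rule.sourceCells) {
    // A source cell with no value in the KDocument contributes 0 (validation rule 9).
    const value = boxValues[box] ?? 0;
    breakdown[box] = value;
    total += value;
  }
  return {
    name: rule.name,
    operation: rule.operation,
    sourceCells: rule.sourceCells,
    computedValue: total,
    breakdown,
  };
}

// Usage with the figures from the compute endpoint example; box "3" is absent on purpose.
const incomeSummary = computeAggregation(
  {
    name: "Income Summary",
    operation: "SUM",
    sourceCells: ["1", "2", "3", "4b", "5", "6a", "7"],
  },
  { "1": 52340, "2": 35000, "4b": 15000, "5": 8200, "6a": 72980, "7": 4000 }
);
console.log(incomeSummary.computedValue); // 187520
```

Because the function takes the box values as an argument, the same helper could serve both the verification screen (live edits) and the KDocument detail view.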
+ ## Modifications to Existing Models ### Partnership (from spec 001) Add back-references — no column changes: -| New Field | Type | Description | -| ----------------- | -------------------- | ------------------------------------ | -| `importSessions` | `K1ImportSession[]` | Import attempts for this partnership | -| `cellMappings` | `CellMapping[]` | Custom cell mapping configurations | +| New Field | Type | Description | +| -------------------- | ------------------------ | ------------------------------------ | +| `importSessions` | `K1ImportSession[]` | Import attempts for this partnership | +| `cellMappings` | `CellMapping[]` | Custom cell mapping configurations | +| `aggregationRules` | `CellAggregationRule[]` | Custom aggregation rule definitions | ### KDocument (from spec 001) @@ -131,9 +158,12 @@ interface K1ExtractionResult { isFinal: boolean; }; - /** Extracted box values */ + /** Extracted box values — mapped to known cells */ fields: K1ExtractedField[]; + /** Extracted values that didn't match any configured cell mapping */ + unmappedItems: K1UnmappedItem[]; + /** Overall extraction confidence (0.0–1.0) */ overallConfidence: number; @@ -168,6 +198,32 @@ interface K1ExtractedField { /** Whether user has manually edited this value */ isUserEdited: boolean; + + /** Whether user has explicitly reviewed this field (required for medium/low confidence) */ + isReviewed: boolean; +} + +interface K1UnmappedItem { + /** Raw text label extracted from the PDF */ + rawLabel: string; + + /** Raw text value extracted */ + rawValue: string; + + /** Parsed numeric value (null if unparseable) */ + numericValue: number | null; + + /** Confidence score (0.0–1.0) */ + confidence: number; + + /** Page number where this was extracted */ + pageNumber: number; + + /** User action: 'assigned' (to a cell), 'discarded', or null (pending) */ + resolution: 'assigned' | 'discarded' | null; + + /** If assigned, the box number it was assigned to */ + assignedBoxNumber: string | null; } 
``` @@ -239,3 +295,6 @@ The standard box definitions seeded as global CellMapping records (partnershipId 6. **Confirmation prerequisites**: Can only confirm when status is VERIFIED, partnership has at least one active member, and verifiedData is not null. 7. **Duplicate KDocument check**: Before creating a KDocument, check for existing (partnershipId, type=K1, taxYear). If found, require explicit user decision (update existing or reject). 8. **Distribution allocation**: Box 19a/19b amounts are allocated to members by ownership percentage as of the tax year's fiscal year end. Allocation amounts must sum exactly to the partnership-level total (handle rounding by adjusting the largest member's allocation). +9. **Aggregation rule source cells**: All box numbers in `sourceCells` must reference valid cell mapping entries. If a source cell has no value in the KDocument, it contributes 0 to the aggregate. +10. **Unmapped items resolution**: All unmapped items must be resolved (assigned to a cell or discarded) before the import session can transition to VERIFIED status. +11. **Review requirement**: All medium and low-confidence fields must have `isReviewed: true` before confirmation is allowed (FR-035). High-confidence fields are auto-set to `isReviewed: true`. diff --git a/specs/004-k1-scan-import/plan.md b/specs/004-k1-scan-import/plan.md index c2baf2eec..8e948291d 100644 --- a/specs/004-k1-scan-import/plan.md +++ b/specs/004-k1-scan-import/plan.md @@ -5,7 +5,7 @@ ## Summary -Automated K-1 PDF scanning that extracts structured IRS Schedule K-1 (Form 1065) data from uploaded PDFs, presents a verification screen for manual review/correction, and auto-creates downstream model objects (KDocument, Distributions, member allocations, Document). Uses a two-tier extraction approach: `pdf-parse` for digital PDFs (free, instant, local) and Azure AI Document Intelligence / `tesseract.js` fallback for scanned PDFs. 
Supports per-partnership cell mapping customization and import history with re-processing. +Automated K-1 PDF scanning that extracts structured IRS Schedule K-1 (Form 1065) data from uploaded PDFs, presents a verification screen with auto-accepted high-confidence values and explicit review for medium/low-confidence fields, and auto-creates downstream model objects (KDocument, Distributions, member allocations, Document). Uses a two-tier extraction approach: `pdf-parse` for digital PDFs (free, instant, local) and Azure AI Document Intelligence / `tesseract.js` fallback for scanned PDFs. Supports per-partnership cell mapping customization, administrator-defined aggregation rules (dynamically computed summaries displayed on verification screen and KDocument detail view), an "Unmapped Items" section for unrecognized extractions, and import history with re-processing. ## Technical Context @@ -17,7 +17,7 @@ Automated K-1 PDF scanning that extracts structured IRS Schedule K-1 (Form 1065) **Project Type**: Web application (NestJS API + Angular SPA) — Nx monorepo **Performance Goals**: PDF extraction < 30 seconds (SC-001), model creation < 5 seconds (SC-005), 90%+ accuracy for digital PDFs (SC-002) **Constraints**: Self-hosted capable (Azure OCR optional), max PDF size 25 MB, K-1 Form 1065 only (V1) -**Scale/Scope**: Single family office (10–50 partnerships, 10–50 K-1s/year), 2 new API modules, 3 new frontend pages +**Scale/Scope**: Single family office (10–50 partnerships, 10–50 K-1s/year), 2 new API modules, 4 new frontend pages ## Constitution Check @@ -29,11 +29,11 @@ No constitution.md exists for this project. 
Gates assessed against standard engi |------|--------|-------| | No unnecessary dependencies | PASS | 3 new packages (`pdf-parse`, `@azure/ai-form-recognizer`, `tesseract.js`) — each serves a distinct, justified purpose per research.md | | Follows existing patterns | PASS | New NestJS modules follow existing controller/service/DTO pattern (mirrors `k-document`, `upload` modules) | -| No breaking changes | PASS | 2 new Prisma models + 1 enum, back-references only on existing models — no column changes | -| Test coverage | PASS | Unit tests for extractors, mapper, allocation; integration tests for full pipeline | +| No breaking changes | PASS | 3 new Prisma models + 1 enum, back-references only on existing models — no column changes | +| Test coverage | PASS | Unit tests for extractors, mapper, allocation, aggregation; integration tests for full pipeline | | Self-hosted compatible | PASS | Core extraction (pdf-parse) is fully local; Azure is optional with tesseract.js fallback | -**Post-Phase 1 re-check**: PASS — data model adds 2 models/1 enum, no existing schema changes beyond back-references. API contracts follow existing REST patterns. No violations identified. +**Post-Phase 1 re-check**: PASS — data model adds 3 models/1 enum (K1ImportSession, CellMapping, CellAggregationRule, K1ImportStatus). No existing schema changes beyond back-references. API contracts follow existing REST patterns. Aggregation rules are dynamically computed — no stored denormalization. No violations identified. ## Project Structure @@ -43,7 +43,7 @@ No constitution.md exists for this project. 
Gates assessed against standard engi specs/004-k1-scan-import/ ├── plan.md # This file ├── research.md # Phase 0: OCR provider research & decisions -├── data-model.md # Phase 1: K1ImportSession, CellMapping models +├── data-model.md # Phase 1: K1ImportSession, CellMapping, CellAggregationRule models ├── quickstart.md # Phase 1: Setup & dev guide ├── contracts/ │ └── k1-import-api.md # Phase 1: REST API contracts @@ -71,10 +71,11 @@ apps/api/src/app/ │ │ └── tesseract-extractor.ts │ ├── k1-field-mapper.service.ts │ ├── k1-allocation.service.ts -│ └── k1-confidence.service.ts +│ ├── k1-confidence.service.ts +│ └── k1-aggregation.service.ts # Dynamically computes aggregation summaries ├── cell-mapping/ │ ├── cell-mapping.module.ts -│ ├── cell-mapping.controller.ts +│ ├── cell-mapping.controller.ts # Cell mapping + aggregation rule CRUD │ └── cell-mapping.service.ts apps/client/src/app/ @@ -85,17 +86,19 @@ apps/client/src/app/ │ │ ├── k1-import-page.scss │ │ ├── k1-import-page.routes.ts │ │ ├── k1-verification/ -│ │ │ ├── k1-verification.component.ts +│ │ │ ├── k1-verification.component.ts # Mapped cells + unmapped items + aggregations │ │ │ ├── k1-verification.html │ │ │ └── k1-verification.scss │ │ └── k1-confirmation/ │ │ ├── k1-confirmation.component.ts │ │ ├── k1-confirmation.html │ │ └── k1-confirmation.scss -│ └── cell-mapping/ -│ ├── cell-mapping-page.component.ts -│ ├── cell-mapping-page.html -│ └── cell-mapping-page.routes.ts +│ ├── cell-mapping/ +│ │ ├── cell-mapping-page.component.ts # Cell mapping + aggregation rule config +│ │ ├── cell-mapping-page.html +│ │ └── cell-mapping-page.routes.ts +│ └── k-document/ # Existing page — extended +│ └── k-document-detail/ # Add aggregation summary section (FR-036) ├── services/ │ └── k1-import-data.service.ts @@ -109,7 +112,7 @@ libs/common/src/lib/ │ └── confirm-k1-import.dto.ts prisma/ -├── schema.prisma # + K1ImportSession, CellMapping, K1ImportStatus +├── schema.prisma # + K1ImportSession, CellMapping, 
CellAggregationRule, K1ImportStatus ├── migrations/ │ └── 2026XXXX_added_k1_import/ # New migration @@ -118,4 +121,4 @@ test/import/ └── sample-k1-scanned.pdf # Test fixture: scanned K-1 ``` -**Structure Decision**: Follows the existing Nx monorepo convention with new NestJS modules under `apps/api/src/app/` and new Angular pages under `apps/client/src/app/pages/`. Shared interfaces and DTOs in `libs/common/`. This mirrors the existing `k-document`, `upload`, and `family-office` module patterns. +**Structure Decision**: Follows the existing Nx monorepo convention with new NestJS modules under `apps/api/src/app/` and new Angular pages under `apps/client/src/app/pages/`. Shared interfaces and DTOs in `libs/common/`. This mirrors the existing `k-document`, `upload`, and `family-office` module patterns. The KDocument detail view is extended (not replaced) to display aggregation summaries. diff --git a/specs/004-k1-scan-import/quickstart.md b/specs/004-k1-scan-import/quickstart.md index 348f51739..395665f5e 100644 --- a/specs/004-k1-scan-import/quickstart.md +++ b/specs/004-k1-scan-import/quickstart.md @@ -1,6 +1,6 @@ # Quickstart: K-1 PDF Scan Import -**Phase 1 Output** | **Date**: 2026-03-18 +**Phase 1 Output** | **Date**: 2026-03-18 | **Updated**: 2026-03-18 (post-clarification) ## Prerequisites @@ -28,7 +28,7 @@ npm install -D @types/pdf-parse ## Database Migration -After adding the new Prisma models (`K1ImportSession`, `CellMapping`, `K1ImportStatus` enum): +After adding the new Prisma models (`K1ImportSession`, `CellMapping`, `CellAggregationRule`, `K1ImportStatus` enum): ```bash npx prisma db push # Development: sync schema @@ -36,7 +36,7 @@ npx prisma db push # Development: sync schema npx prisma migrate dev # Create a migration file ``` -Seed the default IRS cell mappings (28 rows with partnershipId = null) via the existing seed mechanism or a dedicated seed script. 
+Seed the default IRS cell mappings (28 rows with partnershipId = null) and default aggregation rules (e.g., "Total Ordinary Income", "Total Capital Gains", "Total Deductions") via the existing seed mechanism or a dedicated seed script. ## Key Files to Create @@ -58,12 +58,13 @@ app/k1-import/ │ └── tesseract-extractor.ts # Tier 2 fallback: tesseract.js OCR ├── k1-field-mapper.service.ts # Maps raw extraction → K1ExtractedField[] ├── k1-allocation.service.ts # Allocates K-1 amounts to members by ownership % -└── k1-confidence.service.ts # Computes confidence scores with validation heuristics +├── k1-confidence.service.ts # Computes confidence scores with validation heuristics +└── k1-aggregation.service.ts # Dynamically computes aggregation summaries from rules app/cell-mapping/ ├── cell-mapping.module.ts # NestJS module -├── cell-mapping.controller.ts # CRUD for cell mappings -└── cell-mapping.service.ts # Cell mapping business logic + seed data +├── cell-mapping.controller.ts # CRUD for cell mappings + aggregation rules +└── cell-mapping.service.ts # Cell mapping + aggregation rule business logic + seed data ``` ### Shared Types (libs/common/src/lib/) @@ -87,7 +88,7 @@ pages/k1-import/ ├── k1-import-page.scss ├── k1-import-page.routes.ts ├── k1-verification/ -│ ├── k1-verification.component.ts # Verification/edit screen +│ ├── k1-verification.component.ts # Verification/edit screen (mapped + unmapped + aggregations) │ ├── k1-verification.html │ └── k1-verification.scss └── k1-confirmation/ @@ -96,7 +97,7 @@ pages/k1-import/ └── k1-confirmation.scss pages/cell-mapping/ -├── cell-mapping-page.component.ts # Cell mapping configuration UI +├── cell-mapping-page.component.ts # Cell mapping + aggregation rule configuration UI ├── cell-mapping-page.html └── cell-mapping-page.routes.ts @@ -108,13 +109,18 @@ services/ 1. **Upload**: User selects PDF → `POST /api/v1/k1-import/upload` → session created with status PROCESSING 2. 
**Extract**: Backend detects PDF type (digital vs. scanned) → routes to appropriate extractor → status becomes EXTRACTED -3. **Review**: Frontend polls/fetches session → displays verification screen with extracted fields, confidence indicators -4. **Edit**: User corrects values, overrides labels → `PUT /api/v1/k1-import/:id/verify` → status becomes VERIFIED +3. **Review**: Frontend polls/fetches session → displays verification screen with: + - **Mapped cells**: extracted fields with confidence indicators. High-confidence values are pre-accepted. Medium/low-confidence values require explicit review (acknowledge or edit). + - **Unmapped items**: separate section for values that didn't match any cell. User assigns to a cell or discards. + - **Aggregation summaries**: dynamically computed from mapped values using aggregation rules. Recalculate live when cell values are edited. +4. **Verify**: User reviews all medium/low fields and resolves unmapped items → `PUT /api/v1/k1-import/:id/verify` → status becomes VERIFIED 5. 
**Confirm**: User clicks "Confirm & Save" → `POST /api/v1/k1-import/:id/confirm` → KDocument + Distributions + Document created → status becomes CONFIRMED ## Testing Strategy -- **Unit tests**: Extractors (pdf-parse, azure, tesseract), field mapper, confidence scoring, allocation math +- **Unit tests**: Extractors (pdf-parse, azure, tesseract), field mapper, confidence scoring, allocation math, aggregation computation - **Integration tests**: Full upload → extract → verify → confirm flow with test PDF fixtures - **Test fixtures**: Include sample K-1 PDFs (digital and scanned) in `test/import/` directory - **Allocation accuracy**: Verify rounding behavior — allocated amounts must sum exactly to partnership total +- **Aggregation tests**: Verify dynamic computation from rules, auto-recalculation on value edit, behavior when source cells are empty +- **Review enforcement**: Verify confirmation blocked when medium/low-confidence fields not reviewed or unmapped items unresolved diff --git a/specs/004-k1-scan-import/research.md b/specs/004-k1-scan-import/research.md index 17be04f4c..976dc8c50 100644 --- a/specs/004-k1-scan-import/research.md +++ b/specs/004-k1-scan-import/research.md @@ -152,3 +152,54 @@ AZURE_DOCUMENT_INTELLIGENCE_KEY — Azure API key **Alternatives Considered**: - `pdfjs-dist` directly instead of `pdf-parse` — more boilerplate, `pdf-parse` wraps it with a simpler API - Only cloud OCR — loses the self-hosted story and adds cost for digital PDFs + +--- + +## Decision 10: Cell Aggregation Rules — Dynamic Computation + +**Decision**: Persist only aggregation rule definitions (name, source cells, operation). Compute totals dynamically from raw K-1 box values at display time. Do NOT store computed totals. 
+ +**Rationale**: +- K-1 values can change during the import lifecycle (estimated → final transitions, manual edits after confirmation) +- Storing computed totals creates a denormalization risk — stale aggregates when underlying values change +- Computation is trivial (summing a handful of numbers) with no performance concern at family office scale +- Keeps a single source of truth: the raw box values in K1Data +- Aggregation rules are displayed on both the verification screen (FR-033) and KDocument detail view (FR-036) + +**Alternatives Considered**: +- Persist computed totals alongside raw data — creates stale data risk, requires update triggers +- Persist both (snapshot + live) for audit — adds complexity V1 doesn't need; audit trail exists in import session history + +--- + +## Decision 11: Unmapped Items Handling + +**Decision**: Display extracted values that don't match any configured cell mapping in a separate "Unmapped Items" section on the verification screen. Administrator can assign to an existing cell, create a new custom cell, or discard. + +**Rationale**: +- OCR/extraction may pull supplemental schedule items, footnotes, state-specific addenda +- Silently discarding loses potentially important data +- Auto-creating cells for every unmatched value creates noise +- Explicit user decision preserves data integrity while keeping mapped cells clean +- Assigned unmapped items update the cell mapping for future imports (learning effect) + +**Alternatives Considered**: +- Silent discard — loses data, violates user's expectation of completeness +- Auto-create custom cells — too noisy; PDF footnotes and headers would create junk cells + +--- + +## Decision 12: Verification Auto-Accept Strategy + +**Decision**: Auto-accept (pre-check) high-confidence values on the verification screen. Require explicit review (acknowledge or edit) for medium and low-confidence values before allowing confirmation. 
+ +**Rationale**: +- V1 is "partially manual, partially automated" per user intent +- High-confidence values (≥ 0.85) from digital PDFs are reliably accurate (90%+ per SC-002) +- Forcing explicit review of every cell wastes time on correct values +- Blocking confirmation until medium/low-confidence fields are reviewed catches the errors +- All values remain visible and editable — user can override any pre-accepted value + +**Alternatives Considered**: +- Every cell requires explicit accept — too slow for 15+ fields, doesn't match "partially automated" intent +- Spot-check model (everything auto-accepted) — too risky for tax data; OCR errors would go unreviewed
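
Reviewer sketch for Decisions 11 and 12: a possible shape for the auto-accept step and the check behind the two new `400` responses on the verify endpoint. The function names `applyAutoAccept` and `findVerificationBlockers` and the trimmed-down interfaces are hypothetical. The sketch assumes `confidenceLevel` has already been derived from the numeric score elsewhere (Decision 12 only states the high threshold of 0.85), and box "1" in the usage is an illustrative value.

```typescript
type ConfidenceLevel = "HIGH" | "MEDIUM" | "LOW";

// Trimmed-down versions of K1ExtractedField and K1UnmappedItem (assumed subset).
interface ReviewableField {
  boxNumber: string;
  confidenceLevel: ConfidenceLevel;
  isReviewed: boolean;
}

interface UnmappedItem {
  rawLabel: string;
  resolution: "assigned" | "discarded" | null;
}

// Pre-accepts HIGH-confidence fields; MEDIUM and LOW fields are returned unchanged.
function applyAutoAccept(fields: ReviewableField[]): ReviewableField[] {
  return fields.map((field) =>
    field.confidenceLevel === "HIGH" ? { ...field, isReviewed: true } : field
  );
}

// Returns human-readable reasons why verification must be rejected.
// An empty array means the session may move to VERIFIED.
function findVerificationBlockers(
  fields: ReviewableField[],
  unmappedItems: UnmappedItem[]
): string[] {
  const blockers: string[] = [];
  for (const field of fields) {
    if (field.confidenceLevel !== "HIGH" && !field.isReviewed) {
      blockers.push(`Box ${field.boxNumber} has not been reviewed`);
    }
  }
  for (const item of unmappedItems) {
    if (item.resolution === null) {
      blockers.push(`Unmapped item "${item.rawLabel}" is unresolved`);
    }
  }
  return blockers;
}

// Usage: one HIGH field, one MEDIUM field, one pending unmapped item.
const extracted = applyAutoAccept([
  { boxNumber: "1", confidenceLevel: "HIGH", isReviewed: false },
  { boxNumber: "11", confidenceLevel: "MEDIUM", isReviewed: false },
]);
const pending: UnmappedItem[] = [
  { rawLabel: "State tax adjustment", resolution: null },
];
const blockersBefore = findVerificationBlockers(extracted, pending);

// After the user acknowledges box 11 and discards the unmapped item:
const reviewed = extracted.map((field) => ({ ...field, isReviewed: true }));
const resolved: UnmappedItem[] = [
  { rawLabel: "State tax adjustment", resolution: "discarded" },
];
const blockersAfter = findVerificationBlockers(reviewed, resolved);

console.log(blockersBefore.length, blockersAfter.length); // 2 0
```

Returning a list of blockers rather than a boolean would let the API put the specific failing fields in the `400` response body, though the contract above does not specify an error payload.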