plan: update K-1 scan import plan with clarification decisions

Post-clarification updates (5 decisions integrated):
- research.md: +3 decisions (aggregation, unmapped items, auto-accept)
- data-model.md: +CellAggregationRule model, K1UnmappedItem interface, isReviewed field
- contracts/k1-import-api.md: +3 aggregation rule endpoints, unmapped items in verify
- quickstart.md: +aggregation service, updated workflow, updated tests
- plan.md: rebuilt with updated summary, 3 models/1 enum, extended project structure
- copilot-instructions.md: agent context refreshed
4 months ago · a759b94ada
6 changed files with 301 additions and 38 deletions
--- a/.github/agents/copilot-instructions.md
+++ b/.github/agents/copilot-instructions.md
@ -28,9 +28,9 @@ TypeScript 5.9.2, Node.js ≥22.18.0: Follow standard conventions
 ## Recent Changes
 - 004-k1-scan-import: Added TypeScript 5.9.2, Node.js ≥ 22.18.0 + NestJS 11.x (backend), Angular 21.x (frontend), Prisma 6.x (ORM), pdf-parse (PDF text), @azure/ai-form-recognizer (cloud OCR), tesseract.js (local OCR fallback)
 - 004-k1-scan-import: Added TypeScript 5.9.2, Node.js ≥ 22.18.0 + NestJS 11.x (backend), Angular 21.x (frontend), Prisma 6.x (ORM), pdf-parse (PDF text), @azure/ai-form-recognizer (cloud OCR), tesseract.js (local OCR fallback)
 - 003-portfolio-performance-views: Added TypeScript 5.9.2, Node.js >= 22.18.0 + Angular 21.1.1, NestJS 11.1.14, Angular Material 21.1.1, Prisma 6.19.0, big.js, date-fns 4.1.0
 - 001-family-office-transform: Added TypeScript 5.9.2, Node.js ≥22.18.0 + NestJS 11.1.14 (API), Angular 21.1.1 + Angular Material 21.1.1 (client), Prisma 6.19.0 (ORM), Nx 22.5.3 (monorepo), big.js (decimal math), date-fns 4.1.0, chart.js 4.5.1, Bull 4.16.5 (job queues), Redis (caching), yahoo-finance2 3.13.2
 <!-- MANUAL ADDITIONS START -->
 <!-- MANUAL ADDITIONS END -->
--- a/specs/004-k1-scan-import/contracts/k1-import-api.md
+++ b/specs/004-k1-scan-import/contracts/k1-import-api.md
@ -1,6 +1,6 @@
 # API Contracts: K-1 Import
-**Phase 1 Output** | **Date**: 2026-03-18
+**Phase 1 Output** | **Date**: 2026-03-18 | **Updated**: 2026-03-18 (post-clarification)
 ## Base Path
@ -136,7 +136,8 @@ Submit user-verified/edited extraction data. Transitions status from EXTRACTED t
      "numericValue": 52340,
      "confidence": 0.95,
      "confidenceLevel": "HIGH",
-      "isUserEdited": false
+      "isUserEdited": false,
      "isReviewed": true
    },
    {
      "boxNumber": "11",
@ -146,7 +147,19 @@ Submit user-verified/edited extraction data. Transitions status from EXTRACTED t
      "numericValue": 8200,
      "confidence": 0.72,
      "confidenceLevel": "MEDIUM",
-      "isUserEdited": true
+      "isUserEdited": true,
      "isReviewed": true
    }
  ],
  "unmappedItems": [
    {
      "rawLabel": "State tax adjustment",
      "rawValue": "$1,200",
      "numericValue": 1200,
      "confidence": 0.65,
      "pageNumber": 3,
      "resolution": "discarded",
      "assignedBoxNumber": null
    }
  ]
 }
@ -160,6 +173,8 @@ Submit user-verified/edited extraction data. Transitions status from EXTRACTED t
 | ------ | ----------------------------------------------- |
 | 400    | Import session is not in EXTRACTED status        |
 | 400    | Fields array is empty                            |
 | 400    | Medium/low-confidence fields not all reviewed (isReviewed must be true) |
 | 400    | Unmapped items not all resolved (each must be 'assigned' or 'discarded') |
 | 404    | Import session not found                         |
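The three payload-level 400 conditions in the table above could be checked along the following lines. This is only a sketch: the function name `validateVerifyPayload`, the trimmed-down interfaces, and the `'LOW'` level (the examples in this contract only show `HIGH` and `MEDIUM`) are assumptions, not part of the contract.

```typescript
// Hypothetical shapes, reduced to the properties the checks need.
type ConfidenceLevel = 'HIGH' | 'MEDIUM' | 'LOW'; // 'LOW' is assumed

interface VerifyField {
  boxNumber: string;
  confidenceLevel: ConfidenceLevel;
  isReviewed: boolean;
}

interface VerifyUnmappedItem {
  rawLabel: string;
  resolution: 'assigned' | 'discarded' | null;
}

// Returns one message per violated rule; an empty array means the payload passes.
function validateVerifyPayload(
  fields: VerifyField[],
  unmappedItems: VerifyUnmappedItem[],
): string[] {
  const errors: string[] = [];

  if (fields.length === 0) {
    errors.push('Fields array is empty');
  }

  // Only medium/low-confidence fields need an explicit review flag.
  const unreviewed = fields.filter(
    (f) => f.confidenceLevel !== 'HIGH' && !f.isReviewed,
  );
  if (unreviewed.length > 0) {
    const boxes = unreviewed.map((f) => f.boxNumber).join(', ');
    errors.push(`Medium/low-confidence fields not all reviewed: ${boxes}`);
  }

  // A null resolution means the user has not yet assigned or discarded the item.
  const pending = unmappedItems.filter((u) => u.resolution === null);
  if (pending.length > 0) {
    const labels = pending.map((u) => u.rawLabel).join(', ');
    errors.push(`Unmapped items not all resolved: ${labels}`);
  }

  return errors;
}
```

Collecting every violation before responding, rather than stopping at the first, lets the verification screen highlight all outstanding fields in one round trip; whether the real endpoint does this is not specified here.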
 ---
@ -379,3 +394,132 @@ Reset a partnership's cell mappings to IRS defaults (deletes all custom mappings
 | `partnershipId` | `string` | Yes      | Partnership UUID   |
 **Response**: `200 OK`
 ---
 ## Aggregation Rule Endpoints
 ### GET /api/v1/cell-mapping/aggregation-rules
 Get the aggregation rules for a partnership. Partnerships without custom rules receive the global defaults.
 **Permission**: `readKDocument`
 **Query Parameters**:
 | Param           | Type     | Required | Description                                    |
 | --------------- | -------- | -------- | ---------------------------------------------- |
 | `partnershipId` | `string` | No       | Partnership UUID (omit for global defaults)     |
 **Response**: `200 OK`
 ```json
 [
  {
    "id": "uuid",
    "partnershipId": null,
    "name": "Total Ordinary Income",
    "operation": "SUM",
    "sourceCells": ["1"],
    "sortOrder": 1
  },
  {
    "id": "uuid",
    "partnershipId": null,
    "name": "Total Capital Gains",
    "operation": "SUM",
    "sourceCells": ["8", "9a", "9b", "9c", "10"],
    "sortOrder": 2
  },
  {
    "id": "uuid",
    "partnershipId": null,
    "name": "Total Deductions",
    "operation": "SUM",
    "sourceCells": ["12", "13"],
    "sortOrder": 3
  }
 ]
 ```
 ---
 ### PUT /api/v1/cell-mapping/aggregation-rules
 Create or update aggregation rules for a partnership.
 **Permission**: `updateKDocument`
 **Request**: `application/json`
 ```json
 {
  "partnershipId": "uuid",
  "rules": [
    {
      "name": "Income Summary",
      "operation": "SUM",
      "sourceCells": ["1", "2", "3", "4b", "5", "6a", "7"]
    },
    {
      "name": "Total Capital Gains",
      "operation": "SUM",
      "sourceCells": ["8", "9a", "10"]
    }
  ]
 }
 ```
 **Response**: `200 OK` — Updated rules array
 **Errors**:
 | Status | Condition                                            |
 | ------ | ---------------------------------------------------- |
 | 400    | Source cell box number not found in cell mappings     |
 | 400    | Duplicate rule name for the same partnership          |
 | 400    | Empty sourceCells array                               |
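As a rough illustration of the three error conditions above, the request body could be validated before anything is written. The names `AggregationRuleInput`, `validateAggregationRules`, and `knownBoxNumbers` are hypothetical; the set of known box numbers is assumed to come from the cell mappings that apply to the partnership.

```typescript
// Hypothetical input shape matching one entry of the "rules" array in the request.
interface AggregationRuleInput {
  name: string;
  operation: string;
  sourceCells: string[];
}

// Returns one message per problem found; an empty array means the rules are acceptable.
function validateAggregationRules(
  rules: AggregationRuleInput[],
  knownBoxNumbers: Set<string>,
): string[] {
  const errors: string[] = [];
  const seenNames = new Set<string>();

  for (const rule of rules) {
    // Rule names must be unique within one partnership's rule set.
    if (seenNames.has(rule.name)) {
      errors.push(`Duplicate rule name: ${rule.name}`);
    }
    seenNames.add(rule.name);

    if (rule.sourceCells.length === 0) {
      errors.push(`Empty sourceCells array in rule: ${rule.name}`);
    }

    // Every source cell must correspond to a configured cell mapping.
    for (const box of rule.sourceCells) {
      if (!knownBoxNumbers.has(box)) {
        errors.push(`Source cell ${box} not found in cell mappings (rule: ${rule.name})`);
      }
    }
  }

  return errors;
}
```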
 ---
 ### GET /api/v1/cell-mapping/aggregation-rules/compute
 Compute aggregation values for a specific KDocument's data. Returns the dynamically calculated totals.
 **Permission**: `readKDocument`
 **Query Parameters**:
 | Param           | Type     | Required | Description                                    |
 | --------------- | -------- | -------- | ---------------------------------------------- |
 | `kDocumentId`   | `string` | Yes      | KDocument UUID to compute aggregates for        |
 | `partnershipId` | `string` | No       | Override which partnership's rules to use        |
 **Response**: `200 OK`
 ```json
 [
  {
    "ruleId": "uuid",
    "name": "Income Summary",
    "operation": "SUM",
    "sourceCells": ["1", "2", "3", "4b", "5", "6a", "7"],
    "computedValue": 187520.00,
    "breakdown": {
      "1": 52340,
      "2": 35000,
      "3": 0,
      "4b": 15000,
      "5": 8200,
      "6a": 72980,
      "7": 4000
    }
  }
 ]
 ```
 **Errors**:
 | Status | Condition                   |
 | ------ | --------------------------- |
 | 404    | KDocument not found         |
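The `compute` response above can be reproduced with a small pure function. The sketch below assumes the KDocument's raw box values are available as a plain record keyed by box number; `computeAggregation` and `boxValues` are illustrative names, and plain `number` arithmetic is used for brevity even though the project lists `big.js` for decimal math.

```typescript
interface AggregationRule {
  ruleId: string;
  name: string;
  operation: 'SUM'; // only SUM exists in V1
  sourceCells: string[];
}

interface AggregationResult extends AggregationRule {
  computedValue: number;
  breakdown: Record<string, number>;
}

// Sums the source cells of one rule; a cell with no value contributes 0.
function computeAggregation(
  rule: AggregationRule,
  boxValues: Record<string, number | undefined>,
): AggregationResult {
  const breakdown: Record<string, number> = {};
  let total = 0;

  for (const box of rule.sourceCells) {
    const value = boxValues[box] ?? 0;
    breakdown[box] = value;
    total += value;
  }

  return { ...rule, computedValue: total, breakdown };
}
```

Because nothing is stored, calling the function again after a box value changes yields the updated total without any invalidation step.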
--- a/specs/004-k1-scan-import/data-model.md
+++ b/specs/004-k1-scan-import/data-model.md
@ -1,10 +1,10 @@
 # Data Model: K-1 PDF Scan Import
-**Phase 1 Output** | **Date**: 2026-03-18
+**Phase 1 Output** | **Date**: 2026-03-18 | **Updated**: 2026-03-18 (post-clarification)
 ## Overview
-This feature adds 2 new Prisma models and 1 new enum to support K-1 PDF scanning, import session tracking, and cell mapping configuration. It extends the existing models from spec 001-family-office-transform (KDocument, Distribution, Document, PartnershipMembership) with automatic creation from scanned data.
+This feature adds 3 new Prisma models and 1 new enum to support K-1 PDF scanning, import session tracking, cell mapping configuration, and aggregation rules. It extends the existing models from spec 001-family-office-transform (KDocument, Distribution, Document, PartnershipMembership) with automatic creation from scanned data.
 ### Entity Relationship Diagram (Conceptual)
@ -18,9 +18,12 @@ User (existing)
      │    └── [K-1 allocations computed at confirm time]
      ├── KDocument[] (existing from 001)
      │    └── Distribution[] (auto-created from Box 19, existing from 001)
-      └── CellMapping[] (new model, per-partnership overrides)
+      ├── CellMapping[] (new model, per-partnership overrides)
      └── CellAggregationRule[] (new model, per-partnership or global)
           └── [computed totals derived dynamically from raw box values]
 Global CellMapping (partnershipId = null) ── IRS default box definitions
 Global CellAggregationRule (partnershipId = null) ── default summary rules
 ```
 ## New Enum
@ -93,6 +96,29 @@ A configuration defining how K-1 box numbers map to labels. Supports a global IR
 **Unique constraint**: `@@unique([partnershipId, boxNumber])` — one mapping per box per partnership (or per box globally when partnershipId is null).
 ### CellAggregationRule
 A named rule that combines multiple K-1 cells into a computed summary value. Computed totals are NOT stored — they are derived dynamically from raw box values each time they are displayed (FR-039).
 | Field           | Type       | Constraints                            | Description                                                     |
 | --------------- | ---------- | -------------------------------------- | --------------------------------------------------------------- |
 | `id`            | `String`   | PK, UUID, auto-generated               | Unique identifier                                               |
 | `partnershipId` | `String?`  | FK → Partnership.id, optional, indexed | Partnership this rule applies to (null = global default)        |
 | `name`          | `String`   | Required                               | Display name (e.g., "Income Summary", "Total Capital Gains")    |
 | `operation`     | `String`   | Required, Default: "SUM"               | Aggregation operation (SUM for V1; future: AVG, MIN, MAX)       |
 | `sourceCells`   | `Json`     | Required                               | Array of box numbers to aggregate (e.g., ["1", "2", "3"])      |
 | `sortOrder`     | `Int`      | Required                               | Display order in the aggregation summary section                |
 | `createdAt`     | `DateTime` | Default: now()                         | Creation timestamp                                              |
 | `updatedAt`     | `DateTime` | Auto-updated                           | Last modification timestamp                                     |
 **Relations**:
 - `partnership` → `Partnership?` (many-to-one, optional, cascade delete)
 **Unique constraint**: `@@unique([partnershipId, name])` — one rule per name per partnership (or globally).
 **Note**: No `computedValue` column. Totals are always computed on-the-fly from the KDocument's raw box values using the `sourceCells` array and `operation`. This ensures summaries auto-update when underlying values change (e.g., estimated→final K-1 transition).
 ## Modifications to Existing Models
 ### Partnership (from spec 001)
@ -100,9 +126,10 @@ A configuration defining how K-1 box numbers map to labels. Supports a global IR
 Add back-references — no column changes:
 | New Field            | Type                     | Description                          |
-| ----------------- | -------------------- | ------------------------------------ |
+| -------------------- | ------------------------ | ------------------------------------ |
 | `importSessions`     | `K1ImportSession[]`      | Import attempts for this partnership |
 | `cellMappings`       | `CellMapping[]`          | Custom cell mapping configurations   |
 | `aggregationRules`   | `CellAggregationRule[]`  | Custom aggregation rule definitions  |
 ### KDocument (from spec 001)
@ -131,9 +158,12 @@ interface K1ExtractionResult {
    isFinal: boolean;
  };
-  /** Extracted box values */
+  /** Extracted box values — mapped to known cells */
  fields: K1ExtractedField[];
  /** Extracted values that didn't match any configured cell mapping */
  unmappedItems: K1UnmappedItem[];
  /** Overall extraction confidence (0.0–1.0) */
  overallConfidence: number;
@ -168,6 +198,32 @@ interface K1ExtractedField {
  /** Whether user has manually edited this value */
  isUserEdited: boolean;
  /** Whether user has explicitly reviewed this field (required for medium/low confidence) */
  isReviewed: boolean;
 }
 interface K1UnmappedItem {
  /** Raw text label extracted from the PDF */
  rawLabel: string;
  /** Raw text value extracted */
  rawValue: string;
  /** Parsed numeric value (null if unparseable) */
  numericValue: number | null;
  /** Confidence score (0.0–1.0) */
  confidence: number;
  /** Page number where this was extracted */
  pageNumber: number;
  /** User action: 'assigned' (to a cell), 'discarded', or null (pending) */
  resolution: 'assigned' | 'discarded' | null;
  /** If assigned, the box number it was assigned to */
  assignedBoxNumber: string | null;
 }
 ```
@ -239,3 +295,6 @@ The standard box definitions seeded as global CellMapping records (partnershipId
 6. **Confirmation prerequisites**: Can only confirm when status is VERIFIED, partnership has at least one active member, and verifiedData is not null.
 7. **Duplicate KDocument check**: Before creating a KDocument, check for existing (partnershipId, type=K1, taxYear). If found, require explicit user decision (update existing or reject).
 8. **Distribution allocation**: Box 19a/19b amounts are allocated to members by ownership percentage as of the tax year's fiscal year end. Allocation amounts must sum exactly to the partnership-level total (handle rounding by adjusting the largest member's allocation).
 9. **Aggregation rule source cells**: All box numbers in `sourceCells` must reference valid cell mapping entries. If a source cell has no value in the KDocument, it contributes 0 to the aggregate.
 10. **Unmapped items resolution**: All unmapped items must be resolved (assigned to a cell or discarded) before the import session can transition to VERIFIED status.
 11. **Review requirement**: All medium and low-confidence fields must have `isReviewed: true` before confirmation is allowed (FR-035). High-confidence fields are auto-set to `isReviewed: true`.
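Rule 8 is the one item in this list with non-obvious arithmetic, so a minimal sketch may help. It assumes amounts are handled as integer cents, that ownership percentages sum to 100, and that "largest member" means the member with the highest ownership percentage; `MemberShare` and `allocateCents` are hypothetical names.

```typescript
interface MemberShare {
  memberId: string;
  ownershipPercent: number; // 0 to 100; assumed to sum to 100 across members
}

// Allocates a partnership-level total (in cents) to members by ownership percentage.
// Any rounding remainder is added to the member with the largest percentage,
// so the allocations always sum exactly to totalCents.
function allocateCents(
  totalCents: number,
  members: MemberShare[],
): Map<string, number> {
  const allocations = new Map<string, number>();
  if (members.length === 0) {
    return allocations;
  }

  let allocated = 0;
  for (const member of members) {
    const share = Math.round((totalCents * member.ownershipPercent) / 100);
    allocations.set(member.memberId, share);
    allocated += share;
  }

  // Pick the member with the highest ownership percentage (first one wins ties).
  const largest = members.reduce((a, b) =>
    b.ownershipPercent > a.ownershipPercent ? b : a,
  );
  const remainder = totalCents - allocated;
  allocations.set(
    largest.memberId,
    (allocations.get(largest.memberId) ?? 0) + remainder,
  );

  return allocations;
}
```

For example, 101 cents split 40/30/30 rounds to 40, 30, and 30, leaving one cent that goes to the 40% member.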
--- a/specs/004-k1-scan-import/plan.md
+++ b/specs/004-k1-scan-import/plan.md
@ -5,7 +5,7 @@
 ## Summary
-Automated K-1 PDF scanning that extracts structured IRS Schedule K-1 (Form 1065) data from uploaded PDFs, presents a verification screen for manual review/correction, and auto-creates downstream model objects (KDocument, Distributions, member allocations, Document). Uses a two-tier extraction approach: `pdf-parse` for digital PDFs (free, instant, local) and Azure AI Document Intelligence / `tesseract.js` fallback for scanned PDFs. Supports per-partnership cell mapping customization and import history with re-processing.
+Automated K-1 PDF scanning that extracts structured IRS Schedule K-1 (Form 1065) data from uploaded PDFs, presents a verification screen with auto-accepted high-confidence values and explicit review for medium/low-confidence fields, and auto-creates downstream model objects (KDocument, Distributions, member allocations, Document). Uses a two-tier extraction approach: `pdf-parse` for digital PDFs (free, instant, local) and Azure AI Document Intelligence / `tesseract.js` fallback for scanned PDFs. Supports per-partnership cell mapping customization, administrator-defined aggregation rules (dynamically computed summaries displayed on verification screen and KDocument detail view), an "Unmapped Items" section for unrecognized extractions, and import history with re-processing.
 ## Technical Context
@ -17,7 +17,7 @@ Automated K-1 PDF scanning that extracts structured IRS Schedule K-1 (Form 1065)
 **Project Type**: Web application (NestJS API + Angular SPA) — Nx monorepo
 **Performance Goals**: PDF extraction < 30 seconds (SC-001), model creation < 5 seconds (SC-005), 90%+ accuracy for digital PDFs (SC-002)
 **Constraints**: Self-hosted capable (Azure OCR optional), max PDF size 25 MB, K-1 Form 1065 only (V1)
-**Scale/Scope**: Single family office (10–50 partnerships, 10–50 K-1s/year), 2 new API modules, 3 new frontend pages
+**Scale/Scope**: Single family office (10–50 partnerships, 10–50 K-1s/year), 2 new API modules, 4 new frontend pages
 ## Constitution Check
@ -29,11 +29,11 @@ No constitution.md exists for this project. Gates assessed against standard engi
 |------|--------|-------|
 | No unnecessary dependencies | PASS | 3 new packages (`pdf-parse`, `@azure/ai-form-recognizer`, `tesseract.js`) — each serves a distinct, justified purpose per research.md |
 | Follows existing patterns | PASS | New NestJS modules follow existing controller/service/DTO pattern (mirrors `k-document`, `upload` modules) |
-| No breaking changes | PASS | 2 new Prisma models + 1 enum, back-references only on existing models — no column changes |
+| No breaking changes | PASS | 3 new Prisma models + 1 enum, back-references only on existing models — no column changes |
-| Test coverage | PASS | Unit tests for extractors, mapper, allocation; integration tests for full pipeline |
+| Test coverage | PASS | Unit tests for extractors, mapper, allocation, aggregation; integration tests for full pipeline |
 | Self-hosted compatible | PASS | Core extraction (pdf-parse) is fully local; Azure is optional with tesseract.js fallback |
-**Post-Phase 1 re-check**: PASS — data model adds 2 models/1 enum, no existing schema changes beyond back-references. API contracts follow existing REST patterns. No violations identified.
+**Post-Phase 1 re-check**: PASS — data model adds 3 models/1 enum (K1ImportSession, CellMapping, CellAggregationRule, K1ImportStatus). No existing schema changes beyond back-references. API contracts follow existing REST patterns. Aggregation rules are dynamically computed — no stored denormalization. No violations identified.
 ## Project Structure
@ -43,7 +43,7 @@ No constitution.md exists for this project. Gates assessed against standard engi
 specs/004-k1-scan-import/
 ├── plan.md              # This file
 ├── research.md          # Phase 0: OCR provider research & decisions
-├── data-model.md        # Phase 1: K1ImportSession, CellMapping models
+├── data-model.md        # Phase 1: K1ImportSession, CellMapping, CellAggregationRule models
 ├── quickstart.md        # Phase 1: Setup & dev guide
 ├── contracts/
 │   └── k1-import-api.md # Phase 1: REST API contracts
@ -71,10 +71,11 @@ apps/api/src/app/
 │   │   └── tesseract-extractor.ts
 │   ├── k1-field-mapper.service.ts
 │   ├── k1-allocation.service.ts
-│   └── k1-confidence.service.ts
+│   ├── k1-confidence.service.ts
 │   └── k1-aggregation.service.ts       # Dynamically computes aggregation summaries
 ├── cell-mapping/
 │   ├── cell-mapping.module.ts
-│   ├── cell-mapping.controller.ts
+│   ├── cell-mapping.controller.ts       # Cell mapping + aggregation rule CRUD
 │   └── cell-mapping.service.ts
 apps/client/src/app/
@ -85,17 +86,19 @@ apps/client/src/app/
 │   │   ├── k1-import-page.scss
 │   │   ├── k1-import-page.routes.ts
 │   │   ├── k1-verification/
-│   │   │   ├── k1-verification.component.ts
+│   │   │   ├── k1-verification.component.ts   # Mapped cells + unmapped items + aggregations
 │   │   │   ├── k1-verification.html
 │   │   │   └── k1-verification.scss
 │   │   └── k1-confirmation/
 │   │       ├── k1-confirmation.component.ts
 │   │       ├── k1-confirmation.html
 │   │       └── k1-confirmation.scss
-│   └── cell-mapping/
+│   ├── cell-mapping/
-│       ├── cell-mapping-page.component.ts
+│   │   ├── cell-mapping-page.component.ts     # Cell mapping + aggregation rule config
-│       ├── cell-mapping-page.html
+│   │   ├── cell-mapping-page.html
-│       └── cell-mapping-page.routes.ts
+│   │   └── cell-mapping-page.routes.ts
 │   └── k-document/                             # Existing page — extended
 │       └── k-document-detail/                  # Add aggregation summary section (FR-036)
 ├── services/
 │   └── k1-import-data.service.ts
@ -109,7 +112,7 @@ libs/common/src/lib/
 │       └── confirm-k1-import.dto.ts
 prisma/
-├── schema.prisma                    # + K1ImportSession, CellMapping, K1ImportStatus
+├── schema.prisma                    # + K1ImportSession, CellMapping, CellAggregationRule, K1ImportStatus
 ├── migrations/
 │   └── 2026XXXX_added_k1_import/    # New migration
@ -118,4 +121,4 @@ test/import/
 └── sample-k1-scanned.pdf            # Test fixture: scanned K-1
 ```
-**Structure Decision**: Follows the existing Nx monorepo convention with new NestJS modules under `apps/api/src/app/` and new Angular pages under `apps/client/src/app/pages/`. Shared interfaces and DTOs in `libs/common/`. This mirrors the existing `k-document`, `upload`, and `family-office` module patterns.
+**Structure Decision**: Follows the existing Nx monorepo convention with new NestJS modules under `apps/api/src/app/` and new Angular pages under `apps/client/src/app/pages/`. Shared interfaces and DTOs in `libs/common/`. This mirrors the existing `k-document`, `upload`, and `family-office` module patterns. The KDocument detail view is extended (not replaced) to display aggregation summaries.
--- a/specs/004-k1-scan-import/quickstart.md
+++ b/specs/004-k1-scan-import/quickstart.md
@ -1,6 +1,6 @@
 # Quickstart: K-1 PDF Scan Import
-**Phase 1 Output** | **Date**: 2026-03-18
+**Phase 1 Output** | **Date**: 2026-03-18 | **Updated**: 2026-03-18 (post-clarification)
 ## Prerequisites
@ -28,7 +28,7 @@ npm install -D @types/pdf-parse
 ## Database Migration
-After adding the new Prisma models (`K1ImportSession`, `CellMapping`, `K1ImportStatus` enum):
+After adding the new Prisma models (`K1ImportSession`, `CellMapping`, `CellAggregationRule`, `K1ImportStatus` enum):
 ```bash
 npx prisma db push          # Development: sync schema
@ -36,7 +36,7 @@ npx prisma db push          # Development: sync schema
 npx prisma migrate dev      # Create a migration file
 ```
-Seed the default IRS cell mappings (28 rows with partnershipId = null) via the existing seed mechanism or a dedicated seed script.
+Seed the default IRS cell mappings (28 rows with partnershipId = null) and default aggregation rules (e.g., "Total Ordinary Income", "Total Capital Gains", "Total Deductions") via the existing seed mechanism or a dedicated seed script.
 ## Key Files to Create
@ -58,12 +58,13 @@ app/k1-import/
 │   └── tesseract-extractor.ts       # Tier 2 fallback: tesseract.js OCR
 ├── k1-field-mapper.service.ts       # Maps raw extraction → K1ExtractedField[]
 ├── k1-allocation.service.ts         # Allocates K-1 amounts to members by ownership %
-└── k1-confidence.service.ts         # Computes confidence scores with validation heuristics
+├── k1-confidence.service.ts         # Computes confidence scores with validation heuristics
 └── k1-aggregation.service.ts        # Dynamically computes aggregation summaries from rules
 app/cell-mapping/
 ├── cell-mapping.module.ts           # NestJS module
-├── cell-mapping.controller.ts       # CRUD for cell mappings
+├── cell-mapping.controller.ts       # CRUD for cell mappings + aggregation rules
-└── cell-mapping.service.ts          # Cell mapping business logic + seed data
+└── cell-mapping.service.ts          # Cell mapping + aggregation rule business logic + seed data
 ```
 ### Shared Types (libs/common/src/lib/)
@ -87,7 +88,7 @@ pages/k1-import/
 ├── k1-import-page.scss
 ├── k1-import-page.routes.ts
 ├── k1-verification/
-│   ├── k1-verification.component.ts # Verification/edit screen
+│   ├── k1-verification.component.ts # Verification/edit screen (mapped + unmapped + aggregations)
 │   ├── k1-verification.html
 │   └── k1-verification.scss
 └── k1-confirmation/
@ -96,7 +97,7 @@ pages/k1-import/
    └── k1-confirmation.scss
 pages/cell-mapping/
-├── cell-mapping-page.component.ts   # Cell mapping configuration UI
+├── cell-mapping-page.component.ts   # Cell mapping + aggregation rule configuration UI
 ├── cell-mapping-page.html
 └── cell-mapping-page.routes.ts
@ -108,13 +109,18 @@ services/
 1. **Upload**: User selects PDF → `POST /api/v1/k1-import/upload` → session created with status PROCESSING
 2. **Extract**: Backend detects PDF type (digital vs. scanned) → routes to appropriate extractor → status becomes EXTRACTED
-3. **Review**: Frontend polls/fetches session → displays verification screen with extracted fields, confidence indicators
+3. **Review**: Frontend polls/fetches session → displays verification screen with:
-4. **Edit**: User corrects values, overrides labels → `PUT /api/v1/k1-import/:id/verify` → status becomes VERIFIED
+   - **Mapped cells**: extracted fields with confidence indicators. High-confidence values are pre-accepted. Medium/low-confidence values require explicit review (acknowledge or edit).
   - **Unmapped items**: separate section for values that didn't match any cell. User assigns to a cell or discards.
   - **Aggregation summaries**: dynamically computed from mapped values using aggregation rules. Recalculate live when cell values are edited.
 4. **Verify**: User reviews all medium/low fields and resolves unmapped items → `PUT /api/v1/k1-import/:id/verify` → status becomes VERIFIED
 5. **Confirm**: User clicks "Confirm & Save" → `POST /api/v1/k1-import/:id/confirm` → KDocument + Distributions + Document created → status becomes CONFIRMED
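The status progression in the steps above can be sketched as a small transition table. Only the four statuses named in this workflow are modelled; the actual `K1ImportStatus` enum is not shown in this diff and may include further values (for example, failure or re-processing states), so treat the type below as an assumption.

```typescript
// Statuses named in the workflow above; the real enum may define more.
type K1ImportStatus = 'PROCESSING' | 'EXTRACTED' | 'VERIFIED' | 'CONFIRMED';

// Each status maps to the single status it may advance to (null = terminal).
const NEXT_STATUS: Record<K1ImportStatus, K1ImportStatus | null> = {
  PROCESSING: 'EXTRACTED', // extraction finished
  EXTRACTED: 'VERIFIED', // PUT /api/v1/k1-import/:id/verify
  VERIFIED: 'CONFIRMED', // POST /api/v1/k1-import/:id/confirm
  CONFIRMED: null,
};

// True only for the forward step defined in NEXT_STATUS.
function canTransition(from: K1ImportStatus, to: K1ImportStatus): boolean {
  return NEXT_STATUS[from] === to;
}
```

A guard like this would reject, for instance, a confirm request against a session that is still EXTRACTED.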
 ## Testing Strategy
- **Unit tests**: Extractors (pdf-parse, azure, tesseract), field mapper, confidence scoring, allocation math
+- **Unit tests**: Extractors (pdf-parse, azure, tesseract), field mapper, confidence scoring, allocation math, aggregation computation
 - **Integration tests**: Full upload → extract → verify → confirm flow with test PDF fixtures
 - **Test fixtures**: Include sample K-1 PDFs (digital and scanned) in `test/import/` directory
 - **Allocation accuracy**: Verify rounding behavior — allocated amounts must sum exactly to partnership total
 - **Aggregation tests**: Verify dynamic computation from rules, auto-recalculation on value edit, behavior when source cells are empty
 - **Review enforcement**: Verify that confirmation is blocked while any medium/low-confidence field is unreviewed or any unmapped item is unresolved
--- a/specs/004-k1-scan-import/research.md
+++ b/specs/004-k1-scan-import/research.md
@ -152,3 +152,54 @@ AZURE_DOCUMENT_INTELLIGENCE_KEY       — Azure API key
 **Alternatives Considered**:
 - `pdfjs-dist` directly instead of `pdf-parse` — more boilerplate, `pdf-parse` wraps it with a simpler API
 - Only cloud OCR — loses the self-hosted story and adds cost for digital PDFs
 ---
 ## Decision 10: Cell Aggregation Rules — Dynamic Computation
 **Decision**: Persist only aggregation rule definitions (name, source cells, operation). Compute totals dynamically from raw K-1 box values at display time. Do NOT store computed totals.
 **Rationale**:
 - K-1 values can change during the import lifecycle (estimated → final transitions, manual edits after confirmation)
 - Storing computed totals creates a denormalization risk — stale aggregates when underlying values change
 - Computation is trivial (summing a handful of numbers) with no performance concern at family office scale
 - Keeps a single source of truth: the raw box values in K1Data
 - Aggregation rules are displayed on both the verification screen (FR-033) and KDocument detail view (FR-036)
 **Alternatives Considered**:
 - Persist computed totals alongside raw data — creates stale data risk, requires update triggers
 - Persist both (snapshot + live) for audit — adds complexity V1 doesn't need; audit trail exists in import session history
 ---
 ## Decision 11: Unmapped Items Handling
 **Decision**: Display extracted values that don't match any configured cell mapping in a separate "Unmapped Items" section on the verification screen. The administrator can assign each item to an existing cell, create a new custom cell for it, or discard it.
 **Rationale**:
 - OCR/extraction may pull supplemental schedule items, footnotes, state-specific addenda
 - Silently discarding loses potentially important data
 - Auto-creating cells for every unmatched value creates noise
 - Explicit user decision preserves data integrity while keeping mapped cells clean
 - Assigned unmapped items update the cell mapping for future imports (learning effect)
 **Alternatives Considered**:
 - Silent discard — loses data, violates user's expectation of completeness
 - Auto-create custom cells — too noisy; PDF footnotes and headers would create junk cells
 ---
 ## Decision 12: Verification Auto-Accept Strategy
 **Decision**: Auto-accept (pre-check) high-confidence values on the verification screen. Require explicit review (acknowledge or edit) for medium and low-confidence values before allowing confirmation.
 **Rationale**:
 - V1 is "partially manual, partially automated" per user intent
 - High-confidence values (≥ 0.85) from digital PDFs are reliably accurate (90%+ per SC-002)
 - Forcing explicit review of every cell wastes time on correct values
 - Blocking confirmation until medium/low-confidence fields are reviewed catches the errors
 - All values remain visible and editable — user can override any pre-accepted value
 **Alternatives Considered**:
 - Every cell requires explicit accept — too slow for 15+ fields, doesn't match "partially automated" intent
 - Spot-check model (everything auto-accepted) — too risky for tax data; OCR errors would go unreviewed
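The auto-accept strategy in Decision 12 amounts to deriving the initial `isReviewed` flag from the confidence score. In the sketch below, the 0.85 high-confidence threshold comes from the rationale above, while the 0.6 boundary between medium and low is purely an assumed placeholder, as are the function names.

```typescript
type ConfidenceLevel = 'HIGH' | 'MEDIUM' | 'LOW';

const HIGH_THRESHOLD = 0.85; // stated in Decision 12
const MEDIUM_THRESHOLD = 0.6; // assumption: not specified in this document

// Buckets a 0.0 to 1.0 confidence score into a level.
function toConfidenceLevel(confidence: number): ConfidenceLevel {
  if (confidence >= HIGH_THRESHOLD) return 'HIGH';
  if (confidence >= MEDIUM_THRESHOLD) return 'MEDIUM';
  return 'LOW';
}

// High-confidence values start pre-accepted; everything else awaits explicit review.
function initialReviewState(confidence: number): {
  confidenceLevel: ConfidenceLevel;
  isReviewed: boolean;
} {
  const confidenceLevel = toConfidenceLevel(confidence);
  return { confidenceLevel, isReviewed: confidenceLevel === 'HIGH' };
}
```

With these thresholds, the contract's sample values behave as expected: 0.95 starts as reviewed, while 0.72 requires the user to acknowledge or edit it.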