plan: K-1 scan import implementation plan (Phase 0+1)

- research.md: 9 architectural decisions (two-tier OCR, Azure + tesseract fallback)
- data-model.md: K1ImportSession and CellMapping models, K1ImportStatus enum
- contracts/k1-import-api.md: 10 REST endpoints
- quickstart.md: file structure, setup guide, workflow
- plan.md: summary, technical context, constitution check, project structure
- Updated copilot agent context with new tech stack entries
4 months ago · 269eed7faf
6 changed files with 1021 additions and 1 deletions
--- a/.github/agents/copilot-instructions.md
+++ b/.github/agents/copilot-instructions.md
@@ -1,10 +1,12 @@
 # portfolio-management Development Guidelines

-Auto-generated from all feature plans. Last updated: 2026-03-16
+Auto-generated from all feature plans. Last updated: 2026-03-18

 ## Active Technologies
 - TypeScript 5.9.2, Node.js >= 22.18.0 + Angular 21.1.1, NestJS 11.1.14, Angular Material 21.1.1, Prisma 6.19.0, big.js, date-fns 4.1.0 (003-portfolio-performance-views)
 - PostgreSQL via Prisma ORM (003-portfolio-performance-views)
+- TypeScript 5.9.2, Node.js ≥ 22.18.0 + NestJS 11.x (backend), Angular 21.x (frontend), Prisma 6.x (ORM), pdf-parse (PDF text), @azure/ai-form-recognizer (cloud OCR), tesseract.js (local OCR fallback) (004-k1-scan-import)
+- PostgreSQL via Prisma (structured data), local filesystem `uploads/` (PDF files) (004-k1-scan-import)

 - TypeScript 5.9.2, Node.js ≥22.18.0 + NestJS 11.1.14 (API), Angular 21.1.1 + Angular Material 21.1.1 (client), Prisma 6.19.0 (ORM), Nx 22.5.3 (monorepo), big.js (decimal math), date-fns 4.1.0, chart.js 4.5.1, Bull 4.16.5 (job queues), Redis (caching), yahoo-finance2 3.13.2 (001-family-office-transform)

@@ -25,6 +27,7 @@ npm test; npm run lint
 TypeScript 5.9.2, Node.js ≥22.18.0: Follow standard conventions

 ## Recent Changes
+- 004-k1-scan-import: Added TypeScript 5.9.2, Node.js ≥ 22.18.0 + NestJS 11.x (backend), Angular 21.x (frontend), Prisma 6.x (ORM), pdf-parse (PDF text), @azure/ai-form-recognizer (cloud OCR), tesseract.js (local OCR fallback)
 - 003-portfolio-performance-views: Added TypeScript 5.9.2, Node.js >= 22.18.0 + Angular 21.1.1, NestJS 11.1.14, Angular Material 21.1.1, Prisma 6.19.0, big.js, date-fns 4.1.0

 - 001-family-office-transform: Added TypeScript 5.9.2, Node.js ≥22.18.0 + NestJS 11.1.14 (API), Angular 21.1.1 + Angular Material 21.1.1 (client), Prisma 6.19.0 (ORM), Nx 22.5.3 (monorepo), big.js (decimal math), date-fns 4.1.0, chart.js 4.5.1, Bull 4.16.5 (job queues), Redis (caching), yahoo-finance2 3.13.2
--- a/specs/004-k1-scan-import/contracts/k1-import-api.md
+++ b/specs/004-k1-scan-import/contracts/k1-import-api.md
@@ -0,0 +1,381 @@
+# API Contracts: K-1 Import
+
+**Phase 1 Output** | **Date**: 2026-03-18
+
+## Base Path
+
+K-1 import endpoints are under `/api/v1/k1-import/`; cell mapping endpoints are under `/api/v1/cell-mapping`.
+
+## Authentication
+
+All endpoints require JWT authentication (`AuthGuard('jwt')`) and appropriate permissions via `HasPermissionGuard`.
+
+---
+
+## Endpoints
+
+### POST /api/v1/k1-import/upload
+
+Upload a K-1 PDF and initiate extraction.
+
+**Permission**: `createKDocument`
+
+**Request**: `multipart/form-data`
+
+| Field           | Type     | Required | Description                          |
+| --------------- | -------- | -------- | ------------------------------------ |
+| `file`          | File     | Yes      | PDF file (max 25 MB, MIME: application/pdf) |
+| `partnershipId` | `string` | Yes      | Target partnership UUID              |
+| `taxYear`       | `number` | Yes      | Tax year for this K-1               |
+
+**Response**: `201 Created`
+
+```json
+{
+  "id": "uuid",
+  "partnershipId": "uuid",
+  "status": "PROCESSING",
+  "taxYear": 2025,
+  "fileName": "K1-Smith-Capital-2025.pdf",
+  "fileSize": 245760,
+  "extractionMethod": "pdf-parse",
+  "createdAt": "2026-03-18T00:00:00.000Z"
+}
+```
+
+**Errors**:
+
+| Status | Condition                              |
+| ------ | -------------------------------------- |
+| 400    | File is not a valid PDF                |
+| 400    | File exceeds 25 MB size limit          |
+| 400    | Partnership not found or not owned by user |
+| 400    | Partnership has no active members      |
+| 400    | Tax year < partnership inception year  |
+
+---
+
+### GET /api/v1/k1-import/:id
+
+Get the current state of an import session, including extraction results.
+
+**Permission**: `readKDocument`
+
+**Response**: `200 OK`
+
+```json
+{
+  "id": "uuid",
+  "partnershipId": "uuid",
+  "status": "EXTRACTED",
+  "taxYear": 2025,
+  "fileName": "K1-Smith-Capital-2025.pdf",
+  "fileSize": 245760,
+  "extractionMethod": "pdf-parse",
+  "rawExtraction": {
+    "metadata": {
+      "partnershipName": "Smith Capital Partners LP",
+      "partnershipEin": "12-3456789",
+      "partnerName": "Smith Family Trust",
+      "partnerEin": "98-7654321",
+      "taxYear": 2025,
+      "isAmended": false,
+      "isFinal": true
+    },
+    "fields": [
+      {
+        "boxNumber": "1",
+        "label": "Ordinary business income (loss)",
+        "customLabel": null,
+        "rawValue": "$52,340",
+        "numericValue": 52340,
+        "confidence": 0.95,
+        "confidenceLevel": "HIGH",
+        "isUserEdited": false
+      }
+    ],
+    "overallConfidence": 0.92,
+    "method": "pdf-parse",
+    "pagesProcessed": 2
+  },
+  "verifiedData": null,
+  "documentId": "uuid",
+  "kDocumentId": null,
+  "errorMessage": null,
+  "createdAt": "2026-03-18T00:00:00.000Z",
+  "updatedAt": "2026-03-18T00:00:05.000Z"
+}
+```
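
The response carries both a numeric `confidence` and a display `confidenceLevel`, but the contract does not say where the boundaries lie. A minimal sketch of the bucketing, assuming cut-offs of 0.9 and 0.7 (chosen only because they agree with the sample values 0.95 → `HIGH` and 0.72 → `MEDIUM` in this contract; the real thresholds are not specified):

```typescript
type ConfidenceLevel = 'HIGH' | 'MEDIUM' | 'LOW';

// Assumed thresholds; the contract does not define them.
const HIGH_THRESHOLD = 0.9;
const MEDIUM_THRESHOLD = 0.7;

// Maps a 0.0-1.0 confidence score to the level shown on the verification screen.
function toConfidenceLevel(confidence: number): ConfidenceLevel {
  if (confidence >= HIGH_THRESHOLD) return 'HIGH';
  if (confidence >= MEDIUM_THRESHOLD) return 'MEDIUM';
  return 'LOW';
}
```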
+
+**Errors**:
+
+| Status | Condition                           |
+| ------ | ----------------------------------- |
+| 404    | Import session not found            |
+| 403    | Import session belongs to different user |
+
+---
+
+### PUT /api/v1/k1-import/:id/verify
+
+Submit user-verified/edited extraction data. Transitions status from EXTRACTED to VERIFIED.
+
+**Permission**: `updateKDocument`
+
+**Request**: `application/json`
+
+```json
+{
+  "taxYear": 2025,
+  "fields": [
+    {
+      "boxNumber": "1",
+      "label": "Ordinary business income (loss)",
+      "customLabel": null,
+      "rawValue": "$52,340",
+      "numericValue": 52340,
+      "confidence": 0.95,
+      "confidenceLevel": "HIGH",
+      "isUserEdited": false
+    },
+    {
+      "boxNumber": "11",
+      "label": "Other income (loss)",
+      "customLabel": "Section 1256 contracts",
+      "rawValue": "$8,200",
+      "numericValue": 8200,
+      "confidence": 0.72,
+      "confidenceLevel": "MEDIUM",
+      "isUserEdited": true
+    }
+  ]
+}
+```
+
+**Response**: `200 OK` — Updated import session with status `VERIFIED`
+
+**Errors**:
+
+| Status | Condition                                       |
+| ------ | ----------------------------------------------- |
+| 400    | Import session is not in EXTRACTED status        |
+| 400    | Fields array is empty                            |
+| 404    | Import session not found                         |
+
+---
+
+### POST /api/v1/k1-import/:id/confirm
+
+Confirm verified data and trigger automatic model object creation (KDocument, Distributions, Document linkage).
+
+**Permission**: `createKDocument`
+
+**Request**: `application/json`
+
+```json
+{
+  "filingStatus": "DRAFT",
+  "existingKDocumentAction": null
+}
+```
+
+| Field                     | Type                          | Required | Description                              |
+| ------------------------- | ----------------------------- | -------- | ---------------------------------------- |
+| `filingStatus`            | `"DRAFT" \| "ESTIMATED" \| "FINAL"` | Yes      | Status for the created/updated KDocument |
+| `existingKDocumentAction` | `"UPDATE" \| "CREATE_NEW" \| null`   | No       | Action if KDocument already exists       |
+
+**Response**: `201 Created`
+
+```json
+{
+  "importSession": {
+    "id": "uuid",
+    "status": "CONFIRMED"
+  },
+  "kDocument": {
+    "id": "uuid",
+    "partnershipId": "uuid",
+    "type": "K1",
+    "taxYear": 2025,
+    "filingStatus": "DRAFT",
+    "data": { "ordinaryIncome": 52340, "..." : "..." }
+  },
+  "distributions": [
+    {
+      "id": "uuid",
+      "entityId": "uuid",
+      "partnershipId": "uuid",
+      "type": "RETURN_OF_CAPITAL",
+      "amount": 60000,
+      "date": "2025-12-31T00:00:00.000Z"
+    }
+  ],
+  "allocations": [
+    {
+      "entityId": "uuid",
+      "entityName": "Smith Family Trust",
+      "ownershipPercent": 60,
+      "allocatedValues": { "ordinaryIncome": 31404, "..." : "..." }
+    }
+  ],
+  "document": {
+    "id": "uuid",
+    "type": "K1",
+    "name": "K1-Smith-Capital-2025.pdf"
+  }
+}
+```
+
+**Errors**:
+
+| Status | Condition                                                     |
+| ------ | ------------------------------------------------------------- |
+| 400    | Import session is not in VERIFIED status                       |
+| 400    | Partnership has no active members                              |
+| 409    | KDocument already exists for this partnership/year and no action specified |
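
The error table implies a small decision procedure for the confirm step. A hedged sketch of that logic as a pure function follows; `planConfirmation`, `ConfirmInput`, and `ConfirmPlan` are illustrative names that do not appear in the spec, and the order of the checks is an assumption.

```typescript
type ExistingKDocumentAction = 'UPDATE' | 'CREATE_NEW' | null;

// Hypothetical input shape: values the service would load before deciding.
interface ConfirmInput {
  sessionStatus: string;
  activeMemberCount: number;
  existingKDocumentId: string | null;
  action: ExistingKDocumentAction;
}

type ConfirmPlan =
  | { kind: 'create' }
  | { kind: 'update'; kDocumentId: string }
  | { kind: 'reject'; status: 400 | 409; reason: string };

function planConfirmation(input: ConfirmInput): ConfirmPlan {
  if (input.sessionStatus !== 'VERIFIED') {
    return { kind: 'reject', status: 400, reason: 'Import session is not in VERIFIED status' };
  }
  if (input.activeMemberCount < 1) {
    return { kind: 'reject', status: 400, reason: 'Partnership has no active members' };
  }
  if (input.existingKDocumentId === null) {
    return { kind: 'create' };
  }
  if (input.action === 'UPDATE') {
    return { kind: 'update', kDocumentId: input.existingKDocumentId };
  }
  if (input.action === 'CREATE_NEW') {
    return { kind: 'create' };
  }
  // A KDocument exists and the caller gave no instruction.
  return { kind: 'reject', status: 409, reason: 'KDocument already exists and no action specified' };
}
```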
+
+---
+
+### POST /api/v1/k1-import/:id/cancel
+
+Cancel an import session. No model objects are created.
+
+**Permission**: `updateKDocument`
+
+**Response**: `200 OK` — Updated import session with status `CANCELLED`
+
+**Errors**:
+
+| Status | Condition                                     |
+| ------ | --------------------------------------------- |
+| 400    | Import session is already CONFIRMED or CANCELLED |
+| 404    | Import session not found                       |
+
+---
+
+### GET /api/v1/k1-import/history
+
+List import sessions for a partnership, ordered by creation date descending.
+
+**Permission**: `readKDocument`
+
+**Query Parameters**:
+
+| Param           | Type     | Required | Description                    |
+| --------------- | -------- | -------- | ------------------------------ |
+| `partnershipId` | `string` | Yes      | Partnership UUID               |
+| `taxYear`       | `number` | No       | Filter by tax year             |
+
+**Response**: `200 OK` — Array of import session summaries
+
+```json
+[
+  {
+    "id": "uuid",
+    "partnershipId": "uuid",
+    "status": "CONFIRMED",
+    "taxYear": 2025,
+    "fileName": "K1-Smith-Capital-2025.pdf",
+    "extractionMethod": "pdf-parse",
+    "kDocumentId": "uuid",
+    "createdAt": "2026-03-18T00:00:00.000Z"
+  }
+]
+```
+
+---
+
+### POST /api/v1/k1-import/:id/reprocess
+
+Re-run extraction on a previously uploaded PDF using the current cell mapping configuration.
+
+**Permission**: `updateKDocument`
+
+**Response**: `200 OK` — New import session with status `PROCESSING` (original session unchanged)
+
+**Errors**:
+
+| Status | Condition                                   |
+| ------ | ------------------------------------------- |
+| 400    | Original import session has no stored document |
+| 404    | Import session not found                     |
+
+---
+
+## Cell Mapping Endpoints
+
+### GET /api/v1/cell-mapping
+
+Get cell mappings for a partnership (with global defaults for unmapped boxes).
+
+**Permission**: `readKDocument`
+
+**Query Parameters**:
+
+| Param           | Type     | Required | Description                              |
+| --------------- | -------- | -------- | ---------------------------------------- |
+| `partnershipId` | `string` | No       | Partnership UUID (omit for global defaults) |
+
+**Response**: `200 OK`
+
+```json
+[
+  {
+    "id": "uuid",
+    "partnershipId": null,
+    "boxNumber": "1",
+    "label": "Ordinary business income (loss)",
+    "description": "IRS Schedule K-1 Box 1",
+    "isCustom": false,
+    "sortOrder": 1
+  }
+]
+```
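
"Global defaults for unmapped boxes" amounts to a merge keyed on `boxNumber`, where a partnership row wins over the global row for the same box. A minimal sketch under that reading; `resolveCellMappings` and the trimmed `CellMappingRow` shape are illustrative, and the rows are assumed to be already loaded from the database.

```typescript
// Trimmed, hypothetical row shape; the full model has more columns.
interface CellMappingRow {
  partnershipId: string | null;
  boxNumber: string;
  label: string;
  sortOrder: number;
}

// Partnership overrides replace the global default for the same box number;
// boxes without an override keep the global row.
function resolveCellMappings(
  globalDefaults: CellMappingRow[],
  overrides: CellMappingRow[],
): CellMappingRow[] {
  const byBox = new Map<string, CellMappingRow>();
  for (const row of globalDefaults) byBox.set(row.boxNumber, row);
  for (const row of overrides) byBox.set(row.boxNumber, row);
  return Array.from(byBox.values()).sort((a, b) => a.sortOrder - b.sortOrder);
}
```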
+
+---
+
+### PUT /api/v1/cell-mapping
+
+Update or create cell mappings for a partnership.
+
+**Permission**: `updateKDocument`
+
+**Request**: `application/json`
+
+```json
+{
+  "partnershipId": "uuid",
+  "mappings": [
+    {
+      "boxNumber": "11",
+      "label": "Section 1256 contracts",
+      "description": "Custom label for Box 11",
+      "isCustom": false
+    },
+    {
+      "boxNumber": "20-Z",
+      "label": "Qualified Business Income (Section 199A)",
+      "description": "Custom additional box",
+      "isCustom": true
+    }
+  ]
+}
+```
+
+**Response**: `200 OK` — Updated mappings array
+
+---
+
+### DELETE /api/v1/cell-mapping/reset
+
+Reset a partnership's cell mappings to IRS defaults (deletes all custom mappings for the partnership).
+
+**Permission**: `updateKDocument`
+
+**Query Parameters**:
+
+| Param           | Type     | Required | Description        |
+| --------------- | -------- | -------- | ------------------ |
+| `partnershipId` | `string` | Yes      | Partnership UUID   |
+
+**Response**: `200 OK`
--- a/specs/004-k1-scan-import/data-model.md
+++ b/specs/004-k1-scan-import/data-model.md
@@ -0,0 +1,241 @@
+# Data Model: K-1 PDF Scan Import
+
+**Phase 1 Output** | **Date**: 2026-03-18
+
+## Overview
+
+This feature adds 2 new Prisma models and 1 new enum to support K-1 PDF scanning, import session tracking, and cell mapping configuration. It extends the existing models from spec 001-family-office-transform (KDocument, Distribution, Document, PartnershipMembership) with automatic creation from scanned data.
+
+### Entity Relationship Diagram (Conceptual)
+
+```
+User (existing)
+ └── Partnership (existing from 001)
+      ├── K1ImportSession[] ──┬── Document (uploaded PDF, existing from 001)
+      │   (new model)         ├── KDocument (auto-created, existing from 001)
+      │                       └── CellMapping (per-partnership config)
+      ├── PartnershipMembership[] (existing from 001)
+      │    └── [K-1 allocations computed at confirm time]
+      ├── KDocument[] (existing from 001)
+      │    └── Distribution[] (auto-created from Box 19, existing from 001)
+      └── CellMapping[] (new model, per-partnership overrides)
+
+Global CellMapping (partnershipId = null) ── IRS default box definitions
+```
+
+## New Enum
+
+### K1ImportStatus
+
+Tracks the lifecycle of a K-1 import session.
+
+| Value        | Description                                                    |
+| ------------ | -------------------------------------------------------------- |
+| `PROCESSING` | PDF uploaded, extraction in progress                           |
+| `EXTRACTED`  | Extraction complete, awaiting user review                      |
+| `VERIFIED`   | User has reviewed/edited values, ready for confirmation        |
+| `CONFIRMED`  | User confirmed, model objects created (KDocument, Distributions) |
+| `CANCELLED`  | User cancelled, no model objects created                       |
+| `FAILED`     | Extraction failed (invalid PDF, OCR error, etc.)               |
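
The lifecycle can be encoded as a lookup table so that services reject invalid moves in one place. This is a sketch, not the planned implementation. It combines the transition rule in this document's validation rules with the cancel endpoint in the API contract, and it makes two assumptions: cancellation is allowed from `PROCESSING`, `EXTRACTED`, and `VERIFIED`, and `FAILED` is terminal (the spec does not say whether a failed session can be cancelled).

```typescript
type K1ImportStatus =
  | 'PROCESSING'
  | 'EXTRACTED'
  | 'VERIFIED'
  | 'CONFIRMED'
  | 'CANCELLED'
  | 'FAILED';

// Assumed transition table; see the assumptions in the paragraph above.
const ALLOWED_TRANSITIONS: Record<K1ImportStatus, readonly K1ImportStatus[]> = {
  PROCESSING: ['EXTRACTED', 'FAILED', 'CANCELLED'],
  EXTRACTED: ['VERIFIED', 'CANCELLED'],
  VERIFIED: ['CONFIRMED', 'CANCELLED'],
  CONFIRMED: [],
  CANCELLED: [],
  FAILED: [],
};

function canTransition(from: K1ImportStatus, to: K1ImportStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}
```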
+
+## New Models
+
+### K1ImportSession
+
+A record of a single K-1 PDF import attempt, tracking the full lifecycle from upload through confirmation.
+
+| Field              | Type             | Constraints                  | Description                                                     |
+| ------------------ | ---------------- | ---------------------------- | --------------------------------------------------------------- |
+| `id`               | `String`         | PK, UUID, auto-generated     | Unique identifier                                               |
+| `partnershipId`    | `String`         | FK → Partnership.id, indexed | Target partnership for this K-1 import                         |
+| `userId`           | `String`         | FK → User.id, indexed        | User who initiated the import                                   |
+| `status`           | `K1ImportStatus` | Required, Default: PROCESSING | Current lifecycle status                                       |
+| `taxYear`          | `Int`            | Required                     | Tax year extracted or specified by user                         |
+| `fileName`         | `String`         | Required                     | Original filename of uploaded PDF                               |
+| `fileSize`         | `Int`            | Required                     | File size in bytes                                              |
+| `extractionMethod` | `String`         | Required                     | Method used: "pdf-parse", "azure", "tesseract"                 |
+| `rawExtraction`    | `Json?`          | Optional                     | Raw extraction results before user edits                       |
+| `verifiedData`     | `Json?`          | Optional                     | User-verified/edited extraction results (K1ExtractionResult)   |
+| `documentId`       | `String?`        | FK → Document.id, optional   | Linked uploaded PDF Document record                            |
+| `kDocumentId`      | `String?`        | FK → KDocument.id, optional  | Resulting KDocument (set on CONFIRMED status)                  |
+| `errorMessage`     | `String?`        | Optional                     | Error details if status is FAILED                              |
+| `createdAt`        | `DateTime`       | Default: now()               | Upload timestamp                                               |
+| `updatedAt`        | `DateTime`       | Auto-updated                 | Last modification timestamp                                    |
+
+**Relations**:
+
+- `partnership` → `Partnership` (many-to-one, cascade delete)
+- `user` → `User` (many-to-one, cascade delete)
+- `document` → `Document?` (many-to-one, optional)
+- `kDocument` → `KDocument?` (many-to-one, optional)
+
+**Indexes**: `@@index([partnershipId, taxYear])` for import history queries per partnership/year.
+
+### CellMapping
+
+A configuration defining how K-1 box numbers map to labels. Supports a global IRS-default set (partnershipId = null) and per-partnership customizations.
+
+| Field           | Type       | Constraints                            | Description                                          |
+| --------------- | ---------- | -------------------------------------- | ---------------------------------------------------- |
+| `id`            | `String`   | PK, UUID, auto-generated               | Unique identifier                                    |
+| `partnershipId` | `String?`  | FK → Partnership.id, optional, indexed | Partnership this mapping applies to (null = global)  |
+| `boxNumber`     | `String`   | Required                               | K-1 box identifier (e.g., "1", "6a", "19a", "20-A") |
+| `label`         | `String`   | Required                               | Display label (e.g., "Ordinary business income")     |
+| `description`   | `String?`  | Optional                               | Extended description or IRS instructions              |
+| `isCustom`      | `Boolean`  | Default: false                         | Whether this is a user-added custom cell             |
+| `sortOrder`     | `Int`      | Required                               | Display order in the verification screen             |
+| `createdAt`     | `DateTime` | Default: now()                         | Creation timestamp                                   |
+| `updatedAt`     | `DateTime` | Auto-updated                           | Last modification timestamp                          |
+
+**Relations**:
+
+- `partnership` → `Partnership?` (many-to-one, optional, cascade delete)
+
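+**Unique constraint**: `@@unique([partnershipId, boxNumber])` allows one mapping per box per partnership. Because PostgreSQL treats NULL values as distinct in a unique index by default, this constraint alone does not prevent duplicate global rows (partnershipId = null); uniqueness of the global defaults needs a separate guard, such as a partial unique index or a check in the seed script.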
+
+## Modifications to Existing Models
+
+### Partnership (from spec 001)
+
+Add back-references — no column changes:
+
+| New Field         | Type                 | Description                          |
+| ----------------- | -------------------- | ------------------------------------ |
+| `importSessions`  | `K1ImportSession[]`  | Import attempts for this partnership |
+| `cellMappings`    | `CellMapping[]`      | Custom cell mapping configurations   |
+
+### KDocument (from spec 001)
+
+Add back-reference — no column changes:
+
+| New Field        | Type                | Description                              |
+| ---------------- | ------------------- | ---------------------------------------- |
+| `importSessions` | `K1ImportSession[]` | Import sessions that created or updated this record (list side of the many-to-one `kDocument` relation) |
+
+## Application-Layer Types
+
+### K1ExtractionResult (TypeScript interface)
+
+The structure returned by the extraction service and stored in `K1ImportSession.rawExtraction` and `K1ImportSession.verifiedData`.
+
+```typescript
+interface K1ExtractionResult {
+  /** Extracted metadata from the K-1 header */
+  metadata: {
+    partnershipName: string | null;
+    partnershipEin: string | null;
+    partnerName: string | null;
+    partnerEin: string | null;
+    taxYear: number | null;
+    isAmended: boolean;
+    isFinal: boolean;
+  };
+
+  /** Extracted box values */
+  fields: K1ExtractedField[];
+
+  /** Overall extraction confidence (0.0–1.0) */
+  overallConfidence: number;
+
+  /** Extraction method used */
+  method: 'pdf-parse' | 'azure' | 'tesseract';
+
+  /** Number of pages processed */
+  pagesProcessed: number;
+}
+
+interface K1ExtractedField {
+  /** Box identifier (e.g., "1", "6a", "19a") */
+  boxNumber: string;
+
+  /** Display label from cell mapping */
+  label: string;
+
+  /** Custom label override by user (null if not overridden) */
+  customLabel: string | null;
+
+  /** Extracted raw text value */
+  rawValue: string;
+
+  /** Parsed numeric value (null if unparseable) */
+  numericValue: number | null;
+
+  /** Confidence score (0.0–1.0) */
+  confidence: number;
+
+  /** Confidence level for display */
+  confidenceLevel: 'HIGH' | 'MEDIUM' | 'LOW';
+
+  /** Whether user has manually edited this value */
+  isUserEdited: boolean;
+}
+```
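
`numericValue` is described as "null if unparseable", which implies a parsing step from `rawValue`. A minimal sketch of one such parser follows. `parseNumericValue` is a hypothetical name, and treating parentheses as accounting notation for negatives is an assumption about how losses appear on a K-1, not something the spec states.

```typescript
// Hypothetical parser: converts raw box text such as "$52,340" into a number.
function parseNumericValue(rawValue: string): number | null {
  const trimmed = rawValue.trim();
  if (trimmed === '') return null;
  // Assumption: "(1,200)" is accounting notation for -1200.
  const parenthesized = /^\(.*\)$/.test(trimmed);
  const cleaned = trimmed.replace(/[()$,\s]/g, '');
  // Anything that is not a plain decimal number is treated as unparseable.
  if (!/^-?\d+(\.\d+)?$/.test(cleaned)) return null;
  const value = Number(cleaned);
  return parenthesized ? 0 - Math.abs(value) : value;
}
```

Text such as "SEE STMT" returns `null`, leaving the field for the user to fill in on the verification screen.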
+
+### K1ConfirmationRequest (TypeScript interface)
+
+The request body when the user confirms verified K-1 data.
+
+```typescript
+interface K1ConfirmationRequest {
+  /** Import session ID */
+  importSessionId: string;
+
+  /** Tax year (may have been overridden by user) */
+  taxYear: number;
+
+  /** Filing status for the new KDocument */
+  filingStatus: 'DRAFT' | 'ESTIMATED' | 'FINAL';
+
+  /** Verified fields with any user edits applied */
+  fields: K1ExtractedField[];
+
+  /** Action to take if a KDocument already exists for this partnership/year (null = no action specified) */
+  existingKDocumentAction: 'UPDATE' | 'CREATE_NEW' | null;
+}
+```
+
+### Default IRS K-1 Cell Mapping
+
+The standard box definitions seeded as global CellMapping records (partnershipId = null):
+
+| boxNumber | label                                     | sortOrder |
+| --------- | ----------------------------------------- | --------- |
+| 1         | Ordinary business income (loss)            | 1         |
+| 2         | Net rental real estate income (loss)       | 2         |
+| 3         | Other net rental income (loss)             | 3         |
+| 4a        | Guaranteed payments for services           | 4         |
+| 4b        | Guaranteed payments for capital            | 5         |
+| 4c        | Total guaranteed payments                  | 6         |
+| 5         | Interest income                            | 7         |
+| 6a        | Ordinary dividends                         | 8         |
+| 6b        | Qualified dividends                        | 9         |
+| 6c        | Dividend equivalents                       | 10        |
+| 7         | Royalties                                  | 11        |
+| 8         | Net short-term capital gain (loss)         | 12        |
+| 9a        | Net long-term capital gain (loss)          | 13        |
+| 9b        | Collectibles (28%) gain (loss)             | 14        |
+| 9c        | Unrecaptured section 1250 gain             | 15        |
+| 10        | Net section 1231 gain (loss)               | 16        |
+| 11        | Other income (loss)                        | 17        |
+| 12        | Section 179 deduction                      | 18        |
+| 13        | Other deductions                           | 19        |
+| 14        | Self-employment earnings (loss)            | 20        |
+| 15        | Credits                                    | 21        |
+| 16        | Foreign transactions                       | 22        |
+| 17        | Alternative minimum tax (AMT) items        | 23        |
+| 18        | Tax-exempt income and nondeductible expenses | 24      |
+| 19a       | Distributions — Cash and marketable securities | 25    |
+| 19b       | Distributions — Other property             | 26        |
+| 20        | Other information                          | 27        |
+| 21        | Foreign taxes paid or accrued              | 28        |
+
+## Validation Rules
+
+1. **Import session partnership**: Must reference an existing partnership owned by the current user.
+2. **Import session tax year**: Must be ≥ year of the partnership's inception date.
+3. **File upload**: Must be a valid PDF, ≤ 25 MB. System rejects non-PDF MIME types.
+4. **Extraction status transitions**: Only valid transitions: PROCESSING → EXTRACTED → VERIFIED → CONFIRMED/CANCELLED, or PROCESSING → FAILED. No backwards transitions.
+5. **Cell mapping uniqueness**: One mapping per (partnershipId, boxNumber). Custom mappings for a partnership override the global default for that box number.
+6. **Confirmation prerequisites**: Can only confirm when status is VERIFIED, partnership has at least one active member, and verifiedData is not null.
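+7. **Duplicate KDocument check**: Before creating a KDocument, check for an existing record with the same (partnershipId, type=K1, taxYear). If one is found, require an explicit user decision (`UPDATE` or `CREATE_NEW` via `existingKDocumentAction`); without one, the confirm request is rejected with 409.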
+8. **Distribution allocation**: Box 19a/19b amounts are allocated to members by ownership percentage as of the tax year's fiscal year end. Allocation amounts must sum exactly to the partnership-level total (handle rounding by adjusting the largest member's allocation).
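
Rule 8 can be sketched as follows. This is an illustration of the rounding rule rather than the planned `k1-allocation.service.ts`: `allocateByOwnership` and `MemberShare` are hypothetical names, amounts are handled as integer cents to avoid floating-point drift (an assumption, since the project lists big.js for decimal math), and the ownership percentages are assumed to sum to 100.

```typescript
// Hypothetical input shape: one entry per active member.
interface MemberShare {
  entityId: string;
  ownershipPercent: number; // e.g. 60 for 60%
}

// Splits totalCents by ownership percentage, then assigns any rounding
// remainder to the largest member so the parts sum exactly to the total.
function allocateByOwnership(totalCents: number, members: MemberShare[]): Map<string, number> {
  if (members.length === 0) {
    throw new Error('Partnership has no active members');
  }
  const allocations = new Map<string, number>();
  let allocated = 0;
  let largest: MemberShare | undefined;
  for (const member of members) {
    const cents = Math.round((totalCents * member.ownershipPercent) / 100);
    allocations.set(member.entityId, cents);
    allocated += cents;
    if (!largest || member.ownershipPercent > largest.ownershipPercent) {
      largest = member;
    }
  }
  if (largest) {
    const remainder = totalCents - allocated;
    allocations.set(largest.entityId, (allocations.get(largest.entityId) ?? 0) + remainder);
  }
  return allocations;
}
```

If the percentages do not sum to 100, the remainder step would silently push the whole difference onto the largest member, so a real service would presumably validate the ownership totals first.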
--- a/specs/004-k1-scan-import/plan.md
+++ b/specs/004-k1-scan-import/plan.md
@@ -0,0 +1,121 @@
+# Implementation Plan: K-1 PDF Scan Import
+
+**Branch**: `004-k1-scan-import` | **Date**: 2026-03-18 | **Spec**: [spec.md](spec.md)
+**Input**: Feature specification from `/specs/004-k1-scan-import/spec.md`
+
+## Summary
+
+Automated K-1 PDF scanning that extracts structured IRS Schedule K-1 (Form 1065) data from uploaded PDFs, presents a verification screen for manual review and correction, and auto-creates downstream model objects (KDocument, Distributions, member allocations, Document). Extraction is two-tier: `pdf-parse` handles digital PDFs (free, instant, local), and scanned PDFs go to Azure AI Document Intelligence, with `tesseract.js` as the local fallback. Supports per-partnership cell mapping customization and import history with re-processing.
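
The tiering can be sketched with the extractors injected as plain functions, so the control flow is visible without touching any real library. Everything here is illustrative: `extractWithFallback`, `ExtractorSet`, and `MIN_TEXT_CHARS` are hypothetical, the "too little text means scanned" heuristic is an assumption, and so is falling back to tesseract when the Azure call throws (the plan only states that tesseract is used when Azure is not configured).

```typescript
interface ExtractionAttempt {
  text: string;
  method: 'pdf-parse' | 'azure' | 'tesseract';
}

type Extractor = (pdf: Uint8Array) => Promise<ExtractionAttempt>;

interface ExtractorSet {
  pdfParse: Extractor;
  azure: Extractor | null; // null when Azure credentials are not configured
  tesseract: Extractor;
}

// Assumed heuristic: a digital K-1 yields at least this much embedded text.
const MIN_TEXT_CHARS = 200;

async function extractWithFallback(
  pdf: Uint8Array,
  extractors: ExtractorSet,
): Promise<ExtractionAttempt> {
  // Tier 1: embedded text from a digital PDF.
  const digital = await extractors.pdfParse(pdf);
  if (digital.text.trim().length >= MIN_TEXT_CHARS) {
    return digital;
  }
  // Tier 2: cloud OCR when configured, local OCR otherwise or on failure.
  if (extractors.azure) {
    try {
      return await extractors.azure(pdf);
    } catch {
      // Assumption: an Azure failure falls through to local OCR.
    }
  }
  return extractors.tesseract(pdf);
}
```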
+
+## Technical Context
+
+**Language/Version**: TypeScript 5.9.2, Node.js ≥ 22.18.0
+**Primary Dependencies**: NestJS 11.x (backend), Angular 21.x (frontend), Prisma 6.x (ORM), pdf-parse (PDF text), @azure/ai-form-recognizer (cloud OCR), tesseract.js (local OCR fallback)
+**Storage**: PostgreSQL via Prisma (structured data), local filesystem `uploads/` (PDF files)
+**Testing**: Jest (unit + integration), test K-1 PDF fixtures in `test/import/`
+**Target Platform**: Docker (node:22-slim), self-hosted or Railway
+**Project Type**: Web application (NestJS API + Angular SPA) — Nx monorepo
+**Performance Goals**: PDF extraction < 30 seconds (SC-001), model creation < 5 seconds (SC-005), 90%+ accuracy for digital PDFs (SC-002)
+**Constraints**: Self-hosted capable (Azure OCR optional), max PDF size 25 MB, K-1 Form 1065 only (V1)
+**Scale/Scope**: Single family office (10–50 partnerships, 10–50 K-1s/year), 2 new API modules, 3 new frontend pages
+
+## Constitution Check
+
+_GATE: Must pass before Phase 0 research. Re-check after Phase 1 design._
+
+No constitution.md exists for this project. Gates assessed against standard engineering principles:
+
+| Gate | Status | Notes |
+|------|--------|-------|
+| No unnecessary dependencies | PASS | 3 new packages (`pdf-parse`, `@azure/ai-form-recognizer`, `tesseract.js`) — each serves a distinct, justified purpose per research.md |
+| Follows existing patterns | PASS | New NestJS modules follow existing controller/service/DTO pattern (mirrors `k-document`, `upload` modules) |
+| No breaking changes | PASS | 2 new Prisma models + 1 enum, back-references only on existing models — no column changes |
+| Test coverage | PASS | Unit tests for extractors, mapper, allocation; integration tests for full pipeline |
+| Self-hosted compatible | PASS | Core extraction (pdf-parse) is fully local; Azure is optional with tesseract.js fallback |
+
+**Post-Phase 1 re-check**: PASS — data model adds 2 models/1 enum, no existing schema changes beyond back-references. API contracts follow existing REST patterns. No violations identified.
+
+## Project Structure
+
+### Documentation (this feature)
+
+```text
+specs/004-k1-scan-import/
+├── plan.md              # This file
+├── research.md          # Phase 0: OCR provider research & decisions
+├── data-model.md        # Phase 1: K1ImportSession, CellMapping models
+├── quickstart.md        # Phase 1: Setup & dev guide
+├── contracts/
+│   └── k1-import-api.md # Phase 1: REST API contracts
+├── checklists/
+│   └── requirements.md  # Spec quality checklist
+└── tasks.md             # Phase 2 output (created by /speckit.tasks)
+```
+
+### Source Code (repository root)
+
+```text
+apps/api/src/app/
+├── k1-import/
+│   ├── k1-import.module.ts
+│   ├── k1-import.controller.ts
+│   ├── k1-import.service.ts
+│   ├── dto/
+│   │   ├── upload-k1.dto.ts
+│   │   ├── verify-k1.dto.ts
+│   │   └── confirm-k1.dto.ts
+│   ├── extractors/
+│   │   ├── k1-extractor.interface.ts
+│   │   ├── pdf-parse-extractor.ts
+│   │   ├── azure-extractor.ts
+│   │   └── tesseract-extractor.ts
+│   ├── k1-field-mapper.service.ts
+│   ├── k1-allocation.service.ts
+│   └── k1-confidence.service.ts
+├── cell-mapping/
+│   ├── cell-mapping.module.ts
+│   ├── cell-mapping.controller.ts
+│   └── cell-mapping.service.ts
+
+apps/client/src/app/
+├── pages/
+│   ├── k1-import/
+│   │   ├── k1-import-page.component.ts
+│   │   ├── k1-import-page.html
+│   │   ├── k1-import-page.scss
+│   │   ├── k1-import-page.routes.ts
+│   │   ├── k1-verification/
+│   │   │   ├── k1-verification.component.ts
+│   │   │   ├── k1-verification.html
+│   │   │   └── k1-verification.scss
+│   │   └── k1-confirmation/
+│   │       ├── k1-confirmation.component.ts
+│   │       ├── k1-confirmation.html
+│   │       └── k1-confirmation.scss
+│   └── cell-mapping/
+│       ├── cell-mapping-page.component.ts
+│       ├── cell-mapping-page.html
+│       └── cell-mapping-page.routes.ts
+├── services/
+│   └── k1-import-data.service.ts
+
+libs/common/src/lib/
+├── interfaces/
+│   └── k1-import.interface.ts
+├── dtos/
+│   └── k1-import/
+│       ├── create-k1-import.dto.ts
+│       ├── verify-k1-import.dto.ts
+│       └── confirm-k1-import.dto.ts
+
+prisma/
+├── schema.prisma                    # + K1ImportSession, CellMapping, K1ImportStatus
+├── migrations/
+│   └── 2026XXXX_added_k1_import/    # New migration
+
+test/import/
+├── sample-k1-digital.pdf            # Test fixture: digital K-1
+└── sample-k1-scanned.pdf            # Test fixture: scanned K-1
+```
+
+**Structure Decision**: Follows the existing Nx monorepo convention with new NestJS modules under `apps/api/src/app/` and new Angular pages under `apps/client/src/app/pages/`. Shared interfaces and DTOs in `libs/common/`. This mirrors the existing `k-document`, `upload`, and `family-office` module patterns.
--- a/specs/004-k1-scan-import/quickstart.md
+++ b/specs/004-k1-scan-import/quickstart.md
@@ -0,0 +1,120 @@
+# Quickstart: K-1 PDF Scan Import
+
+**Phase 1 Output** | **Date**: 2026-03-18
+
+## Prerequisites
+
+1. Spec 001-family-office-transform models are implemented (Entity, Partnership, PartnershipMembership, KDocument, Distribution, Document)
+2. At least one Partnership with one or more member Entities exists in the database
+3. The existing upload infrastructure (`UploadController`, `uploads/` directory) is functional
+4. Node.js ≥ 22.18.0, Docker for PostgreSQL/Redis
+
+## Environment Setup
+
+Add to `.env` (optional — for Azure OCR of scanned PDFs):
+```
+AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-resource.cognitiveservices.azure.com/
+AZURE_DOCUMENT_INTELLIGENCE_KEY=your-api-key
+```
+
+If these are empty, scanned PDFs fall back to `tesseract.js` (lower accuracy but fully self-hosted).
+
+## New Dependencies
+
+```bash
+npm install pdf-parse @azure/ai-form-recognizer tesseract.js
+npm install -D @types/pdf-parse
+```
+
+## Database Migration
+
+After adding the new Prisma models (`K1ImportSession`, `CellMapping`, `K1ImportStatus` enum):
+
+```bash
+npx prisma db push          # Development: sync schema
+# OR
+npx prisma migrate dev      # Create a migration file
+```
+
+Seed the default IRS cell mappings (28 rows with `partnershipId = null`) via the existing seed mechanism or a dedicated seed script.
+
+## Key Files to Create
+
+### Backend (apps/api/src/)
+
+```
+app/k1-import/
+├── k1-import.module.ts              # NestJS module
+├── k1-import.controller.ts          # REST endpoints (see contracts/k1-import-api.md)
+├── k1-import.service.ts             # Orchestration: upload → extract → verify → confirm
+├── dto/
+│   ├── upload-k1.dto.ts             # Multipart upload DTO
+│   ├── verify-k1.dto.ts             # Verification submission DTO
+│   └── confirm-k1.dto.ts            # Confirmation request DTO
+├── extractors/
+│   ├── k1-extractor.interface.ts    # Common extraction interface
+│   ├── pdf-parse-extractor.ts       # Tier 1: digital PDF text extraction
+│   ├── azure-extractor.ts           # Tier 2: Azure Document Intelligence
+│   └── tesseract-extractor.ts       # Tier 2 fallback: tesseract.js OCR
+├── k1-field-mapper.service.ts       # Maps raw extraction → K1ExtractedField[]
+├── k1-allocation.service.ts         # Allocates K-1 amounts to members by ownership %
+└── k1-confidence.service.ts         # Computes confidence scores with validation heuristics
+
+app/cell-mapping/
+├── cell-mapping.module.ts           # NestJS module
+├── cell-mapping.controller.ts       # CRUD for cell mappings
+└── cell-mapping.service.ts          # Cell mapping business logic + seed data
+```
+
+### Shared Types (libs/common/src/lib/)
+
+```
+interfaces/
+└── k1-import.interface.ts           # K1ExtractionResult, K1ExtractedField, K1ConfirmationRequest
+dtos/
+└── k1-import/
+    ├── create-k1-import.dto.ts
+    ├── verify-k1-import.dto.ts
+    └── confirm-k1-import.dto.ts
+```
+
+### Frontend (apps/client/src/app/)
+
+```
+pages/k1-import/
+├── k1-import-page.component.ts      # Upload + history view
+├── k1-import-page.html
+├── k1-import-page.scss
+├── k1-import-page.routes.ts
+├── k1-verification/
+│   ├── k1-verification.component.ts # Verification/edit screen
+│   ├── k1-verification.html
+│   └── k1-verification.scss
+└── k1-confirmation/
+    ├── k1-confirmation.component.ts  # Confirmation result screen
+    ├── k1-confirmation.html
+    └── k1-confirmation.scss
+
+pages/cell-mapping/
+├── cell-mapping-page.component.ts   # Cell mapping configuration UI
+├── cell-mapping-page.html
+└── cell-mapping-page.routes.ts
+
+services/
+└── k1-import-data.service.ts        # HTTP client for k1-import endpoints
+```
+
+## Verification Workflow
+
+1. **Upload**: User selects PDF → `POST /api/v1/k1-import/upload` → session created with status PROCESSING
+2. **Extract**: Backend detects PDF type (digital vs. scanned) → routes to appropriate extractor → status becomes EXTRACTED
+3. **Review**: Frontend polls/fetches session → displays verification screen with extracted fields, confidence indicators
+4. **Edit**: User corrects values, overrides labels → `PUT /api/v1/k1-import/:id/verify` → status becomes VERIFIED
+5. **Confirm**: User clicks "Confirm & Save" → `POST /api/v1/k1-import/:id/confirm` → KDocument + Distributions + Document created → status becomes CONFIRMED
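+
+The status transitions implied by the steps above can be sketched as a small lookup table. This is a minimal sketch, not the service implementation: the status names come from the workflow (plus the cancelled state mentioned in research.md), while `ALLOWED_TRANSITIONS`, `canTransition`, and `assertTransition` are hypothetical helpers. Allowing cancellation from any non-terminal state is an assumption, and re-processing (FR-023) is not modeled.
+
+```typescript
+// Status names mirror the workflow steps; the authoritative enum is
+// K1ImportStatus in data-model.md.
+type K1ImportStatus = 'PROCESSING' | 'EXTRACTED' | 'VERIFIED' | 'CONFIRMED' | 'CANCELLED';
+
+// Assumption: a session can be cancelled from any non-terminal state.
+const ALLOWED_TRANSITIONS: Record<K1ImportStatus, readonly K1ImportStatus[]> = {
+  PROCESSING: ['EXTRACTED', 'CANCELLED'],
+  EXTRACTED: ['VERIFIED', 'CANCELLED'],
+  VERIFIED: ['CONFIRMED', 'CANCELLED'],
+  CONFIRMED: [],
+  CANCELLED: [],
+};
+
+// True when the workflow allows moving a session from `from` to `to`.
+function canTransition(from: K1ImportStatus, to: K1ImportStatus): boolean {
+  return ALLOWED_TRANSITIONS[from].includes(to);
+}
+
+// Throws when a caller attempts to skip or reverse a step.
+function assertTransition(from: K1ImportStatus, to: K1ImportStatus): void {
+  if (!canTransition(from, to)) {
+    throw new Error(`Invalid K-1 import status transition: ${from} -> ${to}`);
+  }
+}
+```
+
+A guard like this could run in the service before each status update, so that a confirm request against a session that has not been verified is rejected early.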
+
+## Testing Strategy
+
+- **Unit tests**: Extractors (pdf-parse, azure, tesseract), field mapper, confidence scoring, allocation math
+- **Integration tests**: Full upload → extract → verify → confirm flow with test PDF fixtures
+- **Test fixtures**: Include sample K-1 PDFs (digital and scanned) in `test/import/` directory
+- **Allocation accuracy**: Verify rounding behavior — allocated amounts must sum exactly to partnership total
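+
+The rounding requirement in the last bullet can be met with a largest-remainder allocation. The sketch below is illustrative only: it works in integer cents with plain numbers rather than `big.js`, and `MemberShare`, `ownershipBps`, and `allocateCents` are hypothetical names. Shares are assumed to be integers and are normalized by their sum, so they do not have to add up to exactly 10000.
+
+```typescript
+interface MemberShare {
+  entityId: string;
+  ownershipBps: number; // integer ownership weight, e.g. basis points (hypothetical unit)
+}
+
+// Splits totalCents across members so that the parts sum exactly to totalCents.
+function allocateCents(totalCents: number, members: MemberShare[]): Map<string, number> {
+  const totalBps = members.reduce((sum, m) => sum + m.ownershipBps, 0);
+  if (!Number.isInteger(totalCents) || members.length === 0 || totalBps <= 0) {
+    throw new Error('allocateCents needs an integer total and at least one positive share');
+  }
+  // Work on the absolute value so losses round the same way as gains.
+  const sign = totalCents < 0 ? -1 : 1;
+  const abs = Math.abs(totalCents);
+
+  // Floor each member's share and remember the fractional remainder.
+  const parts = members.map((m) => {
+    const scaled = abs * m.ownershipBps;
+    return {
+      entityId: m.entityId,
+      cents: Math.floor(scaled / totalBps),
+      remainder: scaled % totalBps,
+    };
+  });
+
+  // Hand the cents lost to flooring to the largest remainders first.
+  let leftover = abs - parts.reduce((sum, p) => sum + p.cents, 0);
+  const byRemainder = parts.slice().sort((a, b) => b.remainder - a.remainder);
+  for (const p of byRemainder) {
+    if (leftover === 0) break;
+    p.cents += 1;
+    leftover -= 1;
+  }
+
+  const result = new Map<string, number>();
+  for (const p of parts) {
+    result.set(p.entityId, p.cents === 0 ? 0 : sign * p.cents);
+  }
+  return result;
+}
+```
+
+An allocation test can then assert that the values of the returned map sum to the partnership total for awkward splits such as three equal members sharing 100 cents.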
--- a/specs/004-k1-scan-import/research.md
+++ b/specs/004-k1-scan-import/research.md
@ -0,0 +1,154 @@
+# Research: K-1 PDF Scan Import
+
+**Phase 0 Output** | **Date**: 2026-03-18
+
+## Decision 1: PDF Text Extraction (Tier 1 — Digital PDFs)
+
+**Decision**: Use `pdf-parse` npm package for digitally-generated K-1 PDFs.
+
+**Rationale**: Digitally-generated PDFs from fund administrators contain embedded text. `pdf-parse` reads this text directly with no OCR step, is free and fully self-hosted, and needs no external API calls, so extraction completes without network latency. It has 3M+ weekly npm downloads and a stable API.
+
+**Alternatives Considered**:
+- `pdfjs-dist` (Mozilla pdf.js) — lower-level, requires more boilerplate for text extraction; `pdf-parse` wraps this already.
+- Cloud OCR for all PDFs — unnecessary cost and latency for digital PDFs where text extraction is 100% accurate.
+
+---
+
+## Decision 2: OCR for Scanned PDFs (Tier 2)
+
+**Decision**: Use Azure AI Document Intelligence (Layout model) as primary Tier 2 provider, with `tesseract.js` as self-hosted fallback.
+
+**Rationale**:
+- Azure has the best tax-form pedigree among cloud providers (prebuilt IRS models for W-2, 1098, 1099)
+- Returns per-field confidence scores (0.0–1.0) natively, directly fulfilling FR-006/FR-009
+- 500 free pages/month covers typical family office volume (10–50 K-1s/year)
+- `@azure/ai-form-recognizer` has full TypeScript types, aligns with NestJS patterns
+- `tesseract.js` runs as WASM in Node.js (no system install) and provides a fallback at roughly 70–85% accuracy on clean scans
+
+**Alternatives Considered**:
+- Google Document AI — good form parsing but no tax-specific models, more expensive for custom processors ($30/1K pages)
+- AWS Textract — strong table extraction but less established for tax forms, requires IAM setup
+- Tesseract.js only — accuracy drops to 70–85% for clean scans, no layout understanding; acceptable as fallback but not primary
+
+---
+
+## Decision 3: Two-Tier Extraction Architecture
+
+**Decision**: Implement a PDF type detection step that routes digital PDFs to local text extraction (free, instant) and scanned PDFs to Tier 2 OCR (Azure Document Intelligence, or `tesseract.js` when Azure is not configured).
+
+**Rationale**: Most K-1s from fund administrators are digitally generated. The two-tier approach avoids unnecessary API calls and costs for the majority case, while still supporting scanned documents.
+
+**Detection heuristic**: Extract text via `pdf-parse`; if extracted text length < 100 characters or does not contain K-1 keywords ("Schedule K-1", "Form 1065", "Partner's Share"), route to Tier 2 OCR.
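+
+A minimal sketch of this heuristic, operating on text that has already been extracted. `chooseExtractionTier`, `ExtractionTier`, and the constants are hypothetical names. The sketch assumes that one matching keyword is enough, that matching is case-insensitive, and that whitespace and curly apostrophes are normalized before the checks; none of these details is fixed by the decision above.
+
+```typescript
+const K1_KEYWORDS = ['Schedule K-1', 'Form 1065', "Partner's Share"];
+const MIN_TEXT_LENGTH = 100;
+
+type ExtractionTier = 'TIER_1_TEXT' | 'TIER_2_OCR';
+
+// Decides the extraction tier from the text that Tier 1 extraction returned.
+function chooseExtractionTier(extractedText: string): ExtractionTier {
+  // Assumption: normalize curly apostrophes and runs of whitespace first.
+  const text = extractedText
+    .replace(/[\u2018\u2019]/g, "'")
+    .replace(/\s+/g, ' ')
+    .trim();
+
+  // Too little embedded text usually means the pages are scanned images.
+  if (text.length < MIN_TEXT_LENGTH) {
+    return 'TIER_2_OCR';
+  }
+
+  // Assumption: a single K-1 keyword is enough to trust the embedded text.
+  const lower = text.toLowerCase();
+  const hasKeyword = K1_KEYWORDS.some((keyword) => lower.includes(keyword.toLowerCase()));
+  return hasKeyword ? 'TIER_1_TEXT' : 'TIER_2_OCR';
+}
+```
+
+Keeping the decision as a pure function of the extracted text makes the routing rule easy to unit-test without PDF fixtures.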
+
+**Alternatives Considered**:
+- Cloud OCR for everything — simpler but adds cost ($0.15/page) and latency (3–10s) for digital PDFs that don't need it
+- Local OCR only (Tesseract.js) — insufficient accuracy (75%) for production tax data; too many manual corrections needed
+
+---
+
+## Decision 4: K-1 Box Extraction Strategy
+
+**Decision**: Use regex-based box extraction for Tier 1 (digital text), and key-value pair extraction from the OCR provider for Tier 2. Both feed into a shared K-1 field mapper that applies the cell mapping configuration.
+
+**Rationale**: The IRS Schedule K-1 (Form 1065) has a consistent, standardized layout:
+- Page 1: Header + Part I (partnership info) + Part II (partner info) + Boxes 1–11
+- Page 2: Boxes 12–20+ with code/sub-code details
+- Box values sit in a numbered two-column grid: number label → description → value field
+- Layout has been structurally stable for years, making template/regex extraction reliable
+
+**Challenges addressed**:
+- Multi-line sub-codes (Boxes 11, 13, 15, 16, 17, 18, 20) — handle by extracting code-letter/value pairs within each box section
+- Supplemental schedules — out of scope for V1 auto-extraction; captured as additional Document attachments
+- Multi-entity PDFs — detect via repeated "Schedule K-1" headers; split and process each K-1 separately
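+
+The sub-code handling above can be illustrated with a short sketch. The line format is an assumption: each sub-code is taken to appear on its own line as one or two capital letters followed by an amount, with parentheses or a minus sign marking negatives. Real extracted text may differ by PDF generator. `SUB_CODE_PATTERN`, `parseAmount`, and `extractSubCodes` are hypothetical names, and amounts are plain numbers here rather than `big.js` values.
+
+```typescript
+interface SubCodeValue {
+  code: string;
+  amount: number;
+}
+
+// Assumed line shape: "A 1,234", "B (567.89)", "AB -12", "C $1,000".
+const SUB_CODE_PATTERN = /^\s*([A-Z]{1,2})\s+(\(?-?\$?[\d,]+(?:\.\d+)?\)?)\s*$/;
+
+// Converts a formatted amount to a number; parentheses or "-" mean negative.
+function parseAmount(raw: string): number {
+  const negative = raw.startsWith('(') || raw.includes('-');
+  const value = Number(raw.replace(/[^\d.]/g, ''));
+  return negative ? -value : value;
+}
+
+// Collects code-letter/value pairs from the text of a single box section.
+function extractSubCodes(boxSection: string): SubCodeValue[] {
+  const results: SubCodeValue[] = [];
+  for (const line of boxSection.split(/\r?\n/)) {
+    const match = SUB_CODE_PATTERN.exec(line);
+    if (match) {
+      results.push({ code: match[1], amount: parseAmount(match[2]) });
+    }
+  }
+  return results;
+}
+```
+
+Lines that do not match the pattern, such as description text, are skipped rather than guessed at, which leaves them for the verification screen.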
+
+**Alternatives Considered**:
+- Fixed coordinate-based extraction — too brittle across different PDF generators (varying margins, fonts)
+- Machine learning model — overkill for V1 given the standardized form layout
+
+---
+
+## Decision 5: Confidence Scoring Approach
+
+**Decision**: Three-level confidence display (High/Medium/Low) derived from extraction method and validation heuristics.
+
+**Rationale**:
+
+For **Tier 1** (digital text):
+- Base confidence: 0.90 (text extraction is inherently reliable)
+- +0.05 if box number regex matched cleanly
+- +0.05 if value format validated (currency, percentage, integer)
+- -0.10 to -0.30 for potential adjacent-box text contamination
+
+For **Tier 2** (cloud OCR):
+- Use Azure's native per-field confidence score directly
+- Layer cross-field validation (e.g., Box 6b ≤ Box 6a, sub-boxes sum to parent)
+
+**Display mapping**:
+- High (≥ 0.85): Green — no user attention needed
+- Medium (≥ 0.60 and < 0.85): Yellow — optional review
+- Low (< 0.60): Red — highlighted, requires manual review (FR-009)
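+
+The Tier 1 arithmetic and the display mapping above can be sketched as follows. `Tier1Signals`, `scoreTier1`, and `toConfidenceLevel` are hypothetical names, and the sketch assumes the caller has already chosen a contamination penalty between 0 and 0.30. Rounding to two decimals and clamping to the range 0 to 1 are also assumptions, added so that floating-point noise does not push a score across a threshold.
+
+```typescript
+type ConfidenceLevel = 'HIGH' | 'MEDIUM' | 'LOW';
+
+interface Tier1Signals {
+  boxRegexMatchedCleanly: boolean;
+  valueFormatValid: boolean;
+  contaminationPenalty: number; // 0 when clean, otherwise 0.10 to 0.30
+}
+
+// Applies the Tier 1 base score, bonuses, and penalty.
+function scoreTier1(signals: Tier1Signals): number {
+  let score = 0.9;
+  if (signals.boxRegexMatchedCleanly) score += 0.05;
+  if (signals.valueFormatValid) score += 0.05;
+  score -= signals.contaminationPenalty;
+  // Round to two decimals, then clamp to [0, 1].
+  const rounded = Math.round(score * 100) / 100;
+  return Math.min(1, Math.max(0, rounded));
+}
+
+// Maps a numeric score to the three display levels.
+function toConfidenceLevel(score: number): ConfidenceLevel {
+  if (score >= 0.85) return 'HIGH';
+  if (score >= 0.6) return 'MEDIUM';
+  return 'LOW';
+}
+```
+
+The same `toConfidenceLevel` mapping could be applied to the per-field scores returned by Tier 2, so both tiers share one set of thresholds.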
+
+**Alternatives Considered**:
+- Binary confidence (confident/not) — too coarse; doesn't guide the user's review attention
+- Numeric score display — too technical for a non-engineer user; three levels with color coding is more actionable
+
+---
+
+## Decision 6: New Database Models
+
+**Decision**: Add two new Prisma models (`K1ImportSession`, `CellMapping`) to support import tracking and cell mapping configuration, alongside the existing K-document models from spec 001.
+
+**Rationale**: 
+- `K1ImportSession` tracks the full import lifecycle (upload → processing → extracted → verified → confirmed/cancelled), enabling import history (FR-022) and re-processing (FR-023)
+- `CellMapping` stores per-partnership cell label customizations (FR-017 through FR-021) separate from the KDocument data itself
+
+**Alternatives Considered**:
+- Store import sessions as JSON metadata on KDocument — would conflate document data with import workflow state; makes import history harder to query
+- Store cell mappings as JSON on Partnership — would work but loses the ability to query/manage mappings independently and doesn't support a global default set
+
+---
+
+## Decision 7: File Storage
+
+**Decision**: Use the existing `uploads/` directory and `Document` model from spec 001. Uploaded K-1 PDFs are stored on the local filesystem, with metadata in the `Document` table.
+
+**Rationale**: The existing upload infrastructure (UploadController with `FileInterceptor`, Document model, `uploads/` directory) is already in place. No need to add a new storage mechanism.
+
+**Alternatives Considered**:
+- S3/cloud storage — would require new infrastructure; the self-hosted philosophy favors local storage
+- Database blob storage — increases database size and backup time for binary files
+
+---
+
+## Decision 8: New Environment Variables
+
+**Decision**: Add two optional environment variables for Azure Document Intelligence, following the existing `ConfigurationService` pattern with `str({ default: '' })`.
+
+```
+AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT  — Azure resource endpoint URL
+AZURE_DOCUMENT_INTELLIGENCE_KEY       — Azure API key
+```
+
+**Rationale**: When both are empty (default), the system falls back to `tesseract.js` for scanned PDFs. This makes Azure optional — the feature works fully self-hosted with degraded OCR accuracy.
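+
+A minimal sketch of the fallback rule, written as a pure function over the two variables. `OcrEnv`, `OcrProvider`, and `selectOcrProvider` are hypothetical names; in the application the values would come from `ConfigurationService`. The rationale only covers the case where both values are empty, so treating a half-configured pair (only one value set) as "not configured" is an assumption.
+
+```typescript
+type OcrProvider = 'azure' | 'tesseract';
+
+interface OcrEnv {
+  AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT?: string;
+  AZURE_DOCUMENT_INTELLIGENCE_KEY?: string;
+}
+
+// Picks the Tier 2 provider from the two optional environment variables.
+function selectOcrProvider(env: OcrEnv): OcrProvider {
+  const endpoint = (env.AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT ?? '').trim();
+  const key = (env.AZURE_DOCUMENT_INTELLIGENCE_KEY ?? '').trim();
+  // Assumption: Azure is used only when both values are present.
+  return endpoint !== '' && key !== '' ? 'azure' : 'tesseract';
+}
+```
+
+With the default empty strings this returns `'tesseract'`, which matches the fully self-hosted behaviour described above.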
+
+**Alternatives Considered**:
+- Separate feature flag — unnecessary; empty credentials are sufficient to indicate "not configured"
+- Google/AWS credentials — Azure recommended as primary; could add additional providers later
+
+---
+
+## Decision 9: New npm Dependencies
+
+**Decision**: Add the following packages:
+
+| Package | Purpose | Tier |
+|---|---|---|
+| `pdf-parse` | Text extraction from digital PDFs | Tier 1 (required) |
+| `@azure/ai-form-recognizer` | Cloud OCR for scanned PDFs | Tier 2 (optional) |
+| `tesseract.js` | Self-hosted OCR fallback | Tier 2 fallback |
+
+**Rationale**: `pdf-parse` is essential for the Tier 1 (free, local) path. Azure SDK is optional (only loaded when credentials are configured). `tesseract.js` provides a zero-config fallback that runs as WASM — no system dependencies needed, works in the existing `node:22-slim` Docker image.
+
+**Alternatives Considered**:
+- `pdfjs-dist` directly instead of `pdf-parse` — more boilerplate, `pdf-parse` wraps it with a simpler API
+- Only cloud OCR — loses the self-hosted story and adds cost for digital PDFs