# Implementation Plan: K-1 PDF Scan Import

**Branch**: `004-k1-scan-import` | **Date**: 2026-03-18 | **Spec**: [spec.md](spec.md)
**Input**: Feature specification from `/specs/004-k1-scan-import/spec.md`

## Summary

Automated K-1 PDF scanning that extracts structured IRS Schedule K-1 (Form 1065) data from uploaded PDFs, presents a verification screen that auto-accepts high-confidence values and requires explicit review of medium- and low-confidence fields, and then auto-creates the downstream model objects (KDocument, Distributions, member allocations, Document). Extraction is two-tier: `pdf-parse` handles digital PDFs (free, instant, local), and scanned PDFs go to Azure AI Document Intelligence, with `tesseract.js` as the local fallback. The feature also supports per-partnership cell mapping customization, administrator-defined aggregation rules (summaries computed dynamically and displayed on the verification screen and the KDocument detail view), an "Unmapped Items" section for unrecognized extractions, and import history with re-processing.
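
The confidence split on the verification screen could look roughly like the sketch below. The field shape, the 0..1 score, and both thresholds are assumptions made for illustration; the actual values belong to the spec and to `k1-confidence.service.ts`.

```typescript
// Hypothetical shapes and thresholds, not taken from the spec.
type ConfidenceLevel = 'high' | 'medium' | 'low';

interface ExtractedField {
  cell: string;       // K-1 box identifier, e.g. "1" or "19A"
  value: string;      // raw extracted text
  confidence: number; // assumed 0..1 score reported by the extractor
}

// Assumed cut-offs, chosen only for illustration.
const HIGH_THRESHOLD = 0.9;
const MEDIUM_THRESHOLD = 0.7;

function classify(confidence: number): ConfidenceLevel {
  if (confidence >= HIGH_THRESHOLD) return 'high';
  if (confidence >= MEDIUM_THRESHOLD) return 'medium';
  return 'low';
}

// High-confidence fields are accepted automatically;
// medium and low ones are queued for explicit review.
function triage(fields: ExtractedField[]) {
  const autoAccepted: ExtractedField[] = [];
  const needsReview: ExtractedField[] = [];
  for (const field of fields) {
    if (classify(field.confidence) === 'high') {
      autoAccepted.push(field);
    } else {
      needsReview.push(field);
    }
  }
  return { autoAccepted, needsReview };
}

const triaged = triage([
  { cell: '1', value: '12500.00', confidence: 0.98 },
  { cell: '19A', value: '4000.00', confidence: 0.74 },
  { cell: '20', value: '', confidence: 0.31 },
]);
```

Keeping the thresholds in one place would let them be tuned against the test fixtures without touching the verification UI.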

## Technical Context

**Language/Version**: TypeScript 5.9.2, Node.js ≥ 22.18.0
**Primary Dependencies**: NestJS 11.x (backend), Angular 21.x (frontend), Prisma 6.x (ORM), pdf-parse (PDF text), @azure/ai-form-recognizer (cloud OCR), tesseract.js (local OCR fallback)
**Storage**: PostgreSQL via Prisma (structured data), local filesystem `uploads/` (PDF files)
**Testing**: Jest (unit + integration), test K-1 PDF fixtures in `test/import/`
**Target Platform**: Docker (node:22-slim), self-hosted or Railway
**Project Type**: Web application (NestJS API + Angular SPA) — Nx monorepo
**Performance Goals**: PDF extraction < 30 seconds (SC-001), model creation < 5 seconds (SC-005), 90%+ accuracy for digital PDFs (SC-002)
**Constraints**: Self-hosted capable (Azure OCR optional), max PDF size 25 MB, K-1 Form 1065 only (V1)
**Scale/Scope**: Single family office (10–50 partnerships, 10–50 K-1s/year), 2 new API modules, 4 new frontend pages
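
As an illustration of the size constraint, an upload guard might be sketched as follows. The function name and result shape are hypothetical, "25 MB" is interpreted here as 25 × 1024 × 1024 bytes (an assumption), and the header check relies only on the fact that PDF files begin with `%PDF-`.

```typescript
// Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes.
const MAX_PDF_BYTES = 25 * 1024 * 1024;
// Byte values of the "%PDF-" header that opens every PDF file.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

type UploadCheck = { ok: true } | { ok: false; reason: string };

// Hypothetical guard: rejects oversized files and files without a PDF header.
function validateK1Upload(bytes: Uint8Array): UploadCheck {
  if (bytes.length > MAX_PDF_BYTES) {
    return { ok: false, reason: 'File exceeds the 25 MB limit' };
  }
  const hasMagic = PDF_MAGIC.every((byte, i) => bytes[i] === byte);
  if (!hasMagic) {
    return { ok: false, reason: 'File does not start with a PDF header' };
  }
  return { ok: true };
}
```

Checking the header bytes instead of the file extension is a cheap way to reject mislabeled uploads before any extractor runs.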

## Constitution Check

_GATE: Must pass before Phase 0 research. Re-check after Phase 1 design._

No constitution.md exists for this project, so gates are assessed against standard engineering principles:

| Gate | Status | Notes |
|------|--------|-------|
| No unnecessary dependencies | PASS | 3 new packages (`pdf-parse`, `@azure/ai-form-recognizer`, `tesseract.js`) — each serves a distinct, justified purpose per research.md |
| Follows existing patterns | PASS | New NestJS modules follow existing controller/service/DTO pattern (mirrors `k-document`, `upload` modules) |
| No breaking changes | PASS | 3 new Prisma models + 1 enum, back-references only on existing models — no column changes |
| Test coverage | PASS | Unit tests for extractors, mapper, allocation, aggregation; integration tests for full pipeline |
| Self-hosted compatible | PASS | Core extraction (pdf-parse) is fully local; Azure is optional with tesseract.js fallback |

**Post-Phase 1 re-check**: PASS. The data model adds 3 models (K1ImportSession, CellMapping, CellAggregationRule) and 1 enum (K1ImportStatus). No existing schema changes beyond back-references. API contracts follow existing REST patterns. Aggregation rules are computed dynamically, with no stored denormalization. No violations identified.
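
The "computed dynamically, nothing stored" property can be sketched as a pure function over the current cell values. The rule shape below is hypothetical (the real `CellAggregationRule` model is defined in data-model.md), and summation is assumed to be the only operation.

```typescript
// Hypothetical rule shape; see data-model.md for the real CellAggregationRule.
interface AggregationRule {
  label: string;
  sourceCells: string[]; // cell identifiers whose values are summed
}

type CellValues = Record<string, number>;

// Summaries are derived on demand from the current cell values.
// Cells missing from `values` contribute 0. Nothing is persisted.
function computeAggregations(rules: AggregationRule[], values: CellValues) {
  return rules.map((rule) => ({
    label: rule.label,
    total: rule.sourceCells.reduce((sum, cell) => sum + (values[cell] ?? 0), 0),
  }));
}

const summaries = computeAggregations(
  [{ label: 'Interest and dividends', sourceCells: ['5', '6a'] }],
  { '1': 12500, '5': 300, '6a': 450 },
);
```

Because the summaries are recomputed from cell values each time, editing a value on the verification screen or changing a rule never leaves a stale total behind.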

## Project Structure

### Documentation (this feature)

```text
specs/004-k1-scan-import/
├── plan.md              # This file
├── research.md          # Phase 0: OCR provider research & decisions
├── data-model.md        # Phase 1: K1ImportSession, CellMapping, CellAggregationRule models
├── quickstart.md        # Phase 1: Setup & dev guide
├── contracts/
│   └── k1-import-api.md # Phase 1: REST API contracts
├── checklists/
│   └── requirements.md  # Spec quality checklist
└── tasks.md             # Phase 2 output (created by /speckit.tasks)
```

### Source Code (repository root)

```text
apps/api/src/app/
├── k1-import/
│   ├── k1-import.module.ts
│   ├── k1-import.controller.ts
│   ├── k1-import.service.ts
│   ├── dto/
│   │   ├── upload-k1.dto.ts
│   │   ├── verify-k1.dto.ts
│   │   └── confirm-k1.dto.ts
│   ├── extractors/
│   │   ├── k1-extractor.interface.ts
│   │   ├── pdf-parse-extractor.ts
│   │   ├── azure-extractor.ts
│   │   └── tesseract-extractor.ts
│   ├── k1-field-mapper.service.ts
│   ├── k1-allocation.service.ts
│   ├── k1-confidence.service.ts
│   └── k1-aggregation.service.ts       # Dynamically computes aggregation summaries
└── cell-mapping/
    ├── cell-mapping.module.ts
    ├── cell-mapping.controller.ts       # Cell mapping + aggregation rule CRUD
    └── cell-mapping.service.ts

apps/client/src/app/
├── pages/
│   ├── k1-import/
│   │   ├── k1-import-page.component.ts
│   │   ├── k1-import-page.html
│   │   ├── k1-import-page.scss
│   │   ├── k1-import-page.routes.ts
│   │   ├── k1-verification/
│   │   │   ├── k1-verification.component.ts   # Mapped cells + unmapped items + aggregations
│   │   │   ├── k1-verification.html
│   │   │   └── k1-verification.scss
│   │   └── k1-confirmation/
│   │       ├── k1-confirmation.component.ts
│   │       ├── k1-confirmation.html
│   │       └── k1-confirmation.scss
│   ├── cell-mapping/
│   │   ├── cell-mapping-page.component.ts     # Cell mapping + aggregation rule config
│   │   ├── cell-mapping-page.html
│   │   └── cell-mapping-page.routes.ts
│   └── k-document/                             # Existing page — extended
│       └── k-document-detail/                  # Add aggregation summary section (FR-036)
└── services/
    └── k1-import-data.service.ts

libs/common/src/lib/
├── interfaces/
│   └── k1-import.interface.ts
└── dtos/
    └── k1-import/
        ├── create-k1-import.dto.ts
        ├── verify-k1-import.dto.ts
        └── confirm-k1-import.dto.ts

prisma/
├── schema.prisma                    # + K1ImportSession, CellMapping, CellAggregationRule, K1ImportStatus
└── migrations/
    └── 2026XXXX_added_k1_import/    # New migration

test/import/
├── sample-k1-digital.pdf            # Test fixture: digital K-1
└── sample-k1-scanned.pdf            # Test fixture: scanned K-1
```

**Structure Decision**: Follows the existing Nx monorepo convention with new NestJS modules under `apps/api/src/app/` and new Angular pages under `apps/client/src/app/pages/`. Shared interfaces and DTOs in `libs/common/`. This mirrors the existing `k-document`, `upload`, and `family-office` module patterns. The KDocument detail view is extended (not replaced) to display aggregation summaries.
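
To illustrate how the three files under `extractors/` might sit behind `k1-extractor.interface.ts`, here is a minimal sketch. The interface members, the `selectExtractorId` helper, and the 200-character threshold for deciding that a PDF has a usable text layer are all assumptions; the plan does not specify how digital and scanned PDFs are told apart.

```typescript
// Hypothetical contract for extractors/k1-extractor.interface.ts.
interface K1ExtractionResult {
  fields: { cell: string; value: string; confidence: number }[];
}

interface K1Extractor {
  readonly id: 'pdf-parse' | 'azure' | 'tesseract';
  extract(pdf: Uint8Array): Promise<K1ExtractionResult>;
}

// Assumption: fewer text-layer characters than this means the PDF is a scan.
const MIN_TEXT_LAYER_CHARS = 200;

// Tier 1: digital PDFs go to the local pdf-parse extractor.
// Tier 2: scans go to Azure when it is configured, otherwise to tesseract.js.
function selectExtractorId(
  textLayerChars: number,
  azureConfigured: boolean,
): K1Extractor['id'] {
  if (textLayerChars >= MIN_TEXT_LAYER_CHARS) return 'pdf-parse';
  return azureConfigured ? 'azure' : 'tesseract';
}
```

Keeping the selection logic separate from the extractors themselves would make it unit-testable without PDF fixtures or Azure credentials.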