# Quickstart: K-1 PDF Scan Import

**Phase 1 Output** | **Date**: 2026-03-18 | **Updated**: 2026-03-18 (post-clarification)

## Prerequisites

1. Spec 001-family-office-transform models are implemented (Entity, Partnership, PartnershipMembership, KDocument, Distribution, Document)
2. At least one Partnership with one or more member Entities exists in the database
3. The existing upload infrastructure (`UploadController`, `uploads/` directory) is functional
4. Node.js ≥ 22.18.0 and Docker (for PostgreSQL and Redis)

## Environment Setup

Add the following to `.env` (optional; enables Azure OCR for scanned PDFs):

```
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-resource.cognitiveservices.azure.com/
AZURE_DOCUMENT_INTELLIGENCE_KEY=your-api-key
```

If these variables are empty, scanned PDFs fall back to `tesseract.js` (lower accuracy, but fully self-hosted).

## New Dependencies

```bash
npm install pdf-parse @azure/ai-form-recognizer tesseract.js
npm install -D @types/pdf-parse
```

## Database Migration

After adding the new Prisma models (`K1ImportSession`, `CellMapping`, `CellAggregationRule`, and the `K1ImportStatus` enum):

```bash
npx prisma db push       # Development: sync schema
# OR
npx prisma migrate dev   # Create a migration file
```

Seed the default IRS cell mappings (28 rows with `partnershipId = null`) and the default aggregation rules (e.g., "Total Ordinary Income", "Total Capital Gains", "Total Deductions") using the existing seed mechanism or a dedicated seed script.
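The tier order described in this quickstart is: `pdf-parse` for digital PDFs, Azure for scanned PDFs when both `.env` values are set, and `tesseract.js` otherwise. A routing sketch for that order might look like the block below. The names `chooseExtractor` and `OcrEnv` are hypothetical. The "digital vs. scanned" test is also an assumption: it treats a PDF as scanned when the Tier 1 text layer yields too few characters per page, and the threshold value is a guess, not a figure from the spec.

```typescript
// Hypothetical router; only the tier order comes from the quickstart.
type ExtractorTier = "pdf-parse" | "azure" | "tesseract";

interface OcrEnv {
  AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT?: string;
  AZURE_DOCUMENT_INTELLIGENCE_KEY?: string;
}

// Assumed heuristic threshold: below this many non-whitespace characters
// per page, the text layer is treated as missing (i.e. a scanned PDF).
const MIN_CHARS_PER_PAGE = 200;

function isAzureConfigured(env: OcrEnv): boolean {
  // Both values must be non-empty after trimming; otherwise fall back to tesseract.js.
  const endpoint = env.AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT?.trim() ?? "";
  const key = env.AZURE_DOCUMENT_INTELLIGENCE_KEY?.trim() ?? "";
  return endpoint.length > 0 && key.length > 0;
}

function chooseExtractor(extractedText: string, numPages: number, env: OcrEnv): ExtractorTier {
  const pages = Math.max(numPages, 1); // guard against division by zero
  const charsPerPage = extractedText.replace(/\s+/g, "").length / pages;
  if (charsPerPage >= MIN_CHARS_PER_PAGE) {
    return "pdf-parse"; // Tier 1: the embedded text layer is usable
  }
  return isAzureConfigured(env) ? "azure" : "tesseract"; // Tier 2, or its self-hosted fallback
}
```

In practice, `extractedText` and `numPages` would come from a first `pdf-parse` pass, and `process.env` can be passed as `env` directly.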
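The seeded defaults use `partnershipId = null`, which suggests they act as global mappings that a partnership can customize. The sketch below assumes that a partnership-specific row overrides the default row for the same cell. That precedence rule is an assumption, as are the `CellMappingRow` shape, the `cellKey` field, and the `resolveCellMappings` name.

```typescript
// Hypothetical row shape; only the null-partnershipId defaults come from the seed description.
interface CellMappingRow {
  partnershipId: string | null; // null = seeded IRS default
  cellKey: string;              // hypothetical cell identifier, e.g. "box1"
  label: string;
}

/**
 * Returns one mapping per cellKey for the given partnership.
 * Assumption: a partnership-specific row replaces the default row with the same cellKey.
 */
function resolveCellMappings(rows: CellMappingRow[], partnershipId: string): Map<string, CellMappingRow> {
  const resolved = new Map<string, CellMappingRow>();
  // Pass 1: start from the global defaults.
  for (const row of rows) {
    if (row.partnershipId === null) {
      resolved.set(row.cellKey, row);
    }
  }
  // Pass 2: apply this partnership's overrides on top.
  for (const row of rows) {
    if (row.partnershipId === partnershipId) {
      resolved.set(row.cellKey, row);
    }
  }
  return resolved;
}
```

Running two passes keeps the result independent of the order in which rows come back from the database.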
## Key Files to Create

### Backend (apps/api/src/)

```
app/k1-import/
├── k1-import.module.ts              # NestJS module
├── k1-import.controller.ts          # REST endpoints (see contracts/k1-import-api.md)
├── k1-import.service.ts             # Orchestration: upload → extract → verify → confirm
├── dto/
│   ├── upload-k1.dto.ts             # Multipart upload DTO
│   ├── verify-k1.dto.ts             # Verification submission DTO
│   └── confirm-k1.dto.ts            # Confirmation request DTO
├── extractors/
│   ├── k1-extractor.interface.ts    # Common extraction interface
│   ├── pdf-parse-extractor.ts       # Tier 1: digital PDF text extraction
│   ├── azure-extractor.ts           # Tier 2: Azure Document Intelligence
│   └── tesseract-extractor.ts       # Tier 2 fallback: tesseract.js OCR
├── k1-field-mapper.service.ts       # Maps raw extraction → K1ExtractedField[]
├── k1-allocation.service.ts         # Allocates K-1 amounts to members by ownership %
├── k1-confidence.service.ts         # Computes confidence scores with validation heuristics
└── k1-aggregation.service.ts        # Dynamically computes aggregation summaries from rules

app/cell-mapping/
├── cell-mapping.module.ts           # NestJS module
├── cell-mapping.controller.ts       # CRUD for cell mappings + aggregation rules
└── cell-mapping.service.ts          # Cell mapping + aggregation rule business logic + seed data
```

### Shared Types (libs/common/src/lib/)

```
interfaces/
└── k1-import.interface.ts           # K1ExtractionResult, K1ExtractedField, K1ConfirmationRequest

dtos/
└── k1-import/
    ├── create-k1-import.dto.ts
    ├── verify-k1-import.dto.ts
    └── confirm-k1-import.dto.ts
```

### Frontend (apps/client/src/app/)

```
pages/k1-import/
├── k1-import-page.component.ts      # Upload + history view
├── k1-import-page.html
├── k1-import-page.scss
├── k1-import-page.routes.ts
├── k1-verification/
│   ├── k1-verification.component.ts # Verification/edit screen (mapped + unmapped + aggregations)
│   ├── k1-verification.html
│   └── k1-verification.scss
└── k1-confirmation/
    ├── k1-confirmation.component.ts # Confirmation result screen
    ├── k1-confirmation.html
    └── k1-confirmation.scss

pages/cell-mapping/
├── cell-mapping-page.component.ts   # Cell mapping + aggregation rule configuration UI
├── cell-mapping-page.html
└── cell-mapping-page.routes.ts

services/
└── k1-import-data.service.ts        # HTTP client for k1-import endpoints
```

## Verification Workflow

1. **Upload**: The user selects a PDF → `POST /api/v1/k1-import/upload` → a session is created with status PROCESSING
2. **Extract**: The backend detects the PDF type (digital vs. scanned) → routes it to the appropriate extractor → status becomes EXTRACTED
3. **Review**: The frontend polls or fetches the session → displays the verification screen with:
   - **Mapped cells**: extracted fields with confidence indicators. High-confidence values are pre-accepted. Medium- and low-confidence values require explicit review (acknowledge or edit).
   - **Unmapped items**: a separate section for values that did not match any cell. The user assigns each one to a cell or discards it.
   - **Aggregation summaries**: dynamically computed from mapped values using the aggregation rules. These recalculate live when cell values are edited.
4. **Verify**: The user reviews all medium/low-confidence fields and resolves all unmapped items → `PUT /api/v1/k1-import/:id/verify` → status becomes VERIFIED
5. **Confirm**: The user clicks "Confirm & Save" → `POST /api/v1/k1-import/:id/confirm` → KDocument + Distributions + Document are created → status becomes CONFIRMED

## Testing Strategy

- **Unit tests**: Extractors (pdf-parse, azure, tesseract), field mapper, confidence scoring, allocation math, aggregation computation
- **Integration tests**: Full upload → extract → verify → confirm flow using test PDF fixtures
- **Test fixtures**: Include sample K-1 PDFs (digital and scanned) in the `test/import/` directory
- **Allocation accuracy**: Verify rounding behavior: allocated amounts must sum exactly to the partnership total
- **Aggregation tests**: Verify dynamic computation from rules, automatic recalculation when a value is edited, and behavior when source cells are empty
- **Review enforcement**: Verify that confirmation is blocked while medium/low-confidence fields remain unreviewed or unmapped items remain unresolved
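The allocation-accuracy requirement (member amounts must sum exactly to the partnership total) is commonly met with the largest remainder method over integer cents. The sketch below assumes amounts are held as integer cents and ownership comes in as percentages, which are normalized by their sum. `allocateCents` and `MemberShare` are hypothetical names and are not the `k1-allocation.service.ts` API.

```typescript
// Hypothetical input shape; ownership values are treated as weights and normalized.
interface MemberShare {
  entityId: string;
  ownershipPct: number; // e.g. 33.3333
}

/** Splits totalCents across members so the parts always sum exactly to totalCents. */
function allocateCents(totalCents: number, members: MemberShare[]): Map<string, number> {
  if (!Number.isInteger(totalCents)) throw new Error("totalCents must be an integer");
  const weightSum = members.reduce((s, m) => s + m.ownershipPct, 0);
  if (members.length === 0 || weightSum <= 0) throw new Error("no ownership to allocate against");

  // Allocate the absolute value, then reapply the sign (K-1 amounts can be losses).
  const sign = totalCents < 0 ? -1 : 1;
  const abs = Math.abs(totalCents);

  // Floor each member's exact share and remember the fractional remainder.
  const parts = members.map((m, index) => {
    const exact = (abs * m.ownershipPct) / weightSum;
    const floor = Math.floor(exact);
    return { entityId: m.entityId, cents: floor, remainder: exact - floor, index };
  });

  // Distribute the leftover cents one at a time, largest remainder first (ties: input order).
  let leftover = abs - parts.reduce((s, p) => s + p.cents, 0);
  const byRemainder = [...parts].sort((a, b) => b.remainder - a.remainder || a.index - b.index);
  for (let i = 0; leftover > 0; i++, leftover--) {
    byRemainder[i % byRemainder.length].cents += 1;
  }
  return new Map(parts.map((p) => [p.entityId, sign * p.cents]));
}
```

For example, splitting 100 cents three equal ways yields 34/33/33. The extra cent goes to the first member on the tie. A different tie-break, such as the largest owner first, would be an equally valid policy choice.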
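The review-enforcement rule comes from the Verify and Confirm steps. High-confidence fields are pre-accepted, medium/low-confidence fields need an explicit acknowledgement or edit, and every unmapped item must be assigned or discarded. The sketch below is a minimal guard for that rule. The field and item shapes and the `checkReadyToConfirm` name are hypothetical.

```typescript
// Hypothetical state shapes; the gating rules mirror the workflow steps above.
type Confidence = "high" | "medium" | "low";

interface ExtractedFieldState {
  cellKey: string;
  confidence: Confidence;
  reviewed: boolean; // true once the user acknowledged or edited the value
}

interface UnmappedItemState {
  id: string;
  resolution: "assigned" | "discarded" | null; // null = still unresolved
}

interface ConfirmationCheck {
  ok: boolean;
  unreviewedFields: string[];
  unresolvedItems: string[];
}

function checkReadyToConfirm(fields: ExtractedFieldState[], unmapped: UnmappedItemState[]): ConfirmationCheck {
  // High-confidence values are pre-accepted; anything else must have been reviewed.
  const unreviewedFields = fields
    .filter((f) => f.confidence !== "high" && !f.reviewed)
    .map((f) => f.cellKey);
  // Every unmapped value must be either assigned to a cell or discarded.
  const unresolvedItems = unmapped.filter((u) => u.resolution === null).map((u) => u.id);
  return {
    ok: unreviewedFields.length === 0 && unresolvedItems.length === 0,
    unreviewedFields,
    unresolvedItems,
  };
}
```

Returning the offending keys as well as a boolean lets the verify endpoint reject with a specific message, and lets the verification screen highlight exactly what is still outstanding.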