Quickstart: K-1 PDF Scan Import

Phase 1 Output | Date: 2026-03-18 | Updated: 2026-03-18 (post-clarification)

Prerequisites

- The models from spec 001-family-office-transform are implemented (Entity, Partnership, PartnershipMembership, KDocument, Distribution, Document)
- At least one Partnership with one or more member Entities exists in the database
- The existing upload infrastructure (UploadController, uploads/ directory) is functional
- Node.js ≥ 22.18.0 and Docker (for PostgreSQL and Redis) are installed

Environment Setup

Add to .env (optional — for Azure OCR of scanned PDFs):

AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-resource.cognitiveservices.azure.com/
AZURE_DOCUMENT_INTELLIGENCE_KEY=your-api-key

If these are empty, scanned PDFs fall back to tesseract.js (lower accuracy but fully self-hosted).
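
The fallback rule can be sketched as a small pure function. This is only an illustration: it assumes Azure is used only when both variables are non-empty, and `OcrBackend`, `OcrEnv`, and `selectOcrBackend` are hypothetical names that do not come from the codebase.

```typescript
// Hypothetical helper: the names below are illustrative, not existing code.
type OcrBackend = 'azure' | 'tesseract';

interface OcrEnv {
  AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT?: string;
  AZURE_DOCUMENT_INTELLIGENCE_KEY?: string;
}

// Picks Azure only when both settings are present and non-blank (an assumption);
// otherwise falls back to the self-hosted tesseract.js path.
function selectOcrBackend(env: OcrEnv): OcrBackend {
  const endpoint = (env.AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT ?? '').trim();
  const key = (env.AZURE_DOCUMENT_INTELLIGENCE_KEY ?? '').trim();
  return endpoint !== '' && key !== '' ? 'azure' : 'tesseract';
}
```

In the API this would typically receive `process.env` or values read through the application's configuration service.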

New Dependencies

npm install pdf-parse @azure/ai-form-recognizer tesseract.js
npm install -D @types/pdf-parse

Database Migration

After adding the new Prisma models (K1ImportSession, CellMapping, CellAggregationRule) and the K1ImportStatus enum, sync the database:

npx prisma db push          # Development: sync schema
# OR
npx prisma migrate dev      # Create a migration file

Seed the default IRS cell mappings (28 rows with partnershipId = null) and default aggregation rules (e.g., "Total Ordinary Income", "Total Capital Gains", "Total Deductions") via the existing seed mechanism or a dedicated seed script.
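
As a rough illustration of what a seeded aggregation rule might look like and how it could be evaluated, here is a minimal sketch. The `AggregationRule` shape, the `BOX_*` cell keys, and the choice to treat empty cells as 0 are all assumptions; the actual CellAggregationRule model may differ.

```typescript
// Hypothetical shapes: the real CellAggregationRule model may differ.
interface AggregationRule {
  name: string;          // e.g. "Total Ordinary Income"
  sourceCells: string[]; // keys of the cells whose values are summed
}

type CellValues = Record<string, number | null | undefined>;

// Sums the source cells of each rule. Empty or missing cells count as 0 here,
// which is an assumption about the intended behavior.
function computeAggregations(
  rules: AggregationRule[],
  values: CellValues,
): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const rule of rules) {
    let sum = 0;
    for (const cell of rule.sourceCells) {
      sum += values[cell] ?? 0;
    }
    totals[rule.name] = sum;
  }
  return totals;
}

// Example rule with placeholder cell keys (not the real IRS mapping).
const exampleRules: AggregationRule[] = [
  { name: 'Total Ordinary Income', sourceCells: ['BOX_1', 'BOX_2', 'BOX_3'] },
];
const exampleTotals = computeAggregations(exampleRules, {
  BOX_1: 1200,
  BOX_2: null,
  BOX_3: -200,
});
// exampleTotals['Total Ordinary Income'] is 1000
```

Because the totals are derived from the rules on every call, re-running the function after a cell edit is enough to refresh the summaries.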

Key Files to Create

Backend (apps/api/src/)

app/k1-import/
├── k1-import.module.ts              # NestJS module
├── k1-import.controller.ts          # REST endpoints (see contracts/k1-import-api.md)
├── k1-import.service.ts             # Orchestration: upload → extract → verify → confirm
├── dto/
│   ├── upload-k1.dto.ts             # Multipart upload DTO
│   ├── verify-k1.dto.ts             # Verification submission DTO
│   └── confirm-k1.dto.ts            # Confirmation request DTO
├── extractors/
│   ├── k1-extractor.interface.ts    # Common extraction interface
│   ├── pdf-parse-extractor.ts       # Tier 1: digital PDF text extraction
│   ├── azure-extractor.ts           # Tier 2: Azure Document Intelligence
│   └── tesseract-extractor.ts       # Tier 2 fallback: tesseract.js OCR
├── k1-field-mapper.service.ts       # Maps raw extraction → K1ExtractedField[]
├── k1-allocation.service.ts         # Allocates K-1 amounts to members by ownership %
├── k1-confidence.service.ts         # Computes confidence scores with validation heuristics
└── k1-aggregation.service.ts        # Dynamically computes aggregation summaries from rules

app/cell-mapping/
├── cell-mapping.module.ts           # NestJS module
├── cell-mapping.controller.ts       # CRUD for cell mappings + aggregation rules
└── cell-mapping.service.ts          # Cell mapping + aggregation rule business logic + seed data
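
The tiered extractors listed under extractors/ need some way to decide whether a PDF is digital or scanned. One possible heuristic, sketched below, is to look at how much text the PDF's text layer yields per page. Everything here is an assumption for illustration: the names `ParsedPdfSummary`, `isLikelyScanned`, and `chooseTier`, and especially the 100-characters-per-page cutoff, which would need tuning against real K-1 fixtures.

```typescript
// Hypothetical sketch: names and the threshold below are assumptions.
type ExtractionTier = 'pdf-parse' | 'ocr';

interface ParsedPdfSummary {
  text: string;      // text layer extracted from the PDF, possibly empty
  pageCount: number; // number of pages in the PDF
}

const MIN_CHARS_PER_PAGE = 100; // assumed cutoff, not a measured value

// A PDF with little or no extractable text per page is treated as scanned.
function isLikelyScanned(
  pdf: ParsedPdfSummary,
  minCharsPerPage: number = MIN_CHARS_PER_PAGE,
): boolean {
  if (pdf.pageCount <= 0) {
    return true;
  }
  const visibleChars = pdf.text.replace(/\s+/g, '').length;
  return visibleChars / pdf.pageCount < minCharsPerPage;
}

// Tier 1 (text extraction) for digital PDFs, Tier 2 (OCR) for scanned ones.
function chooseTier(pdf: ParsedPdfSummary): ExtractionTier {
  return isLikelyScanned(pdf) ? 'ocr' : 'pdf-parse';
}
```

Keeping the decision in a pure function makes it easy to unit test without loading any PDF library.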

Shared Types (libs/common/src/lib/)

interfaces/
└── k1-import.interface.ts           # K1ExtractionResult, K1ExtractedField, K1ConfirmationRequest
dtos/
└── k1-import/
    ├── create-k1-import.dto.ts
    ├── verify-k1-import.dto.ts
    └── confirm-k1-import.dto.ts

Frontend (apps/client/src/app/)

pages/k1-import/
├── k1-import-page.component.ts      # Upload + history view
├── k1-import-page.html
├── k1-import-page.scss
├── k1-import-page.routes.ts
├── k1-verification/
│   ├── k1-verification.component.ts # Verification/edit screen (mapped + unmapped + aggregations)
│   ├── k1-verification.html
│   └── k1-verification.scss
└── k1-confirmation/
    ├── k1-confirmation.component.ts  # Confirmation result screen
    ├── k1-confirmation.html
    └── k1-confirmation.scss

pages/cell-mapping/
├── cell-mapping-page.component.ts   # Cell mapping + aggregation rule configuration UI
├── cell-mapping-page.html
└── cell-mapping-page.routes.ts

services/
└── k1-import-data.service.ts        # HTTP client for k1-import endpoints

Verification Workflow

1. Upload: User selects PDF → POST /api/v1/k1-import/upload → session created with status PROCESSING
2. Extract: Backend detects PDF type (digital vs. scanned) → routes to appropriate extractor → status becomes EXTRACTED
3. Review: Frontend polls/fetches session → displays verification screen with:
   - Mapped cells: extracted fields with confidence indicators. High-confidence values are pre-accepted. Medium/low-confidence values require explicit review (acknowledge or edit).
   - Unmapped items: separate section for values that didn't match any cell. User assigns to a cell or discards.
   - Aggregation summaries: dynamically computed from mapped values using aggregation rules. Recalculate live when cell values are edited.
4. Verify: User reviews all medium/low fields and resolves unmapped items → PUT /api/v1/k1-import/:id/verify → status becomes VERIFIED
5. Confirm: User clicks "Confirm & Save" → POST /api/v1/k1-import/:id/confirm → KDocument + Distributions + Document created → status becomes CONFIRMED
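
The status transitions and the review gate described above can be sketched as follows. This is a minimal illustration, not the service's actual code: only the four statuses named in the workflow are modeled (the real K1ImportStatus enum may contain others), and `MappedField`, `UnmappedItem`, `canTransition`, and `verificationBlockers` are hypothetical names.

```typescript
// Statuses named in the workflow; the real enum may include more (assumption).
type K1ImportStatus = 'PROCESSING' | 'EXTRACTED' | 'VERIFIED' | 'CONFIRMED';

// Each status may only advance to the next step in the workflow.
const NEXT_STATUS: Record<K1ImportStatus, K1ImportStatus | null> = {
  PROCESSING: 'EXTRACTED',
  EXTRACTED: 'VERIFIED',
  VERIFIED: 'CONFIRMED',
  CONFIRMED: null,
};

function canTransition(from: K1ImportStatus, to: K1ImportStatus): boolean {
  return NEXT_STATUS[from] === to;
}

// Hypothetical shapes for the data shown on the verification screen.
type Confidence = 'HIGH' | 'MEDIUM' | 'LOW';

interface MappedField {
  cell: string;
  confidence: Confidence;
  reviewed: boolean; // true once the user has acknowledged or edited the value
}

interface UnmappedItem {
  label: string;
  resolution: 'ASSIGNED' | 'DISCARDED' | null; // null means still unresolved
}

// Returns the reasons a session cannot yet move to VERIFIED (empty when it can).
function verificationBlockers(
  fields: MappedField[],
  unmapped: UnmappedItem[],
): string[] {
  const blockers: string[] = [];
  for (const field of fields) {
    if (field.confidence !== 'HIGH' && !field.reviewed) {
      blockers.push(`Field ${field.cell} needs review`);
    }
  }
  for (const item of unmapped) {
    if (item.resolution === null) {
      blockers.push(`Unmapped item "${item.label}" is unresolved`);
    }
  }
  return blockers;
}
```

Returning a list of reasons rather than a boolean lets the frontend tell the user exactly what still needs attention.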

Testing Strategy

- Unit tests: Extractors (pdf-parse, azure, tesseract), field mapper, confidence scoring, allocation math, aggregation computation
- Integration tests: Full upload → extract → verify → confirm flow with test PDF fixtures
- Test fixtures: Include sample K-1 PDFs (digital and scanned) in the test/import/ directory
- Allocation accuracy: Verify rounding behavior; allocated amounts must sum exactly to the partnership total
- Aggregation tests: Verify dynamic computation from rules, automatic recalculation when a value is edited, and behavior when source cells are empty
- Review enforcement: Verify that confirmation is blocked while medium/low-confidence fields are unreviewed or unmapped items are unresolved
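
One way to satisfy the allocation requirement (member amounts summing exactly to the partnership total) is the largest-remainder method on integer cents. The sketch below is an assumption about how this could be done, not the behavior of k1-allocation.service.ts: `MemberShare` and `allocateCents` are hypothetical names, and ownership is expressed as integer weights (for example basis points) so that all arithmetic stays exact.

```typescript
// Hypothetical shape: ownership as an integer weight, e.g. basis points.
interface MemberShare {
  memberId: string;
  ownershipBps: number;
}

// Splits totalCents across members in proportion to their weights using the
// largest-remainder method, so the parts always sum exactly to totalCents.
function allocateCents(
  totalCents: number,
  members: MemberShare[],
): Record<string, number> {
  const totalBps = members.reduce((sum, m) => sum + m.ownershipBps, 0);
  if (members.length === 0 || totalBps <= 0) {
    throw new Error('At least one member with positive ownership is required');
  }
  const sign = totalCents < 0 ? -1 : 1; // losses are allocated symmetrically
  const abs = Math.abs(totalCents);

  const shares = members.map((m, index) => {
    const product = abs * m.ownershipBps;
    const remainder = product % totalBps;
    // Exact integer division: (product - remainder) is a multiple of totalBps.
    return { index, memberId: m.memberId, base: (product - remainder) / totalBps, remainder };
  });

  // Hand out the cents lost to flooring, largest remainder first (ties by order).
  let leftover = abs - shares.reduce((sum, s) => sum + s.base, 0);
  const ranked = shares.slice().sort(
    (a, b) => b.remainder - a.remainder || a.index - b.index,
  );
  for (const share of ranked) {
    if (leftover <= 0) {
      break;
    }
    share.base += 1;
    leftover -= 1;
  }

  const result: Record<string, number> = {};
  for (const share of shares) {
    result[share.memberId] = sign * share.base;
  }
  return result;
}
```

For example, splitting 100 cents across three equal members gives 34, 33, and 33 cents, which sum to exactly 100. A test for allocation accuracy can assert that property for arbitrary totals and ownership splits.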
