Quickstart: K-1 PDF Scan Import

Phase 1 Output | Date: 2026-03-18 | Updated: 2026-03-18 (post-clarification)

Prerequisites

  1. Spec 001-family-office-transform models are implemented (Entity, Partnership, PartnershipMembership, KDocument, Distribution, Document)
  2. At least one Partnership with one or more member Entities exists in the database
  3. The existing upload infrastructure (UploadController, uploads/ directory) is functional
  4. Node.js ≥ 22.18.0 and Docker (to run PostgreSQL and Redis)

Environment Setup

Add to .env (optional — for Azure OCR of scanned PDFs):

AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-resource.cognitiveservices.azure.com/
AZURE_DOCUMENT_INTELLIGENCE_KEY=your-api-key

If these are empty, scanned PDFs fall back to tesseract.js (lower accuracy but fully self-hosted).
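
The fallback rule above could be expressed roughly as follows. This is a minimal sketch, not the project's actual code: `selectOcrBackend`, `OcrBackend`, and `OcrEnv` are hypothetical names, and it assumes that a half-configured Azure setup (only one of the two variables set) should also fall back to tesseract.js.

```typescript
// Hypothetical helper: chooses the OCR backend for scanned PDFs from env vars.
type OcrBackend = "azure" | "tesseract";

interface OcrEnv {
  AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT?: string;
  AZURE_DOCUMENT_INTELLIGENCE_KEY?: string;
}

function selectOcrBackend(env: OcrEnv): OcrBackend {
  // Treat missing and whitespace-only values the same way: as "empty".
  const endpoint = env.AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT?.trim() ?? "";
  const key = env.AZURE_DOCUMENT_INTELLIGENCE_KEY?.trim() ?? "";
  // Assumption: Azure is used only when BOTH values are present.
  return endpoint !== "" && key !== "" ? "azure" : "tesseract";
}
```

In a NestJS service the argument would typically come from `process.env` or the config service, whichever the project already uses.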

New Dependencies

npm install pdf-parse @azure/ai-form-recognizer tesseract.js
npm install -D @types/pdf-parse

Database Migration

After adding the new Prisma models (K1ImportSession, CellMapping, CellAggregationRule, K1ImportStatus enum):

npx prisma db push          # Development: sync schema
# OR
npx prisma migrate dev      # Create a migration file

Seed the default IRS cell mappings (28 rows with partnershipId = null) and default aggregation rules (e.g., "Total Ordinary Income", "Total Capital Gains", "Total Deductions") via the existing seed mechanism or a dedicated seed script.
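
To illustrate what an aggregation rule computes, here is a minimal sketch. The `AggregationRule` shape, the `computeAggregations` function, and the cell identifiers are hypothetical (the real CellAggregationRule model may look different), and it assumes that an empty source cell contributes zero to the total.

```typescript
// Hypothetical rule shape: a named total over a list of source cells.
interface AggregationRule {
  name: string;          // e.g. "Total Ordinary Income"
  sourceCells: string[]; // identifiers of the cells this rule sums
}

// Cell identifier -> amount; null or undefined marks an empty cell.
type CellValues = Record<string, number | null | undefined>;

function computeAggregations(
  rules: AggregationRule[],
  cells: CellValues,
): Record<string, number> {
  const result: Record<string, number> = {};
  for (const rule of rules) {
    // Sum in integer cents so repeated additions do not accumulate float drift.
    let cents = 0;
    for (const cell of rule.sourceCells) {
      // Assumption: empty cells count as zero.
      cents += Math.round((cells[cell] ?? 0) * 100);
    }
    result[rule.name] = cents / 100;
  }
  return result;
}
```

Because the function is pure, the same computation can be rerun whenever a cell value changes, which is one way to get the live recalculation described later in the verification workflow.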

Key Files to Create

Backend (apps/api/src/)

app/k1-import/
├── k1-import.module.ts              # NestJS module
├── k1-import.controller.ts          # REST endpoints (see contracts/k1-import-api.md)
├── k1-import.service.ts             # Orchestration: upload → extract → verify → confirm
├── dto/
│   ├── upload-k1.dto.ts             # Multipart upload DTO
│   ├── verify-k1.dto.ts             # Verification submission DTO
│   └── confirm-k1.dto.ts            # Confirmation request DTO
├── extractors/
│   ├── k1-extractor.interface.ts    # Common extraction interface
│   ├── pdf-parse-extractor.ts       # Tier 1: digital PDF text extraction
│   ├── azure-extractor.ts           # Tier 2: Azure Document Intelligence
│   └── tesseract-extractor.ts       # Tier 2 fallback: tesseract.js OCR
├── k1-field-mapper.service.ts       # Maps raw extraction → K1ExtractedField[]
├── k1-allocation.service.ts         # Allocates K-1 amounts to members by ownership %
├── k1-confidence.service.ts         # Computes confidence scores with validation heuristics
└── k1-aggregation.service.ts        # Dynamically computes aggregation summaries from rules

app/cell-mapping/
├── cell-mapping.module.ts           # NestJS module
├── cell-mapping.controller.ts       # CRUD for cell mappings + aggregation rules
└── cell-mapping.service.ts          # Cell mapping + aggregation rule business logic + seed data
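
The extractors/ directory listed above implies a shared contract plus a routing step between Tier 1 and Tier 2. The sketch below shows one possible shape. All names (`RawExtraction`, `K1Extractor`, `looksDigital`, `extractWithTiers`) are hypothetical, and the characters-per-page heuristic with its threshold of 200 is an assumption, since this quickstart does not specify how digital and scanned PDFs are told apart.

```typescript
// Hypothetical contract for extractors/k1-extractor.interface.ts.
interface RawExtraction {
  text: string;      // text recovered from the PDF
  pageCount: number; // number of pages in the PDF
  source: "pdf-parse" | "azure" | "tesseract";
}

interface K1Extractor {
  extract(pdf: Uint8Array): Promise<RawExtraction>;
}

// Assumed heuristic: a digital PDF carries a meaningful amount of embedded
// text per page, while a scanned PDF yields little or none.
function looksDigital(
  embeddedText: string,
  pageCount: number,
  minCharsPerPage = 200,
): boolean {
  if (pageCount <= 0) return false;
  const chars = embeddedText.replace(/\s+/g, "").length;
  return chars / pageCount >= minCharsPerPage;
}

// Try Tier 1 (embedded text) first; hand off to the OCR tier only when the
// result looks like a scan.
async function extractWithTiers(
  pdf: Uint8Array,
  tier1: K1Extractor,
  tier2: K1Extractor,
): Promise<RawExtraction> {
  const first = await tier1.extract(pdf);
  return looksDigital(first.text, first.pageCount) ? first : tier2.extract(pdf);
}
```

Keeping the three extractors behind one interface means the service only needs to know which tier to call, not which library sits behind it.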

Shared Types (libs/common/src/lib/)

interfaces/
└── k1-import.interface.ts           # K1ExtractionResult, K1ExtractedField, K1ConfirmationRequest
dtos/
└── k1-import/
    ├── create-k1-import.dto.ts
    ├── verify-k1-import.dto.ts
    └── confirm-k1-import.dto.ts

Frontend (apps/client/src/app/)

pages/k1-import/
├── k1-import-page.component.ts      # Upload + history view
├── k1-import-page.html
├── k1-import-page.scss
├── k1-import-page.routes.ts
├── k1-verification/
│   ├── k1-verification.component.ts # Verification/edit screen (mapped + unmapped + aggregations)
│   ├── k1-verification.html
│   └── k1-verification.scss
└── k1-confirmation/
    ├── k1-confirmation.component.ts  # Confirmation result screen
    ├── k1-confirmation.html
    └── k1-confirmation.scss

pages/cell-mapping/
├── cell-mapping-page.component.ts   # Cell mapping + aggregation rule configuration UI
├── cell-mapping-page.html
└── cell-mapping-page.routes.ts

services/
└── k1-import-data.service.ts        # HTTP client for k1-import endpoints

Verification Workflow

  1. Upload: User selects PDF → POST /api/v1/k1-import/upload → session created with status PROCESSING
  2. Extract: Backend detects PDF type (digital vs. scanned) → routes to appropriate extractor → status becomes EXTRACTED
  3. Review: Frontend polls/fetches session → displays verification screen with:
    • Mapped cells: extracted fields with confidence indicators. High-confidence values are pre-accepted. Medium/low-confidence values require explicit review (acknowledge or edit).
    • Unmapped items: separate section for values that didn't match any cell. User assigns to a cell or discards.
    • Aggregation summaries: dynamically computed from mapped values using aggregation rules. Recalculate live when cell values are edited.
  4. Verify: User reviews all medium/low fields and resolves unmapped items → PUT /api/v1/k1-import/:id/verify → status becomes VERIFIED
  5. Confirm: User clicks "Confirm & Save" → POST /api/v1/k1-import/:id/confirm → KDocument + Distributions + Document created → status becomes CONFIRMED
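
The workflow above amounts to a linear state machine with a review gate before verification. The following sketch models only the four statuses named in the steps (the real K1ImportStatus enum may define more, such as a failure state). `canTransition`, `isReadyToVerify`, and the field and item shapes are hypothetical.

```typescript
// Only the statuses named in the workflow; the real enum may have others.
type K1ImportStatus = "PROCESSING" | "EXTRACTED" | "VERIFIED" | "CONFIRMED";

// Each status has at most one legal successor.
const NEXT_STATUS: Record<K1ImportStatus, K1ImportStatus | null> = {
  PROCESSING: "EXTRACTED",
  EXTRACTED: "VERIFIED",
  VERIFIED: "CONFIRMED",
  CONFIRMED: null,
};

function canTransition(from: K1ImportStatus, to: K1ImportStatus): boolean {
  return NEXT_STATUS[from] === to;
}

// Hypothetical review-state shapes for the verification screen.
type Confidence = "high" | "medium" | "low";

interface ReviewField {
  confidence: Confidence;
  reviewed: boolean; // true once the user acknowledged or edited the value
}

interface UnmappedItem {
  resolution: "assigned" | "discarded" | null; // null = still unresolved
}

// High-confidence fields are pre-accepted; everything else needs explicit
// review, and every unmapped item must be assigned or discarded.
function isReadyToVerify(fields: ReviewField[], unmapped: UnmappedItem[]): boolean {
  const fieldsOk = fields.every((f) => f.confidence === "high" || f.reviewed);
  const unmappedOk = unmapped.every((u) => u.resolution !== null);
  return fieldsOk && unmappedOk;
}
```

A check like `isReadyToVerify` is worth enforcing on the backend as well as in the UI, so that a direct call to the verify endpoint cannot skip the review.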

Testing Strategy

  • Unit tests: Extractors (pdf-parse, azure, tesseract), field mapper, confidence scoring, allocation math, aggregation computation
  • Integration tests: Full upload → extract → verify → confirm flow with test PDF fixtures
  • Test fixtures: Include sample K-1 PDFs (digital and scanned) in the test/import/ directory
  • Allocation accuracy: Verify rounding behavior — allocated amounts must sum exactly to partnership total
  • Aggregation tests: Verify dynamic computation from rules, auto-recalculation on value edit, behavior when source cells are empty
  • Review enforcement: Verify that confirmation is blocked while any medium/low-confidence field is unreviewed or any unmapped item is unresolved
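
For the allocation-accuracy tests, one common way to guarantee that allocated amounts sum exactly to the partnership total is the largest-remainder method on integer cents. The sketch below is an illustration under stated assumptions, not the rounding rule this spec mandates: `allocateCents` and `MemberShare` are hypothetical names, amounts are assumed to be stored as integer cents, and ownership percentages are normalized against their own sum.

```typescript
interface MemberShare {
  entityId: string;
  ownershipPercent: number;
}

// Splits totalCents across members in proportion to ownership. Each member
// first gets the floor of their exact share; leftover cents then go to the
// members with the largest fractional remainders.
function allocateCents(totalCents: number, members: MemberShare[]): Map<string, number> {
  const totalPercent = members.reduce((sum, m) => sum + m.ownershipPercent, 0);
  if (members.length === 0 || totalPercent <= 0) {
    throw new Error("at least one member with positive ownership is required");
  }
  // Work on the absolute value so losses are rounded symmetrically to gains.
  const sign = totalCents < 0 ? -1 : 1;
  const abs = Math.abs(totalCents);

  const parts = members.map((m) => {
    const exact = (abs * m.ownershipPercent) / totalPercent;
    const base = Math.floor(exact);
    return { entityId: m.entityId, base, remainder: exact - base };
  });

  let leftover = abs - parts.reduce((sum, p) => sum + p.base, 0);
  // Stable sort: ties keep the original member order.
  const byRemainder = [...parts].sort((a, b) => b.remainder - a.remainder);
  for (const part of byRemainder) {
    if (leftover <= 0) break;
    part.base += 1; // same object as in `parts`, so the update is shared
    leftover -= 1;
  }

  const result = new Map<string, number>();
  for (const part of parts) {
    result.set(part.entityId, sign * part.base);
  }
  return result;
}
```

A unit test can then assert both the per-member amounts and that their sum equals the input total, which is the property the allocation-accuracy bullet asks for.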