You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 
 
 
 

11 KiB

Data Model: K-1 Normalized Data Model

Feature Branch: 006-k1-model-review | Date: 2026-03-20 Research: research.md | Spec: spec.md


Entity Overview

                    ┌──────────────────┐
                    │   Partnership    │  (existing dimension)
                    └────────┬─────────┘
                             │
  ┌──────────────┐    ┌──────┴──────────┐    ┌──────────────────┐
  │   Entity     │────│   KDocument     │────│ K1BoxDefinition  │  (NEW — reference)
  │  (existing)  │    │   (existing)    │    │  PK = boxKey     │
  └──────────────┘    └────────┬────────┘    └────────┬─────────┘
                               │                      │
                        ┌──────┴──────────┐    ┌──────┴──────────┐
                        │  K1LineItem     │    │  K1BoxOverride  │  (NEW — per-partnership)
                        │  (NEW — fact)   │    │  display overrides
                        └─────────────────┘    └─────────────────┘

New Entities

K1BoxDefinition (Reference / Dimension)

Replaces the global rows of CellMapping. One row per unique IRS K-1 box identifier. Serves as the FK target for K1LineItem.boxKey.

Column Type Constraints Notes
boxKey String PK IRS box identifier: "1", "9a", "20-A", "11-ZZ*"
label String NOT NULL Human-readable: "Ordinary business income (loss)"
section String? HEADER, PART_I, PART_II, SECTION_J, SECTION_K, SECTION_L, SECTION_M, SECTION_N, PART_III
dataType String DEFAULT "number" number, string, percentage, boolean
sortOrder Int NOT NULL Display ordering (matches IRS form order)
irsFormLine String? "Box 1", "Section J, Line 1", "Part I, Line A"
description String? Extended description for LLM context
isCustom Boolean DEFAULT false true for auto-created box keys not in IRS standard set (FR-017)
createdAt DateTime DEFAULT now()
updatedAt DateTime @updatedAt

Indexes: @@index([section]), @@index([sortOrder]) Mapped name: k1_box_definition

Seed data: Migrated from IRS_DEFAULT_MAPPINGS array in cell-mapping.service.ts (80+ entries). Mapping: boxNumberboxKey, labellabel, cellTypedataType, sortOrdersortOrder, descriptiondescription. Section derived from sortOrder ranges (0–9 = HEADER, 10–19 = PART_I, 20–29 = PART_II, 30–39 = SECTION_J, 40–49 = SECTION_K, 50–59 = SECTION_L, 60–63 = SECTION_M/N, 100+ = PART_III).


K1BoxOverride (Per-Partnership Display Override)

Replaces the per-partnership rows of CellMapping. Controls display customization without affecting data integrity.

Column Type Constraints Notes
id String PK (UUID)
boxKey String FK → K1BoxDefinition.boxKey Which box to override
partnershipId String FK → Partnership.id Which partnership
customLabel String? Override display label
isIgnored Boolean DEFAULT false Hide this box for this partnership
createdAt DateTime DEFAULT now()
updatedAt DateTime @updatedAt

Unique: @@unique([boxKey, partnershipId]) Indexes: @@index([partnershipId]) Mapped name: k1_box_override On delete: CASCADE from both K1BoxDefinition and Partnership


K1LineItem (Fact Table)

One financial line item per box per K-1 document. Core normalized data store replacing KDocument.data JSON.

Column Type Constraints Notes
id String PK (UUID)
kDocumentId String FK → KDocument.id Which K-1 document
boxKey String FK → K1BoxDefinition.boxKey Which IRS box
amount Decimal? @db.Decimal(15,2) Dollar amount. NULL for non-numeric values.
textValue String? Non-numeric values: "SEE STMT", "true", etc.
rawText String? Original extracted text before parsing
confidence Decimal? @db.Decimal(3,2) OCR confidence 0.00–1.00. NULL if manual entry.
sourcePage Int? PDF page number where extracted
sourceCoords Json? {x, y, width, height} bounding box on page
isUserEdited Boolean DEFAULT false True if user modified during verification
isSuperseded Boolean DEFAULT false True if replaced by a newer version (ESTIMATED→FINAL)
createdAt DateTime DEFAULT now()
updatedAt DateTime @updatedAt

Partial unique index (raw SQL, not expressible in Prisma):

CREATE UNIQUE INDEX "k1_line_item_active_unique"
  ON "k1_line_item" ("k_document_id", "box_key")
  WHERE "is_superseded" = false;

Indexes: @@index([kDocumentId, boxKey]), @@index([kDocumentId]), @@index([boxKey]), @@index([isSuperseded]) Mapped name: k1_line_item On delete: CASCADE from KDocument


Modified Entities

KDocument (existing — minimal changes)

Change Detail
Add lineItems relation K1LineItem[] — reverse relation for Prisma
data column Retained as-is — immutable JSON archive (FR-008)
No column drops data, previousData preserved permanently

CellAggregationRule (existing — update references)

Change Detail
sourceCells JSON Values are already strings like ["1", "8", "9a"] — these match K1BoxDefinition.boxKey directly
No schema change The sourceCells array doesn't need migration; it naturally references boxKey strings

CellMapping (existing — to be dropped)

Phase Action
Phase 1 (additive) Leave in place alongside K1BoxDefinition
After backfill verified Drop table via migration. Remove CellMappingService, CellMappingController, CellMappingModule

Prisma Schema

/// Global IRS K-1 box reference. One row per unique box identifier.
/// Replaces the global (partnershipId = null) CellMapping rows.
/// NOTE: COMMENT ON annotations added in migration SQL for LLM discoverability.
model K1BoxDefinition {
  boxKey      String   @id @map("box_key")
  label       String
  section     String?
  dataType    String   @default("number") @map("data_type")
  sortOrder   Int      @map("sort_order")
  irsFormLine String?  @map("irs_form_line")
  description String?
  isCustom    Boolean  @default(false) @map("is_custom")
  createdAt   DateTime @default(now()) @map("created_at")
  updatedAt   DateTime @updatedAt @map("updated_at")

  lineItems   K1LineItem[]
  overrides   K1BoxOverride[]

  @@map("k1_box_definition")
  @@index([section])
  @@index([sortOrder])
}

/// Per-partnership display overrides for a K1BoxDefinition.
/// Controls custom labels, ignored status, etc. Does NOT affect data integrity.
/// Replaces the per-partnership (partnershipId != null) CellMapping rows.
model K1BoxOverride {
  id            String          @id @default(uuid())
  boxKey        String          @map("box_key")
  boxDefinition K1BoxDefinition @relation(fields: [boxKey], references: [boxKey], onDelete: Cascade)
  partnershipId String          @map("partnership_id")
  partnership   Partnership     @relation(fields: [partnershipId], references: [id], onDelete: Cascade)
  customLabel   String?         @map("custom_label")
  isIgnored     Boolean         @default(false) @map("is_ignored")
  createdAt     DateTime        @default(now()) @map("created_at")
  updatedAt     DateTime        @updatedAt @map("updated_at")

  @@unique([boxKey, partnershipId])
  @@map("k1_box_override")
  @@index([partnershipId])
}

/// Individual financial line item from an IRS Schedule K-1.
/// Fact table: one row per box per K-1 document.
/// NOTE: Partial unique index "k1_line_item_active_unique" on (k_document_id, box_key)
/// WHERE is_superseded = false — managed in migration SQL, not expressible in Prisma.
model K1LineItem {
  id            String          @id @default(uuid())
  kDocumentId   String          @map("k_document_id")
  kDocument     KDocument       @relation(fields: [kDocumentId], references: [id], onDelete: Cascade)
  boxKey        String          @map("box_key")
  boxDefinition K1BoxDefinition @relation(fields: [boxKey], references: [boxKey])
  amount        Decimal?        @db.Decimal(15, 2)
  textValue     String?         @map("text_value")
  rawText       String?         @map("raw_text")
  confidence    Decimal?        @db.Decimal(3, 2)
  sourcePage    Int?            @map("source_page")
  sourceCoords  Json?           @map("source_coords")
  isUserEdited  Boolean         @default(false) @map("is_user_edited")
  isSuperseded  Boolean         @default(false) @map("is_superseded")
  createdAt     DateTime        @default(now()) @map("created_at")
  updatedAt     DateTime        @updatedAt @map("updated_at")

  @@map("k1_line_item")
  @@index([kDocumentId, boxKey])
  @@index([kDocumentId])
  @@index([boxKey])
  @@index([isSuperseded])
}

Relations to add to existing models:

// In model KDocument — add:
lineItems K1LineItem[]

// In model Partnership — add:
boxOverrides K1BoxOverride[]

State Transitions

K1LineItem Versioning (ESTIMATED → FINAL)

State 1: ESTIMATED K-1 confirmed
  K1LineItem rows created with isSuperseded = false

State 2: FINAL K-1 imported for same KDocument
  1. UPDATE K1LineItem SET isSuperseded = true WHERE kDocumentId = X AND isSuperseded = false
  2. INSERT new K1LineItem rows with isSuperseded = false
  3. Old rows preserved for audit trail

Query pattern (always):
  WHERE isSuperseded = false

CellMapping → K1BoxDefinition Migration

Phase 1: Both tables exist
  - K1BoxDefinition seeded from IRS_DEFAULT_MAPPINGS
  - CellMapping continues to serve existing code

Phase 2: Dual-read
  - New code reads K1BoxDefinition
  - Old code gradually migrated

Phase 3: CellMapping dropped
  - Migration removes table
  - Service/controller/module deleted

Validation Rules

Entity Rule Enforcement
K1BoxDefinition boxKey is non-empty string Application + PK constraint
K1BoxDefinition dataType{number, string, percentage, boolean} Application validation
K1LineItem amount XOR textValue populated (not both null, not both non-null for numeric types) Application validation layer
K1LineItem confidence ∈ [0.00, 1.00] Application validation
K1LineItem boxKey exists in K1BoxDefinition FK constraint (database)
K1LineItem At most 1 active row per (kDocumentId, boxKey) Partial unique index (database)
K1BoxOverride One override per (boxKey, partnershipId) @@unique constraint (database)