The ingestion pipeline is operational. Three weeks of building, testing, and refining against real data from my corpus. The results are better than expected, but the challenges were different from what I anticipated.
Pipeline Architecture
Stage 1: Format Normalization. Twenty-three years of digital artifacts means every email client, every chat platform, and every file format those years produced. PST archives from Outlook. EML files. Chat exports from platforms that no longer exist. Voice transcripts of varying quality. Journals in plain text, Markdown, and proprietary formats. Everything normalizes to a common structure with metadata preservation.
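To make the normalization target concrete, here's a minimal Python sketch. The NormalizedArtifact fields and the loader dispatch are my own illustration of "a common structure with metadata preservation", not Tessera's actual schema; the one real API used is the standard library email parser for .eml files, and the other formats are stubbed out.

```python
from dataclasses import dataclass, field
from datetime import datetime
from email import policy
from email.parser import BytesParser
from pathlib import Path

@dataclass
class NormalizedArtifact:
    """Common structure every source format maps to (field names are illustrative)."""
    source_path: str
    source_format: str           # "eml", "pst", "chat", "journal", ...
    timestamp: datetime | None   # best-effort original timestamp
    author: str | None
    body: str                    # plain-text content
    metadata: dict = field(default_factory=dict)  # format-specific details we keep

def normalize_eml(path: Path) -> NormalizedArtifact:
    """Parse an .eml file with the standard library, preserving headers as metadata."""
    with path.open("rb") as f:
        msg = BytesParser(policy=policy.default).parse(f)
    body_part = msg.get_body(preferencelist=("plain",))
    return NormalizedArtifact(
        source_path=str(path),
        source_format="eml",
        timestamp=msg["date"].datetime if msg["date"] else None,
        author=str(msg["from"]) if msg["from"] else None,
        body=body_part.get_content() if body_part else "",
        metadata={"subject": str(msg["subject"] or ""), "to": str(msg["to"] or "")},
    )

def normalize(path: Path) -> NormalizedArtifact:
    """Dispatch on file extension; each source format gets its own loader."""
    loaders = {".eml": normalize_eml}  # .pst, chat exports, etc. plug in here
    try:
        return loaders[path.suffix.lower()](path)
    except KeyError:
        raise ValueError(f"no loader for {path.suffix}") from None
```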
Stage 2: Decision Identification. This is where it gets interesting. NLP models identify decision moments within each artifact. Not every sentence is a decision. Most text is context, narrative, or administrative noise. The extraction layer identifies the moments where a choice was made, a position was taken, a direction was set, or a reversal occurred.
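As a rough sketch of what "identify decision moments" means mechanically, here's a cue-based stand-in for the NLP models. The regex cues and the naive sentence splitter are placeholder heuristics, not the actual extraction layer; they only show the shape of the input and output.

```python
import re

# Surface cues that often mark a decision moment. A stand-in for the real
# classifier, which these heuristics only crudely approximate.
DECISION_CUES = [
    r"\b(?:I|we)\s+(?:decided|chose|opted|agreed|committed)\b",
    r"\b(?:going|went)\s+with\b",
    r"\b(?:instead|rather than|reversing|changed (?:my|our) mind)\b",
]
CUE_RE = re.compile("|".join(DECISION_CUES), re.IGNORECASE)

def find_decision_sentences(text: str) -> list[tuple[int, str]]:
    """Return (index, sentence) pairs that look like decision moments.

    Naive sentence split; the point is the output shape, not the model.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [(i, s) for i, s in enumerate(sentences) if CUE_RE.search(s)]

example = ("The vendor demo ran long. We debated for an hour. "
           "In the end we decided to go with the self-hosted option, "
           "rather than renew the contract.")
print(find_decision_sentences(example))  # flags only the third sentence
```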
Stage 3: Context Reconstruction. Each identified decision is enriched with its surrounding context. What signals preceded it? What constraints were explicit? What alternatives were mentioned or implied? What stakeholders were involved? What was the apparent risk tolerance?
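Here's one way the enrichment could be shaped, continuing the sketch above. The DecisionContext fields simply mirror the five questions; the names and the fixed-size sentence window are my assumptions, not Tessera's real extractors.

```python
from dataclasses import dataclass, field

@dataclass
class DecisionContext:
    """Enrichment attached to each extracted decision (fields mirror the questions above)."""
    decision: str
    preceding_signals: list[str] = field(default_factory=list)
    explicit_constraints: list[str] = field(default_factory=list)
    alternatives: list[str] = field(default_factory=list)
    stakeholders: list[str] = field(default_factory=list)
    risk_tolerance: str | None = None  # coarse label, e.g. "averse" / "neutral" / "seeking"

def context_window(sentences: list[str], idx: int,
                   before: int = 3, after: int = 2) -> list[str]:
    """Sentences surrounding a decision: the raw material the extractors run over."""
    return sentences[max(0, idx - before): idx + after + 1]
```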
Stage 4: Graph Integration. Decisions with their contexts are added to the lattice. Edges form between structurally similar decisions across domains and time periods.
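And a sketch of the final stage, reusing the DecisionContext dataclass from the Stage 3 sketch. The graph library (networkx) and the Jaccard-over-constraints similarity are my substitutions; whatever metric the lattice actually uses for "structurally similar" would slot in where structural_similarity sits.

```python
import networkx as nx
from itertools import combinations
# Assumes the DecisionContext dataclass defined in the Stage 3 sketch.

def structural_similarity(a: "DecisionContext", b: "DecisionContext") -> float:
    """Toy metric: Jaccard overlap of constraint sets.

    A placeholder for the actual structural comparison, whatever it is.
    """
    sa, sb = set(a.explicit_constraints), set(b.explicit_constraints)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def integrate(graph: nx.Graph, decisions: list["DecisionContext"],
              threshold: float = 0.5) -> None:
    """Add decisions as nodes; link pairs whose similarity clears the threshold."""
    for d in decisions:
        graph.add_node(d.decision, context=d)
    for a, b in combinations(decisions, 2):
        sim = structural_similarity(a, b)
        if sim >= threshold:
            graph.add_edge(a.decision, b.decision, weight=sim)
```

Because edges are keyed on similarity rather than chronology, two decisions a decade apart can end up adjacent in the graph, which is the whole point of the lattice.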
What Surprised Me
The journal entries are more valuable than the emails. Emails capture what I decided and communicated. Journals capture why: the internal reasoning that preceded or followed action. Having both layers, the external action and the internal reasoning, gives Tessera something no system trained only on communications can achieve: the ability to connect what was decided with why it was decided.