Semantic search is Tessera’s primary retrieval mechanism for conceptual matching, and the embedding strategy required careful thought. Off-the-shelf embeddings are trained on generic text. Tessera’s corpus is anything but generic.

The Domain Collision Problem

Standard embedding models cluster text by topic. Legal documents near legal documents. Technical discussions near technical discussions. But Tessera’s value is in finding structural similarity across domains. An email about legal risk and an email about technical debt might be structurally identical: irreversible commitment, asymmetric downside, stakeholder misalignment about severity.

I needed embeddings that capture decision structure, not topic. This required fine-tuning the embedding model on my corpus with a loss function that rewards structural similarity and penalizes surface-topic clustering.
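A loss of this shape can be sketched as a triplet margin loss: pull the anchor toward a structural match from another domain, push it away from a same-topic decision with a different structure. The function below is a minimal illustration of that objective, not Tessera's actual training code; the margin value is an assumption.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero when the structural positive is already closer to the
    anchor than the same-topic hard negative by at least `margin`;
    positive (a gradient signal) otherwise."""
    return max(0.0, cosine(anchor, negative) - cosine(anchor, positive) + margin)
```

With this objective, a pair like "legal risk" / "technical debt" (same decision shape) is driven together, while two legal emails with different shapes are driven apart.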

The Fine-Tuning Approach

I built a contrastive training set from the extracted decisions. Pairs of decisions that share structural properties across different domains are positive examples. Decisions in the same domain with different structures are hard negatives. The model learns to embed the shape of the decision rather than the vocabulary of the domain.
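The pairing rule above can be sketched in a few lines. The record fields (`domain`, `structure`, `text`) are illustrative, assuming each extracted decision carries a domain tag and a structural tag; the real extraction schema may differ.

```python
from itertools import combinations

def build_training_pairs(decisions):
    """Form labeled pairs from decision records.

    Cross-domain pairs with the same structural tag are positives (1);
    same-domain pairs with different structures are hard negatives (0).
    Pairs matching neither rule carry no training signal and are dropped.
    """
    pairs = []
    for a, b in combinations(decisions, 2):
        if a["structure"] == b["structure"] and a["domain"] != b["domain"]:
            pairs.append((a["text"], b["text"], 1))
        elif a["domain"] == b["domain"] and a["structure"] != b["structure"]:
            pairs.append((a["text"], b["text"], 0))
    return pairs
```

The hard negatives are what force the model off topic vocabulary: two legal emails sharing legal jargon but differing in decision shape must land far apart.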

The base model is approximately 400 MB, small enough for CPU inference, which preserves the air-gap and commodity-hardware requirements. Fine-tuning completed in six hours on a single consumer GPU.

Results

Cross-domain retrieval accuracy improved dramatically. A query framed as a technical architecture question now retrieves relevant legal and operational precedents when the decision structure matches. This is the integration layer made computational. Tessera finds precedent the way I find precedent: by recognizing the shape of the problem, not the vocabulary.
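The retrieval step itself is ordinary nearest-neighbor search; the structural behavior comes entirely from the fine-tuned embeddings. A minimal sketch, assuming a small in-memory index mapping decision ids to vectors (the id names below are hypothetical):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, index, top_k=3):
    """Rank stored decisions by cosine similarity to the query embedding.

    With structure-tuned embeddings, the top hits share the query's
    decision shape regardless of their source domain."""
    ranked = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]
```

A technical architecture query whose embedding lands near an old legal precedent's embedding surfaces that precedent, which is exactly the cross-domain behavior described above.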