I started with off-the-shelf embedding models. They are trained on web-scale data and produce excellent general-purpose vector representations. They also systematically fail on personal knowledge in ways that took me weeks to diagnose.
The problem is domain specificity. When I write “the CW integration broke the Automate deployment,” every word in that sentence has a specific meaning in my context that differs from the general-purpose meaning. “CW” is ConnectWise. “Automate” is a specific product, not a verb. “Broke” means introduced a regression, not a physical break. Generic embeddings encode the general meanings and miss the specific ones.
The Fine-Tuning Approach
I fine-tuned the embedding model on my corpus using contrastive learning. Artifacts I judged to be related (based on graph proximity) formed positive pairs; artifacts with no graph connection formed negative pairs. The fine-tuned model learned that documents about ConnectWise Automate should be close to documents about RMM deployments, even though the surface-level language might be completely different.
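The pairing rule can be sketched in a few lines. This is a hypothetical helper, not Tessera's actual code: it assumes the artifact graph is an adjacency dict, treats anything within two hops as a positive pair, and anything unreachable as a negative pair.

```python
from itertools import combinations

def mine_pairs(graph, max_hops=2):
    """Build contrastive training pairs from artifact-graph proximity.
    Artifacts within max_hops of each other are positives; artifacts
    with no connecting path at all are negatives. Pairs at intermediate
    distances are skipped as ambiguous."""

    def bfs(start):
        # Breadth-first hop counts from one artifact to all reachable ones.
        dist = {start: 0}
        frontier = [start]
        while frontier:
            nxt = []
            for node in frontier:
                for nb in graph.get(node, ()):
                    if nb not in dist:
                        dist[nb] = dist[node] + 1
                        nxt.append(nb)
            frontier = nxt
        return dist

    positives, negatives = [], []
    for a, b in combinations(sorted(graph), 2):
        d = bfs(a).get(b)
        if d is not None and d <= max_hops:
            positives.append((a, b))
        elif d is None:
            negatives.append((a, b))
    return positives, negatives
```

The pairs would then feed a standard contrastive objective that pulls positive pairs together in embedding space and pushes negatives apart.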
The improvement was substantial. Retrieval accuracy on a test set of queries went from sixty-one percent with generic embeddings to seventy-eight percent with fine-tuned embeddings. Combined with the graph and lexical layers in trimodal retrieval, the overall system accuracy reached eighty-seven percent.
The fine-tuning process itself was informative. The pairs that the model struggled with most were cross-domain references: a security decision that was relevant to a managed services workflow, or a personal reflection that contained insight about a professional pattern. These cross-domain connections are exactly the ones that make a life assistant valuable, and they require embeddings that can bridge domains rather than staying within them.
Incremental Updates
Re-fine-tuning the embedding model every time new artifacts arrive is not an option; the computational cost is too high for a local system. Instead, I use a layered approach: the base model is fine-tuned periodically (quarterly), and new artifacts are embedded using the current model with a lightweight adapter that adjusts for recent vocabulary and relationship patterns.
The adapter is small, trainable in minutes rather than hours, and captures the most recent shifts in my language and context without requiring full model retraining. It is a practical compromise between embedding quality and computational feasibility on an air-gapped system.
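One way such an adapter can work is as a small residual transform applied on top of frozen base embeddings. The text does not specify the adapter's architecture, so this is a minimal sketch under my own assumptions: a near-zero-initialized linear map so that adapted vectors start out identical to the base ones, with a crude SGD step that pulls a positive pair's adapted vectors together.

```python
import math
import random

class ResidualAdapter:
    """Tiny residual linear adapter over frozen base embeddings:
    y = normalize(x + W @ x). W starts near zero, so the adapter is
    initially an identity map and only the recent-vocabulary shift
    needs to be learned. Illustrative sketch, not Tessera's code."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Near-zero init: adapted embeddings start equal to base ones.
        self.W = [[rng.uniform(-1e-3, 1e-3) for _ in range(dim)]
                  for _ in range(dim)]

    def __call__(self, x):
        y = [xi + sum(self.W[i][j] * x[j] for j in range(self.dim))
             for i, xi in enumerate(x)]
        norm = math.sqrt(sum(v * v for v in y)) or 1.0
        return [v / norm for v in y]

    def train_step(self, a, b, lr=0.1):
        """One SGD step increasing the adapted dot product of a
        positive pair (a, b). The gradient ignores the normalization
        term, which is acceptable for a sketch with unit-ish vectors."""
        ya, yb = self(a), self(b)
        for i in range(self.dim):
            for j in range(self.dim):
                self.W[i][j] += lr * (yb[i] * a[j] + ya[i] * b[j])
```

Because only the dim-by-dim matrix W is trained while the base model stays frozen, a pass over recent positive pairs finishes in minutes, which matches the quarterly-base-plus-adapter schedule described above.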
This is one of the recurring themes in Tessera’s development: the optimal architecture is not the one that performs best on paper. It is the one that performs well enough within the constraints of local hardware, air-gapped operation, and a single developer’s capacity to maintain it. Pragmatism over perfection, every time.