The AI industry’s default assumption is cloud. Need language understanding? Call OpenAI. Need embeddings? Hit the API. Need synthesis? Send your text to someone else’s server and trust their terms of service.
Tessera will never do this. Not as a preference. As a design constraint baked into the architecture.
The Governance Argument
I have been developing the Ethical AI Implementation Architecture specifically to address how organizations should govern AI systems processing sensitive data. Transparency. Privacy. Accountability. Every principle I have articulated argues against sending the most sensitive decision-making corpus imaginable to an external API.
An external API call with Tessera’s data would be a sovereignty breach. The content is not just emails. It is the extracted decision patterns, risk postures, and judgment heuristics of a person operating across legal, technical, commercial, and ethical domains simultaneously. Feeding that to a cloud provider hands them the raw material to train on my competitive advantage, governed only by terms of service that can change at any time.
The Technical Reality Makes This Possible
For Tessera’s use cases (summarization, decision extraction, semantic search, and precedent-grounded synthesis), a 7-billion-parameter model running locally outperforms a cloud API call in every dimension that matters: latency, cost, privacy, and availability.
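To make the semantic-search case concrete, here is a minimal sketch of ranking documents by cosine similarity over locally stored embedding vectors. The function names and toy vectors are illustrative assumptions, not Tessera's actual implementation; in practice the vectors would come from a local embedding model rather than being hand-written.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank(query_vec, corpus):
    """Rank (doc_id, vector) pairs by similarity to the query vector."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in corpus]
    return sorted(scored, key=lambda t: t[1], reverse=True)

# Toy vectors stand in for embeddings produced by a local model.
corpus = [("email-1", [0.9, 0.1, 0.0]), ("email-2", [0.1, 0.9, 0.2])]
print(rank([1.0, 0.0, 0.0], corpus)[0][0])  # → email-1
```

Everything here runs in-process on the machine holding the data; no vector, query, or result ever crosses the network.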
The integration layer will be pluggable. Initial support for llama.cpp-compatible models through Python bindings. The interface abstracts model selection, allowing different models for different tasks. The models ship with the deployment or load from a local directory. No downloads. No license checks. No phone-home behavior.
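A pluggable layer like this might look roughly as follows. The class and method names (`ModelBackend`, `ModelRouter`, `generate`) are hypothetical sketches, not Tessera's real interface; the `llama_cpp` calls match the public llama-cpp-python completion API, and the model path points at a local file so nothing is downloaded.

```python
from typing import Protocol

class ModelBackend(Protocol):
    """Anything that can turn a prompt into text, regardless of model."""
    def generate(self, prompt: str, max_tokens: int = 256) -> str: ...

class LlamaCppBackend:
    """Wraps a local GGUF model via the llama-cpp-python bindings."""
    def __init__(self, model_path: str):
        from llama_cpp import Llama  # local import: optional dependency
        self._llm = Llama(model_path=model_path)

    def generate(self, prompt: str, max_tokens: int = 256) -> str:
        out = self._llm(prompt, max_tokens=max_tokens)
        return out["choices"][0]["text"]

class ModelRouter:
    """Maps task names to backends so each task can use a different model."""
    def __init__(self):
        self._backends: dict[str, ModelBackend] = {}

    def register(self, task: str, backend: ModelBackend) -> None:
        self._backends[task] = backend

    def run(self, task: str, prompt: str) -> str:
        return self._backends[task].generate(prompt)
```

A deployment would then register, say, a summarization model and a smaller extraction model against the same router, swapping either one by changing a single `register` call rather than touching call sites.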
The intelligence runs where the data lives. That is the only architecture that respects what this data actually is.