Your AI model is unauditable by design. We fixed that.

Production inference runs inside frameworks that can't tell you what they computed, in what order, or why. We built a sealed binary runtime that can—using a minimal set of primitive operations. BERT runs today. Here's the proof.

Why the name

The coelacanth was believed gone for tens of millions of years—then rediscovered, alive and unchanged. Coelanox takes that as a product metaphor: an invisible layer that sits in the stack without fanfare, does one job with discipline, and shows its full value when someone asks hard questions—integrity, lineage, what actually ran. Not a dashboard. A stratum.

The responsibility gap

When a radiologist uses an AI model to flag a tumour, when a bank's model denies a loan, when an onboard classifier decides what it's looking at—someone is legally and ethically responsible for what that model computed. Not just the output. The computation itself.

Right now, they can't verify it. Not really. PyTorch, TensorFlow, ONNX Runtime—every major inference stack is a general-purpose computation engine wrapped in Python, dependency trees, and runtime dispatch. They were built for flexibility, not verifiability. You get a number back. You don't get a complete, tamper-evident record of every operation that produced it.

Explainability tools sit on top and interpret what happened. That's archaeology, not auditing. For regulated industries—medical devices, financial decisioning, autonomous systems, defence—the gap between "we have a model" and "we can prove what our model did" is becoming a serious liability.

The framework problem is architectural

Frameworks are expressive: dynamic graphs, callbacks, runtime kernel selection. That expressiveness is exactly what makes them unauditable. The execution path is determined at runtime by dispatchers, backends, and whatever kernels were selected. There is no single, inspectable execution record—by design.

That is not a criticism. They are extraordinary engineering. They are the wrong tool when you need to prove, not assert, what your model computed.

TCP doesn't decide how to move your packets. Neither should your runtime.

TCP is trusted because it is dumb: it moves bytes exactly as specified. We built Coelanox on the same philosophy: a deterministic, sealed inference runtime that does exactly what it's told, nothing more. No Python. No Docker. No OS in the hot path. A model enters as a .cnox container, cryptographically sealed, SHA-256 verified, and signable for provenance. Inference runs through a tiny Turing-incomplete executor. With audit enabled, operations can be logged so outputs are traceable against the plan.
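The seal-then-verify discipline is simple to state in code. The sketch below is illustrative, not the actual .cnox loader: the single-digest, digest-then-payload layout is a deliberate simplification invented for this example.

```python
import hashlib

def seal(payload: bytes) -> bytes:
    """Packaging-time counterpart: prepend the payload's SHA-256 digest."""
    return hashlib.sha256(payload).digest() + payload

def load_sealed(container: bytes) -> bytes:
    """Verify before run: refuse to hand the executor a payload whose
    digest does not match the sealed digest in the header."""
    digest, payload = container[:32], container[32:]
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("integrity check failed: execution never starts")
    return payload

sealed = seal(b"plan-and-weights")
assert load_sealed(sealed) == b"plan-and-weights"

# Flip a single bit anywhere in the payload: the load is refused.
tampered = sealed[:-1] + bytes([sealed[-1] ^ 1])
try:
    load_sealed(tampered)
    raise AssertionError("tampering went undetected")
except ValueError:
    pass
```

The point of the ordering is that the executor never sees unverified bytes: verification is a precondition of execution, not a post-hoc check.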

Fifty-two primitives cover the vast majority of production models

Through Coelanox's Universal IR and real production workloads, we converged on a minimal canonical opset of fifty-two operations, ranging from elementwise Add up to Convolution, LayerNorm, and MatMul. Dynamic control-flow primitives belong in your serving layer, not your inference kernel: the runtime is deliberately not Turing-complete. That's a security property.
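The security property can be made concrete with a small sketch. The op names and plan format below are illustrative inventions, not Coelanox's actual IR; the point is that a primitive set with no branch, jump, or loop ops yields plans that always terminate in exactly len(plan) steps.

```python
# A straight-line executor over a fixed primitive table. Because no
# primitive can branch or loop, every plan halts after len(plan) steps:
# that is the Turing-incompleteness property in miniature.
PRIMITIVES = {
    "add":  lambda a, b: [x + y for x, y in zip(a, b)],
    "mul":  lambda a, b: [x * y for x, y in zip(a, b)],
    "relu": lambda a: [max(0.0, x) for x in a],
}

def execute(plan, tensors):
    """Walk the plan once, in order. Nothing is decided at runtime:
    each step names its op, its inputs, and its output up front."""
    for op, inputs, out in plan:
        tensors[out] = PRIMITIVES[op](*(tensors[i] for i in inputs))
    return tensors

# y = relu(x * w + b), expressed as a fixed three-step plan.
plan = [
    ("mul",  ("x", "w"), "h"),
    ("add",  ("h", "b"), "h"),
    ("relu", ("h",),     "y"),
]
state = execute(plan, {"x": [1.0, -2.0], "w": [3.0, 3.0], "b": [0.5, 0.5]})
assert state["y"] == [3.5, 0.0]
```

Anything that needs data-dependent control flow lives outside the executor, where it can be observed, rather than inside the kernel, where it cannot.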

For edge cases (some RNN variants, dynamic GNNs), coverage is incomplete; for most deployed transformers, CNNs, and MLP architectures, this vocabulary is the computational core.

BERT runs today. Here's the proof.

BERT base uncased is fully packaged into a .cnox container, verified, and running inference with op-by-op audit output. Early scalar-backend numbers prioritise correctness and auditability; SIMD and vendor backends are the next layer. A runtime you can't verify is not "faster", it's faster at being wrong in ways you can't detect.
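One standard way to make op-by-op audit output tamper-evident is a hash chain, where each record commits to its predecessor. The sketch below is a generic illustration of that idea, not Coelanox's actual audit format; the field names are invented.

```python
import hashlib
import json

def audit_entry(prev_hash: str, step: int, op: str, out_shape) -> dict:
    """One audit record per executed op. Each record's hash covers its
    fields plus the previous record's hash, so any edit breaks the chain."""
    body = {"step": step, "op": op, "out_shape": out_shape, "prev": prev_hash}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    return body

def verify_chain(log) -> bool:
    """Recompute every hash and check the links back to the genesis value."""
    prev = "genesis"
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if body["prev"] != prev or recomputed != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

log, prev = [], "genesis"
for step, (op, shape) in enumerate([("MatMul", [1, 768]), ("LayerNorm", [1, 768])]):
    entry = audit_entry(prev, step, op, shape)
    log.append(entry)
    prev = entry["hash"]

assert verify_chain(log)
log[0]["op"] = "Add"          # rewrite history...
assert not verify_chain(log)  # ...and verification fails
```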

How the container works

A .cnox container is a single self-contained unit: header with integrity information, manifest, Universal IR graph, compressed weights, and optionally pre-compiled CLF kernel blobs. Verify before run: if the payload doesn't match the hash, execution never starts. CLF maps each op to pre-compiled kernel code; the executor walks the plan—no dynamic linking, no runtime codegen in the hot path.
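A minimal sketch of that load path follows. The byte layout here is entirely hypothetical: the magic value, header fields, and section offsets are invented for illustration, and the real .cnox format differs. What the sketch preserves is the ordering: the header's digest is checked over the whole payload before any section is interpreted.

```python
import hashlib
import struct

# Hypothetical single-file layout: fixed header, then a payload whose
# digest the header commits to. Offsets index sections within the payload.
HEADER = struct.Struct("<4sI32sIII")  # magic, version, sha256, 3 offsets

def read_container(blob: bytes):
    magic, version, digest, m_off, g_off, w_off = HEADER.unpack_from(blob)
    if magic != b"CNOX":
        raise ValueError("not a container")
    payload = blob[HEADER.size:]
    if hashlib.sha256(payload).digest() != digest:  # verify before run
        raise ValueError("payload digest mismatch")
    manifest = payload[m_off:g_off]
    graph = payload[g_off:w_off]
    weights = payload[w_off:]
    return manifest, graph, weights

payload = b'{"model":"demo"}' + b"<ir-graph>" + b"<weights>"
header = HEADER.pack(b"CNOX", 1, hashlib.sha256(payload).digest(), 0, 16, 26)
manifest, graph, weights = read_container(header + payload)
assert graph == b"<ir-graph>"
```

Because every section sits behind one verified digest, a flipped byte in the weights is as detectable as a rewritten graph.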

What this enables

Coelanox is a compliance enabler: evidence at the compute level for teams that must show what ran—not just what came out the other end. Air-gapped and offline inference. Tamper-evident deployment: any change to weights, graph, or kernels is detectable before execution.

Current state and what's next

The scalar backend runs models end-to-end with audit output; the CLF kernel path is part of the architecture; ONNX ingestion and the full CLI surface (package, verify, run, benchmark, audit, extract) are in active development. Next: SIMD acceleration, vendor GPU backends, and formal benchmarks against ONNX Runtime and TensorRT, without compromising the audit story.

We're accepting design-partnership conversations for production inference in regulated environments. For early access and partnership discussions, see the contact page. The CLF spec and reader are on GitHub.
