COELANOX — Runtime behaviour specification

Audience: Security and platform teams approving deployment next to existing infrastructure.
Scope: Behaviour of coelanox when loading and executing a .cnox container. Packaging (package) has separate failure modes; only critical interactions are noted here.


1. Contract summary

Topic | Specification
Source of truth | Execution semantics are defined by Universal IR + IR semantics version in the manifest, not by ONNX at runtime.
Same .cnox, same input → same output? | Not guaranteed bit-identical across different backends (e.g. scalar vs SIMD/CLF) or across library/OS versions. Scalar reference behaviour is test-backed; SIMD should match within float tolerance where applicable.
Determinism on same hardware | Scalar path: largely deterministic for a fixed IR and inputs (single-threaded reference semantics for ops under test). SIMD / native paths may use parallel or reordered reductions; do not assume bitwise reproducibility for floats.
Determinism across runs | Same binary, same container, same input, same backend, same config: expect stable results for scalar; for SIMD, expect close floats unless documented otherwise.
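
Because cross-backend results are only expected to agree within float tolerance, acceptance tests should compare outputs with a tolerance rather than with exact equality. The following is a minimal sketch of such a comparison. The function name and the tolerance values are illustrative assumptions; this specification does not define numeric tolerances.

```python
import math


def outputs_match(reference, candidate, rel_tol=1e-5, abs_tol=1e-6):
    """Return True if two flat float sequences agree within tolerance.

    rel_tol and abs_tol are placeholder values (assumptions), not limits
    taken from the specification.
    """
    if len(reference) != len(candidate):
        return False
    for r, c in zip(reference, candidate):
        # math.isclose never treats NaN as close; accept matching NaNs explicitly.
        if math.isnan(r) and math.isnan(c):
            continue
        if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol):
            return False
    return True


scalar_out = [0.1 + 0.2, 1.0, -3.5]  # stand-in for a scalar-backend result
simd_out = [0.3, 1.0, -3.5]          # stand-in for a SIMD-backend result

print(scalar_out == simd_out)               # False: exact equality is too strict
print(outputs_match(scalar_out, simd_out))  # True: equal within tolerance
```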

For IR versioning rules, see IR semantics.


2. Load and verify phases

2.1 Container open and parse

Failure modes:

Condition | Typical behaviour
File missing / unreadable | Error returned; no partial load.
Truncated or invalid binary | Parse error; no execution.
Container size exceeds COELANOX_MAX_CONTAINER_SIZE_BYTES (or equivalent config) | Load rejected with policy error.
ir_semantics_version not accepted by this runtime | Load rejected with explicit version error (no silent run).

2.2 Integrity verification (verify and default run)

Condition | Behaviour
SHA-256 mismatch | Failure; container treated as untrusted.
--no-verify on run | Skips hash check (development-only; discouraged in production).
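
Teams that record an artifact digest at publish time can add an out-of-band check before handing a container to the runtime. The sketch below hashes the whole file with SHA-256 and compares it to a recorded hex digest. It is not the runtime's own verification: which bytes the runtime hashes and where the expected digest is stored are not described here, and the helper names are hypothetical.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large containers are not held in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_artifact(path, expected_hex):
    """Return True only if the file digest equals the recorded digest.

    expected_hex is assumed to come from a trusted record kept outside the
    artifact itself (for example a release manifest).
    """
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

A mismatch should be handled the same way as a runtime integrity failure: stop and re-source the artifact.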

2.3 Optional Ed25519 signature

When signing is used:

Condition | Behaviour
Signature required (--require-signature) and key/signature invalid | Failure; no run.
Trusted key provided | Verification succeeds or fails explicitly.

3. Execution phase

3.1 Input validation

When enabled (COELANOX_ENABLE_INPUT_VALIDATION, default true):

Check | On failure
Input tensor size vs manifest limits | Error; no inference.
NaN / Inf policy (when enforced) | Error per configuration.
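
The two checks above can be pictured as a gate that runs before any inference work starts. This is a rough sketch under stated assumptions: the element-count limit, the allow_non_finite switch, and the exception type are hypothetical names, and the real manifest fields and policy options may differ.

```python
import math


class InputRejected(ValueError):
    """Raised when an input fails validation; no inference is attempted."""


def validate_input(values, max_elements, allow_non_finite=False):
    """Check a flat float input against a size limit and a NaN/Inf policy.

    max_elements stands in for a manifest limit (assumption).
    allow_non_finite=False models an enforced NaN/Inf policy.
    """
    if len(values) > max_elements:
        raise InputRejected(
            f"input has {len(values)} elements, limit is {max_elements}"
        )
    if not allow_non_finite:
        for index, value in enumerate(values):
            if not math.isfinite(value):
                raise InputRejected(f"non-finite value at index {index}")
    return True
```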

3.2 Backend selection (high level)

  • Scalar: Interpreted execution over IR; always available if the graph is supported.
  • SIMD / CLF / plan: Used when packaged and compatible with runtime discovery; otherwise the runtime may fall back to scalar. A fallback changes the execution path and performance, not the correctness of the result; check the logs to see which path ran.
  • Users must not assume that two backends return identical floats.
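
The selection rule described in this list can be sketched as follows. The backend names, the preference order, and the log message are assumptions made for illustration; the actual discovery logic is not specified in this document.

```python
import logging

log = logging.getLogger("backend_selection_sketch")

# Hypothetical preference order for accelerated backends (assumption).
ACCELERATED = ("clf", "simd")


def select_backend(packaged, discovered):
    """Pick an accelerated backend if it is both packaged and discovered.

    packaged:   backend names present in the container
    discovered: backend names the runtime found to be compatible
    Falls back to "scalar" and logs the fact, so the change of path is visible.
    """
    for name in ACCELERATED:
        if name in packaged and name in discovered:
            return name
    log.warning("no accelerated backend usable; falling back to scalar")
    return "scalar"
```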

3.3 Execution timeout

COELANOX_EXECUTION_TIMEOUT_MS (default 30000 ms) can abort long runs. The runtime checks the deadline between execution steps (plan steps or scalar waves). On expiry:

Result | Behaviour
Timeout | Inference aborts; error returned to caller; no partial success guarantee.
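
A deadline that is checked between steps behaves like the cooperative loop sketched below. The names are hypothetical and the step granularity is an assumption. In this sketch a step that is already running is never interrupted, so the observed abort time can exceed the configured timeout by up to the duration of one step.

```python
import time


class ExecutionTimeout(RuntimeError):
    """Raised when the deadline has passed; partial results are discarded."""


def run_steps(steps, timeout_ms=30000, clock=time.monotonic):
    """Run callables in order, checking the deadline before each one."""
    deadline = clock() + timeout_ms / 1000.0
    results = []
    for index, step in enumerate(steps):
        # The check happens only between steps, never inside a step.
        if clock() >= deadline:
            raise ExecutionTimeout(f"deadline exceeded before step {index}")
        results.append(step())
    return results
```

Callers should treat ExecutionTimeout as a complete failure of the request and should not reuse any intermediate output.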

3.4 Resource limits

Limit | Effect when exceeded
COELANOX_MAX_MEMORY_BYTES | Runtime may reject or fail allocation paths.
Workspace cap (--max-workspace-mb / env) | Error if the model cannot run within cap.
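
For deployment reviews it can help to print the effective limits before a run. The sketch below reads the environment variables named in this section with the defaults stated here. The accepted boolean spellings and the treatment of an unset memory limit as "no value" are assumptions; the runtime's own parsing rules may differ.

```python
import os


def read_limits(env=None):
    """Collect the limits named in this section from an environment mapping."""
    env = os.environ if env is None else env

    def as_bool(raw):
        # Accepted spellings are an assumption, not documented behaviour.
        return raw.strip().lower() in ("1", "true", "yes")

    raw_memory = env.get("COELANOX_MAX_MEMORY_BYTES")
    return {
        # Default of 30000 ms is stated in section 3.3.
        "timeout_ms": int(env.get("COELANOX_EXECUTION_TIMEOUT_MS", "30000")),
        # Default of true is stated in section 3.1.
        "input_validation": as_bool(
            env.get("COELANOX_ENABLE_INPUT_VALIDATION", "true")
        ),
        # No default is documented here, so an unset value is reported as None.
        "max_memory_bytes": int(raw_memory) if raw_memory else None,
    }
```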

4. Failure modes (operator-facing)

Symptom | Cause class | Typical response
Hash / integrity failed | Tampering, corruption | Stop; do not run; re-source artifact.
Version mismatch | Old/new toolchain mismatch | Stop; rebuild or align runtime version.
Package-time Custom op | Unsupported ONNX/op | Fix graph or extend translator (not a runtime patch).
Timeout | Slow or stuck work | Abort; tune timeout or workload.
Path rejected | COELANOX_ALLOW_ABSOLUTE_PATHS=false | Error; use allowed paths.
Backend missing | No CLF / wrong path | May fall back to scalar; expect a performance drop, not a silent change in correctness.

Extended runbooks: Operations.


5. Audit and logging

  • coelanox run --audit (and related options) can emit per-op evidence to configured outputs.
  • serve mode flushes audit state during long-lived sessions (see IPC / serve documentation).
  • Logging uses tracing; integrate with your log pipeline (RUST_LOG, COELANOX_LOG_LEVEL).

This specification does not enumerate every error string; it describes classes of behaviour security teams should expect.


Related documents

Non-technical hub