# GPU precision and TF32 This is an awareness note. If you only run MIME's built-in experiments, **no action is needed** — the relevant internal paths already force full precision. Read on if you write your own GPU code. ## What TF32 is On Ampere and newer NVIDIA GPUs, JAX's default float32 matmul precision is **TF32** — a reduced format with a ~10-bit mantissa (~3 decimal digits, ~1e-3 relative error). Any float32 matmul dispatched to GPU tensor cores silently runs at TF32 unless you ask for full precision explicitly. ## Why it can corrupt physics TF32 is harmless when a matmul's output is the same order of magnitude as its inputs — it just adds ~0.1% noise. It is **catastrophic** when the result is a near-cancellation: a residual far smaller than the input terms (an LBM momentum moment summed from much larger populations, a spectral pressure residual). There, TF32's ~1e-3 input error swamps the answer entirely — a moment that should be `3e-5` can come back as exactly `0.0`. ## What MIME already does MIME's v0.2 fit-up forced full precision on the affected internal paths — the LBM moment transforms (D3Q19 / D2Q9 moment matrices) and the FVM pressure solver. These fixes are per-call `precision="highest"` on the specific matmuls; TF32-tolerant paths (neural-net surrogates) are left fast. Nothing is required of you when using these solvers. ## If you write your own GPU code For precision-sensitive operations — anything where the result is much smaller than its inputs — request full precision explicitly: ```python import jax.numpy as jnp y = jnp.matmul(a, b, precision="highest") # also jnp.dot / jnp.einsum / jnp.tensordot ``` Prefer per-call `precision=` over the global `jax_default_matmul_precision` flag, so TF32-tolerant code stays fast. Small matmuls (all dimensions ≲ 16–32 — 3×3 rotations, 4×4 transforms) are never dispatched to tensor cores, so TF32 cannot reach them. ## Further reading The full investigation — every audited float32 matmul path in MIME, with verdicts and measured errors — is in [`tf32_matmul_precision_audit.md`](../validation/benchmark_reports/tf32_matmul_precision_audit.md).