---
orphan: false
---

# What's new in v0.2

```{versionadded} v0.2
```

v0.2 is the cycle that turns MADDENING from a single-GPU prototype
into a deployable multi-physics framework: pencil-decomposed
sharding for big simulations, a clean cloud-resume path that
survives spot preemption, edge validation that catches wiring
errors at compile time, and the static-data channel that finally
gives nodes a place to keep large arrays that don't evolve in
time.

Of the nine roadmap items, eight shipped in full; item #2 is
partial, pending the MIME decoder pull-over.  Full per-feature
status lives in `V0.2_PROGRESS.md` in the repo root.

## Highlights

### Halo / pencil decomposition (#1)

3-D pencil decomposition for stencil nodes lands as M1-M8.  The
contract change: every {class}`~maddening.core.node.SimulationNode`
now declares its halo via {meth}`~maddening.core.node.SimulationNode.halo_width`
returning a `dict[axis, width]` instead of the old `requires_halo`
boolean.  Pointwise nodes return `{}`; 1-D heat with a 2nd-order
stencil returns `{0: 1}`; D3Q19 LBM returns `{0: 1, 1: 1, 2: 1}`.

```{note}
`requires_halo` stays around as a default-implemented compatibility
shim until v0.3.  Subclasses that override it instead of
`halo_width` emit a DeprecationWarning pointing at the new API.
```
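As a rough illustration of the three declarations above, the sketch below uses hypothetical stand-in classes (they do not subclass the real `SimulationNode`) and a hypothetical `padded_shape` helper.  It assumes a halo of width `w` adds `w` ghost cells on both sides of the axis, which is the usual convention but is not confirmed here for MADDENING's sharding layer.

```python
class PointwiseNode:
    """Hypothetical pointwise node: no neighbour access, so no halo."""

    def halo_width(self) -> dict[int, int]:
        return {}


class Heat1DNode:
    """Hypothetical 1-D heat node with a 2nd-order stencil."""

    def halo_width(self) -> dict[int, int]:
        return {0: 1}


class D3Q19Node:
    """Hypothetical D3Q19 LBM node: one ghost layer on every axis."""

    def halo_width(self) -> dict[int, int]:
        return {0: 1, 1: 1, 2: 1}


def padded_shape(local_shape, halo):
    """Shape of a local block after adding ghost cells on both sides.

    Assumption: axes missing from `halo` need no padding.
    """
    return tuple(n + 2 * halo.get(axis, 0) for axis, n in enumerate(local_shape))


print(padded_shape((32, 32, 32), D3Q19Node().halo_width()))
print(padded_shape((64,), Heat1DNode().halo_width()))
print(padded_shape((8, 8), PointwiseNode().halo_width()))
```

Returning a per-axis dict rather than a boolean lets a decomposition pad only the axes a stencil actually reaches across.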

### Static-data channel on `SimulationNode` (#3)

Optional {attr}`~maddening.core.node.SimulationNode.static_data`
property lets a node carry non-state arrays (meshes, wall masks,
basis functions, lookup tables) outside the state pytree.  JAX
bakes them into the JIT'd HLO as constants instead of carrying
them through every `fori_loop` iteration.

A drift check at every step entry-point re-hashes the
`(key, shape, dtype)` tuples and triggers a recompile if any node's
`static_data` signature changed; the typical case is `replace_node`
swapping in a different mesh size.  See
[DESIGN.md](https://github.com/Microrobotics-Simulation-Framework/MADDENING/blob/main/DESIGN.md)
§2 "Static-data channel" for the full contract.
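The drift check can be pictured roughly as follows.  This is a minimal sketch, not the in-tree implementation: `static_signature` is a hypothetical helper, and the arrays are stand-in objects that only expose `.shape` and `.dtype`.

```python
import hashlib
from types import SimpleNamespace


def static_signature(static_data):
    """Hash the (key, shape, dtype) tuples of a static_data mapping.

    Array values are deliberately ignored: only a change in keys,
    shapes, or dtypes alters the signature.
    """
    digest = hashlib.sha256()
    for key in sorted(static_data):
        arr = static_data[key]
        entry = (key, tuple(arr.shape), str(arr.dtype))
        digest.update(repr(entry).encode("utf-8"))
    return digest.hexdigest()


# Stand-ins for arrays; real code would pass JAX or NumPy arrays.
coarse = {"grid_x": SimpleNamespace(shape=(64,), dtype="float32")}
same = {"grid_x": SimpleNamespace(shape=(64,), dtype="float32")}
fine = {"grid_x": SimpleNamespace(shape=(128,), dtype="float32")}

needs_recompile = static_signature(coarse) != static_signature(fine)
print(needs_recompile)
```

Hashing only the metadata keeps the check cheap enough to run at every step entry-point, since it never touches the array contents.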

HeatNode is the first in-tree consumer: `_grid_x` migrated from
"re-build a JAX array on every property access" to
"build once, expose via static_data".

### Compile-time edge validation (#4)

`GraphManager.compile()` now walks every edge and surfaces
shape/dtype/unit mismatches as four warning classes:

* {class}`~maddening.warnings.ShapeMismatchWarning`
* {class}`~maddening.warnings.DtypeMismatchWarning`
* {class}`~maddening.warnings.UnitMismatchWarning`
* parent {class}`~maddening.warnings.EdgeValidationWarning`

A transform on the edge suppresses the check (the transform may
reshape on the fly).  Aggregation means a 20-edge graph with three
different problems fires three warnings, not one.

Shipped as warnings in v0.2; flipped to hard
{class}`~maddening.warnings.EdgeValidationError` subclasses inside an
{class}`ExceptionGroup` in v0.2.1 (units stay as warnings).  See
{doc}`/developer_guide/edge_validation_migration` for the migration
playbook and {doc}`v0.2.1` for the patch release notes.
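While the checks are still warnings, the standard `warnings` machinery can either collect them or promote the whole family to errors early.  The sketch below defines local stand-in classes that mirror the hierarchy listed above (the `UserWarning` base is an assumption) and a hypothetical `fake_compile()` in place of `GraphManager.compile()`.

```python
import warnings


class EdgeValidationWarning(UserWarning):
    """Local stand-in for maddening.warnings.EdgeValidationWarning."""


class ShapeMismatchWarning(EdgeValidationWarning):
    pass


class UnitMismatchWarning(EdgeValidationWarning):
    pass


def fake_compile():
    """Hypothetical stand-in for GraphManager.compile()."""
    warnings.warn("edge a->b: shape (32,) vs (64,)", ShapeMismatchWarning)
    warnings.warn("edge b->c: unit K vs degC", UnitMismatchWarning)


# Collect every edge problem in one pass.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fake_compile()
kinds = [w.category.__name__ for w in caught]
print(kinds)


def compile_strict():
    """Promote the parent class, and so every subclass, to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", EdgeValidationWarning)
        fake_compile()
```

Filtering on the parent class means a strict mode does not have to enumerate each mismatch type.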

### Field subscriptions + zstd compression (#5, #6)

`BinaryStateEncoder` learned to pack a subset of fields and
optionally compress the payload:

```python
enc = BinaryStateEncoder(
    state,
    fields={"lbm": ["velocity"]},   # drop the 19 f-distributions
    compression="zstd",             # or "zstd+xor"
)
```

The compression mode is part of the schema, so the
`/ws/state/binary` subscribe message accepts it too:

```json
{"type": "subscribe",
 "fields": {"lbm": ["velocity"]},
 "compression": "zstd"}
```

On a 32³ LBM-like payload, subscribing to velocity only and
enabling zstd cuts the wire payload by 99% on slowly-varying flows.
The ZMQ `NetworkRelay` got the same `fields=` parameter for static
server-side filtering.

A runnable demo is at
`src/maddening/examples/cloud/streaming/08_subscribe_lbm_velocity.py`.
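To get a feel for what field filtering alone contributes, the sketch below applies a `fields` mapping to a stand-in state.  `select_fields` and `payload_bytes` are hypothetical helpers, not encoder API.  The sizes assume float32 data, a 3-component velocity, and an `lbm` node holding only `f` and `velocity`; none of this is confirmed by the encoder's schema.

```python
def select_fields(state, fields):
    """Keep only the requested fields of each listed node.

    Assumption: nodes absent from `fields` are dropped entirely.
    """
    return {
        node: {name: state[node][name] for name in names}
        for node, names in fields.items()
    }


def payload_bytes(state):
    """Total raw size of all buffers in a nested state dict."""
    return sum(len(buf) for node in state.values() for buf in node.values())


n_cells = 32 ** 3
bytes_per_value = 4  # float32, assumed

# Zero-filled buffers stand in for the real arrays.
state = {
    "lbm": {
        "f": bytes(19 * n_cells * bytes_per_value),
        "velocity": bytes(3 * n_cells * bytes_per_value),
    }
}

subset = select_fields(state, {"lbm": ["velocity"]})
saving = 1 - payload_bytes(subset) / payload_bytes(state)
print(f"raw saving from filtering alone: {saving:.1%}")
```

Under these assumptions the filter removes 19 of 22 values per cell before any compression runs; the 99% figure quoted above also includes zstd.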

### Cloud provider expansion (#7)

`AWSProvider` and `GCPProvider` join `RunPodProvider` and
`LambdaLabsProvider` (the latter promoted out of "stub" status).
All four share the {class}`~maddening.cloud.providers.CloudProvider`
ABC and pass the same credential-lifecycle test suite (51 cases
covering profile merging, env vars, chmod 0600, deletion
semantics).

Examples for each: `02_runpod_launch.py`, `03_lambda_launch.py`,
`04_aws_launch.py`, `05_gcp_launch.py`.

```{note}
End-to-end launches against real cloud accounts are out of CI scope
— they need real credentials and trigger spend.  The credential
layer is fully covered offline.
```

### Preempt → snapshot, resume from URL (#8)

`CloudSession(on_preempted=hook)` now drives a snapshot of the
GraphManager state when the spot VM is reclaimed.  The new VM
reads `RESUME_FROM_URL` at startup and pulls the state back.
Every snapshot ships with a sidecar manifest containing
`schema_version`, SHA-256, size, and a caller-supplied extra dict.
Tampering or version drift raises
{class}`~maddening.core.simulation.checkpoint.CheckpointIntegrityError`.

See {doc}`/user_guide/cloud_resume` for the full contract.
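The manifest check can be sketched as below.  The field names follow the list above, but `build_manifest`, `verify`, the value of `SCHEMA_VERSION`, and the local `CheckpointIntegrityError` class are illustrative stand-ins rather than the shipped API.

```python
import hashlib

SCHEMA_VERSION = 1  # hypothetical value


class CheckpointIntegrityError(Exception):
    """Local stand-in for the real exception class."""


def build_manifest(payload: bytes, extra=None) -> dict:
    """Describe a snapshot payload in a sidecar manifest."""
    return {
        "schema_version": SCHEMA_VERSION,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "size": len(payload),
        "extra": dict(extra or {}),
    }


def verify(payload: bytes, manifest: dict) -> None:
    """Raise if the payload does not match its manifest."""
    if manifest.get("schema_version") != SCHEMA_VERSION:
        raise CheckpointIntegrityError("schema version drift")
    if manifest.get("size") != len(payload):
        raise CheckpointIntegrityError("size mismatch")
    if manifest.get("sha256") != hashlib.sha256(payload).hexdigest():
        raise CheckpointIntegrityError("SHA-256 mismatch")


snapshot = b"serialized graph state"
manifest = build_manifest(snapshot, extra={"step": 1200})
verify(snapshot, manifest)  # passes silently
```

Checking the size before the hash rejects truncated downloads without reading the whole payload twice.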

### Profiler + Perfetto export (#9)

`POST /sim/profile?n_steps=N` returns a Perfetto-loadable Trace
Event JSON.  Drag-and-drop it into <https://ui.perfetto.dev> for
an interactive flame-graph view of per-node timings + coupling
overhead.

`POST /sim/profile/jax/start` and `/stop` wrap
{func}`jax.profiler.start_trace` for an XLA-level capture, and
`/cloud/teardown` snapshots the last trace dir as a base64'd
tar.gz in its response so the trace survives VM destruction.

A runnable demo: `src/maddening/examples/advanced/profile_lbm_step.py`.
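For readers unfamiliar with the Trace Event format, a minimal file looks roughly like the output of the sketch below.  The `"X"` (complete event) phase with `ts` and `dur` in microseconds is part of the format itself; the event names and the `to_trace_events` helper are hypothetical and need not match what `/sim/profile` emits.

```python
import json


def to_trace_events(timings_us, pid=1, tid=1):
    """Lay out (name, duration_us) pairs back to back as complete events."""
    events = []
    cursor_us = 0
    for name, dur_us in timings_us:
        events.append(
            {
                "name": name,
                "ph": "X",        # complete event: start time plus duration
                "ts": cursor_us,  # start time in microseconds
                "dur": dur_us,    # duration in microseconds
                "pid": pid,
                "tid": tid,
            }
        )
        cursor_us += dur_us
    return {"traceEvents": events}


trace = to_trace_events([("lbm.step", 4000), ("coupling", 1000)])
trace_json = json.dumps(trace, indent=2)
print(trace_json)
```

Saving `trace_json` to a `.json` file gives something that can be opened in the Perfetto UI the same way as the endpoint's response.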

### Surrogates subpackage scaffolding (#2, partial)

New subpackages — `surrogates/primitives/`, `surrogates/weights/`,
`surrogates/training/`, `surrogates/replace/` — re-export their
contents from the v0.1 leaf-module locations.  The decoder
zoo extraction from MIME and the `SurrogateArchitecture` ABC
decoupling are queued for v0.2.x.

## Breaking changes

None *in v0.2.0*.  All changes are additive or live behind the
warnings introduced by #4.  v0.2.1 subsequently flipped those
warnings to hard errors (see {doc}`v0.2.1` and the semver
carve-out in {doc}`/developer_guide/edge_validation_migration`).

## Deprecations

| Symbol | Replacement | Removed in |
|---|---|---|
| `SimulationNode.requires_halo` | {meth}`~maddening.core.node.SimulationNode.halo_width` | v0.3 |
| `ShardedNode` (deprecated alias) | {class}`~maddening.cloud.multigpu.sharded_node.ShardedPointwiseNode` | v0.3 |

## New optional dependencies

| Extra | Pulls in | For |
|---|---|---|
| `compression` | `zstandard>=0.22` | binary-encoder compression (#6) |

`compression` is also rolled into `[server]`, `[ci]`, and `[all]`.

## Suite size

| | v0.1 | v0.2 |
|---|---|---|
| MADDENING tests passing | 1358 | 1613 |
| MIME tests passing | — | 625 |

The MADDENING suite added ~250 new tests across the v0.2 work
(static_data, edge validation, encoder subscription + compression,
profiler perfetto export, AWS/GCP providers, preempt + manifest).

## Migration playbook

If you have v0.1 code:

1. **`requires_halo` → `halo_width`.**  Pointwise nodes are fine
   as-is.  Stencil subclasses should override `halo_width()`
   returning a `dict[axis, width]`.  The compat shim keeps v0.1
   subclasses working with a DeprecationWarning.
2. **Edges that previously failed at first `step()`** now warn at
   `compile()`.  Either fix the mismatch or add a `transform=`.
   See {doc}`/developer_guide/edge_validation_migration`.
3. **Large per-node arrays can move to `static_data`.**  This is
   an optional clean-up refactor, not a requirement.  Nodes with
   `requires_halo`-shaped migration paths are documented in
   DESIGN.md §2.

For cloud users:

* Spot resilience: wire `make_preempt_snapshot_hook` into your
  `CloudSession(on_preempted=)` callback.
* Bandwidth: pass `fields={...}` to `BinaryStateEncoder` /
  `NetworkRelay` and add `compression="zstd"` to subscribe
  messages.
* Profiling: hit `POST /sim/profile` and drag the JSON into
  ui.perfetto.dev.

## What's still in flight

See `V0.2_PROGRESS.md` in the repo root for the per-item open
checkboxes.  The big remainders:

* Multi-GPU smoke test on a real RunPod cluster (`#1 M9`).
* MIME-decoder pull-over into `surrogates/primitives/` (`#2`).
* `SurrogateTrainer` decoupling from the `SurrogateArchitecture`
  ABC (`#2`).
* `s3://` / `gs://` / `azure://` URL schemes in
  `download_and_load_state` (`#8`).
* The v0.2.1 flip-to-errors cut for `#4`.