--- orphan: false --- # What's new in v0.2 ```{versionadded} v0.2 ``` v0.2 is the cycle that turns MADDENING from a single-GPU prototype into a deployable multi-physics framework: pencil-decomposed sharding for big simulations, a clean cloud-resume path that survives spot preemption, edge validation that catches wiring errors at compile time, and the static-data channel that finally gives nodes a place to keep large arrays that don't evolve in time. Nine roadmap items, eight shipped (item #2 partial pending the MIME decoder pull-over). Full per-feature status lives in `V0.2_PROGRESS.md` in the repo root. ## Highlights ### Halo / pencil decomposition (#1) 3-D pencil decomposition for stencil nodes lands as M1-M8. The contract change: every {class}`~maddening.core.node.SimulationNode` now declares its halo via {meth}`~maddening.core.node.SimulationNode.halo_width` returning a `dict[axis, width]` instead of the old `requires_halo` boolean. Pointwise nodes return `{}`; 1-D heat with a 2nd-order stencil returns `{0: 1}`; D3Q19 LBM returns `{0: 1, 1: 1, 2: 1}`. ```{note} `requires_halo` stays around as a default-implemented compatibility shim until v0.3. Subclasses that override it instead of `halo_width` emit a DeprecationWarning pointing at the new API. ``` ### Static-data channel on `SimulationNode` (#3) Optional {attr}`~maddening.core.node.SimulationNode.static_data` property lets a node carry non-state arrays (meshes, wall masks, basis functions, lookup tables) outside the state pytree. JAX bakes them into the JIT'd HLO as constants instead of carrying them through every `fori_loop` iteration. A drift check at every step entry-point re-hashes the `(key, shape, dtype)` tuples and triggers a recompile if any node's static_data shape changed — typical case: `replace_node` swapping in a different mesh size. See [DESIGN.md](https://github.com/Microrobotics-Simulation-Framework/MADDENING/blob/main/DESIGN.md) §2 "Static-data channel" for the full contract. HeatNode is the first in-tree consumer: `_grid_x` migrated from "re-build a JAX array on every property access" to "build once, expose via static_data". ### Compile-time edge validation (#4) `GraphManager.compile()` now walks every edge and surfaces shape/dtype/unit mismatches as four warning classes: * {class}`~maddening.warnings.ShapeMismatchWarning` * {class}`~maddening.warnings.DtypeMismatchWarning` * {class}`~maddening.warnings.UnitMismatchWarning` * parent {class}`~maddening.warnings.EdgeValidationWarning` A transform on the edge suppresses the check (the transform may reshape on the fly). Aggregation means a 20-edge graph with three different problems fires three warnings, not one. Shipped as warnings in v0.2; flipped to hard {class}`~maddening.warnings.EdgeValidationError` subclasses inside an :class:`ExceptionGroup` in v0.2.1 (units stay as warnings). See {doc}`/developer_guide/edge_validation_migration` for the migration playbook and {doc}`v0.2.1` for the patch release notes. ### Field subscriptions + zstd compression (#5, #6) `BinaryStateEncoder` learned to pack a subset of fields and optionally compress the payload: ```python enc = BinaryStateEncoder( state, fields={"lbm": ["velocity"]}, # drop the 19 f-distributions compression="zstd", # or "zstd+xor" ) ``` The compression mode is part of the schema, so the `/ws/state/binary` subscribe message accepts it too: ```json {"type": "subscribe", "fields": {"lbm": ["velocity"]}, "compression": "zstd"} ``` On a 32³ LBM-like payload, subscribing to velocity + zstd cuts the wire by 99% on slowly-varying flows. ZMQ `NetworkRelay` got the same `fields=` parameter for static server-side filtering. A runnable demo is at `src/maddening/examples/cloud/streaming/08_subscribe_lbm_velocity.py`. ### Cloud provider expansion (#7) `AWSProvider` and `GCPProvider` join `RunPodProvider` and `LambdaLabsProvider` (the latter promoted out of "stub" status). All four share the {class}`~maddening.cloud.providers.CloudProvider` ABC and pass the same credential-lifecycle test suite (51 cases covering profile merging, env vars, chmod 0600, deletion semantics). Examples for each: `02_runpod_launch.py`, `03_lambda_launch.py`, `04_aws_launch.py`, `05_gcp_launch.py`. ```{note} End-to-end launches against real cloud accounts are out of CI scope — they need real credentials and trigger spend. The credential layer is fully covered offline. ``` ### Preempt → snapshot, resume from URL (#8) `CloudSession(on_preempted=hook)` now drives a snapshot of the GraphManager state when the spot VM is reclaimed. The new VM reads `RESUME_FROM_URL` at startup and pulls the state back. Every snapshot ships with a sidecar manifest containing `schema_version`, SHA-256, size, and a caller-supplied extra dict. Tampering or version drift raises {class}`~maddening.core.simulation.checkpoint.CheckpointIntegrityError`. See {doc}`/user_guide/cloud_resume` for the full contract. ### Profiler + Perfetto export (#9) `POST /sim/profile?n_steps=N` returns a Perfetto-loadable Trace Event JSON. Drag-and-drop it into for an interactive flame-graph view of per-node timings + coupling overhead. `POST /sim/profile/jax/start` and `/stop` wrap {func}`jax.profiler.start_trace` for an XLA-level capture, and `/cloud/teardown` snapshots the last trace dir as a base64'd tar.gz in its response so the trace survives VM destruction. A runnable demo: `src/maddening/examples/advanced/profile_lbm_step.py`. ### Surrogates subpackage scaffolding (#2, partial) New subpackages — `surrogates/primitives/`, `surrogates/weights/`, `surrogates/training/`, `surrogates/replace/` — re-export their contents from the v0.1 leaf-module locations. The decoder zoo extraction from MIME and the `SurrogateArchitecture` ABC decoupling are queued for v0.2.x. ## Breaking changes None *in v0.2.0*. All changes are additive or live behind the warnings introduced by #4. v0.2.1 subsequently flipped those warnings to hard errors (see {doc}`v0.2.1` and the semver carve-out in {doc}`/developer_guide/edge_validation_migration`). ## Deprecations | Symbol | Replacement | Removed in | |---|---|---| | `SimulationNode.requires_halo` | {meth}`~maddening.core.node.SimulationNode.halo_width` | v0.3 | | `ShardedNode` | {class}`~maddening.cloud.multigpu.sharded_node.ShardedPointwiseNode` (deprecated alias) | v0.3 | ## New optional dependencies | Extra | Pulls in | For | |---|---|---| | `compression` | `zstandard>=0.22` | binary-encoder compression (#6) | `compression` is also rolled into `[server]`, `[ci]`, and `[all]`. ## Suite size | | v0.1 | v0.2 | |---|---|---| | MADDENING tests passing | 1358 | 1613 | | MIME tests passing | — | 625 | The MADDENING suite added ~250 new tests across the v0.2 work (static_data, edge validation, encoder subscription + compression, profiler perfetto export, AWS/GCP providers, preempt + manifest). ## Migration playbook If you have v0.1 code: 1. **`requires_halo` → `halo_width`.** Pointwise nodes are fine as-is. Stencil subclasses should override `halo_width()` returning a `dict[axis, width]`. The compat shim keeps v0.1 subclasses working with a DeprecationWarning. 2. **Edges that previously failed at first `step()`** now warn at `compile()`. Either fix the mismatch or add a `transform=`. See {doc}`/developer_guide/edge_validation_migration`. 3. **Large per-node arrays moved to `static_data`** — clean refactor, not required. Nodes with `requires_halo`-shaped migration paths are documented in DESIGN.md §2. For cloud users: * Spot resilience: wire `make_preempt_snapshot_hook` into your `CloudSession(on_preempted=)` callback. * Bandwidth: pass `fields={...}` to `BinaryStateEncoder` / `NetworkRelay` and add `compression="zstd"` to subscribe messages. * Profiling: hit `POST /sim/profile` and drag the JSON into ui.perfetto.dev. ## What's still in flight See `V0.2_PROGRESS.md` in the repo root for the per-item open checkboxes. The big remainders: * Multi-GPU smoke test on a real RunPod cluster (`#1 M9`). * MIME-decoder pull-over into `surrogates/primitives/` (`#2`). * `SurrogateTrainer` decoupling from the `SurrogateArchitecture` ABC (`#2`). * `s3://` / `gs://` / `azure://` URL schemes in `download_and_load_state` (`#8`). * The v0.2.1 flip-to-errors cut for `#4`.