What’s new in v0.2#
Added in version v0.2.
v0.2 is the cycle that turns MADDENING from a single-GPU prototype into a deployable multi-physics framework: pencil-decomposed sharding for big simulations, a clean cloud-resume path that survives spot preemption, edge validation that catches wiring errors at compile time, and the static-data channel that finally gives nodes a place to keep large arrays that don’t evolve in time.
Nine roadmap items, eight shipped (item #2 partial pending the
MIME decoder pull-over). Full per-feature status lives in
V0.2_PROGRESS.md in the repo root.
Highlights#
Halo / pencil decomposition (#1)#
3-D pencil decomposition for stencil nodes lands as M1-M8. The
contract change: every SimulationNode
now declares its halo via halo_width()
returning a dict[axis, width] instead of the old requires_halo
boolean. Pointwise nodes return {}; 1-D heat with a 2nd-order
stencil returns {0: 1}; D3Q19 LBM returns {0: 1, 1: 1, 2: 1}.
Note
requires_halo stays around as a default-implemented compatibility
shim until v0.3. Subclasses that override it instead of
halo_width emit a DeprecationWarning pointing at the new API.
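The three return values quoted above can be sketched with a small stand-in class. This is only an illustration: `SketchNode` and `padded_shape` are hypothetical names rather than part of MADDENING, and the helper assumes the declared width is applied on both sides of each axis.

```python
class SketchNode:
    """Hypothetical stand-in for SimulationNode; not the real base class."""

    def halo_width(self) -> dict:
        # Default: pointwise node, no neighbour cells needed on any axis.
        return {}

    @property
    def requires_halo(self) -> bool:
        # Compatibility view derived from the new API (sketch only).
        return bool(self.halo_width())


class Heat1D(SketchNode):
    def halo_width(self) -> dict:
        # A 2nd-order stencil reads one neighbour along axis 0.
        return {0: 1}


class LBMD3Q19(SketchNode):
    def halo_width(self) -> dict:
        # D3Q19 streams to nearest neighbours along all three axes.
        return {0: 1, 1: 1, 2: 1}


def padded_shape(local_shape, halo):
    """Local shard shape after adding ghost cells.

    Assumption: `width` ghost cells are added on both sides of an axis.
    """
    return tuple(n + 2 * halo.get(axis, 0) for axis, n in enumerate(local_shape))


print(padded_shape((64,), Heat1D().halo_width()))
print(padded_shape((16, 16, 16), LBMD3Q19().halo_width()))
```

A per-axis dict carries more information than the old boolean: a decomposition layer can size ghost regions per axis instead of only knowing that some halo exists.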
Static-data channel on SimulationNode (#3)#
Optional static_data
property lets a node carry non-state arrays (meshes, wall masks,
basis functions, lookup tables) outside the state pytree. JAX
bakes them into the JIT’d HLO as constants instead of carrying
them through every fori_loop iteration.
A drift check at every step entry-point re-hashes the
(key, shape, dtype) tuples and triggers a recompile if any node’s
static_data shape changed — typical case: replace_node swapping
in a different mesh size. See
DESIGN.md
§2 “Static-data channel” for the full contract.
HeatNode is the first in-tree consumer: _grid_x migrated from
“re-build a JAX array on every property access” to
“build once, expose via static_data”.
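One way to picture the drift check is a fingerprint over the (key, shape, dtype) tuples that ignores array values. The sketch below is a guess at the idea, not the in-tree implementation: `static_fingerprint`, `DriftChecker`, and the `FakeArray` stand-in (used to keep the example dependency-free) are all hypothetical.

```python
import hashlib
from collections import namedtuple

# Tiny stand-in exposing .shape and .dtype like a NumPy or JAX array.
FakeArray = namedtuple("FakeArray", ["shape", "dtype"])


def static_fingerprint(static_data):
    """Hash the (key, shape, dtype) tuples; array values are ignored."""
    entries = sorted(
        (key, tuple(arr.shape), str(arr.dtype)) for key, arr in static_data.items()
    )
    return hashlib.sha256(repr(entries).encode()).hexdigest()


class DriftChecker:
    """Remembers the last fingerprint and reports when a recompile is due."""

    def __init__(self):
        self._last = None

    def needs_recompile(self, static_data):
        current = static_fingerprint(static_data)
        changed = self._last is not None and current != self._last
        self._last = current
        return changed


checker = DriftChecker()
small_mesh = {"grid_x": FakeArray((64,), "float32")}
large_mesh = {"grid_x": FakeArray((128,), "float32")}
print(checker.needs_recompile(small_mesh))  # first call only records the fingerprint
print(checker.needs_recompile(large_mesh))  # shape changed
```

Hashing only metadata keeps the check cheap enough to run at every step entry-point, at the cost of not noticing a value-only change.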
Compile-time edge validation (#4)#
GraphManager.compile() now walks every edge and surfaces
shape/dtype/unit mismatches as four warning classes:
ShapeMismatchWarning, DtypeMismatchWarning, UnitMismatchWarning, and their parent
EdgeValidationWarning.
A transform on the edge suppresses the check (the transform may reshape on the fly). Aggregation means a 20-edge graph with three different problems fires three warnings, not one.
Shipped as warnings in v0.2; flipped to hard
EdgeValidationError subclasses inside an
ExceptionGroup in v0.2.1 (units stay as warnings). See
Edge validation: migration guide (v0.2 → v0.3.0) for the migration
playbook and What’s new in v0.2.1 for the patch release notes.
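A project that wants to rehearse the stricter behaviour ahead of time could escalate the warnings itself with the standard `warnings` filters. The sketch below is an assumption-laden illustration: the four classes are redefined locally as stand-ins because the real import path is not given here, and `compile_with_shape_bug`, `compile_with_unit_bug`, and `strict_compile` are hypothetical helpers.

```python
import warnings


# Local stand-ins mirroring the class names in the text (import path unknown).
class EdgeValidationWarning(UserWarning): ...
class ShapeMismatchWarning(EdgeValidationWarning): ...
class DtypeMismatchWarning(EdgeValidationWarning): ...
class UnitMismatchWarning(EdgeValidationWarning): ...


def compile_with_shape_bug():
    # Pretend compile() found a shape mismatch on one edge.
    warnings.warn("edge a->b: shape (32,) vs (64,)", ShapeMismatchWarning)
    return "compiled"


def compile_with_unit_bug():
    # Pretend compile() found only a unit mismatch.
    warnings.warn("edge b->c: K vs degC", UnitMismatchWarning)
    return "compiled"


def strict_compile(compile_fn):
    """Run compile_fn with edge warnings raised as errors, units excepted."""
    with warnings.catch_warnings():
        # Filters added later are matched first, so the unit rule wins.
        warnings.simplefilter("error", category=EdgeValidationWarning)
        warnings.simplefilter("always", category=UnitMismatchWarning)
        return compile_fn()


try:
    strict_compile(compile_with_shape_bug)
except ShapeMismatchWarning as exc:
    print("would fail after the flip:", exc)

print(strict_compile(compile_with_unit_bug))  # unit mismatch stays a warning
```

Because all three specific classes share one parent, a single filter on the parent covers them, and one extra filter carves out the unit case.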
Field subscriptions + zstd compression (#5, #6)#
BinaryStateEncoder learned to pack a subset of fields and
optionally compress the payload:
enc = BinaryStateEncoder(
    state,
    fields={"lbm": ["velocity"]},  # drop the 19 f-distributions
    compression="zstd",            # or "zstd+xor"
)
The compression mode is part of the schema, so the
/ws/state/binary subscribe message accepts it too:
{"type": "subscribe",
 "fields": {"lbm": ["velocity"]},
 "compression": "zstd"}
On a 32³ LBM-like payload, subscribing to velocity + zstd cuts
wire traffic by 99% on slowly-varying flows. ZMQ NetworkRelay got the
same fields= parameter for static server-side filtering.
A runnable demo is at
src/maddening/examples/cloud/streaming/08_subscribe_lbm_velocity.py.
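A back-of-the-envelope estimate shows how much of the saving could come from field filtering alone, before any compression. The numbers rest on assumptions not stated above: float32 values, a 3-component velocity, and a state that holds only the 19 f-distributions plus velocity. The field name `"f"` is hypothetical.

```python
import json

CELLS = 32 ** 3        # grid size from the text
BYTES_PER_VALUE = 4    # assumption: float32

# Assumed per-cell component counts (hypothetical field layout).
FIELD_COMPONENTS = {"f": 19, "velocity": 3}


def raw_bytes(fields):
    """Uncompressed payload size for the given field names."""
    return sum(CELLS * FIELD_COMPONENTS[name] * BYTES_PER_VALUE for name in fields)


full = raw_bytes(["f", "velocity"])
subset = raw_bytes(["velocity"])
saving = 1 - subset / full

# The subscribe message from the text, built programmatically.
subscribe = json.dumps(
    {"type": "subscribe", "fields": {"lbm": ["velocity"]}, "compression": "zstd"}
)

print(full, subset, f"{saving:.1%}")
print(subscribe)
```

Under these assumptions, filtering accounts for roughly 86% of the raw bytes; the rest of the quoted reduction would have to come from compression on slowly-varying data.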
Cloud provider expansion (#7)#
AWSProvider and GCPProvider join RunPodProvider and
LambdaLabsProvider (the latter promoted out of “stub” status).
All four share the CloudProvider
ABC and pass the same credential-lifecycle test suite (51 cases
covering profile merging, env vars, chmod 0600, deletion
semantics).
Examples for each: 02_runpod_launch.py, 03_lambda_launch.py,
04_aws_launch.py, 05_gcp_launch.py.
Note
End-to-end launches against real cloud accounts are out of CI scope — they need real credentials and trigger spend. The credential layer is fully covered offline.
Preempt → snapshot, resume from URL (#8)#
CloudSession(on_preempted=hook) now drives a snapshot of the
GraphManager state when the spot VM is reclaimed. The new VM
reads RESUME_FROM_URL at startup and pulls the state back.
Every snapshot ships with a sidecar manifest containing
schema_version, SHA-256, size, and a caller-supplied extra dict.
Tampering or version drift raises
CheckpointIntegrityError.
See Surviving spot preemption for the full contract.
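The sidecar manifest idea can be sketched with the standard library. This is a minimal illustration, not the shipped format: `build_manifest`, `verify`, the JSON key names, and the `SCHEMA_VERSION` value are assumptions, and `CheckpointIntegrityError` is redefined locally as a stand-in.

```python
import hashlib

SCHEMA_VERSION = 1  # hypothetical value


class CheckpointIntegrityError(Exception):
    """Local stand-in for the error named in the text."""


def build_manifest(payload: bytes, extra=None) -> dict:
    """Describe a snapshot payload: version, digest, size, caller extras."""
    return {
        "schema_version": SCHEMA_VERSION,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "size": len(payload),
        "extra": dict(extra or {}),
    }


def verify(payload: bytes, manifest: dict) -> None:
    """Raise if the payload does not match its manifest."""
    if manifest["schema_version"] != SCHEMA_VERSION:
        raise CheckpointIntegrityError("schema version drift")
    if manifest["size"] != len(payload):
        raise CheckpointIntegrityError("size mismatch")
    if manifest["sha256"] != hashlib.sha256(payload).hexdigest():
        raise CheckpointIntegrityError("SHA-256 mismatch")


snapshot = b"serialized graph state"
manifest = build_manifest(snapshot, extra={"step": 1200})
verify(snapshot, manifest)  # passes silently

try:
    verify(snapshot + b"!", manifest)
except CheckpointIntegrityError as exc:
    print("rejected:", exc)
```

Checking the size before hashing gives a cheap early exit for truncated downloads, which is a plausible failure mode when a VM is reclaimed mid-upload.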
Profiler + Perfetto export (#9)#
POST /sim/profile?n_steps=N returns a Perfetto-loadable Trace
Event JSON. Drag-and-drop it into https://ui.perfetto.dev for
an interactive flame-graph view of per-node timings + coupling
overhead.
POST /sim/profile/jax/start and /stop wrap
jax.profiler.start_trace() for an XLA-level capture, and
/cloud/teardown snapshots the last trace dir as a base64’d
tar.gz in its response so the trace survives VM destruction.
A runnable demo: src/maddening/examples/advanced/profile_lbm_step.py.
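For readers unfamiliar with the Trace Event JSON that Perfetto loads, the sketch below builds a minimal file from per-node timings. It illustrates the file format only; the node names and timings are invented, and `to_trace_events` is a hypothetical helper, not what the endpoint runs internally.

```python
import json


def to_trace_events(timings_us, pid=1, tid=1):
    """Convert (name, start_us, duration_us) tuples to Trace Event JSON.

    "ph": "X" marks a complete event; "ts" and "dur" are in microseconds.
    """
    events = [
        {"name": name, "ph": "X", "ts": start, "dur": dur, "pid": pid, "tid": tid}
        for name, start, dur in timings_us
    ]
    return {"traceEvents": events, "displayTimeUnit": "ms"}


# Invented timings for one step of a two-node graph plus coupling.
timings = [
    ("lbm.step", 0, 850),
    ("heat.step", 850, 120),
    ("coupling", 970, 30),
]

trace = to_trace_events(timings)
with open("trace.json", "w") as fh:
    json.dump(trace, fh)

print(len(trace["traceEvents"]), "events written")
```

The resulting `trace.json` has the same overall shape as a file that can be dragged into the Perfetto UI, with one slice per tuple.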
Surrogates subpackage scaffolding (#2, partial)#
New subpackages — surrogates/primitives/, surrogates/weights/,
surrogates/training/, surrogates/replace/ — re-export their
contents from the v0.1 leaf-module locations. The decoder
zoo extraction from MIME and the SurrogateArchitecture ABC
decoupling are queued for v0.2.x.
Breaking changes#
None in v0.2.0. All changes are additive or live behind the warnings introduced by #4. v0.2.1 subsequently flipped those warnings to hard errors (see What’s new in v0.2.1 and the semver carve-out in Edge validation: migration guide (v0.2 → v0.3.0)).
Deprecations#
| Symbol | Replacement | Removed in |
|---|---|---|
|  |  | v0.3 |
|  |  | v0.3 |
New optional dependencies#
| Extra | Pulls in | For |
|---|---|---|
|  |  | binary-encoder compression (#6) |
compression is also rolled into [server], [ci], and [all].
Suite size#
|  | v0.1 | v0.2 |
|---|---|---|
| MADDENING tests passing | 1358 | 1613 |
| MIME tests passing | — | 625 |
The MADDENING suite added ~250 new tests across the v0.2 work (static_data, edge validation, encoder subscription + compression, profiler perfetto export, AWS/GCP providers, preempt + manifest).
Migration playbook#
If you have v0.1 code:
- requires_halo → halo_width. Pointwise nodes are fine as-is. Stencil subclasses should override halo_width() returning a dict[axis, width]. The compat shim keeps v0.1 subclasses working with a DeprecationWarning.
- Edges that previously failed at first step() now warn at compile(). Either fix the mismatch or add a transform=. See Edge validation: migration guide (v0.2 → v0.3.0).
- Large per-node arrays moved to static_data: a clean refactor, not required. Nodes with requires_halo-shaped migration paths are documented in DESIGN.md §2.
For cloud users:
- Spot resilience: wire make_preempt_snapshot_hook into your CloudSession(on_preempted=) callback.
- Bandwidth: pass fields={...} to BinaryStateEncoder / NetworkRelay and add compression="zstd" to subscribe messages.
- Profiling: hit POST /sim/profile and drag the JSON into ui.perfetto.dev.
What’s still in flight#
See V0.2_PROGRESS.md in the repo root for the per-item open
checkboxes. The big remainders:
- Multi-GPU smoke test on a real RunPod cluster (#1 M9).
- MIME-decoder pull-over into surrogates/primitives/ (#2).
- SurrogateTrainer decoupling from the SurrogateArchitecture ABC (#2).
- s3:// / gs:// / azure:// URL schemes in download_and_load_state (#8).
- The v0.2.1 flip-to-errors cut for #4.