IBLBMFluidNode — Onboarding Document#
Entry point for implementing the IBLBMFluidNode refactor. Start here.
1. Purpose and Context#
IBLBMFluidNode wraps the existing LBM fluid solver code (D3Q19 lattice Boltzmann with Bouzidi interpolated bounce-back) as a proper MADDENING SimulationNode, replacing the manual Python-loop wiring in scripts/run_confinement_sweep.py with a node-graph approach. Once this node exists, it can be wired to RigidBodyNode and PermanentMagnetResponseNode via GraphManager edges, enabling the FSI coupling (T3.C) and emergent step-out demo (T3.D) without any custom simulation loop code. The step-out effect — where the UMR loses synchrony with the rotating magnetic field — will emerge naturally from the node graph.
This work is tracked as T3.0 in UMR_REPLICATION_PLAN.md. The full interface specification is in docs/architecture/iblbm_fluid_node_spec.md. No MADDENING extensions are required — the existing SimulationNode interface supports all IBLBMFluidNode requirements, as assessed in a detailed architecture review (Concern 63, 2026-03-24). The LBM code, boundary condition infrastructure, and sparse q-value computation all already exist as tested utility functions — this refactor wraps them, it does not rewrite them.
2. Where to Find Everything#
MADDENING framework (sibling repo, editable install)#
File |
Description |
Relevance |
|---|---|---|
|
|
The interface contract. |
|
|
Key methods: |
|
|
Gauss-Seidel iteration within a timestep. Used when LBM and RigidBody need bidirectional exchange within the same step (FSI mode). For fixed-omega confinement sweep, not needed — one-step lag via back-edges is sufficient. |
|
|
Defines |
MIME node implementations (reference patterns)#
File |
Description |
Relevance |
|---|---|---|
|
|
Adds |
|
|
Closest analogue. Same role ( |
|
|
The node IBLBMFluidNode exchanges data with. Outputs |
|
|
Produces |
|
|
Produces |
LBM utility modules (code being wrapped)#
File |
Description |
Relevance |
|---|---|---|
|
D3Q19 lattice constants, |
Core LBM operations. |
|
|
All boundary condition functions. |
|
|
Self-contained rotating body step function. IBLBMFluidNode’s |
|
|
2D IB-LBM solver (immersed boundary method). Different approach from the 3D bounce-back method used for the UMR. Reference for state management patterns but not directly used by IBLBMFluidNode. |
|
|
Convergence monitoring. Not needed inside the node — convergence is monitored externally by the sweep script or GraphManager callback. |
|
|
UMR geometry generation. |
Scripts and existing wiring#
File |
Description |
Relevance |
|---|---|---|
|
Manual LBM sweep loop (T2.6/T2.6b) |
The code IBLBMFluidNode replaces. The per-step loop inside |
Planning and specification documents#
File |
Description |
Relevance |
|---|---|---|
|
Full interface specification |
Constructor parameters, state dict, boundary inputs, update() pseudocode, boundary fluxes, coupling architecture, effort estimates. Read this after the SimulationNode interface. |
|
T3.0 (this work), T3.C (FSI coupling), T3.D (step-out demo) |
Context for why this node exists and what it enables. |
|
T3.C and T3.D rendering requirements |
The velocity cross-section visualization (§T3.A) reads from IBLBMFluidNode’s velocity field output. |
3. Implementation Checklist#
Complete in order. Each item is self-contained once its dependencies are met.
# |
Task |
File(s) |
Description |
Depends on |
Effort |
|---|---|---|---|---|---|
1 |
Create |
|
Class with |
None |
30 min |
2 |
Implement |
|
Port the per-step loop inside |
#1 |
1 hour |
3 |
Implement |
|
Return |
#2 |
15 min |
4 |
Register in |
|
Add |
#3 |
5 min |
5 |
Unit test: |
|
Call |
#3 |
1 hour |
6 |
Integration test: |
|
Wire |
#5 |
1 hour |
7 |
Regression test: confinement sweep via node graph |
|
Run a short confinement simulation (64³, ratio 0.30, 200 steps) via |
#6 |
1 hour |
8 |
CSFFlowNode interchangeability test |
|
Build two |
#6 |
30 min |
9 |
Documentation: update |
|
Add |
#4 |
5 min |
Total: ~7 hours.
4. Key Design Decisions Already Made#
These were carefully reasoned in previous sessions. Do not revisit without strong justification.
No MADDENING extensions required#
The SimulationNode interface was assessed against six requirements (large state, variable geometry, quaternion inputs, circular dependencies, spatial stencil, JIT compilation) and found sufficient for all of them. XLA manages the 1.3 GB f-array via buffer reuse — no Python-level copies. The requires_halo property was designed for LBM. CouplingGroup handles bidirectional coupling. See the full assessment in docs/architecture/iblbm_fluid_node_spec.md §2.
One-step lag for LBM ↔ RigidBody coupling#
Standard IB-LBM practice. The drag torque from step t drives the angular velocity at step t+1. This is implemented via a back-edge in the GraphManager (detected automatically in validate(), line ~1280 of graph_manager.py). For the confinement sweep (fixed omega), there is no circular dependency at all — IBLBMFluidNode runs standalone.
Per-step sparse q-value recomputation#
The hybrid rotate+recompute approach was rejected (Concern 62 analysis, 2026-03-24). At ω = 0.001 rad/step and fin tip radius 29 lu, the surface moves 0.029 lu per step — enough to degrade Bouzidi from O(dx²) to O(dx) after a single step. Q-values must be recomputed every step. compute_q_values_sdf_sparse makes this feasible by evaluating only ~112K boundary nodes instead of 7.1M full domain (63× reduction).
run() path, not run_scan()#
GraphManager.run() uses a Python loop calling _compiled_step once per step. This is the correct execution path because the geometry generation (create_umr_mask) and sparse q-value computation (compute_q_values_sdf_sparse) use Python for-loops over 19 directions that JAX unrolls during tracing. run_scan() would require the full step to be scannable, which it is in principle (the geometry ops are JAX-traceable) but compilation time would be prohibitive. The Python-loop path is already performant: measured 0.04s/step LBM + ~0.1s/step sparse q-values at 192³ on H100.
Solid mask in state dict#
The UMR solid mask at 192³ is a boolean array of ~7 MB. Including it in the state dict is trivially practical — the f-array is 1.3 GB. The mask is regenerated every step in update() from the current body_angle. It is included in state for observability (rendering, debugging) and because it is part of the simulation state.
requires_halo = True#
LBM streaming accesses spatial neighbors (lattice velocities shift populations between adjacent nodes). This property must return True for IBLBMFluidNode. Currently, MADDENING’s ShardedNode refuses to shard nodes with requires_halo = True — this is correct and means IBLBMFluidNode runs on a single device. Multi-GPU support would require halo exchange, which is out of scope.
5. Known Risks and Things to Watch Out For#
JIT compilation time#
The first call to _compiled_step with IBLBMFluidNode will trigger JAX tracing. The 19-direction loops in compute_missing_mask, compute_q_values_sdf_sparse, and apply_bouzidi_bounce_back are Python for-loops that JAX unrolls into a flat XLA computation graph. At 192³, expect 30–60 seconds for the first step (compilation), then ~0.14s per step thereafter. This is normal and matches the current run_confinement_sweep.py behavior.
MAX_LINKS constant in compute_q_values_sdf_sparse#
The max_boundary_links_per_dir parameter controls the fixed pad size for jnp.nonzero. It defaults to 0, which triggers auto-computation as 1.5 * max_count + 1 where max_count is the maximum boundary link count across the 19 directions for the current mask.
Measured baseline for d2.8 UMR geometry: At 64³, total boundary links across all directions = 12,436 (~654 per direction max). Surface area scales as (N/64)² — extrapolated to 192³: ~5,900 per direction max, ~112K total. The auto-computed max_boundary_links_per_dir at 192³ would be approximately 5,900 * 1.5 + 1 ≈ 8,851. These values correspond to the d2.8 UMR (body_radius=0.87mm, fin_outer_radius=1.42mm, 2 sets of 3 fins) at confinement ratio 0.30.
If the UMR geometry changes (different fin count, different body radius), the boundary link count changes. The 1.5× margin handles angle-to-angle variation for a fixed geometry, but a new geometry needs a fresh measurement. To measure: call jnp.sum(missing_mask, axis=(1,2,3)) at several angles and take the maximum.
If the pad size is too small, jnp.nonzero silently truncates boundary links. The q-values for truncated links default to 0.5 (simple BB behavior). This degrades accuracy without any error message. Always verify boundary link counts after geometry changes.
Padding mask correctness#
In compute_q_values_sdf_sparse, after bisection, the is_real mask (based on entry_idx < actual_count) must be applied before scattering results back into the full q-values array. Padding entries undergo bisection on garbage coordinates and produce garbage q-values. These are masked to 0.5 before scatter. If this mask is removed or bypassed, the garbage values corrupt the Bouzidi computation silently — the simulation runs but produces wrong drag values. Do not remove the is_real masking logic.
Circular dependency timing in FSI mode#
For the confinement sweep (T2.6b), angular velocity is constant — no circular dependency. For FSI (T3.C), the LBM needs angular velocity from RigidBodyNode, and RigidBodyNode needs drag torque from LBM. The one-step lag is standard and acceptable for slowly varying flows (Stokes regime). However, near step-out (where the UMR transitions between synchronous and asynchronous rotation), the dynamics change rapidly. If the one-step lag causes visible artifacts at high field frequencies, switch to a CouplingGroup with Gauss-Seidel sub-stepping (max_iterations=3). Test by comparing step-out frequency with and without the coupling group — they should agree within 1%.
Backward compatibility#
run_confinement_sweep.py must continue to work unchanged after the refactor. IBLBMFluidNode is a new node class — it does not modify any existing code. The sweep script uses the utility functions directly (not through a node), so it is unaffected. Do not refactor the utility functions in a way that changes their signatures or behavior.
6. Verification Plan#
Unit test (checklist item #5)#
def test_iblbm_matches_rotating_body_step():
"""IBLBMFluidNode.update() produces identical results to rotating_body_step()."""
# Setup: 32^3, tau=0.8, ratio=0.30, omega=0.003, use_bouzidi=False
node = IBLBMFluidNode(name="lbm", timestep=1.0, nx=32, ny=32, nz=32, ...)
state = node.initial_state()
bi = {"body_angular_velocity": jnp.array([0, 0, 0.003])}
new_state = node.update(state, bi, 1.0)
# Compare against rotating_body_step
f_ref, force_ref, torque_ref, _, _ = rotating_body_step(
state["f"], tau=0.8, angular_velocity=0.003, dt_lbm=1.0,
current_angle=0.0, geometry_params=..., center=..., use_bouzidi=False,
)
assert jnp.allclose(new_state["f"], f_ref, atol=1e-6)
assert jnp.allclose(new_state["drag_torque"], torque_ref, atol=1e-4)
Integration test (checklist item #6)#
Wire all 4 nodes via GraphManager, add edges, run 100 steps. Assert:
drag_torqueis non-zero after step 10 (fluid has developed)orientationhas changed from initial (body is rotating)No NaN or Inf in any state
Regression test (checklist item #7)#
Run 200-step simulation at 64³, ratio 0.30, with Bouzidi. Compare mean drag torque against the T2.6b sanity test value (21.3916 lu at 64³ with sparse Bouzidi, 8 bisection iterations). Must match within 0.1%.
Interchangeability test (checklist item #8)#
def test_csf_and_iblbm_are_interchangeable():
"""Both CSFFlowNode and IBLBMFluidNode work as environment nodes."""
for NodeClass in [CSFFlowNode, IBLBMFluidNode]:
gm = GraphManager()
gm.add_node(NodeClass(name="fluid", ...))
gm.add_node(RigidBodyNode(name="body", ..., use_analytical_drag=False))
gm.add_edge("fluid", "drag_force", "body", "drag_force")
gm.add_edge("fluid", "drag_torque", "body", "drag_torque")
gm.compile() # Must not raise
gm.step() # Must not crash
7. How This Enables Tier 3#
Once IBLBMFluidNode is a proper MADDENING node, the FSI demo (T3.C) requires zero custom simulation code. The node graph:
ExternalMagneticFieldNode → PermanentMagnetResponseNode → RigidBodyNode ↔ IBLBMFluidNode
produces the step-out effect naturally: as field frequency increases, the magnetic torque and viscous drag torque compete. Below step-out, the UMR locks to the field. Above step-out, the drag exceeds the magnetic torque and the UMR falls out of synchrony — angular velocity oscillates, flow field becomes unsteady. This transition is visible in the USD scene (T3.D) as a dramatic change from smooth rotation to chaotic tumbling.
The emergent step-out demo (T3.D in UMR_REPLICATION_PLAN.md) runs this graph at 64³ in real-time (~2 fps with HydraStorm rendering), streamed via Selkies. The user adjusts field frequency and watches the step-out transition happen live. The parameter panel (T3.B) overlays the precomputed predictions from T2.7. See UMR_REPLICATION_PLAN.md T3.C and T3.D, and RENDERING_PLAN.md §[UMR ADDITION] for rendering details.