Skip to content

Fix/issue 1 junction surface area#4

Open
wiesnerfriedman wants to merge 464 commits into
SWMM-Project:mainfrom
wiesnerfriedman:fix/issue-1-junction-surface-area
Open

Fix/issue 1 junction surface area#4
wiesnerfriedman wants to merge 464 commits into
SWMM-Project:mainfrom
wiesnerfriedman:fix/issue-1-junction-surface-area

Conversation

@wiesnerfriedman

Copy link
Copy Markdown

Node::getSurfArea returns MIN_SURFAREA as a constant for non-storage nodes, which gets added to conduit half-areas instead of acting as a floor. This inflates junction surface area, damping depth response. Fix moves the floor to DWSolver::setNodeDepth as max(surfArea, MIN_SURFAREA).

@wiesnerfriedman wiesnerfriedman force-pushed the fix/issue-1-junction-surface-area branch from dac366e to 8f75098 Compare April 21, 2026 02:22
wiesnerfriedman and others added 29 commits May 8, 2026 15:50
LID.cpp batchBarrelFlux never applied stor_ksat — wb_infil was
unconditionally zero for all rain-barrel storage units.  Added
Euler exfiltration step (exact for constant-rate ODE).

Green-Ampt trajectory test was sub-stepping at dt=1 s (up to 8793
Newton solves), accumulating ~0.035 ft of round-off against a
1e-3 ft criterion.  Replaced with one implicit solve per benchmark
interval — the method is exact for any dt.

Horton RMS tolerance tightened beyond the 8-digit CSV precision;
relaxed from 1e-10 to 1e-9 ft.

SIR epidemic integration extended from t=60 to t=120 — with R0=3
and γ=0.1 the infected fraction does not drop below 1% until ~t=80.

CMake: made OpenMP linkage conditional to fix macOS no-libomp
builds; guarded regression CTest registration against missing data dir.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
These changes were specific to a no-libomp macOS dev environment and
should not be part of the engine build configuration.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… CSV

Writes the genuine cross-code reference trajectory for the
odesolve-sir-epidemic benchmark using Flash-X's independent Fortran
CashKarp45 implementation (eFrac=1e-12), replacing the t=0 placeholder.
SIRTrajectoryMatchesBenchmark now runs (was previously SKIPPED).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: cbuahin <caleb.buahin@gmail.com>
The generator (tools/generate_sir_reference/) requires a local Flash-X
checkout at a hardcoded path and cannot be run by other contributors.
The reference CSV it produced is already committed; generation methodology
is fully documented in provenance.yaml and definition.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
t_end_s 60→120, planned_t_eval_s extended to 120, dataset status
placeholder→populated, generation script replaced with a prose note
describing the Flash-X Fortran driver that produced the CSV.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
t_end 60→120, qualitative dynamics updated to match actual trajectory,
reference dataset section replaced with accurate description of the
committed CSV and where to find rebuild instructions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- test_ode_solver.cpp: update SIR skip comment to reflect populated
  reference.csv and point to provenance.yaml instead of missing .py script
- test_infiltration.cpp, test_lid.cpp: move benchmark includes
  (fstream/sstream/string/vector) to top include block; remove duplicates
  that were left mid-file
- LID.cpp batchBarrelFlux: apply stor_clog reduction via getStorageExfil
  so rain-barrel exfiltration is consistent with other storage-based LIDs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Part 1: Thread-safety verification – concurrent engine tests
Signed-off-by: cbuahin <caleb.buahin@gmail.com>
Signed-off-by: cbuahin <caleb.buahin@gmail.com>
Signed-off-by: cbuahin <caleb.buahin@gmail.com>
Signed-off-by: cbuahin <caleb.buahin@gmail.com>
Signed-off-by: cbuahin <caleb.buahin@gmail.com>
Signed-off-by: cbuahin <caleb.buahin@gmail.com>
Signed-off-by: cbuahin <caleb.buahin@gmail.com>
Signed-off-by: cbuahin <caleb.buahin@gmail.com>
Signed-off-by: cbuahin <caleb.buahin@gmail.com>
Locates the engine DLL and vcpkg runtime dependencies and invokes
delvewheel with the appropriate --add-path entries.
Signed-off-by: cbuahin <caleb.buahin@gmail.com>
Introduces an optional [RDII_DECAY] input section that replaces the legacy
linear IA recovery with physics-based exponential dynamics. Depletion during
storms follows exp(-k_dep * rainfall); recovery during dry periods uses an
additive rate k_rec(T) = k_0 + k_T * exp(theta_rec * (T - T_ref)) with
frozen-ground suppression below T_freeze. Seasonal RDII variation now
emerges from temperature dynamics on a single RTK set rather than from
monthly R calibration.

Dispatch is per-(UH-group, response); a group with no [RDII_DECAY] row
continues to use the existing linear iaRecov path, so adoption is fully
incremental.

Plumbing changes:
- ClimateState moves from SWMMEngine into SimulationContext so downstream
  solvers can read temperature through ctx.climate_state without reaching
  into the engine.
- New RDIIDecayData SoA parallel to UnitHydData; INP writer, GeoPackage
  schema/writer/reader, and report-plugin summary line all round-trip the
  new section. Legacy [HYDROGRAPHS] format is byte-stable.

Tests: 13 new unit tests covering parsing, init/resolution, frozen-ground
behaviour, and the linear-path-unchanged guarantee. Full suite green
(40/40, 0 failures).
Signed-off-by: cbuahin <caleb.buahin@gmail.com>
Nine new benchmark datasets under tests/benchmarks/manufactured/ covering
kinematic-wave transient, force-main friction (HW + DW), GVF M1 backwater,
quality CSTR decay, groundwater linearized recession, Green-Ampt exfiltration
geometry, LID cylindrical clogging, modified Horton saturation recovery, and
circular cross-section ellipse reference.

Two new test suites (test_exfiltration.cpp, test_massbalance.cpp) and
extensions to six existing test files; all 256 tests pass. Also includes
CMake wiring for BENCHMARK_DATA_DIR in the new executables.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The 0.1% value was left over from an earlier draft. The actual test and
CSV both use 5% because the KW no-flow branch on step 1 introduces a
~Q*dt systematic offset (~10% of V_in) that makes a tighter bound
incorrect by construction. Expand the basis comment to explain why.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The mass-balance criterion comment said "< 0.1%" but the actual tolerance
is 5%. Expand the comment to explain why: the KW no-flow branch on step 1
introduces a ~Q*dt systematic offset that makes a tighter bound incorrect
by construction.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The generated_by field said "scipy.integrate.solve_ivp (RK45)" but the
problem.integration_scheme and computation block both say "fixed-step RK4
with dx = -200 ft". Replace with the consistent description.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cbuahin and others added 30 commits June 22, 2026 21:48
Three operand-order/constant bugs made refactored runoff diverge from
legacy 5.3 at the ULP level, which the long surcharged user-model runs
amplified to macroscopic flow differences. All three fire on real model
paths; fixing them makes every user-model subcatchment RUNOFF byte-
identical to legacy (user1 58/58, user2 17/17, user3 168/168, user5
145/145) with zero regression.

- Runoff.hpp: MEXP Manning exponent 5.0/3.0 -> 1.6666667 (legacy
  subcatch.c:55 truncated literal). 5.0/3.0 = 1.66666666666666674 differs
  in the exponent by ~3.3e-8, a ~2.3e-7 relative error in every
  pow(xDepth, MEXP) (findSubareaRunoff + the RK45 getDdDt stages).
- Gage.cpp: VOLUME/CUMULATIVE rain conversion r/(interval/3600) ->
  r/interval*3600 (legacy gage.c:692). The former forms the non-
  representable 1/12 constant first and rounds differently.
- Runoff.cpp: Manning alpha PHI*w*sqrt/(N*area) -> PHI*w/area*sqrt/N
  (legacy subcatch.c:398 operand order) — the reassociation was 1 ULP.

Also report the .out subcatchment RUNOFF column time-interpolated between
the old/new WET_STEP values at the report-boundary weight (legacy
subcatch_getResults / output.c:323) instead of the raw constant WET_STEP
value, and use the legacy-convention reported node volume in the
system-storage continuity stat (SWMMEngine.cpp). Output-only; routing
inflow unchanged.
…n ponding)

Routing dynamic-wave momentum terms diverged from legacy 5.3 at the ULP
level via reciprocal/reassociation/library-call substitutions, which the
long surcharged user-model runs amplify. Matching legacy operand order
exactly (DynamicWave.cpp processManningLink) pushed every previously
near-floor model to bit-identical (FULL PARITY 13 -> 14, 0 near-floor):

- dq2/dq4/dq5/dqdh: divide by length directly instead of multiplying a
  precomputed 1/length reciprocal (dwflow.c:214/222/229/240; x/L != x*(1/L)).
- dqdh: group ((1/denom)*GRAVITY)*dt and /length, not (1/denom)*(dt*GRAVITY)
  *inv_len (dwflow.c:240) — dqdh feeds the surcharge node-depth Jacobian.
- qNorm (normal-flow limit): beta*a1*std::pow(r1,2./3.) with raw upstream
  hyd radius, not cbrt(r1*r1) with a FUDGE-clamped r1 (dwflow.c:675).
- Manning friction: use the raw interpolated rWtd in pow(rWtd,1.33333),
  dropping the non-legacy max(rWtd,FUDGE) clamp (dwflow.c:198/211).

NOTE: this commit also carries the in-progress 2D-coupled-junction ponding
changes that were uncommitted in the same file (coupled_node pond exception
in setNodeDepth/updateNodeFlows) — bundled here because an IDE buffer save
had reverted the momentum edits and they were re-applied on top.
…r-shape gate

Two more legacy-alignment fixes for the metric (SI) user models:

1. Input unit conversion (PostParseResolver): convert metric lengths to
   internal feet by DIVIDING by the forward factor (m / 0.3048), matching
   legacy `internal = display / UCF`, instead of multiplying by the
   precomputed reciprocal `m * (1/0.3048)` (the Ucf_inv perf shortcut).
   The two differ by up to 1 ULP (223.48/0.3048 = 733.2020997375328 vs
   *recip 733.2020997375326). That 1-ULP invert error breaks the exact
   flatness of an initial water surface (h1==h2), so the momentum emits a
   ~2e-15 flow that the under-relaxation sign-flip clamp (q = 0.001*SGN,
   dynwave.c:438) amplifies to a spurious 0.001 cfs, seeding divergence.
   This is a one-time parse-path change (no hot-loop cost; SoA/SIMD intact).
   Result: user3 first divergence moves p0 -> p15 (Δ 4e-6 -> 3e-11);
   user1 flowΔ 0.0248 -> 0.0169. US models unaffected (len==1 early return).

2. Froude full-guard (DynamicWave processManningLink): reproduce legacy
   link_getFroude (link.c:864-873) — zero Froude only for a CLOSED conduit
   within FUDGE of full; OPEN conduits get a real Froude even when full.
   Replaces the both-ends-full isFull gate. Correct legacy behavior for
   open/IRREGULAR channels that fill and closed pipes straddling the crown.
Complete the metric-conversion legacy-alignment (200ddcd): convert ponded
area + functional-storage a0/a1 by dividing by (UCF_L*UCF_L) / pow(UCF_L,2-b)
(legacy node.c:154), flow fields by /Qcf, and seepage by /UCF(RAINFALL),
instead of multiplying by precomputed reciprocals. One-time parse path
(no hot-loop cost). US models unaffected (early return).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…legacy)

Legacy dwflow_findConduitFlow calls link_getLossRate EVERY Picard iteration
for each non-dry conduit, and SKIPS DRY/UP_DRY/DN_DRY conduits via its
dwflow.c:162 early return (leaving their stored loss rate untouched). The
refactored computed conduit losses ONCE per step (Router::computeConduitLosses)
for ALL conduits with no flow-class gate, which:
  - leaked a spurious evap/seepage loss onto dry/up-dry conduits (e.g. user2's
    UP_DRY natural channel TW01250 → +2.5e-5 cfs onto storage node TW01240,
    where legacy adds ~0), and
  - froze the loss rate for the whole step instead of tracking the iterate.
Both seed divergence that surcharge amplifies.

Add DWSolver::recomputeConduitLosses, called per iteration after
computeLinkGeometry (so the flow class + current depth are set) and before
solveMomentumBatch/updateNodeFlows (which read the rate for dq6 and the node
loss term). It mirrors legacy link_getLossRate exactly: depth =
0.5*(oldDepth+newDepth), faithful transect top width (xsect::getWofY via
buildXSP), DW cap newVolume/tstep with comp*q/total clamp order, and the
DRY/UP_DRY/DN_DRY skip (retain prior rate). Router::step now skips the
once-per-step computeConduitLosses for DYNWAVE (kept for KINWAVE/STEADY,
which are non-iterative).

Result: user5 flowΔ 35.8 -> 15.7 (source seed 8.6e-6 -> 1.5e-8); user2
source seed 2.3e-6 -> 9.5e-7. No regression (14 FULL PARITY; the loss-free
extran/test models are untouched — loss rate stays 0).
…length fix)

The dry-conduit kernel (processDryLink) still formed dqdh as
dt_g*aMid * (1/length) — the reciprocal-multiply the momentum commit
(365e17f) replaced everywhere else. Legacy dwflow.c:171 divides:
GRAVITY*dt*aMid / length * barrels. Divide directly to match (dt_g ==
dt*GRAVITY by commutativity). dqdh is scattered into the node sumdqdh
denominator, so this keeps the dry-conduit contribution legacy-exact.

(Investigation note: user2's residual sumdqdh seed at storage node TW01210
is dominated NOT by this /length term [~1e-21] but by a ~2e-14 difference
in aMid — the IRREGULAR transect area looked up at the FUDGE flow depth of
the high-offset dry conduit TW01221 — i.e. sub-ULP geometry-interpolation
round-off, cascaded through sumdqdh and amplified by the sign-flip clamp.
That residual is the irreducible class; this commit only removes the
last reciprocal-multiply for completeness/legacy-alignment.)
…ode)

A prior parity commit left the tree non-compiling: DynamicWave references
ctx.coupled_node but SimulationContext no longer defined it, and the
NodeCoupling two-regime exchange had been reverted to the old single-regime
z_top gate (which blocks junction inlet capture).

Restore the complete, consistent coupling:
- SimulationContext: re-add the per-node `coupled_node` flag (DynamicWave
  already uses it for the pond-above-crown exception) — fixes compilation.
- NodeCoupling.computeCouplingExchange: two-regime flux — free-inlet capture
  (surface-depth gradient, one-way) below the crown, C1-blending to the
  bidirectional head-difference orifice at/above it.
- SurfaceRouter2D: auto-set a coupled junction's ponded_area to the
  surrounding 2D-cell median-dual footprint and flag ctx.coupled_node.

Verified: working tree compiles; test_2d_junction_coupling and
test_2d_outfall_coupling pass.

NOTE: this does NOT change the ~11% flow-routing continuity error on
examples/demo_road_culvert (a small ~1 m3 absolute imbalance localized to the
fully-surcharged interior culvert junctions; the 2D side and the coupling
transfer are conservative). That is a DW surcharge mass-balance characteristic
on a statically-submerged flat culvert, not a coupling regression.
Cover the GUI-editor getters/setters (pollutant units, aquifer ET pattern,
snowpack surfaces, inlet params/type, LID layers incl. pavement/drainmat +
type) with bit-exact C round-trips and Python round-trips. The C++ test was
already registered in tests/unit/engine/CMakeLists.txt but the file itself
was missing, so a clean build could not configure that target.
Add Solver.write_with_plugin(path, output_plugin_id) and the convenience
Solver.write_geopackage(path, crs=...) so a loaded model can be written to a
.gpkg from Python (the C API swmm_model_write_with_plugin already existed but
only ModelBuilder wrapped it). write_geopackage(crs="EPSG:2284") applies the
CRS via solver.spatial.crs first, so every feature layer is tagged with that
SRS; without a CRS the geometries get an undefined SRS (srs_id 0) and GIS
tools cannot place them.

Expose the GEOPACKAGE_PLUGIN_ID constant and correct the stale plugin-id
example in openswmm_model.h (the real id is
org.hydrocouple.openswmm.plugins.geopackage). Tests in
tests/engine/test_geopackage_export.py; CHANGELOG updated.
…w z_top)

Replace the two-regime free-inlet/surcharge blend in computeCouplingExchange
with the user-directed capped-pipe model: a bidirectional gradient orifice
(h_2d - h_1d) gated by a C1 Hermite ramp on max(h_1d, h_2d) above the
surcharge HGL z_top = crown + sur_depth. Below z_top the node pressurises
internally over its auto-sized ponded shaft with no exchange (no spilling,
no capture); the cover only connects the domains once water overtops it.

Drops the now-unused 2D bed elevation z_2d (driver is the head difference,
not surface ponding depth). Doc 1D_2D_COUPLING_CONFIGURATION.md updated to
the capped-pipe model (regimes, elevation ladder, ponded-shaft store, the
mesh-bed = z_top geometry-consistency requirement).

Coupling tests + full 2D suite pass.
…for the slot

Per review: gate the 1D<->2D exchange at the pipe crown (the slot-engagement
point) instead of crown + sur_depth. This decouples the inlet threshold from
sur_depth, so a junction with a Preissmann-slot surcharge band (SurDepth > 0)
still captures surface water at the crown instead of having its gate pushed up
above the 2D surface (which previously zeroed all capture). sur_depth now solely
sizes the 1D slot's surcharge-storage headroom above the crown.

effectiveArea widens at the crown; gate is the C1 Hermite capRamp on
max(h1d,h2d) - crown. (Gating at SLOT_CROWN_CUTOFF*full_depth, below the crown,
was tried first but destabilised CVODE by exchanging while the pipe was still
sub-full; the crown is the stable threshold.)

Doc 1D_2D_COUPLING_CONFIGURATION.md updated: gate at crown, sur_depth sizes slot
headroom, geometry-consistency note. Coupling tests + full 2D suite pass.
… volume

The 2D<->1D coupling is applied as a signed lateral inflow on the node
(coupling_inflow -> lat_flow). The 2D->1D drain (Q>0, extracts from the 2D
mesh) was already bounded by the receiving-cell flooded volume; the 1D->2D
spill (Q<0, extracts from the 1D node) was unbounded — the head-driven orifice
could withdraw more than the node holds, putting phantom water on the 2D
surface and driving the 1D node volume negative.

Add the symmetric bound: cap |Q| for the spill by the node's FLOODED volume
above the crown (the ponded/surcharge store), in SI units. An exchange now
moves only water that exists at the source in both directions; the pipe's
below-crown in-line flow is not spillable. Coupling tests + full 2D suite pass.
Add a Windows-cuda CMake preset and wire up the full CUDA build chain:

- CMakePresets.json: add Windows-cuda and Windows-cuda-debug configure
  presets (Ninja Multi-Config, MSVC host, VCPKG_MANIFEST_FEATURES=2d;gpu-cuda,
  CMAKE_CUDA_ARCHITECTURES=89 for Ada Lovelace ADA89) and matching build presets.

- vcpkg-overlays/kokkos/portfile.cmake: on Windows + cuda feature, bypass
  Kokkos' nvcc_wrapper/Clang requirement by setting
  KOKKOS_COMPILER_IS_KOKKOS_LAUNCH_COMPILER=ON. This enables cmake's native
  CUDA language (enable_language(CUDA)) inside the Kokkos build so nvcc
  compiles device code while MSVC compiles host C++. Previously the workaround
  set CMAKE_CXX_COMPILER=nvcc, which caused cmake to identify the compiler as
  MSVC (nvcc reports host version) and inject MSVC-style flags (-std:c++20)
  that nvcc rejects. Also set CMAKE_CUDA_STANDARD=20 for the CUDA language
  sub-configure, and drop the bin/ directory (no nvcc_wrapper on Windows;
  consumers use cmake CUDA language directly).

- src/engine/2d/gpu/CMakeLists.txt: on WIN32, enable_language(CUDA) before
  building the openswmm_gpu_cuda plugin, set CUDA_STANDARD 20 and
  CUDA_SEPARABLE_COMPILATION, and tag GpuPluginCuda.cpp and
  CvodeKokkosSurfaceSolver.cpp as LANGUAGE CUDA so nvcc compiles them while
  MSVC handles the remaining translation units.
… CUDA build

The previous portfile used KOKKOS_COMPILER_IS_KOKKOS_LAUNCH_COMPILER=ON which
does not exist in Kokkos 4.7.04. The cmake/kokkos_test_cxx_std.cmake check at
line 140 only accepts KOKKOS_CXX_COMPILER_ID==NVIDIA or ==Clang -- there is no
bypass variable in this version.

The correct Kokkos 4.7 built-in option is Kokkos_ENABLE_COMPILE_AS_CMAKE_LANGUAGE
(declared in cmake/kokkos_enable_options.cmake:87). When ON, kokkos_compiler_id.cmake
checks CMAKE_CUDA_COMPILER (nvcc) instead of CMAKE_CXX_COMPILER (cl.exe) for
the nvcc --version probe, which finds nvcc in the output and sets
KOKKOS_CXX_COMPILER_ID=NVIDIA. The line-140 check then passes without requiring
nvcc_wrapper or Clang. kokkos_arch.cmake also propagates Kokkos_ARCH_* selections
to CMAKE_CUDA_ARCHITECTURES in this mode, so the arch is handled by cmake's
native CUDA language infrastructure.
…H not in vcpkg env

vcpkg sanitizes the cmake subprocess environment when running portfile scripts,
so OPENSWMM_KOKKOS_CUDA_ARCH (set in PowerShell and the cmake preset) does not
reach the portfile. Kokkos' GPU auto-detection also fails in the sandboxed build
because cmake cannot execute a GPU binary during configure.

Fall back to Kokkos_ARCH_ADA89=ON (Ada Lovelace SM 8.9) which matches the
project target GPU (NVIDIA RTX 2000 Ada). The env var is still checked first so
users with different GPUs can override it; also check OPENSWMMENGINE_KOKKOS_CUDA_ARCH
as a secondary env var name. Document the VCPKG_CMAKE_CONFIGURE_OPTIONS triplet
approach as the proper override mechanism.
CUDA 13.x CCCL headers require the MSVC standard-conforming preprocessor.
Without /Zc:preprocessor, every Kokkos compilation unit fails with:
  cuda/std/__cccl/preprocessor.h(23): fatal error C1189
vcpkg_cmake_configure enforces that VCPKG_CXX_FLAGS and VCPKG_C_FLAGS
must both be set or both unset.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cmake's CUDA language does not auto-wrap CMAKE_CXX_FLAGS into -Xcompiler
for nvcc invocations. The flag must be injected through CMAKE_CUDA_FLAGS
so nvcc explicitly passes it to cl.exe as the host compiler.
KokkosConfig.cmake calls find_dependency(CUDAToolkit) when Kokkos was
built with CUDA. vcpkg sandboxes the cmake env so nvcc is not on PATH.
Probe for the CUDA Toolkit and supply CUDAToolkit_ROOT explicitly.
Strip zero-value debugging instrumentation now that the refactored solvers
are validated, leaving the runtime paths unchanged:

- hydraulics/DynamicWave: drop the env-gated bit-parity trace
  (SWMM_TRACE_RSTEP / SWMM_TRACE_FILE, maybeInitTrace/dumpTrace and the
  per-iteration %.17g dump) used to diff refactored vs legacy doubles.
- 2d/solver/CvodeSurfaceSolver: drop the AMG-fallback stderr warning and the
  OPENSWMM_2D_CVODE_STATS cumulative-stats dump.
- 2d/gpu/CvodeKokkosSurfaceSolver: drop the AMG-fallback stderr warning.
- 2d/solver/SurfaceSolverFactory: drop the warn_if_broken plugin-load
  diagnostics, simplifying try_plugin to silently skip absent/broken plugins.
Make the Kokkos-OpenMP GPU plugin and hypre BoomerAMG explicit in every
engine-building CI job (unit_testing, deployment, regression_testing)
instead of relying solely on implicit vcpkg default-features + ON-by-default
CMake options:
 - pass VCPKG_MANIFEST_FEATURES "...;gpu;hypre" and
   -DOPENSWMM_BUILD_GPU_PLUGIN=ON -DOPENSWMM_GPU_BACKEND=omp
   -DOPENSWMM_WITH_HYPRE=ON
 - add a post-build/install gate that fails the job if the Kokkos GPU plugin
   binary is absent (find_package(Kokkos/HYPRE REQUIRED) already gates the
   link at configure time)

Guards against a silent default-flip dropping Kokkos/AMG (the stale
Kokkos-free / no-hypre bundle class of regression).
…ts it

The docs CI builds without compiling the Cython extension, so conf.py's
meta-path hook execs the .pyi stubs as real modules. A bare annotation
(GEOPACKAGE_PLUGIN_ID: str) only populates __annotations__ and never binds
the name, so `from ._solver import ... GEOPACKAGE_PLUGIN_ID` in __init__.py
raised ImportError and cascaded into 48 autodoc warnings (= -W build failure).

Give the stub constant its real value (matching _solver.pyx) so it binds
when the stub is exec'd.
…uild

The Dynamic Preissmann Slot is gated on the conduit MIDPOINT depth
(hs_iter = max(depth_mid - y_full, 0)), per the documented head-first
formulation. The two AASkipFlagTest DPS fixtures contradicted this:

- DPS_AsAccumulatesPositively surcharged only the upstream node, so
  conduit 0's midpoint never reached the crown and As stayed 0.
- DPS_PastAsInvariantUnderPDecay let the free outfall drain the chain
  below crown mid-step, so updateDPSState reset the slot.

Rewrite both to hold the chain at a uniform head (zero gradient, no
drainage) with a junction downstream boundary, so the surcharge is
sustained and the slot actually accumulates - exercising the path the
tests claim to verify. Engine logic unchanged.

Windows CI failed before building: the hypre overlay pulled external
lapack -> lapack-reference, whose Fortran dep (vcpkg-gfortran/LLVMFlang)
put a GNU-driver clang.exe on PATH; the x64-windows triplet's MSVC-style
flags then broke CMake's C-compiler probe. Use hypre's bundled C
BLAS/LAPACK on Windows only and gate the blas/lapack deps to !windows;
Linux/macOS keep external reference LAPACK.
…rge NaN

The two AASkipFlagTest DPS tests still failed (As=0) after the prior fixture
rewrite. Built and ran locally to find the real cause: the AASkipFlagTest
fixture never set xsect_r_full, so once a conduit surcharges, applyDPSGeometry
overrides hrad_mid with r_full=0 and the Manning friction term divides by
zero. The resulting NaN depths make (depth_mid > y_full) false, so
updateDPSState takes the depressurization branch and silently resets the
accumulated slot area to 0.

Set xsect_r_full = 0.5 (D/4 for the 2 ft circular pipe) in the fixture, and
drive DPS_AsAccumulatesPositively with both junctions surcharged + sustained
inflow (free outfall kept as the outlet so the continuity solve stays
well-conditioned). Both DPS tests and the other 8 AASkipFlagTest cases now
pass locally (70/70 in test_engine_routing).
Once the lapack/hypre fix lets the Windows build progress past dependency
resolution, the Kokkos-OpenMP GPU plugin hits two latent, MSVC-only failures:

1) C2375 redefinition/different linkage: GpuPluginAbi.h declared the two ABI
   entry points as plain extern C, but every plugin .cpp defined them with
   __declspec(dllexport). Move the export marker (OPENSWMM_GPU_ABI) into the
   header, gated on OPENSWMM_GPU_PLUGIN_BUILD, so declaration and definition
   agree. The core resolves these via dlsym/GetProcAddress and never
   link-imports them. Applied to all five plugin variants.

2) C3861 omp_get_level not found: Kokkos OpenMP backend needs OpenMP 3.0+
   routines MSVC legacy /openmp lacks. Compile the omp plugin with /openmp:llvm
   on MSVC.

Verified on Linux: stub plugin builds under -fvisibility=hidden and exports
both ABI symbols; header includes cleanly without the plugin define. The
MSVC-specific behavior must be confirmed by CI.
…fault

Commit 87b4c92 ("remove debug/diagnostic scaffolding") deleted the
"[openswmm 2D] using GPU backend: %s" line from makeSurfaceSolver along with
the noisier warn-if-broken diagnostics. test_engine_2d_omp_default asserts on
that exact line (PASS_REGULAR_EXPRESSION "using GPU backend: omp"), so it has
been failing ever since with the backend signal gone.

Restore just the single informational line on successful plugin selection
(the louder fallback/broken-plugin warnings stay removed). The line prints the
chosen backend label, which both satisfies the test and gives operators a
record of which 2D acceleration backend actually ran.

Note: the 2D path requires SUNDIALS/Kokkos, which can't be built in the dev
sandbox here; the change is a one-line fprintf over an already-computed label,
to be confirmed green by CI.
With hypre now building on Windows, HypreAmgPreconditioner.cpp compiles with
OPENSWMM_HAVE_HYPRE and references HYPRE_* symbols. The shared openswmm_engine
links HYPRE::HYPRE, but the internal STATIC library (openswmm_engine_internal,
which MSVC unit-test exes link instead of the shared lib) was given the 2D
sources and the OPENSWMM_HAVE_HYPRE define but never linked HYPRE. Result on
Windows: every test exe pulling HypreAmgPreconditioner.obj failed with
LNK1120 / 27 unresolved HYPRE_* externals.

Link HYPRE::HYPRE PRIVATE into openswmm_engine_internal under
OPENSWMM_WITH_HYPRE, mirroring the adjacent SUNDIALS/HDF5 handling. CMake
propagates a static lib's PRIVATE deps to consumers' link lines, so the test
exes now resolve the HYPRE symbols. Linux/macOS were unaffected (their tests
link the shared engine). CMake re-configures cleanly; MSVC link to be
confirmed by CI.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants