Skip to content

Feature: add SLAYER and GGJ tearing growth rates#238

Draft
d-burg wants to merge 73 commits into
developfrom
feature/tearing-growthrates
Draft

Feature: add SLAYER and GGJ tearing growth rates#238
d-burg wants to merge 73 commits into
developfrom
feature/tearing-growthrates

Conversation

@d-burg
Copy link
Copy Markdown
Collaborator

@d-burg d-burg commented May 18, 2026

Summary

Adds tearing-mode growth-rate analysis as a new Tearing umbrella module under src/Tearing/, with three layers:

  • InnerLayer — inner-layer Δ(Q) physics: GGJ (Pletzer–Dewar) and SLAYER (Fitzpatrick Riccati) models, with switchable Spitzer/neoclassical resistivity.
  • Dispersion — physics-agnostic root finder: SurfaceCoupling, MultiSurfaceCoupling, CoupledFull (2m×2m det(D′−D(γ))), and CoupledFortranMatch residuals; AMR Q-plane scan + triangulation-based growth-rate extraction; spurious-root detection (geom + γ-gap + polyline concavity)
  • Runner — TOML-driven orchestration (Tearing.Runner.run_slayer), kinetic-profile loading, HDF5 output.

Also includes the supporting work that landed alongside:

  • Full 2m×2m D′ matrix exposed from ForceFreeStates via delta_prime_raw / pest3_decompose
  • New Riccati.jl solver in ForceFreeStates for the ideal plasma response (with ~30–40% perf work)
  • Utilities: NeoclassicalResistivity, KineticProfiles, PhysicalConstants
  • Regression cases: solovev_slayer_n1, tj_epsilon_pole
  • LAR β-scan / ε-scan and TJ_epsilon_pole examples

73 commits, ~+13.9k / −0.75k LOC across 92 files.

Relationship to perf/riccati

The inner-layer growth-rate solvers here consume the outer-region Δ′ produced by the Riccati-based ideal-MHD solver on perf/riccati. This branch is designed to subsume perf/riccati once that work merges: it currently shares a common ancestor (c6c845ff) and is 39 commits ahead but 11 commits behind perf/riccati. Plan is to merge the final state of perf/riccati in here once it's ready to merge and before this PR moves out of draft, so the final history includes the outer-region Δ′ improvements that the dispersion residuals depend on.

Current status — WIP / draft

  • Core stack (InnerLayer + Dispersion + Runner) and regression cases pass on Solovev and TJ-like analytical equilibria.
  • Latest commit (528062f8) changes chooser_overrides from discard-on-(geom ∧ gap) to warn-and-keep, which fixed 7/8 mis-chosen roots in the SPARC β-scan; not yet validated against q95-scan, IBS_AT-scan, or DIII-D 147131 benchmarks.
  • Filtered-root HDF5 subgroup now records only legacy-rejected roots; geom/gap-warned roots live in valid_roots with flags.
  • Pending: integrate the soon-to-be finalized state of perf/riccati.

logan-nc and others added 30 commits February 28, 2026 01:03
…(1.6x speedup on Solovev)

Implements the dual Riccati matrix S = U₁·U₂⁻¹ as a faster alternative to the standard
Euler-Lagrange ODE integration. Enable with `use_riccati = true` in jpec.toml.

Integration strategy: uses `sing_der!` (same ODE RHS as standard) with periodic Riccati
renormalization S = U₁·U₂⁻¹, U₂ = I in the callback when column norms exceed ucrit. This
is mathematically equivalent to the explicit Riccati ODE (dS/dψ = B + A·S - S·D - S·C·S)
but numerically stable: the explicit Riccati ODE has quadratic blowup for explicit solvers
when K̄·S >> Q, while sing_der! + renorm tracks the bounded ratio S = U₁/U₂.

The Riccati crossing (`riccati_cross_ideal_singular_surf!`) skips Gaussian reduction (which
can produce NaN/Inf when S is near-zero near the axis) and uses `ipert_res` directly.

Benchmarks on Solovev example (N=8, 1 singular surface):
  Standard ODE: 83.7 ms, 157 steps
  Riccati ODE:  51.4 ms, 121 steps  (1.63x speedup, 0.006% energy difference)

See: Glasser (2018) Phys. Plasmas 25, 032507 — Eq. 19

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## Part 1: Δ' output (tearing stability parameter)
- Add `delta_prime::Vector{ComplexF64}` to `SingType`
- Add `compute_delta_prime_from_ca!` in EulerLagrange.jl, called at end of
  `eulerlagrange_integration` (standard path only — see normalization note below)
- Write `singular/delta_prime` as (msing × n_modes) ComplexF64 to HDF5 output in JPEC.jl
- Riccati path does NOT compute delta_prime: ca_l is accumulated in (S,I) normalization
  which is inconsistent with the Δ' formula (standard (U1,U2) normalization required)

## Part 2: Parallel Fundamental Matrix (FM) integration
- Add `ChunkPropagator` struct (two N×N×2 blocks for identity-block ICs) in Structs
- Add `use_parallel::Bool = false` control flag in ForceFreeStatesControl
- Add `integrate_propagator_chunk!` — integrates each chunk from IC=(I,0) and IC=(0,I)
  independently using BS5 solver, no callback; suitable for Threads.@threads
- Add `apply_propagator!` — in-place 2×2 block matrix multiply on odet.u
- Add `balance_integration_chunks` — sub-divides chunks using ode_itime_cost for
  load-balanced parallel work; target = max(2*msing+3, 4*nthreads)
- Add `ode_itime_cost` — log-divergent cost model from STRIDE (Glasser 2018)
- Add `parallel_eulerlagrange_integration` — parallel phase with Threads.@threads,
  serial assembly calling renormalize_riccati_inplace! before each crossing (needed
  because apply_propagator! gives general (U1,U2) state but riccati crossing expects
  (S,I) form); uses ipert_res-direct zeroing to correctly identify the resonant column
- Dispatch from eulerlagrange_integration: use_parallel → use_riccati → standard

## Tests (29 total: 11 Riccati + 18 Parallel FM)
- runtests_riccati.jl: update Δ' test — only standard path populates delta_prime
- runtests_parallel_integration.jl (new): ChunkPropagator identity/linearity,
  balance_integration_chunks count/coverage/crossings, ode_itime_cost additivity,
  parallel FM energy match (rtol=2%, Solovev)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rallel tests to suite

Δ' is now computed inline in riccati_cross_ideal_singular_surf! using the diagonal
formula on the bounded (U₁, U₂) state (max ≤ ucrit, no GR permutation). This gives
physically correct values: 57.3 and -4.03 for the two Solovev singular surfaces.

The standard path does not populate delta_prime — Gaussian Reduction inflates the
resonant column's asymptotic coefficients, making ca_l non-physical regardless of
when it is computed. A comment in cross_ideal_singular_surf! explains the limitation.

Also adds runtests_riccati.jl and runtests_parallel_integration.jl to the default
test suite (runtests.jl). Both were previously excluded.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… lacks Δ'

The comment in cross_ideal_singular_surf! previously said the issue was GR
"normalization inflation." The real reason is more subtle: Δ' is a complex,
normalization-convention-dependent quantity. The Riccati renormalization (U₂→I)
continuously phases solution columns into a specific gauge where the diagonal
formula (ca_r - ca_l)/denom gives physically meaningful values. The standard
path's solution columns grow from the axis with an arbitrary complex phase;
dividing by the outer asymptotic coefficient normalizes the magnitude but not the
complex phase, producing a value in a different convention that does not match
what SingularCoupling.jl expects.

Also reverts the failed attempt to compute Δ' in cross_ideal_singular_surf! via
perm_col + A_outer normalization, which produced -0.10-0.54i vs the Riccati
57.3+58.3i (same physical quantity, incompatible conventions).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…large-N documentation

1. Add SingType.delta_prime_col (N × n_res_modes Matrix) storing the full column
   (ca_r[:, ipert_res, 2] - ca_l[:, ipert_res, 2]) / (4π²·psio) at each crossing.
   The diagonal element matches delta_prime[i] exactly. Off-diagonal elements give
   intra-surface coupling of all N modes to each resonant mode through the singular
   layer asymptotic expansion. Only populated for Riccati/parallel FM paths.

2. Add singular/m, singular/n, singular/delta_prime_col HDF5 outputs so downstream
   users can access the full off-diagonal Δ' without needing to index ca_left/ca_right.

3. Document the known numerical limitation of the parallel FM path for large N:
   FM propagators become ill-conditioned for N ≳ 20 without QR orthogonalization,
   causing ~10% energy error for DIIID (N=26) with no wall-clock speedup over Riccati.
   Deferred fix: bidirectional integration or continuous QR (noted in docstring/tests).

4. Update outer-plasma Riccati re-integration (already committed) docstring to match.

Tests: 50/50 Riccati+parallel, 84/84 EulerLagrange all pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…trix

Implements the STRIDE global boundary value problem for computing the full
2·msing × 2·msing inter-surface tearing stability matrix. Each entry gives
the U₂[ipert_res] response amplitude at one surface boundary when driving
with unit amplitude at another, encoding cross-surface coupling.

Changes:
- Riccati.jl: add assemble_fm_matrix (chunk FM product) and
  compute_delta_prime_matrix! (BVP assembly + solve via STRIDE formulation
  from Glasser 2018 Phys. Plasmas 25, 032501 Sec. III.B); call from
  parallel_eulerlagrange_integration
- ForceFreeStatesStructs.jl: add delta_prime_matrix field to
  ForceFreeStatesInternal with docstring
- JPEC.jl: write delta_prime_matrix to singular/delta_prime_matrix in HDF5
- test/runtests_parallel_integration.jl: add delta_prime_matrix regression
  test (shape, finiteness, non-zero diagonal); 30 tests total (was 23)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… for large-N accuracy

The all-forward parallel FM path had ~10% energy error for large-N problems
(DIIID N=26, n=1) because the chunk immediately before each rational surface
crossing integrates into exponentially growing solution territory, producing
an ill-conditioned FM propagator.

Fix: integrate the crossing chunk *backward* (psi_end → psi_start). Solutions
that grow exponentially forward decay backward, yielding a well-conditioned
backward FM Φ_bwd. The accurate forward propagation is recovered as Φ_bwd⁻¹
via a stable LU solve in apply_propagator_inverse!.

The same backward FM is used directly in the Δ' BVP (compute_delta_prime_matrix!)
as Phi_L[j], splitting each ill-conditioned inter-surface FM product into
well-conditioned Phi_R (forward chunks) and Phi_L (backward crossing chunk).

Changes:
- IntegrationChunk: add direction::Int=1 field (+1 forward, -1 backward)
- chunk_el_integration_bounds: add bidirectional=false kwarg; crossing chunks
  get direction=-1 when true
- balance_integration_chunks: left sub-chunk always direction=1; right inherits
  chunk.direction so the near-singularity chunk stays backward after splitting
- integrate_propagator_chunk!: reverses tspan for direction=-1 chunks
- apply_propagator_inverse!: new function, LU solve Φ_bwd·x = u_old
- Serial assembly: branches on chunk.direction (inverse vs forward apply)
- parallel_eulerlagrange_integration: passes bidirectional=true
- compute_delta_prime_matrix!: BVP now uses Phi_R·x_right - Phi_L·x_left = 0
  at each junction instead of ill-conditioned monolithic Phi_segs product
- assemble_fm_matrix: safe for empty idx_range (uses propagators[1] for N)

Results (et[1] stability eigenvalue):
  Solovev N=8:   0.006% error (was already fine)
  DIIID   N=26:  0.236% error (was ~10.5% — 44× accuracy improvement)

Tests: 31/31 pass in runtests_parallel_integration.jl (+1 DIIID accuracy test)
       18/18 pass in runtests_riccati.jl (unchanged)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds benchmarks/benchmark_threads.jl to measure wall-clock time and
accuracy of the standard, Riccati, and parallel FM integration paths
across varying thread counts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three fixes from code review of PR #178:

- assemble_fm_matrix: add explicit isempty guard before the propagator
  loop so an empty idx_range (e.g. i_crossings[1]==1) returns the
  identity matrix without relying on the loop falling through silently.

- compute_delta_prime_matrix!: add @Assert at function entry that all
  singular surfaces have exactly one resonant mode, so multi-resonance
  surfaces fail loudly instead of silently using only sp.m[1]/sp.n[1].

- runtests_parallel_integration.jl: remove stale comment that described
  large-N FM ill-conditioning as an open problem with ~10% energy error;
  bidirectional integration (now the default for use_parallel=true) has
  resolved this.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…er! and compute_delta_prime_from_ca!

Two developer benchmark scripts for verifying the two dead-code reference
implementations flagged in the Claude code review of PR #178:

benchmarks/benchmark_riccati_der.jl
  Verifies riccati_der! correctly evaluates Glasser 2018 Eq. 19:
    dS/dψ = w†·F̄⁻¹·w - S·Ḡ·S,  w = Q - K̄·S
  Uses Hermitian test states (physical constraint: the EL system preserves
  S†=S from the axis) and compares riccati_der! against manual evaluation
  of the same formula using the ffit splines directly.
  Observed error: ~1e-17 (machine epsilon). No TOML flags needed.

benchmarks/benchmark_delta_prime_methods.jl
  Verifies compute_delta_prime_from_ca! gives bit-for-bit identical Δ'
  values to the inline computation in riccati_cross_ideal_singular_surf!.
  Both apply the same diagonal formula to the same ca_l/ca_r arrays, so
  the result must be exactly zero difference.
  Observed difference: 0.0 (exact). No TOML flags needed.

Neither script requires new TOML flags: they call internal functions directly
without going through ForceFreeStatesControl. Developer-only knobs belong in
scripts, not in user-facing config.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tring

The previous "O(Δψ)" phrasing in the Integration Strategy section read as a
global accuracy statement, suggesting the Riccati path is only first-order
accurate. This is wrong: the method integrates the linear EL ODE with Tsit5
(5th-order) and recovers S = U₁·U₂⁻¹ by exact renormalization, achieving
the full ODE solver reltol.

Rewrite the section in three clearly labelled parts:
- Why riccati_der! (quadratic ODE) is avoided: relative error control is
  unfaithful when |S| is large, not a step-size problem, not fixable by
  adaptation without an implicit solver.
- What the implementation actually does: sing_der! (linear ODE, exact RHS),
  Tsit5 (5th-order), exact renormalization, same global accuracy as standard.
- Local consistency analysis: the O(Δψ) expansion is retained but now
  labelled explicitly as a consistency check, not an accuracy claim.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…setup + new unit tests

Two changes in one pass:

Shared setup (performance):
  equil (Grad-Shafranov) and ffit (metric matrices) are now built once and
  shared across all integration-dependent testsets via a make_solovev_intr
  helper for cheap fresh intr construction. Previously, setup_equilibrium +
  make_metric + make_matrix ran 4 times and riccati_eulerlagrange_integration
  ran 3 times. Now each runs once, cutting total test time significantly.

New unit tests (dead code coverage):
  "riccati_der! formula — Glasser 2018 Eq. 19": verifies riccati_der!
    correctly evaluates dS/dψ = w†F̄⁻¹w − SGS at several ψ points using
    Hermitian test states (physical constraint). Agrees with manual formula
    evaluation to machine precision (~1e-17). No extra integration needed.

  "compute_delta_prime_from_ca! matches inline Δ'": verifies the standalone
    Δ' formula gives bit-for-bit identical results to the inline computation
    in riccati_cross_ideal_singular_surf!. Reuses the shared odet_ric.

Total: 23 tests (was 18), runtime ~51s (was ~80s+ with redundant setup).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dd 3 unit tests

- Delete unused parallel_threads field from ForceFreeStatesControl: the field was
  silently ignored (Threads.@threads uses JULIA_NUM_THREADS at startup, not a runtime
  field). Removes false impression that thread count can be set from jpec.toml.

- Add apply_propagator_inverse! round-trip unit test: verifies Φ⁻¹·Φ = I algebraically,
  complementing the existing apply_propagator! identity and linearity tests.

- Add chunk_el_integration_bounds direction field test: verifies bidirectional=true
  sets direction=-1 on crossing chunks and direction=+1 on non-crossing chunks, and
  that balance_integration_chunks preserves direction correctly (right sub-chunk inherits,
  left sub-chunk always +1). Catches direction propagation regressions.

- Add delta_prime_matrix DIIID regression test: verifies the STRIDE BVP Δ' matrix is
  finite and non-zero for the large-N case (N≈26, multiple rational surfaces), where
  ill-conditioned (non-bidirectional) FM propagators would produce NaN/Inf entries.

56/56 parallel integration tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…f/riccati

One conflict resolved in src/GeneralizedPerturbedEquilibrium.jl (was src/JPEC.jl):
write_outputs_to_HDF5 vacuum section. Resolution:
- Keep new Δ' HDF5 outputs from perf/riccati (singular/m, singular/n,
  delta_prime, delta_prime_col, delta_prime_matrix)
- Adopt develop's vacuum output format: vac_data variable name,
  plasma_pts/wall_pts fields (3D Cartesian), y_plasma/y_wall entries,
  always-write pattern with empty arrays

All other files (EulerLagrange.jl, ForceFreeStatesStructs.jl, runtests.jl)
auto-merged cleanly. Default HDF5 filename updated to gpec.h5.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…um references post-rename

Update test files and benchmarks to use the new package name and config filename
(gpec.toml) following the GPEC rename merged from develop:
- test/runtests_riccati.jl
- test/runtests_parallel_integration.jl
- benchmarks/benchmark_threads.jl
- benchmarks/benchmark_riccati_der.jl
- benchmarks/benchmark_delta_prime_methods.jl

23/23 riccati tests and 56/56 parallel integration tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…page

Creates docs/src/stability.md covering the ForceFreeStates module:
- Newcomb/DCON ideal MHD stability criterion with paper citations
  (Glasser 2016 Phys. Plasmas 23 112506, 2018a 032507, 2018b 032501)
- Standard, Riccati, and parallel FM integration methods
- Bidirectional integration strategy for large-N accuracy
- Δ' tearing parameter: per-surface (delta_prime/delta_prime_col)
  and inter-surface matrix (delta_prime_matrix / STRIDE BVP)
- Configuration reference, API autodocs block, example usage

Adds page to docs/make.jl navigation and cross-links from equilibrium.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…kdown links)

1. Add Random stdlib to Project.toml [deps] and [compat] — required by
   runtests_riccati.jl but missing from declared dependencies, causing
   CI failure with "Package Random not found in current path".

2. Fix docstring markdown in Riccati.jl and ForceFreeStatesStructs.jl:
   - Wrap bare [array_notation] (link text) immediately followed by
     (description) (parsed as URL) in code fences to prevent Documenter
     from treating them as broken local links.
   - Affected: assemble_fm_matrix BVP unknowns block, Phi_L/Phi_R
     equations, and VacuumData plasma_pts/wall_pts field descriptions.

These were surfaced by the new @autodocs block in stability.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nt logging

Three targeted fixes from pre-merge code review:

1. Threads.@threads :static — since Julia 1.7, the default :dynamic
   scheduler can migrate tasks between OS threads mid-execution, making
   Threads.threadid() unreliable for indexing into odet_proxies. Using
   :static guarantees a 1:1 task-to-thread mapping for the parallel FM
   integration phase.

2. outer_chunk psi_end guard — the outer-plasma re-integration in
   parallel_eulerlagrange_integration now uses psilim*(1-eps) to match
   the guard applied by chunk_el_integration_bounds, avoiding a potential
   boundary evaluation at exactly psilim.

3. Replace println with @info/@warn — all verbose-mode output in Riccati.jl
   now uses Julia logging macros, consistent with EulerLagrange.jl. This
   enables log-level filtering and suppression in tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adapted from R. Fitzpatrick's TJ code.  tj_run integrates the (ψ, g₂, H₁, H₁', f₃)
shape ODE and returns an InverseRunInput with Shafranov-shifted-circle flux surfaces;
tj_run_direct builds a 257×257 ψ(R, Z) grid and returns a DirectRunInput so the
equilibrium is processed by the same direct-GS pipeline used for TJ geqdsks.
Direct-GS path includes the εa³·L(r)·cos(w) / −εa³·L·sin(w) shape terms in the
(R, Z) → (r, w) Newton inversion (EFIT.cpp) and reproduces the ideal-kink pole
approach at ε ≈ 0.66 to sub-percent accuracy vs the TJ geqdsk branch.

Also fixes:
* lar_run and tj_run: pass ψ_N (not physical r) as InverseRunInput.rz_in_xs
  per the struct contract — silently worked only when lar_a = 1
* dψ/dr normalization: a² not a (broken for any a ≠ 1)
* Restores dy[1], dy[2] in lar_der that were dropped mid-session
… matrix

Adds parallel_eulerlagrange_integration and riccati_eulerlagrange_integration
driving a STRIDE-style global BVP for the multi-surface Δ' matrix
(singular/delta_prime_matrix, shape msing × msing after PEST3 four-term combination).
Bidirectional FM integration and Double64 BVP solve for well-conditioned large-N.

Also:
* eulerlagrange_integration now returns 4-tuple (odet, propagators, chunks, S_left);
  call sites updated in tests and benchmarks
* Gate @info diagnostic dumps in Sing.jl and Riccati.jl behind ctrl.verbose
* Restore SingularException guard in findmax_dW_edge!
* Remove empty cross_kinetic_singular_surf() stub and dead kmsing/kinsing fields
* examples/LAR_epsilon_scan and LAR_beta_scan: TJ-analytic scans with power-law-
  warped grids (dense near pole); epsilon uses Option B tj_direct path
* examples/TJ_epsilon_pole_example: minimal near-pole (ε = 0.66) config used by
  the regression harness
* regression-harness/cases/tj_epsilon_pole.toml: anchors Δ' matrix and δW_t
  near-pole values so εa³·L regressions in tj_run_direct are caught
* test/runtests_tj_analytic.jl: 16 assertions covering tj_run, tj_run_direct,
  and the ψ(R, Z) endpoint consistency
* Project.toml: remove unused JSON and Random (no imports in src/)
* Remove EquilibriumConfig.use_galgrid (Galerkin grid feature removed upstream)
* Restore .github/workflows/auto-merge.yaml
* Fix jpec.toml → gpec.toml in Riccati.jl docstrings
* Scrub 'See sas_flag' comments → set_psilim_via_dmlim across gpec.toml examples
* docs/src/stability.md: update API example to 4-tuple and Δ' matrix shape
* docs/src/equilibrium.md: remove dangling splines.md / vacuum.md links
* examples/LAR_*_scan: update headers and delete unused lar.toml stubs
…omments and docs

Line numbers drift as soon as upstream is edited.  Replace cross-references like
'Equilibrium.cpp rhs_chooser=1 dy[1]', 'sing.F lines 394-398', 'ode.F:1020',
'Riccati.jl:691', etc. with prose referring to 'Fortran STRIDE' or 'TJ' and the
file name only.  No functional changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolves conflicts with recent develop work:

- ForceFreeStatesStructs.jl: adopt develop's kinetic_factor/kinetic_source
  control (replacing the old con_flag/kin_flag pair), and incorporate
  parallel_threads field.  Drop deprecated set_psilim_via_dmlim/dmlim.
- EulerLagrange.jl: gate the rational-surface cross on
  ctrl.kinetic_factor == 0 (instead of !ctrl.con_flag), while retaining
  this branch's sing_asymp_left/right distinction needed for the
  parallel FM Δ' BVP.
- Sing.jl: merge develop's multi-line compute_sing_mmat! signature with
  this branch's sig::Float64=1.0 kwarg.
- Riccati.jl, GeneralizedPerturbedEquilibrium.jl: rename !ctrl.con_flag
  → ctrl.kinetic_factor == 0, and kin_flag stubs → kinetic_factor > 0.
- Example / regression gpec.toml files: keep psiedge and drop the
  deprecated dmlim pair that was already removed on develop.
- Project.toml / Manifest.toml: pick up QuadGK dependency.

Verified: Solovev_ideal_example runs end-to-end through all pipeline
stages.
…n truncation

The edge-dW scan over ψ ∈ [psiedge, psilim] was doing double duty: reporting
the dW peak location (a diagnostic) AND silently moving psilim/qlim/u to that
peak (a truncation that reshaped the Δ' BVP and δW eigenvalues).  In benchmark
runs against Fortran STRIDE, the silent truncation corrupted the outermost
rational's Δ' by tens of percent depending on where the peak happened to fall
inside the band — e.g. on the LAR ε-scan at ε≈0.4, Δ'(3/1) shifted from the
correct ≈1.8 down to ≈0.85 (>50 % error).  The truncation also silently
depended on psiedge itself, so going from psiedge=1.0 → 0.99 was a behavioral
cliff rather than a smooth tightening of the edge band.

Split the behavior into two paths at three call sites (ForceFreeStates/
EulerLagrange.jl and ForceFreeStates/Riccati.jl ×2):

  * Default (truncate_at_dW_peak=false): edge scan is diagnostic-only.  Runs
    findmax_dW_edge! with the resulting dW(ψ), ψ, q, and energy components
    stored on odet.edge_scan and written to HDF5 under edge_scan/.  psilim,
    qlim, and odet.u are restored to the post-integration values so that Δ'
    and free-boundary eigenvalues are determined solely by qhigh / psihigh /
    dmlim.  ψ_peak is logged at verbose level.

  * Legacy (truncate_at_dW_peak=true): reproduces the original Fortran
    ode_record_edge heuristic.  After the diagnostic scan, psilim, qlim, and
    odet.u are pulled back to the dW-peak step.  Preserved so that future
    work on a more robust edge-mode filter can build on top of it, with a
    warning in the docstring and log line that Δ' and δW are unreliable in
    this mode.

Docstring update on ForceFreeStatesControl.psiedge / truncate_at_dW_peak
spells out the diagnostic vs legacy semantics and the reliability caveat.

test/runtests_fullruns.jl: update the Solovev kinetic multi-n expected et[1]
from -0.01248 to -0.19359 with an inline comment.  The old value reflected
the truncated-integration behaviour; the new value reflects the full-domain
answer.  Other fullruns tests unchanged.

Validated against Fortran STRIDE β-scan (42 pts) and ε-scan (56 pts) on
identical TJ geqdsk equilibria: Δ'(2/1), Δ'(3/1), δWp, δWv, δWt all match
within numerical noise away from the ideal pole; median smoothness
residuals beat Fortran on all 6 tracked quantities.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
d-burg and others added 27 commits April 21, 2026 11:18
…ownstream Δ' matrix

Defaults updated for SLAYER/GGJ downstream consumption:
- etol 1e-7 → 1e-10 (equilibrium convergence)
- eulerlagrange_tolerance 1e-7 → 1e-8
- singfac_min 0 → 1e-4 (required non-zero on the parallel path)
- sing_order 2 → 6 (STRIDE convention for Δ')
- use_parallel false → true (unlocks singular/delta_prime_matrix)
- Add set_psilim_via_dmlim + dmlim controls in sing_lim! (Fortran sas_flag
  equivalent) for single-n truncation beyond the outermost rational surface

Test fixes: runtests_slayer_params / runtests_slayer_inputs updated for the
params.f sign convention Q_i = -tauk·ω*_i (both Q's share the same sign
structure; earlier tests held the layerinputs.f Q_i sign flip which we
deliberately do not mirror).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… η in GGJ & SLAYER

Adds a shared Spitzer/Sauter/Redl resistivity closure so GGJ and SLAYER
can both consume the same neoclassical η formula:

- src/Utilities/NeoclassicalResistivity.jl (new): SpitzerModel /
  SauterNeoModel / RedlNeoModel tag types, coulomb_log_e (NRL/Sauter/
  Wesson forms), eta_spitzer (Sauter 1999 Eq. 18a), trapped_fraction
  (Lin-Liu & Miller 1995 full form) + trapped_fraction_eps fallback,
  nu_star_e (Sauter 1999 Eq. 18b), and eta_neoclassical dispatched on
  the model (F₃₃ via Sauter 1999 Eq. 13 or Redl 2021 Eq. 17).

- src/ForceFreeStates/ResistEval.jl: ResistGeometry struct extended with
  avg_B, B_max, B_min, f_trap, R_major, eps_local. Populated inside the
  existing θ-loop at essentially zero cost (one extra integrand + running
  min/max over B and R).

- src/Tearing/InnerLayer/GGJ/LayerInputs.jl: build_ggj_inputs grows
  `resistivity_model::NeoResistivityModel=SpitzerModel()` and
  `lnLambda_form::Symbol=:nrl` kwargs. Uses the shared closure; default
  Spitzer switches from Wesson 1.65e-9·lnΛ form to Sauter-18a (Zeff-aware,
  ~1% agreement at Zeff=1).

- src/Tearing/InnerLayer/SLAYER/LayerParameters.jl + LayerInputs.jl:
  same `resistivity_model` kwarg, plus optional f_trap / nu_e_star /
  R_major_eff / lnLambda_form. Defaults to SpitzerModel() + :wesson so
  legacy SLAYER η is bit-identical. When a neoclassical model is selected,
  build_slayer_inputs pulls f_trap + R_major + eps_local from
  sing.restype if populated, and computes ν*_e via the shared utility.

Validated on DIII-D 147131 @ 2300 ms (ideal example) vs OMFIT
utils_fusion.py and OFT bootstrap.py F₃₃ formulas: max |reldiff|
= 1.8e-16 across all 4 rational surfaces for lnΛ, ν*_e, η_Sp, η_Sauter,
η_Redl, F₃₃(Sauter), F₃₃(Redl). Benchmark lives at
CTM-processing/julia_vs_fortran/neoclassical_resistivity_benchmark/.

In the DIII-D banana regime (q=2,3,4), η_Sauter/η_Sp ≈ 4–5× — the
expected trapped-particle enhancement for H-mode tearing studies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…_prime_raw + pest3_decompose

The STRIDE-BVP Δ' computation already assembles a 2m×2m side-major matrix
dp_raw in compute_delta_prime_matrix! (Riccati.jl:779, ordering
[L_s1, R_s1, L_s2, R_s2, …]), then collapses it to the m×m PEST3 odd-parity
Δ' projection via deltap[i,j] = dp_raw[2i,2j] − dp_raw[2i,2j-1] − dp_raw[2i-1,2j]
+ dp_raw[2i-1,2j-1] (the (L−R)(L−R)^T combination). The A' (even-parity
interchange), B', Γ' (off-parity) blocks are thrown away.

This commit retains the full 2m×2m matrix:
- New ForceFreeStatesInternal.delta_prime_raw field (side-major, byte-
  compatible with Fortran rdcon/gal.f::gal_write_delta top 2msing×2msing
  block of delta_gw.dat; no ½ prefactor per Fortran convention).
- Populated right before PEST3 collapse at Riccati.jl:819.
- Persisted as singular/delta_prime_raw in gpec.h5.
- New pest3_decompose(dp_raw) → (A, B, Γ, Δ) and dprime_outer_matrix
  helpers, matching Fortran rdcon/gal.f:1728-1743 recombination.

Needed for the full det(D' − D(γ)) = 0 tearing+interchange eigenvalue
problem in Phase C. Sanity-checked on DIII-D: pest3_decompose(dp_raw).Δ
matches the existing m×m delta_prime_matrix to 4.6e-14. Cross-check vs
Fortran delta_gw.dat shows pre-existing dpsi^α normalization gap (neither
code writes the Hermitian form; it's applied at use-time). Benchmark
artefacts at CTM-processing/julia_vs_fortran/ggj_coefficients_benchmark/
dprime_raw_crosscheck/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…GGJ parity channel selection

Replaces solve_inner's anonymous SVector{2,ComplexF64} return with a named
struct InnerLayerResponse(tearing, interchange) to eliminate a latent
parity-channel bug and self-document the inner-layer API.

The bug: the old contract said "(Δ_odd, Δ_even)" but the word "odd"/"even"
is used inconsistently across the literature — GWP 2016 labels parity by
the symmetry of the flux W (odd-W = interchange, even-W = tearing), while
Fortran rmatch/deltac.f labels by the velocity+temperature (odd-NΘ = tearing,
even-NΘ = interchange). These give OPPOSITE parity names for the same
physics channel. The GGJ Galerkin solver mirrored deltac.f's end-of-routine
swap (Galerkin.jl:711-712), putting index 1 = interchange. The GGJ Shooting
solver mirrored deltar.f, putting index 1 = interchange. SLAYER put its
pressureless tearing Δ at index 1. Meanwhile Dispersion/Coupled.jl:96 and
Dispersion/SurfaceCoupling.jl:46 hardcoded [1] — so for SLAYER surfaces
they correctly picked the tearing channel, but for GGJ surfaces they
silently picked the INTERCHANGE (Glasser-stabilization) channel instead of
the tearing drive. Any GGJ multi-surface dispersion scan run prior to this
commit was solving the wrong eigenvalue problem.

Fix:
- New InnerLayerResponse struct with physics-named tearing/interchange fields.
- GGJ Galerkin: removed the deltac.f swap; isol=1 (W'(0)=0 → W even, sheet
  current, tearing) maps to .tearing; isol=2 (W(0)=0 → W odd, non-reconnecting)
  maps to .interchange. Per-solver parity derivation documented in BC comments.
- GGJ Shooting: traced match/matrix.f::matrix_layer sign-symmetric vs
  sign-antisymmetric constraints to confirm deltar(1)=interchange, deltar(2)=
  tearing; remapped _delta_from_c0 output into named fields accordingly.
- SLAYER: pressureless Fitzpatrick has no interchange channel →
  InnerLayerResponse(Δ, 0).
- Dispersion/Coupled.jl + SurfaceCoupling.jl: replaced solve_inner(...)[1]
  with solve_inner(...).tearing at both call sites.
- 6 test files updated: synthetic test models return InnerLayerResponse;
  real SLAYER/GGJ callers use .tearing. 200+ tests pass; 2 pre-existing
  slayer_riccati failures (D_norm threshold drift, unrelated to parity
  refactor) verified by git-stash bisection.

Naming: chose tearing/interchange per user decision — more self-documenting
than odd/even which depends on whose parity convention you're reading.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…matrix

Companion to the m×m MultiSurfaceCoupling (tearing-only) that was shipped
earlier in the perf/slayer-growthrates branch. CoupledFull generalizes to
the full Pletzer-Dewar 1991 / GWP 2016 tearing+interchange eigenvalue
problem needed to include Glasser stabilization in the GGJ model.

Structure:
- MultiSurfaceCouplingFull holds a 2m×2m D' matrix in parity-major
  ordering [[A' B'] [Γ' Δ']], a per-surface Vector{SurfaceCoupling},
  reference-surface index, and msing_max truncation. Built via
  multi_surface_coupling_full(surfaces, dp_full; ref_idx, msing_max).
- Evaluation mc(Q) subtracts a 2m×2m block-diagonal D(γ) with
  interchange-channel response on the upper-left m diagonal and
  tearing-channel response on the lower-right m diagonal. Each
  channel rescaled by per-surface tauk_ref/tauk_k and sc.scale; sc.dc
  critical offset subtracted from the tearing channel only.

Tests (20): constructor validation, pressureless SLAYER-like reduction
to det(A')·det(Δ'−Δ_t) via block-diagonal outer, Schur-complement
identity for the full coupling case, Q-rescaling via tauk ratios,
interchange-channel physical activation, dprime_outer_matrix round-trip
against pest3_decompose, msing_max truncation preserves parity-block
structure.

Paired with a Julia↔Fortran inner-layer GGJ Galerkin benchmark (at
CTM-processing/julia_vs_fortran/inner_layer_benchmark/) that runs
rmatch's deltac_run qscan on the DIII-D resistive example and the
matching Julia solve_inner(GGJModel(:galerkin), ...) at identical
(E,F,G,H,K,M,τ_A,τ_R,v1) inputs and Q grid. The benchmark finds a
uniform 2.10× factor Julia/Fortran across BOTH channels and ALL Q
(not a pole/convergence artifact) — to be investigated as a follow-up;
the eigenvalue problem topology is insensitive to this uniform factor
so the CoupledFull machinery is usable as-is for root finding via
contour-intersection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…matrix

Adds MultiSurfaceCouplingFortran — a literal Julia port of Fortran
rmatch/match.f::match_delta (fulldomain=0 branch). This is the full
Pletzer-Dewar 4m×4m tearing+interchange coupled dispersion matrix, with
the inner-layer amplitudes d^j_± kept as explicit DOFs alongside the
outer-region amplitudes C^j_{L,R}, coupled by the ±1 matching identity
    C^j_L =   d^j_+ − d^j_-
    C^j_R = −(d^j_+ + d^j_-)

Motivation: the naive 2m×2m form det(D' − diag(Δ_int, Δ_tear)) = 0
(shipped earlier as CoupledFull) is structurally incorrect because
D' lives in the (L,R) side-major basis while the inner-layer output
(Δ_tearing, Δ_interchange) lives in the (+,-) parity basis. The two
cannot be subtracted directly without an explicit basis transform
(Wang-Glasser-Brennan-Liu-Park 2020, Phys. Plasmas 27, 122503,
Eq. 11a-11d). Fortran rmatch avoids the transform by keeping both sets
of amplitudes alive in a 4m-DOF linear system. This commit mirrors that
choice.

Validation on DIII-D resistive example (n=1, msing=4):
- Julia 4m×4m |det| ∈ [4.6e31, 3.5e39] vs Fortran rmatch
  [4.0e32, 6.3e36] — same order of magnitude in the same regions.
- Same dipolar pole structure at origin, same green/magenta contour
  sign-change network in both codes. Julia shows some extra contour
  noise in the lower half-plane consistent with the known uniform
  2.10× inner-layer factor + STRIDE-BVP vs Galerkin outer-solve drift
  (both documented in CTM-processing/julia_vs_fortran/
  inner_layer_benchmark/FINDINGS.md).

CoupledFull (2m×2m) stays untouched — it remains exported for reference
and its 20 tests still pass, but its determinant values should not be
used for physical root finding. Use multi_surface_coupling_fortran for
that.

The patched Fortran rmatch (match_detgrid subroutine added for
apples-to-apples grid scans) lives in ../GPEC/rmatch/match.f in the
user's local tree; not part of this commit.

26 new unit tests in runtests_dispersion_coupled_fortran.jl covering
constructor validation, 1-surface 4x4 hand-verified determinant,
2-surface Fortran-assembly equivalence, Q rotation shift, scale
factor, msing_max truncation, pressureless (SLAYER-like) smoke test,
GGJ-like m=3 smoke test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds an `inner_kwargs::NamedTuple` field to `MultiSurfaceCouplingFortran`
so callers can forward Galerkin grid-tuning parameters (pfac, xfac, nx, nq)
to `solve_inner` at every Q evaluation. Matches the Fortran rmatch
`&DELTAC_LIST` namelist convention and enables apples-to-apples Julia↔
Fortran dispersion comparisons.

Added test verifies the kwarg reaches solve_inner. All 31 existing
CoupledFortranMatch tests continue to pass.

Context: investigation of the apparent 2.091× Julia↔Fortran discrepancy
on DIII-D GGJ inner-layer output revealed it was a **benchmark
configuration error**, not a code bug. Fortran rmatch rescales τ_R by
η_rdcon/η_user at match.f:212-213 (a deliberate optimization for the
η-scan workflow — lets users rerun rmatch at different resistivity
without redoing rdcon). When our Julia benchmark drivers fed the raw
τ_R from delta_gw.dat into GGJParameters, they were comparing Julia at
the "rdcon resistivity" to Fortran at the rmatch.in resistivity. Fix:
set rmatch.in::eta to match the value baked into delta_gw.dat. With
matched eta, Julia↔Fortran agree to 0.4% across all Q and both channels,
with clean 4m×4m determinant agreement in the detgrid benchmark (192×192
narrow-box scan, |det| ranges overlap to < 0.5%).

Benchmark updates (in CTM-processing sibling repo, untracked):
- run_fortran_deltac_qscan.py + run_fortran_detgrid.py: eta forced to
  match delta_gw.dat (5.089e-9)
- compare_detgrid.py: SLAYER-convention axes (growth on y, rotation on x)
  and 3-panel layout (Fortran 4m×4m, Julia 4m×4m, Julia m×m — dropped
  the CoupledFull 2m×2m since it was shown to be structurally wrong).
- FINDINGS.md: full write-up of the eta-rescale root cause.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Overhaul of `build_slayer_inputs` + `solve_inner(::SLAYERModel{:fitzpatrick})`
so that Julia and Fortran SLAYER produce identical coupled-dispersion
det(Q) scans at every plot-frame Q, on the same (geqdsk, kinetic file,
slayer.in namelist) inputs. Verified by quantitative 4-hypothesis test
at TJ ε=0.001 and β=0.1 benchmark cases:

  hypothesis                              median Re    median Im
  J(Q) ~ F(Q)   identity                    +1.01        +1.02     <- eps
  J(Q) ~ F(Q)   identity                    +0.99        +1.01     <- beta
  (the three reflection hypotheses all give off-axis ratios)

Before this patch the eps_0.001 ratio was (+1.10, -0.98) — a clean
Im-axis reflection in Riccati p-space that produced a visually
"flipped-about-ω=0" magenta (Im det=0) contour despite all normalized
SLAYER parameters (τ_k, S, D_norm, P_perp, P_tor, Q_e, Q_i, d_beta)
matching Fortran to <1%.

### `LayerInputs.jl::build_slayer_inputs`

Four new kwargs + internal ω_*e/ω_*i computation (port of Fortran
`slayer/layerinputs.f:456-459`):

  * `bt`                          now also supports a scalar override
    in addition to a callable or `nothing` (F-spline default).
  * `R0 = nothing`                override magnetic-axis R; default
    `equil.ro`. Lets the benchmark driver pass the geqdsk RMAXIS
    literal so both codes use the same reference axis.
  * `rs_method = :midplane`       keeps original θ=0 outboard-midplane
    chord behaviour by default; `:fsa` activates a θ-mean of
    √rzphi_rsquared that matches Fortran STRIDE's `issurfint` /
    `a_surf` flux-surface-averaged minor radius.
  * `z_i = 1.0`                   ion charge for the diamagnetic
    formula; hardcoded to 1 for main D ion in Fortran
    `layerinputs.f:399`.
  * `compute_omega_star = true`   when `true`, per-surface ω_*e / ω_*i
    are re-derived from cubic-spline derivatives of (n_e, T_e, T_i)
    carried in `profiles`, using χ₁ = 2π·equil.psio and the formulae

        ω_*e =  (2π/χ₁)·(T_e·dn_e/dψ / n_e + dT_e/dψ)
        ω_*i = -(2π/(z_i·χ₁))·(T_i·dn_e/dψ / n_e + dT_i/dψ)

    (the main-ion density is taken equal to n_e by quasi-neutrality,
    matching the gpeckf staging convention and Fortran's kin%f(1)
    after read_kin). Fortran's elementary-charge `e` cancels when
    T_e, T_i are in eV and dT/dψ is scaled by e, giving the
    equivalent form above. Setting `compute_omega_star=false`
    preserves the legacy behaviour where `profiles.omega_e` and
    `profiles.omega_i` are used as-is (for backward compatibility).

### `Riccati.jl::solve_inner(::SLAYERModel{:fitzpatrick})`

Replaced `Q_c = ComplexF64(Q)` (raw pass-through) with the Wick-
rotation+conjugate:

    Q_c = im * conj(ComplexF64(Q))

Fortran `slayer/growthrates.f:337,340` applies `g_tmp = q_in * ifac`
with `ifac = (0, +1)` (from `sglobal.f:105`). The algebraically
natural Julia port would be `Q_c = Q * im`, but empirically that
gives `Julia_det(Q) = Fortran_det(-Q)` (180° rotation), and
`Q_c = Q * (-im)` gives `Julia_det(Q) = Fortran_det(-conj(Q))`
(Im-axis reflection). The form `im * conj(Q)` substitutes into
Julia's Riccati so that `-conj(Q_c) = im·Q` — matching Fortran's
internal `g_tmp` — and yields identity. Root cause of the residual
Im-axis reflection in Julia's Riccati (suspected: branch selector
in `_riccati_f_initial` large-D vs small-D regime, or in the
asymptotic `W_bound` sign convention) is not yet identified and
is tracked in `~/Desktop/plasma/CTM-processing/CONVENTIONS.md`
§4 TODO. Once found, `Q_c = Q * im` should be restored to match
Fortran's `ifac` literally.

### Upstream fixes that unblocked this

Prior attempts to resolve Julia↔Fortran SLAYER disagreement stalled
on three issues that this patch exposes and resolves cleanly:

  1. `equil.config.b0exp` (which the benchmark driver was passing
     as `bt`) is a TOML normalization constant (default 1.0, user-
     set 2.0), **not** the geqdsk BCENTR. With `bt` now acceptable
     as a scalar kwarg, the benchmark driver feeds the geqdsk
     BCENTR literal directly; τ_k J/F ratio went from 5.12×
     (ε=0.001) / 21.5× (β=0.1) to 1.0009 / 1.0070.
  2. `equil.ro` is the GS solver-found axis R, not the geqdsk
     RMAXIS header value. The new `R0` kwarg lets the driver
     pass the literal so both codes use the same axis reference.
  3. Julia's `surface_minor_radius(..., theta=0)` is outboard-
     midplane only, not flux-surface-averaged. Fortran STRIDE's
     `a_surf` IS flux-surface-averaged. The new `rs_method=:fsa`
     aligns the conventions.

After these three plus the Wick-rotation+conjugate, all SLAYER
normalized params agree sub-percent across both test cases and
the coupled-dispersion panels are pixel-level identical between
Julia and Fortran.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…erkin scratch buffers

Two performance-motivated changes that came out of the
julia_vs_fortran benchmark work. Both preserve numerical output
exactly (no behaviour change beyond thread-scheduling nondeterminism
in the residual evaluations, and even that is serialised before
cache insertion so the final result set is deterministic).

### `ContourSearchAMR.jl::amr_scan`

Added `parallel = Threads.nthreads() > 1` kwarg and a bulk-eval
helper `_bulk_eval_into_cache!` that:

  * partitions the set of Q-values needed this phase into
    already-cached vs new (keeps uniqueness),
  * evaluates all new points via `Threads.@threads` when
    `parallel=true` and more than one Julia thread is available,
  * pushes the results into the shared `Dict{ComplexF64,ComplexF64}`
    cache serially afterwards so no Dict data races occur.

Used in both the initial nre0 × nim0 coarse-grid phase and in each
refinement pass. The per-call evaluation of `f` (typically a
`MultiSurfaceCoupling` or `MultiSurfaceCouplingFortran` closure) is
thread-safe because each invocation constructs its own per-surface
solver state — the only shared mutable state is the cache, which
the helper handles serially. Deterministic output regardless of
thread count.

On the 100×100 + 4-pass benchmark scan this cut Julia SLAYER AMR
from ~60s to ~15s on an Apple M2 Max (8 threads).

### `GGJ/Galerkin.jl::GalerkinWorkspace` + `_assemble_and_solve!`

Added five preallocated scratch buffers to `GalerkinWorkspace`
(`cell_mat_buf`, `cell_mat_ext_buf`, `cell_rhs_ext_buf`, `ab_buf`,
`rhs_buf`) sized to the max case (`np+1=4`) used at any cell type,
and re-use them via `fill!(buf, 0)` inside the per-cell loop.
Previously each cell called `zeros(ComplexF64, ...)` which
accumulated thousands of MiB of allocations over a full dispersion
scan.

Same numerical output; the cell-matrix sub-slices are explicitly
zeroed before use and smaller cells (e.g. `CT_EXT` with
`cell.np=1`) rely on the remaining buffer elements staying zero
from the previous `fill!` call.

Measured on the TJ ε=0.001 benchmark (nx=256, cutoff=20, tol_res=1e-7,
msing=2): Galerkin det evaluation dropped from ~4.2 MiB allocs / call
to ~30 kiB / call, with a corresponding 20-25% wall-time reduction
in the GGJ AMR scan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ar coupled residual

`MultiSurfaceCouplingFortran` (aka the 4m×4m Pletzer-Dewar tearing+
interchange dispersion matrix, port of Fortran `rmatch/match.f::match_delta`
fulldomain=0 branch) was adding `+ sc.dc` to BOTH the inner-layer
interchange and tearing Δ channels before assembling the coupled matching
block:

    # CoupledFortranMatch.jl, before:
    delta1 = resp.interchange * sc.scale + sc.dc   # WRONG
    delta2 = resp.tearing     * sc.scale + sc.dc   # WRONG

The code comment claimed this was "per the Fortran convention (χ_parallel
shift that acts on the outer diagonal before matching)." That is NOT in
Fortran — `match.f:508-519` assembles the fulldomain=0 block directly from
the raw `delta1 = deltar(ising, 1)` / `delta2 = deltar(ising, 2)` with no
Δ_crit offset anywhere:

    ! Fortran match.f (fulldomain=0):
    delta1 = deltar(ising, 1)
    delta2 = deltar(ising, 2)
    mat(idx3, idx3) = -delta1
    mat(idx3, idx4) =  delta2
    mat(idx4, idx3) = -delta1
    mat(idx4, idx4) = -delta2

The Δ_crit proxy represents a slab-layer χ_parallel-matching correction
and is meaningful only for tearing-only models like SLAYER (which drops
the interchange channel and needs a proxy for the missing Glasser/
Mercier stabilization). GGJ's 4m×4m Pletzer-Dewar matching already
includes the interchange channel explicitly (`resp.interchange`), so
adding `sc.dc` double-counts that physics.

### Fix

1. `CoupledFortranMatch.jl:179-180`: drop `+ sc.dc` on both channels.
   delta1 / delta2 are now the raw inner-layer outputs, matching
   match.f:508-519 bit-for-bit.

2. `SurfaceCoupling.jl`: remove the `dc::Real=0.0` kwarg from
   `surface_coupling(model::GGJModel, ...)`. The SLAYER and generic
   overloads still accept it — SLAYER genuinely needs it for its
   slab-layer Δ_crit subtraction. The `SurfaceCoupling.dc` struct field
   is hard-wired to 0 for GGJ callers, making the API reflect the
   physics.

### Tests

- `test/runtests_dispersion_coupled.jl`: 42 / 42 pass
- `test/runtests_dispersion_residual.jl`: 20 / 20 pass
  (Both test files construct `surface_coupling(GGJModel, ...)` with
  positional args only — no call sites broken.)

### Impact

For the julia_vs_fortran benchmark, this is a no-op when the driver was
already passing `dc=0.0` for GGJ (the safe default we settled on earlier
in the session). The fix prevents the footgun of anyone else accidentally
passing a nonzero `dc` to a GGJ coupling and getting physically wrong
results.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… and v1

GGJ:
- LayerInputs.jl: changed `v1 = 1.0` placeholder to
  `v1 = rg.v1_local / equil.params.volume`. This is the dV/dψ
  normalization that `rescale_delta` consumes as `v1^(2*p1)` to
  convert raw Galerkin Δ to outer-region matching units. Matches
  Fortran resist.f:144 (`sing%restype%v1 = v1/volume`) and match.f:1078
  (`deltar = deltar * sfac**(2*p1/3) * v1**(2*p1)`). Previously, on
  realistic shaped equilibria where v1_local/volume != 1, Julia's GGJ
  Δ disagreed with Fortran by `(v1_local/volume)^(2*p1)`. Analytical
  TJ/Solovev cases hid the bug because v1_local/volume happens to
  hover near unity there.

SLAYER:
- LayerInputs.jl: changed `dr_val = 0.0` default to `dr_val = nothing`.
  When `nothing` is passed, build_slayer_inputs auto-derives the
  per-surface resistive interchange index `D_R = E + F + H²` from
  `sing.restype` (already populated by `resist_eval_all!`). Without
  this, the slayer_panels benchmark driver was reading a scalar
  dr_val=-0.1 from a Fortran namelist and applying it uniformly to
  every surface, producing dc_tmp values that didn't match Fortran's
  per-surface STRIDE-derived values. With `nothing` default, dc_type
  in {:lar, :rfitzp, :toroidal} now produces a non-zero per-surface
  dc_tmp without manual configuration. dgeo_val behaves analogously
  but errors clearly if dc_type=:toroidal is requested without an
  explicit value (auto-derive needs ⟨|∇ψ|²⟩ FSA which isn't yet
  exposed in ResistGeometry — TODO).

NOTE on Fortran/STRIDE divergence: Julia uses D_R correctly per
Connor-Hastie-Helander 2015 (PPCF 57 065001) Eq. 59. Fortran STRIDE
has a one-character bug in stride_netcdf.f:100 — `dr_rationals(i) =
locstab%f(1)/respsi` uses index 1 (= D_I, the Mercier criterion)
instead of index 2 (= D_R, the resistive interchange). Julia and
Fortran will therefore disagree on dc_tmp magnitude by ~D_I/D_R per
surface (~3-4× on DIII-D) until that upstream Fortran bug is fixed.
The disagreement is documented at the build_slayer_inputs docstring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rowth_rates

Adds `pole_threshold_adaptive::Bool = false` to SLAYERControl. When true,
`run_slayer_from_inputs` overrides `control.pole_threshold` per scan with
`|mean(Δ)|` over the dispersion-residual array before calling
`find_growth_rates`. Backward-compatible (default false uses the literal
`pole_threshold`).

Justification: the hardcoded default `pole_threshold=10.0` is too
restrictive when |Δ| spans 8+ orders of magnitude (typical for SLAYER
coupled-dispersion scans). All intersections then get classified as
poles and zero roots are returned. The adaptive recipe — empirically
matching the Python `10·median(|Δ|)` heuristic and the omfit
`|mean(Deltas_AMR)|` recipe — yields the correct root identification on
the DIIID benchmark and TJ βₚ scan cases (verified at βₚ=0.1
coupled_rfitzp: 6 roots / 8 poles vs 0 roots with the static threshold).

Plumbing changes:
  - Control.jl: new field + docstring
  - HDF5Output.jl: written to /slayer/settings/pole_threshold_adaptive
  - run_slayer.jl: `_pole_threshold_for(scan)` closure dispatches per-scan
  - Runner.jl: import Statistics.mean
… default 1 (serial) eliminates DIII-D 147131 thread-race

The parallel BVP path in `parallel_eulerlagrange_integration` was always invoking
`Threads.@threads :static` over the FM chunks, ignoring the `parallel_threads`
field on `ForceFreeStatesControl`. On numerically delicate equilibria (e.g.
DIII-D 147131 at βₚ ≈ 0.07) this exposed a sub-tolerance nondeterminism: chunk
crossings whose post-jump matrices depend on the order of independent FP
operations across threads, producing intermittently divergent FM matrices and
intermittent BVP failures. The algorithm is correct; the wall-time interleaving
of parallel chunks was perturbing it within tolerance.

Fix:
  * `Riccati.jl`: branch on `bvp_threads = clamp(parallel_threads, 1, julia_nthreads)`.
    `bvp_threads == 1` runs the chunks serially on the calling thread (race-free,
    bit-deterministic). Otherwise, the existing `:static` parallel path is used.
  * `ForceFreeStatesStructs.jl`: document `parallel_threads` semantics, default `1`,
    and the cost (~14% slower than 2-thread on DIII-D 147131 reference).

Verified: with `parallel_threads = 1` (default) and `JULIA_NUM_THREADS = 2`, the
DIII-D 147131 βₚ=0.07 reference Δ' diagonal matches CONVENTIONS.md §6 exactly:
  q=2: +7.92 - 0.03i
  q=3: -5.24 - 0.30i
  q=4: -40.20 + 209.91i
  q=5: +126.6 - 169.24i
in 54.5 s wall (single 4-singular-surface coupled BVP). No regressions on TJ.

Production scans should keep the default; users with robust equilibria and
strict wall-time budgets can opt in to `parallel_threads > 1` knowing the trade-off.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…BVP speedup; bit-identical Δ' in 15-trial DIII-D 147131 sweep)

Empirical reliability sweep on DIII-D 147131 βₚ≈0.07 (5 trials at each of
parallel_threads ∈ {1, 2, 4}, JULIA_NUM_THREADS=4, post-JIT, single Julia
session) showed:

  parallel_threads | wall (avg, single 4-singular-surface coupled BVP)
  -----------------|-------------------------------------------------
  1 (serial)       | 9.25 s  — bit-deterministic by construction
  2                | 7.37 s  — bit-identical Δ' in all 5 trials  (+20.3%)
  4                | 7.51 s  — bit-identical Δ' in all 5 trials  (+18.9%)

Δ′ diagonals were bit-identical across all 15 trials and matched the §6
reference values exactly. Speedup saturates at 2 threads — the BVP has
~10 FM chunks, so 2 threads is enough to amortize them; 4 adds scheduling
overhead with no benefit on this BVP.

Bumping default to 2 captures the ~20% wall-time win on production scans.
The serial path remains available (`parallel_threads = 1`) as a deterministic
fallback if the historical intermittent race re-manifests on a delicate
equilibrium. Documentation in `ForceFreeStatesControl` docstring updated to
record the trade-off and the empirical reliability data.

Use `parallel_threads = 1` (NOT `use_parallel = false`) if a parallel run
ever diverges — `use_parallel = false` produces silently wrong Δ' values
(see CONVENTIONS.md §7).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…aster)

The Fitzpatrick `riccati_f` ODE is a 1-equation system. The prior code
modeled `W` as a 1-element `Vector{ComplexF64}` with an in-place RHS
(`_riccati_f_rhs!(dW, W, params, x)`); every Rosenbrock stage allocated
fresh `dW` intermediates. Converting `W` to a `ComplexF64` scalar with an
out-of-place RHS removes those per-stage heap allocations and lets stage
updates stay on the stack.

Per-call benchmark (1000 calls, Rodas5P, identical inputs):
   vector form:  1.62 ms / call
   scalar form:  0.96 ms / call    (41% faster)

Signature changes:
   _riccati_f_rhs!(dW, W, params, x) -> nothing
       --> _riccati_f_rhs(W::Number, params, x) -> ComplexF64
   _riccati_f_jac!(J, W, params, x) -> nothing
       --> _riccati_f_jac(W::Number, params, x) -> ComplexF64
   solve_inner ODE state:
       u0 = ComplexF64[W_bound];   ODEFunction{true}(...)
       --> u0 = ComplexF64(W_bound); ODEFunction{false}(...)

Solver-agnostic. Rodas5P stays the default. The change works equally well
under any OrdinaryDiffEq stiff solver (Rosenbrock / SDIRK / BDF) since
they all support scalar `u0` via the out-of-place form.

Validation (against the temporary baseline at SLAYER_coupling_paper/
regression_temporary/, 88 TJ records frozen pre-change):

   TJ uncoupled_2over1_rfitzp at βₚ=0.001
       γ baseline   = +4.0552247503e+00 kHz
       γ scalar     = +4.0551819762e+00 kHz
       relative drift = 1.05e-5         (within solver-replacement noise)

   TJ coupled_rfitzp at βₚ=0.07          (exercises full BVP path)
       γ baseline   = -8.1071602485e-03 kHz
       γ scalar     = -8.1071881463e-03 kHz
       relative drift = 3.44e-6
       n_valid_roots = 26, n_poles = 27  (exact match to baseline topology)

   check_regression.py --dry --scope tj : 88/88 pass (5e-4 abs/rel
   tolerance on integrator outputs, exact match on topology fields).

Production wall-time on the coupled-BVP case:
   baseline (vector form):  ~14 min (slowest of 4 parallel cases per βₚ)
   scalar form:             ~10 min  (~29% reduction)

In contrast to the prior KenCarp4 solver-swap attempt (commit 5a9026a8,
reverted as 2b1e1b0f), which looked like a 38% per-call win in synthetic
tests but came out 17% SLOWER in production, this change shows consistent
gains from per-call benchmark through to full production scan. The reason
the wins translate cleanly: the scalar form makes the existing solver
faster without changing its convergence path or step-control behaviour,
so production characteristics scale linearly from the micro-benchmark.

The companion KenCarp4 swap stays deferred (tracked in todos) until we
have direct production-side per-Q timing instrumentation to understand
the bench/production discrepancy.

Test infrastructure also committed:
   profiling/profile_slayer_amr.jl              CPU + alloc profile harness
   profiling/test_riccati_solver_convergence.jl 7-solver convergence sweep
…tring

Empirical finding from Phase 2.5 of the AMR speedup work: sub-percent
floating-point differences between ODE solvers cascade through the AMR's
zero-crossing flagging and produce structurally different cell trees,
not just numerically-noisy Δ values.

Concrete observation on TJ coupled_rfitzp at βₚ=0.07 under the scalar
ODE form (commit b17e0b43):

  Solver     SLAYER wall   γ                 valid_roots  poles
  Rodas5P    ~10 min       -8.107e-3 kHz     26           27
  KenCarp4    ~9 min       -8.107e-3 kHz     43           34

KenCarp4 is per-call faster (consistent with the convergence-test
results), but its slightly different Δ at AMR cell corners flips many
"refine" / "no-refine" decisions and lands on a substantially different
final cell list. The most-unstable root (γ) agrees to 2.1e-5 relative,
but the inventory of secondary roots and poles differs by ~17 / ~7.

Implication: solver swaps are NOT pure per-call optimizations. Future
attempts need to be validated against the topology fields
(`n_valid_roots`, `n_poles`), not just γ. The temporary regression
harness at SLAYER_coupling_paper/regression_temporary/check_regression.py
already treats these as exact-match fields, which correctly gates
solver swaps. The 92-record baseline serves as a topology fingerprint.
…30% additional per-call speedup)

The Fitzpatrick `riccati_f` ODE coefficients fA, fA', fB, fC use parameters
(Q, Q_e, Q_i, P_perp, P_tor, D_norm, iota_e) that are CONSTANT across the
integration. The prior code recomputed `Q*(Q+iQi)`, `Q+iQe`, `D²·iota_e⁻¹`
etc. at every RHS evaluation — tens of thousands of redundant multiplications
per `solve_inner` call.

This commit lifts the x-independent quantities into a `_RiccatiConsts`
struct built once per `solve_inner` call:

   Q_plus_iQe         constant part of denom = (Q + iQe + x²)
   A = Q · (Q + iQi)                      fB constant term
   B = (Q + iQi)·(P_perp + P_tor)         fB · x² coefficient
   C = P_perp · P_tor                     fB · x⁴ coefficient
   E = P_perp + (Q + iQi)·D²              fC · x² coefficient
   G = P_tor · D² / iota_e                fC · x⁴ coefficient

The hot RHS (`_riccati_f_rhs`) and Jacobian (`_riccati_f_jac`) now access
only the bundled constants and `x`, doing ~3 muls + 1 division per call
instead of ~10 muls + 2 divisions.

Per-call benchmark (1000 calls, Rodas5P, identical inputs):
   prior (scalar form, post b17e0b43):  0.96 ms / call
   precompute (this commit):            0.67 ms / call    (-30% per call)
   cumulative vs vector-form baseline:  1.62 → 0.67 ms    (-59%, 2.42× faster)

Validation against the temporary baseline at SLAYER_coupling_paper/
regression_temporary/:

   TJ coupled_rfitzp at βₚ=0.07          (full BVP path)
       γ baseline   = -8.1071602485e-03 kHz
       γ precompute = -8.1071881463e-03 kHz
       relative drift = 3.44e-6     (same as scalar-only Phase 2.3 baseline)
       n_valid_roots = 26, n_poles = 27   (exact match to baseline topology)

   check_regression.py --dry --scope tj : 88/88 pass

Production wall on TJ coupled_rfitzp at βₚ=0.07:
   vector-form baseline:           ~14 min
   scalar form (Phase 2.3):        ~10 min
   scalar + precompute:             ~9 min   (~36% cumulative reduction)

The active SLAYER step alone is now ~41% faster than baseline. Production
wall scales sub-linearly because main() / find_growth_rates / file-write
overheads remain unchanged.

Implementation note — algebraic simplification rejected:
A natural further optimization is `fA' = 1 − 2·fA` (algebraic identity:
(denom − 2p²)/denom = 1 − 2·(p²/denom) = 1 − 2·fA). It saves one complex
division per call. However, when tested, the integrator's adaptive
stepping near marginal stability compounded ULP-level differences in fA'
across thousands of steps, producing ~3e-3 relative γ drift versus this
form's 3e-6. The drift was within the regression's abs-tolerance gate but
still a real precision regression. Reverted — kept the explicit
`(denom − 2·p²)/denom` form, which preserves bit-identical Δ at warm
benchmark points vs the scalar-form baseline.
…tion kwargs

Two additive kwargs to support convergence-vs-resolution studies and
graceful behaviour when the cell-count safety rail is hit:

  snapshot_callback::Union{Nothing,Function} = nothing
      If provided, called at the end of each AMR pass (and once for the
      initial grid, pass=0) with arguments
        (pass::Int, cells::Vector{AMRCell}, cache::Dict{ComplexF64,ComplexF64}).
      The callback receives live references; copy if persistence is needed.
      Used by convergence studies to extract intermediate γ at each pass
      count from a SINGLE AMR run (avoids re-running for every target pass).

  max_cells_action::Symbol = :error
      :error (default, prior behaviour) raises when length(cells) > max_cells.
      :warn_truncate logs a @warn, stops further refinement in the current
      pass, and exits the outer pass loop — leaving a usable AMRResult with
      the partial cell tree. Useful for resolution-sweep studies that
      deliberately push max_cells to bound runtime.

Backward compatibility: defaults preserve the exact prior behaviour.
Validated via regression rerun of TJ coupled_rfitzp at βₚ=0.07
(88/88 pass, γ + topology bit-identical to pre-change baseline).
… passes) study

Driver for the Phase 2.8 convergence study, sweeping AMR initial-grid
resolution and refinement-pass counts to identify the cheapest (nre0,
passes) tuple that hits a γ-convergence target. Uses the new
`snapshot_callback` kwarg (commit f59dcaee) so a SINGLE AMR run captures
γ at every intermediate pass count — avoiding 4× the runs that re-running
per pass would require.

Sweep on TJ coupled_rfitzp at βₚ=0.07, three SLAYER configurations on
the same equilibrium (q=2 uncoupled, q=3 uncoupled, full coupled),
Q_HW=±25 kHz, max_cells=1M with `:warn_truncate` graceful early-stop:

   case               γ_ref(200,5)   min (nre0, pass)   AMR wall
   uncoupled_2over1     -0.03793 kHz    (25, 4)           40 s
   uncoupled_3over1     -0.13069 kHz    (25, 3)           46 s
   coupled              -0.00816 kHz    (25, 5)          187 s

Convergence target: |γ - γ_ref| < max(5e-5, 0.005·|γ_ref|).

Key finding: AMR wall scales primarily with INITIAL grid size (nre0²),
not pass count. The (25, 8) config is FASTER than (200, 5) — starting
from a coarse grid and refining further is cheaper than starting fine
and stopping sooner, because per-pass work scales with the current cell
count which grows from a smaller base.

Recommendation for production defaults:
   uncoupled (any):  nre0 = 25, max_passes = 4
   coupled:          nre0 = 25, max_passes = 5

Compared to current production defaults (nre0=100, passes=4-5), this
gives an additional ~10-20% wall reduction on top of the per-call
optimizations from Phase 2.3 / Phase 2.7.

Plots committed externally:
   /tmp/convergence_curves.png      γ vs pass per case (4 nre0 lines)
   /tmp/convergence_resolution.png  γ at max_pass vs nre0 (3 case lines)
…creen for active boxes

Adds `multi_box_amr_scan` to ContourSearchAMR.jl: run `amr_scan` over multiple
Q-plane boxes with a coarse pre-screen step that skips inactive boxes
entirely. Motivated by the three-stripe ω-axis scan for SLAYER coupled
dispersion: rather than refining one wide ±75 kHz × ±25 kHz box, we split
into three 50 kHz × 50 kHz stripes (centred on the γ=0 axis) and only run
the AMR on stripes that show activity.

A box is flagged ACTIVE if any pre-screen cell satisfies AT LEAST ONE of:
  - sign change in Re(Δ) across its 4 corners (zero-isoline of Re(Δ) crosses
    the cell — root candidate);
  - sign change in Im(Δ) across its 4 corners (root candidate);
  - any corner with |Δ| ≥ pole_magnitude_threshold (likely pole — sign-only
    criteria miss tight poles whose fringe doesn't straddle a corner).

The pole-magnitude criterion is essential for catching poles tucked inside a
pre-screen cell that happens to sample the same sign-lobe at all four corners.

Default pre-screen resolution is 25×25, matching the typical AMR initial
grid — coarser misses small features; finer wastes evaluations on inactive
boxes.

Adds:
  - `BoxActivity` enum (`NoActivity`, `ReZeroCrossing`, `ImZeroCrossing`,
    `PoleMagnitude`)
  - `_check_box_activity` helper (returns first satisfied criterion)
  - `MultiBoxAMRResult` struct (per-box `AMRResult` + aggregated
    cells/Q/Δ + per-box activity reasons + pre-screen eval count)
  - `multi_box_amr_scan(f, boxes; pole_magnitude_threshold, ...)`
  - `as_amr_result(::MultiBoxAMRResult) -> AMRResult` for direct
    consumption by `find_growth_rates`

Tests added in test/runtests_dispersion_amr.jl (3 new testsets, 19 @test
calls covering: 3-box stripe with zero/pole/empty boxes, sharp-pole
synthetic exercising the magnitude criterion, argument validation).
49/49 dispersion-AMR tests pass.

TODO follow-ups:
- Thread a shared cache through `amr_scan` so pre-screen evals aren't
  re-evaluated by the per-box AMR initial pass on active boxes (saves
  ~676 redundant evals per active box).
- Wire into the SLAYER driver (`Tearing.Runner`) so the user-facing
  betascan/diiid/etc. drivers can opt into multi-box layouts without
  manual pole_magnitude_threshold tuning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… via concavity + γ-gap, with secondary-root fallback

The existing `filter_outside_re` gate only triggered when the Re(Δ)=0 contour
was approximately closed at the candidate intersection (closure_gap < 10% of
contour extent). On scans where the spurious upper-branch root sits at the
edge of the Q box (so the Re=0 contour exits the box and is not closed at the
candidate), the gate fell open and the spurious high-γ root was selected as
"least-stable" — producing γ values that visibly exceed the physical eigenmode
cluster (observed on coupled DIII-D 147131 where the algorithm selected
γ=+18.6 kHz instead of the physical γ≈+0.4 kHz).

Adds two new geometric/algorithmic checks that do NOT depend on the Re=0
contour being closed:

  - `:geom`: Re(Δ)=0 is locally downward-concave at the candidate AND the
    Im(Δ)=0 tangent at the candidate exits at angle > `angle_threshold_deg`
    from horizontal (default 45°). The concavity test uses a turn-direction
    cross product that's invariant under polyline traversal direction.
  - `:gap`:  the candidate is unstable (γ > 0) AND its γ exceeds the next
    candidate's γ by more than `gap_kHz_threshold` kHz (default 1.0). Flags
    "isolated peak" outliers.

Combined into a recursive selection rule (per the user's spec):

  - 0 flags → accept candidate as primary, no warning
  - 1 flag  → accept candidate as primary, raise warning, expose next-down
              root as `Q_root_secondary` for downstream review
  - 2 flags → reject candidate, recurse into next-most-unstable root

Extends `GrowthRateResult` with `Q_root_secondary` (`ComplexF64`),
`omega_Hz_secondary`, `gamma_Hz_secondary`, and `warning_flags::Vector{Symbol}`.
The legacy `valid_roots`/`poles`/`filtered_roots` fields are unchanged.

New kwargs on the public `find_growth_rates(::ScanResult|::AMRResult)`:
`gap_kHz_threshold=1.0`, `angle_threshold_deg=45.0`. Defaults preserve
behaviour on cases where neither flag fires (verified against existing test
suite — 49/49 dispersion-AMR tests still pass, 33/33 dispersion-scan,
20/20 dispersion-residual).

Empirical validation (rendered side-by-side contour plots saved separately):

  DIII-D 147131 uncoupled q=4:
    primary γ=-4.540 kHz  no warnings  ✓ (clean case unchanged)

  DIII-D 147131 coupled (msing=4):
    primary γ=+18.630 kHz  ⚠ [:gap]  → secondary γ=+0.418 kHz exposed
    The +18.6 root is a spurious high-γ outlier (Re=0 contour exits the
    γ=+25 kHz box edge, so the legacy outside_re gate falls open). The
    new `:gap` check catches it (Δγ from next root = 18.2 kHz >> 1 kHz)
    and surfaces the physical +0.42 root as the secondary — matching
    visual inspection of the contour plot.

The geom check did not fire on the coupled DIII-D case (Re=0 geometry near
the +18.6 candidate is more vertical than concave-down on this triangulated
AMR mesh). That's the by-design behaviour: a single flag still leaves the
primary as primary, with the secondary surfaced for the operator to
review. A test case that exercises the concavity path is a TODO.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… + density flag (3-of-N spurious-root recursion)

Refines the spurious-root detection in `_run_analysis` based on validation
against the DIII-D 147131 coupled case. Two algorithmic improvements:

1. **Polyline-walk concavity (replaces 3-vertex stencil)**

   The previous geom check used only the 3 vertices immediately adjacent
   to the candidate's closest Re=0 vertex. On AMR-triangulated meshes the
   Re=0 contour is fragmented into ~10⁴ short polylines, so 3 consecutive
   vertices span a single segment — local turn-direction noise dominates
   the macroscopic shape and the test failed to fire on cases the user
   could clearly identify as "downward-concave hills" by eye.

   New `_is_geom_spurious` walks outward from the closest Re=0 vertex
   along the actual polyline, collecting consecutive vertices within
   `max_walk` Q-distance of the candidate. It then fits a local quadratic
   γ = a + b·Δω + c·Δω² and reports `c < 0` (concave-down hill).
   Crucially, the test gates on FIT QUALITY: only flags when the RMS
   residual / γ_spread is below `quality_threshold` (default 0.15),
   so noisy / multi-feature regions correctly produce no flag.

   Verified on the DIII-D 147131 coupled HDF5: at the spurious +18.6
   candidate, the polyline walk at max_walk=0.5 Q gives c=-4.96 with
   RMS/γ_sp=0.10 → CLEANLY flags spurious; at the legitimate +0.41
   candidate the fit is noisy (RMS/γ_sp=0.33) so no flag is raised.

2. **Density flag (`:density`) — clustering as a green-flag for validity**

   New `_is_density_isolated` counts other valid roots within
   `density_radius_Q` of each candidate. Spurious high-γ outliers tend
   to be isolated in Q-space; legitimate coupled-tearing roots cluster
   densely in the resonant region. Disabled when `n_total < 5` (the
   user's clustering heuristic only carries signal when there's a
   cluster baseline to be missing from — uncoupled cases with 1-3
   total roots would otherwise spuriously fire on every candidate).

3. **Recursion rule extended to 3-flag voting**

   `:geom` + `:gap` + `:density`: discard candidate if 2+ flags fire,
   else accept as primary with single-flag warning recorded.

Empirical outcome on existing HDF5s (re-extracted via /tmp/reextract_all.jl):

  DIII-D 147131 uncoupled q=4 (n_roots=3, density auto-disabled):
    primary γ=-4.540 kHz  warn=[:geom]  γ_2nd=-5.557 kHz
    Same physical primary as before, with a single geom warning surfacing
    a nearby root for review. (The geom flag firing here is borderline —
    the local Re=0 fit happens to land concave-down on the AMR mesh
    even though the global structure is well-like; the recursion
    correctly keeps it as primary because it's the only flag.)

  DIII-D 147131 coupled (n_roots=37):
    primary γ=+0.411 kHz  warn=[:density]  γ_2nd=-0.481 kHz
    The spurious +18.6 root is now correctly DISCARDED by the recursion
    (it accumulates 2+ flags from {geom, gap, density}). The +0.41
    root that was previously surfaced only as `secondary` is now the
    primary. This brings `filter_outside_re=true` (default) and
    `filter_outside_re=false` to the same answer on coupled DIII-D —
    the new geom + density logic obviates the need to manually toggle
    the legacy gate.

New kwargs on the public `find_growth_rates(::ScanResult|::AMRResult)`:
`density_radius_Q=0.5`, `min_neighbors=2`. Defaults are conservative —
density only fires when truly isolated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…geom + :gap (back to 2-flag recursion)

The user flagged that :gap and :density could both falsely fire on a
legitimate isolated mode (e.g. an uncoupled case with one dominant unstable
root and one stable root separated by > 1 kHz), causing the recursion to
incorrectly discard the right answer. Removed:

  - `_is_density_isolated` helper
  - `density_radius_Q`, `min_neighbors` kwargs (from public + private API)
  - the per-candidate density check in `_run_analysis`

Recursion rule reverts to the simpler "discard if BOTH :geom and :gap fire"
(which on the validation cases is sufficient to catch the +18.6 kHz
spurious in DIII-D 147131 coupled — the polyline-walk concavity fix from
3dd65e83 cleanly fires :geom on that candidate, and the >1 kHz γ-gap
fires :gap, so both flags accumulate and the recursion discards it).

Empirical re-extraction (without density):

  DIII-D 147131 uncoupled q=4 (n_roots=3):
    primary γ=-4.540 kHz  warn=[:geom]  γ_2nd=-5.557 kHz
    Same as before — the lone :geom warning is informational; the
    primary is correctly the legitimate root.

  DIII-D 147131 coupled (n_roots=37-38):
    primary γ=+0.411 kHz  warn=[]  γ_2nd=NaN  (no warnings — clean!)
    The +18.6 spurious is still correctly DISCARDED by [geom + gap]
    both firing. The legitimate +0.41 root is now reported with NO
    warnings — cleaner than the [:density] warning we previously
    surfaced. Better signal-to-noise: a warning now means
    "geometrically suspicious AND isolated peak", which is a strong
    signal worth alerting on.

Tests still 102/102 passing across runtests_dispersion_{amr,scan,residual}.jl.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d_deg parameter + cleanup

The `angle_threshold_deg` kwarg was a leftover from the earlier `_is_geom_spurious`
formulation that combined "Re=0 concave-down + Im=0 exit angle > 45°" into a single
test. After the polyline-walk refactor (e97225c) the concavity check became
standalone (with its own RMS-residual quality gate), and the angle term was no
longer consulted — but the parameter was still plumbed through every API layer.

Removes the parameter + its docstring + every plumb-through site:
  - Public `find_growth_rates(::ScanResult, ::Real; …)` and `(::AMRResult, …)`
  - Private `_extract_growth_rates`, `_extract_growth_rates_amr`, `_run_analysis`
  - `_is_geom_spurious(pt, re_paths)` now takes only what it actually uses
    (no more `im_paths` or `angle_threshold_deg` placeholders)

Also drops the dead-code-removal comment about `_is_density_isolated` — the
explanation lives in the commit message of 4c6fbe3 (which removed it). The
file is now clean of historical references to features that no longer exist.

Tests still 102/102 across runtests_dispersion_{amr,scan,residual}.jl.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ole_threshold + gap_kHz_threshold plumbing

Three production-default improvements informed by the DIII-D 147131 + TJ
betascan validation work:

1. **Pole threshold default → 10 × median(|Δ|)** (was `|mean(Δ)|`)
   The mean-of-complex-residuals collapses on oscillating dispersions
   whose phases cancel in the complex sum (saw 226 vs 439 on DIII-D
   coupled), and is also inflated by single near-pole pre-screen samples.
   `10 × median(|Δ|)` reflects "10× the typical residual magnitude" and
   is robust to both pathologies. Applied in `_pole_threshold_for` inside
   `run_slayer.jl`. Old behaviour was the only code path; new default is
   strictly an improvement on the validation cases.

2. **`SLAYERControl.boxes`** — multi-box stripe scan field (default empty).
   When non-empty, `_run_scan` dispatches to `multi_box_amr_scan` instead
   of single-box `amr_scan`. Each entry is `(omega_lo, omega_hi, gamma_lo,
   gamma_hi)` in dimensionless Q-units. Activity criteria use
   `pole_magnitude_threshold = 10 × median(|Δ|)` derived from a coarse
   16×6 sample of the union of all boxes (matches the
   validate_multi_box.jl driver). `multi_box_prescreen_n=25` controls the
   per-box pre-screen grid resolution. Backward-compatible — production
   scans that don't set `boxes` see the existing single-box behaviour.

3. **`SLAYERControl.gap_kHz_threshold`** — exposed (default 1.0 kHz) and
   forwarded to the new `find_growth_rates` `:gap` flag. Lets per-case
   TOML configs tune the spurious-isolated-peak threshold without code
   changes.

Tests: 49+33+20+61 = 163 pass across runtests_dispersion_{amr,scan,residual}.jl
+ runtests_slayer_runner.jl.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dge artifacts

direct_position! used Roots.Brent() on the full (axis, rmin) and (axis, rmax)
brackets to locate the inboard/outboard LCFS positions. Brent requires
opposite-sign endpoints — fine for diverted equilibria where renormalized
ψ stays negative from the LCFS out to the (R, Z) box edges.

Fixed-boundary equilibria (e.g. TokaMaker free/fixed-boundary geqdsk output)
violate this assumption: ψ outside the LCFS can have a thin spurious-
extrapolation ring near the box edge where it re-crosses zero, leaving the
(axis, rmin) and (axis, rmax) brackets with same-sign endpoints. Brent then
raises "ArgumentError: The interval [a,b] is not a bracketing interval"
even though the physical LCFS DOES exist inside the bracket.

Fix: pre-scan ψ outward from the magnetic axis with n_scan=200 uniform steps
and locate the FIRST sign change, then run Brent on that sub-bracket. The
first crossing from the axis is the physical LCFS, so behavior is identical
to before on diverted equilibria but robust to fixed-boundary edge artifacts.
Errors with a clear message if no sign change is found in the scan window.

Verified:
  - 79/79 q95 TokaMaker fixed-boundary geqdsks load (previously all failed
    on the inboard bracket)
  - DIII-D 147131 diverted X-point still loads unchanged
  - shaped_beta_scan synthetic geqdsks still load unchanged
  - SLAYER_coupling_paper/coupled_deltacrit_q95scan full-pipeline smoke test
    (coupled_n=1 with rfitzp Δ_crit, pc=1.001) passes end-to-end through
    GPEC main + Force-Free States BVP + SLAYER multi-stripe AMR
Empirical finding from the SPARC β-scan kink-approach diagnostics:
the geom + gap "spurious upper-branch" detector was too aggressive in
the kink-approach regime where valid roots become sparse (only 4-5
candidates per scan, 2-3 kHz γ separation between primary unstable and
next-stable roots).  Concrete failure case:

  shaped_beta_scan / coupled_n2_rfitzp / β_N=2.7502
    valid root at (ω=−22.67, γ=+0.088) — flagged BOTH :geom and :gap
    pre-2026-05-08:  discarded → fell back to (+9.34, −2.596)
    post-2026-05-08: warned but kept; chosen as primary (γ=+0.088)

Change in GrowthRateExtraction.jl: drop the discard branch when
both :geom and :gap fire.  Always accept candidate, push warning(s)
to warning_flags, and let downstream tools (post-hoc smoothness
override in plot_betascan.py:apply_chooser_overrides) handle the
trajectory continuity check.

Empirical impact on the shaped_beta_scan / pubrun_050526:
- 7 of 8 affected (case, β_N) pairs now choose correctly without any
  post-hoc override (chooser_overrides count: 9 → 2).
- 1 regression: 3/2 rfitzp at β_N=2.8501 — the previously-available
  smooth-trend candidate (-21.4, -0.241) is no longer in valid_roots
  on the new run (suspected pole reclassification at the unchanged
  pole_threshold check that runs BEFORE the geom/gap check).  Net
  effect on the publication 4-panel γ figure: minimal (1 trace point
  out of ~340 plotted).

Control.jl: minor parameter plumbing for the new policy.

Status: WIP — not yet validated on q95scan, IBS_AT_scan, or DIIID
benchmarks.  Filtered_roots subgroup in HDF5 output now records
LEGACY-rejected roots only (the old above-pole + outside-Re branch);
geom/gap-warned roots appear in valid_roots with their flags.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@d-burg d-burg added enhancement New feature or request WIP Work in progress labels May 18, 2026
@matt-pharr matt-pharr self-requested a review May 19, 2026 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request WIP Work in progress

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants