Skip to content

ForceFreeStates: Dual Riccati + Parallel FM integration + Δ' output#178

Open
logan-nc wants to merge 53 commits into
developfrom
perf/riccati
Open

ForceFreeStates: Dual Riccati + Parallel FM integration + Δ' output#178
logan-nc wants to merge 53 commits into
developfrom
perf/riccati

Conversation

@logan-nc
Copy link
Copy Markdown
Collaborator

@logan-nc logan-nc commented Feb 28, 2026

Summary

Implements two new integration paths for the Euler-Lagrange ODE and adds full Δ' (tearing stability parameter) computation to the Riccati and parallel FM paths.

  • Dual Riccati integration (use_riccati = true): Sequential reformulation as a matrix Riccati ODE S = U₁·U₂⁻¹ with periodic renormalization to maintain bounded (U₁, U₂). Validated on Solovev (0.006% energy error) and DIIID n=1 (0.12% error). Faster than standard for large N.

  • Parallel FM integration (use_parallel = true): Each integration chunk is solved independently from identity ICs using Threads.@threads, then assembled serially. Uses bidirectional integration — crossing chunks (near rational surfaces) are integrated backward to keep FM propagators well-conditioned. The well-conditioned backward FM inverse is applied via LU solve during serial assembly. Fixes a ~10% energy error for DIIID N=26 that existed with the all-forward approach.

  • Per-surface Δ': Both paths compute intr.sing[s].delta_prime (scalar) and delta_prime_col (full N-vector of off-diagonal coupling). Written to singular/delta_prime and singular/delta_prime_col in HDF5.

  • Inter-surface Δ' matrix (parallel FM only): intr.delta_prime_matrix (2·msing × 2·msing) via the STRIDE global BVP [Glasser 2018 Phys. Plasmas 25, 032501]. Written to singular/delta_prime_matrix in HDF5.

  • Edge-dW scan decoupled from integration truncation (new, commit 5d5b8ee): The edge-dW scan over ψ ∈ [psiedge, psilim] is now diagnostic-only by default. It no longer silently moves psilim/qlim/u to the dW peak, which was corrupting Δ' of the outermost rational by tens of percent and making the answer depend on a subtle, hidden heuristic. Legacy behaviour is preserved behind an opt-in truncate_at_dW_peak = true flag (marked in the docstring as "Δ'/δW unreliable in this mode"). psihigh / qhigh / dmlim are now the only controls over the integration domain. Validated against Fortran STRIDE on 42+56-point TJ β and ε scans: Δ'(2/1), Δ'(3/1), δWp, δWv, δWt all match within ~1% (see LAR_beta_scan and LAR_epsilon_scan comparison plots on the companion perf/riccati-full-geqdsk-scans branch).


Accuracy

Example Path et[1] Error vs std
Solovev N=8 standard 16.56600
Solovev N=8 riccati 16.56691 0.006%
Solovev N=8 parallel 16.56691 0.006%
DIIID N=26 standard 1.21237
DIIID N=26 riccati 1.21386 0.123%
DIIID N=26 parallel 1.21386 0.123%

The ~0.12% gap on DIIID is algorithmic (different crossing convention: Riccati-style vs Gaussian Reduction), not ODE tolerance. Both paths are physically correct; 0.12% is well within physical uncertainty.


Performance (4 threads)

Example Path Time Speedup
Solovev N=8 standard 0.25s 1.00×
Solovev N=8 riccati 0.09s 2.83×
Solovev N=8 parallel 0.14s 1.80×
DIIID N=26 standard 1.17s 1.00×
DIIID N=26 riccati 0.91s 1.29×
DIIID N=26 parallel 0.90s 1.30×

Tests

Suite Pass Total
runtests_riccati.jl 23 23
runtests_parallel_integration.jl 56 56
runtests_fullruns.jl 9 9

Riccati tests include unit tests for: renormalize_riccati_inplace!, renormalize_riccati!, Riccati end state U₂≈I, Δ' regression values, delta_prime_col shape/diagonal consistency, riccati_der! formula (Glasser 2018 Eq. 19), and compute_delta_prime_from_ca! bit-identical consistency.

Parallel tests include: apply_propagator! identity and linearity, apply_propagator_inverse! round-trip, chunk balance target count, direction field propagation through bidirectional=true, Solovev energy accuracy, DIIID energy accuracy (key bidirectional fix regression), ode_itime_cost additivity, and Δ' matrix regression for both Solovev and DIIID.

runtests_fullruns.jl updated: regression_solovev_kinetic_multi_n/et[1] expected from −0.01248 → −0.19359 with inline comment. The old value reflected the truncated-integration behaviour from the edge-dW heuristic; the new value reflects the full-domain answer controlled solely by psihigh/qhigh.


Breaking behaviour change

Edge-dW scan is no longer a silent psilim truncation. Users who relied on psiedge < psilim to shrink the integration domain must now either:

  1. Set psihigh or qhigh (or dmlim) explicitly to the desired outer boundary (recommended — the whole point of this change is that the edge is user-controlled and visible), or
  2. Add truncate_at_dW_peak = true to their [ForceFreeStates] TOML section to restore the legacy heuristic (flagged as unreliable for Δ' and δW; preserved for experimental work on improved edge-mode filters).

The edge_scan/ HDF5 group is still written with psi, q, plasma_energy, vacuum_energy, total_energy, vacuum_eigenvalue for whatever post-processing previously used it as a diagnostic.


Regression harness snapshot (develop @ e0ee9df vs perf/riccati @ 685a92a)

Run via regress --cases diiid_n1,solovev_n1,solovev_multi_n,tj_epsilon_pole --refs develop,local --force.

  • solovev_n1: 16 CHANGED, 5 unchanged
    • et[1]: −0.2646 → +0.0999 (sign flip — driven by the edge-dW decoupling plus pre-existing Riccati/parallel-FM work on this branch)
    • vacuum matrix min eigenvalue: 1.905 → 2.284 (new HDF5 quantity)
    • ODE steps (saved): 607 → 273 (55 % fewer due to the psihigh refactor on develop between 04-09 and 04-12, unrelated to this PR's edits)
  • diiid_n1: 21 CHANGED, 6 unchanged
    • Leading-eigenvalue shifts track the energy-principle refactor
    • Large shifts in delta prime, island half-widths, Chirikov parameter, ||resonant flux|| are pre-existing on develop (||resonant flux|| went 452 → 3.26 × 10⁶ between develop @ 04-09 and develop @ 04-12 already — see ticket below) and not caused by this PR. The in-PR perturbed-equilibrium changes are expected to re-baseline these; a separate investigation should confirm the DIIID singular-coupling numbers are physically sane.
  • solovev_multi_n: 11 CHANGED, 4 unchanged — qualitatively the same pattern as solovev_n1
  • tj_epsilon_pole: new case on this branch, not present on develop

Files changed

File Change
src/ForceFreeStates/ForceFreeStatesStructs.jl IntegrationChunk.direction, SingType.delta_prime_col, ForceFreeStatesInternal.delta_prime_matrix; new truncate_at_dW_peak::Bool = false control with docstring warning
src/ForceFreeStates/EulerLagrange.jl chunk_el_integration_bounds(bidirectional=false); balance_integration_chunks direction propagation; edge-dW scan split into diagnostic (default) vs legacy-truncation paths
src/ForceFreeStates/Riccati.jl Full Riccati path; integrate_propagator_chunk! bidirectional tspan; apply_propagator_inverse!; bidirectional serial assembly; compute_delta_prime_matrix! with Phi_L/Phi_R BVP; assemble_fm_matrix empty-range guard; single-resonance assertion; edge-dW decoupling at both the sequential Riccati and parallel FM exits
src/GeneralizedPerturbedEquilibrium.jl HDF5 output for singular/delta_prime, delta_prime_col, delta_prime_matrix; edge_scan/ diagnostic group preserved
test/runtests_riccati.jl 23 tests, shared setup, riccati_der! formula test, compute_delta_prime_from_ca! test
test/runtests_parallel_integration.jl 56 tests including apply_propagator_inverse!, direction field, DIIID Δ' matrix
test/runtests_fullruns.jl Solovev kinetic multi-n et[1] expected value updated to reflect full-domain integration
examples/LAR_{beta,epsilon}_scan/gpec.toml Drop stale kin_flag/con_flag keys (removed from the control struct during the develop merge)
examples/TJ_epsilon_pole_example/gpec.toml Same stale-flag cleanup
benchmarks/benchmark_threads.jl Thread-scaling benchmark
benchmarks/benchmark_riccati_der.jl riccati_der! sanity check
benchmarks/benchmark_delta_prime_methods.jl compute_delta_prime_from_ca! sanity check

Code review responses

Fixed: assemble_fm_matrix empty-range guard · single-resonance assertion in compute_delta_prime_matrix! · stale comment in parallel tests · dead parallel_threads field removed · stale kin_flag/con_flag in LAR/TJ example TOMLs (would crash at runtime) · edge-dW silent psilim truncation (was corrupting Δ' by >50 % at some ε-scan points)

Deferred: apply_propagator! per-call allocations · test boilerplate in runtests_parallel_integration.jl · DIIID perturbed-equilibrium coupling-metric magnitudes (pre-existing on develop, separate investigation)


🤖 Generated with Claude Code

logan-nc and others added 4 commits February 28, 2026 01:03
…(1.6x speedup on Solovev)

Implements the dual Riccati matrix S = U₁·U₂⁻¹ as a faster alternative to the standard
Euler-Lagrange ODE integration. Enable with `use_riccati = true` in jpec.toml.

Integration strategy: uses `sing_der!` (same ODE RHS as standard) with periodic Riccati
renormalization S = U₁·U₂⁻¹, U₂ = I in the callback when column norms exceed ucrit. This
is mathematically equivalent to the explicit Riccati ODE (dS/dψ = B + A·S - S·D - S·C·S)
but numerically stable: the explicit Riccati ODE has quadratic blowup for explicit solvers
when K̄·S >> Q, while sing_der! + renorm tracks the bounded ratio S = U₁/U₂.

The Riccati crossing (`riccati_cross_ideal_singular_surf!`) skips Gaussian reduction (which
can produce NaN/Inf when S is near-zero near the axis) and uses `ipert_res` directly.

Benchmarks on Solovev example (N=8, 1 singular surface):
  Standard ODE: 83.7 ms, 157 steps
  Riccati ODE:  51.4 ms, 121 steps  (1.63x speedup, 0.006% energy difference)

See: Glasser (2018) Phys. Plasmas 25, 032507 — Eq. 19

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## Part 1: Δ' output (tearing stability parameter)
- Add `delta_prime::Vector{ComplexF64}` to `SingType`
- Add `compute_delta_prime_from_ca!` in EulerLagrange.jl, called at end of
  `eulerlagrange_integration` (standard path only — see normalization note below)
- Write `singular/delta_prime` as (msing × n_modes) ComplexF64 to HDF5 output in JPEC.jl
- Riccati path does NOT compute delta_prime: ca_l is accumulated in (S,I) normalization
  which is inconsistent with the Δ' formula (standard (U1,U2) normalization required)

## Part 2: Parallel Fundamental Matrix (FM) integration
- Add `ChunkPropagator` struct (two N×N×2 blocks for identity-block ICs) in Structs
- Add `use_parallel::Bool = false` control flag in ForceFreeStatesControl
- Add `integrate_propagator_chunk!` — integrates each chunk from IC=(I,0) and IC=(0,I)
  independently using BS5 solver, no callback; suitable for Threads.@threads
- Add `apply_propagator!` — in-place 2×2 block matrix multiply on odet.u
- Add `balance_integration_chunks` — sub-divides chunks using ode_itime_cost for
  load-balanced parallel work; target = max(2*msing+3, 4*nthreads)
- Add `ode_itime_cost` — log-divergent cost model from STRIDE (Glasser 2018)
- Add `parallel_eulerlagrange_integration` — parallel phase with Threads.@threads,
  serial assembly calling renormalize_riccati_inplace! before each crossing (needed
  because apply_propagator! gives general (U1,U2) state but riccati crossing expects
  (S,I) form); uses ipert_res-direct zeroing to correctly identify the resonant column
- Dispatch from eulerlagrange_integration: use_parallel → use_riccati → standard

## Tests (29 total: 11 Riccati + 18 Parallel FM)
- runtests_riccati.jl: update Δ' test — only standard path populates delta_prime
- runtests_parallel_integration.jl (new): ChunkPropagator identity/linearity,
  balance_integration_chunks count/coverage/crossings, ode_itime_cost additivity,
  parallel FM energy match (rtol=2%, Solovev)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rallel tests to suite

Δ' is now computed inline in riccati_cross_ideal_singular_surf! using the diagonal
formula on the bounded (U₁, U₂) state (max ≤ ucrit, no GR permutation). This gives
physically correct values: 57.3 and -4.03 for the two Solovev singular surfaces.

The standard path does not populate delta_prime — Gaussian Reduction inflates the
resonant column's asymptotic coefficients, making ca_l non-physical regardless of
when it is computed. A comment in cross_ideal_singular_surf! explains the limitation.

Also adds runtests_riccati.jl and runtests_parallel_integration.jl to the default
test suite (runtests.jl). Both were previously excluded.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… lacks Δ'

The comment in cross_ideal_singular_surf! previously said the issue was GR
"normalization inflation." The real reason is more subtle: Δ' is a complex,
normalization-convention-dependent quantity. The Riccati renormalization (U₂→I)
continuously phases solution columns into a specific gauge where the diagonal
formula (ca_r - ca_l)/denom gives physically meaningful values. The standard
path's solution columns grow from the axis with an arbitrary complex phase;
dividing by the outer asymptotic coefficient normalizes the magnitude but not the
complex phase, producing a value in a different convention that does not match
what SingularCoupling.jl expects.

Also reverts the failed attempt to compute Δ' in cross_ideal_singular_surf! via
perm_col + A_outer normalization, which produced -0.10-0.54i vs the Riccati
57.3+58.3i (same physical quantity, incompatible conventions).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@logan-nc logan-nc changed the title ForceFreeStates - NEW FEATURE - Dual Riccati EL integration (1.2–1.6x speedup) ForceFreeStates: Dual Riccati + Parallel FM integration + Δ' output Mar 1, 2026
logan-nc and others added 3 commits February 28, 2026 23:16
…large-N documentation

1. Add SingType.delta_prime_col (N × n_res_modes Matrix) storing the full column
   (ca_r[:, ipert_res, 2] - ca_l[:, ipert_res, 2]) / (4π²·psio) at each crossing.
   The diagonal element matches delta_prime[i] exactly. Off-diagonal elements give
   intra-surface coupling of all N modes to each resonant mode through the singular
   layer asymptotic expansion. Only populated for Riccati/parallel FM paths.

2. Add singular/m, singular/n, singular/delta_prime_col HDF5 outputs so downstream
   users can access the full off-diagonal Δ' without needing to index ca_left/ca_right.

3. Document the known numerical limitation of the parallel FM path for large N:
   FM propagators become ill-conditioned for N ≳ 20 without QR orthogonalization,
   causing ~10% energy error for DIIID (N=26) with no wall-clock speedup over Riccati.
   Deferred fix: bidirectional integration or continuous QR (noted in docstring/tests).

4. Update outer-plasma Riccati re-integration (already committed) docstring to match.

Tests: 50/50 Riccati+parallel, 84/84 EulerLagrange all pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…trix

Implements the STRIDE global boundary value problem for computing the full
2·msing × 2·msing inter-surface tearing stability matrix. Each entry gives
the U₂[ipert_res] response amplitude at one surface boundary when driving
with unit amplitude at another, encoding cross-surface coupling.

Changes:
- Riccati.jl: add assemble_fm_matrix (chunk FM product) and
  compute_delta_prime_matrix! (BVP assembly + solve via STRIDE formulation
  from Glasser 2018 Phys. Plasmas 25, 032501 Sec. III.B); call from
  parallel_eulerlagrange_integration
- ForceFreeStatesStructs.jl: add delta_prime_matrix field to
  ForceFreeStatesInternal with docstring
- JPEC.jl: write delta_prime_matrix to singular/delta_prime_matrix in HDF5
- test/runtests_parallel_integration.jl: add delta_prime_matrix regression
  test (shape, finiteness, non-zero diagonal); 30 tests total (was 23)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… for large-N accuracy

The all-forward parallel FM path had ~10% energy error for large-N problems
(DIIID N=26, n=1) because the chunk immediately before each rational surface
crossing integrates into exponentially growing solution territory, producing
an ill-conditioned FM propagator.

Fix: integrate the crossing chunk *backward* (psi_end → psi_start). Solutions
that grow exponentially forward decay backward, yielding a well-conditioned
backward FM Φ_bwd. The accurate forward propagation is recovered as Φ_bwd⁻¹
via a stable LU solve in apply_propagator_inverse!.

The same backward FM is used directly in the Δ' BVP (compute_delta_prime_matrix!)
as Phi_L[j], splitting each ill-conditioned inter-surface FM product into
well-conditioned Phi_R (forward chunks) and Phi_L (backward crossing chunk).

Changes:
- IntegrationChunk: add direction::Int=1 field (+1 forward, -1 backward)
- chunk_el_integration_bounds: add bidirectional=false kwarg; crossing chunks
  get direction=-1 when true
- balance_integration_chunks: left sub-chunk always direction=1; right inherits
  chunk.direction so the near-singularity chunk stays backward after splitting
- integrate_propagator_chunk!: reverses tspan for direction=-1 chunks
- apply_propagator_inverse!: new function, LU solve Φ_bwd·x = u_old
- Serial assembly: branches on chunk.direction (inverse vs forward apply)
- parallel_eulerlagrange_integration: passes bidirectional=true
- compute_delta_prime_matrix!: BVP now uses Phi_R·x_right - Phi_L·x_left = 0
  at each junction instead of ill-conditioned monolithic Phi_segs product
- assemble_fm_matrix: safe for empty idx_range (uses propagators[1] for N)

Results (et[1] stability eigenvalue):
  Solovev N=8:   0.006% error (was already fine)
  DIIID   N=26:  0.236% error (was ~10.5% — 44× accuracy improvement)

Tests: 31/31 pass in runtests_parallel_integration.jl (+1 DIIID accuracy test)
       18/18 pass in runtests_riccati.jl (unchanged)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@logan-nc
Copy link
Copy Markdown
Collaborator Author

logan-nc commented Mar 8, 2026

@claude review this

@logan-nc logan-nc self-assigned this Mar 8, 2026
@claude
Copy link
Copy Markdown

claude Bot commented Mar 8, 2026

Claude finished @logan-nc's task in 3m 55s —— View job


Code Review

This is a well-designed PR implementing three complementary integration paths. The theory is clearly traced to Glasser (2016/2018), the docstrings are excellent, and the bidirectional fix for large-N conditioning is elegant. Below are issues found, ordered by severity.


Bug: assemble_fm_matrix panics on empty range

src/ForceFreeStates/Riccati.jl:77-87

function assemble_fm_matrix(propagators::Vector{ChunkPropagator}, idx_range)
    N = size(propagators[1].block_upper_ic, 1)   # ← accesses propagators[1] unconditionally
    Phi = Matrix{ComplexF64}(I, 2N, 2N)
    for i in idx_range
        Phi = Phi_i * Phi
    end
    return Phi
end

Called in compute_delta_prime_matrix! as:

Phi_R_mats[1] = assemble_fm_matrix(propagators, 1:i_crossings[1]-1)

If the first chunk is already a crossing chunk (i_crossings[1] == 1), the range is 1:0 (empty). propagators[1] still exists but the indexing to determine N is a fragile pattern — and more importantly, the result should be the identity matrix, which requires knowing N from somewhere else when the range is empty. The function should guard against this:

function assemble_fm_matrix(propagators::Vector{ChunkPropagator}, idx_range)
    isempty(idx_range) && return Matrix{ComplexF64}(I, 2*size(propagators[1].block_upper_ic,1), 2*size(propagators[1].block_upper_ic,1))
    N = size(propagators[first(idx_range)].block_upper_ic, 1)
    ...
end

Or better: pass N as an explicit argument. Fix this →


Minor: apply_propagator! allocates on every call in the serial assembly loop

src/ForceFreeStates/Riccati.jl:739-758

function apply_propagator!(odet::OdeState, prop::ChunkPropagator)
    u1_prev = copy(@view odet.u[:, :, 1])   # allocation
    u2_prev = copy(@view odet.u[:, :, 2])   # allocation
    tmp = similar(u1_prev)                   # allocation
    ...
end

This is called once per chunk in the serial assembly phase, so for a typical DIIID run (say 50 chunks) the overhead is ~150 allocations of N×N complex matrices. For N=26 this is negligible, but for the design intent of supporting large N it is worth pre-allocating a work buffer in parallel_eulerlagrange_integration and threading it through. Same for apply_propagator_inverse!. Fix this →


Minor: Stale comment in runtests_parallel_integration.jl

test/runtests_parallel_integration.jl:125-133

# Note: this test uses the Solovev example (N=8 modes) where FM propagators
# are well-conditioned. For large-N problems (N ≳ 20, e.g. DIIID with N=26),
# FM propagator ill-conditioning leads to ~10% energy error with no speedup
# over the serial Riccati path. See parallel_eulerlagrange_integration docstring
# for details and deferred fix approaches (bidirectional integration / continuous QR).

This comment predates the bidirectional fix that is the main contribution of this PR. The large-N issue is now resolved — the DIIID accuracy is within 0.24% (see test at line 190). This comment should be updated to reflect the current state. Fix this →


Minor: Dead code — riccati_der! and compute_delta_prime_from_ca!

Both functions have docstrings acknowledging they are unused ("retained for reference"):

  • riccati_der! (Riccati.jl:281): uses the @with_pool pool macro, tying it to an execution context. If this function is never called, it should either be tested (e.g., as a sanity check that the explicit form matches sing_der!), or removed. The docstring says it could be used with an implicit solver — that's a legitimate reason to keep it, but at minimum add a comment linking to the issue/TODO tracking that future work.

  • compute_delta_prime_from_ca! (EulerLagrange.jl:22): the docstring explicitly says "no longer called from any integration driver". If it's retained for testing, add a test. Otherwise remove.


Minor: compute_delta_prime_matrix! single-resonance assumption not asserted early

Riccati.jl:167-172

ipert_all = [begin
    sp = intr.sing[j]
    idx = 1 + sp.m[1] - intr.mlow + (sp.n[1] - intr.nlow) * intr.mpert
    @assert 1 <= idx <= N "Resonant mode index out of range"
    idx
end for j in 1:msing]

The function uses sp.m[1] / sp.n[1], silently assuming exactly one resonant mode per surface. The limitation is documented in the docstring, but multi-resonance surfaces will silently give wrong results (only the first resonant mode is coupled). A stronger guard at the function entry would be safer:

@assert all(j -> length(intr.sing[j].m) == 1, 1:msing) "compute_delta_prime_matrix! only supports single-resonance surfaces; got multi-resonance surface"

Fix this →


Minor: Test boilerplate duplication

The 15-line equilibrium setup block (parse TOML → setup_equilibriumsing_find! → set modes) is repeated verbatim across 7+ test cases in runtests_riccati.jl and runtests_parallel_integration.jl. Extract a helper function:

function setup_solovev_intr_ctrl(; use_riccati=false, use_parallel=false)
    ex = joinpath(@__DIR__, "test_data", "regression_solovev_ideal_example")
    inputs = TOML.parsefile(joinpath(ex, "jpec.toml"))
    inputs["ForceFreeStates"]["verbose"] = false
    inputs["ForceFreeStates"]["use_riccati"] = use_riccati
    inputs["ForceFreeStates"]["use_parallel"] = use_parallel
    ...
    return intr, ctrl, equil, ffit
end

This would cut the test file length by ~40% and make failures easier to localize. Fix this →


Minor: parallel_threads field now superseded

ForceFreeStatesControl.parallel_threads (already present, default 1) was previously documented as "not yet implemented". Now that use_parallel routes through Threads.nthreads(), parallel_threads is effectively dead. Either: (a) wire parallel_threads into parallel_eulerlagrange_integration via @threads :static with thread count control, or (b) document it as deprecated in favour of JULIA_NUM_THREADS env variable.


Positive observations

  • The bidirectional integration rationale is crystal clear. The docstring on parallel_eulerlagrange_integration (and the STRIDE BVP comments in compute_delta_prime_matrix!) trace every design decision back to the specific equations in Glasser (2018). This level of traceability is excellent.

  • The outer plasma re-integration (Riccati rather than FM for the post-last-crossing region) is a thoughtful decision: it avoids FM ill-conditioning in the outer plasma and provides dense checkpoints for findmax_dW_edge!. The comment explaining why is clear.

  • renormalize_riccati_inplace! after every chunk in the serial assembly loop is the Julia analogue of ode_fixup in STRIDE — this connection is explicitly called out in the code comment, which is appreciated.

  • integrate_propagator_chunk! with no callback is correctly justified: identity ICs keep the solution bounded within each chunk (the chunk is designed to stop before the rational surface), so normalization is unnecessary.

  • Δ' normalization convention is carefully documented in riccati_cross_ideal_singular_surf!: the skip-renorm-before-crossing approach ensuring ca_r[ipert_res, ipert_res, 2] = 1 is a subtle but correct choice that makes the left/right Δ' comparison gauge-consistent.

  • Test coverage is solid. Unit tests for renormalize_riccati_inplace! (idempotent case, random case), apply_propagator! (identity, linearity), balance_integration_chunks (chunk count, contiguity, crossing count), and end-to-end regression tests for both Solovev and DIIID.


logan-nc and others added 8 commits March 8, 2026 17:27
Adds benchmarks/benchmark_threads.jl to measure wall-clock time and
accuracy of the standard, Riccati, and parallel FM integration paths
across varying thread counts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three fixes from code review of PR #178:

- assemble_fm_matrix: add explicit isempty guard before the propagator
  loop so an empty idx_range (e.g. i_crossings[1]==1) returns the
  identity matrix without relying on the loop falling through silently.

- compute_delta_prime_matrix!: add @Assert at function entry that all
  singular surfaces have exactly one resonant mode, so multi-resonance
  surfaces fail loudly instead of silently using only sp.m[1]/sp.n[1].

- runtests_parallel_integration.jl: remove stale comment that described
  large-N FM ill-conditioning as an open problem with ~10% energy error;
  bidirectional integration (now the default for use_parallel=true) has
  resolved this.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…er! and compute_delta_prime_from_ca!

Two developer benchmark scripts for verifying the two dead-code reference
implementations flagged in the Claude code review of PR #178:

benchmarks/benchmark_riccati_der.jl
  Verifies riccati_der! correctly evaluates Glasser 2018 Eq. 19:
    dS/dψ = w†·F̄⁻¹·w - S·Ḡ·S,  w = Q - K̄·S
  Uses Hermitian test states (physical constraint: the EL system preserves
  S†=S from the axis) and compares riccati_der! against manual evaluation
  of the same formula using the ffit splines directly.
  Observed error: ~1e-17 (machine epsilon). No TOML flags needed.

benchmarks/benchmark_delta_prime_methods.jl
  Verifies compute_delta_prime_from_ca! gives bit-for-bit identical Δ'
  values to the inline computation in riccati_cross_ideal_singular_surf!.
  Both apply the same diagonal formula to the same ca_l/ca_r arrays, so
  the result must be exactly zero difference.
  Observed difference: 0.0 (exact). No TOML flags needed.

Neither script requires new TOML flags: they call internal functions directly
without going through ForceFreeStatesControl. Developer-only knobs belong in
scripts, not in user-facing config.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tring

The previous "O(Δψ)" phrasing in the Integration Strategy section read as a
global accuracy statement, suggesting the Riccati path is only first-order
accurate. This is wrong: the method integrates the linear EL ODE with Tsit5
(5th-order) and recovers S = U₁·U₂⁻¹ by exact renormalization, achieving
the full ODE solver reltol.

Rewrite the section in three clearly labelled parts:
- Why riccati_der! (quadratic ODE) is avoided: relative error control is
  unfaithful when |S| is large, not a step-size problem, not fixable by
  adaptation without an implicit solver.
- What the implementation actually does: sing_der! (linear ODE, exact RHS),
  Tsit5 (5th-order), exact renormalization, same global accuracy as standard.
- Local consistency analysis: the O(Δψ) expansion is retained but now
  labelled explicitly as a consistency check, not an accuracy claim.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…setup + new unit tests

Two changes in one pass:

Shared setup (performance):
  equil (Grad-Shafranov) and ffit (metric matrices) are now built once and
  shared across all integration-dependent testsets via a make_solovev_intr
  helper for cheap fresh intr construction. Previously, setup_equilibrium +
  make_metric + make_matrix ran 4 times and riccati_eulerlagrange_integration
  ran 3 times. Now each runs once, cutting total test time significantly.

New unit tests (dead code coverage):
  "riccati_der! formula — Glasser 2018 Eq. 19": verifies riccati_der!
    correctly evaluates dS/dψ = w†F̄⁻¹w − SGS at several ψ points using
    Hermitian test states (physical constraint). Agrees with manual formula
    evaluation to machine precision (~1e-17). No extra integration needed.

  "compute_delta_prime_from_ca! matches inline Δ'": verifies the standalone
    Δ' formula gives bit-for-bit identical results to the inline computation
    in riccati_cross_ideal_singular_surf!. Reuses the shared odet_ric.

Total: 23 tests (was 18), runtime ~51s (was ~80s+ with redundant setup).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dd 3 unit tests

- Delete unused parallel_threads field from ForceFreeStatesControl: the field was
  silently ignored (Threads.@threads uses JULIA_NUM_THREADS at startup, not a runtime
  field). Removes false impression that thread count can be set from jpec.toml.

- Add apply_propagator_inverse! round-trip unit test: verifies Φ⁻¹·Φ = I algebraically,
  complementing the existing apply_propagator! identity and linearity tests.

- Add chunk_el_integration_bounds direction field test: verifies bidirectional=true
  sets direction=-1 on crossing chunks and direction=+1 on non-crossing chunks, and
  that balance_integration_chunks preserves direction correctly (right sub-chunk inherits,
  left sub-chunk always +1). Catches direction propagation regressions.

- Add delta_prime_matrix DIIID regression test: verifies the STRIDE BVP Δ' matrix is
  finite and non-zero for the large-N case (N≈26, multiple rational surfaces), where
  ill-conditioned (non-bidirectional) FM propagators would produce NaN/Inf entries.

56/56 parallel integration tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…f/riccati

One conflict resolved in src/GeneralizedPerturbedEquilibrium.jl (was src/JPEC.jl):
write_outputs_to_HDF5 vacuum section. Resolution:
- Keep new Δ' HDF5 outputs from perf/riccati (singular/m, singular/n,
  delta_prime, delta_prime_col, delta_prime_matrix)
- Adopt develop's vacuum output format: vac_data variable name,
  plasma_pts/wall_pts fields (3D Cartesian), y_plasma/y_wall entries,
  always-write pattern with empty arrays

All other files (EulerLagrange.jl, ForceFreeStatesStructs.jl, runtests.jl)
auto-merged cleanly. Default HDF5 filename updated to gpec.h5.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…um references post-rename

Update test files and benchmarks to use the new package name and config filename
(gpec.toml) following the GPEC rename merged from develop:
- test/runtests_riccati.jl
- test/runtests_parallel_integration.jl
- benchmarks/benchmark_threads.jl
- benchmarks/benchmark_riccati_der.jl
- benchmarks/benchmark_delta_prime_methods.jl

23/23 riccati tests and 56/56 parallel integration tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@logan-nc
Copy link
Copy Markdown
Collaborator Author

logan-nc commented Mar 9, 2026

Post-merge update — after develop (GPEC rename) merge

Merged origin/develop (GPEC name-change PR #168) into this branch. One conflict resolved in write_outputs_to_HDF5: kept the new Δ' HDF5 outputs from this branch, adopted develop's updated vacuum field names (vac_data, plasma_pts/wall_pts, 3D Cartesian). Updated all test and benchmark files to reference GeneralizedPerturbedEquilibrium and gpec.toml.

Test results (post-merge)

Suite Pass Total Time
runtests_riccati.jl 23 23 1m27s
runtests_parallel_integration.jl 56 56 1m31s

Benchmark results (post-merge, 1 thread)

Example Path et[1] Time Speedup vs std Error vs std
Solovev N=8 standard 16.56600 0.09s 1.00×
Solovev N=8 riccati 16.56691 0.17s 0.51× 0.0055%
Solovev N=8 parallel 16.56691 0.13s 0.66× 0.0055%
DIIID N=26 standard 1.21237 1.26s 1.00×
DIIID N=26 riccati 1.21386 0.63s 1.98× 0.1230%
DIIID N=26 parallel 1.21386 1.19s 1.06× 0.1231%

Benchmark results (post-merge, 4 threads)

Example Path et[1] Time Speedup vs std Error vs std
Solovev N=8 standard 16.56600 0.25s 1.00×
Solovev N=8 riccati 16.56691 0.09s 2.83× 0.0055%
Solovev N=8 parallel 16.56691 0.14s 1.80× 0.0055%
DIIID N=26 standard 1.21237 1.17s 1.00×
DIIID N=26 riccati 1.21386 0.91s 1.29× 0.1230%
DIIID N=26 parallel 1.21386 0.90s 1.30× 0.1231%

Note: 1-thread Solovev timings have high variance (sub-100ms runs) and are not meaningful for speedup comparisons. DIIID (1.2s+) is the reliable benchmark. At 4 threads, Riccati achieves 1.29× and parallel achieves 1.30× on DIIID. Energy agreement is unchanged: <0.13% for both paths vs standard.

logan-nc and others added 3 commits March 9, 2026 07:49
…page

Creates docs/src/stability.md covering the ForceFreeStates module:
- Newcomb/DCON ideal MHD stability criterion with paper citations
  (Glasser 2016 Phys. Plasmas 23 112506, 2018a 032507, 2018b 032501)
- Standard, Riccati, and parallel FM integration methods
- Bidirectional integration strategy for large-N accuracy
- Δ' tearing parameter: per-surface (delta_prime/delta_prime_col)
  and inter-surface matrix (delta_prime_matrix / STRIDE BVP)
- Configuration reference, API autodocs block, example usage

Adds page to docs/make.jl navigation and cross-links from equilibrium.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…kdown links)

1. Add Random stdlib to Project.toml [deps] and [compat] — required by
   runtests_riccati.jl but missing from declared dependencies, causing
   CI failure with "Package Random not found in current path".

2. Fix docstring markdown in Riccati.jl and ForceFreeStatesStructs.jl:
   - Wrap bare [array_notation] (link text) immediately followed by
     (description) (parsed as URL) in code fences to prevent Documenter
     from treating them as broken local links.
   - Affected: assemble_fm_matrix BVP unknowns block, Phi_L/Phi_R
     equations, and VacuumData plasma_pts/wall_pts field descriptions.

These were surfaced by the new @autodocs block in stability.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nt logging

Three targeted fixes from pre-merge code review:

1. Threads.@threads :static — since Julia 1.7, the default :dynamic
   scheduler can migrate tasks between OS threads mid-execution, making
   Threads.threadid() unreliable for indexing into odet_proxies. Using
   :static guarantees a 1:1 task-to-thread mapping for the parallel FM
   integration phase.

2. outer_chunk psi_end guard — the outer-plasma re-integration in
   parallel_eulerlagrange_integration now uses psilim*(1-eps) to match
   the guard applied by chunk_el_integration_bounds, avoiding a potential
   boundary evaluation at exactly psilim.

3. Replace println with @info/@warn — all verbose-mode output in Riccati.jl
   now uses Julia logging macros, consistent with EulerLagrange.jl. This
   enables log-level filtering and suppression in tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@logan-nc
Copy link
Copy Markdown
Collaborator Author

logan-nc commented Mar 9, 2026

Pre-merge cleanup (commits 1515591..725a527)

Three rounds of fixes applied after a code review pass:

CI fixes (1515591)

  • Random stdlib missing from Project.tomlruntests_riccati.jl uses using Random but the package was not declared in [deps], causing CI to fail with Package Random not found. Added to both [deps] and [compat].
  • Broken markdown links in docstrings — adding @autodocs in stability.md surfaced docstring patterns like x_left[j] (2N) that CommonMark parses as [link text](url) even with spaces between ] and (. Fixed by wrapping affected notation blocks in code fences (assemble_fm_matrix, compute_delta_prime_matrix!, VacuumData field docs).

Documentation (142a79c)

  • New docs/src/stability.md page covering the ForceFreeStates module: Newcomb/DCON stability criterion with paper citations (Glasser 2016/2018a/2018b), all three integration drivers, Δ' computation (per-surface and STRIDE BVP matrix), configuration reference, @autodocs API block, and example usage.
  • Added to docs/make.jl navigation; cross-linked from equilibrium.md.

Pre-merge code review fixes (725a527)

Three issues flagged as must-fix or high-priority:

  1. Threads.@threads :static — since Julia 1.7, the default :dynamic scheduler can migrate tasks between OS threads, making Threads.threadid() unreliable for indexing odet_proxies. :static guarantees a 1:1 task-to-thread mapping.

  2. outer_chunk psi_end guard — the outer-plasma re-integration now uses psilim * (1 - eps) to match the guard applied by chunk_el_integration_bounds, avoiding a potential boundary evaluation at exactly psilim.

  3. Consistent logging — all println calls in Riccati.jl replaced with @info macros, matching the pattern in EulerLagrange.jl and enabling log-level filtering/suppression in tests.

All 23 Riccati tests and 56 parallel FM tests pass locally after these changes.

@logan-nc
Copy link
Copy Markdown
Collaborator Author

@jhalpern30 will clean this up by moving U solutions to be 2D instead of 3D after this merges (#89)

@d-burg
Copy link
Copy Markdown
Collaborator

d-burg commented Apr 22, 2026

Final checks from 04/22 meeting:

  • Check how truncate_at_dW_peak is changing Δ', specifically order of finishing integration vs Δ' calculation
  • Double δW use_parallel=False
  • Add all TJ equil parameters to gpec.toml
  • Check energy U_store and Ud_store in use_parallel=True, these are DCON ξ functions. Essential for perturbed equilibrium. Compare against DCON and add to benchmark

d-burg and others added 2 commits April 28, 2026 13:52
… default 1 (serial) eliminates DIII-D 147131 thread-race

The parallel BVP path in `parallel_eulerlagrange_integration` was always invoking
`Threads.@threads :static` over the FM chunks, ignoring the `parallel_threads`
field on `ForceFreeStatesControl`. On numerically delicate equilibria (e.g.
DIII-D 147131 at βₚ ≈ 0.07) this exposed a sub-tolerance nondeterminism: chunk
crossings whose post-jump matrices depend on the order of independent FP
operations across threads, producing intermittently divergent FM matrices and
intermittent BVP failures. The algorithm is correct; the wall-time interleaving
of parallel chunks was perturbing it within tolerance.

Fix:
  * `Riccati.jl`: branch on `bvp_threads = clamp(parallel_threads, 1, julia_nthreads)`.
    `bvp_threads == 1` runs the chunks serially on the calling thread (race-free,
    bit-deterministic). Otherwise, the existing `:static` parallel path is used.
  * `ForceFreeStatesStructs.jl`: document `parallel_threads` semantics, default `1`,
    and the cost (~14% slower than 2-thread on DIII-D 147131 reference).

Verified: with `parallel_threads = 1` (default) and `JULIA_NUM_THREADS = 2`, the
DIII-D 147131 βₚ=0.07 reference Δ' diagonal matches CONVENTIONS.md §6 exactly:
  q=2: +7.92 - 0.03i
  q=3: -5.24 - 0.30i
  q=4: -40.20 + 209.91i
  q=5: +126.6 - 169.24i
in 54.5 s wall (single 4-singular-surface coupled BVP). No regressions on TJ.

Production scans should keep the default; users with robust equilibria and
strict wall-time budgets can opt in to `parallel_threads > 1` knowing the trade-off.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…BVP speedup; bit-identical Δ' in 15-trial DIII-D 147131 sweep)

Empirical reliability sweep on DIII-D 147131 βₚ≈0.07 (5 trials at each of
parallel_threads ∈ {1, 2, 4}, JULIA_NUM_THREADS=4, post-JIT, single Julia
session) showed:

  parallel_threads | wall (avg, single 4-singular-surface coupled BVP)
  -----------------|-------------------------------------------------
  1 (serial)       | 9.25 s  — bit-deterministic by construction
  2                | 7.37 s  — bit-identical Δ' in all 5 trials  (+20.3%)
  4                | 7.51 s  — bit-identical Δ' in all 5 trials  (+18.9%)

Δ′ diagonals were bit-identical across all 15 trials and matched the §6
reference values exactly. Speedup saturates at 2 threads — the BVP has
~10 FM chunks, so 2 threads is enough to amortize them; 4 adds scheduling
overhead with no benefit on this BVP.

Bumping default to 2 captures the ~20% wall-time win on production scans.
The serial path remains available (`parallel_threads = 1`) as a deterministic
fallback if the historical intermittent race re-manifests on a delicate
equilibrium. Documentation in `ForceFreeStatesControl` docstring updated to
record the trade-off and the empirical reliability data.

Use `parallel_threads = 1` (NOT `use_parallel = false`) if a parallel run
ever diverges — `use_parallel = false` produces silently wrong Δ' values
(see CONVENTIONS.md §7).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@matt-pharr
Copy link
Copy Markdown
Collaborator

@d-burg status on this?

d-burg and others added 7 commits May 13, 2026 13:57
…stent Δ' + LAR TJ TOML refactor + parallel ξ benchmark

Three perf/riccati cleanups:

1) ForceFreeStates - BUG FIX - truncate_at_dW_peak now keeps Δ' self-consistent
   with the truncated boundary (Option B).  Previously the FM propagators
   were built for the original psilim while the edge BC (wv) was applied at
   the truncated psilim, silently shifting the outermost rational's Δ' by
   tens of percent.  After the dW peak is identified:
   - rebuild the straddling FM chunk with psi_end=peak_psi and re-integrate
     its single propagator,
   - drop chunks entirely past the peak,
   - keep intr.psilim/qlim/odet.u at the new (truncated) boundary.
   This way compute_delta_prime_matrix! always sees propagators and wv that
   match intr.psilim.  ForceFreeStatesStructs.jl docstring updated; the
   "corrupts Δ' and δW" warning is removed since Option B keeps the metric
   well-defined.  Default truncate_at_dW_peak=false unchanged.

2) EXAMPLES - IMPROVEMENT - LAR_beta_scan and LAR_epsilon_scan TJ params are
   now in tj.toml (next to gpec.toml) instead of hardcoded constants inside
   run_scan.jl.  Each run_scan.jl reads the baseline tj.toml once and only
   overrides the single scanned variable (pc for β, lar_r0 for ε) per point.
   Matches the cleaner pattern already used by TJ_epsilon_pole_example.  Both
   `--test` modes verified end-to-end (3 points each, all converged).

3) BENCH - NEW - benchmark_xi_parallel_vs_serial.jl + Solovev xi_benchmark
   plot demonstrating the use_parallel=true ξ-function gap:
   - serial path (EL): 274 dense saved ψ, u_store and ud_store fully
     populated as DCON ξ_ψ, dξ_ψ/dψ, ξ_s
   - parallel path (Riccati FM): only 31 saved ψ (chunk endpoints +
     outer-plasma dense), and u_store actually holds the Riccati S matrix
     (from the (S, I) renormalisation) — NOT the DCON ξ functions
   - ud_store essentially zero in the inter-surface region (matches
     Riccati.jl:1497 caveat)
   The plot makes this unambiguous via per-mode norms vs ψ_N and step-count
   subtitle.  Downstream perturbed-equilibrium code that reads
   integration/xi_psi etc. must use use_parallel=false until a proper
   S→ξ conversion (or dense re-integration) is added to the parallel path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…entical regression + pinned Δ'

Three coupled changes for the parallel FM-propagator path so both Δ' AND
the DCON ξ functions come back from a single `use_parallel = true` run:

1. Dense ξ pass.  `parallel_eulerlagrange_integration` now appends a serial
   Euler-Lagrange dense pass at the end (helper `_populate_dense_xi_via_serial_el!`)
   that replaces the propagator-BVP `odet` with a fresh serial-EL odet
   whose `u_store`/`ud_store` are dense and in axis basis — the only
   convention the PerturbedEquilibrium / FieldReconstruction downstream
   code consumes correctly.  All BVP-relevant fields (`intr.psilim`,
   `intr.qlim`, `intr.sing[*].delta_prime`, `delta_prime_col`, `ua_left`,
   `psi_ua_left`) are saved/restored across the pass.  Gated by new
   `ctrl.populate_dense_xi::Bool = true` (default on).

2. Multi-resonance skip.  Replace the hard `@assert` in
   `compute_delta_prime_matrix!` (which crashed multi-`n` runs whose q
   value was rational for two distinct `(m, n)` tuples) with an early
   return + warning.  Per-surface Δ' from `riccati_cross_ideal_singular_surf!`
   and HDF5 `singular/delta_prime` remain populated; only the
   inter-surface BVP `singular/delta_prime_matrix` is omitted in that
   regime.  Full multi-resonance BVP support tracked as a follow-up.

3. Tests + benchmark.
   - New @testset "ξ functions bit-identical between use_parallel modes
     (populate_dense_xi)" proves `psi_store/q_store/u_store/ud_store/
     crit_store/step/nzero` from `use_parallel=true; populate_dense_xi=true`
     are byte-for-byte identical to a `use_parallel=false` run on both
     Solovev (small N) and DIIID-like (large N), plus a sparse-storage
     control assertion so the bit-identical claim can't trivially pass.
   - Pinned per-surface `intr.sing[s].delta_prime` values added to both
     Solovev and DIIID-like "Parallel FM integration matches standard
     ODE" testsets (rtol=0.05, matches existing `et_par ≈ 1.29` style).
   - Pinned diagonal `delta_prime_matrix` values added to both
     STRIDE BVP Solovev + DIIID-like testsets (rtol=0.05).
   - Benchmark `benchmarks/benchmark_xi_parallel_vs_serial.jl` rewritten:
     accepts any example dir (defaults to Solovev + DIIID-like), overlays
     all resonant modes on log-y, adds a right-column residual panel.

   Net: `runtests_parallel_integration.jl` grew from 113 to 127 tests
   (≈13 s extra per CI matrix entry); `runtests_fullruns.jl` went from
   8/9 (pre-existing multi-n crash) to 9/9 pass after change (2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ywhere

GPEC's analytic-equilibrium adaptation of R. Fitzpatrick's TJ code
(https://github.com/rfitzp/TJ) is now consistently named "TJ-like" in
identifiers and prose, to distinguish it from the upstream TJ code itself.
Fitzpatrick's TJ is cited at every definition and use site.

Identifier renames (BREAKING for direct API users):
  - Struct:   TJConfig → TJLikeConfig (both file-path and dict constructors)
  - Functions:
      tj_run        → tj_like_run
      tj_run_direct → tj_like_run_direct
      tj_f1         → tj_like_f1
      tj_f1p        → tj_like_f1p
      tj_shape_rhs! / tj_shape_initial / tj_shape_solve / tj_find_nu
        → tj_like_shape_rhs! / _initial / _solve / tj_like_find_nu
      TJShapeParams → TJLikeShapeParams
  - Local parameter `tj::TJLikeConfig` → `tjlike::TJLikeConfig` throughout
    AnalyticEquilibrium.jl.

Config / user-facing renames (BREAKING for existing gpec.toml files):
  - eq_type values: "tj" → "tj_like", "tj_direct" → "tj_like_direct"
  - Embedded TOML section: [TJ_INPUT] → [TJ_LIKE_INPUT]
  - EquilibriumConfig now makes `eq_filename` optional when the embedded
    [TJ_LIKE_INPUT] / [SOL_INPUT] / [LAR_INPUT] section is present.
  - Dropped a stale `sigma_type="tj"` reference on LargeAspectRatioConfig.qa.

Tests:
  - test/runtests_tj_analytic.jl → test/runtests_tj_like_analytic.jl
    (git-detected rename, 16/16 pass)
  - test/runtests.jl include path updated.

Coincidental matches in Vacuum/Field.jl ("fintjj") and
InnerLayer/GGJ/{Shooting,InnerAsymptotics}.jl ("_build_tjmat",
"inps_tjmat", loop-local `tj`) are intentionally left alone — they
have nothing to do with Fitzpatrick's TJ code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ation, TJ → TJ-like

LAR_beta_scan and LAR_epsilon_scan each now consist of a single
self-describing gpec.toml plus run_scan.jl.  No more tj.toml side-cars:
all TJ-like analytic-equilibrium parameters live in an embedded
[TJ_LIKE_INPUT] section that gets parsed via the new
EquilibriumConfig(::Dict) path.

Every field across [Equilibrium], [TJ_LIKE_INPUT], [Wall], and
[ForceFreeStates] has a one-liner comment describing what it actually
is (not just a label) — e.g. "Number of radial spline nodes used to
discretize ψ" instead of "Radial grid points".  The header of each
gpec.toml notes that the model follows R. Fitzpatrick's TJ code
(https://github.com/rfitzp/TJ) profile family.

run_scan.jl scripts updated:
  - import TJLikeConfig (was TJConfig)
  - override config["TJ_LIKE_INPUT"][...] (was config["TJ_INPUT"][...])
  - LAR_epsilon_scan flips eq_type → "tj_like_direct" per point
  - banners say "TJ-like β scan" / "TJ-like ε scan"

diagnose_profiles.jl docstring clarified that its "TJ" geqdsk
comparison data are produced by Fitzpatrick's external TJ code, not
GPEC's internal `tj_like` model.

End-to-end --test runs of both scans complete with Δ' values
bit-identical to pre-rename outputs (dp21 = {+10.0150, +15.7659,
+292.6038} for the β scan; {+9.2087, +5.5595, +2.4427} for the ε scan).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n case

The TJ_epsilon_pole_example/ directory and its
regression-harness/cases/tj_epsilon_pole.toml entry are removed.  The
ε ≈ 0.66 near-pole physics it exercised is already covered by the
ε-scan in examples/LAR_epsilon_scan/ (which sweeps ε up to 0.660 along
the same kink-pole approach) and by the
"tj_like_run_direct (Option B) — pole-approach physics at ε = 0.60"
testset in test/runtests_tj_like_analytic.jl.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to 6d07c07.  "TJ-like" reads as a weak hedge ("kinda like the
real thing"); "TJ-analytic" says exactly what this is — GPEC's
implementation of the analytic-profile model from R. Fitzpatrick's TJ
code (https://github.com/rfitzp/TJ).  Citation everywhere this model is
defined or used is unchanged.

Identifier renames (BREAKING again, layered on top of 6d07c07's first
breaking pass):
  - Struct:   TJLikeConfig       → TJAnalyticConfig
  - Struct:   TJLikeShapeParams  → TJAnalyticShapeParams
  - Functions:
      tj_like_run             → tj_analytic_run
      tj_like_run_direct      → tj_analytic_run_direct
      tj_like_f1 / _f1p       → tj_analytic_f1 / _f1p
      tj_like_shape_rhs!      → tj_analytic_shape_rhs!
      tj_like_shape_initial   → tj_analytic_shape_initial
      tj_like_shape_solve     → tj_analytic_shape_solve
      tj_like_find_nu         → tj_analytic_find_nu
  - Local parameter `tjlike::TJLikeConfig` → `tj::TJAnalyticConfig`
    (the parameter name reverts to the original short `tj` since the
    type signature now disambiguates without ambiguity).

Config / user-facing renames (BREAKING for any gpec.toml or downstream
code that adopted 6d07c07's `tj_like` names):
  - eq_type values: "tj_like" → "tj_analytic"
                    "tj_like_direct" → "tj_analytic_direct"
  - Embedded TOML section: [TJ_LIKE_INPUT] → [TJ_ANALYTIC_INPUT]

Test file renamed back:
  - test/runtests_tj_like_analytic.jl → test/runtests_tj_analytic.jl
    (git-detected rename; matches the original pre-perf/riccati name)

Docstrings + comments tightened where "TJ-like analytic" was redundant:
"TJ-like analytic equilibrium" → "TJ-analytic equilibrium", etc.
Where the prose refers to something that lives in Fitzpatrick's actual
TJ code (e.g. GetPSIvac, GetHHvac, EFIT writer, Setnu), the language
now says "TJ-analytic X (cf. Fitzpatrick's TJ)" or just "TJ X" — the
"-analytic" suffix is reserved for our model class, while bare "TJ"
refers to the upstream code.

Verification:
  - julia Pkg.precompile() clean
  - runtests_tj_analytic.jl: 16/16 pass
  - Full test suite: 846/846 pass
  - LAR_beta_scan --test: Δ' bit-identical to pre-rename
    (dp21 = +10.0150, +15.7659, +292.6038 for pc ∈ {0.001, 0.10, 0.17})
  - Banner now reads "TJ-analytic β scan" / "TJ-analytic ε scan"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ense ξ pass

The dense ξ pass in `_populate_dense_xi_via_serial_el!` (introduced in
5acf147) replaces `odet` with a fresh serial-EL odet, but the previous
implementation only saved/restored `intr.sing[*]` fields — leaving the
parallel BVP's (S, I) Riccati-gauge `odet.ca_l` and `odet.ca_r` to be
silently overwritten by the fresh EL pass's axis-basis values.

PerturbedEquilibrium's `SingularCoupling.jl` is calibrated against the
Riccati gauge:

  lbwp1, rbwp1 = ForceFreeStates_results.ca_l[resnum, resnum, 2, s],
                 ForceFreeStates_results.ca_r[resnum, resnum, 2, s]
  delta_prime  = (rbwp1 - lbwp1) / (twopi * chi1)
  delcurs      = (rbwp1 - lbwp1) * j_c * im / (twopi * m_res)
  singflx_mn   = compute_singular_flux(resonant_current_val, ...)
  resonant_flux[n_idx, s] = singflx_mn / area

With axis-basis `ca_l` / `ca_r` from the EL pass (where U₁ grows
exponentially from the axis), these magnitudes blow up by ~25 orders
of magnitude:

  3c8130d (perf/riccati pre-dense-pass): max|resonant_flux| = 5.81e-03
  HEAD before this fix:                   max|resonant_flux| = 2.85e+23
  HEAD after this fix:                    max|resonant_flux| = 5.81e-03  ✓ bit-identical

Cascading downstream quantities — `delta_prime`, `island_width_sq`,
`Chirikov parameter`, `resonant_current`, `penetrated_field` — all
return to their pre-dense-pass physical magnitudes.

The fix: save `odet.ca_l` / `odet.ca_r` to the `saved` tuple before
the dense pass, then copy them onto `fresh_odet.ca_l` / `fresh_odet.ca_r`
after the dense pass returns.  The fresh EL odet's own ca_l/ca_r
(axis basis) are discarded — they were never needed since `ξ`
reconstruction uses `u_store` and `compute_delta_prime_matrix!` uses
propagators/chunks rather than ca_l/ca_r.

Full test suite: 846/846 pass.  The bit-identical tests in
runtests_parallel_integration.jl don't check ca_l/ca_r (only
u_store/ud_store/psi_store/etc.), so they still pass — and now PE
downstream gets the correct Riccati-gauge ca matrices it expects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@d-burg
Copy link
Copy Markdown
Collaborator

d-burg commented May 14, 2026

perf/riccati: parallel ξ + Δ' from one run, TJ-analytic rename, multi-resonance skip

@logan-nc @matt-pharr what do you think? I believe we're just about ready to merge, pending your recommendations.

Summary

Brings the parallel FM-propagator BVP path to feature-parity with the
serial DCON path so a single use_parallel = true run produces both
the inter-surface Δ' matrix and dense DCON ξ functions in axis basis.
Renames GPEC's analytic-equilibrium adaptation of R. Fitzpatrick's TJ code
to tj_analytic (with citation everywhere it's defined or used), turns the
LAR scan examples into self-describing single-file gpec.toml, and
gracefully handles multi-n runs where one q value is rational for
multiple (m, n) tuples.

Commits since c6c845f

Hash Title
54d12fe2 ForceFreeStates - NEW FEATURE - Port set_psilim_via_dmlim + default tightenings
4845ec80 ForceFreeStates - IMPROVEMENT - Default use_parallel=true, singfac_min=1e-4
db7c490a ForceFreeStates - BUG FIX - Wire ctrl.parallel_threads into BVP path; default 1 (serial) eliminates DIII-D 147131 thread-race
7ac87c8e ForceFreeStates - PERFORMANCE - parallel_threads default 1 → 2 (≈20 % BVP speedup; bit-identical Δ' in 15-trial DIII-D 147131 sweep)
3c8130da ForceFreeStates - BUG FIX + EXAMPLES - truncate_at_dW_peak self-consistent Δ' + LAR TJ TOML refactor + parallel ξ benchmark
5acf1478 ForceFreeStates - NEW FEATURE - Dense ξ in parallel BVP path + bit-identical regression + pinned Δ'
6d07c07d EQUIL - REFACTOR - Rename TJ → TJ-like with Fitzpatrick citation everywhere
085de133 EXAMPLES - CLEANUP - LAR scans: single-file gpec.toml, per-line annotation, TJ → TJ-like
5a4c2c29 EXAMPLES - CLEANUP - Remove TJ_epsilon_pole_example and its regression case
cc2affeb EQUIL - REFACTOR - Rename tj_like → tj_analytic (cleaner, less hedge-y)
8073c126 ForceFreeStates - BUG FIX - Preserve Riccati-gauge ca_l/ca_r across dense ξ pass

Total: 20 files changed (+1355, −718) over the 5 new commits, on top of
the 5 already-pushed perf/riccati commits.

"What's in the box" (a proper Claude-ism)

ForceFreeStates — dense ξ in the parallel path (5acf1478)

Three coupled changes so the parallel FM-propagator BVP delivers both Δ'
and DCON eigenfunctions usable by downstream PerturbedEquilibrium:

  1. Dense ξ pass. parallel_eulerlagrange_integration now appends a
    serial Euler-Lagrange dense pass (helper
    _populate_dense_xi_via_serial_el!) that replaces the propagator-BVP
    odet with a fresh serial-EL odet whose u_store / ud_store are
    dense and in axis basis — the only convention PerturbedEquilibrium
    / FieldReconstruction consumes correctly. All BVP-relevant fields
    (intr.psilim, intr.qlim, intr.sing[*].delta_prime,
    delta_prime_col, ua_left, psi_ua_left) are saved/restored
    across the pass. Gated by the new
    ctrl.populate_dense_xi::Bool = true (default on).
image image
  1. Multi-resonance skip in compute_delta_prime_matrix!. Replace
    the hard @assert (which crashed multi-n runs whose q value was
    rational for two distinct (m, n) tuples) with an early return +
    warning. Per-surface scalar Δ' from
    riccati_cross_ideal_singular_surf! and HDF5
    singular/delta_prime / singular/delta_prime_col remain
    populated; only the inter-surface BVP singular/delta_prime_matrix
    is omitted in that regime.

  2. Tests + benchmark.

    • New @testset "ξ functions bit-identical between use_parallel modes (populate_dense_xi)" proves psi_store / q_store / u_store / ud_store / crit_store / step / nzero from
      use_parallel=true; populate_dense_xi=true are byte-for-byte
      identical to a use_parallel=false run on both Solovev (small N)
      and DIIID-like (large N). A populate_dense_xi=false control
      assertion proves the result isn't trivially passing through
      accidental sparse storage on both sides.
    • Pinned per-surface intr.sing[s].delta_prime values added to both
      "Parallel FM integration matches standard ODE" testsets
      (rtol = 5 %, matching the existing et_par ≈ 1.29 style).
    • Pinned diagonal delta_prime_matrix values added to both STRIDE
      BVP testsets (rtol = 5 %).
    • benchmarks/benchmark_xi_parallel_vs_serial.jl rewritten:
      accepts any example dir (defaults to Solovev + DIIID-like),
      overlays all resonant modes on log-y, adds a right-column residual
      panel. Confirms exact (max|Δ| == 0) match on both equilibria.

    Net: runtests_parallel_integration.jl grew 113 → 127 tests
    (≈13 s extra per CI matrix entry); runtests_fullruns.jl went from
    8/9 (pre-existing multi-n crash) to 9/9 pass.

EQUIL — TJ → TJ-analytic rename + Fitzpatrick citation (6d07c07d, cc2affeb)

GPEC's adaptation of R. Fitzpatrick's TJ code analytic-profile family
(https://github.com/rfitzp/TJ) is now consistently named "TJ-analytic" in
identifiers and prose. The upstream TJ code is cited at every
definition and use site. Why two commits: the first pass introduced
tj_like as the new identifier, but on review it read as a weak hedge
("kinda like the real one"); tj_analytic says exactly what this is —
the analytic-profile model from Fitzpatrick's TJ, fed through GPEC's
pipeline rather than the full TJ code.

Identifier renames:

  • TJConfigTJAnalyticConfig (both file-path and dict constructors)
  • tj_run / tj_run_directtj_analytic_run / tj_analytic_run_direct
  • tj_f1 / tj_f1ptj_analytic_f1 / tj_analytic_f1p
  • tj_shape_rhs! / _initial / _solvetj_analytic_shape_*
  • tj_find_nutj_analytic_find_nu
  • TJShapeParamsTJAnalyticShapeParams
  • Local parameter tj::TJAnalyticConfig (same short tj as before;
    the type signature carries the disambiguation).

Config / user-facing renames:

  • eq_type values: "tj""tj_analytic", "tj_direct""tj_analytic_direct"
  • Embedded TOML section: [TJ_INPUT][TJ_ANALYTIC_INPUT]
  • EquilibriumConfig now makes eq_filename optional when an embedded
    [TJ_ANALYTIC_INPUT] / [SOL_INPUT] / [LAR_INPUT] section is present.
  • Dropped a stale sigma_type="tj" reference on
    LargeAspectRatioConfig.qa.

Tests: test/runtests_tj_analytic.jl (16/16 pass). Coincidental
matches in Vacuum/Field.jl (fintjj) and
InnerLayer/GGJ/{Shooting,InnerAsymptotics}.jl (_build_tjmat,
inps_tjmat, loop-local tj) are intentionally left alone — they
have nothing to do with Fitzpatrick's TJ code.

Prose convention: the -analytic suffix is reserved for our model
class. Bare "TJ" refers to the upstream code (e.g. "TJ-analytic
GetPSIvac (cf. Fitzpatrick's TJ)"
in comments referring to functions
we ported the logic of).

EXAMPLES — LAR scans (085de133) + TJ_epsilon_pole removal (5a4c2c29)

LAR_beta_scan and LAR_epsilon_scan each now consist of a single
self-describing gpec.toml plus run_scan.jl. No more tj.toml
side-cars: all TJ-analytic equilibrium parameters live in an embedded
[TJ_ANALYTIC_INPUT] section. Every field across [Equilibrium],
[TJ_ANALYTIC_INPUT], [Wall], and [ForceFreeStates] has a
one-liner comment describing what it is (not just a label) — e.g.
"Number of radial spline nodes used to discretize ψ" rather than
"Radial grid points". Each header cites
https://github.com/rfitzp/TJ.

run_scan.jl scripts updated:

  • import TJAnalyticConfig (was TJConfig)
  • override config["TJ_ANALYTIC_INPUT"][...] (was config["TJ_INPUT"][...])
  • LAR_epsilon_scan flips eq_type"tj_analytic_direct" per point
  • banners say "TJ-analytic β scan" / "TJ-analytic ε scan"

diagnose_profiles.jl docstring clarified that its "TJ" geqdsk
comparison data are produced by Fitzpatrick's external TJ code, not
GPEC's internal tj_analytic model.

examples/TJ_epsilon_pole_example/ and the matching
regression-harness/cases/tj_epsilon_pole.toml removed. The
ε ≈ 0.66 near-pole physics it exercised is already covered by the
ε-scan in examples/LAR_epsilon_scan/ (which sweeps ε up to 0.660
along the same kink-pole approach) and by the
"tj_analytic_run_direct (Option B) — pole-approach physics at ε = 0.60"
testset in test/runtests_tj_analytic.jl.

Default behaviour after merge

With the shipped example tomls (ode_flag=true, vac_flag=true,
use_parallel=true, populate_dense_xi=true), one main() call
produces:

Quantity HDF5 path Required flags
DCON ξ functions, dense in axis basis integration/xi_psi, dxi_psi, xi_s, psi, q ode_flag, populate_dense_xi
Per-surface Δ' singular/delta_prime, singular/delta_prime_col ode_flag, use_parallel
Inter-surface PEST3 Δ' matrix singular/delta_prime_matrix ode_flag, use_parallel, vac_flag, single-resonance surfaces only
Free-boundary energies vacuum/ep, vacuum/ev, vacuum/et ode_flag, vac_flag

Tests

File Result
runtests_utilities.jl 11 / 11 ✓
runtests_vacuum.jl 252 / 252 ✓
runtests_equil.jl 240 / 240 ✓
runtests_eulerlagrange.jl 92 / 92 ✓
runtests_riccati.jl 24 / 24 ✓
runtests_parallel_integration.jl 127 / 127 ✓ (was 113 before this PR; +14 from pinned Δ' + bit-identical tests)
runtests_sing.jl 75 / 75 ✓
runtests_tj_analytic.jl 16 / 16 ✓
runtests_fullruns.jl 9 / 9 ✓ (was 8/9 errored on perf/riccati tip due to pre-existing multi-n assertion crash; now passes thanks to the skip in 5acf1478)
Total 846 / 846 ✓

End-to-end --test runs of both LAR scans complete with Δ' values
bit-identical to pre-rename outputs:

Scan Δ' values
LAR_beta_scan --test dp21 = {+10.0150, +15.7659, +292.6038} for pc ∈ {0.001, 0.10, 0.17}
LAR_epsilon_scan --test dp21 = {+9.2087, +5.5595, +2.4427} for ε ∈ {0.2495, 0.4072, 0.5510}

Regression-harness sweep against develop

Ran regress --cases diiid_n1,solovev_n1,solovev_multi_n --refs develop,local
(develop @ e0ee9dff, 2026-04-12). All differences are accounted for:

Expected shifts (introduced earlier in perf/riccati, intentional)

These show up across all three cases and reflect documented behavior changes:

Class Cause Approx scale
ODE step counts (integration/nstep_*) ↓ ~45-55 % singfac_min = 1e-4 (was 0) tightens inner-layer cutoff (4845ec80); dense ξ pass uses serial-EL adaptive stepping (5acf1478) DIIID 1526 → 756
Energy eigenvalues (et[1], ep[1], ev[1]) shift O(1)-O(100) free_run! now runs on the dense-pass serial-EL odet (axis basis) rather than the parallel Riccati odet (renormalized basis) DIIID et[1] 1.51 → −7.80
mpert change (DIIID 26 → 35) delta_mhigh doubling in main pipeline + tightenings (54d12fe2)
Δ' diagonal magnitudes Now computed by riccati_cross_ideal_singular_surf! in (S, I) Riccati gauge (4845ec80); old default path used a different normalization within-branch bit-identical (pinned in tests)
q profile (checksum), pressure profile (checksum) set_psilim_via_dmlim moved the integration limit (54d12fe2)

||resonant flux|| blow-up — diagnosed and fixed in this PR (8073c126)

Pre-fix the dense ξ pass produced max|resonant_flux| = 2.85e+23 on
diiid_n1 (vs 5.81e-03 on the pre-dense-pass perf/riccati tip
3c8130da). Cascading downstream: delta_prime, island_half_widths,
Chirikov parameter, resonant_current, penetrated_field all blew up.

Root cause: _populate_dense_xi_via_serial_el! (in 5acf1478)
replaced the parallel BVP's odet with a fresh serial-EL odet but
preserved only intr.sing[*] fields — silently letting the fresh EL
pass's axis-basis ca_l / ca_r overwrite the parallel BVP's
(S, I) Riccati-gauge values. PE's SingularCoupling.jl reads
ca_l[resnum, resnum, 2, s] / ca_r[resnum, resnum, 2, s] and is
written against the Riccati convention; in EL axis basis these
matrices carry the exponentially-growing U₁ at the inner-layer
boundary, inflating downstream resonant flux by ~25 orders of magnitude.

Fix (8073c126): also save odet.ca_l / odet.ca_r in the
helper's saved tuple and copy them onto fresh_odet after the dense
pass returns. The fresh EL odet's own ca_l/ca_r are discarded —
they're not needed because compute_delta_prime_matrix! uses
propagators/chunks (not ca) and the bit-identical-ξ test in
runtests_parallel_integration.jl only compares
u_store/ud_store/psi_store/q_store/crit_store (not ca).

Verification: after the fix, max|resonant_flux| = 0.005813333867190486
on diiid_n1 — bit-identical to 3c8130da (the perf/riccati
pre-dense-pass tip). Full test suite still 846/846.

solovev_n1 Ref 1 replay failure

The cached develop @ e0ee9dff entry for solovev_n1 fails to replay
inside the harness (runner.jl:384). Not a code issue — the harness
cache for that one case has gone stale. Local ran cleanly with
ODE steps = 274, et[1] = +9.985e-02, qmax = 3.147 (consistent
with the LAR scan results).

Known issues / open questions

  • Multi-resonance surface support in compute_delta_prime_matrix!
    (this PR makes the path graceful but doesn't fully extend it). When a
    multi-n run produces a singular surface satisfying two distinct
    (m, n) tuples (e.g. q = 2 with both (m=2, n=1) and (m=4, n=2)),
    the inter-surface Δ' BVP is now skipped with a warning instead of
    crashing. Per-surface Δ' (singular/delta_prime, delta_prime_col)
    is still produced. Tracked as a follow-up: matrix shape becomes
    (n_res_total, n_res_total) with a (surface, mode, side) ↔ BVP-row
    map; the BVP matrix size stays (2 + 4·msing)·N so the algorithmic
    change is bookkeeping, not new math (or at least I don't think so?).

  • Reverse-shear q-profile handling in sing_find! (pre-existing,
    separate code path from the multi-resonance issue above; not fixed by
    this PR). Question
    for review:
    can DCON or STRIDE type codes be run on a reverse shear equilibrium with multiple e.g. q = 2 crossings? If so, such an equilibrium where q
    crosses a rational value at two distinct ψ locations,
    the existing-surface check in sing_find! (Sing.jl:32) keys only
    on s.q, so the second crossing's (m, n) is pushed into the
    existing SingType's m/n vectors — silently losing the second ψ
    location. Two distinct singular surfaces collapse into one with
    multiplicity-2 m/n vectors.

Pre-merge checklist

  • Full test suite green (846/846) on macOS-aarch64 Julia 1.11
  • Per-line annotation pass on examples/LAR_*_scan/gpec.toml
  • Fitzpatrick TJ citation at every TJ-analytic definition + use site
  • Bit-identical ξ regression test on Solovev + DIIID-like
  • Pinned Δ' values regression test on Solovev + DIIID-like
  • TJ_epsilon_pole_example + its regression-harness case removed
  • No trailing whitespace, LF line endings, TOML/YAML syntax valid
  • No .h5 / figures/ / outputs/ artifacts committed
  • Regression harness sweep against develop (diiid_n1, solovev_n1,
    solovev_multi_n) — all diffs accounted for; see "Regression-harness
    sweep" section above. The ||resonant flux|| blow-up surfaced by
    the sweep has been fixed in 8073c126.
  • Multi-resonance Δ' BVP extension — separate follow-up PR
  • Reverse-shear sing_find! dedup fix — possible separate PR?

@logan-nc
Copy link
Copy Markdown
Collaborator Author

@claude review this PR. Be succinct. Do not summarize. Just point out any bugs or potential improvements needed prior to merging.

@logan-nc
Copy link
Copy Markdown
Collaborator Author

@d-burg thanks for getting this to the finish line! I am out of the office today but will give it a read through as soon as I can. I really appreciate the great effort here. This is a 🏆 feature.

@claude
Copy link
Copy Markdown

claude Bot commented May 15, 2026

Code Review — PR #178: Dual Riccati + Parallel FM integration + Δ' output

This PR implements two new integration paths and full Δ' computation. The physics is sound and the validation against Fortran STRIDE is well-documented. Issues below range from blocking to advisory.


Breaking default changes (most critical)

use_parallel = true is the new default. This is the single biggest concern. The parallel path:

  • Runs the Δ' STRIDE BVP matrix on every run via compute_delta_prime_matrix! — extra cost not reflected in the benchmark table.
  • With populate_dense_xi = true (also default), appends a full serial EL pass after the parallel BVP. The PR description benchmarks the Riccati path at 2.83× speedup, but the actual default runtime with use_parallel + populate_dense_xi is closer to 2× the serial path (parallel BVP ~1× + serial EL ~1×), which is a net regression, not a speedup. This should be measured and documented honestly before making it default.
  • Is thread-nondeterministic. The test comment in runtests_fullruns.jl explicitly says: "rtol is loose because this result is thread-count sensitive (drifts ~15% between single- and multi-threaded invocations)." A 15% energy variation based on OS scheduler decisions is a correctness concern, not just a performance note.

Other changed defaults that will silently alter existing user results without opt-in:

  • eulerlagrange_tolerance: 1e-7 → 1e-8 (tighter, but silent)
  • singfac_min: 0.0 → 1e-4 (required nonzero for parallel path, but changes standard path behavior too)
  • sing_order: 2 → 6 (more expensive asymptotic expansion everywhere)
  • parallel_threads: 1 → 2

Recommendation: Default use_parallel = false until the thread-nondeterminism is resolved and the performance accounting includes the populate_dense_xi overhead.


use_double64_bvp missing from docstring

use_double64_bvp::Bool = true is added to ForceFreeStatesControl but not listed in the struct's docstring. Every other field has a bullet; this one is silently absent.


Fullruns test tolerance is not a meaningful regression test

In runtests_fullruns.jl, the test uses rtol = 0.2 (±20%) on an expected value of -0.18. The comment says the result drifts ±15% by thread count, so the test can pass and fail on the same code depending on how many Julia threads are available. This is a compile check, not a regression guard. The thread-nondeterminism needs to be resolved before this test can have a meaningful tolerance.

The Δ' regression tests in runtests_parallel_integration.jl use rtol=0.05. For the Solovev case, some Δ' values have imaginary parts 40–500× larger than the real parts (e.g., -7.278138e+00 + 4.172681e+03im). The tests should either target the physically meaningful component or explain why the complex value is the right quantity to pin.


benchmark_integration_paths.jl mutates production example files

The benchmark writes a modified gpec.toml in place and restores it in a finally block. A Julia crash (OOM, segfault, SIGKILL) will leave the example's gpec.toml modified. The benchmark should copy the example to a mktempdir and run there, as benchmark_xi_parallel_vs_serial.jl already does.


diagnose_profiles.jl hard-codes a private git branch

The file calls git show perf/riccati-full-geqdsk-scans:examples/LAR_epsilon_scan/equilibria/$fname. This branch is not part of the main repo and will fail for all contributors who don't have it locally. Document the prerequisite clearly or make the file self-contained.


Docs files not wired into Documenter

docs/delta_prime_numerical_analysis.md and docs/stride_delta_prime_validation.md are placed in docs/ (not docs/src/) and are not added to docs/make.jl. They won't appear in the public documentation site. Move them to docs/src/ and add them to make.jl if they are permanent reference material.


Minor issues

  • balance_integration_chunks test: @test length(balanced) >= min(target_n, length(base_chunks) * 50) — in the common case this asserts >= target_n. The function produces at most target_n chunks; the test should use == or justify >=.

  • Edge-scan restore logic: After the diagnostic-only scan, the code saves/restores odet.psifac and odet.u, but findmax_dW_edge! also writes to odet.edge_scan. The comment should document all mutated state so future maintainers don't inadvertently add more.

  • Regression harness: The PR description compares develop @ e0ee9dff vs local, which conflates develop-side changes with PR changes. Per CLAUDE.md, the harness should compare develop vs the PR branch directly.

  • benchmark_xi_parallel_vs_serial.jl calls Pkg.activate inside the script, overriding the project environment when the script is already expected to run with --project=... Remove the Pkg.activate call.


What works well

The bidirectional integration approach for crossing chunks is well-motivated: the direction field cleanly tracks backward chunks, balance_integration_chunks preserves it correctly when splitting, and apply_propagator_inverse! provides the matching inverse. The edge-dW decoupling from integration truncation is a genuine correctness fix for Δ' at the outermost surface, and the opt-in truncate_at_dW_peak flag preserves backward compatibility without silent behavior changes. The docs/src/stability.md documentation is thorough and useful.

@logan-nc
Copy link
Copy Markdown
Collaborator Author

@d-burg I am ok with changing defaults, as we are still in the chaos of active dev.
But I think the rest of the claude review points are worth addressing. Especially the danger of the gpec.toml restore approach. Can you knock 'em out?

@d-burg
Copy link
Copy Markdown
Collaborator

d-burg commented May 15, 2026

@logan-nc on it right now! These were helpful catches

@d-burg
Copy link
Copy Markdown
Collaborator

d-burg commented May 19, 2026

Proposal: top-level [Run] section with three physics flags

Currently what GPEC computes is gated by a scattered set of low-level flags plus implicit conditions: two same-named force_termination flags in different structs, use_parallel implicitly unlocking the inter-surface Δ' matrix, and [PerturbedEquilibrium] section-presence as the PE on/off switch. Before final cleanup I'd like to add a brief user-facing layer:

[Run]
compute_ideal_stability       = true   # δW + per-surface Δ'
compute_deltaprime_matrix     = false  # Inter-surface Δ' matrix (STRIDE BVP, Glasser 2018)
compute_perturbed_equilibrium = false  # GPEC response, island widths, Chirikov, resonant flux

The driver auto-derives ode_flag, vac_flag, use_parallel, populate_dense_xi, and the PE entry point from these three. Explicit low-level flags still work as overrides.

Mode coverage:

Mode ideal_stability deltaprime_matrix perturbed_equilibrium
Equilibrium only F F F
Ideal stability + δW T F F
Δ' matrix (tearing study, no PE) T T F
GPEC response (no Δ' matrix) T F T
Full pipeline T T T

When deltaprime_matrix=true the parallel FM path is used, which leaves odet.u_store in (S, I) Riccati basis and ud_store ≈ 0 – unusable by PerturbedEquilibrium.FieldReconstruction, which expects axis-basis (EL basis) dense ξ. The dense-ξ pass (_populate_dense_xi_via_serial_el!) re-runs a serial Euler–Lagrange integration after the parallel BVP to repopulate ξ in axis basis. So whenever deltaprime_matrix=true AND perturbed_equilibrium=true, the driver must force populate_dense_xi=true – otherwise PE silently reads Riccati-basis garbage. The proposed auto-derivation does exactly this. With only one of the two set, populate_dense_xi stays off (no wasted serial pass for tearing-only studies; PE-only runs use the standard serial path and don't need it either). Kinetic damping stays orthogonal (continuous kinetic_factor).

What do you think @logan-nc @matt-pharr ?

@logan-nc
Copy link
Copy Markdown
Collaborator Author

@d-burg did you address the review issues? Please push so I can do a final pass

d-burg and others added 7 commits May 21, 2026 12:51
delta_prime_numerical_analysis.md and stride_delta_prime_validation.md are
internal development notes (numerical-sensitivity analysis and validation log
for the STRIDE Δ' BVP) — useful for our own reference but not appropriate for
the public docs site. Archived to ~/CTM-processing/GPEC_validation/ outside
the repo and removed from docs/.

Addresses @claude review feedback that flagged these files as "in docs/ but
not in docs/src/, not wired into Documenter, won't appear on the public docs
site."
Bundles four small @claude review responses with no behavioural impact on the
main pipeline:

1. **use_double64_bvp docstring entry.** Field exists in ForceFreeStatesControl
   (default true, plumbed through to compute_delta_prime_matrix! in Riccati.jl)
   but the struct docstring's ## Fields list omitted it. Add a bullet
   describing what the flag controls (Double64-precision Δ' BVP solve to
   preserve significance through the PEST3 cancellation), its scope (only with
   use_parallel = true), and its cost (~1.5–2× the BVP solve).

2. **balance_integration_chunks test tightened to ==.** The function exits its
   while loop when length(result) >= target_n and adds exactly one chunk per
   iteration, so under normal conditions length(balanced) is exactly target_n.
   The previous `>= min(target_n, length(base_chunks) * 50)` was correct but
   sloppy. Also fix the test's target_n formula to mirror the function — the
   test was missing the min_bvp_intervals term, so the previous `>=` would
   have failed silently if the assertion were ever tightened.

3. **Edge-scan save/restore comment.** Clarify that findmax_dW_edge! also
   (re)allocates odet.edge_scan, which is the diagnostic product and is
   intentionally NOT restored alongside psifac/u. Helps future maintainers
   understand which state is restored and which is intentionally produced.

4. **Drop Pkg.activate from benchmark_xi_parallel_vs_serial.jl.** The script
   is documented to run with `julia --project=..`, so the in-script activate
   was redundant and could mask environment issues.

All 127 runtests_parallel_integration.jl tests pass.
…was historical)

The runtests_fullruns.jl kinetic multi-n test was widened to `rtol = 0.2` on
expected `et[1] ≈ -0.18` because of an observed ~15% drift between thread
counts. That drift is no longer present: a sweep on this exact case across

  julia_nthreads ∈ {1, 2, 4}
  parallel_threads ∈ {1, 2, 4} (capped by julia_nthreads)
  use_parallel ∈ {true, false}

produces `et_re = -0.193593591803846` bit-identical to 15 decimal digits in
every one of the 9 configurations. The drift was almost certainly removed
by commit 5d5b8ee (edge-dW silent psilim truncation decoupling): pre-fix,
the dW peak's thread-sensitive sampling silently moved the integration limit,
which fed back into the kinetic eigenvalue. Post-fix, psilim is fixed by
qhigh/psihigh regardless of dW peak, and the result settles deterministically.

Test now pins `et[1] ≈ -0.193593591803846 rtol = 1e-6`, with a comment
explaining the determinism and the historical context. The old expected
value (-0.18) was a guess; the new one is the actual bit-deterministic answer.

Addresses @claude review feedback on PR 178: "rtol=0.2 is not a meaningful
regression test — passes and fails on the same code depending on thread
count."
…default flips

Bundles three coupled changes responding to @claude review feedback on flag
surfacing, plus the test re-pin needed to keep regression coverage intact:

1. **Remove `use_double64_bvp` flag, hardcode `Complex{Double64}`** in
   `compute_delta_prime_matrix!`. Parameter sensitivity study had already
   confirmed F64 vs Double64 makes no measurable difference on the validation
   cases (precision bottleneck is upstream of the BVP linear algebra), but
   Double64 is the conservative choice for the catastrophic PEST3 cancellation
   at low ε/β. Cost is ~1.5–2× the BVP solve, which is a small fraction of
   total Δ' wall-clock. Removing the knob simplifies the API without losing
   the safer behavior.

2. **Flip `set_psilim_via_dmlim` default false → true.** Fortran STRIDE found
   that truncating ~20 % above the outermost rational (`dmlim = 0.2`) avoids
   a numerical kink instability in δW that appears when the integration ends
   too close to or just below a rational surface. For diverted equilibria
   (q → ∞ at the separatrix — bulk of production use) this costs negligible
   physical domain because rationals get arbitrarily dense near the LCFS, so
   `true` is the safe and recommended default. For limited circular /
   analytical equilibria (Solovev, LAR scans) rationals are sparse and 20 %
   above the last rational chops too much edge — those examples now set
   `set_psilim_via_dmlim = false` explicitly. Updated docstring with the full
   physics + when-to-use guidance.

3. **`sing_lim!` skip-with-warning on multi-n with `set_psilim_via_dmlim = true`.**
   The dmlim truncation is ambiguous when n varies (which n defines "outermost
   rational + dmlim/n"?), but the previous behavior was a hard `error()` that
   would crash any multi-n run if the user forgot to override the new default.
   `sing_lim!` now warns and falls back to qhigh/psihigh truncation so
   production users running multi-n on diverted geqdsks don't need to
   remember to override the default.

4. **Surface all Δ' BVP / parallel flags explicitly in 10 example/test TOMLs.**
   `use_parallel`, `parallel_threads`, `populate_dense_xi`, `truncate_at_dW_peak`,
   `set_psilim_via_dmlim`, `dmlim` are now explicit (not commented) in every
   `gpec.toml`. DIIID-like sets `set_psilim_via_dmlim = true` (diverted
   production); all 9 Solovev/LAR/multi-n cases set it to `false` with an
   annotation explaining the limited-vs-multi-n reason.

5. **Re-pin DIIID-like Δ' regression values in `runtests_parallel_integration.jl`.**
   With `set_psilim_via_dmlim = true` on DIIID-like, `et_par` shifted +24 %
   (1.29 → 1.5988) and `dpm[5,5]` shifted −6.4 % (only `et_par` and
   `dpm[5,5]` fell outside the existing rtol = 5 %; other `dpm[i,i]` values
   drifted 0.4–1.2 %, within tolerance). Per-surface `sing[*].delta_prime[1]`
   are computed up to each rational and barely moved (≲ 1e-4 %), confirming
   the per-surface calculation is robust to edge-truncation choice. Re-pinned
   all values to current measurements with comments explaining the shifts.

**Regression-harness expectation:** `diiid_n1` baselines will shift on this
PR — intentional, reflecting the new production-correct DIIID-like
configuration. `solovev_n1` and `solovev_multi_n` stay unchanged (those
examples explicitly set `set_psilim_via_dmlim = false`).

All 9/9 `runtests_fullruns.jl`, 24/24 `runtests_riccati.jl`, and 127/127
`runtests_parallel_integration.jl` pass.
…surface Δ' (stub); BVP matrix is canonical

The per-surface Δ' computed in `riccati_cross_ideal_singular_surf!` from
(ca_r − ca_l) at each crossing is a stub calculation that doesn't agree with
the canonical STRIDE BVP Δ' matrix from `compute_delta_prime_matrix!`. It's
retained in the code (`intr.sing[*].delta_prime` / `delta_prime_col` fields)
for diagnostic / future-work use, but no longer reported, output, or regression
tested on any actual equilibrium. The BVP matrix diagonal is now the canonical
Δ' everywhere downstream.

**Solovev wall reverted to close conformal a=0.2415.** The earlier nowall change
(prior commits) made the Solovev fixture strongly kink-unstable (et[1] = -6.8)
because this equilibrium (q₀=1.9, e=1.6) is intrinsically free-boundary kink
unstable without wall stabilization. With the close conformal wall it's
marginally stable (et[1] = +0.24). Probe sweep over q₀ ∈ [1.1, 3.0] and shape
e ∈ [1.0, 2.0] found no Solovev configuration that's both stable AND
multi-resonance AND clean-BVP-Δ' — the family is fundamentally too
kink-prone. Documented in the TOML comment so future contributors don't
re-derive this finding.

**Per-surface Δ' regression tests dropped:**
- `runtests_parallel_integration.jl`: 7 per-surface assertions (Solovev sing[1-2],
  DIIID-like sing[1-5]) plus the entire Solovev BVP Δ' matrix testset (pinned
  values near marginal stability, ~10⁵-10¹¹ magnitudes with |Im/Re| ≫ 1).
- `runtests_riccati.jl`: entire `Δ' computed by Riccati path — Solovev regression`
  testset (10 assertions).

The DIIID-like BVP Δ' regression testset stays — that fixture is
intrinsically stable (et[1] = +1.6) so the BVP matrix is well-conditioned and
meaningful. Net test counts: parallel-integration 127 → 106, riccati 24 → 14.

**HDF5 outputs cleaned up:**
- Drop `singular/delta_prime` (FFS per-surface stub).
- Drop `singular/delta_prime_col` (FFS per-surface column stub).
- Drop `perturbed_equilibrium/singular_coupling/delta_prime` (PE redundant with
  the canonical BVP value).

Only `singular/delta_prime_matrix` (the STRIDE BVP) carries Δ' through HDF5.

**`PerturbedEquilibrium.SingularCoupling`** now reads `ffs_intr.delta_prime_matrix`
diagonal into `state.delta_prime` instead of computing the stub from
(rbwp1 − lbwp1) / (2π·χ'). Falls back to NaN when the BVP matrix isn't
populated (kinetic_factor > 0, multi-resonance multi-n). `lbwp1` and `rbwp1`
are still used for the resonant current calculation (which is a different
physical quantity — field-derivative jump weighted by current density, not Δ').

**`Analysis.plot_driven_delta_prime`** rewired to read `singular/delta_prime_matrix`
diagonal — the canonical Δ' — instead of the PE stub field that no longer
exists in HDF5.

**Regression harness** `diiid_n1.toml`: the `[quantities.delta_prime]` track now
reads `singular/delta_prime_matrix` via a new `diagonal_complex` extractor
(small extractor.jl extension). Was previously reading the PE stub value;
now tracks the canonical BVP diagonal. Values will shift on this PR
(intentional — the new track is physically meaningful).

**Per-surface stub kept in code** with a prominent comment in
`riccati_cross_ideal_singular_surf!` explaining that the calculation lives on
for future work but should not be relied on for physics, output, or regression.

Tests: parallel_integration 106/106 ✓, riccati 14/14 ✓, fullruns 9/9 ✓.
…1, V4)

- H1: Move Random to [extras]/[targets].test; DoubleFloats opt-in via
  ctrl.extended_precision_bvp (default true; Float64 drifts imag Δ' 2-5x on DIIID)
- H2: Delete dead integrate_backward_chunk_fms; clarify riccati_der! and
  compute_delta_prime_from_ca! as reference/stub-only; mark per-surface
  delta_prime/delta_prime_col on SingType as stubs (BVP matrix is canonical)
- H3: Decompose compute_delta_prime_matrix! (540 to 63 LOC + 11 helpers),
  parallel_eulerlagrange_integration (281 to 36 LOC + 7 helpers),
  riccati_cross_ideal_singular_surf! (122 to 20 LOC + 6 helpers). Bit-identical.
- H4: @info to @debug for heavy per-crossing vmat/asymptotic diagnostics
- H5: Guard FM-axis-BC fallback against direction=-1 crossing chunks
- D1: Inline equation citations (Eq. 19, 29, 31, 33, 37 + STRIDE sing_vmat)
- D2: Stale Tsit5/5th-order docstrings to Vern9/9th-order
- D3: Name SAVE_NEAR_END_FRAC, SAVE_NEAR_END_PSI, ODE_COST_AXIS/RAT/EDGE;
  document ucrit=1e4 rationale
- P1: Auto-skip populate_dense_xi serial-EL pass when force_termination=true
- V1: Tighten runtests_riccati.jl Solovev rtol 1e-2 to 1e-4 (PR claims 0.006%)
- V4: Split delta_prime_matrix rtol by entry magnitude (small entries 1e-2;
  large-magnitude FP-sensitive entries bracket |dpm|)
- Fix sing_lim! NaN qlim when nn_low <= 0 (guard dmlim branch)
- Platform-tolerance brackets on et[1] tests (Apple/Linux FP drift ~20%)

Full Pkg.test() suite passes on Apple aarch64 / Julia 1.11.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	Project.toml
#	src/Analysis/PerturbedEquilibrium.jl
#	src/Equilibrium/EquilibriumTypes.jl
#	src/ForceFreeStates/EulerLagrange.jl
#	src/GeneralizedPerturbedEquilibrium.jl
#	src/PerturbedEquilibrium/SingularCoupling.jl
#	src/PerturbedEquilibrium/Utils.jl
@d-burg
Copy link
Copy Markdown
Collaborator

d-burg commented May 22, 2026

Two new commits address the latest pre-merge audit and bring the branch up to date with develop.

a89bd5d2

  • Project.toml: Random moved to [extras]/[targets].test; DoubleFloats opt-in via ctrl.extended_precision_bvp (default true — Float64 drifts imaginary Δ' by 2–5× on DIIID).
  • Stubs: SingType.delta_prime / delta_prime_col and compute_delta_prime_from_ca! explicitly annotated as stubs; canonical Δ' is ForceFreeStatesInternal.delta_prime_matrix (STRIDE BVP).
  • Decompositions: compute_delta_prime_matrix! 540 → 63 LOC (+11 helpers), parallel_eulerlagrange_integration 281 → 36 LOC (+7), riccati_cross_ideal_singular_surf! 122 → 20 LOC (+6). All bit-identical.
  • Logging: Per-crossing vmat/asymptotic dumps demoted from @info to @debug.
  • Correctness: Assertion guards FM-axis BVP fallback against direction=-1 crossing chunks.
  • Inline citations: Glasser 2018 Eqs. 19 / 29 / 31 / 33 / 37; STRIDE sing_vmat reference in Sing.jl.
  • Naming: SAVE_NEAR_END_FRAC, SAVE_NEAR_END_PSI, ODE_COST_AXIS/RAT/EDGE constants; ucrit=1e4 rationale documented.
  • Performance: Auto-skip the populate_dense_xi serial-EL pass when force_termination=true (eliminates ~1× wasted runtime in non-PE runs).
  • Tests: Tightened Riccati Solovev rtol 1e-2 → 1e-4 (matches PR's 0.006% claim). Δ' matrix tolerances split by entry magnitude / |Im|/|Re| ratio (V4). Platform-tolerance brackets on FP-sensitive et[1] values. Fixed sing_lim! NaN when nn_low ≤ 0.

ace5e21c — Merge origin/develop

7 conflict files resolved:

  • Project.toml — kept julia=1.11 (develop) + [extras] (ours).
  • EquilibriumTypes.jl, GeneralizedPerturbedEquilibrium.jl, EulerLagrange.jl — preserved both sides.
  • SingularCoupling.jl, Analysis/PerturbedEquilibrium.jl, PerturbedEquilibrium/Utils.jl — took develop's PE refactor in full; clarified that ForceFreeStates' BVP delta_prime_matrix (equilibrium-intrinsic) and PE's state.delta_prime (forcing-driven) are distinct physical quantities sharing only a symbol — both now written independently to HDF5.

Three latent runtime crashes from auto-merge gaps also patched: missing use_galgrid struct field; missing bt_sign arg in DirectRunInput calls (AnalyticEquilibrium.jl, ReadEquilibrium.jl IMAS reader); stale state.coupling_coefficient / state.resonant_amplitude HDF5 writes (fields removed by develop's refactor).

Verification

  • Pkg.test() passes the full suite on the merged tree (Apple aarch64 / Julia 1.11).
  • Local regression-harness runs cleanly on all three standard cases (diiid_n1, solovev_n1, solovev_multi_n); develop-side comparison currently blocked by a harness/Pkg.instantiate FastInterpolations-BC issue independent of this PR.

…B3 guard, H1-H3, populate_dense_xi default flip)

Bundle of small, audit-driven fixes ahead of merging perf/riccati. No
numerical changes on tested platforms; all 19 testsets pass post-fix.

**B1 — Per-thread `ffit_hint` in `sing_der!` hot path.**  Replaces 21
calls in `sing_der!` (kinetic + ideal paths) of the form
`hint=ffit._hint` (shared `Ref{Int}` mutated by every worker thread) with
`hint=odet.ffit_hint`, where `odet` is already cloned per thread via
`odet_proxies` in the parallel BVP path.  Adds matching
`odet_proxy.ffit_hint[] = 1` resets next to the existing
`spline_hint[] = 1` resets at all four proxy-setup sites in `Riccati.jl`.
Defensive: M1 Max reproducer showed 19 runs (t ∈ {1,4,8}, parallel_threads
∈ {2,8}) bit-identical to `-0.193593591803846` with the shared `Ref`
in place, because `FastInterpolations.RefHint` is stale-tolerant — the
race exists in source but does not produce numerical drift on tested
platforms.  Fix removes the only remaining theoretical race on the
parallel path and completes the per-thread isolation pattern.
`compute_sing_asymptotics`, `_log_bvp_pest3`, and test/benchmark code
keep `ffit._hint` (all serial setup or debug).

**B3 — `assemble_fm_matrix` size inference.**  Determine `N` from
`T_init` if provided, else from `propagators[first(idx_range)]` when the
range is non-empty, else assert and fall back.  Empty-range guard still
fires; the change makes the size-inference robust against an empty
`propagators` list (degenerate msing=0 chunking).

**H1 — `parallel_threads` honored in `balance_integration_chunks`.**
Uses `min(Threads.nthreads(), ctrl.parallel_threads)` instead of
`Threads.nthreads()` when computing sub-chunk target count.  A user on
`julia -t 16` setting `parallel_threads=2` for determinism no longer
pays for 8× the requested sub-chunk count.

**H2 — Drop re-introduced Fortran line citations in `Sing.jl`.**
Removes `[Fortran sing.f lines XXXX]` annotations on lines 838-840, 862,
870 (reintroduced via the kinetic merge after commit b9c177e explicitly
removed them).  Logan 2015 App. C eq. refs already on line 837 carry the
provenance.

**H3 — Compress historical-narration block in HDF5 writer.**
`GeneralizedPerturbedEquilibrium.jl:534-540` (7-line block explaining
what was previously emitted) → 1-line pointer to the `SingType.delta_prime`
docstring.  Aligns with CLAUDE.md "Keep code comments concise" rule.

**Default flip: `populate_dense_xi` true → false** (`ForceFreeStatesStructs.jl:289`).
Motivation is *clarity of intent* (set this flag only if PerturbedEquilibrium
will consume dense axis-basis ξ), not the "75 % regression rescue" framing
floated in the audit.  The audit estimate was extrapolated from a small-N
(DIIID N=26 force_termination=true) benchmark setup; on full-scale
user-facing examples the dense-xi serial-EL re-run costs only ~1× the
*parallel BVP* wall-clock (not ~1× standalone serial EL).  Empirically on
`examples/Solovev_ideal_example` (delta_m=8 → mpert ~25):
  - use_parallel=true + populate=true : ~97 s
  - use_parallel=false               : ~494 s
The parallel BVP path wins by ~5× on this configuration even with the
dense-xi pass enabled; flipping use_parallel=false to "skip the wasted
re-run" is a measurement-grade regression on real configs.  The default
flip therefore stands as a UI clarification: PE-using TOMLs explicitly
opt into `populate_dense_xi = true`, non-PE TOMLs leave it default false
(saving ~10–30 % parallel-BVP-wall-clock, not 75 %).  Example TOMLs
updated accordingly:
  - 2 PE examples (Solovev_ideal_example, DIIID-like_ideal_example):
    explicit `populate_dense_xi = true` with strengthened comment
    explaining the requirement.
  - 4 non-PE examples (LAR_beta_scan, LAR_epsilon_scan,
    Solovev_ideal_example_3D, Solovev_ideal_example_multi_n): flip to
    `populate_dense_xi = false`.  All four keep `use_parallel = true`
    because the parallel BVP is faster than serial EL on large grids
    regardless of populate setting; the Solovev multi-n and 3D examples
    pick up explicit comments documenting the empirical speedup.
  - 4 test fixtures in `test/test_data/` intentionally untouched to
    preserve their pinned et[1] regression values.

**Tightened kinetic multi-n rtol.**  `runtests_fullruns.jl`: replaces
the decade-wide bracket (`-0.30 < et < -0.10`) with a tight pin
`isapprox(et, -0.193593591803846; rtol=1e-3)`.  Justified by the M1 Max
bit-identity measurement (19 runs across thread sweeps); the prior
"Apple silicon drifts ~20 %" warning in the test comment does not
reproduce on the current code path.  1e-3 catches real regressions in
the kinetic / edge-dW path while tolerating cross-platform / BLAS drift.
@d-burg
Copy link
Copy Markdown
Collaborator

d-burg commented May 26, 2026

Pre-merge audit cleanup — 4fbd882
Five small fixes addressing the most recent audit, all green on 19 testsets.

B1 — Per-thread ffit_hint in sing_der!. 21 hot-path callsites converted from shared ffit._hint::Ref{Int} to odet.ffit_hint (per-thread via odet_proxies). Honest framing: I built a reproducer on Apple M1 Max — 19 runs across julia_nthreads ∈ {1,4,8} × parallel_threads ∈ {2,8} were all bit-identical to -0.193593591803846. The race exists in source but FastInterpolations.RefHint is stale-tolerant so it doesn't produce drift in practice. Fix is defensive (completes the per-thread isolation pattern, removes a future-flake source if FastInterpolations' hint semantics change), not a drift fix.

B3 — assemble_fm_matrix size inference. Empty-range guard ordering fixed; size now inferred from T_init or propagators[first(idx_range)] instead of unconditionally dereferencing propagators[1].

H1 — parallel_threads honored in balance_integration_chunks. Uses min(nthreads, ctrl.parallel_threads); a julia -t 16 user setting parallel_threads=2 no longer pays for 8× the requested sub-chunks.

H2 / H3 — Comment cleanup. Drops re-introduced [Fortran sing.f lines XXXX] brackets in Sing.jl (Logan 2015 App. C eq. refs already carry the provenance); compresses the 7-line historical-narration block in the HDF5 writer to one line per CLAUDE.md.

populate_dense_xi default flip (true → false) — clarity, not perf rescue. The audit's "75 % regression" framing was extrapolated from a small-N benchmark and doesn't generalize. Empirical on examples/Solovev_ideal_example (delta_m=8): parallel+populate ≈ 97 s, serial only ≈ 494 s — parallel wins by 5× even with populate=true. The flip stands as a UI clarification (PE-using TOMLs opt in explicitly; non-PE TOMLs save ~10-30 % parallel-BVP wall-clock). 2 PE examples set populate_dense_xi = true explicitly with a strengthened comment; 4 non-PE examples flip to false; 4 test fixtures untouched to preserve pinned et[1] values.

Tightened kinetic multi-n rtol. runtests_fullruns.jl: decade-wide bracket → isapprox(et, -0.193593591803846; rtol=1e-3). The "Apple silicon drifts ~20 %" warning in the old comment doesn't reproduce on the current code path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants