ForceFreeStates: Dual Riccati + Parallel FM integration + Δ' output by logan-nc · Pull Request #178 · OpenFUSIONToolkit/GPEC

logan-nc · 2026-02-28T10:15:13Z

Summary

Implements two new integration paths for the Euler-Lagrange ODE and adds full Δ' (tearing stability parameter) computation to the Riccati and parallel FM paths.

Dual Riccati integration (use_riccati = true): Sequential reformulation as a matrix Riccati ODE S = U₁·U₂⁻¹ with periodic renormalization to maintain bounded (U₁, U₂). Validated on Solovev (0.006% energy error) and DIIID n=1 (0.12% error). Faster than standard for large N.
Parallel FM integration (use_parallel = true): Each integration chunk is solved independently from identity ICs using Threads.@threads, then assembled serially. Uses bidirectional integration — crossing chunks (near rational surfaces) are integrated backward to keep FM propagators well-conditioned. The well-conditioned backward FM inverse is applied via LU solve during serial assembly. Fixes a ~10% energy error for DIIID N=26 that existed with the all-forward approach.
Per-surface Δ': Both paths compute intr.sing[s].delta_prime (scalar) and delta_prime_col (full N-vector of off-diagonal coupling). Written to singular/delta_prime and singular/delta_prime_col in HDF5.
Inter-surface Δ' matrix (parallel FM only): intr.delta_prime_matrix (2·msing × 2·msing) via the STRIDE global BVP [Glasser 2018 Phys. Plasmas 25, 032501]. Written to singular/delta_prime_matrix in HDF5.
Edge-dW scan decoupled from integration truncation (new, commit 5d5b8ee): The edge-dW scan over ψ ∈ [psiedge, psilim] is now diagnostic-only by default. It no longer silently moves psilim/qlim/u to the dW peak, which was corrupting Δ' of the outermost rational by tens of percent and making the answer depend on a subtle, hidden heuristic. Legacy behaviour is preserved behind an opt-in truncate_at_dW_peak = true flag (marked in the docstring as "Δ'/δW unreliable in this mode"). psihigh / qhigh / dmlim are now the only controls over the integration domain. Validated against Fortran STRIDE on 42+56-point TJ β and ε scans: Δ'(2/1), Δ'(3/1), δWp, δWv, δWt all match within ~1% (see LAR_beta_scan and LAR_epsilon_scan comparison plots on the companion perf/riccati-full-geqdsk-scans branch).

Accuracy

Example	Path	et[1]	Error vs std
Solovev N=8	standard	16.56600	—
Solovev N=8	riccati	16.56691	0.006%
Solovev N=8	parallel	16.56691	0.006%
DIIID N=26	standard	1.21237	—
DIIID N=26	riccati	1.21386	0.123%
DIIID N=26	parallel	1.21386	0.123%

The ~0.12% gap on DIIID is algorithmic (different crossing convention: Riccati-style vs Gaussian Reduction), not ODE tolerance. Both paths are physically correct; 0.12% is well within physical uncertainty.

Performance (4 threads)

Example	Path	Time	Speedup
Solovev N=8	standard	0.25s	1.00×
Solovev N=8	riccati	0.09s	2.83×
Solovev N=8	parallel	0.14s	1.80×
DIIID N=26	standard	1.17s	1.00×
DIIID N=26	riccati	0.91s	1.29×
DIIID N=26	parallel	0.90s	1.30×

Tests

Suite	Pass	Total
`runtests_riccati.jl`	23	23
`runtests_parallel_integration.jl`	56	56
`runtests_fullruns.jl`	9	9

Riccati tests include unit tests for: renormalize_riccati_inplace!, renormalize_riccati!, Riccati end state U₂≈I, Δ' regression values, delta_prime_col shape/diagonal consistency, riccati_der! formula (Glasser 2018 Eq. 19), and compute_delta_prime_from_ca! bit-identical consistency.

Parallel tests include: apply_propagator! identity and linearity, apply_propagator_inverse! round-trip, chunk balance target count, direction field propagation through bidirectional=true, Solovev energy accuracy, DIIID energy accuracy (key bidirectional fix regression), ode_itime_cost additivity, and Δ' matrix regression for both Solovev and DIIID.

runtests_fullruns.jl updated: regression_solovev_kinetic_multi_n/et[1] expected from −0.01248 → −0.19359 with inline comment. The old value reflected the truncated-integration behaviour from the edge-dW heuristic; the new value reflects the full-domain answer controlled solely by psihigh/qhigh.

Breaking behaviour change

Edge-dW scan is no longer a silent psilim truncation. Users who relied on psiedge < psilim to shrink the integration domain must now either:

Set psihigh or qhigh (or dmlim) explicitly to the desired outer boundary (recommended — the whole point of this change is that the edge is user-controlled and visible), or
Add truncate_at_dW_peak = true to their [ForceFreeStates] TOML section to restore the legacy heuristic (flagged as unreliable for Δ' and δW; preserved for experimental work on improved edge-mode filters).

The edge_scan/ HDF5 group is still written with psi, q, plasma_energy, vacuum_energy, total_energy, vacuum_eigenvalue for whatever post-processing previously used it as a diagnostic.

Regression harness snapshot (develop @ `e0ee9df` vs perf/riccati @ `685a92a`)

Run via regress --cases diiid_n1,solovev_n1,solovev_multi_n,tj_epsilon_pole --refs develop,local --force.

solovev_n1: 16 CHANGED, 5 unchanged
- et[1]: −0.2646 → +0.0999 (sign flip — driven by the edge-dW decoupling plus pre-existing Riccati/parallel-FM work on this branch)
- vacuum matrix min eigenvalue: 1.905 → 2.284 (new HDF5 quantity)
- ODE steps (saved): 607 → 273 (55 % fewer due to the psihigh refactor on develop between 04-09 and 04-12, unrelated to this PR's edits)
diiid_n1: 21 CHANGED, 6 unchanged
- Leading-eigenvalue shifts track the energy-principle refactor
- Large shifts in delta prime, island half-widths, Chirikov parameter, ||resonant flux|| are pre-existing on develop (||resonant flux|| went 452 → 3.26 × 10⁶ between develop @ 04-09 and develop @ 04-12 already — see ticket below) and not caused by this PR. The in-PR perturbed-equilibrium changes are expected to re-baseline these; a separate investigation should confirm the DIIID singular-coupling numbers are physically sane.
solovev_multi_n: 11 CHANGED, 4 unchanged — qualitatively the same pattern as solovev_n1
tj_epsilon_pole: new case on this branch, not present on develop

Files changed

File	Change
`src/ForceFreeStates/ForceFreeStatesStructs.jl`	`IntegrationChunk.direction`, `SingType.delta_prime_col`, `ForceFreeStatesInternal.delta_prime_matrix`; new `truncate_at_dW_peak::Bool = false` control with docstring warning
`src/ForceFreeStates/EulerLagrange.jl`	`chunk_el_integration_bounds(bidirectional=false)`; `balance_integration_chunks` direction propagation; edge-dW scan split into diagnostic (default) vs legacy-truncation paths
`src/ForceFreeStates/Riccati.jl`	Full Riccati path; `integrate_propagator_chunk!` bidirectional tspan; `apply_propagator_inverse!`; bidirectional serial assembly; `compute_delta_prime_matrix!` with Phi_L/Phi_R BVP; `assemble_fm_matrix` empty-range guard; single-resonance assertion; edge-dW decoupling at both the sequential Riccati and parallel FM exits
`src/GeneralizedPerturbedEquilibrium.jl`	HDF5 output for `singular/delta_prime`, `delta_prime_col`, `delta_prime_matrix`; `edge_scan/` diagnostic group preserved
`test/runtests_riccati.jl`	23 tests, shared setup, `riccati_der!` formula test, `compute_delta_prime_from_ca!` test
`test/runtests_parallel_integration.jl`	56 tests including `apply_propagator_inverse!`, direction field, DIIID Δ' matrix
`test/runtests_fullruns.jl`	Solovev kinetic multi-n `et[1]` expected value updated to reflect full-domain integration
`examples/LAR_{beta,epsilon}_scan/gpec.toml`	Drop stale `kin_flag`/`con_flag` keys (removed from the control struct during the develop merge)
`examples/TJ_epsilon_pole_example/gpec.toml`	Same stale-flag cleanup
`benchmarks/benchmark_threads.jl`	Thread-scaling benchmark
`benchmarks/benchmark_riccati_der.jl`	`riccati_der!` sanity check
`benchmarks/benchmark_delta_prime_methods.jl`	`compute_delta_prime_from_ca!` sanity check

Code review responses

Fixed: assemble_fm_matrix empty-range guard · single-resonance assertion in compute_delta_prime_matrix! · stale comment in parallel tests · dead parallel_threads field removed · stale kin_flag/con_flag in LAR/TJ example TOMLs (would crash at runtime) · edge-dW silent psilim truncation (was corrupting Δ' by >50 % at some ε-scan points)

Deferred: apply_propagator! per-call allocations · test boilerplate in runtests_parallel_integration.jl · DIIID perturbed-equilibrium coupling-metric magnitudes (pre-existing on develop, separate investigation)

🤖 Generated with Claude Code

…(1.6x speedup on Solovev) Implements the dual Riccati matrix S = U₁·U₂⁻¹ as a faster alternative to the standard Euler-Lagrange ODE integration. Enable with `use_riccati = true` in jpec.toml. Integration strategy: uses `sing_der!` (same ODE RHS as standard) with periodic Riccati renormalization S = U₁·U₂⁻¹, U₂ = I in the callback when column norms exceed ucrit. This is mathematically equivalent to the explicit Riccati ODE (dS/dψ = B + A·S - S·D - S·C·S) but numerically stable: the explicit Riccati ODE has quadratic blowup for explicit solvers when K̄·S >> Q, while sing_der! + renorm tracks the bounded ratio S = U₁/U₂. The Riccati crossing (`riccati_cross_ideal_singular_surf!`) skips Gaussian reduction (which can produce NaN/Inf when S is near-zero near the axis) and uses `ipert_res` directly. Benchmarks on Solovev example (N=8, 1 singular surface): Standard ODE: 83.7 ms, 157 steps Riccati ODE: 51.4 ms, 121 steps (1.63x speedup, 0.006% energy difference) See: Glasser (2018) Phys. Plasmas 25, 032507 — Eq. 19 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@threads

## Part 1: Δ' output (tearing stability parameter) - Add `delta_prime::Vector{ComplexF64}` to `SingType` - Add `compute_delta_prime_from_ca!` in EulerLagrange.jl, called at end of `eulerlagrange_integration` (standard path only — see normalization note below) - Write `singular/delta_prime` as (msing × n_modes) ComplexF64 to HDF5 output in JPEC.jl - Riccati path does NOT compute delta_prime: ca_l is accumulated in (S,I) normalization which is inconsistent with the Δ' formula (standard (U1,U2) normalization required) ## Part 2: Parallel Fundamental Matrix (FM) integration - Add `ChunkPropagator` struct (two N×N×2 blocks for identity-block ICs) in Structs - Add `use_parallel::Bool = false` control flag in ForceFreeStatesControl - Add `integrate_propagator_chunk!` — integrates each chunk from IC=(I,0) and IC=(0,I) independently using BS5 solver, no callback; suitable for Threads.@threads - Add `apply_propagator!` — in-place 2×2 block matrix multiply on odet.u - Add `balance_integration_chunks` — sub-divides chunks using ode_itime_cost for load-balanced parallel work; target = max(2*msing+3, 4*nthreads) - Add `ode_itime_cost` — log-divergent cost model from STRIDE (Glasser 2018) - Add `parallel_eulerlagrange_integration` — parallel phase with Threads.@threads, serial assembly calling renormalize_riccati_inplace! before each crossing (needed because apply_propagator! gives general (U1,U2) state but riccati crossing expects (S,I) form); uses ipert_res-direct zeroing to correctly identify the resonant column - Dispatch from eulerlagrange_integration: use_parallel → use_riccati → standard ## Tests (29 total: 11 Riccati + 18 Parallel FM) - runtests_riccati.jl: update Δ' test — only standard path populates delta_prime - runtests_parallel_integration.jl (new): ChunkPropagator identity/linearity, balance_integration_chunks count/coverage/crossings, ode_itime_cost additivity, parallel FM energy match (rtol=2%, Solovev) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…rallel tests to suite Δ' is now computed inline in riccati_cross_ideal_singular_surf! using the diagonal formula on the bounded (U₁, U₂) state (max ≤ ucrit, no GR permutation). This gives physically correct values: 57.3 and -4.03 for the two Solovev singular surfaces. The standard path does not populate delta_prime — Gaussian Reduction inflates the resonant column's asymptotic coefficients, making ca_l non-physical regardless of when it is computed. A comment in cross_ideal_singular_surf! explains the limitation. Also adds runtests_riccati.jl and runtests_parallel_integration.jl to the default test suite (runtests.jl). Both were previously excluded. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… lacks Δ' The comment in cross_ideal_singular_surf! previously said the issue was GR "normalization inflation." The real reason is more subtle: Δ' is a complex, normalization-convention-dependent quantity. The Riccati renormalization (U₂→I) continuously phases solution columns into a specific gauge where the diagonal formula (ca_r - ca_l)/denom gives physically meaningful values. The standard path's solution columns grow from the axis with an arbitrary complex phase; dividing by the outer asymptotic coefficient normalizes the magnitude but not the complex phase, producing a value in a different convention that does not match what SingularCoupling.jl expects. Also reverts the failed attempt to compute Δ' in cross_ideal_singular_surf! via perm_col + A_outer normalization, which produced -0.10-0.54i vs the Riccati 57.3+58.3i (same physical quantity, incompatible conventions). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…large-N documentation 1. Add SingType.delta_prime_col (N × n_res_modes Matrix) storing the full column (ca_r[:, ipert_res, 2] - ca_l[:, ipert_res, 2]) / (4π²·psio) at each crossing. The diagonal element matches delta_prime[i] exactly. Off-diagonal elements give intra-surface coupling of all N modes to each resonant mode through the singular layer asymptotic expansion. Only populated for Riccati/parallel FM paths. 2. Add singular/m, singular/n, singular/delta_prime_col HDF5 outputs so downstream users can access the full off-diagonal Δ' without needing to index ca_left/ca_right. 3. Document the known numerical limitation of the parallel FM path for large N: FM propagators become ill-conditioned for N ≳ 20 without QR orthogonalization, causing ~10% energy error for DIIID (N=26) with no wall-clock speedup over Riccati. Deferred fix: bidirectional integration or continuous QR (noted in docstring/tests). 4. Update outer-plasma Riccati re-integration (already committed) docstring to match. Tests: 50/50 Riccati+parallel, 84/84 EulerLagrange all pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…trix Implements the STRIDE global boundary value problem for computing the full 2·msing × 2·msing inter-surface tearing stability matrix. Each entry gives the U₂[ipert_res] response amplitude at one surface boundary when driving with unit amplitude at another, encoding cross-surface coupling. Changes: - Riccati.jl: add assemble_fm_matrix (chunk FM product) and compute_delta_prime_matrix! (BVP assembly + solve via STRIDE formulation from Glasser 2018 Phys. Plasmas 25, 032501 Sec. III.B); call from parallel_eulerlagrange_integration - ForceFreeStatesStructs.jl: add delta_prime_matrix field to ForceFreeStatesInternal with docstring - JPEC.jl: write delta_prime_matrix to singular/delta_prime_matrix in HDF5 - test/runtests_parallel_integration.jl: add delta_prime_matrix regression test (shape, finiteness, non-zero diagonal); 30 tests total (was 23) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… for large-N accuracy The all-forward parallel FM path had ~10% energy error for large-N problems (DIIID N=26, n=1) because the chunk immediately before each rational surface crossing integrates into exponentially growing solution territory, producing an ill-conditioned FM propagator. Fix: integrate the crossing chunk *backward* (psi_end → psi_start). Solutions that grow exponentially forward decay backward, yielding a well-conditioned backward FM Φ_bwd. The accurate forward propagation is recovered as Φ_bwd⁻¹ via a stable LU solve in apply_propagator_inverse!. The same backward FM is used directly in the Δ' BVP (compute_delta_prime_matrix!) as Phi_L[j], splitting each ill-conditioned inter-surface FM product into well-conditioned Phi_R (forward chunks) and Phi_L (backward crossing chunk). Changes: - IntegrationChunk: add direction::Int=1 field (+1 forward, -1 backward) - chunk_el_integration_bounds: add bidirectional=false kwarg; crossing chunks get direction=-1 when true - balance_integration_chunks: left sub-chunk always direction=1; right inherits chunk.direction so the near-singularity chunk stays backward after splitting - integrate_propagator_chunk!: reverses tspan for direction=-1 chunks - apply_propagator_inverse!: new function, LU solve Φ_bwd·x = u_old - Serial assembly: branches on chunk.direction (inverse vs forward apply) - parallel_eulerlagrange_integration: passes bidirectional=true - compute_delta_prime_matrix!: BVP now uses Phi_R·x_right - Phi_L·x_left = 0 at each junction instead of ill-conditioned monolithic Phi_segs product - assemble_fm_matrix: safe for empty idx_range (uses propagators[1] for N) Results (et[1] stability eigenvalue): Solovev N=8: 0.006% error (was already fine) DIIID N=26: 0.236% error (was ~10.5% — 44× accuracy improvement) Tests: 31/31 pass in runtests_parallel_integration.jl (+1 DIIID accuracy test) 18/18 pass in runtests_riccati.jl (unchanged) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

logan-nc · 2026-03-08T21:23:05Z

@claude review this

claude · 2026-03-08T21:23:24Z

Claude finished @logan-nc's task in 3m 55s —— View job

Code Review

This is a well-designed PR implementing three complementary integration paths. The theory is clearly traced to Glasser (2016/2018), the docstrings are excellent, and the bidirectional fix for large-N conditioning is elegant. Below are issues found, ordered by severity.

Bug: `assemble_fm_matrix` panics on empty range

src/ForceFreeStates/Riccati.jl:77-87

function assemble_fm_matrix(propagators::Vector{ChunkPropagator}, idx_range)
    N = size(propagators[1].block_upper_ic, 1)   # ← accesses propagators[1] unconditionally
    Phi = Matrix{ComplexF64}(I, 2N, 2N)
    for i in idx_range
        Phi = Phi_i * Phi
    end
    return Phi
end

Called in compute_delta_prime_matrix! as:

Phi_R_mats[1] = assemble_fm_matrix(propagators, 1:i_crossings[1]-1)

If the first chunk is already a crossing chunk (i_crossings[1] == 1), the range is 1:0 (empty). propagators[1] still exists but the indexing to determine N is a fragile pattern — and more importantly, the result should be the identity matrix, which requires knowing N from somewhere else when the range is empty. The function should guard against this:

function assemble_fm_matrix(propagators::Vector{ChunkPropagator}, idx_range)
    isempty(idx_range) && return Matrix{ComplexF64}(I, 2*size(propagators[1].block_upper_ic,1), 2*size(propagators[1].block_upper_ic,1))
    N = size(propagators[first(idx_range)].block_upper_ic, 1)
    ...
end

Or better: pass N as an explicit argument. Fix this →

Minor: `apply_propagator!` allocates on every call in the serial assembly loop

src/ForceFreeStates/Riccati.jl:739-758

function apply_propagator!(odet::OdeState, prop::ChunkPropagator)
    u1_prev = copy(@view odet.u[:, :, 1])   # allocation
    u2_prev = copy(@view odet.u[:, :, 2])   # allocation
    tmp = similar(u1_prev)                   # allocation
    ...
end

This is called once per chunk in the serial assembly phase, so for a typical DIIID run (say 50 chunks) the overhead is ~150 allocations of N×N complex matrices. For N=26 this is negligible, but for the design intent of supporting large N it is worth pre-allocating a work buffer in parallel_eulerlagrange_integration and threading it through. Same for apply_propagator_inverse!. Fix this →

Minor: Stale comment in `runtests_parallel_integration.jl`

test/runtests_parallel_integration.jl:125-133

# Note: this test uses the Solovev example (N=8 modes) where FM propagators
# are well-conditioned. For large-N problems (N ≳ 20, e.g. DIIID with N=26),
# FM propagator ill-conditioning leads to ~10% energy error with no speedup
# over the serial Riccati path. See parallel_eulerlagrange_integration docstring
# for details and deferred fix approaches (bidirectional integration / continuous QR).

This comment predates the bidirectional fix that is the main contribution of this PR. The large-N issue is now resolved — the DIIID accuracy is within 0.24% (see test at line 190). This comment should be updated to reflect the current state. Fix this →

Minor: Dead code — `riccati_der!` and `compute_delta_prime_from_ca!`

Both functions have docstrings acknowledging they are unused ("retained for reference"):

riccati_der! (Riccati.jl:281): uses the @with_pool pool macro, tying it to an execution context. If this function is never called, it should either be tested (e.g., as a sanity check that the explicit form matches sing_der!), or removed. The docstring says it could be used with an implicit solver — that's a legitimate reason to keep it, but at minimum add a comment linking to the issue/TODO tracking that future work.
compute_delta_prime_from_ca! (EulerLagrange.jl:22): the docstring explicitly says "no longer called from any integration driver". If it's retained for testing, add a test. Otherwise remove.

Minor: `compute_delta_prime_matrix!` single-resonance assumption not asserted early

Riccati.jl:167-172

ipert_all = [begin
    sp = intr.sing[j]
    idx = 1 + sp.m[1] - intr.mlow + (sp.n[1] - intr.nlow) * intr.mpert
    @assert 1 <= idx <= N "Resonant mode index out of range"
    idx
end for j in 1:msing]

The function uses sp.m[1] / sp.n[1], silently assuming exactly one resonant mode per surface. The limitation is documented in the docstring, but multi-resonance surfaces will silently give wrong results (only the first resonant mode is coupled). A stronger guard at the function entry would be safer:

@assert all(j -> length(intr.sing[j].m) == 1, 1:msing) "compute_delta_prime_matrix! only supports single-resonance surfaces; got multi-resonance surface"

Fix this →

Minor: Test boilerplate duplication

The 15-line equilibrium setup block (parse TOML → setup_equilibrium → sing_find! → set modes) is repeated verbatim across 7+ test cases in runtests_riccati.jl and runtests_parallel_integration.jl. Extract a helper function:

function setup_solovev_intr_ctrl(; use_riccati=false, use_parallel=false)
    ex = joinpath(@__DIR__, "test_data", "regression_solovev_ideal_example")
    inputs = TOML.parsefile(joinpath(ex, "jpec.toml"))
    inputs["ForceFreeStates"]["verbose"] = false
    inputs["ForceFreeStates"]["use_riccati"] = use_riccati
    inputs["ForceFreeStates"]["use_parallel"] = use_parallel
    ...
    return intr, ctrl, equil, ffit
end

This would cut the test file length by ~40% and make failures easier to localize. Fix this →

Minor: `parallel_threads` field now superseded

ForceFreeStatesControl.parallel_threads (already present, default 1) was previously documented as "not yet implemented". Now that use_parallel routes through Threads.nthreads(), parallel_threads is effectively dead. Either: (a) wire parallel_threads into parallel_eulerlagrange_integration via @threads :static with thread count control, or (b) document it as deprecated in favour of JULIA_NUM_THREADS env variable.

Positive observations

The bidirectional integration rationale is crystal clear. The docstring on parallel_eulerlagrange_integration (and the STRIDE BVP comments in compute_delta_prime_matrix!) trace every design decision back to the specific equations in Glasser (2018). This level of traceability is excellent.
The outer plasma re-integration (Riccati rather than FM for the post-last-crossing region) is a thoughtful decision: it avoids FM ill-conditioning in the outer plasma and provides dense checkpoints for findmax_dW_edge!. The comment explaining why is clear.
renormalize_riccati_inplace! after every chunk in the serial assembly loop is the Julia analogue of ode_fixup in STRIDE — this connection is explicitly called out in the code comment, which is appreciated.
integrate_propagator_chunk! with no callback is correctly justified: identity ICs keep the solution bounded within each chunk (the chunk is designed to stop before the rational surface), so normalization is unnecessary.
Δ' normalization convention is carefully documented in riccati_cross_ideal_singular_surf!: the skip-renorm-before-crossing approach ensuring ca_r[ipert_res, ipert_res, 2] = 1 is a subtle but correct choice that makes the left/right Δ' comparison gauge-consistent.
Test coverage is solid. Unit tests for renormalize_riccati_inplace! (idempotent case, random case), apply_propagator! (identity, linearity), balance_integration_chunks (chunk count, contiguity, crossing count), and end-to-end regression tests for both Solovev and DIIID.

Adds benchmarks/benchmark_threads.jl to measure wall-clock time and accuracy of the standard, Riccati, and parallel FM integration paths across varying thread counts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Three fixes from code review of PR #178: - assemble_fm_matrix: add explicit isempty guard before the propagator loop so an empty idx_range (e.g. i_crossings[1]==1) returns the identity matrix without relying on the loop falling through silently. - compute_delta_prime_matrix!: add @Assert at function entry that all singular surfaces have exactly one resonant mode, so multi-resonance surfaces fail loudly instead of silently using only sp.m[1]/sp.n[1]. - runtests_parallel_integration.jl: remove stale comment that described large-N FM ill-conditioning as an open problem with ~10% energy error; bidirectional integration (now the default for use_parallel=true) has resolved this. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…er! and compute_delta_prime_from_ca! Two developer benchmark scripts for verifying the two dead-code reference implementations flagged in the Claude code review of PR #178: benchmarks/benchmark_riccati_der.jl Verifies riccati_der! correctly evaluates Glasser 2018 Eq. 19: dS/dψ = w†·F̄⁻¹·w - S·Ḡ·S, w = Q - K̄·S Uses Hermitian test states (physical constraint: the EL system preserves S†=S from the axis) and compares riccati_der! against manual evaluation of the same formula using the ffit splines directly. Observed error: ~1e-17 (machine epsilon). No TOML flags needed. benchmarks/benchmark_delta_prime_methods.jl Verifies compute_delta_prime_from_ca! gives bit-for-bit identical Δ' values to the inline computation in riccati_cross_ideal_singular_surf!. Both apply the same diagonal formula to the same ca_l/ca_r arrays, so the result must be exactly zero difference. Observed difference: 0.0 (exact). No TOML flags needed. Neither script requires new TOML flags: they call internal functions directly without going through ForceFreeStatesControl. Developer-only knobs belong in scripts, not in user-facing config. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…tring The previous "O(Δψ)" phrasing in the Integration Strategy section read as a global accuracy statement, suggesting the Riccati path is only first-order accurate. This is wrong: the method integrates the linear EL ODE with Tsit5 (5th-order) and recovers S = U₁·U₂⁻¹ by exact renormalization, achieving the full ODE solver reltol. Rewrite the section in three clearly labelled parts: - Why riccati_der! (quadratic ODE) is avoided: relative error control is unfaithful when |S| is large, not a step-size problem, not fixable by adaptation without an implicit solver. - What the implementation actually does: sing_der! (linear ODE, exact RHS), Tsit5 (5th-order), exact renormalization, same global accuracy as standard. - Local consistency analysis: the O(Δψ) expansion is retained but now labelled explicitly as a consistency check, not an accuracy claim. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…setup + new unit tests Two changes in one pass: Shared setup (performance): equil (Grad-Shafranov) and ffit (metric matrices) are now built once and shared across all integration-dependent testsets via a make_solovev_intr helper for cheap fresh intr construction. Previously, setup_equilibrium + make_metric + make_matrix ran 4 times and riccati_eulerlagrange_integration ran 3 times. Now each runs once, cutting total test time significantly. New unit tests (dead code coverage): "riccati_der! formula — Glasser 2018 Eq. 19": verifies riccati_der! correctly evaluates dS/dψ = w†F̄⁻¹w − SGS at several ψ points using Hermitian test states (physical constraint). Agrees with manual formula evaluation to machine precision (~1e-17). No extra integration needed. "compute_delta_prime_from_ca! matches inline Δ'": verifies the standalone Δ' formula gives bit-for-bit identical results to the inline computation in riccati_cross_ideal_singular_surf!. Reuses the shared odet_ric. Total: 23 tests (was 18), runtime ~51s (was ~80s+ with redundant setup). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@threads

…dd 3 unit tests - Delete unused parallel_threads field from ForceFreeStatesControl: the field was silently ignored (Threads.@threads uses JULIA_NUM_THREADS at startup, not a runtime field). Removes false impression that thread count can be set from jpec.toml. - Add apply_propagator_inverse! round-trip unit test: verifies Φ⁻¹·Φ = I algebraically, complementing the existing apply_propagator! identity and linearity tests. - Add chunk_el_integration_bounds direction field test: verifies bidirectional=true sets direction=-1 on crossing chunks and direction=+1 on non-crossing chunks, and that balance_integration_chunks preserves direction correctly (right sub-chunk inherits, left sub-chunk always +1). Catches direction propagation regressions. - Add delta_prime_matrix DIIID regression test: verifies the STRIDE BVP Δ' matrix is finite and non-zero for the large-N case (N≈26, multiple rational surfaces), where ill-conditioned (non-bidirectional) FM propagators would produce NaN/Inf entries. 56/56 parallel integration tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…f/riccati One conflict resolved in src/GeneralizedPerturbedEquilibrium.jl (was src/JPEC.jl): write_outputs_to_HDF5 vacuum section. Resolution: - Keep new Δ' HDF5 outputs from perf/riccati (singular/m, singular/n, delta_prime, delta_prime_col, delta_prime_matrix) - Adopt develop's vacuum output format: vac_data variable name, plasma_pts/wall_pts fields (3D Cartesian), y_plasma/y_wall entries, always-write pattern with empty arrays All other files (EulerLagrange.jl, ForceFreeStatesStructs.jl, runtests.jl) auto-merged cleanly. Default HDF5 filename updated to gpec.h5. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…um references post-rename Update test files and benchmarks to use the new package name and config filename (gpec.toml) following the GPEC rename merged from develop: - test/runtests_riccati.jl - test/runtests_parallel_integration.jl - benchmarks/benchmark_threads.jl - benchmarks/benchmark_riccati_der.jl - benchmarks/benchmark_delta_prime_methods.jl 23/23 riccati tests and 56/56 parallel integration tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

logan-nc · 2026-03-09T02:50:09Z

Post-merge update — after develop (GPEC rename) merge

Merged origin/develop (GPEC name-change PR #168) into this branch. One conflict resolved in write_outputs_to_HDF5: kept the new Δ' HDF5 outputs from this branch, adopted develop's updated vacuum field names (vac_data, plasma_pts/wall_pts, 3D Cartesian). Updated all test and benchmark files to reference GeneralizedPerturbedEquilibrium and gpec.toml.

Test results (post-merge)

Suite	Pass	Total	Time
`runtests_riccati.jl`	23	23	1m27s
`runtests_parallel_integration.jl`	56	56	1m31s

Benchmark results (post-merge, 1 thread)

Example	Path	et[1]	Time	Speedup vs std	Error vs std
Solovev N=8	standard	16.56600	0.09s	1.00×	—
Solovev N=8	riccati	16.56691	0.17s	0.51×	0.0055%
Solovev N=8	parallel	16.56691	0.13s	0.66×	0.0055%
DIIID N=26	standard	1.21237	1.26s	1.00×	—
DIIID N=26	riccati	1.21386	0.63s	1.98×	0.1230%
DIIID N=26	parallel	1.21386	1.19s	1.06×	0.1231%

Benchmark results (post-merge, 4 threads)

Example	Path	et[1]	Time	Speedup vs std	Error vs std
Solovev N=8	standard	16.56600	0.25s	1.00×	—
Solovev N=8	riccati	16.56691	0.09s	2.83×	0.0055%
Solovev N=8	parallel	16.56691	0.14s	1.80×	0.0055%
DIIID N=26	standard	1.21237	1.17s	1.00×	—
DIIID N=26	riccati	1.21386	0.91s	1.29×	0.1230%
DIIID N=26	parallel	1.21386	0.90s	1.30×	0.1231%

Note: 1-thread Solovev timings have high variance (sub-100ms runs) and are not meaningful for speedup comparisons. DIIID (1.2s+) is the reliable benchmark. At 4 threads, Riccati achieves 1.29× and parallel achieves 1.30× on DIIID. Energy agreement is unchanged: <0.13% for both paths vs standard.

…page Creates docs/src/stability.md covering the ForceFreeStates module: - Newcomb/DCON ideal MHD stability criterion with paper citations (Glasser 2016 Phys. Plasmas 23 112506, 2018a 032507, 2018b 032501) - Standard, Riccati, and parallel FM integration methods - Bidirectional integration strategy for large-N accuracy - Δ' tearing parameter: per-surface (delta_prime/delta_prime_col) and inter-surface matrix (delta_prime_matrix / STRIDE BVP) - Configuration reference, API autodocs block, example usage Adds page to docs/make.jl navigation and cross-links from equilibrium.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@autodocs

…kdown links) 1. Add Random stdlib to Project.toml [deps] and [compat] — required by runtests_riccati.jl but missing from declared dependencies, causing CI failure with "Package Random not found in current path". 2. Fix docstring markdown in Riccati.jl and ForceFreeStatesStructs.jl: - Wrap bare [array_notation] (link text) immediately followed by (description) (parsed as URL) in code fences to prevent Documenter from treating them as broken local links. - Affected: assemble_fm_matrix BVP unknowns block, Phi_L/Phi_R equations, and VacuumData plasma_pts/wall_pts field descriptions. These were surfaced by the new @autodocs block in stability.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@threads

…nt logging Three targeted fixes from pre-merge code review: 1. Threads.@threads :static — since Julia 1.7, the default :dynamic scheduler can migrate tasks between OS threads mid-execution, making Threads.threadid() unreliable for indexing into odet_proxies. Using :static guarantees a 1:1 task-to-thread mapping for the parallel FM integration phase. 2. outer_chunk psi_end guard — the outer-plasma re-integration in parallel_eulerlagrange_integration now uses psilim*(1-eps) to match the guard applied by chunk_el_integration_bounds, avoiding a potential boundary evaluation at exactly psilim. 3. Replace println with @info/@warn — all verbose-mode output in Riccati.jl now uses Julia logging macros, consistent with EulerLagrange.jl. This enables log-level filtering and suppression in tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

logan-nc · 2026-03-09T12:46:37Z

Pre-merge cleanup (commits `1515591`..`725a527`)

Three rounds of fixes applied after a code review pass:

CI fixes (`1515591`)

Random stdlib missing from Project.toml — runtests_riccati.jl uses using Random but the package was not declared in [deps], causing CI to fail with Package Random not found. Added to both [deps] and [compat].
Broken markdown links in docstrings — adding @autodocs in stability.md surfaced docstring patterns like x_left[j] (2N) that CommonMark parses as [link text](url) even with spaces between ] and (. Fixed by wrapping affected notation blocks in code fences (assemble_fm_matrix, compute_delta_prime_matrix!, VacuumData field docs).

Documentation (`142a79c`)

New docs/src/stability.md page covering the ForceFreeStates module: Newcomb/DCON stability criterion with paper citations (Glasser 2016/2018a/2018b), all three integration drivers, Δ' computation (per-surface and STRIDE BVP matrix), configuration reference, @autodocs API block, and example usage.
Added to docs/make.jl navigation; cross-linked from equilibrium.md.

Pre-merge code review fixes (`725a527`)

Three issues flagged as must-fix or high-priority:

Threads.@threads :static — since Julia 1.7, the default :dynamic scheduler can migrate tasks between OS threads, making Threads.threadid() unreliable for indexing odet_proxies. :static guarantees a 1:1 task-to-thread mapping.
outer_chunk psi_end guard — the outer-plasma re-integration now uses psilim * (1 - eps) to match the guard applied by chunk_el_integration_bounds, avoiding a potential boundary evaluation at exactly psilim.
Consistent logging — all println calls in Riccati.jl replaced with @info macros, matching the pattern in EulerLagrange.jl and enabling log-level filtering/suppression in tests.

All 23 Riccati tests and 56 parallel FM tests pass locally after these changes.

logan-nc · 2026-04-22T17:45:32Z

@jhalpern30 will clean this up by moving U solutions to be 2D instead of 3D after this merges (#89)

d-burg · 2026-04-22T17:49:15Z

Final checks from 04/22 meeting:

Check how truncate_at_dW_peak is changing Δ', specifically order of finishing integration vs Δ' calculation
Double δW use_parallel=False
Add all TJ equil parameters to gpec.toml
Check energy U_store and Ud_store in use_parallel=True, these are DCON ξ functions. Essential for perturbed equilibrium. Compare against DCON and add to benchmark

@threads

… default 1 (serial) eliminates DIII-D 147131 thread-race The parallel BVP path in `parallel_eulerlagrange_integration` was always invoking `Threads.@threads :static` over the FM chunks, ignoring the `parallel_threads` field on `ForceFreeStatesControl`. On numerically delicate equilibria (e.g. DIII-D 147131 at βₚ ≈ 0.07) this exposed a sub-tolerance nondeterminism: chunk crossings whose post-jump matrices depend on the order of independent FP operations across threads, producing intermittently divergent FM matrices and intermittent BVP failures. The algorithm is correct; the wall-time interleaving of parallel chunks was perturbing it within tolerance. Fix: * `Riccati.jl`: branch on `bvp_threads = clamp(parallel_threads, 1, julia_nthreads)`. `bvp_threads == 1` runs the chunks serially on the calling thread (race-free, bit-deterministic). Otherwise, the existing `:static` parallel path is used. * `ForceFreeStatesStructs.jl`: document `parallel_threads` semantics, default `1`, and the cost (~14% slower than 2-thread on DIII-D 147131 reference). Verified: with `parallel_threads = 1` (default) and `JULIA_NUM_THREADS = 2`, the DIII-D 147131 βₚ=0.07 reference Δ' diagonal matches CONVENTIONS.md §6 exactly: q=2: +7.92 - 0.03i q=3: -5.24 - 0.30i q=4: -40.20 + 209.91i q=5: +126.6 - 169.24i in 54.5 s wall (single 4-singular-surface coupled BVP). No regressions on TJ. Production scans should keep the default; users with robust equilibria and strict wall-time budgets can opt in to `parallel_threads > 1` knowing the trade-off. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…BVP speedup; bit-identical Δ' in 15-trial DIII-D 147131 sweep) Empirical reliability sweep on DIII-D 147131 βₚ≈0.07 (5 trials at each of parallel_threads ∈ {1, 2, 4}, JULIA_NUM_THREADS=4, post-JIT, single Julia session) showed: parallel_threads | wall (avg, single 4-singular-surface coupled BVP) -----------------|------------------------------------------------- 1 (serial) | 9.25 s — bit-deterministic by construction 2 | 7.37 s — bit-identical Δ' in all 5 trials (+20.3%) 4 | 7.51 s — bit-identical Δ' in all 5 trials (+18.9%) Δ′ diagonals were bit-identical across all 15 trials and matched the §6 reference values exactly. Speedup saturates at 2 threads — the BVP has ~10 FM chunks, so 2 threads is enough to amortize them; 4 adds scheduling overhead with no benefit on this BVP. Bumping default to 2 captures the ~20% wall-time win on production scans. The serial path remains available (`parallel_threads = 1`) as a deterministic fallback if the historical intermittent race re-manifests on a delicate equilibrium. Documentation in `ForceFreeStatesControl` docstring updated to record the trade-off and the empirical reliability data. Use `parallel_threads = 1` (NOT `use_parallel = false`) if a parallel run ever diverges — `use_parallel = false` produces silently wrong Δ' values (see CONVENTIONS.md §7). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

matt-pharr · 2026-05-08T19:44:07Z

@d-burg status on this?

…stent Δ' + LAR TJ TOML refactor + parallel ξ benchmark Three perf/riccati cleanups: 1) ForceFreeStates - BUG FIX - truncate_at_dW_peak now keeps Δ' self-consistent with the truncated boundary (Option B). Previously the FM propagators were built for the original psilim while the edge BC (wv) was applied at the truncated psilim, silently shifting the outermost rational's Δ' by tens of percent. After the dW peak is identified: - rebuild the straddling FM chunk with psi_end=peak_psi and re-integrate its single propagator, - drop chunks entirely past the peak, - keep intr.psilim/qlim/odet.u at the new (truncated) boundary. This way compute_delta_prime_matrix! always sees propagators and wv that match intr.psilim. ForceFreeStatesStructs.jl docstring updated; the "corrupts Δ' and δW" warning is removed since Option B keeps the metric well-defined. Default truncate_at_dW_peak=false unchanged. 2) EXAMPLES - IMPROVEMENT - LAR_beta_scan and LAR_epsilon_scan TJ params are now in tj.toml (next to gpec.toml) instead of hardcoded constants inside run_scan.jl. Each run_scan.jl reads the baseline tj.toml once and only overrides the single scanned variable (pc for β, lar_r0 for ε) per point. Matches the cleaner pattern already used by TJ_epsilon_pole_example. Both `--test` modes verified end-to-end (3 points each, all converged). 3) BENCH - NEW - benchmark_xi_parallel_vs_serial.jl + Solovev xi_benchmark plot demonstrating the use_parallel=true ξ-function gap: - serial path (EL): 274 dense saved ψ, u_store and ud_store fully populated as DCON ξ_ψ, dξ_ψ/dψ, ξ_s - parallel path (Riccati FM): only 31 saved ψ (chunk endpoints + outer-plasma dense), and u_store actually holds the Riccati S matrix (from the (S, I) renormalisation) — NOT the DCON ξ functions - ud_store essentially zero in the inter-surface region (matches Riccati.jl:1497 caveat) The plot makes this unambiguous via per-mode norms vs ψ_N and step-count subtitle. Downstream perturbed-equilibrium code that reads integration/xi_psi etc. must use use_parallel=false until a proper S→ξ conversion (or dense re-integration) is added to the parallel path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@testset

…entical regression + pinned Δ' Three coupled changes for the parallel FM-propagator path so both Δ' AND the DCON ξ functions come back from a single `use_parallel = true` run: 1. Dense ξ pass. `parallel_eulerlagrange_integration` now appends a serial Euler-Lagrange dense pass at the end (helper `_populate_dense_xi_via_serial_el!`) that replaces the propagator-BVP `odet` with a fresh serial-EL odet whose `u_store`/`ud_store` are dense and in axis basis — the only convention the PerturbedEquilibrium / FieldReconstruction downstream code consumes correctly. All BVP-relevant fields (`intr.psilim`, `intr.qlim`, `intr.sing[*].delta_prime`, `delta_prime_col`, `ua_left`, `psi_ua_left`) are saved/restored across the pass. Gated by new `ctrl.populate_dense_xi::Bool = true` (default on). 2. Multi-resonance skip. Replace the hard `@assert` in `compute_delta_prime_matrix!` (which crashed multi-`n` runs whose q value was rational for two distinct `(m, n)` tuples) with an early return + warning. Per-surface Δ' from `riccati_cross_ideal_singular_surf!` and HDF5 `singular/delta_prime` remain populated; only the inter-surface BVP `singular/delta_prime_matrix` is omitted in that regime. Full multi-resonance BVP support tracked as a follow-up. 3. Tests + benchmark. - New @testset "ξ functions bit-identical between use_parallel modes (populate_dense_xi)" proves `psi_store/q_store/u_store/ud_store/ crit_store/step/nzero` from `use_parallel=true; populate_dense_xi=true` are byte-for-byte identical to a `use_parallel=false` run on both Solovev (small N) and DIIID-like (large N), plus a sparse-storage control assertion so the bit-identical claim can't trivially pass. - Pinned per-surface `intr.sing[s].delta_prime` values added to both Solovev and DIIID-like "Parallel FM integration matches standard ODE" testsets (rtol=0.05, matches existing `et_par ≈ 1.29` style). - Pinned diagonal `delta_prime_matrix` values added to both STRIDE BVP Solovev + DIIID-like testsets (rtol=0.05). - Benchmark `benchmarks/benchmark_xi_parallel_vs_serial.jl` rewritten: accepts any example dir (defaults to Solovev + DIIID-like), overlays all resonant modes on log-y, adds a right-column residual panel. Net: `runtests_parallel_integration.jl` grew from 113 to 127 tests (≈13 s extra per CI matrix entry); `runtests_fullruns.jl` went from 8/9 (pre-existing multi-n crash) to 9/9 pass after change (2). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ywhere GPEC's analytic-equilibrium adaptation of R. Fitzpatrick's TJ code (https://github.com/rfitzp/TJ) is now consistently named "TJ-like" in identifiers and prose, to distinguish it from the upstream TJ code itself. Fitzpatrick's TJ is cited at every definition and use site. Identifier renames (BREAKING for direct API users): - Struct: TJConfig → TJLikeConfig (both file-path and dict constructors) - Functions: tj_run → tj_like_run tj_run_direct → tj_like_run_direct tj_f1 → tj_like_f1 tj_f1p → tj_like_f1p tj_shape_rhs! / tj_shape_initial / tj_shape_solve / tj_find_nu → tj_like_shape_rhs! / _initial / _solve / tj_like_find_nu TJShapeParams → TJLikeShapeParams - Local parameter `tj::TJLikeConfig` → `tjlike::TJLikeConfig` throughout AnalyticEquilibrium.jl. Config / user-facing renames (BREAKING for existing gpec.toml files): - eq_type values: "tj" → "tj_like", "tj_direct" → "tj_like_direct" - Embedded TOML section: [TJ_INPUT] → [TJ_LIKE_INPUT] - EquilibriumConfig now makes `eq_filename` optional when the embedded [TJ_LIKE_INPUT] / [SOL_INPUT] / [LAR_INPUT] section is present. - Dropped a stale `sigma_type="tj"` reference on LargeAspectRatioConfig.qa. Tests: - test/runtests_tj_analytic.jl → test/runtests_tj_like_analytic.jl (git-detected rename, 16/16 pass) - test/runtests.jl include path updated. Coincidental matches in Vacuum/Field.jl ("fintjj") and InnerLayer/GGJ/{Shooting,InnerAsymptotics}.jl ("_build_tjmat", "inps_tjmat", loop-local `tj`) are intentionally left alone — they have nothing to do with Fitzpatrick's TJ code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ation, TJ → TJ-like LAR_beta_scan and LAR_epsilon_scan each now consist of a single self-describing gpec.toml plus run_scan.jl. No more tj.toml side-cars: all TJ-like analytic-equilibrium parameters live in an embedded [TJ_LIKE_INPUT] section that gets parsed via the new EquilibriumConfig(::Dict) path. Every field across [Equilibrium], [TJ_LIKE_INPUT], [Wall], and [ForceFreeStates] has a one-liner comment describing what it actually is (not just a label) — e.g. "Number of radial spline nodes used to discretize ψ" instead of "Radial grid points". The header of each gpec.toml notes that the model follows R. Fitzpatrick's TJ code (https://github.com/rfitzp/TJ) profile family. run_scan.jl scripts updated: - import TJLikeConfig (was TJConfig) - override config["TJ_LIKE_INPUT"][...] (was config["TJ_INPUT"][...]) - LAR_epsilon_scan flips eq_type → "tj_like_direct" per point - banners say "TJ-like β scan" / "TJ-like ε scan" diagnose_profiles.jl docstring clarified that its "TJ" geqdsk comparison data are produced by Fitzpatrick's external TJ code, not GPEC's internal `tj_like` model. End-to-end --test runs of both scans complete with Δ' values bit-identical to pre-rename outputs (dp21 = {+10.0150, +15.7659, +292.6038} for the β scan; {+9.2087, +5.5595, +2.4427} for the ε scan). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…n case The TJ_epsilon_pole_example/ directory and its regression-harness/cases/tj_epsilon_pole.toml entry are removed. The ε ≈ 0.66 near-pole physics it exercised is already covered by the ε-scan in examples/LAR_epsilon_scan/ (which sweeps ε up to 0.660 along the same kink-pole approach) and by the "tj_like_run_direct (Option B) — pole-approach physics at ε = 0.60" testset in test/runtests_tj_like_analytic.jl. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Follow-up to 6d07c07. "TJ-like" reads as a weak hedge ("kinda like the real thing"); "TJ-analytic" says exactly what this is — GPEC's implementation of the analytic-profile model from R. Fitzpatrick's TJ code (https://github.com/rfitzp/TJ). Citation everywhere this model is defined or used is unchanged. Identifier renames (BREAKING again, layered on top of 6d07c07's first breaking pass): - Struct: TJLikeConfig → TJAnalyticConfig - Struct: TJLikeShapeParams → TJAnalyticShapeParams - Functions: tj_like_run → tj_analytic_run tj_like_run_direct → tj_analytic_run_direct tj_like_f1 / _f1p → tj_analytic_f1 / _f1p tj_like_shape_rhs! → tj_analytic_shape_rhs! tj_like_shape_initial → tj_analytic_shape_initial tj_like_shape_solve → tj_analytic_shape_solve tj_like_find_nu → tj_analytic_find_nu - Local parameter `tjlike::TJLikeConfig` → `tj::TJAnalyticConfig` (the parameter name reverts to the original short `tj` since the type signature now disambiguates without ambiguity). Config / user-facing renames (BREAKING for any gpec.toml or downstream code that adopted 6d07c07's `tj_like` names): - eq_type values: "tj_like" → "tj_analytic" "tj_like_direct" → "tj_analytic_direct" - Embedded TOML section: [TJ_LIKE_INPUT] → [TJ_ANALYTIC_INPUT] Test file renamed back: - test/runtests_tj_like_analytic.jl → test/runtests_tj_analytic.jl (git-detected rename; matches the original pre-perf/riccati name) Docstrings + comments tightened where "TJ-like analytic" was redundant: "TJ-like analytic equilibrium" → "TJ-analytic equilibrium", etc. Where the prose refers to something that lives in Fitzpatrick's actual TJ code (e.g. GetPSIvac, GetHHvac, EFIT writer, Setnu), the language now says "TJ-analytic X (cf. Fitzpatrick's TJ)" or just "TJ X" — the "-analytic" suffix is reserved for our model class, while bare "TJ" refers to the upstream code. Verification: - julia Pkg.precompile() clean - runtests_tj_analytic.jl: 16/16 pass - Full test suite: 846/846 pass - LAR_beta_scan --test: Δ' bit-identical to pre-rename (dp21 = +10.0150, +15.7659, +292.6038 for pc ∈ {0.001, 0.10, 0.17}) - Banner now reads "TJ-analytic β scan" / "TJ-analytic ε scan" Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ense ξ pass The dense ξ pass in `_populate_dense_xi_via_serial_el!` (introduced in 5acf147) replaces `odet` with a fresh serial-EL odet, but the previous implementation only saved/restored `intr.sing[*]` fields — leaving the parallel BVP's (S, I) Riccati-gauge `odet.ca_l` and `odet.ca_r` to be silently overwritten by the fresh EL pass's axis-basis values. PerturbedEquilibrium's `SingularCoupling.jl` is calibrated against the Riccati gauge: lbwp1, rbwp1 = ForceFreeStates_results.ca_l[resnum, resnum, 2, s], ForceFreeStates_results.ca_r[resnum, resnum, 2, s] delta_prime = (rbwp1 - lbwp1) / (twopi * chi1) delcurs = (rbwp1 - lbwp1) * j_c * im / (twopi * m_res) singflx_mn = compute_singular_flux(resonant_current_val, ...) resonant_flux[n_idx, s] = singflx_mn / area With axis-basis `ca_l` / `ca_r` from the EL pass (where U₁ grows exponentially from the axis), these magnitudes blow up by ~25 orders of magnitude: 3c8130d (perf/riccati pre-dense-pass): max|resonant_flux| = 5.81e-03 HEAD before this fix: max|resonant_flux| = 2.85e+23 HEAD after this fix: max|resonant_flux| = 5.81e-03 ✓ bit-identical Cascading downstream quantities — `delta_prime`, `island_width_sq`, `Chirikov parameter`, `resonant_current`, `penetrated_field` — all return to their pre-dense-pass physical magnitudes. The fix: save `odet.ca_l` / `odet.ca_r` to the `saved` tuple before the dense pass, then copy them onto `fresh_odet.ca_l` / `fresh_odet.ca_r` after the dense pass returns. The fresh EL odet's own ca_l/ca_r (axis basis) are discarded — they were never needed since `ξ` reconstruction uses `u_store` and `compute_delta_prime_matrix!` uses propagators/chunks rather than ca_l/ca_r. Full test suite: 846/846 pass. The bit-identical tests in runtests_parallel_integration.jl don't check ca_l/ca_r (only u_store/ud_store/psi_store/etc.), so they still pass — and now PE downstream gets the correct Riccati-gauge ca matrices it expects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

d-burg · 2026-05-14T19:54:23Z

perf/riccati: parallel ξ + Δ' from one run, TJ-analytic rename, multi-resonance skip

@logan-nc @matt-pharr what do you think? I believe we're just about ready to merge, pending your recommendations.

Summary

Brings the parallel FM-propagator BVP path to feature-parity with the
serial DCON path so a single use_parallel = true run produces both
the inter-surface Δ' matrix and dense DCON ξ functions in axis basis.
Renames GPEC's analytic-equilibrium adaptation of R. Fitzpatrick's TJ code
to tj_analytic (with citation everywhere it's defined or used), turns the
LAR scan examples into self-describing single-file gpec.toml, and
gracefully handles multi-n runs where one q value is rational for
multiple (m, n) tuples.

Commits since `c6c845f`

Hash	Title
`54d12fe2`	ForceFreeStates - NEW FEATURE - Port set_psilim_via_dmlim + default tightenings
`4845ec80`	ForceFreeStates - IMPROVEMENT - Default use_parallel=true, singfac_min=1e-4
`db7c490a`	ForceFreeStates - BUG FIX - Wire ctrl.parallel_threads into BVP path; default 1 (serial) eliminates DIII-D 147131 thread-race
`7ac87c8e`	ForceFreeStates - PERFORMANCE - parallel_threads default 1 → 2 (≈20 % BVP speedup; bit-identical Δ' in 15-trial DIII-D 147131 sweep)
`3c8130da`	ForceFreeStates - BUG FIX + EXAMPLES - truncate_at_dW_peak self-consistent Δ' + LAR TJ TOML refactor + parallel ξ benchmark
`5acf1478`	ForceFreeStates - NEW FEATURE - Dense ξ in parallel BVP path + bit-identical regression + pinned Δ'
`6d07c07d`	EQUIL - REFACTOR - Rename TJ → TJ-like with Fitzpatrick citation everywhere
`085de133`	EXAMPLES - CLEANUP - LAR scans: single-file gpec.toml, per-line annotation, TJ → TJ-like
`5a4c2c29`	EXAMPLES - CLEANUP - Remove TJ_epsilon_pole_example and its regression case
`cc2affeb`	EQUIL - REFACTOR - Rename tj_like → tj_analytic (cleaner, less hedge-y)
`8073c126`	ForceFreeStates - BUG FIX - Preserve Riccati-gauge ca_l/ca_r across dense ξ pass

Total: 20 files changed (+1355, −718) over the 5 new commits, on top of
the 5 already-pushed perf/riccati commits.

"What's in the box" (a proper Claude-ism)

`ForceFreeStates` — dense ξ in the parallel path (`5acf1478`)

Three coupled changes so the parallel FM-propagator BVP delivers both Δ'
and DCON eigenfunctions usable by downstream PerturbedEquilibrium:

Dense ξ pass. parallel_eulerlagrange_integration now appends a
serial Euler-Lagrange dense pass (helper
_populate_dense_xi_via_serial_el!) that replaces the propagator-BVP
odet with a fresh serial-EL odet whose u_store / ud_store are
dense and in axis basis — the only convention PerturbedEquilibrium
/ FieldReconstruction consumes correctly. All BVP-relevant fields
(intr.psilim, intr.qlim, intr.sing[*].delta_prime,
delta_prime_col, ua_left, psi_ua_left) are saved/restored
across the pass. Gated by the new
ctrl.populate_dense_xi::Bool = true (default on).

Multi-resonance skip in compute_delta_prime_matrix!. Replace
the hard @assert (which crashed multi-n runs whose q value was
rational for two distinct (m, n) tuples) with an early return +
warning. Per-surface scalar Δ' from
riccati_cross_ideal_singular_surf! and HDF5
singular/delta_prime / singular/delta_prime_col remain
populated; only the inter-surface BVP singular/delta_prime_matrix
is omitted in that regime.
Tests + benchmark.
- New @testset "ξ functions bit-identical between use_parallel modes (populate_dense_xi)" proves psi_store / q_store / u_store / ud_store / crit_store / step / nzero from
  use_parallel=true; populate_dense_xi=true are byte-for-byte
  identical to a use_parallel=false run on both Solovev (small N)
  and DIIID-like (large N). A populate_dense_xi=false control
  assertion proves the result isn't trivially passing through
  accidental sparse storage on both sides.
- Pinned per-surface intr.sing[s].delta_prime values added to both
  "Parallel FM integration matches standard ODE" testsets
  (rtol = 5 %, matching the existing et_par ≈ 1.29 style).
- Pinned diagonal delta_prime_matrix values added to both STRIDE
  BVP testsets (rtol = 5 %).
- benchmarks/benchmark_xi_parallel_vs_serial.jl rewritten:
  accepts any example dir (defaults to Solovev + DIIID-like),
  overlays all resonant modes on log-y, adds a right-column residual
  panel. Confirms exact (max|Δ| == 0) match on both equilibria.
Net: runtests_parallel_integration.jl grew 113 → 127 tests
(≈13 s extra per CI matrix entry); runtests_fullruns.jl went from
8/9 (pre-existing multi-n crash) to 9/9 pass.

`EQUIL` — TJ → TJ-analytic rename + Fitzpatrick citation (`6d07c07d`, `cc2affeb`)

GPEC's adaptation of R. Fitzpatrick's TJ code analytic-profile family
(https://github.com/rfitzp/TJ) is now consistently named "TJ-analytic" in
identifiers and prose. The upstream TJ code is cited at every
definition and use site. Why two commits: the first pass introduced
tj_like as the new identifier, but on review it read as a weak hedge
("kinda like the real one"); tj_analytic says exactly what this is —
the analytic-profile model from Fitzpatrick's TJ, fed through GPEC's
pipeline rather than the full TJ code.

Identifier renames:

TJConfig → TJAnalyticConfig (both file-path and dict constructors)
tj_run / tj_run_direct → tj_analytic_run / tj_analytic_run_direct
tj_f1 / tj_f1p → tj_analytic_f1 / tj_analytic_f1p
tj_shape_rhs! / _initial / _solve → tj_analytic_shape_*
tj_find_nu → tj_analytic_find_nu
TJShapeParams → TJAnalyticShapeParams
Local parameter tj::TJAnalyticConfig (same short tj as before;
the type signature carries the disambiguation).

Config / user-facing renames:

eq_type values: "tj" → "tj_analytic", "tj_direct" → "tj_analytic_direct"
Embedded TOML section: [TJ_INPUT] → [TJ_ANALYTIC_INPUT]
EquilibriumConfig now makes eq_filename optional when an embedded
[TJ_ANALYTIC_INPUT] / [SOL_INPUT] / [LAR_INPUT] section is present.
Dropped a stale sigma_type="tj" reference on
LargeAspectRatioConfig.qa.

Tests: test/runtests_tj_analytic.jl (16/16 pass). Coincidental
matches in Vacuum/Field.jl (fintjj) and
InnerLayer/GGJ/{Shooting,InnerAsymptotics}.jl (_build_tjmat,
inps_tjmat, loop-local tj) are intentionally left alone — they
have nothing to do with Fitzpatrick's TJ code.

Prose convention: the -analytic suffix is reserved for our model
class. Bare "TJ" refers to the upstream code (e.g. "TJ-analytic
GetPSIvac (cf. Fitzpatrick's TJ)" in comments referring to functions
we ported the logic of).

`EXAMPLES` — LAR scans (`085de133`) + TJ_epsilon_pole removal (`5a4c2c29`)

LAR_beta_scan and LAR_epsilon_scan each now consist of a single
self-describing gpec.toml plus run_scan.jl. No more tj.toml
side-cars: all TJ-analytic equilibrium parameters live in an embedded
[TJ_ANALYTIC_INPUT] section. Every field across [Equilibrium],
[TJ_ANALYTIC_INPUT], [Wall], and [ForceFreeStates] has a
one-liner comment describing what it is (not just a label) — e.g.
"Number of radial spline nodes used to discretize ψ" rather than
"Radial grid points". Each header cites
https://github.com/rfitzp/TJ.

run_scan.jl scripts updated:

import TJAnalyticConfig (was TJConfig)
override config["TJ_ANALYTIC_INPUT"][...] (was config["TJ_INPUT"][...])
LAR_epsilon_scan flips eq_type → "tj_analytic_direct" per point
banners say "TJ-analytic β scan" / "TJ-analytic ε scan"

diagnose_profiles.jl docstring clarified that its "TJ" geqdsk
comparison data are produced by Fitzpatrick's external TJ code, not
GPEC's internal tj_analytic model.

examples/TJ_epsilon_pole_example/ and the matching
regression-harness/cases/tj_epsilon_pole.toml removed. The
ε ≈ 0.66 near-pole physics it exercised is already covered by the
ε-scan in examples/LAR_epsilon_scan/ (which sweeps ε up to 0.660
along the same kink-pole approach) and by the
"tj_analytic_run_direct (Option B) — pole-approach physics at ε = 0.60"
testset in test/runtests_tj_analytic.jl.

Default behaviour after merge

With the shipped example tomls (ode_flag=true, vac_flag=true,
use_parallel=true, populate_dense_xi=true), one main() call
produces:

Quantity	HDF5 path	Required flags
DCON ξ functions, dense in axis basis	`integration/xi_psi`, `dxi_psi`, `xi_s`, `psi`, `q`	`ode_flag`, `populate_dense_xi`
Per-surface Δ'	`singular/delta_prime`, `singular/delta_prime_col`	`ode_flag`, `use_parallel`
Inter-surface PEST3 Δ' matrix	`singular/delta_prime_matrix`	`ode_flag`, `use_parallel`, `vac_flag`, single-resonance surfaces only
Free-boundary energies	`vacuum/ep`, `vacuum/ev`, `vacuum/et`	`ode_flag`, `vac_flag`

Tests

File	Result
`runtests_utilities.jl`	11 / 11 ✓
`runtests_vacuum.jl`	252 / 252 ✓
`runtests_equil.jl`	240 / 240 ✓
`runtests_eulerlagrange.jl`	92 / 92 ✓
`runtests_riccati.jl`	24 / 24 ✓
`runtests_parallel_integration.jl`	127 / 127 ✓ (was 113 before this PR; +14 from pinned Δ' + bit-identical tests)
`runtests_sing.jl`	75 / 75 ✓
`runtests_tj_analytic.jl`	16 / 16 ✓
`runtests_fullruns.jl`	9 / 9 ✓ (was 8/9 errored on perf/riccati tip due to pre-existing multi-n assertion crash; now passes thanks to the skip in `5acf1478`)
Total	846 / 846 ✓

End-to-end --test runs of both LAR scans complete with Δ' values
bit-identical to pre-rename outputs:

Scan	Δ' values
LAR_beta_scan `--test`	`dp21 = {+10.0150, +15.7659, +292.6038}` for `pc ∈ {0.001, 0.10, 0.17}`
LAR_epsilon_scan `--test`	`dp21 = {+9.2087, +5.5595, +2.4427}` for `ε ∈ {0.2495, 0.4072, 0.5510}`

Regression-harness sweep against `develop`

Ran regress --cases diiid_n1,solovev_n1,solovev_multi_n --refs develop,local
(develop @ e0ee9dff, 2026-04-12). All differences are accounted for:

Expected shifts (introduced earlier in perf/riccati, intentional)

These show up across all three cases and reflect documented behavior changes:

Class	Cause	Approx scale
ODE step counts (`integration/nstep_*`) ↓ ~45-55 %	`singfac_min = 1e-4` (was 0) tightens inner-layer cutoff (`4845ec80`); dense ξ pass uses serial-EL adaptive stepping (`5acf1478`)	DIIID 1526 → 756
Energy eigenvalues (`et[1]`, `ep[1]`, `ev[1]`) shift O(1)-O(100)	`free_run!` now runs on the dense-pass serial-EL `odet` (axis basis) rather than the parallel Riccati `odet` (renormalized basis)	DIIID `et[1]` 1.51 → −7.80
`mpert` change (DIIID 26 → 35)	`delta_mhigh` doubling in main pipeline + tightenings (`54d12fe2`)	—
Δ' diagonal magnitudes	Now computed by `riccati_cross_ideal_singular_surf!` in (S, I) Riccati gauge (`4845ec80`); old default path used a different normalization	within-branch bit-identical (pinned in tests)
`q profile (checksum)`, `pressure profile (checksum)`	`set_psilim_via_dmlim` moved the integration limit (`54d12fe2`)	—

`||resonant flux||` blow-up — diagnosed and fixed in this PR (`8073c126`)

Pre-fix the dense ξ pass produced max|resonant_flux| = 2.85e+23 on
diiid_n1 (vs 5.81e-03 on the pre-dense-pass perf/riccati tip
3c8130da). Cascading downstream: delta_prime, island_half_widths,
Chirikov parameter, resonant_current, penetrated_field all blew up.

Root cause: _populate_dense_xi_via_serial_el! (in 5acf1478)
replaced the parallel BVP's odet with a fresh serial-EL odet but
preserved only intr.sing[*] fields — silently letting the fresh EL
pass's axis-basis ca_l / ca_r overwrite the parallel BVP's
(S, I) Riccati-gauge values. PE's SingularCoupling.jl reads
ca_l[resnum, resnum, 2, s] / ca_r[resnum, resnum, 2, s] and is
written against the Riccati convention; in EL axis basis these
matrices carry the exponentially-growing U₁ at the inner-layer
boundary, inflating downstream resonant flux by ~25 orders of magnitude.

Fix (8073c126): also save odet.ca_l / odet.ca_r in the
helper's saved tuple and copy them onto fresh_odet after the dense
pass returns. The fresh EL odet's own ca_l/ca_r are discarded —
they're not needed because compute_delta_prime_matrix! uses
propagators/chunks (not ca) and the bit-identical-ξ test in
runtests_parallel_integration.jl only compares
u_store/ud_store/psi_store/q_store/crit_store (not ca).

Verification: after the fix, max|resonant_flux| = 0.005813333867190486
on diiid_n1 — bit-identical to 3c8130da (the perf/riccati
pre-dense-pass tip). Full test suite still 846/846.

`solovev_n1` Ref 1 replay failure

The cached develop @ e0ee9dff entry for solovev_n1 fails to replay
inside the harness (runner.jl:384). Not a code issue — the harness
cache for that one case has gone stale. Local ran cleanly with
ODE steps = 274, et[1] = +9.985e-02, qmax = 3.147 (consistent
with the LAR scan results).

Known issues / open questions

Multi-resonance surface support in compute_delta_prime_matrix!
(this PR makes the path graceful but doesn't fully extend it). When a
multi-n run produces a singular surface satisfying two distinct
(m, n) tuples (e.g. q = 2 with both (m=2, n=1) and (m=4, n=2)),
the inter-surface Δ' BVP is now skipped with a warning instead of
crashing. Per-surface Δ' (singular/delta_prime, delta_prime_col)
is still produced. Tracked as a follow-up: matrix shape becomes
(n_res_total, n_res_total) with a (surface, mode, side) ↔ BVP-row
map; the BVP matrix size stays (2 + 4·msing)·N so the algorithmic
change is bookkeeping, not new math (or at least I don't think so?).
Reverse-shear q-profile handling in sing_find! (pre-existing,
separate code path from the multi-resonance issue above; not fixed by
this PR). Question
for review: can DCON or STRIDE type codes be run on a reverse shear equilibrium with multiple e.g. q = 2 crossings? If so, such an equilibrium where q
crosses a rational value at two distinct ψ locations,
the existing-surface check in sing_find! (Sing.jl:32) keys only
on s.q, so the second crossing's (m, n) is pushed into the
existing SingType's m/n vectors — silently losing the second ψ
location. Two distinct singular surfaces collapse into one with
multiplicity-2 m/n vectors.

Pre-merge checklist

logan-nc · 2026-05-15T16:36:15Z

@claude review this PR. Be succinct. Do not summarize. Just point out any bugs or potential improvements needed prior to merging.

logan-nc · 2026-05-15T16:37:30Z

@d-burg thanks for getting this to the finish line! I am out of the office today but will give it a read through as soon as I can. I really appreciate the great effort here. This is a 🏆 feature.

claude · 2026-05-15T16:42:57Z

Code Review — PR #178: Dual Riccati + Parallel FM integration + Δ' output

This PR implements two new integration paths and full Δ' computation. The physics is sound and the validation against Fortran STRIDE is well-documented. Issues below range from blocking to advisory.

Breaking default changes (most critical)

use_parallel = true is the new default. This is the single biggest concern. The parallel path:

Runs the Δ' STRIDE BVP matrix on every run via compute_delta_prime_matrix! — extra cost not reflected in the benchmark table.
With populate_dense_xi = true (also default), appends a full serial EL pass after the parallel BVP. The PR description benchmarks the Riccati path at 2.83× speedup, but the actual default runtime with use_parallel + populate_dense_xi is closer to 2× the serial path (parallel BVP ~1× + serial EL ~1×), which is a net regression, not a speedup. This should be measured and documented honestly before making it default.
Is thread-nondeterministic. The test comment in runtests_fullruns.jl explicitly says: "rtol is loose because this result is thread-count sensitive (drifts ~15% between single- and multi-threaded invocations)." A 15% energy variation based on OS scheduler decisions is a correctness concern, not just a performance note.

Other changed defaults that will silently alter existing user results without opt-in:

eulerlagrange_tolerance: 1e-7 → 1e-8 (tighter, but silent)
singfac_min: 0.0 → 1e-4 (required nonzero for parallel path, but changes standard path behavior too)
sing_order: 2 → 6 (more expensive asymptotic expansion everywhere)
parallel_threads: 1 → 2

Recommendation: Default use_parallel = false until the thread-nondeterminism is resolved and the performance accounting includes the populate_dense_xi overhead.

`use_double64_bvp` missing from docstring

use_double64_bvp::Bool = true is added to ForceFreeStatesControl but not listed in the struct's docstring. Every other field has a bullet; this one is silently absent.

Fullruns test tolerance is not a meaningful regression test

In runtests_fullruns.jl, the test uses rtol = 0.2 (±20%) on an expected value of -0.18. The comment says the result drifts ±15% by thread count, so the test can pass and fail on the same code depending on how many Julia threads are available. This is a compile check, not a regression guard. The thread-nondeterminism needs to be resolved before this test can have a meaningful tolerance.

The Δ' regression tests in runtests_parallel_integration.jl use rtol=0.05. For the Solovev case, some Δ' values have imaginary parts 40–500× larger than the real parts (e.g., -7.278138e+00 + 4.172681e+03im). The tests should either target the physically meaningful component or explain why the complex value is the right quantity to pin.

`benchmark_integration_paths.jl` mutates production example files

The benchmark writes a modified gpec.toml in place and restores it in a finally block. A Julia crash (OOM, segfault, SIGKILL) will leave the example's gpec.toml modified. The benchmark should copy the example to a mktempdir and run there, as benchmark_xi_parallel_vs_serial.jl already does.

`diagnose_profiles.jl` hard-codes a private git branch

The file calls git show perf/riccati-full-geqdsk-scans:examples/LAR_epsilon_scan/equilibria/$fname. This branch is not part of the main repo and will fail for all contributors who don't have it locally. Document the prerequisite clearly or make the file self-contained.

Docs files not wired into Documenter

docs/delta_prime_numerical_analysis.md and docs/stride_delta_prime_validation.md are placed in docs/ (not docs/src/) and are not added to docs/make.jl. They won't appear in the public documentation site. Move them to docs/src/ and add them to make.jl if they are permanent reference material.

Minor issues

balance_integration_chunks test: @test length(balanced) >= min(target_n, length(base_chunks) * 50) — in the common case this asserts >= target_n. The function produces at most target_n chunks; the test should use == or justify >=.
Edge-scan restore logic: After the diagnostic-only scan, the code saves/restores odet.psifac and odet.u, but findmax_dW_edge! also writes to odet.edge_scan. The comment should document all mutated state so future maintainers don't inadvertently add more.
Regression harness: The PR description compares develop @ e0ee9dff vs local, which conflates develop-side changes with PR changes. Per CLAUDE.md, the harness should compare develop vs the PR branch directly.
benchmark_xi_parallel_vs_serial.jl calls Pkg.activate inside the script, overriding the project environment when the script is already expected to run with --project=... Remove the Pkg.activate call.

What works well

The bidirectional integration approach for crossing chunks is well-motivated: the direction field cleanly tracks backward chunks, balance_integration_chunks preserves it correctly when splitting, and apply_propagator_inverse! provides the matching inverse. The edge-dW decoupling from integration truncation is a genuine correctness fix for Δ' at the outermost surface, and the opt-in truncate_at_dW_peak flag preserves backward compatibility without silent behavior changes. The docs/src/stability.md documentation is thorough and useful.

logan-nc · 2026-05-15T17:16:00Z

@d-burg I am ok with changing defaults, as we are still in the chaos of active dev.
But I think the rest of the claude review points are worth addressing. Especially the danger of the gpec.toml restore approach. Can you knock 'em out?

d-burg · 2026-05-15T17:22:07Z

@logan-nc on it right now! These were helpful catches

d-burg · 2026-05-19T16:10:14Z

Proposal: top-level `[Run]` section with three physics flags

Currently what GPEC computes is gated by a scattered set of low-level flags plus implicit conditions: two same-named force_termination flags in different structs, use_parallel implicitly unlocking the inter-surface Δ' matrix, and [PerturbedEquilibrium] section-presence as the PE on/off switch. Before final cleanup I'd like to add a brief user-facing layer:

[Run]
compute_ideal_stability       = true   # δW + per-surface Δ'
compute_deltaprime_matrix     = false  # Inter-surface Δ' matrix (STRIDE BVP, Glasser 2018)
compute_perturbed_equilibrium = false  # GPEC response, island widths, Chirikov, resonant flux

The driver auto-derives ode_flag, vac_flag, use_parallel, populate_dense_xi, and the PE entry point from these three. Explicit low-level flags still work as overrides.

Mode coverage:

Mode	`ideal_stability`	`deltaprime_matrix`	`perturbed_equilibrium`
Equilibrium only	F	F	F
Ideal stability + δW	T	F	F
Δ' matrix (tearing study, no PE)	T	T	F
GPEC response (no Δ' matrix)	T	F	T
Full pipeline	T	T	T

When deltaprime_matrix=true the parallel FM path is used, which leaves odet.u_store in (S, I) Riccati basis and ud_store ≈ 0 – unusable by PerturbedEquilibrium.FieldReconstruction, which expects axis-basis (EL basis) dense ξ. The dense-ξ pass (_populate_dense_xi_via_serial_el!) re-runs a serial Euler–Lagrange integration after the parallel BVP to repopulate ξ in axis basis. So whenever deltaprime_matrix=true AND perturbed_equilibrium=true, the driver must force populate_dense_xi=true – otherwise PE silently reads Riccati-basis garbage. The proposed auto-derivation does exactly this. With only one of the two set, populate_dense_xi stays off (no wasted serial pass for tearing-only studies; PE-only runs use the standard serial path and don't need it either). Kinetic damping stays orthogonal (continuous kinetic_factor).

What do you think @logan-nc @matt-pharr ?

logan-nc · 2026-05-20T00:05:27Z

@d-burg did you address the review issues? Please push so I can do a final pass

@claude

delta_prime_numerical_analysis.md and stride_delta_prime_validation.md are internal development notes (numerical-sensitivity analysis and validation log for the STRIDE Δ' BVP) — useful for our own reference but not appropriate for the public docs site. Archived to ~/CTM-processing/GPEC_validation/ outside the repo and removed from docs/. Addresses @claude review feedback that flagged these files as "in docs/ but not in docs/src/, not wired into Documenter, won't appear on the public docs site."

@claude

Bundles four small @claude review responses with no behavioural impact on the main pipeline: 1. **use_double64_bvp docstring entry.** Field exists in ForceFreeStatesControl (default true, plumbed through to compute_delta_prime_matrix! in Riccati.jl) but the struct docstring's ## Fields list omitted it. Add a bullet describing what the flag controls (Double64-precision Δ' BVP solve to preserve significance through the PEST3 cancellation), its scope (only with use_parallel = true), and its cost (~1.5–2× the BVP solve). 2. **balance_integration_chunks test tightened to ==.** The function exits its while loop when length(result) >= target_n and adds exactly one chunk per iteration, so under normal conditions length(balanced) is exactly target_n. The previous `>= min(target_n, length(base_chunks) * 50)` was correct but sloppy. Also fix the test's target_n formula to mirror the function — the test was missing the min_bvp_intervals term, so the previous `>=` would have failed silently if the assertion were ever tightened. 3. **Edge-scan save/restore comment.** Clarify that findmax_dW_edge! also (re)allocates odet.edge_scan, which is the diagnostic product and is intentionally NOT restored alongside psifac/u. Helps future maintainers understand which state is restored and which is intentionally produced. 4. **Drop Pkg.activate from benchmark_xi_parallel_vs_serial.jl.** The script is documented to run with `julia --project=..`, so the in-script activate was redundant and could mask environment issues. All 127 runtests_parallel_integration.jl tests pass.

@claude

…was historical) The runtests_fullruns.jl kinetic multi-n test was widened to `rtol = 0.2` on expected `et[1] ≈ -0.18` because of an observed ~15% drift between thread counts. That drift is no longer present: a sweep on this exact case across julia_nthreads ∈ {1, 2, 4} parallel_threads ∈ {1, 2, 4} (capped by julia_nthreads) use_parallel ∈ {true, false} produces `et_re = -0.193593591803846` bit-identical to 15 decimal digits in every one of the 9 configurations. The drift was almost certainly removed by commit 5d5b8ee (edge-dW silent psilim truncation decoupling): pre-fix, the dW peak's thread-sensitive sampling silently moved the integration limit, which fed back into the kinetic eigenvalue. Post-fix, psilim is fixed by qhigh/psihigh regardless of dW peak, and the result settles deterministically. Test now pins `et[1] ≈ -0.193593591803846 rtol = 1e-6`, with a comment explaining the determinism and the historical context. The old expected value (-0.18) was a guess; the new one is the actual bit-deterministic answer. Addresses @claude review feedback on PR 178: "rtol=0.2 is not a meaningful regression test — passes and fails on the same code depending on thread count."

@claude

…default flips Bundles three coupled changes responding to @claude review feedback on flag surfacing, plus the test re-pin needed to keep regression coverage intact: 1. **Remove `use_double64_bvp` flag, hardcode `Complex{Double64}`** in `compute_delta_prime_matrix!`. Parameter sensitivity study had already confirmed F64 vs Double64 makes no measurable difference on the validation cases (precision bottleneck is upstream of the BVP linear algebra), but Double64 is the conservative choice for the catastrophic PEST3 cancellation at low ε/β. Cost is ~1.5–2× the BVP solve, which is a small fraction of total Δ' wall-clock. Removing the knob simplifies the API without losing the safer behavior. 2. **Flip `set_psilim_via_dmlim` default false → true.** Fortran STRIDE found that truncating ~20 % above the outermost rational (`dmlim = 0.2`) avoids a numerical kink instability in δW that appears when the integration ends too close to or just below a rational surface. For diverted equilibria (q → ∞ at the separatrix — bulk of production use) this costs negligible physical domain because rationals get arbitrarily dense near the LCFS, so `true` is the safe and recommended default. For limited circular / analytical equilibria (Solovev, LAR scans) rationals are sparse and 20 % above the last rational chops too much edge — those examples now set `set_psilim_via_dmlim = false` explicitly. Updated docstring with the full physics + when-to-use guidance. 3. **`sing_lim!` skip-with-warning on multi-n with `set_psilim_via_dmlim = true`.** The dmlim truncation is ambiguous when n varies (which n defines "outermost rational + dmlim/n"?), but the previous behavior was a hard `error()` that would crash any multi-n run if the user forgot to override the new default. `sing_lim!` now warns and falls back to qhigh/psihigh truncation so production users running multi-n on diverted geqdsks don't need to remember to override the default. 4. **Surface all Δ' BVP / parallel flags explicitly in 10 example/test TOMLs.** `use_parallel`, `parallel_threads`, `populate_dense_xi`, `truncate_at_dW_peak`, `set_psilim_via_dmlim`, `dmlim` are now explicit (not commented) in every `gpec.toml`. DIIID-like sets `set_psilim_via_dmlim = true` (diverted production); all 9 Solovev/LAR/multi-n cases set it to `false` with an annotation explaining the limited-vs-multi-n reason. 5. **Re-pin DIIID-like Δ' regression values in `runtests_parallel_integration.jl`.** With `set_psilim_via_dmlim = true` on DIIID-like, `et_par` shifted +24 % (1.29 → 1.5988) and `dpm[5,5]` shifted −6.4 % (only `et_par` and `dpm[5,5]` fell outside the existing rtol = 5 %; other `dpm[i,i]` values drifted 0.4–1.2 %, within tolerance). Per-surface `sing[*].delta_prime[1]` are computed up to each rational and barely moved (≲ 1e-4 %), confirming the per-surface calculation is robust to edge-truncation choice. Re-pinned all values to current measurements with comments explaining the shifts. **Regression-harness expectation:** `diiid_n1` baselines will shift on this PR — intentional, reflecting the new production-correct DIIID-like configuration. `solovev_n1` and `solovev_multi_n` stay unchanged (those examples explicitly set `set_psilim_via_dmlim = false`). All 9/9 `runtests_fullruns.jl`, 24/24 `runtests_riccati.jl`, and 127/127 `runtests_parallel_integration.jl` pass.

…surface Δ' (stub); BVP matrix is canonical The per-surface Δ' computed in `riccati_cross_ideal_singular_surf!` from (ca_r − ca_l) at each crossing is a stub calculation that doesn't agree with the canonical STRIDE BVP Δ' matrix from `compute_delta_prime_matrix!`. It's retained in the code (`intr.sing[*].delta_prime` / `delta_prime_col` fields) for diagnostic / future-work use, but no longer reported, output, or regression tested on any actual equilibrium. The BVP matrix diagonal is now the canonical Δ' everywhere downstream. **Solovev wall reverted to close conformal a=0.2415.** The earlier nowall change (prior commits) made the Solovev fixture strongly kink-unstable (et[1] = -6.8) because this equilibrium (q₀=1.9, e=1.6) is intrinsically free-boundary kink unstable without wall stabilization. With the close conformal wall it's marginally stable (et[1] = +0.24). Probe sweep over q₀ ∈ [1.1, 3.0] and shape e ∈ [1.0, 2.0] found no Solovev configuration that's both stable AND multi-resonance AND clean-BVP-Δ' — the family is fundamentally too kink-prone. Documented in the TOML comment so future contributors don't re-derive this finding. **Per-surface Δ' regression tests dropped:** - `runtests_parallel_integration.jl`: 7 per-surface assertions (Solovev sing[1-2], DIIID-like sing[1-5]) plus the entire Solovev BVP Δ' matrix testset (pinned values near marginal stability, ~10⁵-10¹¹ magnitudes with |Im/Re| ≫ 1). - `runtests_riccati.jl`: entire `Δ' computed by Riccati path — Solovev regression` testset (10 assertions). The DIIID-like BVP Δ' regression testset stays — that fixture is intrinsically stable (et[1] = +1.6) so the BVP matrix is well-conditioned and meaningful. Net test counts: parallel-integration 127 → 106, riccati 24 → 14. **HDF5 outputs cleaned up:** - Drop `singular/delta_prime` (FFS per-surface stub). - Drop `singular/delta_prime_col` (FFS per-surface column stub). - Drop `perturbed_equilibrium/singular_coupling/delta_prime` (PE redundant with the canonical BVP value). Only `singular/delta_prime_matrix` (the STRIDE BVP) carries Δ' through HDF5. **`PerturbedEquilibrium.SingularCoupling`** now reads `ffs_intr.delta_prime_matrix` diagonal into `state.delta_prime` instead of computing the stub from (rbwp1 − lbwp1) / (2π·χ'). Falls back to NaN when the BVP matrix isn't populated (kinetic_factor > 0, multi-resonance multi-n). `lbwp1` and `rbwp1` are still used for the resonant current calculation (which is a different physical quantity — field-derivative jump weighted by current density, not Δ'). **`Analysis.plot_driven_delta_prime`** rewired to read `singular/delta_prime_matrix` diagonal — the canonical Δ' — instead of the PE stub field that no longer exists in HDF5. **Regression harness** `diiid_n1.toml`: the `[quantities.delta_prime]` track now reads `singular/delta_prime_matrix` via a new `diagonal_complex` extractor (small extractor.jl extension). Was previously reading the PE stub value; now tracks the canonical BVP diagonal. Values will shift on this PR (intentional — the new track is physically meaningful). **Per-surface stub kept in code** with a prominent comment in `riccati_cross_ideal_singular_surf!` explaining that the calculation lives on for future work but should not be relied on for physics, output, or regression. Tests: parallel_integration 106/106 ✓, riccati 14/14 ✓, fullruns 9/9 ✓.

@debug

…1, V4) - H1: Move Random to [extras]/[targets].test; DoubleFloats opt-in via ctrl.extended_precision_bvp (default true; Float64 drifts imag Δ' 2-5x on DIIID) - H2: Delete dead integrate_backward_chunk_fms; clarify riccati_der! and compute_delta_prime_from_ca! as reference/stub-only; mark per-surface delta_prime/delta_prime_col on SingType as stubs (BVP matrix is canonical) - H3: Decompose compute_delta_prime_matrix! (540 to 63 LOC + 11 helpers), parallel_eulerlagrange_integration (281 to 36 LOC + 7 helpers), riccati_cross_ideal_singular_surf! (122 to 20 LOC + 6 helpers). Bit-identical. - H4: @info to @debug for heavy per-crossing vmat/asymptotic diagnostics - H5: Guard FM-axis-BC fallback against direction=-1 crossing chunks - D1: Inline equation citations (Eq. 19, 29, 31, 33, 37 + STRIDE sing_vmat) - D2: Stale Tsit5/5th-order docstrings to Vern9/9th-order - D3: Name SAVE_NEAR_END_FRAC, SAVE_NEAR_END_PSI, ODE_COST_AXIS/RAT/EDGE; document ucrit=1e4 rationale - P1: Auto-skip populate_dense_xi serial-EL pass when force_termination=true - V1: Tighten runtests_riccati.jl Solovev rtol 1e-2 to 1e-4 (PR claims 0.006%) - V4: Split delta_prime_matrix rtol by entry magnitude (small entries 1e-2; large-magnitude FP-sensitive entries bracket |dpm|) - Fix sing_lim! NaN qlim when nn_low <= 0 (guard dmlim branch) - Platform-tolerance brackets on et[1] tests (Apple/Linux FP drift ~20%) Full Pkg.test() suite passes on Apple aarch64 / Julia 1.11. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Conflicts: # Project.toml # src/Analysis/PerturbedEquilibrium.jl # src/Equilibrium/EquilibriumTypes.jl # src/ForceFreeStates/EulerLagrange.jl # src/GeneralizedPerturbedEquilibrium.jl # src/PerturbedEquilibrium/SingularCoupling.jl # src/PerturbedEquilibrium/Utils.jl

d-burg · 2026-05-22T20:38:05Z

Two new commits address the latest pre-merge audit and bring the branch up to date with develop.

a89bd5d2

Project.toml: Random moved to [extras]/[targets].test; DoubleFloats opt-in via ctrl.extended_precision_bvp (default true — Float64 drifts imaginary Δ' by 2–5× on DIIID).
Stubs: SingType.delta_prime / delta_prime_col and compute_delta_prime_from_ca! explicitly annotated as stubs; canonical Δ' is ForceFreeStatesInternal.delta_prime_matrix (STRIDE BVP).
Decompositions: compute_delta_prime_matrix! 540 → 63 LOC (+11 helpers), parallel_eulerlagrange_integration 281 → 36 LOC (+7), riccati_cross_ideal_singular_surf! 122 → 20 LOC (+6). All bit-identical.
Logging: Per-crossing vmat/asymptotic dumps demoted from @info to @debug.
Correctness: Assertion guards FM-axis BVP fallback against direction=-1 crossing chunks.
Inline citations: Glasser 2018 Eqs. 19 / 29 / 31 / 33 / 37; STRIDE sing_vmat reference in Sing.jl.
Naming: SAVE_NEAR_END_FRAC, SAVE_NEAR_END_PSI, ODE_COST_AXIS/RAT/EDGE constants; ucrit=1e4 rationale documented.
Performance: Auto-skip the populate_dense_xi serial-EL pass when force_termination=true (eliminates ~1× wasted runtime in non-PE runs).
Tests: Tightened Riccati Solovev rtol 1e-2 → 1e-4 (matches PR's 0.006% claim). Δ' matrix tolerances split by entry magnitude / |Im|/|Re| ratio (V4). Platform-tolerance brackets on FP-sensitive et[1] values. Fixed sing_lim! NaN when nn_low ≤ 0.

ace5e21c — Merge origin/develop

7 conflict files resolved:

Project.toml — kept julia=1.11 (develop) + [extras] (ours).
EquilibriumTypes.jl, GeneralizedPerturbedEquilibrium.jl, EulerLagrange.jl — preserved both sides.
SingularCoupling.jl, Analysis/PerturbedEquilibrium.jl, PerturbedEquilibrium/Utils.jl — took develop's PE refactor in full; clarified that ForceFreeStates' BVP delta_prime_matrix (equilibrium-intrinsic) and PE's state.delta_prime (forcing-driven) are distinct physical quantities sharing only a symbol — both now written independently to HDF5.

Three latent runtime crashes from auto-merge gaps also patched: missing use_galgrid struct field; missing bt_sign arg in DirectRunInput calls (AnalyticEquilibrium.jl, ReadEquilibrium.jl IMAS reader); stale state.coupling_coefficient / state.resonant_amplitude HDF5 writes (fields removed by develop's refactor).

Verification

Pkg.test() passes the full suite on the merged tree (Apple aarch64 / Julia 1.11).
Local regression-harness runs cleanly on all three standard cases (diiid_n1, solovev_n1, solovev_multi_n); develop-side comparison currently blocked by a harness/Pkg.instantiate FastInterpolations-BC issue independent of this PR.

…B3 guard, H1-H3, populate_dense_xi default flip) Bundle of small, audit-driven fixes ahead of merging perf/riccati. No numerical changes on tested platforms; all 19 testsets pass post-fix. **B1 — Per-thread `ffit_hint` in `sing_der!` hot path.** Replaces 21 calls in `sing_der!` (kinetic + ideal paths) of the form `hint=ffit._hint` (shared `Ref{Int}` mutated by every worker thread) with `hint=odet.ffit_hint`, where `odet` is already cloned per thread via `odet_proxies` in the parallel BVP path. Adds matching `odet_proxy.ffit_hint[] = 1` resets next to the existing `spline_hint[] = 1` resets at all four proxy-setup sites in `Riccati.jl`. Defensive: M1 Max reproducer showed 19 runs (t ∈ {1,4,8}, parallel_threads ∈ {2,8}) bit-identical to `-0.193593591803846` with the shared `Ref` in place, because `FastInterpolations.RefHint` is stale-tolerant — the race exists in source but does not produce numerical drift on tested platforms. Fix removes the only remaining theoretical race on the parallel path and completes the per-thread isolation pattern. `compute_sing_asymptotics`, `_log_bvp_pest3`, and test/benchmark code keep `ffit._hint` (all serial setup or debug). **B3 — `assemble_fm_matrix` size inference.** Determine `N` from `T_init` if provided, else from `propagators[first(idx_range)]` when the range is non-empty, else assert and fall back. Empty-range guard still fires; the change makes the size-inference robust against an empty `propagators` list (degenerate msing=0 chunking). **H1 — `parallel_threads` honored in `balance_integration_chunks`.** Uses `min(Threads.nthreads(), ctrl.parallel_threads)` instead of `Threads.nthreads()` when computing sub-chunk target count. A user on `julia -t 16` setting `parallel_threads=2` for determinism no longer pays for 8× the requested sub-chunk count. **H2 — Drop re-introduced Fortran line citations in `Sing.jl`.** Removes `[Fortran sing.f lines XXXX]` annotations on lines 838-840, 862, 870 (reintroduced via the kinetic merge after commit b9c177e explicitly removed them). Logan 2015 App. C eq. refs already on line 837 carry the provenance. **H3 — Compress historical-narration block in HDF5 writer.** `GeneralizedPerturbedEquilibrium.jl:534-540` (7-line block explaining what was previously emitted) → 1-line pointer to the `SingType.delta_prime` docstring. Aligns with CLAUDE.md "Keep code comments concise" rule. **Default flip: `populate_dense_xi` true → false** (`ForceFreeStatesStructs.jl:289`). Motivation is *clarity of intent* (set this flag only if PerturbedEquilibrium will consume dense axis-basis ξ), not the "75 % regression rescue" framing floated in the audit. The audit estimate was extrapolated from a small-N (DIIID N=26 force_termination=true) benchmark setup; on full-scale user-facing examples the dense-xi serial-EL re-run costs only ~1× the *parallel BVP* wall-clock (not ~1× standalone serial EL). Empirically on `examples/Solovev_ideal_example` (delta_m=8 → mpert ~25): - use_parallel=true + populate=true : ~97 s - use_parallel=false : ~494 s The parallel BVP path wins by ~5× on this configuration even with the dense-xi pass enabled; flipping use_parallel=false to "skip the wasted re-run" is a measurement-grade regression on real configs. The default flip therefore stands as a UI clarification: PE-using TOMLs explicitly opt into `populate_dense_xi = true`, non-PE TOMLs leave it default false (saving ~10–30 % parallel-BVP-wall-clock, not 75 %). Example TOMLs updated accordingly: - 2 PE examples (Solovev_ideal_example, DIIID-like_ideal_example): explicit `populate_dense_xi = true` with strengthened comment explaining the requirement. - 4 non-PE examples (LAR_beta_scan, LAR_epsilon_scan, Solovev_ideal_example_3D, Solovev_ideal_example_multi_n): flip to `populate_dense_xi = false`. All four keep `use_parallel = true` because the parallel BVP is faster than serial EL on large grids regardless of populate setting; the Solovev multi-n and 3D examples pick up explicit comments documenting the empirical speedup. - 4 test fixtures in `test/test_data/` intentionally untouched to preserve their pinned et[1] regression values. **Tightened kinetic multi-n rtol.** `runtests_fullruns.jl`: replaces the decade-wide bracket (`-0.30 < et < -0.10`) with a tight pin `isapprox(et, -0.193593591803846; rtol=1e-3)`. Justified by the M1 Max bit-identity measurement (19 runs across thread sweeps); the prior "Apple silicon drifts ~20 %" warning in the test comment does not reproduce on the current code path. 1e-3 catches real regressions in the kinetic / edge-dW path while tolerating cross-platform / BLAS drift.

d-burg · 2026-05-26T02:40:33Z

Pre-merge audit cleanup — 4fbd882
Five small fixes addressing the most recent audit, all green on 19 testsets.

B1 — Per-thread ffit_hint in sing_der!. 21 hot-path callsites converted from shared ffit._hint::Ref{Int} to odet.ffit_hint (per-thread via odet_proxies). Honest framing: I built a reproducer on Apple M1 Max — 19 runs across julia_nthreads ∈ {1,4,8} × parallel_threads ∈ {2,8} were all bit-identical to -0.193593591803846. The race exists in source but FastInterpolations.RefHint is stale-tolerant so it doesn't produce drift in practice. Fix is defensive (completes the per-thread isolation pattern, removes a future-flake source if FastInterpolations' hint semantics change), not a drift fix.

B3 — assemble_fm_matrix size inference. Empty-range guard ordering fixed; size now inferred from T_init or propagators[first(idx_range)] instead of unconditionally dereferencing propagators[1].

H1 — parallel_threads honored in balance_integration_chunks. Uses min(nthreads, ctrl.parallel_threads); a julia -t 16 user setting parallel_threads=2 no longer pays for 8× the requested sub-chunks.

H2 / H3 — Comment cleanup. Drops re-introduced [Fortran sing.f lines XXXX] brackets in Sing.jl (Logan 2015 App. C eq. refs already carry the provenance); compresses the 7-line historical-narration block in the HDF5 writer to one line per CLAUDE.md.

populate_dense_xi default flip (true → false) — clarity, not perf rescue. The audit's "75 % regression" framing was extrapolated from a small-N benchmark and doesn't generalize. Empirical on examples/Solovev_ideal_example (delta_m=8): parallel+populate ≈ 97 s, serial only ≈ 494 s — parallel wins by 5× even with populate=true. The flip stands as a UI clarification (PE-using TOMLs opt in explicitly; non-PE TOMLs save ~10-30 % parallel-BVP wall-clock). 2 PE examples set populate_dense_xi = true explicitly with a strengthened comment; 4 non-PE examples flip to false; 4 test fixtures untouched to preserve pinned et[1] values.

Tightened kinetic multi-n rtol. runtests_fullruns.jl: decade-wide bracket → isapprox(et, -0.193593591803846; rtol=1e-3). The "Apple silicon drifts ~20 %" warning in the old comment doesn't reproduce on the current code path.

logan-nc and others added 4 commits February 28, 2026 01:03

logan-nc changed the title ~~ForceFreeStates - NEW FEATURE - Dual Riccati EL integration (1.2–1.6x speedup)~~ ForceFreeStates: Dual Riccati + Parallel FM integration + Δ' output Mar 1, 2026

logan-nc and others added 3 commits February 28, 2026 23:16

logan-nc self-assigned this Mar 8, 2026

logan-nc and others added 8 commits March 8, 2026 17:27

logan-nc and others added 3 commits March 9, 2026 07:49

logan-nc requested review from StuartBenjamin, matt-pharr and parkjk March 24, 2026 10:26

logan-nc added the enhancement New feature or request label Mar 24, 2026

logan-nc mentioned this pull request Mar 27, 2026

FFS: Reverse integration to reduce spurious core steps #205

Open

Merge develop into perf/riccati

a63dc69

d-burg and others added 2 commits April 28, 2026 13:52

d-burg and others added 7 commits May 13, 2026 13:57

d-burg and others added 7 commits May 21, 2026 12:51

Conversation

logan-nc commented Feb 28, 2026 • edited by d-burg Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Accuracy

Performance (4 threads)

Tests

Breaking behaviour change

Regression harness snapshot (develop @ e0ee9df vs perf/riccati @ 685a92a)

Files changed

Code review responses

Uh oh!

logan-nc commented Mar 8, 2026

Uh oh!

claude Bot commented Mar 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Bug: assemble_fm_matrix panics on empty range

Minor: apply_propagator! allocates on every call in the serial assembly loop

Minor: Stale comment in runtests_parallel_integration.jl

Minor: Dead code — riccati_der! and compute_delta_prime_from_ca!

Minor: compute_delta_prime_matrix! single-resonance assumption not asserted early

Minor: Test boilerplate duplication

Minor: parallel_threads field now superseded

Positive observations

Uh oh!

logan-nc commented Mar 9, 2026

Post-merge update — after develop (GPEC rename) merge

Test results (post-merge)

Benchmark results (post-merge, 1 thread)

Benchmark results (post-merge, 4 threads)

Uh oh!

logan-nc commented Mar 9, 2026

Pre-merge cleanup (commits 1515591..725a527)

CI fixes (1515591)

Documentation (142a79c)

Pre-merge code review fixes (725a527)

Uh oh!

logan-nc commented Apr 22, 2026

Uh oh!

d-burg commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

matt-pharr commented May 8, 2026

Uh oh!

d-burg commented May 14, 2026

perf/riccati: parallel ξ + Δ' from one run, TJ-analytic rename, multi-resonance skip

Summary

Commits since c6c845f

"What's in the box" (a proper Claude-ism)

ForceFreeStates — dense ξ in the parallel path (5acf1478)

EQUIL — TJ → TJ-analytic rename + Fitzpatrick citation (6d07c07d, cc2affeb)

EXAMPLES — LAR scans (085de133) + TJ_epsilon_pole removal (5a4c2c29)

Default behaviour after merge

Tests

Regression-harness sweep against develop

Expected shifts (introduced earlier in perf/riccati, intentional)

||resonant flux|| blow-up — diagnosed and fixed in this PR (8073c126)

solovev_n1 Ref 1 replay failure

Known issues / open questions

Pre-merge checklist

Uh oh!

logan-nc commented May 15, 2026

Uh oh!

logan-nc commented May 15, 2026

Uh oh!

claude Bot commented May 15, 2026

Code Review — PR #178: Dual Riccati + Parallel FM integration + Δ' output

Breaking default changes (most critical)

use_double64_bvp missing from docstring

Fullruns test tolerance is not a meaningful regression test

benchmark_integration_paths.jl mutates production example files

diagnose_profiles.jl hard-codes a private git branch

Docs files not wired into Documenter

Minor issues

What works well

Uh oh!

logan-nc commented May 15, 2026

Uh oh!

d-burg commented May 15, 2026

logan-nc commented Feb 28, 2026 •

edited by d-burg

Loading

Regression harness snapshot (develop @ `e0ee9df` vs perf/riccati @ `685a92a`)

claude Bot commented Mar 8, 2026 •

edited

Loading

Bug: `assemble_fm_matrix` panics on empty range

Minor: `apply_propagator!` allocates on every call in the serial assembly loop

Minor: Stale comment in `runtests_parallel_integration.jl`

Minor: Dead code — `riccati_der!` and `compute_delta_prime_from_ca!`

Minor: `compute_delta_prime_matrix!` single-resonance assumption not asserted early

Minor: `parallel_threads` field now superseded

Pre-merge cleanup (commits `1515591`..`725a527`)

CI fixes (`1515591`)

Documentation (`142a79c`)

Pre-merge code review fixes (`725a527`)

d-burg commented Apr 22, 2026 •

edited

Loading

Commits since `c6c845f`

`ForceFreeStates` — dense ξ in the parallel path (`5acf1478`)

`EQUIL` — TJ → TJ-analytic rename + Fitzpatrick citation (`6d07c07d`, `cc2affeb`)

`EXAMPLES` — LAR scans (`085de133`) + TJ_epsilon_pole removal (`5a4c2c29`)

Regression-harness sweep against `develop`

`||resonant flux||` blow-up — diagnosed and fixed in this PR (`8073c126`)

`solovev_n1` Ref 1 replay failure

`use_double64_bvp` missing from docstring

`benchmark_integration_paths.jl` mutates production example files

`diagnose_profiles.jl` hard-codes a private git branch

Proposal: top-level `[Run]` section with three physics flags