Add qr build option for numerically stable GAP fitting#720
Merged
albapa merged 1 commit intoApr 27, 2026
Conversation
Add a new meson build option 'qr' (default: true) that enables the -DHAVE_QR compiler flag. This activates the QR decomposition solve path in gp_predict.F90, which is numerically stable for ill-conditioned kernel matrices that cause Cholesky (dpotrf) to fail with 'LA_Matrix_Factorise: cannot factorise'. The QR code path already exists in the GAP source (gp_predict.F90, lines 782-905) but was never wired into the build system. Fixes issues where gap_fit crashes on SOAP descriptors with moderate sparse point counts (500-2000).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Add a new meson build option qr (default: true) that enables the -DHAVE_QR compiler flag.
This activates the QR decomposition solve path in gp_predict.F90, which is numerically stable
for ill-conditioned kernel matrices that cause Cholesky (dpotrf) to fail with
LA_Matrix_Factorise: cannot factorise.
Problem
gap_fit routinely crashes with LA_Matrix_Factorise: cannot factorise, error: 8/9 when fitting
SOAP descriptors with moderate sparse point counts (500–2000). The Cholesky factorization of
the kernel matrix Q = C_MM + C_MN * Λ⁻¹ * C_NM fails when Q is ill-conditioned. This is a
known issue — see #229, #132, #312.
The only documented workaround (sparse_jitter) does not reliably fix it. I tested 4
combinations of sparse_jitter (1e-10 to 1e-2), zeta (2–4), n_sparse (500–2000), and
default_sigma — all crashed at the same point.
Solution
QUIP already contains a complete QR decomposition solve path in src/GAP/gp_predict.F90 (lines
782–905), gated behind #ifdef HAVE_QR. This path reformulates the system as a tall
least-squares problem solved via LAPACK's dgeqrf, which is numerically stable for
ill-conditioned matrices. The required LAPACK routines are already provided by
OpenBLAS/LAPACK.
This PR adds a meson build option to enable it, following the existing pattern of gap and mpi
options:
meson.options
option('qr', type: 'boolean', value: true, description: 'Use QR decomposition for numerically
stable GAP fitting (requires GAP)')
Usage
Default (QR enabled)
meson setup builddir -Dgap=true
Explicitly disable QR (old Cholesky behavior)
meson setup builddir -Dgap=true -Dqr=false
Test Results
Cu FCC, 108-atom supercell, 350 training frames, SOAP cutoff=5.28 Å:
Before (Cholesky): SYSTEM ABORT: LA_Matrix_Factorise: cannot factorise, error: 9 — crashed on
every attempt.
After (QR, -Dqr=true):
┌─────────────────┬──────────────────────┬───────────────────────┐
│ │ n_sparse=500, zeta=2 │ n_sparse=2000, zeta=4 │
├─────────────────┼──────────────────────┼───────────────────────┤
│ SOAP covariance │ 26 min │ 101 min │
├─────────────────┼──────────────────────┼───────────────────────┤
│ QR solve │ 1 sec │ 7 sec │
├─────────────────┼──────────────────────┼───────────────────────┤
│ Result │ ✅ Success │ ✅ Success │
├─────────────────┼──────────────────────┼───────────────────────┤
│ Energy RMSE │ — │ 2.5 meV/atom │
├─────────────────┼──────────────────────┼───────────────────────┤
│ Force RMSE │ — │ 0.25 eV/Å │
└─────────────────┴──────────────────────┴───────────────────────┘
Log output confirms: Using LAPACK to solve QR
Changes
2 files changed, 7 insertions, 2 deletions. No Fortran source changes — the QR code path
already exists.