Recover O(N^2) scaling in 4G LAMMPS Interface Force Calculation #209
Open
lxknll wants to merge 3 commits into CompPhysVienna:master from
Conversation
added 3 commits
May 6, 2026 09:31
…t to the total tally.
Otherwise it does not compile with current LAMMPS.
…c derivative evaluation routines. The original routines loop over all reciprocal-space vectors. This is problematic, since their number scales linearly with the cell volume and thus, assuming constant density, with the number of atoms. I experimented with more ideal choices for the real- and reciprocal-space cutoff radii, but the most efficient solution I could find was leaving the number of k vectors fixed and introducing a structure factor approach as is done in the iQEq loop.
Summary
This PR proposes performance optimizations and minor fixes for the hdnnp/4g LAMMPS pair style. It reduces the time per MD step for a moderately large water box with 2600 atoms from 6000 s to 70 s in a single MPI task.

Details
Hi there! I am one of the lead developers of RuNNer in Jörg Behler's research group in Bochum. I have been working on performance comparisons between the 4G-HDNNP LAMMPS interfaces of n2p2, PANNA, and RuNNer. In those tests, I found two things that made me curious, described in the sections below. This PR proposes fixes to both problems.
Incorrect total energy ( ➡️ 6b41f10)
I suspect that the 4G potential energy tally on master is missing the reciprocal electrostatic energy contribution. It is correctly calculated, but never added to the final value. This is fixed by introducing a new variable that collects the contribution on all MPI tasks. Since it is a global property, the final tally is divided by the number of tasks. Neither the charges nor the forces are affected by this.
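As an illustration, here is a minimal sketch of that tally fix (function and variable names are placeholders, not the actual pair style code): every task computes the same global reciprocal-space energy, so after LAMMPS sums the per-task tallies the contribution has to be weighted by 1/(number of tasks) to be counted exactly once.

```cpp
// Hedged sketch, not the actual n2p2 code: 'eRecip' is the reciprocal-space
// electrostatic energy, computed identically on every MPI task; 'engTally'
// stands in for the per-task potential-energy tally that LAMMPS later sums
// over all tasks.
#include <mpi.h>

void addRecipEnergy(MPI_Comm comm, double eRecip, double& engTally)
{
    int nprocs = 1;
    MPI_Comm_size(comm, &nprocs);
    // Divide by the number of tasks so the global contribution is counted
    // exactly once after the cross-task reduction.
    engTally += eRecip / static_cast<double>(nprocs);
}
```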
Cubic Scaling in Force Computation ( ➡️ ec1953d)
The original routines loop over all reciprocal space vectors. This is problematic, since their number scales linearly with the cell volume and thus (assuming constant density) with the number of atoms. This is because the formulas for choosing the real-space and reciprocal space cutoff radii are not ideal: the real-space cutoff radius is kept fixed (the maximum ACSF cutoff), and the reciprocal space cutoff radius is chosen such that the desired electrostatic precision is obtained.
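For context, here is the standard counting argument (my own sketch of textbook Ewald reasoning, not taken from the n2p2 source): with the real-space cutoff held fixed and a fixed target accuracy, the splitting parameter and therefore the required reciprocal-space cutoff k_c are also fixed, so the number of k vectors grows with the cell volume.

```latex
% Reciprocal-lattice points have density V/(2\pi)^3, so the number of
% k vectors inside a sphere of radius k_c is
N_k \;\approx\; \frac{V}{(2\pi)^3}\,\frac{4\pi}{3}\,k_c^{3}
    \;\propto\; V \;\propto\; N
\qquad \text{(at constant density).}
```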
In a first attempt, I introduced the same formulas for determining the cutoff radii that we use in RuNNer, which take the cell volume into account. However, this progressively shifts the computational burden towards the real-space part, and I had to request a new LAMMPS neighbor list with the calculated real-space cutoff. That neighbor list quickly grew prohibitively large for large systems.
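The usual cost-balancing argument (a sketch of standard Ewald scaling, not necessarily the exact RuNNer formulas) shows why the real-space side then becomes the bottleneck: volume-aware cutoffs let the real-space cutoff grow with the cell, so at constant density

```latex
r_c \;\propto\; V^{1/6} \;\propto\; N^{1/6},
\qquad
\text{neighbours per atom} \;\propto\; \rho\, r_c^{3} \;\propto\; N^{1/2},
```

which matches the neighbor-list growth described above.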
The most efficient solution I found was leaving the number of k vectors fixed and introducing a structure factor approach to the gradient calculation as is done in the iQEq loop.
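To make the idea concrete, here is a minimal sketch of a structure-factor force loop for the reciprocal-space part (standard textbook Ewald expressions; the names, signature, and prefactors are illustrative and not the actual pair style code). Instead of a double loop over atom pairs for every k vector, one O(N) pass accumulates S(k) = Σ_j q_j exp(i k·r_j) and a second O(N) pass distributes the per-atom forces from S(k):

```cpp
// Hedged sketch, not the actual n2p2/LAMMPS code. Assumes the k-vector list
// contains both +k and -k and uses the standard Ewald reciprocal-space
// energy E = (2*pi/V) * sum_k exp(-k^2/(4*alpha^2))/k^2 * |S(k)|^2.
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

void ewaldRecipForces(const std::vector<Vec3>&   r,      // atom positions
                      const std::vector<double>& q,      // atom charges
                      const std::vector<Vec3>&   kvecs,  // fixed set of k vectors
                      double                     alpha,  // Ewald splitting parameter
                      double                     volume,
                      std::vector<Vec3>&         f)      // forces, accumulated in place
{
    const double pi = std::acos(-1.0);
    const std::size_t n = r.size();

    for (const Vec3& k : kvecs)
    {
        const double k2   = k.x * k.x + k.y * k.y + k.z * k.z;
        const double pref = 4.0 * pi / volume
                          * std::exp(-k2 / (4.0 * alpha * alpha)) / k2;

        // O(N): structure factor S(k) = sum_j q_j exp(i k.r_j)
        std::complex<double> S(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j)
        {
            const double kr = k.x * r[j].x + k.y * r[j].y + k.z * r[j].z;
            S += q[j] * std::complex<double>(std::cos(kr), std::sin(kr));
        }

        // O(N): per-atom force from the precomputed S(k),
        // F_i = pref * q_i * [sin(k.r_i) Re S - cos(k.r_i) Im S] * k
        for (std::size_t i = 0; i < n; ++i)
        {
            const double kr = k.x * r[i].x + k.y * r[i].y + k.z * r[i].z;
            const double im = std::sin(kr) * S.real() - std::cos(kr) * S.imag();
            const double c  = pref * q[i] * im;
            f[i].x += c * k.x;
            f[i].y += c * k.y;
            f[i].z += c * k.z;
        }
    }
}
```

With the number of k vectors held fixed, the cost per k vector drops from a double loop over atom pairs to a single pass over atoms, which is where the large speed-up comes from.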
For me, this reduces the time per MD step for a moderately large system with 2600 atoms from 6000 s to 70 s on a single MPI task. MPI scaling is unaffected.
FFT interface ( ➡️ 07cd78b)
I added a missing argument to the three FFT calls; without it, the pair style does not compile with current LAMMPS.
Remaining problems / To Do
On the master branch and in this PR, the force computation scales nicely with the number of MPI tasks. However, I found that the scaling of the two iQEq steps (charge and lambda-charge computation) depends strongly on the compiler: with icx (+ MKL as the GSL backend) it scales very well, but with gcc (same MKL backend) it gets slower by two orders of magnitude as the number of MPI tasks increases. I did not find a good solution, but also did not spend a lot of time on it.