
Recover O(N^2) scaling in 4G LAMMPS Interface Force Calculation#209

Open

lxknll wants to merge 3 commits into CompPhysVienna:master from lxknll:4g-speedup

Conversation

lxknll commented May 6, 2026

Summary

This PR proposes performance optimizations and minor fixes for the hdnnp/4g LAMMPS pair style. It reduces the time per MD step for a moderately large water box with 2600 atoms from 6000 s to 70 s on a single MPI task.

Details

Hi there! I am one of the lead developers of RuNNer in Jörg Behler's research group in Bochum. I have been working on performance comparisons between the 4G-HDNNP LAMMPS interfaces of n2p2, PANNA, and RuNNer. In those tests, I found a few things that made me curious:

  • the reported total energies of RuNNer and n2p2 are not the same
  • n2p2's force calculation scales $\mathcal{O}(N_{\mathrm{at}}^3)$ with the number of atoms in the structure. This is unexpected, since the Ewald-based iterative QEq solver should scale $\mathcal{O}(N_{\mathrm{at}}^2)$ (and I can confirm that it does in fact scale quadratically for the charge equilibration and pure energy calculation).
  • (minor third point): the interface did not compile against current LAMMPS due to small updates to the FFT routine interfaces.

This PR proposes fixes to these problems.

Incorrect total energy ( ➡️ 6b41f10)

I suspect that the 4G potential energy tally on master is missing the reciprocal electrostatic energy contribution. It is correctly calculated, but never added to the final value. This is fixed by introducing a new variable that collects the contribution on all MPI tasks. Since it is a global property, the final tally is divided by the number of tasks.
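To make the divide-by-task-count step concrete, here is a minimal sketch of the tally logic (illustrative names, plain C++ standing in for the actual MPI_Allreduce call): since every task computes the same global reciprocal-space energy, a SUM reduction yields the true value multiplied by the number of tasks, and dividing the reduced tally by the task count recovers it.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper, not the actual n2p2 code. Each entry of
// perTaskRecipEnergy is the *full global* reciprocal-space energy as
// computed on one MPI task.
double reduceGlobalEnergy(const std::vector<double>& perTaskRecipEnergy)
{
    // Stand-in for MPI_Allreduce(..., MPI_SUM, ...): sums the
    // per-task values, giving P times the true global energy.
    const double summed = std::accumulate(perTaskRecipEnergy.begin(),
                                          perTaskRecipEnergy.end(), 0.0);
    // Divide by the task count, since each task contributed the full
    // global value rather than a partial (per-atom) one.
    return summed / static_cast<double>(perTaskRecipEnergy.size());
}
```

This is in contrast to per-atom quantities, which each task holds only a slice of and which are therefore reduced without the final division.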

Neither the charges nor the forces are affected by this.

Cubic Scaling in Force Computation ( ➡️ ec1953d)

The original routines loop over all reciprocal-space vectors. This is problematic, since their number scales linearly with the cell volume and thus (assuming constant density) with the number of atoms. The root cause is that the formulas for choosing the real-space and reciprocal-space cutoff radii are not ideal: the real-space cutoff is kept fixed (at the maximum ACSF cutoff), and the reciprocal-space cutoff is chosen such that the desired electrostatic precision is obtained, independent of the cell size.
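As a back-of-the-envelope illustration of the volume scaling (countKVectors is a hypothetical helper, not part of n2p2): for a cubic box of side L, the reciprocal lattice spacing is 2π/L, so with a fixed reciprocal-space cutoff the number of k vectors inside the cutoff sphere grows linearly with the cell volume.

```cpp
#include <cmath>
#include <cstddef>

// Count reciprocal lattice vectors with 0 < |k| <= kcut for a cubic box
// of side L. Doubling L (8x the volume) yields roughly 8x the k vectors.
std::size_t countKVectors(double L, double kcut)
{
    const double kPi = 3.14159265358979323846;
    const double dk = 2.0 * kPi / L;                       // lattice spacing
    const int nmax = static_cast<int>(std::floor(kcut / dk));
    std::size_t count = 0;
    for (int nx = -nmax; nx <= nmax; ++nx)
        for (int ny = -nmax; ny <= nmax; ++ny)
            for (int nz = -nmax; nz <= nmax; ++nz) {
                if (nx == 0 && ny == 0 && nz == 0) continue; // skip k = 0
                const double k2 =
                    static_cast<double>(nx*nx + ny*ny + nz*nz) * dk * dk;
                if (k2 <= kcut * kcut) ++count;
            }
    return count;
}
```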

In a first attempt, I introduced the same formulas for determining the cutoff radii that we use in RuNNer, which take the cell volume into account. However, this progressively shifts the computational burden towards the real-space part, and I had to request a new LAMMPS neighbor list with the calculated real-space cutoff. That neighbor list quickly grew prohibitively large for large systems.

The most efficient solution I found was leaving the number of k vectors fixed and introducing a structure factor approach to the gradient calculation as is done in the iQEq loop.

For me, this reduces the runtime for a moderately large system with 2600 atoms from 6000 s to 70 s. MPI scaling is unaffected.
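A minimal self-contained sketch of the structure-factor idea (illustrative names and prefactors, not the actual n2p2 code): precompute S(k) = Σ_j q_j exp(i k·r_j) once per k vector, then obtain each atom's reciprocal-space force from S(k), for O(N·N_k) total work instead of the O(N²·N_k) pairwise double loop.

```cpp
#include <array>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

struct KVector {
    std::array<double, 3> k; // reciprocal lattice vector
    double coeff;            // Ewald prefactor, e.g. (4*pi/V) exp(-k^2/4a^2)/k^2
};

// F_i = sum_k coeff(k) * k * q_i * Im[exp(i k·r_i) * conj(S(k))],
// which equals the naive pairwise sum_k sum_j q_i q_j sin(k·(r_i - r_j)).
void reciprocalForces(const std::vector<std::array<double, 3>>& r,
                      const std::vector<double>& q,
                      const std::vector<KVector>& kvecs,
                      std::vector<std::array<double, 3>>& f)
{
    const std::size_t N = r.size();
    std::vector<std::complex<double>> S(kvecs.size(), {0.0, 0.0});

    // Structure factors: S(k) = sum_j q_j * exp(i k·r_j)  -- O(N * Nk)
    for (std::size_t a = 0; a < kvecs.size(); ++a)
        for (std::size_t j = 0; j < N; ++j) {
            const double kr = kvecs[a].k[0] * r[j][0]
                            + kvecs[a].k[1] * r[j][1]
                            + kvecs[a].k[2] * r[j][2];
            S[a] += q[j] * std::polar(1.0, kr);
        }

    // Forces from the precomputed S(k) -- another O(N * Nk) pass
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t a = 0; a < kvecs.size(); ++a) {
            const double kr = kvecs[a].k[0] * r[i][0]
                            + kvecs[a].k[1] * r[i][1]
                            + kvecs[a].k[2] * r[i][2];
            // Im[e^{i k·r_i} S*(k)] = sum_j q_j sin(k·(r_i - r_j))
            const double im =
                std::imag(std::polar(1.0, kr) * std::conj(S[a]));
            const double pref = kvecs[a].coeff * q[i] * im;
            for (int d = 0; d < 3; ++d) f[i][d] += pref * kvecs[a].k[d];
        }
}
```

Because the j = i term contributes sin(0) = 0, no self-interaction correction is needed here, and the forces still sum to zero over all atoms exactly as in the pairwise formulation.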

FFT interface ( ➡️ 07cd78b)

I added a missing argument to the three FFT calls.

Remaining problems / To Do

  • The calculated total energy matches RuNNer's value up to approx. 1e-5 Hartree. After a few timesteps, trajectories still diverge slightly. This may simply be a numerical artifact from the iterative solver. I am fairly confident in the correctness of the value in RuNNer.
  • On the master branch and in this PR the force computation scales nicely with the number of MPI tasks. However, I found that the scaling of the two iQEq steps (charge and lambda charge computation) depends strongly on the compiler: with icx (+ MKL as the GSL backend) it scales very well, but with gcc (same MKL backend) it gets slower by two orders of magnitude as the number of MPI tasks increases. I did not find a good solution, but also did not spend a lot of time on it.

Alexander Knoll added 3 commits May 6, 2026 09:31
Otherwise it does not compile with current LAMMPS.
…c derivative evaluation routines.

The original routines loop over all reciprocal space vectors. This is problematic, since their number scales linearly with the cell volume and thus, assuming constant density, with the number of atoms. I experimented with better choices for the real- and reciprocal-space cutoff radii, but the most efficient solution I could find was leaving the number of k vectors fixed and introducing a structure factor approach as is done in the iQEq loop.