The new A2 fast path code from PR #277 only improves performance on recent compilers (see the discussion in the PR comments).
I'm not sure exactly which compilers are affected, but so far GCC 12, Clang 14 and the latest MSVC all produce slower code with the new A2 fast path optimizations.
Clang 22.1.3 and GCC 15 both produce faster code.
Maybe this change could be gated behind a compile-time flag so that it can be selectively enabled depending on the compiler in use?