android-arm64: multi-variant runtime dispatch with dotprod and i8mm #38
Open
thereisnotime wants to merge 3 commits into
Change -march=armv8-a to -march=armv8.2-a+dotprod+fp16 for the android-arm64 build path. Building with armv8-a produces zero sdot instructions; with the new flag the same library contains 1044 sdot instructions. On Cortex-A76+ (all flagship Android devices since 2018) this translates to roughly 2-4x faster CPU inference on quantised models. ARMv8.2 is the baseline for Android 10+ on 64-bit hardware and is listed as a required feature in the Android NDK ABI documentation. llama.cpp already gates the dot-product kernels behind ggml_cpu_has_dotprod(), so there is no risk of executing the fast path on hardware that does not support it.
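For context, `sdot` is the AArch64 signed 8-bit dot-product instruction from the dotprod extension. The sketch below shows one common way source code reaches it: the ACLE intrinsic `vdotq_s32`, which is only usable when the compile flags enable dotprod and the compiler defines `__ARM_FEATURE_DOTPROD`. The function `dot_i8` is hypothetical and is not code from this PR or from llama.cpp. With plain `-march=armv8-a`, only the scalar loop is compiled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>
#endif

// Hypothetical helper: signed 8-bit dot product of two length-n vectors.
std::int32_t dot_i8(const std::int8_t* a, const std::int8_t* b, std::size_t n) {
    std::int32_t sum = 0;
    std::size_t i = 0;
#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
    // Compiled only when -march enables dotprod; vdotq_s32 lowers to sdot.
    // Each of the 4 int32 lanes accumulates the dot product of 4 int8 pairs.
    int32x4_t acc = vdupq_n_s32(0);
    for (; i + 16 <= n; i += 16) {
        acc = vdotq_s32(acc, vld1q_s8(a + i), vld1q_s8(b + i));
    }
    sum = vaddvq_s32(acc);  // horizontal add of the 4 lanes
#endif
    // Scalar path: the whole input on baseline builds, only the tail otherwise.
    for (; i < n; ++i) {
        sum += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
    }
    return sum;
}
```

Both paths compute the same result; for example, `dot_i8` on `{1, 2, 3}` and `{4, 5, 6}` returns 32 either way.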
Adds a proper multi-variant build system for Android ARM64, matching the existing AVX/AVX2/AVX512 pattern used on desktop:
- New ARCHITECTURE=android-arm64_dotprod variant compiles with -march=armv8.2-a+dotprod+fp16 (Cortex-A75+, all flagships since 2018)
- Existing android-arm64 stays on baseline -march=armv8-a (all 64-bit Android devices, no regression risk)
- New archchecker_arm.cpp/.h detects HWCAP_ASIMDDP and HWCAP2_I8MM at runtime via getauxval(AT_HWCAP) and getauxval(AT_HWCAP2), mirroring the x86 CPUID checker
- The runtime library (libllamalib_android-arm64_runtime.so) is now built for Android; it calls has_dotprod()/has_i8mm() and dlopens the best available sibling .so
- get_executable_directory() uses dladdr() on Android to locate the runtime .so's directory, so sibling variants are found correctly
- CMakeLists: the OpenSSL path now uses the platform prefix (android-arm64) rather than the full ARCHITECTURE string, so it resolves correctly for all variants
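The dladdr() change matters because an Android app's process executable is the zygote-spawned app_process64, not the app's own code, so a lookup relative to the executable would not find the sibling .so files. The following is a minimal sketch of the idea, assuming POSIX-style dladdr() as provided by bionic and glibc. library_directory() and parent_directory() are hypothetical names; this is not the PR's get_executable_directory().

```cpp
#include <cassert>
#include <dlfcn.h>
#include <string>

// Hypothetical helper: strip the file name from a path, keeping the directory.
std::string parent_directory(const std::string& path) {
    const std::string::size_type slash = path.find_last_of('/');
    if (slash == std::string::npos) return ".";
    if (slash == 0) return "/";
    return path.substr(0, slash);
}

// Hypothetical sketch: directory of the shared object that contains this code.
// dladdr() maps an address to the loaded object owning it, so the result is
// the .so's own location rather than the host process executable.
std::string library_directory() {
    Dl_info info{};
    // Any address inside this library works; use this function's own address.
    if (dladdr(reinterpret_cast<void*>(&library_directory), &info) != 0 &&
        info.dli_fname != nullptr) {
        return parent_directory(info.dli_fname);
    }
    return ".";  // fallback when the lookup fails
}
```

Sibling variants can then be addressed as `library_directory() + "/libllamalib_android-arm64_dotprod.so"` and so on.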
The runtime dispatcher already detects HWCAP2_I8MM and tries to load libllamalib_android-arm64_i8mm.so; this commit adds the missing build target.
- third_party/CMakeLists.txt: android-arm64_i8mm uses -march=armv8.6-a+dotprod+i8mm+fp16
- build_library.yaml: adds android-arm64_dotprod and android-arm64_i8mm matrix entries

Targets Cortex-A510/A710/X2 and later (Exynos 2200+, Snapdragon 8 Gen 1+).
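Detecting i8mm requires the second hwcap word. Below is a minimal sketch of checks in the spirit of the PR's has_dotprod()/has_i8mm(), assuming Linux/Android getauxval() from `<sys/auxv.h>`. The decoder helpers are hypothetical. The fallback bit values are copied from the Linux arm64 `<asm/hwcap.h>` for headers that lack them.

```cpp
#include <cassert>

#if defined(__linux__) && defined(__aarch64__)  // __linux__ is also defined on Android
#include <sys/auxv.h>  // getauxval(), AT_HWCAP, AT_HWCAP2
#ifndef AT_HWCAP2
#define AT_HWCAP2 26
#endif
#endif

#ifndef HWCAP_ASIMDDP
#define HWCAP_ASIMDDP (1UL << 20)  // dotprod (SDOT/UDOT), reported in AT_HWCAP
#endif
#ifndef HWCAP2_I8MM
#define HWCAP2_I8MM (1UL << 13)    // int8 matrix multiply, reported in AT_HWCAP2
#endif

// Pure decoders, kept separate from getauxval() so they can be tested anywhere.
constexpr bool dotprod_from_hwcap(unsigned long hwcap) {
    return (hwcap & HWCAP_ASIMDDP) != 0;
}
constexpr bool i8mm_from_hwcap2(unsigned long hwcap2) {
    return (hwcap2 & HWCAP2_I8MM) != 0;
}

// Runtime checks; on non-AArch64 or non-Linux builds they report "unsupported".
bool has_dotprod() {
#if defined(__linux__) && defined(__aarch64__)
    return dotprod_from_hwcap(getauxval(AT_HWCAP));
#else
    return false;
#endif
}

bool has_i8mm() {
#if defined(__linux__) && defined(__aarch64__)
    return i8mm_from_hwcap2(getauxval(AT_HWCAP2));
#else
    return false;
#endif
}
```

The two checks read different auxv entries because HWCAP_ASIMDDP is a bit in AT_HWCAP while HWCAP2_I8MM is a bit in AT_HWCAP2.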
What
Proper multi-variant runtime dispatch for Android ARM64, matching the AVX/AVX2/AVX512 pattern already in use on desktop.
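Problem with the previous approach: changing the baseline `android-arm64` build to use `-march=armv8.2-a+dotprod+fp16` would crash on any device without dotprod support (e.g. Cortex-A53-class cores in ~2015–2018 budget/mid-range hardware). Once `-march` enables those extensions, the compiler may emit their instructions anywhere in the library, not just inside the runtime-gated dot-product kernels.

This PR instead: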
- keeps `android-arm64` on baseline `-march=armv8-a` (all 64-bit Android, unchanged)
- adds `android-arm64_dotprod`, compiled with `-march=armv8.2-a+dotprod+fp16` (Cortex-A75+, 2018+ flagships)
- adds `android-arm64_i8mm`, compiled with `-march=armv8.6-a+dotprod+i8mm+fp16` (Cortex-A510/A710/X2+, 2021+)
- adds `libllamalib_android-arm64_runtime.so`, a runtime dispatcher that selects the best available variant

Runtime selection order
On first load, the runtime dispatcher checks CPU features via `getauxval` and loads the first variant, in this order, whose required feature is present and whose `.so` file exists:
1. `libllamalib_android-arm64_i8mm.so`, if `HWCAP2_I8MM` is set
2. `libllamalib_android-arm64_dotprod.so`, if `HWCAP_ASIMDDP` is set
3. `libllamalib_android-arm64.so`, the baseline fallback

Changes
New files
- `include/archchecker_arm.h` / `src/archchecker_arm.cpp`: detects `HWCAP_ASIMDDP` (dotprod) and `HWCAP2_I8MM` at runtime via `getauxval(AT_HWCAP)` and `getauxval(AT_HWCAP2)`

Modified files
- `third_party/CMakeLists.txt`: `android-arm64_i8mm` uses `armv8.6-a+dotprod+i8mm+fp16`; `android-arm64_dotprod` uses `armv8.2-a+dotprod+fp16`; the baseline stays on `armv8-a`
- `.github/workflows/build_library.yaml`: adds `android-arm64_dotprod` and `android-arm64_i8mm` to the build matrix
- `src/CMakeLists.txt`: builds the Android runtime dispatcher; skips the static library and the desktop OpenSSL merge (not needed on Android)
- `src/LLM_runtime.cpp`: `available_architectures()` checks i8mm, then dotprod, then baseline on Android AArch64; `get_executable_directory()` uses `dladdr()` to locate sibling `.so` files
- `include/LLM_runtime.h`: includes `archchecker_arm.h` on Android AArch64
- `CMakeLists.txt`: the OpenSSL path uses the platform prefix (`android-arm64`) rather than the full `ARCHITECTURE` string
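The i8mm → dotprod → baseline order can be sketched as a small pure function. This is a hypothetical illustration, not the PR's `available_architectures()`. Feature flags are passed in as booleans (for example, from `getauxval`-based checks), and file existence is an injected predicate, so the ordering can be tested without any `.so` files present.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of one build variant and whether the CPU supports it.
struct Variant {
    std::string file;  // e.g. "libllamalib_android-arm64_i8mm.so"
    bool supported;    // result of the matching CPU feature check
};

// Return the path of the first variant (i8mm, then dotprod, then baseline) whose
// CPU feature is present and whose .so exists in `dir`; empty if none qualifies.
std::string select_variant(const std::string& dir, bool has_i8mm, bool has_dotprod,
                           const std::function<bool(const std::string&)>& file_exists) {
    const std::vector<Variant> candidates = {
        {"libllamalib_android-arm64_i8mm.so", has_i8mm},
        {"libllamalib_android-arm64_dotprod.so", has_dotprod},
        {"libllamalib_android-arm64.so", true},  // baseline: any AArch64 CPU
    };
    for (const Variant& v : candidates) {
        const std::string path = dir + "/" + v.file;
        if (v.supported && file_exists(path)) return path;
    }
    return "";
}
```

In a real dispatcher, the predicate might be POSIX `access(path.c_str(), F_OK) == 0`, with the chosen path then passed to `dlopen(path.c_str(), RTLD_NOW)`. A missing `_i8mm` file falls through to the dotprod build rather than failing.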