android-arm64: multi-variant runtime dispatch with dotprod and i8mm #38

Open
thereisnotime wants to merge 3 commits into undreamai:main from thereisnotime:android-arm64-dotprod

@thereisnotime commented May 19, 2026

What

Proper multi-variant runtime dispatch for Android ARM64, matching the AVX/AVX2/AVX512 pattern already in use on desktop.

Problem with the previous approach: changing the baseline android-arm64 build to use -march=armv8.2-a+dotprod+fp16 would crash on any device without dotprod support (for example Cortex-A53-class cores in ~2015–2018 budget/mid-range hardware), because the compiler may then emit dotprod instructions anywhere in the binary.

This PR instead:

  • Keeps android-arm64 on baseline -march=armv8-a (all 64-bit Android, unchanged)
  • Adds android-arm64_dotprod compiled with -march=armv8.2-a+dotprod+fp16 (Cortex-A75+, 2018+ flagships)
  • Adds android-arm64_i8mm compiled with -march=armv8.6-a+dotprod+i8mm+fp16 (Cortex-A510/A710/X2+, 2021+)
  • Adds libllamalib_android-arm64_runtime.so — runtime dispatcher that selects the best available variant

Runtime selection order

On first load, the runtime dispatcher checks CPU features via getauxval and loads the first variant whose .so is present:

  1. libllamalib_android-arm64_i8mm.so — if HWCAP2_I8MM is set
  2. libllamalib_android-arm64_dotprod.so — if HWCAP_ASIMDDP is set
  3. libllamalib_android-arm64.so — baseline fallback
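
The selection order above can be sketched as a small pure function. This is an illustrative sketch, not the PR's actual code: `ArmFeatures`, `candidate_libraries`, and `pick_library` are hypothetical names, and the `exists` callback stands in for whatever file-presence check the real dispatcher performs.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical feature summary; the PR's checker exposes has_dotprod()/has_i8mm().
struct ArmFeatures {
    bool dotprod = false;
    bool i8mm = false;
};

// Returns candidate library names, best first, in the order listed above.
std::vector<std::string> candidate_libraries(const ArmFeatures& f) {
    std::vector<std::string> out;
    if (f.i8mm)    out.push_back("libllamalib_android-arm64_i8mm.so");
    if (f.dotprod) out.push_back("libllamalib_android-arm64_dotprod.so");
    out.push_back("libllamalib_android-arm64.so");  // baseline is always a candidate
    return out;
}

// Picks the first candidate for which `exists` returns true.
// Returns an empty string if no candidate is present.
std::string pick_library(const ArmFeatures& f,
                         const std::function<bool(const std::string&)>& exists) {
    for (const auto& name : candidate_libraries(f)) {
        if (exists(name)) return name;
    }
    return "";
}
```

Keeping the ordering separate from the file check means a device that reports i8mm but ships only the dotprod and baseline libraries still falls through to the dotprod variant.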

Changes

New files

  • include/archchecker_arm.h / src/archchecker_arm.cpp — detects HWCAP_ASIMDDP (dotprod) and HWCAP2_I8MM at runtime via getauxval(AT_HWCAP[2])
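
A minimal sketch of this kind of detection follows. It is not the contents of archchecker_arm.cpp: `ArmCaps`, `decode_caps`, and `detect_caps` are hypothetical names. The bit positions are restated from the Linux arm64 `<asm/hwcap.h>` header so that the decode step compiles and can be tested on any host; only the `getauxval` call is specific to Linux/Android AArch64.

```cpp
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP, AT_HWCAP2
#endif

// Bit positions as defined in the Linux arm64 UAPI header <asm/hwcap.h>,
// restated here so the sketch builds on non-ARM hosts too.
constexpr unsigned long kHwcapAsimdDp = 1UL << 20;  // HWCAP_ASIMDDP (dotprod)
constexpr unsigned long kHwcap2I8mm   = 1UL << 13;  // HWCAP2_I8MM

struct ArmCaps {
    bool dotprod;
    bool i8mm;
};

// Pure decode step: dotprod lives in AT_HWCAP, i8mm in AT_HWCAP2.
constexpr ArmCaps decode_caps(unsigned long hwcap, unsigned long hwcap2) {
    return ArmCaps{ (hwcap & kHwcapAsimdDp) != 0, (hwcap2 & kHwcap2I8mm) != 0 };
}

// Reads the auxiliary vector on Linux/Android AArch64.
// On any other platform this sketch reports no features.
inline ArmCaps detect_caps() {
#if defined(__linux__) && defined(__aarch64__)
    return decode_caps(getauxval(AT_HWCAP), getauxval(AT_HWCAP2));
#else
    return decode_caps(0, 0);
#endif
}
```

Note that the two flags come from different auxiliary-vector entries, so both `AT_HWCAP` and `AT_HWCAP2` have to be queried.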

Modified files

  • third_party/CMakeLists.txt — android-arm64_i8mm uses armv8.6-a+dotprod+i8mm+fp16; android-arm64_dotprod uses armv8.2-a+dotprod+fp16; baseline stays armv8-a
  • .github/workflows/build_library.yaml — adds android-arm64_dotprod and android-arm64_i8mm to the build matrix
  • src/CMakeLists.txt — builds the Android runtime dispatcher; skips static library and desktop OpenSSL merge (not needed on Android)
  • src/LLM_runtime.cpp — available_architectures() checks i8mm, then dotprod, then baseline on Android AArch64; get_executable_directory() uses dladdr() to locate sibling .so files
  • include/LLM_runtime.h — includes archchecker_arm.h on Android AArch64
  • CMakeLists.txt — OpenSSL path uses platform prefix (android-arm64) not full ARCHITECTURE string
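
To illustrate how `dladdr()` can locate sibling .so files, here is a hedged sketch. The helper names (`directory_of`, `current_library_directory`, `sibling_path`) are hypothetical and are not taken from LLM_runtime.cpp. The `dladdr` call is compiled only on Android, so on other platforms the sketch falls back to the current directory.

```cpp
#include <string>

#if defined(__ANDROID__)
#include <dlfcn.h>  // dladdr, Dl_info
#endif

// Returns the directory part of a path, or "." when there is no separator.
inline std::string directory_of(const std::string& path) {
    const std::string::size_type slash = path.find_last_of('/');
    if (slash == std::string::npos) return ".";
    if (slash == 0) return "/";
    return path.substr(0, slash);
}

// Hypothetical helper: directory of the shared object that contains this function.
inline std::string current_library_directory() {
#if defined(__ANDROID__)
    Dl_info info;
    // dladdr fills dli_fname with the path of the object containing the address.
    if (dladdr(reinterpret_cast<const void*>(&current_library_directory), &info) != 0 &&
        info.dli_fname != nullptr) {
        return directory_of(info.dli_fname);
    }
#endif
    return ".";  // fallback used by this sketch on other platforms or on failure
}

// Builds the full path of a sibling library in the same directory.
inline std::string sibling_path(const std::string& dir, const std::string& name) {
    return dir + "/" + name;
}
```

On Android there is no conventional executable directory for an app's native code, which is presumably why the lookup starts from the address of a symbol inside the runtime library.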

Change -march=armv8-a to -march=armv8.2-a+dotprod+fp16 for the
android-arm64 build path.

armv8-a produces zero sdot instructions; with this flag the same binary
has 1044 sdot instructions. On Cortex-A76+ (all flagship Android since
2018) this translates to roughly 2-4x faster CPU inference on quantised
models. ARMv8.2 is the baseline for Android 10+ on 64-bit hardware and
is listed as a required feature in the Android NDK ABI documentation.

llama.cpp already gates the dot-product kernels behind
ggml_cpu_has_dotprod(), so there is no risk of executing the fast path
on hardware that does not support it.

Adds a proper multi-variant build system for Android ARM64, matching the
existing AVX/AVX2/AVX512 pattern used on desktop:

- New ARCHITECTURE=android-arm64_dotprod variant compiles with
  -march=armv8.2-a+dotprod+fp16 (Cortex-A75+, all flagships since 2018)
- Existing android-arm64 stays on baseline -march=armv8-a (all 64-bit
  Android devices, no regression risk)
- New archchecker_arm.cpp/.h detects HWCAP_ASIMDDP and HWCAP2_I8MM at
  runtime via getauxval(AT_HWCAP) and getauxval(AT_HWCAP2), the same
  approach as the x86 CPUID checker
- Runtime library (libllamalib_android-arm64_runtime.so) now built for
  Android; it calls has_dotprod()/has_i8mm() and dlopens the best
  available sibling .so
- get_executable_directory() uses dladdr() on Android to locate the
  runtime .so's directory, so sibling variants are found correctly
- CMakeLists: OpenSSL path now uses the platform prefix (android-arm64)
  rather than the full ARCHITECTURE string, so it resolves correctly for
  all variants

The runtime dispatcher already detects HWCAP2_I8MM and tries to load
libllamalib_android-arm64_i8mm.so — this adds the missing build target.

- third_party/CMakeLists.txt: android-arm64_i8mm uses -march=armv8.6-a+dotprod+i8mm+fp16
- build_library.yaml: add android-arm64_dotprod and android-arm64_i8mm matrix entries

Targets Cortex-A510/A710/X2 and later (Exynos 2200+, SD 8 Gen 1+).
@thereisnotime changed the title from "android-arm64: enable dotprod and fp16 extensions" to "android-arm64: multi-variant runtime dispatch with dotprod and i8mm" on May 20, 2026