
Add v4 distance kernels #1025

Draft
m3hm3t wants to merge 1 commit into microsoft:main from hakashya:hakashya/avx512/add4bit

m3hm3t commented May 6, 2026

  • Does this PR have a descriptive title that could go in our release notes?
  • Does this PR add any new dependencies?
  • Does this PR modify any existing APIs?
  • Is the change to the API backwards compatible?
  • Should this result in any changes to our documentation, either updating existing docs or adding new ones?

Reference Issues/PRs

What does this implement/fix? Briefly explain your changes.

Any other comments?

let wx: i16s = (vx & mask).reinterpret_simd();
let wy: i16s = (vy & mask).reinterpret_simd();
let d = wx - wy;
s0 = s0.dot_simd(d, d);
hildebrandmw (Contributor) commented May 7, 2026


Thanks for taking the initiative! You can do much better by targeting this method. Unlike the dot product on 16-bit integers, it computes the dot product of 64 8-bit numbers and accumulates the result into an i32x16 in a single instruction 😄

Ignore me. It's too late in the day. Forgot this was L2.
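
For reference, here is a scalar sketch of the quantity the quoted lines appear to accumulate: the squared L2 distance over 4-bit values. It assumes two unsigned 4-bit values packed per byte, low nibble first. The PR excerpt does not confirm that layout, and the name `squared_l2_u4` is hypothetical and not part of the PR. Each difference is signed (in [-15, 15]) and is multiplied by itself, so this kernel has a different shape from a plain inner product of the two inputs.

```rust
/// Scalar reference for squared L2 distance over packed 4-bit values.
/// Assumption: each byte holds two unsigned 4-bit values, low nibble first.
/// `squared_l2_u4` is a hypothetical name used only for illustration.
fn squared_l2_u4(x: &[u8], y: &[u8]) -> u32 {
    assert_eq!(x.len(), y.len(), "inputs must have the same packed length");
    let mut sum: u32 = 0;
    for (&bx, &by) in x.iter().zip(y.iter()) {
        // Mask out the low nibble of each byte, then take a signed difference.
        let d_lo = (bx & 0x0F) as i32 - (by & 0x0F) as i32;
        // Shift down the high nibble and do the same.
        let d_hi = (bx >> 4) as i32 - (by >> 4) as i32;
        // Accumulate d * d for both nibbles; each square is at most 225.
        sum += (d_lo * d_lo + d_hi * d_hi) as u32;
    }
    sum
}
```

A reference like this could be used to check a vectorized kernel on random inputs. For example, `squared_l2_u4(&[0x21], &[0x03])` compares the nibble pairs (1, 3) and (2, 0).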
