Add bf16 complex dot product for NEON
#163
A complex dot product already exists for NEON, but it converts the inputs to f32 first, and we want to operate on the bf16 inputs directly. The complex vector is interleaved as real, imag, real, imag, ... Original:
The new version might look like this (vbfmlaltq_f32 is an FMA over the odd entries, while vbfmlalbq_f32 is an FMA over the even ones):
Indeed, you are right! I suppose the new version must be a lot faster, right?
10% faster. There are no bf16 versions of neg and rev32, so we still have to jump through hoops. I confirmed that the new function's output matches the old one and that the tests pass. I will take a look to see if we can do this better before making a PR. Assembly code: https://godbolt.org/z/4hzr9f943
Interestingly, the godbolt.org snippet you've provided breaks Clang 18.1 if you add
Good catch, I was playing around with another compiler on there. Ubuntu 24.04's Clang 18.1 hits the same bug when building this code. Issue opened: llvm/llvm-project#107810. I'll try to move code around to avoid this later. Code: MarkReedZ@09e89bb
Clang is choking on the flipping of the sign bit. I haven't come up with an alternative to veorq and vnegq: no amount of moving code around fixes the Clang bug if either of the two is used to flip the bit.
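For readers following along, here is a minimal portable sketch of what "flipping the sign bit" means for interleaved bf16 data. It is a scalar illustration only, not the NEON code from the linked commit. The helper names (bf16_to_f32, negate_odd_entries) are hypothetical, and bf16 values are assumed to be stored as raw uint16_t bit patterns (the upper 16 bits of an IEEE-754 binary32).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a raw bf16 bit pattern to f32.
 * bf16 is the upper half of a binary32, so a 16-bit shift is enough. */
static float bf16_to_f32(uint16_t h) {
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical helper: negate every odd-indexed (imaginary) entry of an
 * interleaved real, imag, real, imag, ... vector by XOR-ing the sign bit.
 * This is the scalar equivalent of XOR-ing the vector with a mask that has
 * 0x8000 in the odd 16-bit lanes. */
static void negate_odd_entries(uint16_t *v, size_t n) {
    assert(n % 2 == 0); /* interleaved complex data has an even length */
    for (size_t i = 1; i < n; i += 2)
        v[i] ^= 0x8000u; /* bit 15 is the bf16 sign bit */
}
```

Negating the imaginary entries of one operand lets a plain multiply-accumulate over all lanes produce the real part (ar*br - ai*bi) without a separate subtraction.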
Hi @MarkReedZ! Any chance you have an update on this?
This is fixed in Clang 19.1. I'm not sure what our approach to handling this should be, as 18 will remain the default for some time. We could check the Clang version number defines, though apparently in some cases those may be overridden.
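A version check along those lines could be sketched as follows. The macro and function names are hypothetical and not part of the library. The sketch gates only on major version 18, since that is the only release confirmed as affected in this thread, and it inherits the caveat above: a vendor toolchain may report a different number in __clang_major__.

```c
/* Hypothetical macro: 1 when building with a Clang release confirmed in this
 * thread to miscompile the bf16 kernel (18.x), 0 otherwise. Whether older
 * releases are affected is not known here, so they are not gated. */
#if defined(__clang__) && (__clang_major__ == 18)
#define BF16C_CLANG_AFFECTED 1
#else
#define BF16C_CLANG_AFFECTED 0
#endif

/* Hypothetical dispatch helper: returns 1 if the bf16 complex kernel should
 * be compiled in, 0 if the existing f32-converting path should be used. */
static int bf16c_neon_kernel_enabled(void) {
    return !BF16C_CLANG_AFFECTED;
}
```

With a guard like this, builds on Clang 18 would keep using the existing f32-converting path, while Clang 19.1 and other compilers get the new kernel.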
@MarkReedZ, can you please submit a PR that works with 19, and I'll try a few more ideas around your prototype?
The vbfmlaltq_f32 and vbfmlalbq_f32 intrinsics already have the benefit of skipping odd/even entries.
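To make the odd/even behaviour concrete, here is a portable scalar model of how two such widening FMAs can be combined into a complex dot product. It is a sketch, not the NEON implementation. The function names and MODEL_MAX_N are hypothetical, bf16 values are assumed to be raw uint16_t bit patterns, and the non-conjugated product is assumed (re = ar*br - ai*bi, im = ar*bi + ai*br).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_N 64 /* hypothetical cap so the sketch needs no allocation */

/* Widen a raw bf16 bit pattern to f32 (bf16 is the top half of a binary32). */
static float bf16_to_f32(uint16_t h) {
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Scalar stand-in for a "bottom" widening FMA: only even-indexed entries. */
static float fma_even(float acc, const uint16_t *a, const uint16_t *b, size_t n) {
    for (size_t i = 0; i < n; i += 2)
        acc += bf16_to_f32(a[i]) * bf16_to_f32(b[i]);
    return acc;
}

/* Scalar stand-in for a "top" widening FMA: only odd-indexed entries. */
static float fma_odd(float acc, const uint16_t *a, const uint16_t *b, size_t n) {
    for (size_t i = 1; i < n; i += 2)
        acc += bf16_to_f32(a[i]) * bf16_to_f32(b[i]);
    return acc;
}

/* Complex dot product over interleaved real, imag, ... bf16 vectors of n
 * scalars (n/2 complex numbers). Swapping each pair of b plays the role of
 * a rev32 over 16-bit lanes. */
static void dot_bf16c_model(const uint16_t *a, const uint16_t *b, size_t n,
                            float *re, float *im) {
    assert(n % 2 == 0 && n <= MODEL_MAX_N);
    uint16_t b_swapped[MODEL_MAX_N];
    for (size_t i = 0; i < n; i += 2) {
        b_swapped[i] = b[i + 1];
        b_swapped[i + 1] = b[i];
    }
    /* real: sum(ar*br) - sum(ai*bi) */
    *re = fma_even(0.0f, a, b, n) - fma_odd(0.0f, a, b, n);
    /* imag: sum(ar*bi) + sum(ai*br) */
    *im = fma_odd(fma_even(0.0f, a, b_swapped, n), a, b_swapped, n);
}
```

In this model the subtraction for the real part is done on the two scalar sums. A vectorized version that keeps everything in one accumulator instead needs the sign flip and the pair swap on one operand, which is where the missing bf16 neg and rev32 come in.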