As requested by @vchuravy, this is a copy of my slack question:
Hello, I'm running into an error with KernelAbstractions.jl when combining atomic operations from Atomix.jl with complex numbers. Is it possible to perform atomic operations on ComplexF32?
As an MWE, we can take the atomic operations example from the documentation and create img as an array of ComplexF32:
using CUDA, KernelAbstractions, Atomix

img = zeros(ComplexF32, (50, 50));
img[10:20, 10:20] .= 1;
img[35:45, 35:45] .= 2;

function index_fun_fixed(arr; backend=get_backend(arr))
    out = similar(arr)
    fill!(out, 0)
    kernel! = my_kernel_fixed!(backend)
    kernel!(out, arr, ndrange=(size(arr, 1), size(arr, 2)))
    return out
end

@kernel function my_kernel_fixed!(out, arr)
    i, j = @index(Global, NTuple)
    for k in 1:size(out, 1)
        Atomix.@atomic out[k, i] += arr[i, j]
    end
end

index_fun_fixed(CuArray(img))
index_fun_fixed(img)
On a GPU I get the error:
out_fixed = Array(index_fun_fixed(CuArray(img)));
ERROR: a error was thrown during kernel execution on thread (65, 1, 1) in block (3, 1, 1).
Stacktrace not available, run Julia on debug level 2 for more details (by passing -g2 to the executable).
ERROR: KernelException: exception thrown during kernel execution on device NVIDIA GeForce GTX 1080 Ti
On the issue of safety: this can lead to "torn" updates, e.g. one thread updating re while another updates im. Since you are only doing an accumulate, that should be fine.
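A minimal sketch of that accumulate-only pattern, assuming it is acceptable to accumulate into a separate Float32 buffer with a leading dimension of 2 (real and imaginary planes) instead of reinterpreting the complex array. The names accumulate_split! and index_fun_split are hypothetical, and like the MWE this assumes a square input. It runs on the CPU backend here and has not been verified on a GPU.

```julia
using KernelAbstractions, Atomix

# Hypothetical kernel: real and imaginary parts are accumulated with two
# independent Float32 atomic adds. The pair is not updated atomically as a
# whole, which is acceptable for a pure accumulation.
@kernel function accumulate_split!(out_f, arr)
    i, j = @index(Global, NTuple)
    v = arr[i, j]
    for k in 1:size(out_f, 2)
        Atomix.@atomic out_f[1, k, i] += real(v)
        Atomix.@atomic out_f[2, k, i] += imag(v)
    end
end

# Hypothetical wrapper mirroring index_fun_fixed from the MWE (square input assumed).
function index_fun_split(arr; backend=get_backend(arr))
    n, m = size(arr)
    out_f = similar(arr, Float32, 2, n, m)  # plane 1: real, plane 2: imaginary
    fill!(out_f, 0)
    kernel! = accumulate_split!(backend)
    kernel!(out_f, arr, ndrange=(n, m))
    KernelAbstractions.synchronize(backend)
    # Recombine the two planes into a complex array once all adds are done.
    return complex.(out_f[1, :, :], out_f[2, :, :])
end

img = zeros(ComplexF32, (50, 50))
img[10:20, 10:20] .= 1
img[35:45, 35:45] .= 2
out = index_fun_split(img)
```

A reader that looks at the buffer while the kernel is still running could observe a value whose real part is updated and whose imaginary part is not; recombining only after synchronize avoids that.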
We would need to support 16-byte-wide operations, but that would also turn your accumulate operation into a cmpswap loop.
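To illustrate what a cmpswap loop means for an accumulate, here is a host-side sketch using Julia's built-in per-field atomics (Julia 1.7 or later), not Atomix or GPU code. ComplexCell and cas_add! are hypothetical names introduced only for this example.

```julia
# Hypothetical container with one atomically accessed complex field.
mutable struct ComplexCell
    @atomic value::ComplexF32
end

# Add x to the cell with a compare-and-swap loop: read the current value,
# compute the sum, and retry whenever another task changed the value in between.
function cas_add!(cell::ComplexCell, x::ComplexF32)
    expected = @atomic cell.value
    while true
        desired = expected + x
        result = @atomicreplace cell.value expected => desired
        result.success && return desired
        expected = result.old  # lost the race; retry with the fresh value
    end
end

cell = ComplexCell(zero(ComplexF32))
Threads.@threads for _ in 1:1000
    cas_add!(cell, ComplexF32(1, 2))
end
total = @atomic cell.value
```

Because the whole complex value is swapped in one step, no reader can see a torn update. The cost is that each add may retry under contention, unlike a single hardware atomic add.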
and on CPU I get:
Accessing the real and imaginary parts individually like this:
results in such an error:
A fairly hacky workaround is reinterpret, but I'm not sure that is safe to do: