Atomic operations on complex numbers #497

Open
nHackel opened this issue Jul 23, 2024 · 1 comment

@nHackel
nHackel commented Jul 23, 2024

As requested by @vchuravy, this is a copy of my slack question:

Hello, I'm running into errors when combining KernelAbstractions.jl, atomic operations from Atomix.jl, and complex numbers. Is it possible to perform atomic operations on ComplexF32?

As an MWE, we can take the atomic operations example from the documentation and create img as an array of ComplexF32:

using CUDA, KernelAbstractions, Atomix

img = zeros(ComplexF32, (50, 50));
img[10:20, 10:20] .= 1;
img[35:45, 35:45] .= 2;

function index_fun_fixed(arr; backend=get_backend(arr))
	out = similar(arr)
	fill!(out, 0)
	kernel! = my_kernel_fixed!(backend)
	kernel!(out, arr, ndrange=(size(arr, 1), size(arr, 2)))
	return out
end

@kernel function my_kernel_fixed!(out, arr)
	i, j = @index(Global, NTuple)
	for k in 1:size(out, 1)
		Atomix.@atomic out[k, i] += arr[i, j]
	end
end

index_fun_fixed(CuArray(img))
index_fun_fixed(img)

On a GPU I get the error:

out_fixed = Array(index_fun_fixed(CuArray(img)));
ERROR: a error was thrown during kernel execution on thread (65, 1, 1) in block (3, 1, 1).
Stacktrace not available, run Julia on debug level 2 for more details (by passing -g2 to the executable).

ERROR: KernelException: exception thrown during kernel execution on device NVIDIA GeForce GTX 1080 Ti

and on CPU I get:

out_fixed = Array(index_fun_fixed(img));
ERROR: TaskFailedException

    nested task error: MethodError: no method matching modify!(::Ptr{ComplexF32}, ::typeof(+), ::ComplexF32, ::UnsafeAtomics.Internal.LLVMOrdering{:seq_cst})
    
    Closest candidates are:
      modify!(::Ptr{T}, ::typeof(UnsafeAtomics.right), ::T, ::Any) where T
       @ UnsafeAtomics ~/.julia/packages/UnsafeAtomics/ugwrA/src/core.jl:197
      modify!(::Core.LLVMPtr, ::OP, ::Any, ::UnsafeAtomics.Ordering) where OP
       @ UnsafeAtomicsLLVM ~/.julia/packages/UnsafeAtomicsLLVM/tbohS/src/internal.jl:20
      modify!(::Any, ::Any, ::Any)
       @ UnsafeAtomics ~/.julia/packages/UnsafeAtomics/ugwrA/src/core.jl:4
      ...
    
    Stacktrace:
     [1] modify!
       @ ~/.julia/packages/Atomix/F9VIX/src/core.jl:33 [inlined]
     [2] macro expansion
       @ ./REPL[55]:4 [inlined]
     [3] cpu_my_kernel_fixed!
       @ ~/.julia/packages/KernelAbstractions/HAcqg/src/macros.jl:287 [inlined]
     [4] cpu_my_kernel_fixed!(__ctx__::KernelAbstractions.CompilerMetadata{…}, out::Matrix{…}, arr::Matrix{…})

Accessing the real and imag part individually like this:

@kernel function my_kernel_fixed!(out, arr::AbstractArray{<:Complex})
	i, j = @index(Global, NTuple)
	for k in 1:size(out, 1)
		Atomix.@atomic out[k, i].re += arr[i, j].re
		Atomix.@atomic out[k, i].im += arr[i, j].im
	end
end

results in this error:

ERROR: TaskFailedException

    nested task error: ConcurrencyViolationError("modifyfield!: non-atomic field cannot be written atomically")

A fairly hacky workaround is to use reinterpret, but I'm not sure whether that is safe:

function index_fun_reinterpret(arr::AbstractArray{<:Complex}; backend=get_backend(arr))
	out = similar(arr)
	fill!(out, 0)
	kernel! = my_kernel_reinterpret!(backend)
	kernel!(reinterpret(reshape, Float32, out), arr, ndrange=(size(arr, 1), size(arr, 2)))
	return out
end

@kernel function my_kernel_reinterpret!(out, arr::AbstractArray{<:Complex})
	i, j = @index(Global, NTuple)
	for k in 1:size(out, 2)
		Atomix.@atomic out[1, k, i] += arr[i, j].re
		Atomix.@atomic out[2, k, i] += arr[i, j].im
	end
end
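
To make the memory layout behind this workaround concrete, here is a minimal CPU-only sketch using only Base Julia (no CUDA, KernelAbstractions, or Atomix involved; the array names `A` and `R` are just for illustration). It shows that `reinterpret(reshape, Float32, ...)` adds a leading dimension of length 2, with the real part at index 1 and the imaginary part at index 2, and that writes go through to the original complex array:

```julia
# A small complex matrix standing in for `out`.
A = ComplexF32[1+2im 3+4im; 5+6im 7+8im]

# View the same memory as Float32 with an extra leading dimension of length 2.
R = reinterpret(reshape, Float32, A)

size(R)                      # (2, 2, 2)
R[1, 2, 1] == real(A[2, 1])  # index 1 of the new dimension is the real part
R[2, 2, 1] == imag(A[2, 1])  # index 2 of the new dimension is the imaginary part

# Writes through the view modify the original array.
R[1, 1, 1] += 10f0
A[1, 1]                      # 11.0f0 + 2.0f0im
```

This only illustrates the indexing used in the kernel above; it says nothing about atomicity by itself.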

@vchuravy
Member

On the issue of safety: this can lead to "torn" updates, e.g. one thread updating re while another updates im, so the two parts are not updated as a single unit. Since you are doing an accumulate, that should be fine.
To update the whole complex number atomically we would need to support 16-byte-wide operations, but that would also turn your accumulate operation into a cmpswap loop.
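
As a rough illustration of what such a cmpswap loop looks like, here is a CPU-only sketch built on `Threads.Atomic` and `Threads.atomic_cas!` from Base. It is not how Atomix.jl or KernelAbstractions.jl implement anything; the helpers `pack`, `unpack`, and `atomic_add!` are hypothetical names made up for this example. A ComplexF32 occupies 8 bytes, so the sketch packs it into a UInt64; a ComplexF64 would need the 16-byte-wide operations mentioned above.

```julia
# Hypothetical helpers: store a ComplexF32 in the bits of a UInt64
# (real part in the low 32 bits, imaginary part in the high 32 bits).
pack(z::ComplexF32) =
    UInt64(reinterpret(UInt32, real(z))) | (UInt64(reinterpret(UInt32, imag(z))) << 32)
unpack(x::UInt64) =
    ComplexF32(reinterpret(Float32, x % UInt32), reinterpret(Float32, (x >> 32) % UInt32))

# Hypothetical atomic accumulate: retry until no other thread changed the
# cell between our read and our compare-and-swap.
function atomic_add!(cell::Threads.Atomic{UInt64}, z::ComplexF32)
    old = cell[]
    while true
        new = pack(unpack(old) + z)
        seen = Threads.atomic_cas!(cell, old, new)  # returns the value it found
        seen == old && return unpack(old)           # swap succeeded
        old = seen                                  # lost the race, retry
    end
end

cell = Threads.Atomic{UInt64}(pack(zero(ComplexF32)))
Threads.@threads for n in 1:1000
    atomic_add!(cell, ComplexF32(1, 2))
end
unpack(cell[])  # 1000.0f0 + 2000.0f0im
```

Because both parts are swapped together, no thread can observe a torn value, at the cost of retries under contention.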
