
GPU testing #61

Draft · wants to merge 28 commits into main

Conversation

@jmsull (Collaborator) commented May 26, 2022

No description provided.

@codecov-commenter commented May 26, 2022

Codecov Report

Merging #61 (b761078) into main (d97d7c0) will decrease coverage by 25.04%.
The diff coverage is 0.00%.

❗ Current head b761078 differs from pull request most recent head 152f307. Consider uploading reports for the commit 152f307 to get more accurate results

@@             Coverage Diff             @@
##             main      #61       +/-   ##
===========================================
- Coverage   70.74%   45.70%   -25.05%     
===========================================
  Files           7        8        +1     
  Lines         694      698        +4     
===========================================
- Hits          491      319      -172     
- Misses        203      379      +176     
Impacted Files Coverage Δ
src/Bolt.jl 100.00% <ø> (+100.00%) ⬆️
src/background.jl 86.36% <ø> (-6.82%) ⬇️
src/gpu.jl 0.00% <0.00%> (ø)
src/perturbations.jl 0.00% <0.00%> (-79.27%) ⬇️
src/spectra.jl 0.00% <0.00%> (ø)
src/util.jl 98.41% <0.00%> (-1.59%) ⬇️
src/ionization/recfast.jl 91.85% <0.00%> (+0.07%) ⬆️
... and 2 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d97d7c0...152f307.

@jmsull (Collaborator, Author) commented May 26, 2022

Starting to look at running on GPUs. After confirming that I can run the README examples on NERSC, I tried a simple parallelization over k of the boltsolve call to the ODE solver, following the DiffEqGPU.jl README and the docs here.

To do this, you should be able to pass EnsembleGPUArray() to the ensemble solve, as I tried to do in /examples/gpu_ensemble_boltsolve.jl. This doesn't work, apparently because (as the name indicates) EnsembleGPUArray() actually needs to map over an Array, and I am trying to map over Hierarchy structs (which works fine for EnsembleThreads() syntactically, so nothing is obviously wrong there). The error I get when running the commented line in the script is:

ERROR: MethodError: no method matching Array(::Hierarchy{Float64, BasicNewtonian,

I can quickly get around this by wrapping hierarchy! so that only k is passed as the parameter, so I will try that.
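Roughly what I have in mind, as a sketch only (hierarchy_for_k, u0, kgrid, xgrid, and the x limits are placeholders/hypothetical helpers, not the exact contents of the example script):

using Bolt, OrdinaryDiffEq, DiffEqGPU

# Wrapper: keep the Hierarchy construction on the CPU side and hand the ensemble
# only plain arrays, with the wavenumber k in the parameter slot.
function gpu_ensemble_hierarchy!(du, u, k, x)
    hy = hierarchy_for_k(k)        # hypothetical helper that rebuilds the Hierarchy for this k
    Bolt.hierarchy!(du, u, hy, x)  # delegate to the usual in-place RHS
end

prob = ODEProblem(gpu_ensemble_hierarchy!, u0, (xi, xf), kgrid[1])
prob_func = (prob, i, repeat) -> remake(prob, p = kgrid[i])   # one trajectory per k
eprob = EnsembleProblem(prob, prob_func = prob_func)

# EnsembleThreads() accepts this form; the question is whether EnsembleGPUArray() will.
sol = solve(eprob, KenCarp4(), EnsembleGPUArray();
            trajectories = length(kgrid), saveat = xgrid)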

@jmsull mentioned this pull request May 27, 2022
@jmsull (Collaborator, Author) commented Jun 8, 2022

The idea I had above did not work. I updated that file (/examples/gpu_ensemble_boltsolve.jl) to do this wrapping (which appears to work fine for the commented-out EnsembleThreads() solve, by the way).

However, I get an error that looks similar to the one before (though now it is apparently only upset about applying "Array" to a Float32 value rather than to the hierarchy object). It looks like it is trying to convert the single float k into an array; I can reproduce the method error at the top of the trace with Array(Float32(1.)).

Stack trace (not sure why I am getting these warnings now as well):
ERROR: LoadError: MethodError: no method matching Array(::Float32)
┌ Warning: dt <= dtmin. Aborting. There is either an error in your model specification or the true solution is unstable.
└ @ SciMLBase ~/.julia/packages/SciMLBase/OHiiA/src/integrator_interface.jl:345
(warning repeated seven times)
Closest candidates are:
  Array(::Union{LinearAlgebra.QR, LinearAlgebra.QRCompactWY}) at /global/common/software/m4051/julia-1.7.2/share/julia/stdlib/v1.7/LinearAlgebra/src/qr.jl:442
  Array(::Union{LinearAlgebra.Hermitian, LinearAlgebra.Symmetric}) at /global/common/software/m4051/julia-1.7.2/share/julia/stdlib/v1.7/LinearAlgebra/src/symmetric.jl:271
  Array(::LinearAlgebra.SVD) at /global/common/software/m4051/julia-1.7.2/share/julia/stdlib/v1.7/LinearAlgebra/src/svd.jl:571
  ...
Stacktrace:
  [1] (::DiffEqGPU.var"#24#30")(i::Int64)
    @ DiffEqGPU ./none:0
  [2] MappingRF
    @ ./reduce.jl:95 [inlined]
  [3] _foldl_impl(op::Base.MappingRF{DiffEqGPU.var"#24#30", Base.BottomRF{typeof(hcat)}}, init::Base._InitialValue, itr::UnitRange{Int64})
    @ Base ./reduce.jl:58
  [4] foldl_impl
    @ ./reduce.jl:48 [inlined]
  [5] mapfoldl_impl
    @ ./reduce.jl:44 [inlined]
  [6] #mapfoldl#244
    @ ./reduce.jl:162 [inlined]
  [7] mapfoldl
    @ ./reduce.jl:162 [inlined]
  [8] #mapreduce#248
    @ ./reduce.jl:289 [inlined]
  [9] mapreduce
    @ ./reduce.jl:289 [inlined]
 [10] #reduce#250
    @ ./reduce.jl:458 [inlined]
 [11] reduce
    @ ./reduce.jl:458 [inlined]
 [12] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Float32, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::KenCarp4{0, true, Nothing, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:283
 [13] macro expansion
    @ ./timing.jl:299 [inlined]
 [14] __solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Float32, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::KenCarp4{0, true, Nothing, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, ensemblealg::EnsembleGPUArray; trajectories::Int64, batch_size::Int64, unstable_check::Function, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:saveat, :reltol, :dense), Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:201
 [15] #solve#42
    @ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:193 [inlined]
 [16] top-level scope
    @ ./timing.jl:220

@jmsull (Collaborator, Author) commented Jun 8, 2022

After perusing the DiffEqGPU.jl GitHub issues, I think k must be passed as an Array rather than as a scalar value...
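If that is right, the fix is small (same placeholder names as in the sketch above): EnsembleGPUArray apparently hcats prob.p across trajectories into a Matrix, so p has to be an array itself, e.g. a one-element vector holding k.

# Same sketch as before, but with k carried in a 1-element Float32 array instead of a bare scalar.
function gpu_ensemble_hierarchy!(du, u, p, x)
    k = p[1]                                        # unpack the wavenumber for this trajectory
    Bolt.hierarchy!(du, u, hierarchy_for_k(k), x)   # hierarchy_for_k is the same hypothetical helper
end

prob = ODEProblem(gpu_ensemble_hierarchy!, u0, (xi, xf), [Float32(kgrid[1])])
prob_func = (prob, i, repeat) -> remake(prob, p = [Float32(kgrid[i])])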

@jmsull (Collaborator, Author) commented Jun 8, 2022

OK, so after fixing this I get a new problem, which apparently occurs while the solver is actually running (as opposed to the setup failing). This seems like a problem with the solver itself? I don't actually see where it calls needs_concrete_A()... The change is pushed below, and here is the stack trace:

jsull@nid002601:/pscratch/sd/j/jsull/julia/Bolt.jl/examples> julia gpu_ensemble_boltsolve.jl
  Activating project at `/pscratch/sd/j/jsull/julia/Bolt.jl`
pre type of u0, Vector{Float64}
pre type of kgrid, Vector{Float64}
pre type of xi, Float64
post type of u0, Vector{Float32}
post type of kgrid, Matrix{Float32}
Succesfully evaluted gpu derivative
test deriv: nothing
one elementFloat32
ERROR: LoadError: MethodError: no method matching needs_concrete_A(::LinSolveGPUSplitFactorize)
Closest candidates are:
  needs_concrete_A(::LinearSolve.AbstractFactorization) at ~/.julia/packages/LinearSolve/7vwOr/src/LinearSolve.jl:33
  needs_concrete_A(::LinearSolve.AbstractKrylovSubspaceMethod) at ~/.julia/packages/LinearSolve/7vwOr/src/LinearSolve.jl:34
  needs_concrete_A(::LinearSolve.AbstractSolveFunction) at ~/.julia/packages/LinearSolve/7vwOr/src/LinearSolve.jl:35
Stacktrace:
  [1] build_J_W(alg::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, u::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, uprev::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, t::Float64, dt::Float64, f::ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, #unused#::Type{Float32}, #unused#::Val{true})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/derivative_utils.jl:704
  [2] build_nlsolver(alg::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, nlalg::NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, u::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, uprev::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, t::Float64, dt::Float64, f::Function, rate_prototype::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, #unused#::Type{Float32}, #unused#::Type{Float32}, #unused#::Type{Float64}, γ::Float64, c::Float64, α::Int64, #unused#::Val{true})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/nlsolve/utils.jl:149
  [3] build_nlsolver
    @ ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/nlsolve/utils.jl:124 [inlined]
  [4] build_nlsolver(alg::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, u::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, uprev::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, t::Float64, dt::Float64, f::Function, rate_prototype::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, #unused#::Type{Float32}, #unused#::Type{Float32}, #unused#::Type{Float64}, γ::Float64, c::Float64, iip::Val{true})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/nlsolve/utils.jl:118
  [5] alg_cache(alg::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, u::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, rate_prototype::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, #unused#::Type{Float32}, #unused#::Type{Float32}, #unused#::Type{Float64}, uprev::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, uprev2::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, f::ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, t::Float64, dt::Float64, reltol::Float64, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, calck::Bool, #unused#::Val{true})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/caches/kencarp_kvaerno_caches.jl:179
  [6] __init(prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, alg::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, timeseries_init::Tuple{}, ts_init::Tuple{}, ks_init::Tuple{}, recompile::Type{Val{true}}; saveat::StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, tstops::Tuple{}, d_discontinuities::Tuple{}, save_idxs::Nothing, save_everystep::Bool, save_on::Bool, save_start::Bool, save_end::Nothing, callback::Nothing, dense::Bool, calck::Bool, dt::Float64, dtmin::Nothing, dtmax::Float64, force_dtmin::Bool, adaptive::Bool, gamma::Rational{Int64}, abstol::Nothing, reltol::Float64, qmin::Rational{Int64}, qmax::Int64, qsteady_min::Int64, qsteady_max::Rational{Int64}, beta1::Nothing, beta2::Nothing, qoldinit::Rational{Int64}, controller::Nothing, fullnormalize::Bool, failfactor::Int64, maxiters::Int64, internalnorm::typeof(DiffEqGPU.diffeqgpunorm), internalopnorm::typeof(LinearAlgebra.opnorm), isoutofdomain::typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), unstable_check::DiffEqGPU.var"#12#18", verbose::Bool, timeseries_errors::Bool, dense_errors::Bool, advance_to_tstop::Bool, stop_at_next_tstop::Bool, initialize_save::Bool, progress::Bool, progress_steps::Int64, progress_name::String, progress_message::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), userdata::Nothing, allow_extrapolation::Bool, initialize_integrator::Bool, alias_u0::Bool, alias_du0::Bool, initializealg::OrdinaryDiffEq.DefaultInit, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/solve.jl:295
  [7] __solve(::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, ::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}; kwargs::Base.Pairs{Symbol, Any, NTuple{6, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense, :callback, :internalnorm), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool, Nothing, typeof(DiffEqGPU.diffeqgpunorm)}}})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/solve.jl:4
  [8] solve_call(_prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, args::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}; merge_callbacks::Bool, kwargs::Base.Pairs{Symbol, Any, NTuple{6, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense, :callback, :internalnorm), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool, Nothing, typeof(DiffEqGPU.diffeqgpunorm)}}})
    @ DiffEqBase ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:152
  [9] solve_up(prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, sensealg::Nothing, u0::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, args::KenCarp4{0, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}; kwargs::Base.Pairs{Symbol, Any, NTuple{7, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense, :callback, :merge_callbacks, :internalnorm), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool, Nothing, Bool, typeof(DiffEqGPU.diffeqgpunorm)}}})
    @ DiffEqBase ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:179
 [10] #solve#40
    @ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:165 [inlined]
 [11] batch_solve_up(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, probs::Vector{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}}, alg::KenCarp4{0, true, Nothing, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}, u0::Matrix{Float32}, p::Matrix{Float32}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:319
 [12] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::KenCarp4{0, true, Nothing, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:284
 [13] macro expansion
    @ ./timing.jl:299 [inlined]
 [14] __solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::KenCarp4{0, true, Nothing, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, ensemblealg::EnsembleGPUArray; trajectories::Int64, batch_size::Int64, unstable_check::Function, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:saveat, :reltol, :dense), Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:201
 [15] #solve#42
    @ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:193 [inlined]
 [16] top-level scope
    @ ./timing.jl:220
in expression starting at /pscratch/sd/j/jsull/julia/Bolt.jl/examples/gpu_ensemble_boltsolve.jl:65

@jmsull (Collaborator, Author) commented Jun 8, 2022

I looked at what I am doing differently from the README example and tried Tsit5 instead of KenCarp4. Now I get an error that seems a bit more sinister... I am stuck on this, so I will leave it alone for now. I will note that Tsit5 with EnsembleThreads() runs but aborts due to instability, so it probably wouldn't give good results here anyway -- but it should at least run.
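For reference, the only change relative to the previous attempt (same placeholder names as in the sketches above) was the algorithm passed to the ensemble solve:

# Swap the stiff KenCarp4 for the explicit Tsit5, mirroring the DiffEqGPU.jl README example.
sol = solve(eprob, Tsit5(), EnsembleGPUArray();
            trajectories = length(kgrid), saveat = xgrid)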

ERROR: LoadError: InvalidIRError: compiling kernel gpu_gpu_kernel(Cassette.Context{nametype(CUDACtx), Nothing, Nothing, KernelAbstractions.var"##PassType#291", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, typeof(gpu_ensemble_hierarchy!), CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, Float64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to overdub)
Stacktrace:
 [1] gpu_ensemble_hierarchy!(::SubArray{Float32, 1, CuDeviceMatrix{Float32, 1}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, ::SubArray{Float32, 1, CUDA.Const{Float32, 2, 1}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, ::SubArray{Float32, 1, CUDA.Const{Float32, 2, 1}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, ::Float64)
   @ /pscratch/sd/j/jsull/julia/Bolt.jl/examples/gpu_ensemble_boltsolve.jl:41
 [2] overdub
   @ /pscratch/sd/j/jsull/julia/Bolt.jl/examples/gpu_ensemble_boltsolve.jl:41
 [3] macro expansion
   @ ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:20
 [4] overdub
   @ ~/.julia/packages/KernelAbstractions/3ZHln/src/macros.jl:80
 [5] overdub
   @ ~/.julia/packages/Cassette/34vIw/src/overdub.jl:0
Reason: unsupported dynamic function invocation (call to overdub)
Stacktrace:
 [1] overdub
   @ /pscratch/sd/j/jsull/julia/Bolt.jl/examples/gpu_ensemble_boltsolve.jl:43
 [2] macro expansion
   @ ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:20
 [3] overdub
   @ ~/.julia/packages/KernelAbstractions/3ZHln/src/macros.jl:80
 [4] overdub
   @ ~/.julia/packages/Cassette/34vIw/src/overdub.jl:0
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(Cassette.overdub), Tuple{Cassette.Context{nametype(CUDACtx), Nothing, Nothing, KernelAbstractions.var"##PassType#291", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, typeof(gpu_ensemble_hierarchy!), CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, Float64}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1FdJy/src/validation.jl:124
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/1FdJy/src/driver.jl:386 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/2IJ7m/src/TimerOutput.jl:252 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/1FdJy/src/driver.jl:384 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1FdJy/src/utils.jl:64
  [6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ CUDA ~/.julia/packages/CUDA/Uurn4/src/compiler/execution.jl:332
  [7] #260
    @ ~/.julia/packages/CUDA/Uurn4/src/compiler/execution.jl:325 [inlined]
  [8] JuliaContext(f::CUDA.var"#260#261"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(Cassette.overdub), Tuple{Cassette.Context{nametype(CUDACtx), Nothing, Nothing, KernelAbstractions.var"##PassType#291", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, typeof(gpu_ensemble_hierarchy!), CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, Float64}}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1FdJy/src/driver.jl:74
  [9] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/Uurn4/src/compiler/execution.jl:324
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1FdJy/src/cache.jl:90
 [11] cufunction(f::typeof(Cassette.overdub), tt::Type{Tuple{Cassette.Context{nametype(CUDACtx), Nothing, Nothing, KernelAbstractions.var"##PassType#291", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, typeof(gpu_ensemble_hierarchy!), CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, Float64}}; name::String, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/Uurn4/src/compiler/execution.jl:297
 [12] macro expansion
    @ ~/.julia/packages/CUDA/Uurn4/src/compiler/execution.jl:102 [inlined]
 [13] (::KernelAbstractions.Kernel{CUDAKernels.CUDADevice, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(DiffEqGPU.gpu_gpu_kernel)})(::Function, ::Vararg{Any}; ndrange::Int64, dependencies::CUDAKernels.CudaEvent, workgroupsize::Int64, progress::Function)
    @ CUDAKernels ~/.julia/packages/CUDAKernels/kCOA4/src/CUDAKernels.jl:194
 [14] (::DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)})(du::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, u::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, t::Float64)
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:414
 [15] ODEFunction
    @ ~/.julia/packages/SciMLBase/OHiiA/src/scimlfunctions.jl:345 [inlined]
 [16] initialize!(integrator::OrdinaryDiffEq.ODEIntegrator{Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Nothing, Float64, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Float64, Float32, Float32, Float64, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, ODESolution{Float32, 3, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Nothing, Nothing, Vector{Float64}, Vector{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, OrdinaryDiffEq.InterpolationData{ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Vector{Float64}, Vector{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, OrdinaryDiffEq.Tsit5Cache{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float64}, typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}}, DiffEqBase.DEStats}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, OrdinaryDiffEq.Tsit5Cache{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float64}, typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, OrdinaryDiffEq.DEOptions{Float32, Float64, Float32, Float64, PIController{Rational{Int64}}, typeof(DiffEqGPU.diffeqgpunorm), typeof(LinearAlgebra.opnorm), Nothing, CallbackSet{Tuple{}, Tuple{}}, typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), DiffEqGPU.var"#12#18", DataStructures.BinaryHeap{Float64, DataStructures.FasterForward}, DataStructures.BinaryHeap{Float64, DataStructures.FasterForward}, Nothing, Nothing, Int64, Tuple{}, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Tuple{}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Float32, Nothing, OrdinaryDiffEq.DefaultInit}, cache::OrdinaryDiffEq.Tsit5Cache{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float64}, typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/perform_step/low_order_rk_perform_step.jl:627
 [17] __init(prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, alg::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, timeseries_init::Tuple{}, ts_init::Tuple{}, ks_init::Tuple{}, recompile::Type{Val{true}}; saveat::StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, tstops::Tuple{}, d_discontinuities::Tuple{}, save_idxs::Nothing, save_everystep::Bool, save_on::Bool, save_start::Bool, save_end::Nothing, callback::Nothing, dense::Bool, calck::Bool, dt::Float64, dtmin::Nothing, dtmax::Float64, force_dtmin::Bool, adaptive::Bool, gamma::Rational{Int64}, abstol::Nothing, reltol::Float64, qmin::Rational{Int64}, qmax::Int64, qsteady_min::Int64, qsteady_max::Int64, beta1::Nothing, beta2::Nothing, qoldinit::Rational{Int64}, controller::Nothing, fullnormalize::Bool, failfactor::Int64, maxiters::Int64, internalnorm::typeof(DiffEqGPU.diffeqgpunorm), internalopnorm::typeof(LinearAlgebra.opnorm), isoutofdomain::typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), unstable_check::DiffEqGPU.var"#12#18", verbose::Bool, timeseries_errors::Bool, dense_errors::Bool, advance_to_tstop::Bool, stop_at_next_tstop::Bool, initialize_save::Bool, progress::Bool, progress_steps::Int64, progress_name::String, progress_message::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), userdata::Nothing, allow_extrapolation::Bool, initialize_integrator::Bool, alias_u0::Bool, alias_du0::Bool, initializealg::OrdinaryDiffEq.DefaultInit, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/solve.jl:456
 [18] #__solve#502
    @ ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/solve.jl:4 [inlined]
 [19] #solve_call#39
    @ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:152 [inlined]
 [20] solve_up(prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, sensealg::Nothing, u0::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, args::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}; kwargs::Base.Pairs{Symbol, Any, NTuple{7, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense, :callback, :merge_callbacks, :internalnorm), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool, Nothing, Bool, typeof(DiffEqGPU.diffeqgpunorm)}}})
    @ DiffEqBase ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:179
 [21] #solve#40
    @ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:165 [inlined]
 [22] batch_solve_up(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, probs::Vector{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}}, alg::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}, u0::Matrix{Float32}, p::Matrix{Float32}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:319
 [23] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:284
 [24] macro expansion
    @ ./timing.jl:299 [inlined]
 [25] __solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, ensemblealg::EnsembleGPUArray; trajectories::Int64, batch_size::Int64, unstable_check::Function, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:saveat, :reltol, :dense), Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:201
 [26] #solve#42
    @ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:193 [inlined]
 [27] top-level scope
    @ ./timing.jl:220
in expression starting at /pscratch/sd/j/jsull/julia/Bolt.jl/examples/gpu_ensemble_boltsolve.jl:66

@jmsull (Collaborator, Author) commented Jun 8, 2022

I get the same error (needs_concrete_A) for the other solver we know works with Bolt, Rodas5(). Judging by the "Current limitations" section of the DiffEqGPU.jl README, we may be out of luck here because we rely on stiff solvers, at least without trying something (like what they show with ModelingToolkit, which I have never used) to get analytic derivatives.
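For the record, the ModelingToolkit route in the README looks roughly like the untested sketch below (same placeholder problem objects as in the earlier sketches): modelingtoolkitize traces the RHS symbolically so the problem can be rebuilt with an analytic Jacobian, which is what the stiff solvers need on the GPU. Whether it can trace through our hierarchy/interpolator code is the open question.

using ModelingToolkit, OrdinaryDiffEq, DiffEqGPU

# Untested sketch, not a working implementation for Bolt: rebuild the problem with jac = true.
sys       = modelingtoolkitize(prob)                                   # symbolic trace of the RHS
prob_jac  = ODEProblem(sys, prob.u0, prob.tspan, prob.p, jac = true)   # attach an analytic Jacobian
eprob_jac = EnsembleProblem(prob_jac, prob_func = prob_func)

sol = solve(eprob_jac, Rodas5(), EnsembleGPUArray();
            trajectories = length(kgrid), saveat = xgrid)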

@xzackli (Owner) commented Jun 8, 2022

It looks like GPU diffeqs aren't as push-button as we hoped. I suspect our situation is relatively complicated for an ODE problem -- I think I can write up a toy problem analog, and ask for some assistance. At the very least, we can hope for a little more transparent debugging that way.

I remember there broadly being some issue with passing a custom struct to many of the SciML packages. That's a problem, since we really don't want to be passing an array of things; we depend on these interpolators for speed.

@jmsull (Collaborator, Author) commented Jun 18, 2022

The Background works on the GPU after changing the struct field types for the quadrature weights: Array is too restrictive, since CuArray <: Array is false. I verified that ForwardDiff still works through the kernel.
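As a small illustration of the point (the struct and field names here are made up, not the actual Background definition):

using CUDA

CuArray{Float32, 1} <: Array          # false -- a field typed ::Array rejects GPU arrays
CuArray{Float32, 1} <: AbstractArray  # true

struct QuadWeightsStrict{T}
    w::Array{T, 1}                    # too restrictive: a CuVector cannot be stored here
end

struct QuadWeightsRelaxed{W <: AbstractArray}
    w::W                              # accepts Vector and CuVector alike
end

QuadWeightsRelaxed(rand(Float32, 16))             # CPU
if CUDA.functional()
    QuadWeightsRelaxed(cu(rand(Float32, 16)))     # GPU
end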

@jmsull (Collaborator, Author) commented Jun 18, 2022

Added the ionization history as well.
