
GPU testing #61

Draft · wants to merge 28 commits into main

Conversation

@jmsull (Collaborator) commented May 26, 2022

No description provided.

@codecov-commenter commented May 26, 2022

Codecov Report

Merging #61 (b761078) into main (d97d7c0) will decrease coverage by 25.04%.
The diff coverage is 0.00%.

❗ Current head b761078 differs from pull request most recent head 152f307. Consider uploading reports for the commit 152f307 to get more accurate results

@@             Coverage Diff             @@
##             main      #61       +/-   ##
===========================================
- Coverage   70.74%   45.70%   -25.05%     
===========================================
  Files           7        8        +1     
  Lines         694      698        +4     
===========================================
- Hits          491      319      -172     
- Misses        203      379      +176     
Impacted Files Coverage Δ
src/Bolt.jl 100.00% <ø> (+100.00%) ⬆️
src/background.jl 86.36% <ø> (-6.82%) ⬇️
src/gpu.jl 0.00% <0.00%> (ø)
src/perturbations.jl 0.00% <0.00%> (-79.27%) ⬇️
src/spectra.jl 0.00% <0.00%> (ø)
src/util.jl 98.41% <0.00%> (-1.59%) ⬇️
src/ionization/recfast.jl 91.85% <0.00%> (+0.07%) ⬆️
... and 2 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d97d7c0...152f307.

@jmsull (Collaborator, Author) commented May 26, 2022

Starting to look at running on GPUs. After confirming that I can run the README examples on NERSC, I tried a simple parallelization over k of the boltsolve call to the ODE solver, following the DiffEqGPU.jl README and the docs here.

To do this, you should be able to pass EnsembleGPUArray() to the ensemble solve, as I tried to do in /examples/gpu_ensemble_boltsolve.jl. This doesn't work, apparently because (as the name indicates) EnsembleGPUArray() actually needs to map over an Array, and I am trying to map over Hierarchy structs (which works fine for EnsembleThreads() syntactically, so nothing is obviously wrong there). The error I get when running the commented line in the script is:

ERROR: MethodError: no method matching Array(::Hierarchy{Float64, BasicNewtonian,

I can quickly get around this by wrapping hierarchy! so that only k is passed as the parameter, so I will try that.
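Roughly what I have in mind, as a sketch only (hierarchy_for_k, u0, kgrid, xgrid, and the x limits are placeholders/hypothetical helpers, not the exact contents of the example script):

using Bolt, OrdinaryDiffEq, DiffEqGPU

# Wrapper: keep the Hierarchy construction on the CPU side and hand the ensemble
# only plain arrays, with the wavenumber k in the parameter slot.
function gpu_ensemble_hierarchy!(du, u, k, x)
    hy = hierarchy_for_k(k)        # hypothetical helper that rebuilds the Hierarchy for this k
    Bolt.hierarchy!(du, u, hy, x)  # delegate to the usual in-place RHS
end

prob = ODEProblem(gpu_ensemble_hierarchy!, u0, (xi, xf), kgrid[1])
prob_func = (prob, i, repeat) -> remake(prob, p = kgrid[i])   # one trajectory per k
eprob = EnsembleProblem(prob, prob_func = prob_func)

# EnsembleThreads() accepts this form; the question is whether EnsembleGPUArray() will.
sol = solve(eprob, KenCarp4(), EnsembleGPUArray();
            trajectories = length(kgrid), saveat = xgrid)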

@jmsull mentioned this pull request May 27, 2022
@jmsull (Collaborator, Author) commented Jun 8, 2022

The idea I had above did not work. I updated that file (/examples/gpu_ensemble_boltsolve.jl) to do this wrapping (which appears to work fine for the commented-out EnsembleThreads() solve, by the way).

However, I get an error that looks similar to the one before (though now it is apparently only upset about applying "Array" to a Float32 value rather than to the hierarchy object). It looks like it is trying to convert the single float k into an array; I can reproduce the method error at the top of the trace with Array(Float32(1.)).

Stack trace (not sure why I am getting these warnings now as well):
ERROR: LoadError: MethodError: no method matching Array(::Float32)
┌ Warning: dt <= dtmin. Aborting. There is either an error in your model specification or the true solution is unstable.
└ @ SciMLBase ~/.julia/packages/SciMLBase/OHiiA/src/integrator_interface.jl:345
(warning repeated seven times)
Closest candidates are:
  Array(::Union{LinearAlgebra.QR, LinearAlgebra.QRCompactWY}) at /global/common/software/m4051/julia-1.7.2/share/julia/stdlib/v1.7/LinearAlgebra/src/qr.jl:442
  Array(::Union{LinearAlgebra.Hermitian, LinearAlgebra.Symmetric}) at /global/common/software/m4051/julia-1.7.2/share/julia/stdlib/v1.7/LinearAlgebra/src/symmetric.jl:271
  Array(::LinearAlgebra.SVD) at /global/common/software/m4051/julia-1.7.2/share/julia/stdlib/v1.7/LinearAlgebra/src/svd.jl:571
  ...
Stacktrace:
  [1] (::DiffEqGPU.var"#24#30")(i::Int64)
    @ DiffEqGPU ./none:0
  [2] MappingRF
    @ ./reduce.jl:95 [inlined]
  [3] _foldl_impl(op::Base.MappingRF{DiffEqGPU.var"#24#30", Base.BottomRF{typeof(hcat)}}, init::Base._InitialValue, itr::UnitRange{Int64})
    @ Base ./reduce.jl:58
  [4] foldl_impl
    @ ./reduce.jl:48 [inlined]
  [5] mapfoldl_impl
    @ ./reduce.jl:44 [inlined]
  [6] #mapfoldl#244
    @ ./reduce.jl:162 [inlined]
  [7] mapfoldl
    @ ./reduce.jl:162 [inlined]
  [8] #mapreduce#248
    @ ./reduce.jl:289 [inlined]
  [9] mapreduce
    @ ./reduce.jl:289 [inlined]
 [10] #reduce#250
    @ ./reduce.jl:458 [inlined]
 [11] reduce
    @ ./reduce.jl:458 [inlined]
 [12] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Float32, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::KenCarp4{0, true, Nothing, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:283
 [13] macro expansion
    @ ./timing.jl:299 [inlined]
 [14] __solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Float32, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::KenCarp4{0, true, Nothing, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, ensemblealg::EnsembleGPUArray; trajectories::Int64, batch_size::Int64, unstable_check::Function, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:saveat, :reltol, :dense), Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:201
 [15] #solve#42
    @ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:193 [inlined]
 [16] top-level scope
    @ ./timing.jl:220

@jmsull (Collaborator, Author) commented Jun 8, 2022

After perusing the DiffEqGPU.jl GitHub issues, I think k must be passed as an Array rather than as a scalar value...
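If that is right, the fix is small (same placeholder names as in the sketch above): EnsembleGPUArray apparently hcats prob.p across trajectories into a Matrix, so p has to be an array itself, e.g. a one-element vector holding k.

# Same sketch as before, but with k carried in a 1-element Float32 array instead of a bare scalar.
function gpu_ensemble_hierarchy!(du, u, p, x)
    k = p[1]                                        # unpack the wavenumber for this trajectory
    Bolt.hierarchy!(du, u, hierarchy_for_k(k), x)   # hierarchy_for_k is the same hypothetical helper
end

prob = ODEProblem(gpu_ensemble_hierarchy!, u0, (xi, xf), [Float32(kgrid[1])])
prob_func = (prob, i, repeat) -> remake(prob, p = [Float32(kgrid[i])])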

@jmsull (Collaborator, Author) commented Jun 8, 2022

OK, so after fixing this I get a new problem, which apparently occurs while the solver is actually running (as opposed to the setup failing). This seems like a problem with the solver itself? I don't actually see where it calls needs_concrete_A()... The change is pushed below, and here is the stack trace:

jsull@nid002601:/pscratch/sd/j/jsull/julia/Bolt.jl/examples> julia gpu_ensemble_boltsolve.jl
  Activating project at `/pscratch/sd/j/jsull/julia/Bolt.jl`
pre type of u0, Vector{Float64}
pre type of kgrid, Vector{Float64}
pre type of xi, Float64
post type of u0, Vector{Float32}
post type of kgrid, Matrix{Float32}
Succesfully evaluted gpu derivative
test deriv: nothing
one elementFloat32
ERROR: LoadError: MethodError: no method matching needs_concrete_A(::LinSolveGPUSplitFactorize)
Closest candidates are:
  needs_concrete_A(::LinearSolve.AbstractFactorization) at ~/.julia/packages/LinearSolve/7vwOr/src/LinearSolve.jl:33
  needs_concrete_A(::LinearSolve.AbstractKrylovSubspaceMethod) at ~/.julia/packages/LinearSolve/7vwOr/src/LinearSolve.jl:34
  needs_concrete_A(::LinearSolve.AbstractSolveFunction) at ~/.julia/packages/LinearSolve/7vwOr/src/LinearSolve.jl:35
Stacktrace:
  [1] build_J_W(alg::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, u::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, uprev::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, t::Float64, dt::Float64, f::ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, #unused#::Type{Float32}, #unused#::Val{true})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/derivative_utils.jl:704
  [2] build_nlsolver(alg::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, nlalg::NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, u::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, uprev::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, t::Float64, dt::Float64, f::Function, rate_prototype::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, #unused#::Type{Float32}, #unused#::Type{Float32}, #unused#::Type{Float64}, γ::Float64, c::Float64, α::Int64, #unused#::Val{true})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/nlsolve/utils.jl:149
  [3] build_nlsolver
    @ ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/nlsolve/utils.jl:124 [inlined]
  [4] build_nlsolver(alg::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, u::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, uprev::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, t::Float64, dt::Float64, f::Function, rate_prototype::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, #unused#::Type{Float32}, #unused#::Type{Float32}, #unused#::Type{Float64}, γ::Float64, c::Float64, iip::Val{true})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/nlsolve/utils.jl:118
  [5] alg_cache(alg::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, u::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, rate_prototype::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, #unused#::Type{Float32}, #unused#::Type{Float32}, #unused#::Type{Float64}, uprev::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, uprev2::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, f::ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, t::Float64, dt::Float64, reltol::Float64, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, calck::Bool, #unused#::Val{true})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/caches/kencarp_kvaerno_caches.jl:179
  [6] __init(prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, alg::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, timeseries_init::Tuple{}, ts_init::Tuple{}, ks_init::Tuple{}, recompile::Type{Val{true}}; saveat::StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, tstops::Tuple{}, d_discontinuities::Tuple{}, save_idxs::Nothing, save_everystep::Bool, save_on::Bool, save_start::Bool, save_end::Nothing, callback::Nothing, dense::Bool, calck::Bool, dt::Float64, dtmin::Nothing, dtmax::Float64, force_dtmin::Bool, adaptive::Bool, gamma::Rational{Int64}, abstol::Nothing, reltol::Float64, qmin::Rational{Int64}, qmax::Int64, qsteady_min::Int64, qsteady_max::Rational{Int64}, beta1::Nothing, beta2::Nothing, qoldinit::Rational{Int64}, controller::Nothing, fullnormalize::Bool, failfactor::Int64, maxiters::Int64, internalnorm::typeof(DiffEqGPU.diffeqgpunorm), internalopnorm::typeof(LinearAlgebra.opnorm), isoutofdomain::typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), unstable_check::DiffEqGPU.var"#12#18", verbose::Bool, timeseries_errors::Bool, dense_errors::Bool, advance_to_tstop::Bool, stop_at_next_tstop::Bool, initialize_save::Bool, progress::Bool, progress_steps::Int64, progress_name::String, progress_message::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), userdata::Nothing, allow_extrapolation::Bool, initialize_integrator::Bool, alias_u0::Bool, alias_du0::Bool, initializealg::OrdinaryDiffEq.DefaultInit, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/solve.jl:295
  [7] __solve(::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, ::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}; kwargs::Base.Pairs{Symbol, Any, NTuple{6, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense, :callback, :internalnorm), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool, Nothing, typeof(DiffEqGPU.diffeqgpunorm)}}})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/solve.jl:4
  [8] solve_call(_prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, args::KenCarp4{12, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}; merge_callbacks::Bool, kwargs::Base.Pairs{Symbol, Any, NTuple{6, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense, :callback, :internalnorm), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool, Nothing, typeof(DiffEqGPU.diffeqgpunorm)}}})
    @ DiffEqBase ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:152
  [9] solve_up(prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, sensealg::Nothing, u0::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, args::KenCarp4{0, true, LinSolveGPUSplitFactorize, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}; kwargs::Base.Pairs{Symbol, Any, NTuple{7, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense, :callback, :merge_callbacks, :internalnorm), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool, Nothing, Bool, typeof(DiffEqGPU.diffeqgpunorm)}}})
    @ DiffEqBase ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:179
 [10] #solve#40
    @ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:165 [inlined]
 [11] batch_solve_up(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, probs::Vector{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}}, alg::KenCarp4{0, true, Nothing, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}, u0::Matrix{Float32}, p::Matrix{Float32}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:319
 [12] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::KenCarp4{0, true, Nothing, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:284
 [13] macro expansion
    @ ./timing.jl:299 [inlined]
 [14] __solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::KenCarp4{0, true, Nothing, NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing}, ensemblealg::EnsembleGPUArray; trajectories::Int64, batch_size::Int64, unstable_check::Function, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:saveat, :reltol, :dense), Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:201
 [15] #solve#42
    @ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:193 [inlined]
 [16] top-level scope
    @ ./timing.jl:220
in expression starting at /pscratch/sd/j/jsull/julia/Bolt.jl/examples/gpu_ensemble_boltsolve.jl:65

@jmsull (Collaborator, Author) commented Jun 8, 2022

I looked at what I am doing differently from the README example and tried Tsit5 instead of KenCarp4. Now I get an error that seems a bit more sinister... I am stuck on this, so I will leave it alone for now. I will note that Tsit5 with EnsembleThreads() runs but aborts due to instability, so it probably wouldn't give good results here anyway -- but it should at least run.
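For reference, the only change relative to the previous attempt (same placeholder names as in the sketches above) was the algorithm passed to the ensemble solve:

# Swap the stiff KenCarp4 for the explicit Tsit5, mirroring the DiffEqGPU.jl README example.
sol = solve(eprob, Tsit5(), EnsembleGPUArray();
            trajectories = length(kgrid), saveat = xgrid)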

ERROR: LoadError: InvalidIRError: compiling kernel gpu_gpu_kernel(Cassette.Context{nametype(CUDACtx), Nothing, Nothing, KernelAbstractions.var"##PassType#291", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, typeof(gpu_ensemble_hierarchy!), CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, Float64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to overdub)
Stacktrace:
 [1] gpu_ensemble_hierarchy!(::SubArray{Float32, 1, CuDeviceMatrix{Float32, 1}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, ::SubArray{Float32, 1, CUDA.Const{Float32, 2, 1}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, ::SubArray{Float32, 1, CUDA.Const{Float32, 2, 1}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, ::Float64)
   @ /pscratch/sd/j/jsull/julia/Bolt.jl/examples/gpu_ensemble_boltsolve.jl:41
 [2] overdub
   @ /pscratch/sd/j/jsull/julia/Bolt.jl/examples/gpu_ensemble_boltsolve.jl:41
 [3] macro expansion
   @ ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:20
 [4] overdub
   @ ~/.julia/packages/KernelAbstractions/3ZHln/src/macros.jl:80
 [5] overdub
   @ ~/.julia/packages/Cassette/34vIw/src/overdub.jl:0
Reason: unsupported dynamic function invocation (call to overdub)
Stacktrace:
 [1] overdub
   @ /pscratch/sd/j/jsull/julia/Bolt.jl/examples/gpu_ensemble_boltsolve.jl:43
 [2] macro expansion
   @ ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:20
 [3] overdub
   @ ~/.julia/packages/KernelAbstractions/3ZHln/src/macros.jl:80
 [4] overdub
   @ ~/.julia/packages/Cassette/34vIw/src/overdub.jl:0
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(Cassette.overdub), Tuple{Cassette.Context{nametype(CUDACtx), Nothing, Nothing, KernelAbstractions.var"##PassType#291", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, typeof(gpu_ensemble_hierarchy!), CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, Float64}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1FdJy/src/validation.jl:124
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/1FdJy/src/driver.jl:386 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/2IJ7m/src/TimerOutput.jl:252 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/1FdJy/src/driver.jl:384 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1FdJy/src/utils.jl:64
  [6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ CUDA ~/.julia/packages/CUDA/Uurn4/src/compiler/execution.jl:332
  [7] #260
    @ ~/.julia/packages/CUDA/Uurn4/src/compiler/execution.jl:325 [inlined]
  [8] JuliaContext(f::CUDA.var"#260#261"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(Cassette.overdub), Tuple{Cassette.Context{nametype(CUDACtx), Nothing, Nothing, KernelAbstractions.var"##PassType#291", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, typeof(gpu_ensemble_hierarchy!), CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, Float64}}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1FdJy/src/driver.jl:74
  [9] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/Uurn4/src/compiler/execution.jl:324
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1FdJy/src/cache.jl:90
 [11] cufunction(f::typeof(Cassette.overdub), tt::Type{Tuple{Cassette.Context{nametype(CUDACtx), Nothing, Nothing, KernelAbstractions.var"##PassType#291", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, typeof(gpu_ensemble_hierarchy!), CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, CuDeviceMatrix{Float32, 1}, Float64}}; name::String, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/Uurn4/src/compiler/execution.jl:297
 [12] macro expansion
    @ ~/.julia/packages/CUDA/Uurn4/src/compiler/execution.jl:102 [inlined]
 [13] (::KernelAbstractions.Kernel{CUDAKernels.CUDADevice, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(DiffEqGPU.gpu_gpu_kernel)})(::Function, ::Vararg{Any}; ndrange::Int64, dependencies::CUDAKernels.CudaEvent, workgroupsize::Int64, progress::Function)
    @ CUDAKernels ~/.julia/packages/CUDAKernels/kCOA4/src/CUDAKernels.jl:194
 [14] (::DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)})(du::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, u::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, t::Float64)
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:414
 [15] ODEFunction
    @ ~/.julia/packages/SciMLBase/OHiiA/src/scimlfunctions.jl:345 [inlined]
 [16] initialize!(integrator::OrdinaryDiffEq.ODEIntegrator{Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Nothing, Float64, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Float64, Float32, Float32, Float64, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, ODESolution{Float32, 3, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Nothing, Nothing, Vector{Float64}, Vector{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, OrdinaryDiffEq.InterpolationData{ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Vector{Float64}, Vector{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, OrdinaryDiffEq.Tsit5Cache{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float64}, typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}}, DiffEqBase.DEStats}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, OrdinaryDiffEq.Tsit5Cache{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float64}, typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, OrdinaryDiffEq.DEOptions{Float32, Float64, Float32, Float64, PIController{Rational{Int64}}, typeof(DiffEqGPU.diffeqgpunorm), typeof(LinearAlgebra.opnorm), Nothing, CallbackSet{Tuple{}, Tuple{}}, typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), DiffEqGPU.var"#12#18", DataStructures.BinaryHeap{Float64, DataStructures.FasterForward}, DataStructures.BinaryHeap{Float64, DataStructures.FasterForward}, Nothing, Nothing, Int64, Tuple{}, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Tuple{}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Float32, Nothing, OrdinaryDiffEq.DefaultInit}, cache::OrdinaryDiffEq.Tsit5Cache{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float64}, typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/perform_step/low_order_rk_perform_step.jl:627
 [17] __init(prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, alg::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, timeseries_init::Tuple{}, ts_init::Tuple{}, ks_init::Tuple{}, recompile::Type{Val{true}}; saveat::StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, tstops::Tuple{}, d_discontinuities::Tuple{}, save_idxs::Nothing, save_everystep::Bool, save_on::Bool, save_start::Bool, save_end::Nothing, callback::Nothing, dense::Bool, calck::Bool, dt::Float64, dtmin::Nothing, dtmax::Float64, force_dtmin::Bool, adaptive::Bool, gamma::Rational{Int64}, abstol::Nothing, reltol::Float64, qmin::Rational{Int64}, qmax::Int64, qsteady_min::Int64, qsteady_max::Int64, beta1::Nothing, beta2::Nothing, qoldinit::Rational{Int64}, controller::Nothing, fullnormalize::Bool, failfactor::Int64, maxiters::Int64, internalnorm::typeof(DiffEqGPU.diffeqgpunorm), internalopnorm::typeof(LinearAlgebra.opnorm), isoutofdomain::typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), unstable_check::DiffEqGPU.var"#12#18", verbose::Bool, timeseries_errors::Bool, dense_errors::Bool, advance_to_tstop::Bool, stop_at_next_tstop::Bool, initialize_save::Bool, progress::Bool, progress_steps::Int64, progress_name::String, progress_message::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), userdata::Nothing, allow_extrapolation::Bool, initialize_integrator::Bool, alias_u0::Bool, alias_du0::Bool, initializealg::OrdinaryDiffEq.DefaultInit, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/solve.jl:456
 [18] #__solve#502
    @ ~/.julia/packages/OrdinaryDiffEq/ZBye7/src/solve.jl:4 [inlined]
 [19] #solve_call#39
    @ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:152 [inlined]
 [20] solve_up(prob::ODEProblem{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Float64, Float64}, true, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(gpu_ensemble_hierarchy!), typeof(DiffEqGPU.gpu_kernel)}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, sensealg::Nothing, u0::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, p::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, args::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}; kwargs::Base.Pairs{Symbol, Any, NTuple{7, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense, :callback, :merge_callbacks, :internalnorm), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool, Nothing, Bool, typeof(DiffEqGPU.diffeqgpunorm)}}})
    @ DiffEqBase ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:179
 [21] #solve#40
    @ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:165 [inlined]
 [22] batch_solve_up(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, probs::Vector{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}}, alg::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}, u0::Matrix{Float32}, p::Matrix{Float32}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:319
 [23] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :saveat, :reltol, :dense), Tuple{DiffEqGPU.var"#12#18", StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:284
 [24] macro expansion
    @ ./timing.jl:299 [inlined]
 [25] __solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{Float32}, Tuple{Float64, Float64}, true, Vector{Float32}, ODEFunction{true, typeof(gpu_ensemble_hierarchy!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, var"#1#2", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, ensemblealg::EnsembleGPUArray; trajectories::Int64, batch_size::Int64, unstable_check::Function, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:saveat, :reltol, :dense), Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Float64, Bool}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/EbOrc/src/DiffEqGPU.jl:201
 [26] #solve#42
    @ ~/.julia/packages/DiffEqBase/ziNGu/src/solve.jl:193 [inlined]
 [27] top-level scope
    @ ./timing.jl:220
in expression starting at /pscratch/sd/j/jsull/julia/Bolt.jl/examples/gpu_ensemble_boltsolve.jl:66

@jmsull (Collaborator, Author) commented Jun 8, 2022

I get the same error (needs_concrete_A) for the other solver we know works with Bolt, Rodas5(). Judging by the "Current limitations" section of the DiffEqGPU.jl README, we may be out of luck here because we rely on stiff solvers, at least without trying something (like what they show with ModelingToolkit, which I have never used) to get analytic derivatives.
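For the record, the ModelingToolkit route in the README looks roughly like the untested sketch below (same placeholder problem objects as in the earlier sketches): modelingtoolkitize traces the RHS symbolically so the problem can be rebuilt with an analytic Jacobian, which is what the stiff solvers need on the GPU. Whether it can trace through our hierarchy/interpolator code is the open question.

using ModelingToolkit, OrdinaryDiffEq, DiffEqGPU

# Untested sketch, not a working implementation for Bolt: rebuild the problem with jac = true.
sys       = modelingtoolkitize(prob)                                   # symbolic trace of the RHS
prob_jac  = ODEProblem(sys, prob.u0, prob.tspan, prob.p, jac = true)   # attach an analytic Jacobian
eprob_jac = EnsembleProblem(prob_jac, prob_func = prob_func)

sol = solve(eprob_jac, Rodas5(), EnsembleGPUArray();
            trajectories = length(kgrid), saveat = xgrid)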

@xzackli (Owner) commented Jun 8, 2022

It looks like GPU diffeqs aren't as push-button as we hoped. I suspect our situation is relatively complicated for an ODE problem -- I think I can write up a toy problem analog, and ask for some assistance. At the very least, we can hope for a little more transparent debugging that way.

I remember there broadly being some issue with passing a custom struct to many of the SciML packages. That's a problem, since we really don't want to be passing an array of things; we depend on these interpolators for speed.

@jmsull (Collaborator, Author) commented Jun 18, 2022

The Background works on the GPU after changing the struct field types for the quadrature weights: Array is too restrictive, since CuArray <: Array is false. I verified that ForwardDiff still works through the kernel.
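As a small illustration of the point (the struct and field names here are made up, not the actual Background definition):

using CUDA

CuArray{Float32, 1} <: Array          # false -- a field typed ::Array rejects GPU arrays
CuArray{Float32, 1} <: AbstractArray  # true

struct QuadWeightsStrict{T}
    w::Array{T, 1}                    # too restrictive: a CuVector cannot be stored here
end

struct QuadWeightsRelaxed{W <: AbstractArray}
    w::W                              # accepts Vector and CuVector alike
end

QuadWeightsRelaxed(rand(Float32, 16))             # CPU
if CUDA.functional()
    QuadWeightsRelaxed(cu(rand(Float32, 16)))     # GPU
end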

@jmsull (Collaborator, Author) commented Jun 18, 2022

Added the ionization history as well.
