
Segmentation fault in expression starting at none:0 #129

Open

slamander opened this issue Apr 12, 2023 · 3 comments

@slamander

Hi OS community,

I've been troubleshooting my analyses for some time now, running into issues similar to those described elsewhere in this issue tracker (e.g., "task failed on specific row and column" and "AssertionError: norm(G * v .- curr) / norm(curr) < 1.0e-6").

Now that I've found a solution to those errors, my analyses are finally running longer than before, in some cases reaching about 90% completion. But now half of them are failing with the message below and producing a core dump. I've found some discussion of this on the general Julia forums, with talk of rather complicated memory allocation issues (which I couldn't make much sense of). I'm running these analyses on a computing cluster (4 CPUs, 32 GB RAM per CPU).

signal (11): Segmentation fault
in expression starting at none:0

Here's the full stack trace:

Stacktrace:
 [1] wait
   @ ./task.jl:345 [inlined]
 [2] threading_run(fun::Omniscape.var"#161#threadsfor_fun#12"{Omniscape.var"#161#threadsfor_fun#11#13"{Int64, ProgressMeter.Progress, Int64, Dict{String, String}, Omniscape.ConditionLayers{Float64, 2}, Omniscape.Conditions, Omniscape.OmniscapeFlags, DataType, Dict{String, Int64}, UnitRange{Int64}}}, static::Bool)
   @ Base.Threads ./threadingconstructs.jl:38
 [3] macro expansion
   @ ./threadingconstructs.jl:89 [inlined]
 [4] run_omniscape(cfg::Dict{String, String}, resistance::Matrix{Union{Missing, Float64}}; reclass_table::Matrix{Union{Missing, Float64}}, source_strength::Matrix{Union{Missing, Float64}}, condition1::Matrix{Union{Missing, Float64}}, condition2::Matrix{Union{Missing, Float64}}, condition1_future::Matrix{Union{Missing, Float64}}, condition2_future::Matrix{Union{Missing, Float64}}, wkt::String, geotransform::Vector{Float64}, write_outputs::Bool)
   @ Omniscape ~/.julia/packages/Omniscape/9gHf2/src/main.jl:257
 [5] run_omniscape(path::String)
   @ Omniscape ~/.julia/packages/Omniscape/9gHf2/src/main.jl:536
 [6] top-level scope
   @ /blue/scheffers/jbaecher/global_connectivity/julia_scripts/hpg_Asia_Europe.jl:7

    nested task error: 
Progress:   9%|████                                              |  ETA: 14:22:01
signal (11): Segmentation fault
in expression starting at none:0
__GI_memset at /lib64/libc.so.6 (unknown line)
cholmod_l_super_numeric at /apps/julia/1.8.2/lib/julia/libcholmod.so (unknown line)
cholmod_l_factorize_p at /apps/julia/1.8.2/lib/julia/libcholmod.so (unknown line)
cholmod_l_factorize_p at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/SuiteSparse/lib/x86_64-linux-gnu.jl:1116
unknown function (ip: 0x2abb91df2a00)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
factorize_p! at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/SuiteSparse/src/cholmod.jl:616
#cholesky!#6 at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/SuiteSparse/src/cholmod.jl:1147
cholesky!##kw at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/SuiteSparse/src/cholmod.jl:1143 [inlined]
#cholesky#8 at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/SuiteSparse/src/cholmod.jl:1185 [inlined]
cholesky at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/SuiteSparse/src/cholmod.jl:1178 [inlined]
#cholesky#9 at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/SuiteSparse/src/cholmod.jl:1297 [inlined]
cholesky at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/SuiteSparse/src/cholmod.jl:1297 [inlined]
macro expansion at ./timing.jl:382 [inlined]
construct_cholesky_factor at /home/jbaecher/.julia/packages/Circuitscape/33lUW/src/core.jl:496
multiple_solve at /home/jbaecher/.julia/packages/Circuitscape/33lUW/src/raster/advanced.jl:319
multiple_solver at /home/jbaecher/.julia/packages/Circuitscape/33lUW/src/raster/advanced.jl:291
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
compute_omniscape_current at /home/jbaecher/.julia/packages/Circuitscape/33lUW/src/utils.jl:529
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
solve_target! at /home/jbaecher/.julia/packages/Omniscape/9gHf2/src/utils.jl:332
unknown function (ip: 0x2abb91dfe3b0)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
macro expansion at /home/jbaecher/.julia/packages/Omniscape/9gHf2/src/main.jl:264 [inlined]
#161#threadsfor_fun#11 at ./threadingconstructs.jl:84
#161#threadsfor_fun at ./threadingconstructs.jl:51 [inlined]
#1 at ./threadingconstructs.jl:30
unknown function (ip: 0x2abb91df9b2f)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gf.c:2549
jl_apply at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/julia.h:1839 [inlined]
start_task at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/task.c:931
Allocations: 194969686 (Pool: 192768960; Big: 2200726); GC: 702
/tmp/slurmd/job61564683/slurm_script: line 24: 13146 Segmentation fault      (core dumped) julia -p ${SLURM_CPUS_ON_NODE} julia_scripts/hpg_Asia_Europe.jl
Tue Apr 11 17:36:24 EDT 2023
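
In case it's useful, here's a minimal sketch (the config path is a placeholder, not my actual file) of how the runtime configuration could be logged at the top of the driver script. Since the crash happens inside Omniscape's Threads.@threads loop while the job is launched with julia -p, printing the thread count, worker count, and free memory for each run might help narrow things down:

    # Sketch only: report the environment before calling run_omniscape.
    using Distributed, Omniscape

    @info "Runtime configuration" threads = Threads.nthreads() workers = nprocs() free_mem_gb = Sys.free_memory() / 2^30

    run_omniscape("path/to/config.ini")   # placeholder; substitute the real .ini path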

@slamander (Author)

On another set of input data, I received the following stack trace:

Stacktrace:
 [1] wait
   @ ./task.jl:345 [inlined]
 [2] threading_run(fun::Omniscape.var"#161#threadsfor_fun#12"{Omniscape.var"#161#threadsfor_fun#11#13"{Int64, ProgressMeter.Progress, Int64, Dict{String, String}, Omniscape.ConditionLayers{Float64, 2}, Omniscape.Conditions, Omniscape.OmniscapeFlags, DataType, Dict{String, Int64}, UnitRange{Int64}}}, static::Bool)
   @ Base.Threads ./threadingconstructs.jl:38
 [3] macro expansion
   @ ./threadingconstructs.jl:89 [inlined]
 [4] run_omniscape(cfg::Dict{String, String}, resistance::Matrix{Union{Missing, Float64}}; reclass_table::Matrix{Union{Missing, Float64}}, source_strength::Matrix{Union{Missing, Float64}}, condition1::Matrix{Union{Missing, Float64}}, condition2::Matrix{Union{Missing, Float64}}, condition1_future::Matrix{Union{Missing, Float64}}, condition2_future::Matrix{Union{Missing, Float64}}, wkt::String, geotransform::Vector{Float64}, write_outputs::Bool)
   @ Omniscape ~/.julia/packages/Omniscape/9gHf2/src/main.jl:257
 [5] run_omniscape(path::String)
   @ Omniscape ~/.julia/packages/Omniscape/9gHf2/src/main.jl:536
 [6] top-level scope
   @ /blue/scheffers/jbaecher/global_connectivity/julia_scripts/hpg_australia.jl:5

    nested task error: 
Progress:  26%|█████████████                                     |  ETA: 0:56:30
signal (11): Segmentation fault
in expression starting at none:0
dgemv_kernel_4x4 at /apps/julia/1.8.2/bin/../lib/julia/libopenblas64_.so (unknown line)
dgemv_t_ZEN at /apps/julia/1.8.2/bin/../lib/julia/libopenblas64_.so (unknown line)
dgemv_64_ at /apps/julia/1.8.2/bin/../lib/julia/libopenblas64_.so (unknown line)
/tmp/slurmd/job61477046/slurm_script: line 24: 23882 Segmentation fault      (core dumped) julia -p ${SLURM_CPUS_ON_NODE} julia_scripts/hpg_australia.jl
Mon Apr 10 15:58:11 EDT 2023

@vlandau (Member)

vlandau commented Sep 4, 2023

Sorry for the incredibly late reply. Oof, this one might be beyond me. If it's producing a core dump, the only way to get to the bottom of it may be to inspect the dump itself, but that's something I'm not well versed in. Did you ever get things working?

One thing you might test (even though it will of course take longer) would be to run it in serial. At least this way we could determine if the issue lies with multithreading.
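
For instance, a minimal sketch of what I mean (the config path is a placeholder, assuming the usual run_omniscape("config.ini") entry point):

    # Launch Julia with a single thread so Omniscape's Threads.@threads loop runs in serial, e.g.:
    #   julia --threads=1 julia_scripts/hpg_Asia_Europe.jl
    # (or set JULIA_NUM_THREADS=1 before starting Julia)
    using Omniscape

    @assert Threads.nthreads() == 1   # confirm the test really is single-threaded
    run_omniscape("path/to/config.ini")   # placeholder for your .ini file

If it still segfaults with a single thread, that would suggest the problem isn't with the multithreading itself.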

@slamander (Author)

slamander commented Oct 2, 2023 via email
