Offer a "low memory mode" and multiprocessing of array chunks #79
Comments
This would be a huge improvement for the scalability of Omniscape. I feel like solution 1 would be infinitely scalable, which isn't the case for solution 2, no?
Yeah, on a second read after leaving this for a while, I'm preferring option 1 above... I think it would be faster and also easy to test for correctness.
But thinking more about it, option 2, if implemented well, could be better suited for HPC, where each thread uses very little memory. Ultimately, this might be the best solution.
Yeah, I think option 1 has the opportunity to be much faster, though (lazy reads/writes, plus the associated locks/unlocks needed for thread safety across that many operations, could involve massive overhead). Also, if in option 2 there were N output GeoTiffs written to separately for thread safety, where N is the number of threads, then each of those N tiles would be the full size of the landscape, which could introduce additional memory bottlenecks when adding them together.
I tested option 1 by first splitting the input resistance map into overlapping tiles with Bash+GDAL, then running Omniscape on each tile, and finally remosaicking the tiles in Python using rasterio.merge(method="max"). It works really well, and you don't see any stitch marks in the final output. I looked at the Omniscape code, and I see it would require some significant refactoring to do this within Omniscape. It also seems quite difficult to do the final merge in GDAL using only the max pixel value. I don't know if there is an equivalent merge function in Julia.
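As an illustration of the tiling step of this workflow, here is a small Julia sketch; the helper name, tile size, and raster dimensions are all hypothetical. It computes overlapping tile extents buffered by `radius` so that moving windows at tile edges see complete data:

```julia
# Hypothetical helper: compute overlapping 1-D tile extents along an axis of
# length n, with core tiles of size `tile` buffered by `radius` pixels.
function tile_ranges(n::Int, tile::Int, radius::Int)
    ranges = UnitRange{Int}[]
    for start in 1:tile:n
        stop = min(start + tile - 1, n)
        push!(ranges, max(start - radius, 1):min(stop + radius, n))
    end
    return ranges
end

# e.g. a 2000 x 3000 raster, 500-pixel core tiles, radius = 100:
row_windows = tile_ranges(2000, 500, 100)  # 1:600, 401:1100, 901:1600, 1401:2000
col_windows = tile_ranges(3000, 500, 100)
# Every (row window, col window) pair defines one buffered tile to run Omniscape on.
```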
Awesome! Glad it worked well. I think there might be a way to mosaic the TIFFs using GDAL.jl or ArchGDAL.jl? But actually, I use gdal_merge.py from the command line when merging rasters, and maybe that's not available in Julia... We could possibly do it with GeoData.jl. There may be a way to create an empty GDALarray, then write to it lazily using GeoData.jl. The key is lazy writing, to ensure the most efficient use of RAM. cc @rafaqz, who might have some ideas/thoughts. I think GeoData.jl (or maybe ArchGDAL?) might have what we need. I see the workflow as having these steps: (1) split the inputs into tiles buffered by radius, (2) run the moving window solves on each tile, and (3) mosaic the tile outputs lazily, taking the max of overlapping pixels.
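For illustration, the lazy-read half of this is available today; the incremental lazy-write half is the open question above. A minimal sketch using Rasters.jl, GeoData.jl's successor (the file name and window indices are hypothetical):

```julia
using Rasters
import ArchGDAL  # GDAL backend used by Rasters.jl for GeoTIFFs

# Lazy open: pixel data stays on disk until it is indexed.
resistance = Raster("resistance.tif"; lazy=true)

# Materialize only the data needed for one moving window solve.
window = resistance[X(101:300), Y(101:300)]
```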
I'd like to start working on these components when I have time -- it might be a bit until I have a chunk of time to do it, but I think we're starting to get a somewhat clear plan of action at this point.
Just a note to mention that gdal_merge.py doesn't allow taking the maximum value of overlapping pixels. Without that, there will be some serious stitch marks.
Also, tiling the landscape could end up giving us a solid foundation from which to build out hierarchical parallelism.
GeoData.jl has the `mosaic` function, which can combine overlapping rasters using a reducing function like `max`.
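For reference, a minimal sketch of a max-mosaic using `mosaic` as it exists in Rasters.jl, GeoData.jl's successor (tile file names are hypothetical):

```julia
using Rasters
import ArchGDAL  # GDAL backend for reading/writing GeoTIFFs

# Read the per-tile Omniscape outputs.
tiles = [Raster("tile_output_$i.tif") for i in 1:4]

# Overlapping pixels are reduced with `max`, mirroring the
# rasterio.merge(method="max") step described earlier in the thread.
merged = mosaic(max, tiles...)

write("merged_output.tif", merged)
```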
Thanks @rafaqz! I'll keep you in the loop when I find time to work on this, and may solicit some feedback as I go 🙂
Option 1 will be the way I go, as it has several benefits, including making multiprocessing easy, which will enable hierarchical parallel processing (e.g. see #106).
Grids are often so large that allocating the working arrays requires tons of memory. Each moving window solve is independent, so a low-memory mode may be relatively simple to implement.
Two options come to mind:
1. Split the landscape into chunks, and solve the chunks in serial
The landscape could be split into adjacent, non-overlapping rectangular or square chunks. Each chunk (and its corresponding inputs) would need to be buffered by `radius` (the omniscape argument) so that the pixels at the edge of the unbuffered chunk aren't missing any data in their moving windows. Then, results for each chunk could be stored and mosaicked, taking the max value of overlapping pixels. If done correctly, this would result in the same exact map as would be produced in Omniscape v0.4.2.

2. Lazy reading and writing for each moving window solve
A given moving window solve only needs the data directly pertinent to it, so there is no need to hold the entire raster inputs in memory. Just load what is needed for each moving window (GeoData.jl has functionality to do this), solve it, lazily write the result to an output raster, then free/garbage collect the data and move on to the next solve.

Thread safety could be a little tricky. Each thread could write to its own output component, and the components could then be summed, but the summing would also need to be done lazily/by chunk; otherwise you'd end up loading all the components into memory, which would take as much memory as the current method in Omniscape. Another consideration is computing the coordinates and source strengths of the target pixels (the ones on which the moving windows are centered).

The biggest memory hog in Omniscape is the allocation of the array(s) that store current values. They have the same number of rows and columns as the resistance surface, but have a 3rd dimension equal to the number of parallel workers. If computing normalized current flow in addition to cumulative current, there are two of them! Because of this, even if the source strength layer needs to be loaded fully into memory, the memory savings would still be substantial.
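A toy sketch of the per-thread accumulator idea; everything here is hypothetical, `solve_window` stands in for the actual circuit solve, and the final sum is where the lazy/chunked reduction would be needed:

```julia
using Base.Threads

# Stand-in for the real moving window circuit solve.
solve_window(window) = ones(size(window))

function solve_all(resistance::Matrix{Float64}, targets, radius::Int)
    nr, nc = size(resistance)
    # One full-size accumulator per thread: no locks needed, but nthreads()
    # copies of the landscape in memory -- the bottleneck described above.
    accs = [zeros(nr, nc) for _ in 1:nthreads()]
    @threads :static for t in targets      # :static keeps threadid() stable
        acc = accs[threadid()]
        rows = max(t[1] - radius, 1):min(t[1] + radius, nr)
        cols = max(t[2] - radius, 1):min(t[2] + radius, nc)
        window = @view resistance[rows, cols]  # read only this solve's data
        acc[rows, cols] .+= solve_window(window)
    end
    # In a true low-memory mode this reduction would happen lazily, by chunk.
    return reduce(+, accs)
end

targets = vec([(r, c) for r in 1:10:200, c in 1:10:300])
cumulative = solve_all(rand(200, 300), targets, 20)
```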
I tend to like option 2; it's the slickest, and if done in a thread-safe way, I think it would be more straightforward to implement such that errors from edge cases would be less likely. IO overhead might be significant, though -- not sure.