Support Multipart Range Requests in S3Transfer's download_file #3466
Comments
Hi @forrestfwilliams thanks for reaching out. It looks like this may be a duplicate of #1215. (Also the s3transfer repo may be the best place to track these requests.) I brought this up for discussion with the team and they weren't sure about supporting multi-part download over a specific range. It seems like there was some debate on that StackOverflow post as well, although there may be some workarounds. Have you tried any workarounds and if so what has worked for you?
Hi @tim-finnigan, thanks for your reply. Yes, this does look like the same issue as #1215. So far I have tried solutions using both Python's [...] Overall, this is a non-trivial difference in performance for our use case, and it would be great to work towards adding this functionality. I'm also happy to move this discussion to the s3transfer repo if that is more appropriate. Is there an open issue there along these lines?
Hi @forrestfwilliams thanks for your patience. I'll go ahead and close this issue so we can continue tracking #1215 in the boto3 repo and boto/s3transfer#248 which you opened in the s3transfer repo. I plan to bring this feature request up with the team soon for further review and feedback.
@tim-finnigan thank you. This feature will be a major feature improvement for my organization, as well as anyone trying to access subsets of data files in AWS. |
Describe the feature
Boto3 supports ranged GET requests and multipart downloads; however, it is not possible to perform a multipart download over a specific range. This results in slow download times when you are trying to download a 1GB range of data from a 4GB file in S3. It would be great if a range argument were added to `TransferConfig`, which could then be passed to a `download_file` call. This would download the range of data specified, but would use multipart downloading if the range size exceeds the `multipart_threshold`.
Use Case
I work at the Alaska Satellite Facility, where we distribute large amounts of remote sensing data to users across the globe via AWS. Many of these datasets come in legacy formats, such as zip files, that are not cloud-friendly. Due to the highly structured nature of these datasets, we can identify byte ranges that contain subsets of data that our users would be interested in downloading directly. However, since these subsets are still large (~1GB within a larger 4GB zip file), and multipart downloads are not supported for range requests, we cannot offer low-latency extraction of these datasets.
Proposed Solution
I have developed a workaround that involves using `aiobotocore` to set up threaded GET requests for the range of data desired. This can be found within this benchmarking script. This is still much slower than the native multipart read.
Other Information
I have also started a discussion concerning this issue on Stack Overflow, but no one has found a good solution.
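For readers who want to try a workaround of the kind discussed above, here is a minimal sketch that splits a byte range into parts and fetches them concurrently with plain boto3 and a thread pool. It is not the benchmarking script mentioned earlier and not an existing boto3 or s3transfer feature. The helper names (`split_range`, `download_range`, `make_s3_fetch`) and the default part size and worker count are assumptions chosen for illustration; the only boto3 call used is `get_object` with its `Range` parameter.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(first, last, part_size):
    """Split the inclusive byte range [first, last] into inclusive sub-ranges."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [
        (start, min(start + part_size - 1, last))
        for start in range(first, last + 1, part_size)
    ]


def download_range(fetch, first, last, part_size=8 * 1024 * 1024, max_workers=10):
    """Fetch [first, last] in parallel parts and return the joined bytes.

    `fetch(start, end)` is a hypothetical callable that must return the
    bytes of the inclusive range [start, end].
    """
    parts = split_range(first, last, part_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so the parts join back correctly.
        chunks = pool.map(lambda part: fetch(*part), parts)
        return b"".join(chunks)


def make_s3_fetch(bucket, key):
    """Build a fetch callable backed by get_object with a Range header."""
    import boto3  # imported lazily so the helpers above work without boto3

    client = boto3.client("s3")

    def fetch(start, end):
        response = client.get_object(
            Bucket=bucket, Key=key, Range=f"bytes={start}-{end}"
        )
        return response["Body"].read()

    return fetch
```

With a hypothetical bucket and key, the first 1GB of an object could be requested as `download_range(make_s3_fetch("my-bucket", "archive.zip"), 0, 1024**3 - 1)`. Note that this sketch holds the whole range in memory and does no retrying, so it is a starting point for experiments rather than a replacement for native multipart support.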
Acknowledgements
SDK version used
1.24.59
Environment details (OS name and version, etc.)
r5d.xlarge EC2 instance running the latest Amazon Linux (same region as S3 bucket)