Amazon S3 Transfer Manager for Python

Overview

s3transfer - An Amazon S3 Transfer Manager for Python

s3transfer is a Python library for managing Amazon S3 transfers.

Note

This project is not currently GA. If you are planning to use this code in production, make sure to lock to a minor version, as interfaces may break from minor version to minor version. For a basic, stable interface of s3transfer, try the interfaces exposed in boto3.

Comments
  • readable() is inconsistent across python versions

    I was looking to copy what readable() does in order to add a similar writable() function, but it behaves inconsistently across Python versions. On Python 3, readable() tells you whether the file was opened for reading (or rather, whether read() will actually succeed), whereas the Python 2 code path simply tells you whether the fileobj has a read method, with no indication of whether read() will fail. Consider:

    Python 2.7.11 (default, Jan 22 2016, 08:29:18)
    >>> from s3transfer.compat import readable
    >>> readable(open('/tmp/asdfafgafdsfd', 'w'))
    True
    
    Python 3.5.1 (default, Jan 22 2016, 08:54:32)
    >>> from s3transfer.compat import readable
    >>> readable(open('/tmp/adfasfasdfd', 'w'))
    False
    

    I would expect us to try, within reason, to have consistent behavior across Python versions.
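
    One way to get closer to consistent behavior is sketched below. This is a minimal illustration, not the code in s3transfer.compat: the fallback on the file object's mode attribute is an assumption about how a Python 2 style file object (which has no readable() method) could be checked.

```python
import io


def readable(fileobj):
    """Return True only if reading from fileobj is expected to succeed."""
    # Prefer the object's own readable() check when it has one (io objects).
    check = getattr(fileobj, "readable", None)
    if callable(check):
        return check()
    # Assumed fallback: objects without readable() may still expose the
    # mode they were opened with, e.g. 'w', 'rb', 'r+'.
    mode = getattr(fileobj, "mode", None)
    if isinstance(mode, str):
        return "r" in mode or "+" in mode
    # Last resort: the weaker "has a read method" check.
    return hasattr(fileobj, "read")


if __name__ == "__main__":
    print(readable(io.BytesIO(b"data")))
```

    With this approach a file opened with mode 'w' is reported as not readable whether or not the object implements readable() itself.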

    enhancement 
    opened by jamesls 17
  • Implement first method transfer manager

    Has the ability to upload files to s3 by providing the filename.

    Sorry that it is really long. I am hoping that, with the way I designed the interface of the internals, it is going to take significantly less effort and code to add the other methods/functionality. It just did not make sense to me to break the internals up into separate PRs without any knowledge of or reference to how I was going to use them. I would be more than happy to explain how everything fits together in person to help with the review.

    cc @jamesls @mtdowling @rayluo @JordonPhillips

    opened by kyleknap 15
  • Implement download method for TransferManager

    cc @jamesls @JordonPhillips

    Review Progress

    • [x] s3transfer/compat.py
    • [x] s3transfer/download.py
    • [x] s3transfer/exceptions.py
    • [x] s3transfer/futures.py
    • [x] s3transfer/manager.py
    • [x] s3transfer/utils.py
    • [x] tests/__init__.py
    • [x] tests/functional/test_download.py
    • [x] tests/integration/__init__.py
    • [x] tests/integration/test_download.py
    • [x] tests/unit/test_download.py
    • [x] tests/unit/test_upload.py
    • [x] tests/unit/test_utils.py
    opened by kyleknap 11
  • Allow using ExpectedBucketOwner

    Adds ExpectedBucketOwner to the allowed ExtraArgs for uploads to protect against bucket sniping when uploading files.

    The underlying APIs already complain if you give them args that don't match, so I'm a little curious why this extra layer/check helps, but regardless adding this allows me to use ExpectedBucketOwner without issue.
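
    The kind of allowlist check being discussed can be sketched as follows. The list contents and the validate_extra_args name are illustrative assumptions, not the library's actual definitions; only the idea of rejecting keys outside an allowed set comes from the description above.

```python
# Abbreviated, hypothetical allowlist; the real list is longer.
ALLOWED_UPLOAD_ARGS = ["ACL", "ContentType", "ExpectedBucketOwner", "Metadata"]


def validate_extra_args(extra_args, allowed):
    """Raise ValueError if extra_args contains a key outside the allowlist."""
    for name in extra_args:
        if name not in allowed:
            raise ValueError(
                f"Invalid extra_args key '{name}', "
                f"must be one of: {', '.join(allowed)}"
            )


if __name__ == "__main__":
    # Accepted once ExpectedBucketOwner is part of the allowlist.
    validate_extra_args({"ExpectedBucketOwner": "111122223333"}, ALLOWED_UPLOAD_ARGS)
```

    Because the check runs before any request is made, a key missing from the allowlist fails locally even if the underlying API would have accepted it.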

    opened by liquidpele 10
  • Adjust chunksize if necessary

    This will adjust the chunksize of an upload or copy if it is too small, too large, or if it looks like max parts will be exceeded. This is unnecessary for downloads since there are no such restrictions for ranged downloading.
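
    A rough sketch of such an adjustment is shown below. The function name and the doubling strategy are assumptions made for illustration; the limits used are the documented S3 multipart limits (5 MiB minimum part size, 5 GiB maximum part size, 10,000 parts).

```python
import math

MIN_PART_SIZE = 5 * 1024 ** 2   # 5 MiB
MAX_PART_SIZE = 5 * 1024 ** 3   # 5 GiB
MAX_PARTS = 10_000


def adjust_chunksize(chunksize, file_size=None):
    """Clamp chunksize to the part-size limits and grow it if the
    transfer would otherwise need more than MAX_PARTS parts."""
    chunksize = min(max(chunksize, MIN_PART_SIZE), MAX_PART_SIZE)
    if file_size is not None:
        # Assumed strategy: keep doubling until the part count fits
        # or the maximum part size is reached.
        while (math.ceil(file_size / chunksize) > MAX_PARTS
               and chunksize < MAX_PART_SIZE):
            chunksize = min(chunksize * 2, MAX_PART_SIZE)
    return chunksize


if __name__ == "__main__":
    # 100 GiB at 8 MiB per part would need 12,800 parts, so the size grows.
    print(adjust_chunksize(8 * 1024 ** 2, file_size=100 * 1024 ** 3))
```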

    cc @kyleknap @jamesls

    needs-review 
    opened by JordonPhillips 10
  • Latest Release Causes `futures` package to be included in Python3 installs

    Hey,

    You might see some more traffic here soon. It looks like the latest release, 0.3.5, fixed a typo in setup.cfg that caused the futures package to be installed for all installs, not just Python 2.7. aws-cli pins this library to >=0.3.0, <0.4.0, so new installs will automatically grab this version and error out with something similar to the following.

    pip, install, awscli) exited with code 1.
    ERROR: Command errored out with exit status 1:
         command: /python3/bin/python3.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-dv9sw2br/futures/setup.py'"'"'; __file__='"'"'/tmp/pip-install-dv9sw2br/futures/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-dv9sw2br/futures/pip-egg-info
             cwd: /tmp/pip-install-dv9sw2br/futures/
        Complete output (4 lines):
        This backport is meant only for Python 2.
        It does not work on Python 3, and Python 3 users do not need it as the concurrent.futures package is available in the standard library.
        For projects that work on both Python 2 and 3, the dependency needs to be conditional on the Python version, like so:
        extras_require={':python_version == "2.7"': ['futures']}
        ----------------------------------------
    ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
    

    To fix this issue we had to manually pin s3transfer==0.3.4 for now. Would be awesome if a fix could be released so installing s3transfer on python3 envs works again.

    opened by andrewgross 9
  • Fix hanging issue when hanging on many files

    This PR fixes an issue where, if a bunch of transfers are queued up in the submission executor and a Ctrl-C cancels the entire transfer, the entire transfer hangs. Essentially, it boils down to the fact that in TransferManager.__exit__ we call TransferFutures.result() for all transfers that have not completed. If a transfer has been queued but not yet processed by the submission executor, the submission task's _main() will not be called because the task sees that the transfer is cancelled. The problem is that done needs to be announced to unblock TransferFutures.result(), and the submission task relies on the tasks it submits to announce done. So if the submission task never submits any tasks, done is never announced.

    The fix is to announce done in the submission task if we know its _main() is going to be skipped over.
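
    The shape of the fix can be illustrated with a stripped-down model. The Coordinator and SubmissionTask classes here are simplified stand-ins written for this sketch, not the real s3transfer classes; only the idea of announcing done when _main() is skipped is taken from the description.

```python
import threading


class Coordinator:
    """Simplified stand-in for a transfer coordinator."""

    def __init__(self):
        self._done = threading.Event()
        self.cancelled = False

    def cancel(self):
        self.cancelled = True

    def announce_done(self):
        self._done.set()

    def wait(self, timeout=None):
        # Blocks until done is announced; returns False on timeout.
        return self._done.wait(timeout)


class SubmissionTask:
    """Simplified stand-in for a submission task."""

    def __init__(self, coordinator):
        self.coordinator = coordinator
        self.main_ran = False

    def __call__(self):
        if self.coordinator.cancelled:
            # _main() is skipped, so no submitted task will ever announce
            # done. Announce it here so that waiters are unblocked.
            self.coordinator.announce_done()
            return
        self._main()

    def _main(self):
        self.main_ran = True
        # In the real flow the submitted tasks announce done; this sketch
        # does it directly.
        self.coordinator.announce_done()


if __name__ == "__main__":
    coordinator = Coordinator()
    coordinator.cancel()
    SubmissionTask(coordinator)()
    print(coordinator.wait(timeout=1))
```

    Without the announce_done() call in the cancelled branch, wait() in this model would block until its timeout, which mirrors the hang.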

    Along with this change, I did the following to complement it:

    • I added integration tests to catch this. I added them before implementing the fix, and they were failing pretty much every time, so that was good.
    • I added transfer_id to the repr of Task and TransferCoordinator. This will make it easier for us to determine which tasks and coordinators are related to a specific transfer, and will help with debugging in the future.
    • I added some debugging in wait() so that, if a transfer was hanging, it will tell us which transfer it was waiting on before the user hit Ctrl-C, giving us a better indication of what may be hanging in the future.

    cc @jamesls @JordonPhillips

    needs-additional-reviewer 
    opened by kyleknap 9
  • Add support for scripts that spin up processes

    Previously, it was not possible to accurately track the memory and CPU usage of AWS CLI streaming operations, because the streaming tasks spin off processes when piping inputs and outputs. With this PR, it is possible to track these numbers.

    cc @jamesls @JordonPhillips

    opened by kyleknap 9
  • Add support for non-seekable streaming uploads

    The tricky part of this change is that existing code relies on the ability to get the length of the object being uploaded, which is not possible for non-seekable objects. To work around that issue, this will read data into memory up to the configured multipart threshold to determine if it requires a multipart upload. That data will then be used to generate the initial parts.
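
    The buffering approach described above might look roughly like this. The function name is hypothetical, and treating a full buffer as "requires multipart" is an assumption, since the stream cannot be asked how much data remains.

```python
import io


def read_initial_data(stream, multipart_threshold):
    """Read up to multipart_threshold bytes from a non-seekable stream.

    Returns (initial_data, requires_multipart).
    """
    chunks = []
    remaining = multipart_threshold
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            break  # stream exhausted before reaching the threshold
        chunks.append(chunk)
        remaining -= len(chunk)
    initial_data = b"".join(chunks)
    # Assumption: if the buffer filled up, more data may follow, so the
    # upload is handled as multipart.
    return initial_data, len(initial_data) >= multipart_threshold


if __name__ == "__main__":
    data, multipart = read_initial_data(io.BytesIO(b"small body"), 1024)
    print(len(data), multipart)
```

    The returned initial_data would then be consumed first when generating parts, followed by whatever is still unread in the stream.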

    cc @kyleknap @jamesls

    needs-review 
    opened by JordonPhillips 9
  • Support all seek whence values

    This supersedes #88 and takes into account the comments in https://github.com/boto/s3transfer/pull/88#pullrequestreview-45828413. In particular, it handles the ReadFileChunk abstraction as suggested by @kyleknap and adds some tests for seeking with non-0 whence values. I squashed my ReadFileChunk changes into @kamalmostafa's patch.

    opened by dbnicholson 8
  • Implement seek whence param for zero length files

    Bug: https://github.com/aws/aws-cli/issues/2403 Bug-Ubuntu: https://bugs.launchpad.net/bugs/1696800

    As described in the bug reports referenced above, trying to copy a zero-length file to S3 fails (with some version combinations of python3, awscli, s3transfer, and requests) like this:

    $ aws s3 cp emptyfile s3://somebucket
    upload failed: ./emptyfile to s3://somebucket/emptyfile seek() takes 2 positional arguments but 3 were given
    

    ... due to the incomplete implementation of seek() (missing the optional 'whence' argument) in s3transfer's ReadFileChunk class. This patch adds the whence argument to all of s3transfer's seek implementations.
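
    As a rough illustration of what a whence-aware seek() on a file chunk involves, consider the sketch below. FileChunk is a simplified class written for this example, not the actual ReadFileChunk implementation.

```python
import io


class FileChunk:
    """Simplified view over `size` bytes of fileobj starting at `start`."""

    def __init__(self, fileobj, start, size):
        self._fileobj = fileobj
        self._start = start
        self._size = size
        self._pos = 0
        self._fileobj.seek(start)

    def seek(self, offset, whence=io.SEEK_SET):
        # Positions are relative to the chunk, not the underlying file.
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            pos = self._size + offset
        else:
            raise ValueError(f"invalid whence ({whence})")
        self._pos = max(pos, 0)
        self._fileobj.seek(self._start + self._pos)
        return self._pos

    def tell(self):
        return self._pos

    def read(self, amount=None):
        # Never read past the end of the chunk.
        remaining = max(self._size - self._pos, 0)
        if amount is None or amount < 0:
            amount = remaining
        data = self._fileobj.read(min(amount, remaining))
        self._pos += len(data)
        return data


if __name__ == "__main__":
    empty = FileChunk(io.BytesIO(b""), 0, 0)
    print(empty.seek(0, io.SEEK_END))
```

    Callers that determine a body's length by seeking to the end and calling tell() pass a whence value, which is why a seek() that accepts only an offset fails even for a zero-length file.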

    incorporating-feedback 
    opened by kamalmostafa 8
  • Test: TestBaseManager no longer works under python 3.8+

    Issue #229

    This test is no longer needed:

    • Python 3.8+ pickle cannot serialize local classes
    • Python 3.8+ can no longer correctly handle manager.start(signal.signal, (signal.SIGINT, signal.SIG_IGN)). It will cause the error PicklingError: Can't pickle <function signal at ...>: it's not the same object as _signal.signal
    • s3transfer no longer calls manager.start as above.
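
    The first point can be reproduced in isolation with a few lines. This sketch only demonstrates that a class defined inside a function cannot be pickled; it does not reproduce the manager.start() failure, and the helper names are made up for the example.

```python
import pickle


def make_local_class():
    # A class defined inside a function has '<locals>' in its qualified
    # name, so pickle cannot look it up again by name.
    class Local:
        pass
    return Local


def can_pickle(obj):
    """Return True if obj can be serialized with pickle."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        # The exception type differs between pickle implementations.
        return False
    return True


if __name__ == "__main__":
    print(can_pickle(make_local_class()))
```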
    opened by cfxegbert 0
  • extra_args not being passed to get_object in MultipartDownloader

    s3transfer 0.6.0

    In S3Transfer.download, if the object is over a predetermined size, a multipart download is used. extra_args is passed to MultipartDownloader.download_file, but download_file does not pass it on to self.client.get_object. This causes failures when using any of the ALLOWED_DOWNLOAD_ARGS.
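
    The expected behavior can be sketched with a stub client. RecordingClient and download_ranges are hypothetical names used only for this illustration; the point is that every ranged get_object call receives the caller's extra arguments.

```python
class RecordingClient:
    """Hypothetical stand-in for an S3 client; records get_object kwargs."""

    def __init__(self):
        self.calls = []

    def get_object(self, **kwargs):
        self.calls.append(kwargs)
        # A real client returns a streaming body; bytes suffice here.
        return {"Body": b""}


def download_ranges(client, bucket, key, ranges, extra_args=None):
    """Issue one ranged get_object call per (start, end) pair,
    forwarding extra_args to every call."""
    extra_args = extra_args or {}
    responses = []
    for start, end in ranges:
        responses.append(
            client.get_object(
                Bucket=bucket,
                Key=key,
                Range=f"bytes={start}-{end}",
                **extra_args,
            )
        )
    return responses


if __name__ == "__main__":
    client = RecordingClient()
    download_ranges(client, "bucket", "key", [(0, 9)], {"VersionId": "v1"})
    print(client.calls)
```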

    opened by cfxegbert 0
  • SlidingWindowSemaphore uses TaskSemaphore as a base class

    SlidingWindowSemaphore inherits from TaskSemaphore but overrides all of its methods. TaskSemaphore has a _semaphore member variable that SlidingWindowSemaphore does not use. __init__ is never called on the base class, so the member is never created, but this arrangement could cause future maintenance headaches.

    Either a new base interface class should be created, or the semaphore classes should simply rely on duck typing.
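
    The first option could look something like the sketch below. BaseSemaphore and SimpleTaskSemaphore are hypothetical names, and the acquire/release signatures are assumed for illustration rather than copied from the library.

```python
import abc
import threading


class BaseSemaphore(abc.ABC):
    """Hypothetical shared interface for the semaphore classes."""

    @abc.abstractmethod
    def acquire(self, tag, blocking=True):
        """Acquire capacity for the task identified by tag."""

    @abc.abstractmethod
    def release(self, tag, acquire_token):
        """Release capacity previously acquired for tag."""


class SimpleTaskSemaphore(BaseSemaphore):
    """Minimal implementation backed by threading.Semaphore."""

    def __init__(self, count):
        self._semaphore = threading.Semaphore(count)

    def acquire(self, tag, blocking=True):
        if not self._semaphore.acquire(blocking):
            raise RuntimeError(f"no capacity available for {tag!r}")

    def release(self, tag, acquire_token):
        self._semaphore.release()


if __name__ == "__main__":
    semaphore = SimpleTaskSemaphore(1)
    semaphore.acquire("upload-1")
    semaphore.release("upload-1", None)
```

    With an interface like this, a sliding-window variant would implement the same two methods without inheriting state it never initializes.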

    opened by cfxegbert 0
Owner
the boto project
A simple password manager I typed with python using MongoDB.

Python with MongoDB A simple python code example using MongoDB. How do i run this code • First of all you need to have a python on your computer. If y

null 31 Dec 6, 2022
A simple Python tool to transfer data from MySQL to SQLite 3.

MySQL to SQLite3 A simple Python tool to transfer data from MySQL to SQLite 3. This is the long overdue complimentary tool to my SQLite3 to MySQL. It

Klemen Tusar 126 Jan 3, 2023
Creating a python package to convert /transfer excelsheet data to a mysql Database Table

Creating a python package to convert /transfer excelsheet data to a mysql Database Table

Odiwuor Lameck 1 Jan 7, 2022
MySQL database connector for Python (with Python 3 support)

mysqlclient This project is a fork of MySQLdb1. This project adds Python 3 support and fixed many bugs. PyPI: https://pypi.org/project/mysqlclient/ Gi

PyMySQL 2.2k Dec 25, 2022
Python interface to Oracle Database conforming to the Python DB API 2.0 specification.

cx_Oracle version 8.2 (Development) cx_Oracle is a Python extension module that enables access to Oracle Database. It conforms to the Python database

Oracle 841 Dec 21, 2022
PubMed Mapper: A Python library that map PubMed XML to Python object

pubmed-mapper: A Python Library that map PubMed XML to Python object 中文文档 1. Philosophy view UML Programmatically access PubMed article is a common ta

灵魂工具人 33 Dec 8, 2022
python-beryl, a Python driver for BerylDB.

python-beryl, a Python driver for BerylDB.

BerylDB 3 Nov 24, 2021
Pure Python MySQL Client

PyMySQL Table of Contents Requirements Installation Documentation Example Resources License This package contains a pure-Python MySQL client library,

PyMySQL 7.2k Jan 9, 2023
A supercharged SQLite library for Python

SuperSQLite: a supercharged SQLite library for Python A feature-packed Python package and for utilizing SQLite in Python by Plasticity. It is intended

Plasticity 703 Dec 30, 2022
ClickHouse Python Driver with native interface support

ClickHouse Python Driver ClickHouse Python Driver with native (TCP) interface support. Asynchronous wrapper is available here: https://github.com/myma

Marilyn System 957 Dec 30, 2022
DataStax Python Driver for Apache Cassandra

DataStax Driver for Apache Cassandra A modern, feature-rich and highly-tunable Python client library for Apache Cassandra (2.1+) and DataStax Enterpri

DataStax 1.3k Dec 25, 2022
Python client for Apache Kafka

Kafka Python client Python client for the Apache Kafka distributed stream processing system. kafka-python is designed to function much like the offici

Dana Powers 5.1k Jan 8, 2023
PyMongo - the Python driver for MongoDB

PyMongo Info: See the mongo site for more information. See GitHub for the latest source. Documentation: Available at pymongo.readthedocs.io Author: Mi

mongodb 3.7k Jan 8, 2023
Redis Python Client

redis-py The Python interface to the Redis key-value store. Python 2 Compatibility Note redis-py 3.5.x will be the last version of redis-py that suppo

Andy McCurdy 11k Dec 29, 2022
Motor - the async Python driver for MongoDB and Tornado or asyncio

Motor Info: Motor is a full-featured, non-blocking MongoDB driver for Python Tornado and asyncio applications. Documentation: Available at motor.readt

mongodb 2.1k Dec 26, 2022
A fast PostgreSQL Database Client Library for Python/asyncio.

asyncpg -- A fast PostgreSQL Database Client Library for Python/asyncio asyncpg is a database interface library designed specifically for PostgreSQL a

magicstack 5.8k Dec 31, 2022
Motor - the async Python driver for MongoDB and Tornado or asyncio

Motor Info: Motor is a full-featured, non-blocking MongoDB driver for Python Tornado and asyncio applications. Documentation: Available at motor.readt

mongodb 1.6k Feb 6, 2021
Redis client for Python asyncio (PEP 3156)

Redis client for Python asyncio. Redis client for the PEP 3156 Python event loop. This Redis library is a completely asynchronous, non-blocking client

Jonathan Slenders 554 Dec 4, 2022