Super S3 command line tool

Overview

s4cmd

Author: Chou-han Yang (@chouhanyang)

Current Maintainers: Debodirno Chandra (@debodirno) | Naveen Vardhi (@rozuur) | Navin Pai (@navinpai)


What's New in s4cmd 2.x

  • Fully migrated from the old boto 2.x to the new boto3 library, which provides a more reliable and up-to-date S3 backend.
  • Support S3 --API-ServerSideEncryption along with 36 new API pass-through options. See the API pass-through options section for the complete list.
  • Support batch delete (with the delete_objects API) to delete up to 1000 files with a single call, 100+ times faster than sequential deletion.
  • Support the S4CMD_OPTS environment variable for commonly used options, such as --API-ServerSideEncryption, across all your s4cmd operations (see the example after this list).
  • Support moving files larger than 5GB with multipart upload, 20+ times faster than a sequential move operation for large files.
  • Support timestamp filtering with the --last-modified-before and --last-modified-after options for all operations. Human-friendly timestamps are supported, e.g. --last-modified-before='2 months ago'.
  • Faster upload with lazy evaluation of the md5 hash.
  • Listing of large numbers of files with S3 pagination; memory is the only limit.
  • New directory-to-directory dsync command: a standalone implementation that replaces the old sync command, which was built on top of the get/put/mv commands. --delete-removed works for all cases, including local to S3, S3 to local, and S3 to S3. The sync command preserves its old behavior in this version for compatibility.
  • Support for S3 compatible storage services such as DreamHost and Cloudian using --endpoint-url (Community Supported Beta Feature).
  • Tested on Python 2.7, 3.6, 3.7, 3.8, 3.9, and nightly.
  • Special thanks to onera.com for supporting s4cmd.
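
For example, here is a minimal sketch of how S4CMD_OPTS and the timestamp filters combine; the bucket and file names are placeholders:

    # Apply a commonly used option to every s4cmd invocation in this shell
    export S4CMD_OPTS='--API-ServerSideEncryption=AES256'

    # Uploads now request server-side encryption without repeating the flag
    s4cmd put backup.tar.gz s3://my-bucket/backups/

    # Only list objects modified recently, using a human-friendly timestamp
    s4cmd ls -r s3://my-bucket/backups/ --last-modified-after='2 months ago'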

Motivation

S4cmd is a command-line utility for accessing Amazon S3, inspired by s3cmd.

We have used s3cmd heavily for a number of scripted, data-intensive applications. However, as the need for a variety of small improvements arose, we created our own implementation, s4cmd. It is intended as an alternative to s3cmd with enhanced performance, better handling of large files, and a number of additional features and fixes that we have found useful.

It strives to be compatible with the most common usage scenarios for s3cmd. It does not offer exact drop-in compatibility, due to a number of corner cases where different behavior seems preferable, or for bugfixes.

Features

S4cmd supports the regular commands you might expect for fetching and storing files in S3: ls, put, get, cp, mv, sync, del, du.

The main features that distinguish s4cmd are:

  • Simple (less than 1500 lines of code) and implemented in pure Python, based on the widely used Boto3 library.
  • Multi-threaded/multi-connection implementation for enhanced performance on all commands. As with many network-intensive applications (like web browsers), accessing S3 in a single-threaded way is often significantly less efficient than having multiple connections actively transferring data at once. In general, we get a 2X boost to upload/download speeds from this.
  • Path handling: S3 is not a traditional filesystem with built-in support for directory structure: internally, there are only objects, not directories or folders. However, most people use S3 in a hierarchical structure, with paths separated by slashes, to emulate traditional filesystems. S4cmd follows conventions to more closely replicate the behavior of traditional filesystems in certain corner cases. For example, "ls" and "cp" work much like in Unix shells, to avoid odd surprises. (For examples see compatibility notes below.)
  • Wildcard support: Wildcards, including multiple levels of wildcards, like in Unix shells, are handled. For example: s3://my-bucket/my-folder/20120512/*/*chunk00?1?
  • Automatic retry: Failed tasks are re-executed after a delay.
  • Multi-part upload support for files larger than 5GB.
  • Handling of MD5s properly with respect to multi-part uploads (for the sordid details of this, see below).
  • Miscellaneous enhancements and bugfixes:
    • Partial file creation: Avoid creating empty target files if source does not exist. Avoid creating partial output files when commands are interrupted.
    • General thread safety: Tool can be interrupted or killed at any time without being blocked by child threads or leaving incomplete or corrupt files in place.
    • Ensure exit code is nonzero on all failure scenarios (a very important feature in scripts).
    • Expected handling of symlinks (they are followed).
    • Support both s3:// and s3n:// prefixes (the latter is common with Amazon Elastic Mapreduce).

Limitations:

  • No CloudFront or other feature support.
  • Currently, we simulate sync with get and put with --recursive --force --sync-check.

Installation and Setup

You can install s4cmd from PyPI:

pip install s4cmd
  • Copy or create a symbolic link so you can run s4cmd.py as s4cmd. (It is just a single file!)
  • If you already have a ~/.s3cfg file from configuring s3cmd, credentials from this file will be used. Otherwise, set the S3_ACCESS_KEY and S3_SECRET_KEY environment variables to contain your S3 credentials.
  • If no keys are provided, but an IAM role is associated with the EC2 instance, it will be used transparently.
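
If you use environment variables for credentials, a minimal setup looks like the following; the key values are placeholders (skip this if you use ~/.s3cfg or an EC2 IAM role):

    export S3_ACCESS_KEY='your-access-key-id'
    export S3_SECRET_KEY='your-secret-access-key'
    s4cmd ls s3://my-bucket/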

s4cmd Commands

s4cmd ls [path]

List all contents of a directory.

  • -r/--recursive: recursively display all contents including subdirectories under the given path.
  • -d/--show-directory: show the directory entry instead of its content.

s4cmd put [source] [target]

Upload local files to S3.

  • -r/--recursive: also upload directories recursively.
  • -s/--sync-check: check md5 hash to avoid uploading the same content.
  • -f/--force: override existing file instead of showing error message.
  • -n/--dry-run: emulate the operation without real upload.
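
A couple of illustrative uploads (bucket and paths are placeholders):

    # Upload a single file
    s4cmd put local-file.txt s3://my-bucket/path/local-file.txt

    # Recursively upload a directory, skipping files whose md5 already matches
    s4cmd put -r -s ./data s3://my-bucket/data/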

s4cmd get [source] [target]

Download files from S3 to local filesystem.

  • -r/--recursive: also download directories recursively.
  • -s/--sync-check: check md5 hash to avoid downloading the same content.
  • -f/--force: override existing file instead of showing error message.
  • -n/--dry-run: emulate the operation without real download.

s4cmd dsync [source dir] [target dir]

Synchronize the contents of two directories. Each directory can be either local or remote, but syncing two local directories is not currently supported.

  • -r/--recursive: also sync directories recursively.
  • -s/--sync-check: check md5 hash to avoid syncing the same content.
  • -f/--force: override existing file instead of showing error message.
  • -n/--dry-run: emulate the operation without real sync.
  • --delete-removed: delete files not in source directory.
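
For instance, mirroring a local directory to S3 and removing deleted files might look like this (the paths are placeholders; the dry run previews the changes first):

    s4cmd dsync -r -n ./site s3://my-bucket/site/ --delete-removed
    s4cmd dsync -r ./site s3://my-bucket/site/ --delete-removed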

s4cmd sync [source] [target]

(Obsolete, use dsync instead.) Synchronize the contents of two directories. Each directory can be either local or remote, but syncing two local directories is not currently supported. This command simply invokes the get/put/mv commands.

  • -r/--recursive: also sync directories recursively.
  • -s/--sync-check: check md5 hash to avoid syncing the same content.
  • -f/--force: override existing file instead of showing error message.
  • -n/--dry-run: emulate the operation without real sync.
  • --delete-removed: delete files not in source directory. Only works when syncing local directory to s3 directory.

s4cmd cp [source] [target]

Copy a file or a directory from one S3 location to another.

  • -r/--recursive: also copy directories recursively.
  • -s/--sync-check: check md5 hash to avoid copying the same content.
  • -f/--force: override existing file instead of showing error message.
  • -n/--dry-run: emulate the operation without real copy.

s4cmd mv [source] [target]

Move a file or a directory from one S3 location to another.

  • -r/--recursive: also move directories recursively.
  • -s/--sync-check: check md5 hash to avoid moving the same content.
  • -f/--force: override existing file instead of showing error message.
  • -n/--dry-run: emulate the operation without real move.

s4cmd del [path]

Delete files or directories on S3.

  • -r/--recursive: also delete directories recursively.
  • -n/--dry-run: emulate the operation without real delete.
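
An illustrative cleanup (bucket and prefix are placeholders); deletions go through the batch delete_objects API, and --batch-delete-size caps how many keys are combined per call:

    s4cmd del -r -n s3://my-bucket/tmp/
    s4cmd del -r --batch-delete-size=500 s3://my-bucket/tmp/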

s4cmd du [path]

Get the size of the given directory.

Available parameters:

  • -r/--recursive: also add sizes of sub-directories recursively.

s4cmd Control Options

-p S3CFG, --config=[filename]

path to s3cfg config file

-f, --force

force overwrite files when download or upload

-r, --recursive

recursively checking subdirectories

-s, --sync-check

check file md5 before download or upload

-n, --dry-run

trial run without actual download or upload

-t RETRY, --retry=[integer]

number of retries before giving up

--retry-delay=[integer]

seconds to sleep between retries

-c NUM_THREADS, --num-threads=NUM_THREADS

number of concurrent threads

--endpoint-url

endpoint url used in boto3 client

-d, --show-directory

show directory instead of its content

--ignore-empty-source

ignore empty source from s3

--use-ssl

(obsolete) use SSL connection to S3

--verbose

verbose output

--debug

debug output

--validate

(obsolete) validate lookup operation

-D, --delete-removed

delete remote files that do not exist in source after sync

--multipart-split-size=[integer]

size in bytes to split multipart transfers

--max-singlepart-download-size=[integer]

files with size (in bytes) greater than this will be downloaded in multipart transfers

--max-singlepart-upload-size=[integer]

files with size (in bytes) greater than this will be uploaded in multipart transfers

--max-singlepart-copy-size=[integer]

files with size (in bytes) greater than this will be copied in multipart transfers

--batch-delete-size=[integer]

Number of files (<1000) to be combined in batch delete.

--last-modified-before=[datetime]

Condition on files where their last modified dates are before given parameter.

--last-modified-after=[datetime]

Condition on files where their last modified dates are after given parameter.
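
A sketch combining several of these options (the endpoint URL and bucket are placeholders):

    # List only objects modified in the last month, against an S3-compatible
    # endpoint, using 16 concurrent threads
    s4cmd ls -r -c 16 --endpoint-url=https://objects.example.com \
        --last-modified-after='1 month ago' s3://my-bucket/archive/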

S3 API Pass-through Options

These options are passed directly to the boto3 API calls. Each option is applied only to the API calls that accept that parameter. For example, --API-ServerSideEncryption is only needed for put_object and create_multipart_upload, not for list_buckets or get_object; providing --API-ServerSideEncryption to s4cmd ls therefore has no effect.

For more information, please see the boto3 S3 documentation: http://boto3.readthedocs.io/en/latest/reference/services/s3.html
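
For example, here is an illustrative upload requesting SSE-KMS encryption; the bucket and KMS key ID are placeholders:

    s4cmd put -r ./reports s3://my-bucket/reports/ \
        --API-ServerSideEncryption=aws:kms \
        --API-SSEKMSKeyId=1234abcd-12ab-34cd-56ef-1234567890ab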

--API-ACL=[string]

The canned ACL to apply to the object.

--API-CacheControl=[string]

Specifies caching behavior along the request/reply chain.

--API-ContentDisposition=[string]

Specifies presentational information for the object.

--API-ContentEncoding=[string]

Specifies what content encodings have been applied to the object and thus what decoding mechanisms must be applied to obtain the media-type referenced by the Content-Type header field.

--API-ContentLanguage=[string]

The language the content is in.

--API-ContentMD5=[string]

The base64-encoded 128-bit MD5 digest of the part data.

--API-ContentType=[string]

A standard MIME type describing the format of the object data.

--API-CopySourceIfMatch=[string]

Copies the object if its entity tag (ETag) matches the specified tag.

--API-CopySourceIfModifiedSince=[datetime]

Copies the object if it has been modified since the specified time.

--API-CopySourceIfNoneMatch=[string]

Copies the object if its entity tag (ETag) is different than the specified ETag.

--API-CopySourceIfUnmodifiedSince=[datetime]

Copies the object if it hasn't been modified since the specified time.

--API-CopySourceRange=[string]

The range of bytes to copy from the source object. The range value must use the form bytes=first-last, where the first and last are the zero-based byte offsets to copy. For example, bytes=0-9 indicates that you want to copy the first ten bytes of the source. You can copy a range only if the source object is greater than 5 GB.

--API-CopySourceSSECustomerAlgorithm=[string]

Specifies the algorithm to use when decrypting the source object (e.g., AES256).

--API-CopySourceSSECustomerKeyMD5=[string]

Specifies the 128-bit MD5 digest of the encryption key according to RFC 1321. Amazon S3 uses this header for a message integrity check to ensure the encryption key was transmitted without error. Note that this parameter is automatically populated if it is not provided, so including it is not required.

--API-CopySourceSSECustomerKey=[string]

Specifies the customer-provided encryption key for Amazon S3 to use to decrypt the source object. The encryption key provided in this header must be one that was used when the source object was created.

--API-ETag=[string]

Entity tag returned when the part was uploaded.

--API-Expires=[datetime]

The date and time at which the object is no longer cacheable.

--API-GrantFullControl=[string]

Gives the grantee READ, READ_ACP, and WRITE_ACP permissions on the object.

--API-GrantReadACP=[string]

Allows grantee to read the object ACL.

--API-GrantRead=[string]

Allows grantee to read the object data and its metadata.

--API-GrantWriteACP=[string]

Allows grantee to write the ACL for the applicable object.

--API-IfMatch=[string]

Return the object only if its entity tag (ETag) is the same as the one specified, otherwise return a 412 (precondition failed).

--API-IfModifiedSince=[datetime]

Return the object only if it has been modified since the specified time, otherwise return a 304 (not modified).

--API-IfNoneMatch=[string]

Return the object only if its entity tag (ETag) is different from the one specified, otherwise return a 304 (not modified).

--API-IfUnmodifiedSince=[datetime]

Return the object only if it has not been modified since the specified time, otherwise return a 412 (precondition failed).

--API-Metadata=[dict]

A map (as a JSON string) of metadata to store with the object in S3.

--API-MetadataDirective=[string]

Specifies whether the metadata is copied from the source object or replaced with metadata provided in the request.

--API-MFA=[string]

The concatenation of the authentication device's serial number, a space, and the value that is displayed on your authentication device.

--API-RequestPayer=[string]

Confirms that the requester knows that she or he will be charged for the request. Bucket owners need not specify this parameter in their requests. Documentation on downloading objects from requester pays buckets can be found at http://docs.aws.amazon.com/AmazonS3/latest/dev/ObjectsinRequesterPaysBuckets.html

--API-ServerSideEncryption=[string]

The Server-side encryption algorithm used when storing this object in S3 (e.g., AES256, aws:kms).

--API-SSECustomerAlgorithm=[string]

Specifies the algorithm to use when encrypting the object (e.g., AES256).

--API-SSECustomerKeyMD5=[string]

Specifies the 128-bit MD5 digest of the encryption key according to RFC 1321. Amazon S3 uses this header for a message integrity check to ensure the encryption key was transmitted without error. Note that this parameter is automatically populated if it is not provided, so including it is not required.

--API-SSECustomerKey=[string]

Specifies the customer-provided encryption key for Amazon S3 to use in encrypting data. This value is used to store the object and then it is discarded; Amazon does not store the encryption key. The key must be appropriate for use with the algorithm specified in the x-amz-server-side-encryption-customer-algorithm header.

--API-SSEKMSKeyId=[string]

Specifies the AWS KMS key ID to use for object encryption. All GET and PUT requests for an object protected by AWS KMS will fail if not made via SSL or using SigV4. Documentation on configuring any of the officially supported AWS SDKs and CLI can be found at http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingAWSSDK.html#specify-signature-version

--API-StorageClass=[string]

The type of storage to use for the object. Defaults to 'STANDARD'.

--API-VersionId=[string]

VersionId used to reference a specific version of the object.

--API-WebsiteRedirectLocation=[string]

If the bucket is configured as a website, redirects requests for this object to another object in the same bucket or to an external URL. Amazon S3 stores the value of this header in the object metadata.

Debugging Tips

Simply enable the --debug option to see the full log of s4cmd. If you also need to see which boto3 API calls s4cmd issues, you can run:

s4cmd --debug [op] .... 2>&1 >/dev/null | grep S3APICALL

This prints all the parameters sent to the S3 API.

Compatibility between s3cmd and s4cmd

Prefix matching: In s3cmd, unlike traditional filesystems, a prefix matches all object names that begin with it:

>> s3cmd ls s3://my-bucket/ch
s3://my-bucket/charlie/
s3://my-bucket/chyang/

In s4cmd, behavior is the same as with a Unix shell:

>> s4cmd ls s3://my-bucket/ch
(empty)

To get prefix behavior, use explicit wildcards instead: s4cmd ls s3://my-bucket/ch*

Similarly, the sync and cp commands emulate the Unix cp command, so directory-to-directory sync uses different syntax:

>> s3cmd sync s3://bucket/path/dirA s3://bucket/path/dirB/

will copy contents in dirA to dirB.

>> s4cmd sync s3://bucket/path/dirA s3://bucket/path/dirB/

will copy dirA into dirB.

To achieve the s3cmd behavior, use wildcards:

s4cmd sync s3://bucket/path/dirA/* s3://bucket/path/dirB/

Note that s4cmd does not support rsync's convention of dirA without a trailing slash meaning dirA/*.

No automatic overwrite for the put command: s4cmd put fileA s3://bucket/path/fileB returns an error if fileB already exists. Use -f to force the overwrite, just as with the get command.

Bugfixes for handling of non-existent paths: s3cmd often creates empty files when the specified paths do not exist:

  • s3cmd get s3://my-bucket/no_such_file downloads an empty file; s4cmd get s3://my-bucket/no_such_file returns an error.
  • s3cmd put no_such_file s3://my-bucket/ uploads an empty file; s4cmd put no_such_file s3://my-bucket/ returns an error.

Additional technical notes

Etags, MD5s and multi-part uploads: Traditionally, the etag of an object in S3 has been its MD5. However, this changed with the introduction of S3 multi-part uploads; in this case the etag is still a unique ID, but it is not the MD5 of the file. Amazon has not revealed the definition of the etag in this case, so there is no way we can calculate and compare MD5s based on the etag header in general. The workaround we use is to upload the MD5 as a supplemental content header (called "md5", instead of "etag"). This enables s4cmd to check the MD5 hash before upload or download. The only limitation is that this only works for files uploaded via s4cmd. Programs that do not understand this header will still have to download and verify the MD5 directly.
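
If you want to check the stored hash out-of-band, one possible approach (assuming the AWS CLI is installed; bucket and key are placeholders) is to read the object's user metadata, where s4cmd keeps the "md5" entry described above, and compare it with a locally computed digest:

    aws s3api head-object --bucket my-bucket --key path/file.bin   # look for "md5" under "Metadata"
    md5sum file.bin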

Unimplemented features

  • CloudFront or other feature support beyond basic S3 access.

Credits

Comments
  • v2.0: dies on out-of-memory on single file PUT

    Hi,

    while trying this tool (for the first time) to dsync 80GB of data (in 50 files) from local to S3, it consumed all the memory and was killed by OOM killer.

    s4cmd dsync -r . s3://xxx/normalize_exports/
    

    Then, I tried to PUT individual files one-by-one and it also run out of memory :disappointed:

    $ ls -lh 5s/2015-07.jsonl.gz
    -rw-rw-r-- 1 ubuntu ubuntu 808M May 25 06:39 5s/2015-07.jsonl.gz
    $ s4cmd put 5s/2015-07.jsonl.gz s3://xxx/normalize_exports/5s/2015-07.jsonl.gz
    

    Versions:

    $ pip list
    awscli (1.10.33)
    boto3 (1.3.1)
    botocore (1.4.23)
    colorama (0.3.3)
    docutils (0.12)
    futures (3.0.5)
    jmespath (0.9.0)
    pip (8.1.2)
    pyasn1 (0.1.9)
    python-dateutil (2.5.3)
    pytz (2016.4)
    rsa (3.4.2)
    s3transfer (0.0.1)
    s4cmd (2.0.1)
    setuptools (2.2)
    six (1.10.0)
    
    info-needed 
    opened by zarnovican 10
  • V4 Auth not working

    V4 Auth requiring regions such as eu-central-1 is not working with the following error

    [Thread Failure] S3ResponseError: 400 Bad Request
    <?xml version="1.0" encoding="UTF-8"?>
    <Error><Code>InvalidRequest</Code><Message>The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.</Message><RequestId>02FA63493DA8A72F</RequestId><HostId>XANdyiE6GccqIuN9bgHmY8wXcY8ge65pzC/OWnXRxmpsU9oAmX9xfjF4A+57Kw0ue5DMgMHyIGU=</HostId></Error>
    
    opened by vedit 10
  • Create setup.py

    In order to make s4cmd easier to install and capable of being submitted to PyPI, we need a setup.py.

    There are many examples out there, but here's the official documentation: https://docs.python.org/2/distutils/setupscript.html

    opened by woodb 9
  • allow for s3 endpoint param

    I want to be able to use s3 transfer accelerated buckets with s4cmd. Boto3 allows for this by configuring a endpoint_url in the s3 client. With this change, I can enable s3 transfer acceleration by using --endpoint-url=http://s3-accelerate.amazonaws.com

    opened by apbodnar 8
  • Files that equal 0 bytes break this program

    /cassandra_backups/data/cassandra/data/clearcore/transactions/clearcore-transactions-jb-15105-Summary.db => s3://cassandra-backups.us-east
    [Runtime Failure] Unable to read data from source: /cassandra_backups/data/cassandra/data/clearcore/transactions/clearcore-transactions-jb-17476-Statistics.db
    [Runtime Failure] Unable to read data from source: /cassandra_backups/data/cassandra/data/clearcore/transactions/clearcore-transactions-jb-17476-Data.db
    [Runtime Failure] Unable to read data from source: /cassandra_backups/data/cassandra/data/clearcore/transactions/clearcore-transactions-jb-17476-CompressionInfo.db
    [Runtime Failure] Unable to read data from source: /cassandra_backups/data/cassandra/data/clearcore/transactions/clearcore-transactions-jb-17476-Index.db
    [Runtime Failure] Unable to read data from source: /cassandra_backups/data/cassandra/data/clearcore/transactions/clearcore-transactions-jb-17476-Filter.db

    The thing all of these files share is that they all (for whatever reason) were 0 bytes. After testing on a single 0 file created by dd sure enough this breaks the program.

    Specific flags I am using

    s4cmd dsync --verbose --recursive --delete-removed -c 7 $dir_src/ "$s3_url/$dir_target/"

    I will look in the code tomorrow to see if anything pops out that I can help on in looking at this issue.

    duplicate 
    opened by lanmalkieri 8
  • error: can't copy 'data/bash-completion/s4cmd': doesn't exist or not a regular file

    Installing:

    • using root user
    • on Ubuntu 12.04.5 LTS
    • on AWS 64-bit machine
    • Using pip install s4cmd

    During install I'm getting this error:

    Installing collected packages: s4cmd, boto3, pytz, botocore, jmespath, futures, python-dateutil, docutils, six
      Running setup.py install for s4cmd
        Running command /usr/bin/python -c "import setuptools;__file__='/tmp/build/s4cmd/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /tmp/pip-Zvc7O7-record/install-record.txt
        running install
        running build
        running build_py
        running build_scripts
        running install_lib
        running install_data
        error: can't copy 'data/bash-completion/s4cmd': doesn't exist or not a regular file
        Complete output from command /usr/bin/python -c "import setuptools;__file__='/tmp/build/s4cmd/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /tmp/pip-Zvc7O7-record/install-record.txt:
        running install
    
    running build
    
    running build_py
    
    running build_scripts
    
    running install_lib
    
    running install_data
    
    error: can't copy 'data/bash-completion/s4cmd': doesn't exist or not a regular file
    
    ----------------------------------------
    Command /usr/bin/python -c "import setuptools;__file__='/tmp/build/s4cmd/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /tmp/pip-Zvc7O7-record/install-record.txt failed with error code 1
    Exception information:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/dist-packages/pip/basecommand.py", line 126, in main
        self.run(options, args)
      File "/usr/lib/python2.7/dist-packages/pip/commands/install.py", line 228, in run
        requirement_set.install(install_options, global_options)
      File "/usr/lib/python2.7/dist-packages/pip/req.py", line 1093, in install
        requirement.install(install_options, global_options)
      File "/usr/lib/python2.7/dist-packages/pip/req.py", line 566, in install
        cwd=self.source_dir, filter_stdout=self._filter_install, show_stdout=False)
      File "/usr/lib/python2.7/dist-packages/pip/__init__.py", line 255, in call_subprocess
        % (command_desc, proc.returncode))
    InstallationError: Command /usr/bin/python -c "import setuptools;__file__='/tmp/build/s4cmd/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /tmp/pip-Zvc7O7-record/install-record.txt failed with error code 1
    "~/.pip/pip.log" 162L, 10020C
    
    installation 
    opened by timh5 8
  • dsync error: "Bucket name must match the regex"

    I am trying to sync a large number of small local files to a bucket, but I get an error that Bucket name must match the regex.

    Am I writing the S3 url incorrectly?

    $ s4cmd dsync --debug --verbose -r -c4 50591 s3://arn:aws:s3:eu-west-1:000000000000:accesspoint/bucket/2/50591
      (D)s4cmd:129  >> dsync_handler(<__main__.CommandHandler object at 0x7f9abac81150>, ['dsync', '50591', 's3://arn:aws:s3:eu-west-1:000000000000:accesspoint/bucket/2/50591'])
      (D)s4cmd:129  >> validate(<__main__.CommandHandler object at 0x7f9abac81150>, 'cmd|s3,local|s3,local', ['dsync', '50591', 's3://arn:aws:s3:eu-west-1:000000000000:accesspoint/bucket/2/50591'])
      (D)s4cmd:131  << validate(<__main__.CommandHandler object at 0x7f9abac81150>, 'cmd|s3,local|s3,local', ['dsync', '50591', 's3://arn:aws:s3:eu-west-1:000000000000:accesspoint/bucket/2/50591']): None
      (D)s4cmd:129  >> dsync_files(<__main__.S3Handler object at 0x7f9abac81690>, '50591', 's3://arn:aws:s3:eu-west-1:000000000000:accesspoint/bucket/2/50591')
      (D)s4cmd:129  >> relative_dir_walk(<__main__.S3Handler object at 0x7f9abac81690>, '50591')
      (D)s4cmd:129  >> local_walk(<__main__.S3Handler object at 0x7f9abac81690>, '50591')
      (D)s4cmd:131  << local_walk(<__main__.S3Handler object at 0x7f9abac81690>, '50591'): ['50591/50591.dzi', <...snip...>]
      (D)s4cmd:131  << relative_dir_walk(<__main__.S3Handler object at 0x7f9abac81690>, '50591'): ['50591.dzi', <...snip...>]
      (D)s4cmd:129  >> upload(<ThreadUtil(Thread-2, started daemon 140302513305344)>, '50591/50591.dzi', 's3://arn:aws:s3:eu-west-1:000000000000:accesspoint/bucket/2/50591/50591.dzi')
      (D)s4cmd:129  >> upload(<ThreadUtil(Thread-1, started daemon 140302522484480)>, '50591/50591_files/1/0_0.jpeg', 's3://arn:aws:s3:eu-west-1:000000000000:accesspoint/bucket/2/50591/50591_files/1/0_0.jpeg')
      (D)s4cmd:129  >> lookup(<ThreadUtil(Thread-2, started daemon 140302513305344)>, <__main__.S3URL instance at 0x7f9ab86453f8>)
      (D)s4cmd:129  >> lookup(<ThreadUtil(Thread-1, started daemon 140302522484480)>, <__main__.S3URL instance at 0x7f9ab8645758>)
      (D)s4cmd:129  >> upload(<ThreadUtil(Thread-3, started daemon 140302504388352)>, '50591/50591_files/0/0_0.jpeg', 's3://arn:aws:s3:eu-west-1:000000000000:accesspoint/bucket/2/50591/50591_files/0/0_0.jpeg')
      (D)s4cmd:129  >> upload(<ThreadUtil(Thread-4, started daemon 140302287369984)>, '50591/50591_files/10/0_0.jpeg', 's3://arn:aws:s3:eu-west-1:000000000000:accesspoint/bucket/2/50591/50591_files/10/0_0.jpeg')
      (D)s4cmd:402  >> S3APICALL head_object(Bucket='arn:aws:s3:eu-west-1:000000000000:accesspoint', Key='bucket/2/50591/50591.dzi')
      (D)s4cmd:129  >> lookup(<ThreadUtil(Thread-3, started daemon 140302504388352)>, <__main__.S3URL instance at 0x7f9ab86451b8>)
      (D)s4cmd:402  >> S3APICALL head_object(Bucket='arn:aws:s3:eu-west-1:000000000000:accesspoint', Key='bucket/2/50591/50591_files/1/0_0.jpeg')
      (D)s4cmd:129  >> lookup(<ThreadUtil(Thread-4, started daemon 140302287369984)>, <__main__.S3URL instance at 0x7f9ab8642b90>)
      (D)s4cmd:402  >> S3APICALL head_object(Bucket='arn:aws:s3:eu-west-1:000000000000:accesspoint', Key='bucket/2/50591/50591_files/10/0_0.jpeg')
      (D)s4cmd:402  >> S3APICALL head_object(Bucket='arn:aws:s3:eu-west-1:000000000000:accesspoint', Key='bucket/2/50591/50591_files/0/0_0.jpeg')
      (E)s4cmd:183  [Exception] Parameter validation failed:
    Invalid bucket name "arn:aws:s3:eu-west-1:000000000000:accesspoint": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$"
      (E)s4cmd:183  [Exception] Parameter validation failed:
    Invalid bucket name "arn:aws:s3:eu-west-1:000000000000:accesspoint": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$"
      (E)s4cmd:183  [Exception] Parameter validation failed:
    Invalid bucket name "arn:aws:s3:eu-west-1:000000000000:accesspoint": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$"
      (E)s4cmd:183  [Exception] Parameter validation failed:
    Invalid bucket name "arn:aws:s3:eu-west-1:000000000000:accesspoint": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$"
    ^C  (E)s4cmd:183  [Runtime Failure] Interrupted by user
    
    opened by mikkelee 6
  • Unable to connect to s3: 'utf8' codec can't decode byte 0xef in position 8213: invalid continuation byte

    Hi, recently I'm getting this exception when calling s4cmd, any ideas? It was working few weeks ago, nothing was changed (packages installed, etc).

    [Runtime Exception] Unable to connect to s3: 'utf8' codec can't decode byte 0xef in position 8213: invalid continuation byte
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/s4cmd.py", line 1981, in <module>
        CommandHandler(opt).run(args)
      File "/usr/local/lib/python2.7/dist-packages/s4cmd.py", line 1605, in run
        CommandHandler.__dict__[cmd + '_handler'](self, args)
      File "/usr/local/lib/python2.7/dist-packages/s4cmd.py", line 134, in wrapper
        ret = func(*args, **kargs)
      File "/usr/local/lib/python2.7/dist-packages/s4cmd.py", line 1738, in dsync_handler
        self.s3handler().dsync_files(source, target, args)
      File "/usr/local/lib/python2.7/dist-packages/s4cmd.py", line 1611, in s3handler
        return S3Handler(self.opt)
      File "/usr/local/lib/python2.7/dist-packages/s4cmd.py", line 701, in __init__
        self.connect()
      File "/usr/local/lib/python2.7/dist-packages/s4cmd.py", line 715, in connect
        raise RetryFailure('Unable to connect to s3: %s' % e)
    RetryFailure: Unable to connect to s3: 'utf8' codec can't decode byte 0xef in position 8213: invalid continuation byte

    info-needed 
    opened by jackstuard 6
  • Adding arguments for some hardcoded defaults

    To achieve this, I restructured the option handling code, which involved restructuring the initialization code. Over all, this should do the same thing but the code should be more clear, with fewer repetitions when modifying the option handling code in the future. Over all, the code is shorter and more clear.

    Added arguments for SINGLEPART_UPLOAD_MAX, SINGLEPART_DOWNLOAD_MAX, DEFAULT_SPLIT, and RETRY_DELAY.

    opened by linsomniac 6
  • Fix for installation issues requiring sudo and no-access to etc/bash_completion.d

    • Create main method and update setup.py to call main as a script (which enables autocomplete)

    • Removed the now unnecessary entry in etc/bash_completion.d/


    Tested on a fresh VM and in virtualenv

    opened by navinpai 5
  • Cannot dsync s3 folders to local

    i get error messages, when try to dsync s3 folders(empty) to local.

    "[Runtime Failure] The obj "images" does not exists." The images is a folder in s3 and not exist in local.

    But, with sync not dsync operation it copies the folder.

    opened by dogenius01 5
  • Bump boto3 from 1.7.28 to 1.26.41

    Bumps boto3 from 1.7.28 to 1.26.41.

    Changelog

    Sourced from boto3's changelog.

    1.26.41

    • api-change:cloudfront: [botocore] Extend response headers policy to support removing headers from viewer responses
    • api-change:iotfleetwise: [botocore] Update documentation - correct the epoch constant value of default value for expiryTime field in CreateCampaign request.

    1.26.40

    • api-change:apigateway: [botocore] Documentation updates for Amazon API Gateway
    • api-change:emr: [botocore] Update emr client to latest version
    • api-change:secretsmanager: [botocore] Added owning service filter, include planned deletion flag, and next rotation date response parameter in ListSecrets.
    • api-change:wisdom: [botocore] This release extends Wisdom CreateContent and StartContentUpload APIs to support PDF and MicrosoftWord docx document uploading.

    1.26.39

    • api-change:elasticache: [botocore] This release allows you to modify the encryption in transit setting, for existing Redis clusters. You can now change the TLS configuration of your Redis clusters without the need to re-build or re-provision the clusters or impact application availability.
    • api-change:network-firewall: [botocore] AWS Network Firewall now provides status messages for firewalls to help you troubleshoot when your endpoint fails.
    • api-change:rds: [botocore] This release adds support for Custom Engine Version (CEV) on RDS Custom SQL Server.
    • api-change:route53-recovery-control-config: [botocore] Added support for Python paginators in the route53-recovery-control-config List* APIs.

    1.26.38

    • api-change:memorydb: [botocore] This release adds support for MemoryDB Reserved nodes which provides a significant discount compared to on-demand node pricing. Reserved nodes are not physical nodes, but rather a billing discount applied to the use of on-demand nodes in your account.
    • api-change:transfer: [botocore] Add additional operations to throw ThrottlingExceptions

    1.26.37

    • api-change:connect: [botocore] Support for Routing Profile filter, SortCriteria, and grouping by Routing Profiles for GetCurrentMetricData API. Support for RoutingProfiles, UserHierarchyGroups, and Agents as filters, NextStatus and AgentStatusName for GetCurrentUserData. Adds ApproximateTotalCount to both APIs.
    • api-change:connectparticipant: [botocore] Amazon Connect Chat introduces the Message Receipts feature. This feature allows agents and customers to receive message delivered and read receipts after they send a chat message.
    • api-change:detective: [botocore] This release adds a missed AccessDeniedException type to several endpoints.
    • api-change:fsx: [botocore] Fix a bug where a recent release might break certain existing SDKs.
    • api-change:inspector2: [botocore] Amazon Inspector adds support for scanning NodeJS 18.x and Go 1.x AWS Lambda function runtimes.

    1.26.36

    • api-change:compute-optimizer: [botocore] This release enables AWS Compute Optimizer to analyze and generate optimization recommendations for ecs services running on Fargate.
    • api-change:connect: [botocore] Amazon Connect Chat introduces the Idle Participant/Autodisconnect feature, which allows users to set timeouts relating to the activity of chat participants, using the new UpdateParticipantRoleConfig API.
    • api-change:iotdeviceadvisor: [botocore] This release adds the following new features: 1) Documentation updates for IoT Device Advisor APIs. 2) Updated required request parameters for IoT Device Advisor APIs. 3) Added new service feature: ability to provide the test endpoint when customer executing the StartSuiteRun API.
    • api-change:kinesis-video-webrtc-storage: [botocore] Amazon Kinesis Video Streams offers capabilities to stream video and audio in real-time via WebRTC to the cloud for storage, playback, and analytical processing. Customers can use our enhanced WebRTC SDK and cloud APIs to enable real-time streaming, as well as media ingestion to the cloud.
    • api-change:rds: [botocore] Add support for managing master user password in AWS Secrets Manager for the DBInstance and DBCluster.

    ... (truncated)

    Commits
    • 28046ba Merge branch 'release-1.26.41'
    • f425219 Bumping version to 1.26.41
    • 11e4d82 Add changelog entries from botocore
    • ac432c4 Merge branch 'release-1.26.40'
    • 7fa7102 Merge branch 'release-1.26.40' into develop
    • c4f0ff5 Bumping version to 1.26.40
    • e7a557d Add changelog entries from botocore
    • 1b89af9 Merge branch 'release-1.26.39'
    • 20cb544 Merge branch 'release-1.26.39' into develop
    • 73a0bb8 Bumping version to 1.26.39
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • Bump pytz from 2021.1 to 2022.7

    Bumps pytz from 2021.1 to 2022.7.

    Commits
    • 309a457 Update i18n section of README
    • 67b32d0 Separete legacy tests to run in legacy container
    • ce19dbe Bump version numbers to 2022.7/2022g
    • 7285e70 IANA 2022g
    • 3a52798 Squashed 'tz/' changes from d3dc2a9d6..9baf0d34d
    • 8656870 Let _all_timezones_unchecked be garbage collected when no longer needed
    • bd3e51f Rename all_timezones_unchecked to strongly indicate it is not public
    • 01592a9 Merge pull request #90 from eendebakpt/import_time_lazy_list
    • 5e9f112 lazy timezone
    • 4ebc28d Bump version numbers to 2022.6 / 2022f
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • Introduce aws env vars

    Introduce AWS environment variables to create a boto client. The list of environment variables are listed on this page. https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html#using-environment-variables

    usage AWS_PROFILE=ihsan-admin python s4cmd.py ls

    opened by icaliskanoglu 0
  • No progress if executed over ssh

    If I run s4cmd via ssh I see no progress, but if I go in and manually disable these lines:

    if not (sys.stdout.isatty() and sys.stderr.isatty()):
        return
    

    I see the progress just fine. Wondering if it would be possible to introduce a --progress option to force progress even if sys.stdout.isatty() is False.

    opened by jason-adnuntius 1
  • issues with '%' character in path

    I am having issues when the source or target s3 path contains a '%' character. Specifically, the error I am seeing is:

    "TypeError: not enough arguments for format string"

    I no longer have the full stack trace, but this is happening when message() is called with a single string argument that contains '%' (that are legitimately part of the source or target file path) and with no additional arguments.

    This issue was mentioned in issue #33 from 2016. The comment before that issue was closed correctly indicated that the problem was the use of call sites that "call message(str) instead of message('%s', str)" However there appears to still be calls to message(str) in the latest code. One example is: https://github.com/bloomreach/s4cmd/blob/7191685030a46112a0df62fff7f27970c291fab1/s4cmd.py#L1463

    Would you prefer to confirm and fix this throughout the codebase or should I open a PR?

    opened by dlaflamme 0