Python function to construct a ZIP archive on the fly - without having to store the entire ZIP in memory or disk

Overview

stream-zip

Python function to construct a ZIP archive on the fly - without having to store the entire ZIP in memory or disk. This is useful in memory-constrained environments, or when you would like to start returning compressed data before you've even retrieved all the uncompressed data. Generating ZIPs on-demand in a web server is a typical use case for stream-zip.

Offers similar functionality to zipfly, but with a different API, and does not use Python's zipfile module under the hood.

To unZIP files on the fly try stream-unzip.

Installation

pip install stream-zip

Usage

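from datetime import datetime
from stream_zip import stream_zip

def unzipped_files():
    # Metadata shared by both member files
    modified_at = datetime.now()
    perms = 0o600

    # Each member's contents is an iterable that yields bytes
    def file_1_data():
        yield b'Some bytes'

    def file_2_data():
        yield b'Some bytes'

    # One tuple per member file: name, modification time, permissions, contents
    yield 'my-file-1.txt', modified_at, perms, file_1_data()
    yield 'my-file-2.txt', modified_at, perms, file_2_data()

# The ZIP is produced lazily, one chunk of bytes at a time
for zipped_chunk in stream_zip(unzipped_files()):
    print(zipped_chunk)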

Limitations

It's not possible to completely stream-write ZIP files. Small bits of metadata for each member file, such as its name, must be placed at the end of the ZIP. In order to do this, stream-zip buffers this metadata in memory until it can be output.
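
The metadata at the end of a ZIP is its central directory. As a rough illustration of that layout, the sketch below uses Python's standard zipfile module (not stream-zip) to build a small archive in memory and then locates the central directory through the end-of-central-directory record. The helper name central_directory_info is hypothetical, and the parsing assumes a small archive with no archive comment and no Zip64 records.

```python
import io
import struct
import zipfile

def central_directory_info(zip_bytes):
    # The end-of-central-directory (EOCD) record starts with PK\x05\x06.
    # Assumption: no archive comment, so the record is the final 22 bytes.
    eocd_offset = zip_bytes.rfind(b'PK\x05\x06')
    # Fields: signature, two disk numbers, two entry counts,
    # central directory size, central directory offset, comment length
    (_, _, _, _, entries, cd_size, cd_offset, _) = struct.unpack(
        '<4sHHHHIIH', zip_bytes[eocd_offset:eocd_offset + 22])
    return entries, cd_offset, cd_size, eocd_offset

# Build a two-member archive in memory with the standard library
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('my-file-1.txt', b'Some bytes')
    archive.writestr('my-file-2.txt', b'Some bytes')

zip_bytes = buffer.getvalue()
entries, cd_offset, cd_size, eocd_offset = central_directory_info(zip_bytes)

# The member names are repeated in the central directory, which sits
# after all member data and directly before the EOCD record
names_at_end = zip_bytes[cd_offset:eocd_offset]
```

Because one central directory entry is needed per member file, any streaming writer has to remember a small amount of per-member metadata until every member has been written.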

Comments
  • Data descriptor for no compressed file

    For the limitation:

    No compression is supported via the NO_COMPRESSION_* constants as in the above examples. However in these cases the entire contents of each uncompressed file is buffered in memory, and so should not be used for large files. This is because for uncompressed data, its size and CRC32 must be before it in the ZIP file.

    To avoid buffering all contents, I'm wondering if we can add a data descriptor after uncompressed data as well, so we can simply set the size and CRC32 to 0 in the local file header, just like what we do for compressed files. From the original cpython zipfile implementation, it seems the choice depends only on whether the output is seekable:

    https://github.com/python/cpython/blob/96b344c2f15cb09251018f57f19643fe20637392/Lib/zipfile.py#L1610-L1611

    So I guess this structure should also be valid?

    opened by ArchangelSDY 6
  • Calculate final zip size before starting

    Hi, first of all thanks for the great work! Is there a way to know the size of the compressed ZIP before starting? An example use case could be the "Content-Length" header in servers, but I'd need it in some other scenarios too (even no-compression would be okay in this case). Thanks :)

    opened by TaToTanWeb 4
  • Ability to configure chunk size

    Thanks so much for building this package. It is incredibly useful.

    One feature that would make this more powerful, in my opinion, is to be able to configure the chunk size you receive from the generator. I noticed that it provides chunks of 65,536 bytes.

    The reason is that this tool is particularly useful when uploading zip files by way of multipart uploads. I am personally working on a project that needs to upload upwards of terabyte-sized directories to S3, and I generate URLs for each gigabyte that will be uploaded. I had to concatenate the chunks together until they were a gigabyte before I could run the upload on each multipart URL. This wasn't challenging, but I think it'd be a nice interface improvement.

    Maybe there's something I missed in the documentation that does allow for something like this. Please keep me posted.

    Thanks again!

    opened by alec-bell 3
  • feat: no need for local zip64 extra when streaming

    This was added in https://github.com/uktrade/stream-zip/commit/62f8cf1ee242f7d9fe1467fe9a93b20164764068, but it might not really have ever been needed, so removing it saves a few bytes. And stronger than that, https://github.com/libarchive/libarchive/issues/1834 gives an argument that it even makes the ZIP file invalid.

    To give even more weight to the argument that this is ok, I made a few Zip64 files with data descriptors with InfoZIP. They did not have local Zip64 extra fields. If this were a problem, it probably would have been discovered by now.

    opened by michalc 0
  • feat: allow specification of zlib compressobj

    This I think is a reasonable way of addressing the need for large files that should both not be buffered in memory, and not compressed, as reported in https://github.com/uktrade/stream-zip/issues/17

    opened by michalc 0
  • feat!: rename constants

    This is for clarity, and to allow in future having both NO_COMPRESSION_32 and NO_COMPRESSION_64, to allow clients to fully specify that they do not want any _64 code data.

    Note that LibreOffice doesn't support zip64 at the time of writing, so this is still an issue. This is not just for "legacy" software.

    opened by michalc 0
  • fix: include zip64 local extra header

    Suspect it's possible that some clients don't use sizes of 0xffffffff to activate zip64 mode, but actually use the presence of the zip64 extra field. This is so they would be still able to open files with sizes exactly equal to 0xffffffff created by older zip software.

    opened by michalc 0
  • feat: raise exceptions if sizes would cause overflow

    Realistically wouldn't hit the limits on ZIP64, but it's much more possible to hit the ZIP limits, and so offer a "nicer" Exception interface for this case.

    opened by michalc 0
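
On the chunk size comment above: the regrouping the commenter describes, concatenating chunks until they reach a target size, can be done with a small generic generator wrapped around any iterable of bytes. The sketch below is not part of stream-zip, and the name rechunk is hypothetical. It yields chunks of exactly the requested size, except possibly the last one.

```python
def rechunk(chunks, size):
    # Regroup an iterable of bytes objects into chunks of exactly `size`
    # bytes, except possibly the last. Only about one output chunk plus
    # one input chunk is held in memory at a time.
    buffered = []
    buffered_len = 0
    for chunk in chunks:
        view = memoryview(chunk)  # allows slicing without copying
        while len(view):
            take = min(size - buffered_len, len(view))
            buffered.append(view[:take])
            buffered_len += take
            view = view[take:]
            if buffered_len == size:
                yield b''.join(buffered)
                buffered = []
                buffered_len = 0
    # Emit whatever is left over as a final, shorter chunk
    if buffered:
        yield b''.join(buffered)

# Small demonstration with a 4-byte target size
source = [b'abc', b'defgh', b'', b'ij']
regrouped = list(rechunk(source, 4))
```

For the multipart upload case in that comment, the same helper could be wrapped around the output of stream_zip with the part size as the target, so that each yielded chunk maps to one upload part.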
Owner
Department for International Trade
Extract an archive file (zip file or tar file) stored on AWS S3

S3 Extract Extract an archive file (zip file or tar file) stored on AWS S3. Details Downloads archive from S3 into memory, then extract and re-upload

Evan 1 Dec 14, 2021
Remove [x]_ from StudIP zip Archives and archive_filelist.csv completely

This tool removes the "[x]_" at the beginning of StudIP zip Archives. It also deletes the "archive_filelist.csv" file

Kelke vl 1 Jan 19, 2022
PaddingZip - a tool that you can craft a zip file that contains the padding characters between the file content.

phithon 53 Nov 7, 2022
Quick and dirty FAT12 filesystem to ZIP file converter

Quick and Dirty FAT12 Filesystem Converter This is a really crappy Python script I wrote to convert a semi-compatible FAT12 filesystem from my HP150's

Tube Time 2 Feb 12, 2022
Kartothek - a Python library to manage large amounts of tabular data in a blob store

Kartothek - a Python library to manage (create, read, update, delete) large amounts of tabular data in a blob store

null 15 Dec 25, 2022
Python Fstab Generator is a small Python script to write and generate /etc/fstab files based on yaml file on Unix-like systems.

PyFstab Generator PyFstab Generator is a small Python script to write and generate /etc/fstab files based on yaml file on Unix-like systems. NOTE : Th

Mahdi 2 Nov 9, 2021
Python's Filesystem abstraction layer

PyFilesystem2 Python's Filesystem abstraction layer. Documentation Wiki API Documentation GitHub Repository Blog Introduction Think of PyFilesystem's

pyFilesystem 1.8k Jan 2, 2023
A python wrapper for libmagic

python-magic python-magic is a Python interface to the libmagic file type identification library. libmagic identifies file types by checking their hea

Adam Hupp 2.3k Dec 29, 2022
An object-oriented approach to Python file/directory operations.

Unipath An object-oriented approach to file/directory operations Version: 1.1 Home page: https://github.com/mikeorr/Unipath Docs: https://github.com/m

Mike Orr 506 Dec 29, 2022
Python library and shell utilities to monitor filesystem events.

Watchdog Python API and shell utilities to monitor file system events. Works on 3.6+. If you want to use Python 2.6, you should stick with watchdog <

Yesudeep Mangalapilly 5.6k Jan 4, 2023
A small Python module for determining appropriate platform-specific dirs, e.g. a "user data dir".

the problem What directory should your app use for storing user data? If running on macOS, you should use: ~/Library/Application Support/<AppName> If

ActiveState Software 948 Dec 31, 2022
Better directory iterator and faster os.walk(), now in the Python 3.5 stdlib

scandir, a better directory iterator and faster os.walk() scandir() is a directory iteration function like os.listdir(), except that instead of return

Ben Hoyt 506 Dec 29, 2022
A platform independent file lock for Python

py-filelock This package contains a single module, which implements a platform independent file lock in Python, which provides a simple way of inter-p

Benedikt Schmitt 497 Jan 5, 2023
Simple Python File Manager

This script lets you automatically relocate files based on their extensions. Very useful from the downloads folder !

Aimé Risson 22 Dec 27, 2022
Vericopy - This Python script provides various usage modes for secure local file copying and hashing.

Vericopy This Python script provides various usage modes for secure local file copying and hashing. Hash data is captured and logged for paths before

null 15 Nov 5, 2022
pydicom - Read, modify and write DICOM files with python code

pydicom is a pure Python package for working with DICOM files. It lets you read, modify and write DICOM data in an easy "pythonic" way.

DICOM in Python 1.5k Jan 4, 2023
A simple file sharing tool written in python

Share it A simple file sharing tool written in python Installation If you are using Windows os you can directly Run .exe file --> download If you are

Sachit Yadav 7 Dec 16, 2022
Python virtual filesystem for SQLite to read from and write to S3

Department for International Trade 70 Jan 4, 2023
fast change directory with python and ruby

fcdir fast change directory with python and ruby run run python script , chose drirectoy and change your directory need you need python and ruby deskt

XCO 2 Jun 20, 2022