Powerful Python library for atomic file writes.

Overview

python-atomicwrites

https://travis-ci.com/untitaker/python-atomicwrites.svg?branch=master https://ci.appveyor.com/api/projects/status/vadc4le3c27to59x/branch/master?svg=true Documentation Status

Atomic file writes.

from atomicwrites import atomic_write

with atomic_write('foo.txt', overwrite=True) as f:
    f.write('Hello world.')
    # "foo.txt" doesn't exist yet.

# Now it does.

See API documentation for more low-level interfaces.

Features that distinguish it from other similar libraries (see Alternatives and Credit):

  • Race-free assertion that the target file doesn't yet exist. This can be controlled with the overwrite parameter.

  • Windows support, although not well-tested. The MSDN resources are not very explicit about which operations are atomic. I'm basing my assumptions off a comment by Doug Crook, who appears to be a Microsoft employee:

    Question: Is MoveFileEx atomic if the existing and new files are both on the same drive?

    The simple answer is "usually, but in some cases it will silently fall-back to a non-atomic method, so don't count on it".

    The implementation of MoveFileEx looks something like this: [...]

    The problem is if the rename fails, you might end up with a CopyFile, which is definitely not atomic.

    If you really need atomic-or-nothing, you can try calling NtSetInformationFile, which is unsupported but is much more likely to be atomic.

  • Simple high-level API that wraps a very flexible class-based API.

  • Consistent error handling across platforms.

How it works

It uses a temporary file in the same directory as the given path. This ensures that the temporary file resides on the same filesystem.

The temporary file will then be atomically moved to the target location: On POSIX, it will use rename if files should be overwritten, otherwise a combination of link and unlink. On Windows, it uses MoveFileEx through stdlib's ctypes with the appropriate flags.

Note that with link and unlink, there's a timewindow where the file might be available under two entries in the filesystem: The name of the temporary file, and the name of the target file.

Also note that the permissions of the target file may change this way. In some situations a chmod can be issued without any concurrency problems, but since that is not always the case, this library doesn't do it by itself.

fsync

On POSIX, fsync is invoked on the temporary file after it is written (to flush file content and metadata), and on the parent directory after the file is moved (to flush filename).

fsync does not take care of disks' internal buffers, but there don't seem to be any standard POSIX APIs for that. On OS X, fcntl is used with F_FULLFSYNC instead of fsync for that reason.

On Windows, _commit is used, but there are no guarantees about disk internal buffers.

Alternatives and Credit

Atomicwrites is directly inspired by the following libraries (and shares a minimal amount of code):

Other alternatives to atomicwrites include:

  • sashka/atomicfile. Originally I considered using that, but at the time it was lacking a lot of features I needed (Windows support, overwrite-parameter, overriding behavior through subclassing).
  • The Boltons library collection features a class for atomic file writes, which seems to have a very similar overwrite parameter. It is lacking Windows support though.

License

Licensed under the MIT, see LICENSE.

Comments
  • Passes kwargs through AtomicWriter and uses io.open as opener

    Passes kwargs through AtomicWriter and uses io.open as opener

    This enables users to pass any option supported by tempfile.NamedTemporaryFile() to atomic_write(). However, mode, dir, and delete continue to be overwritten as they were previously, to enforce the atomic write behavior.

    Let me know if this approach is alright. I'm happy to adjust as needed.

    Fixes #37

    opened by lorengordon 18
  • Add mypy type annotations

    Add mypy type annotations

    So, I've been playing with the MyPy static type checker lately, and I decided spend some downtime adding type annotations to Python projects I've worked on.

    I had to change the logic a bit in order to make the code type check successfully, mainly by more eagerly converting a "unicode or bytes" path to unambiguously unicode when the object is constructed.

    I also made it so that tox -e py-typecheck will run the type checker.

    opened by DarwinAwardWinner 16
  • Use os.replace instead of os.rename for python 3.3+

    Use os.replace instead of os.rename for python 3.3+

    I'm using this lib on Windows 10 with no issues so far, but I'm affraid os.rename, though not well documented for Windows, isn't atomic. As per the documentation:

    If you want cross-platform overwriting of the destination, use replace(). https://docs.python.org/3/library/os.html#os.rename

    So I suggest something along the lines of:

    def _replace_atomic(src, dst):
            try:
                    os.replace(src, dst)
            except:
                    os.rename(src, dst)
            _sync_directory(os.path.normpath(os.path.dirname(dst)))
    

    Ill submit a pull request and I would appreciate if you could update your pip entry. Thanks

    opened by dhulke 15
  • Auto deleting outdated tmp files

    Auto deleting outdated tmp files

    Primitive approach implemented here is to fix outstanding temporary files on Windows - it is especially noticeable for long-running processes temporarily saving some state every couple minutes.

    opened by Girgitt 8
  • Restore ability to use tempfile kwargs other than dir

    Restore ability to use tempfile kwargs other than dir

    The changes in #38 broke the ability to use prefix, suffix and bufsize arguments when creating the tempfile. This pull request restores the ability, and is intended to fix #39

    opened by GlenWalker 7
  • Add `atomic_folder`

    Add `atomic_folder`

    Sometimes I have a bunch of operations which create many files in a folder. The folder should only be there in the end of the operation. Something like this:

    with atomic_folder("foo") as folder:
        unzip(something, folder)
        # "foo/" doesn't exist yet.
    # now it does
    
    enhancement help wanted 
    opened by sotte 7
  • fsync after rename/move

    fsync after rename/move

    On POSIX it's possible to fsync the directory, to ensure that the new file names are written to disk. I think on Windows it's necessary to open the new (target) file and fsync it.

    opened by Unrud 6
  • Proper fsync on OS X

    Proper fsync on OS X

    I think this is missing

    if hasattr(fcntl, "F_FULLFSYNC"):
        fcntl.fcntl(f.fileno(), fcntl.F_FULLFSYNC)
    else:
        os.fsync(f.fileno())
    

    on OSX.

    As mentioned here [0] and here [1] fsync doesn't wait for the changes to hit the disk there.

    [0] https://lists.apple.com/archives/darwin-dev/2005/Feb/msg00072.html [1] https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/fsync.2.html#//apple_ref/doc/man/2/fsync

    enhancement 
    opened by lazka 6
  • Add python 3.5 and 3.6 to CI

    Add python 3.5 and 3.6 to CI

    Fun fact: flake was set to run with python 2.7 and 3.5, but 3.5 was not in the build matrix.

    Also did some simplification, since we could remove some conditionals, variable, etc.

    opened by WhyNotHugo 5
  • _open except block accidentally raises the inner exception

    _open except block accidentally raises the inner exception

    I'm talking specifically about this code:

            except:
                try:
                    self.rollback(f)
                except Exception:
                    pass
                raise
    

    I managed to hit a corner case where rollback throws an exception (f was None at that point), so control flow reached the raise statement on the last line. For some reason, this raised the inner exception, not the outer one.

    I can submit a patch to fix this: would you prefer a fix that uses the three-argument raise statement, which only works with Python 2, or should I add a dependency on six, and then call six.reraise, so it works with both 2 and 3?

    bug 
    opened by sruggier 5
  • `pathlib.Path` support

    `pathlib.Path` support

    Would it be possible to support passing in pathlib.Path for the path atomic_write(path)?

    I noticed that doing so works fine for PosixPaths, but fails for WindowsPaths from this CI run. This can result in someone breaking cross-platform support without being obvious. I notice that the test follows the same solution I went with where you just cast the path to a str.

    Allowing atomic_write to accept Paths also helps atomic_write behave more like open as well which also accepts Paths.

    I haven't delved too far, but the fix might be as simple as

    try:
        from pathlib import Path
        if isinstance(path, Path):
            path = str(path)
    except ImportError:
        # Just assume `path` is already a `str`
        pass
    
    opened by LovecraftianHorror 4
  • atomicwrites' old versions have been purged from pypi

    atomicwrites' old versions have been purged from pypi

    Screenshot 2022-07-08 at 19 50 03

    pypi just told me i had to enable 2fa to keep uploading this package. because I thought that was an annoying and entitled move in order to guarantee SOC2 compliance for a handful of companies (at the expense of my free time), i deleted the package and published a new version, just to see if the warning disappears. it did, so that's great.

    what i didn't consider is that this would delete old versions. those are apparently now gone and yet it's apparently not possible for me to re-upload them. i don't think that's sensible behavior by pypi, but either way i'm sorry about that. the API has been the same since the first release anyway.

    opened by untitaker 2
  • add path_generator

    add path_generator

    I have a python script that outputs file(s) to a specific directory and may be running on multiple computers within a cluster. The output files should all be named like name.#.csv where # is an increasing number: 1, 2, 3, etc. I don't want any files to be overwritten, but I also don't want the write to fail. Instead, it should just increment the file number until the write succeeds. I originally used a dumb loop when writing the file to find the next available file to use, but I'd run into race conditions with many processes running simultaneously and files would somewhat frequently get overwritten.

    My solution was the edit I've added here as PR. Basically it adds a "path generator" that should be a generator which sequentially returns the next filename to try. This is flexible, as the user could make the generator return paths based on something entirely different than a sequential numbering. With this, my script works great and I haven't seen anymore files getting overwritten, so I thought I'd add a PR in case others find it useful. If not, feel free to just close this!

    Here is an example of how I use it:

    import itertools
    from atomicwrites import AtomicWriter
    
    def file_incrementor(base_filename):
        for i in itertools.count(1):
            # change i to 1 to see error raised by library to avoid infinite loop
            yield base_filename.format(i)
    
    writer = AtomicWriter(path='name.{}.csv', path_generator=file_incrementor)
    with writer.open() as output:
        # calculations with intermittent writes to output
        output.write('example text')
    # it also saves the actual path that was saved (regardless of whether path_generator was used)
    print(writer.final_path)
    
    opened by Spectre5 7
  • option to preserve the permissions

    option to preserve the permissions

    When I overwrite a file, the permissions of the file are changed:

    $ touch foo
    
    $ chmod 0123 foo
    
    $ stat foo | sed -n 's/^\(Access.*\)Uid.*$/\1/p'
    Access: (0123/---x-w--wx)  
    
    $ python3
    Python 3.7.3rc1 (default, Mar 13 2019, 11:01:15) 
    [GCC 8.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from atomicwrites import atomic_write
    >>> with atomic_write('foo', overwrite=True) as f:
    ...  f.write('foo')
    ... 
    3
    >>> 
    
    $ stat foo | sed -n 's/^\(Access.*\)Uid.*$/\1/p'
    Access: (0600/-rw-------)  
    

    The normal non-atomic method of overwriting a file does not change the mode:

    $ touch foo
    
    $ chmod 0777 foo
    
    $ stat foo | sed -n 's/^\(Access.*\)Uid.*$/\1/p'
    Access: (0777/-rwxrwxrwx)  
    
    $ python3
    Python 3.7.3rc1 (default, Mar 13 2019, 11:01:15) 
    [GCC 8.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> with open('foo', 'w') as f:
    ...  f.write('foo')
    ... 
    3
    >>> 
    
    $ stat foo | sed -n 's/^\(Access.*\)Uid.*$/\1/p'
    Access: (0777/-rwxrwxrwx)  
    

    It would be nice to have an option to preserve the mode of the original file.

    opened by pabs3 8
  • Option to disable fsync

    Option to disable fsync

    Unsure if anybody needs this. I might need this because in one usecase I'm writing a lot of files (to different filenames), and only need a guarantee that a SIGKILL won't leave a partially written file (at the target location, tmpfiles are irrelevant).

    Also this might be a problem with SSDs, as mentioned in #6

    opened by untitaker 16
Owner
Markus Unterwaditzer
"Do not even think of telephoning me about this program. Send cash first!" --Author of the UNIX file command.
Markus Unterwaditzer
A python script to convert an ucompressed Gnucash XML file to a text file for Ledger and hledger.

README 1 gnucash2ledger gnucash2ledger is a Python script based on the Github Gist by nonducor (nonducor/gcash2ledger.py). This Python script will tak

Thomas Freeman 0 Jan 28, 2022
Python package to read and display segregated file names present in a directory based on type of the file

tpyfilestructure Python package to read and display segregated file names present in a directory based on type of the file. Installation You can insta

Tharun Kumar T 2 Nov 28, 2021
File-manager - A basic file manager, written in Python

File Manager A basic file manager, written in Python. Installation Install Pytho

Samuel Ko 1 Feb 5, 2022
A simple Python code that takes input from a csv file and makes it into a vcf file.

Contacts-Maker A simple Python code that takes input from a csv file and makes it into a vcf file. Imagine a college or a large community where each y

null 1 Feb 13, 2022
gitfs is a FUSE file system that fully integrates with git - Version controlled file system

gitfs is a FUSE file system that fully integrates with git. You can mount a remote repository's branch locally, and any subsequent changes made to the files will be automatically committed to the remote.

Presslabs 2.3k Jan 8, 2023
This is a file deletion program that asks you for an extension of a file (.mp3, .pdf, .docx, etc.) to delete all of the files in a dir that have that extension.

FileBulk This is a file deletion program that asks you for an extension of a file (.mp3, .pdf, .docx, etc.) to delete all of the files in a dir that h

Enoc Mena 1 Jun 26, 2022
Search for files under the specified directory. Extract the file name and file path and import them as data.

Search for files under the specified directory. Extract the file name and file path and import them as data. Based on that, search for the file, select it and open it.

G-jon FujiYama 2 Jan 10, 2022
Small-File-Explorer - I coded a small file explorer with several options

Petit explorateur de fichier / Small file explorer Pour la première option (création de répertoire) / For the first option (creation of a directory) e

Xerox 1 Jan 3, 2022
Pti-file-format - Reverse engineering the Polyend Tracker instrument file format

pti-file-format Reverse engineering the Polyend Tracker instrument file format.

Jaap Roes 14 Dec 30, 2022
Generates a clean .txt file of contents of a 3 lined csv file

Generates a clean .txt file of contents of a 3 lined csv file. File contents is the .gml file of some function which stores the contents of the csv as a map.

Alex Eckardt 1 Jan 9, 2022
PaddingZip - a tool that you can craft a zip file that contains the padding characters between the file content.

PaddingZip - a tool that you can craft a zip file that contains the padding characters between the file content.

phithon 53 Nov 7, 2022
Extract longest transcript or longest CDS transcript from GTF annotation file or gencode transcripts fasta file.

Extract longest transcript or longest CDS transcript from GTF annotation file or gencode transcripts fasta file.

laojunjun 13 Nov 23, 2022
Two scripts help you to convert csv file to md file by template

Two scripts help you to convert csv file to md file by template. One help you generate multiple md files with different filenames from the first colume of csv file. Another can generate one md file with several blocks.

null 2 Oct 15, 2022
Simple, convenient and cross-platform file date changing library. 📝📅

Simple, convenient and cross-platform file date changing library.

kubinka0505 15 Dec 18, 2022
Python Fstab Generator is a small Python script to write and generate /etc/fstab files based on yaml file on Unix-like systems.

PyFstab Generator PyFstab Generator is a small Python script to write and generate /etc/fstab files based on yaml file on Unix-like systems. NOTE : Th

Mahdi 2 Nov 9, 2021
An object-oriented approach to Python file/directory operations.

Unipath An object-oriented approach to file/directory operations Version: 1.1 Home page: https://github.com/mikeorr/Unipath Docs: https://github.com/m

Mike Orr 506 Dec 29, 2022
A platform independent file lock for Python

py-filelock This package contains a single module, which implements a platform independent file lock in Python, which provides a simple way of inter-p

Benedikt Schmitt 497 Jan 5, 2023
Simple Python File Manager

This script lets you automatically relocate files based on their extensions. Very useful from the downloads folder !

Aimé Risson 22 Dec 27, 2022
Python function to stream unzip all the files in a ZIP archive: without loading the entire ZIP file or any of its files into memory at once

Python function to stream unzip all the files in a ZIP archive: without loading the entire ZIP file or any of its files into memory at once

Department for International Trade 206 Jan 2, 2023