A proof-of-concept implementation of a parallel-decodable PNG format

Overview

mtpng

A parallelized PNG encoder in Rust

by Brion Vibber

Background

Compressing PNG files is a relatively slow operation at large image sizes, and can take from half a second to over a second for 4K resolution and beyond. See my blog post series on the subject for more details.

The biggest CPU costs in traditional libpng seem to be the filtering, which is easy to parallelize, and the deflate compression, which can be parallelized in chunks at a slight loss of compression between block boundaries.
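To illustrate why filtering parallelizes so easily: each PNG filter computes its output only from the raw bytes of the current row and the raw row above it, never from previously filtered output, so any row can be filtered as soon as its neighbor's input is available. The sketch below implements the standard Sub and Up filters as defined in the PNG specification. The function names are made up for this example and are not mtpng's internal API.

```rust
/// Sub filter: each byte minus the byte `bpp` positions to its left.
/// Bytes before the start of the row count as zero; arithmetic wraps mod 256.
pub fn filter_sub(row: &[u8], bpp: usize) -> Vec<u8> {
    row.iter()
        .enumerate()
        .map(|(i, &b)| {
            let left = if i >= bpp { row[i - bpp] } else { 0 };
            b.wrapping_sub(left)
        })
        .collect()
}

/// Up filter: each byte minus the byte directly above it in the previous
/// *raw* (unfiltered) row.
pub fn filter_up(row: &[u8], prev: &[u8]) -> Vec<u8> {
    assert_eq!(row.len(), prev.len(), "rows must have equal length");
    row.iter()
        .zip(prev)
        .map(|(&b, &up)| b.wrapping_sub(up))
        .collect()
}
```

Because neither function reads filtered output from another row, a block of rows can be handed to a worker thread together with just the one raw row that precedes it.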

pigz is a well-known C implementation of parallelized deflate/gzip compression, and was a strong inspiration for the chunking scheme used here.

I was also inspired by an experimental C++/OpenMP project called png-parallel by Pascal Beyeler, which didn't implement filtering but confirmed the basic theory.

State

Creates correct files in all color formats (input must be pre-packed). Performs well on large files, but needs work for small files and ancillary chunks. API stability is planned but not there yet; things will change before 1.0.

Goals

Performance:

  • ☑️ MUST be faster than libpng when multi-threaded
  • ☑️ SHOULD be as fast as or faster than libpng when single-threaded

Functionality:

  • ☑️ MUST support all standard color types and depths
  • ☑️ MUST support all standard filter modes
  • ☑️ MUST compress within a few percent as well as libpng
  • MAY achieve better compression than libpng, but MUST NOT do so at the cost of performance
  • ☑️ SHOULD support streaming output
  • MAY support interlacing

Compatibility:

  • MUST have a good Rust API (in progress)
  • MUST have a good C API (in progress)
  • ☑️ MUST work on Linux x86, x86_64
  • ☑️ MUST work on Linux arm, arm64
  • ☑️ SHOULD work on macOS x86_64
  • ☑️ SHOULD work on iOS arm64
  • ☑️ SHOULD work on Windows x86, x86_64
  • ☑️ SHOULD work on Windows arm64

Compression

At the current default 256 KiB chunk size, the compression ratio is a tiny fraction worse than libpng on the dual-4K screenshot and the arch photo; the gap narrows as the chunk size increases.

Using a smaller chunk size, or enabling streaming mode, will increase the file size slightly more in exchange for greater parallelism (small chunks) and lower latency to bytes hitting the wire (streaming).
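As a back-of-the-envelope illustration of this trade-off, the helper below estimates how many independent chunks an image yields for a given chunk size. It is a sketch under stated assumptions: it counts one extra filter-type byte per row (as the PNG format requires) and splits purely by byte count, whereas the real encoder may round chunks to whole rows. The function name is hypothetical.

```rust
/// Estimate the number of compression chunks for an image.
/// Assumes chunks are split by byte count only (an approximation).
pub fn chunk_count(width: u64, height: u64, bytes_per_pixel: u64, chunk_size: u64) -> u64 {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    // Each row carries one leading filter-type byte in the PNG data stream.
    let stride = width * bytes_per_pixel + 1;
    let total = stride * height;
    // Ceiling division: a partial final chunk still counts as a chunk.
    (total + chunk_size - 1) / chunk_size
}
```

Assuming "dual-4K" means 7680x2160 at 8-bit RGBA, `chunk_count(7680, 2160, 4, 256 * 1024)` gives 254 chunks, which is plenty of parallel work. A 100x100 RGBA image fits in a single chunk, so there is nothing to run in parallel.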

In 0.3.5 a correction was made to the filter heuristic algorithm to match libpng in some circumstances where it previously differed; when mtpng is used as a drop-in replacement, results should now be very similar to libpng's. Later research may involve changing the heuristic, as it fails to correctly predict good performance of the "none" filter on many screenshot-style true color images.
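For readers unfamiliar with the heuristic in question: libpng's adaptive filtering is commonly described as "minimum sum of absolute differences". Each candidate filter is applied to a row, the output bytes are interpreted as signed values, and the filter with the smallest sum of absolute values wins. The sketch below shows that scoring idea only; the function names are hypothetical, and this is not a copy of either libpng's or mtpng's code.

```rust
/// Score a filtered row: sum of absolute values of the bytes read as signed
/// 8-bit integers (so 255 counts as 1, 128 counts as 128).
pub fn filter_score(filtered: &[u8]) -> u64 {
    filtered
        .iter()
        .map(|&b| u64::from((b as i8).unsigned_abs()))
        .sum()
}

/// Return the index of the candidate row with the lowest score,
/// or None if there are no candidates. Ties go to the earliest candidate.
pub fn pick_filter(candidates: &[Vec<u8>]) -> Option<usize> {
    candidates
        .iter()
        .enumerate()
        .min_by_key(|(_, row)| filter_score(row))
        .map(|(index, _)| index)
}
```

The score is only a proxy for how well deflate will do. A row of repeated large byte values scores badly unfiltered even though deflate handles long repeats well, which is one plausible reason the heuristic underrates the "none" filter on flat screenshot content.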

Performance

Note that unoptimized debug builds are about 50x slower than optimized release builds. Always run with --release!

As of September 26, 2018 with Rust 1.29.0, single-threaded performance is ~30-40% faster than libpng when saving the same dual-4K screenshot sample image on Linux and macOS x86_64. Using multiple threads consistently beats libpng by a wide margin, and scales reasonably well to at least 8 physical cores.

See docs/perf.md for informal benchmarks on various devices.

At the default settings, files whose uncompressed data is less than 128 KiB will not see any multi-threading gains, but may still run faster than libpng due to faster filtering.

Todos

See the projects list on GitHub for active details.

Usage

Note: the Rust and C APIs are not yet stable, and will change before 1.0.

Rust usage

See the crate API docs for details.

The mtpng CLI tool can be used as an example of writing files.

In short, something like this:

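use mtpng::{ColorType, Header};
use mtpng::encoder::{Encoder, Options};

// Output sink; it is moved into the encoder below.
let writer = Vec::<u8>::new();

let mut header = Header::new();
header.set_size(640, 480)?;
header.set_color(ColorType::TruecolorAlpha, 8)?;

// Default options; declare as `let mut` if you call setters on it.
let options = Options::new();

let mut encoder = Encoder::new(writer, &options);

encoder.write_header(&header)?;
// `data` must hold pre-packed rows: 640 * 480 * 4 bytes for this header.
encoder.write_image_rows(&data)?;
// Flush any pending work and complete the file.
encoder.finish()?;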

C usage

See c/mtpng.h for a C header file which connects to unsafe-Rust wrapper functions in the mtpng::capi module.

To build the C sample on Linux or macOS, run make. On Windows, run build-win.bat x64 for an x86-64 native build, or pass x86 or arm64 to build for those platforms.

These will build a sample executable from sample.c as well as a libmtpng.so, libmtpng.dylib, or mtpng.dll for it to link against. Running the sample produces an output file in out/csample.png.

Data flow

Encoding can be broken into many parallel blocks:

Encoder data flow diagram

Decoding cannot; it must be run as a stream, but can pipeline (not yet implemented):

Decoder data flow diagram
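The encoder side of this flow can be sketched with only the standard library: split the input into blocks, process each block on its own thread, and collect the results in input order. This is an illustration under loud assumptions. `process_chunk` is a stand-in that merely delta-encodes bytes in place of real filtering and deflate, both function names are hypothetical, and the actual encoder schedules work on a Rayon thread pool rather than spawning one thread per block.

```rust
use std::thread;

/// Stand-in for the per-block work (filter + deflate in a real encoder).
/// Here: simple delta encoding within the block, so blocks stay independent.
fn process_chunk(chunk: &[u8]) -> Vec<u8> {
    let mut prev = 0u8;
    chunk
        .iter()
        .map(|&b| {
            let out = b.wrapping_sub(prev);
            prev = b;
            out
        })
        .collect()
}

/// Process all blocks in parallel and return the outputs in input order.
pub fn encode_parallel(data: &[u8], chunk_size: usize) -> Vec<Vec<u8>> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    thread::scope(|scope| {
        // Spawn one scoped thread per block; scoped threads may borrow `data`.
        let handles: Vec<_> = data
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || process_chunk(chunk)))
            .collect();
        // Joining in spawn order keeps the output ordered like the input,
        // regardless of which thread finishes first.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

The ordering step is the important design point: blocks may finish in any order, but the output stream must be written in the original order.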

Dependencies

Rayon is used for its ThreadPool implementation. You can create an encoder using either the default Rayon global pool or a custom ThreadPool instance.

crc is used for calculating PNG chunk checksums.
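For context on what those checksums cover: every PNG chunk is laid out as a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 computed over the type and data (not the length). The sketch below uses a slow bit-by-bit CRC-32 to stay dependency-free; it is not how the crc crate is implemented, and the function names are made up for this example.

```rust
/// Bitwise CRC-32 (reflected polynomial 0xEDB88320), as used by PNG and zlib.
pub fn crc32(bytes: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in bytes {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            crc = if crc & 1 != 0 {
                (crc >> 1) ^ 0xEDB8_8320
            } else {
                crc >> 1
            };
        }
    }
    !crc
}

/// Serialize one PNG chunk: length, type, data, CRC.
pub fn make_chunk(kind: [u8; 4], data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(12 + data.len());
    out.extend_from_slice(&(data.len() as u32).to_be_bytes());
    out.extend_from_slice(&kind);
    out.extend_from_slice(data);
    // The CRC covers everything after the length field: type + data.
    let crc = crc32(&out[4..]);
    out.extend_from_slice(&crc.to_be_bytes());
    out
}
```

Calling `make_chunk(*b"IEND", &[])` yields the familiar 12-byte trailer found at the end of every PNG file.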

libz-sys is used to wrap libz for the deflate compression. I briefly looked at pure-Rust implementations but couldn't find any supporting raw stream output, dictionary setting, and flushing to byte boundaries without closing the stream.

itertools is used to manage iteration in the filters.

typenum is used to do compile-time constant specialization via generics.

png is used by the CLI tool to load input files to recompress for testing.

clap is used by the CLI tool to handle option parsing and help display.

time is used by the CLI tool to time compression.

License

You may use this software under the following MIT-style license:

Copyright (c) 2018-2021 Brion Vibber

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Comments
  • Setting compression strategy in C API

    Hi, I was experimenting with the compression strategy setting of the mtpng cli tool. For a specific type of images I'm seeing a huge benefit in using RLE or Huffman instead of the default setting.

    I don't really see a way of setting this option using the C-API though. Unfortunately I don't know anything about Rust, but I'd assume that a new wrapper method similar to capi.rs/mtpng_encoder_options_set_filter is required to expose this setting to C?

    opened by pwuertz 5
  • Specialize filter_iter using const generic instead of macro

    Hi, I recently read your very good blog posts about mtpng and thought that maybe you didn't know about the excellent crate typenum when you wrote your article about constant specialization.

    This PR is about using this crate to solve the same problem exposed in this blog post: https://brionv.com/log/2018/09/12/parallelizing-png-part-8-rust-macros-for-constant-specialization/


    Since real const generics are not implemented yet (RFC2000), we use the widely used typenum crate, which fills the gap quite elegantly.

    Monomorphization of this function is now semantically explicit to Rust and no longer relies on an LLVM optimization pass which might break at any time in the future.

    opened by PaulGrandperrin 4
  • expose "write_chunk" functionality

    The current API does not allow writing arbitrary chunks. This is because the Writer is not exposed on the Encoder struct, and the writer module is not public. This currently prevents me from using mtpng, as I need to be able to write extra/custom chunks beyond those currently allowed. Maybe a new write_chunk method could be added to the Encoder that simply forwards to the inner Writer?

    opened by JasperDeSutter 3
  • Avoid panicking on Encoder drop

    This PR

    This pull request changes two lines in encoder.rs, preventing rayon threads that are still running when their Encoder is dropped from panicking when they try to send their deflated/filtered data. The two lines changed replace the unwrap() calls on the rayon threads' Senders' send() results with ok() calls, essentially ignoring whether the send operation was successful.

    Linked Issues

    This PR should close #22.

    Notes

    This is a quick fix. Let me know if you think something else would be a better idea here.

    opened by Kneelawk 1
  • Can't build with libz-sys on Windows ARM64

    Currently I can't figure out how to get a Windows ARM64 (aarch64-pc-windows-msvc) build working with the libz-sys dependency. Since there's no system zlib, it pulls the source and tries to build an embedded copy.

    But something builds for x86 (the toolchain arch) instead of arm64 (the target arch) and ends up failing to link. (It's cross-compiling on the actual device because there's no native arm64 toolchain, but it runs the x86 toolchain fine in emulation.)

    Note that on the miniz-oxide branch, it builds and runs fine without the C dependency, but the encoding is much slower.

    opened by brion 1
  • Encoder panics if dropped before calling flush()

    The Issue

    If an Encoder is created and written to, then dropped, there is a likelihood that the encoder will panic when still-running encoder threads attempt to send filtered or deflated results back to the now dropped encoder. This is because the Receiver held by the encoder will be dropped, but separate threads could still be running and attempting to send using the Senders attached to the now dropped receiver. Once these threads have tried to send their results back to the dropped encoder, they unwrap the results on the sender, causing their rayon threads to panic, killing the application.

    The code of note is in encoder.rs on lines 734 and 757. https://github.com/brion/mtpng/blob/7b8bc8939c8dda1c571dd5083113f4a45f074a99/src/encoder.rs#L734 https://github.com/brion/mtpng/blob/7b8bc8939c8dda1c571dd5083113f4a45f074a99/src/encoder.rs#L757

    Context

    I am using mtpng in a situation where there are many circumstances under which the encoder could be dropped, and finding each of them to make sure flush() is called before the drop happens is difficult.

    Pull Request

    I am working on a pull request that specifically replaces these unwrap() statements with ok() statements so that the Results of attempting to send the filtered/deflated data are ignored.

    bug 
    opened by Kneelawk 0
  • Compression level setter for C-API

    Added another c-api export for setting the compression_level encoder option.

    Also added an extern "C" guard to the header, which C++ requires for linking.

    opened by pwuertz 0
  • Single threaded option - rayon as the default feature

    I know the crate is specifically for multi-threaded encoding/decoding. I have managed to get sub millisecond encoding per image for my use case of encoding hundreds of small png files concurrently, and I would like to use mtpng to have low level control over Indexed pngs with transparency.

    However, server side I do not want to use many threads on each request. Throughput of the server is more important, not time per request, so I think using the current thread would be the best way to do this.

    I have looked at the code to see how easy it would be to have rayon as a default (optional) dependency, and be able to add default-features=false. However, I don't understand the code well enough to remove the multithreading part in encoder.rs.

    Also, I'm not even sure there would be a significant performance gain over

    let pool = rayon::ThreadPoolBuilder::new().num_threads(1).build().unwrap(); (except that creating a thread pool per request seems like a bad idea)

    I'd like to get feedback on this, also it could be useful for the WASM issue #13

    enhancement 
    opened by apps4uco 1
  • Removal of unsafe and panics

    This could be interesting to merge into master. It removes all panics from the code, uses results where possible, and also removes unsafe code by using the Rust implementation of deflate. There is also an attempt to create some fuzz testing, but it probably needs more work.

    opened by dgsantana 2
  • Add metadata to allow parallel decoding

    PNG is an extensible format - you can add custom Ancillary chunks to a file, and decoders will ignore any such chunk that they don't recognize.

    If an Ancillary chunk was added, to store the offsets of the zlib sub-streams, then a decoder could take advantage of this information to parallelize decoding.

    ~~Optionally, to allow parallel defiltering, the pixel values of the last row in each block could be included, compressed (this comes with an obvious file-size penalty, however)~~ (thinking about this some more, it'd make more sense to just never use filtering on the first row of each block)

    Thinking even more about this, a sufficiently intelligent client requesting PNGs over the network could use this metadata to request each chunk over a parallel connection (e.g. using HTTP range requests)

    opened by DavidBuchanan314 2
  • Use crc32fast crate instead of crc

    See https://crates.io/crates/crc32fast for benchmark data. The speedup is noticeable when the image is multiple MBs. Tests on my machine show 5% faster on 8MB screenshot.

    opened by JasperDeSutter 0
Owner
Brion Vibber
MediaWiki, video playback, and other random stuff.