I compress three different corpora using various compression methods available as Rust crates. Only lossless general-purpose compression algorithms are considered. Where available, bulk compression/decompression APIs are used rather than streaming APIs. Executable code size and use of main memory (internal buffers etc.) are not considered. Details on the corpora and the choice of compression schemes can be found below.
The test was automated (except for the figures); the source code can be found in this repository. A rough overview of the test procedure follows.
The source code is laid out as follows. The directory `common` contains a library crate with common functionality: reading the corpora, `Compress` and `Decompress` traits as a common abstraction for all schemes, recording runtimes, and statistical summaries of the results. The directory `schemes` contains a binary crate for each of the compression schemes. Each of those crates can be run with `cargo run --release` to perform the test for that scheme.
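For illustration, the shared abstraction might look roughly like this; the trait names `Compress` and `Decompress` come from the repository, but the signatures below are my own guesses, not the actual code:

```rust
// Hypothetical shape of the common abstraction; only the trait names are
// taken from the description above, the signatures are assumptions.
pub trait Compress {
    /// Compress `data` and return the compressed bytes.
    fn compress(&mut self, data: &[u8]) -> Vec<u8>;
}

pub trait Decompress {
    /// Decompress `data` and return the original bytes.
    fn decompress(&mut self, data: &[u8]) -> Vec<u8>;
}
```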
The output is a headerless CSV with the following columns, in order: scheme name, compression settings, corpus, average compression speed (MB/s), empirical standard deviation of the compression speed (MB/s), average decompression speed (MB/s), empirical standard deviation of the decompression speed (MB/s), and compression ratio.
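As a sketch of how such a row might be assembled (the field names are illustrative, not the repository's actual identifiers):

```rust
// Hypothetical helper producing one CSV row in the column order listed above.
fn result_row(
    scheme: &str,
    settings: &str,
    corpus: &str,
    comp_mean: f64,
    comp_std: f64,
    decomp_mean: f64,
    decomp_std: f64,
    ratio: f64,
) -> String {
    format!("{scheme},{settings},{corpus},{comp_mean},{comp_std},{decomp_mean},{decomp_std},{ratio}")
}
```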
The folder `schemes` also contains a simple shell script that runs each benchmark. For the raw data, see `results.csv` (this CSV includes a header line).
Note: compression and decompression speed are always measured as the throughput of uncompressed data.
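In other words, speed is (uncompressed bytes) / (elapsed time), whether compressing or decompressing. A hypothetical helper:

```rust
use std::time::Instant;

// Throughput in MB/s, always relative to the uncompressed size.
fn throughput_mb_per_s(uncompressed_len: usize, start: Instant) -> f64 {
    uncompressed_len as f64 / 1_000_000.0 / start.elapsed().as_secs_f64()
}
```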
For additional plots not shown here, see the plots directory.
Generally, all schemes compress the Canterbury corpus (2.8 MB) best in terms of both size and throughput (i.e. compression & decompression speed), followed by the "large" Canterbury corpus (11 MB), followed by the Silesia corpus (212 MB).
Note that compression and decompression speeds are shown with error bars (depicting the square root of the sample variance), but these are almost always smaller than the markers.
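The error bars could be computed like this (a sketch of the usual sample statistics, not necessarily the repository's exact code):

```rust
// Mean and empirical (sample) standard deviation of per-run speeds.
fn mean_and_std_dev(samples: &[f64]) -> (f64, f64) {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let variance = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, variance.sqrt())
}
```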
The smallest compressed sizes (of the Silesia corpus) are achieved by the following schemes, ordered from best to worst:
The following graphics depict the throughput (compression and decompression speed) of each scheme and all tested settings against the achieved size reduction (i.e. quality of the compression) for the Silesia corpus. For the smaller corpora, the shape is roughly the same when accounting for the previously stated fact that they are generally compressed better in all metrics.
At the highest range of compression (smallest compressed sizes, >70% size reduction), there are four contenders: rust-lzma, brotli/brotlic (which have similar overall performance), bzip2, and zstd. rust-lzma achieves the best compression and, at a given speed, generally compresses better than zstd and brotli. brotli decompresses faster than rust-lzma and zstd decompresses faster than brotli. zstd compresses about as fast or faster than brotli but doesn't quite reach the same top size reduction. bzip2 compresses faster than rust-lzma but has the slowest decompression speed.
In the upper middle range (50% to 70% size reduction), zstd compresses faster than anything else. It also decompresses faster than most other algorithms except lzzzz, which has the fastest decompression speed in its entire range of supported compression qualities. In this range, brotlic performs similarly to most DEFLATE implementations, but brotli can also achieve smaller file sizes (at reduced throughput). brotli (the Rust implementation) has slower decompression speed than the DEFLATE decoders. zopfli and zopfli-rs achieve marginally better compression than the other DEFLATE encoders but have absolutely terrible compression throughput. lzss has very low compression speed and can't achieve very high size reduction, but it decompresses at a similar rate to the DEFLATE decoders. Its variants with compile-time parameters are only faster than the one with run-time parameters when decompressing, not when compressing.
In the mid-range (~50% size reduction), the snappy implementations, lz4_flex, and lzo1x-1 have similar compressed sizes and compression speed. lz4_flex has a much higher decompression speed.
For worse compression quality (bigger compressed sizes), lzzzz achieves the fastest compression and decompression speed. zstd has similar compression speed, but decompression is slower by a varying factor of up to ~2.
Use zstd. At compression level 5, zstd compresses at ~100 MB/s and decompresses at ~1 GB/s while reducing file size by 70-75%. It doesn't achieve the best size reduction or the fastest throughput overall, but it's competitive across a wide range of speed/size tradeoffs. It also supports pre-training dictionaries to achieve better compression and throughput for many instances of small, similar data (though this is not tested here).
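A minimal round trip with the `zstd` crate's bulk helpers, assuming that crate is the one you'd reach for; level 5 as recommended above:

```rust
// Compress and decompress a buffer with zstd at level 5.
fn roundtrip_zstd(data: &[u8]) -> std::io::Result<Vec<u8>> {
    let compressed = zstd::encode_all(data, 5)?;
    zstd::decode_all(compressed.as_slice())
}
```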
If you need the smallest compressed size with no regard for speed, use rust-lzma at the highest compression settings. Be aware, however, that the size reduction is only a little better than zstd's while decompression takes an order of magnitude longer.
If you need something in pure Rust with no FFI, use either lz4_flex or yazi. lz4_flex decompresses extremely fast (2+ GB/s), compresses only slightly slower (350 MB/s) than the fastest Rust compressor overall and achieves a middling size reduction (50-55% depending on corpus). yazi achieves significantly better compression (70-75%) at the cost of disproportionately lower throughput.
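For example, an lz4_flex round trip using its size-prepended block helpers:

```rust
// Compress with lz4_flex and decompress again; the compressed block stores
// the original length in a small prefix.
fn roundtrip_lz4(data: &[u8]) -> Vec<u8> {
    let compressed = lz4_flex::compress_prepend_size(data);
    lz4_flex::decompress_size_prepended(&compressed).expect("valid lz4 block")
}
```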
If you have requirements which haven't been considered here or a particular kind of data, do your own research. 🙂
Unless noted otherwise, all compression schemes are used with default features.
Settings are mostly combinations of parameters that get special mention in the documentation.
For example, brotli allows setting a block buffer size but doesn't note valid or suggested values, so I only used the setting 4096 (which appears in the documentation).
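A sketch of what that looks like with the `brotli` crate's `CompressorWriter`; 4096 is the block buffer size mentioned above, while quality 9 and window size 22 are illustrative values of my choosing:

```rust
use std::io::Write;

// Compress `data` with brotli: buffer size 4096, quality 9, lg_window 22.
fn compress_brotli(data: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut out = Vec::new();
    {
        let mut writer = brotli::CompressorWriter::new(&mut out, 4096, 9, 22);
        writer.write_all(data)?;
    } // dropping the writer flushes the remaining output
    Ok(out)
}
```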
As another example, flate2 documents that valid compression levels are 0-9 inclusive, and the entire range is tested.
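For instance, gzip output at level 6 (any level from 0 to 9 would work the same way):

```rust
use flate2::{write::GzEncoder, Compression};
use std::io::Write;

// Compress `data` as gzip at level 6 using flate2.
fn compress_gzip(data: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut encoder = GzEncoder::new(Vec::new(), Compression::new(6));
    encoder.write_all(data)?;
    encoder.finish()
}
```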
Note that some compression algorithms (e.g. zstd) have provisions for compressing many instances of small but similar data. These algorithms may (or may not) perform much better on the canterbury corpus and smaller data than it appears here.
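For the curious, dictionary compression with zstd looks roughly like the sketch below. This is not part of the benchmark, and the function names used here (`zstd::dict::from_samples`, `zstd::bulk::Compressor::with_dictionary`) are my reading of the crate's API rather than something verified in this comparison:

```rust
// Train a dictionary on sample records, then compress one record with it.
fn compress_small_record(samples: &[Vec<u8>], record: &[u8]) -> std::io::Result<Vec<u8>> {
    let dict = zstd::dict::from_samples(samples, 16 * 1024)?; // 16 KiB dictionary
    let mut compressor = zstd::bulk::Compressor::with_dictionary(5, &dict)?;
    compressor.compress(record)
}
```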
The automated benchmarks were compiled using stable Rust 1.69 (2023-04-20) in release mode (no custom settings). Compression and decompression were performed sequentially. Benchmarks were run sequentially.
There are a few groups of compression schemes/crates:

- "Compression" via `.to_vec()`, "decompression" via `.copy_from_slice()`. Serves as a baseline comparison (a sketch follows this list).
- The `zlib-ng` backends are tested. Does not support `no_std`, but `miniz_oxide` does. Settings are: format (deflate/zlib/gzip) and compression level (0 to 10 inclusive, 0 being no compression, 10 being high/slow compression).
- Supports `no_std` (block format only).
- Has `no_std` support. The documentation is bad, but I make an exception because it's popular. Same settings as for the bindings.
- `libzstd`. The full range of positive quality levels from 0 to 22 (inclusive) is tested, as well as a few negative (fast) quality levels down to -50.
- `no_std`? No tunable parameters.
- `no_std`. Intended for embedded systems; claims small code size and little RAM/CPU use. Both the dynamic (runtime parameters) and generic (compile-time parameters) variants are tested. Parameters are the number of bits in the offset (range 10 to 13 inclusive is tested), the number of bits in the length (4 and 5 bits are tested), and the initial fill byte of the compression buffer (0x20).
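As mentioned above, the baseline scheme does no real work; against the hypothetical traits sketched earlier it could be as simple as:

```rust
// "Copy" baseline: compression and decompression are plain memory copies.
struct Copy;

impl Compress for Copy {
    fn compress(&mut self, data: &[u8]) -> Vec<u8> {
        data.to_vec()
    }
}

impl Decompress for Copy {
    fn decompress(&mut self, data: &[u8]) -> Vec<u8> {
        let mut out = vec![0u8; data.len()];
        out.copy_from_slice(data);
        out
    }
}
```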
I have purposefully excluded the following compression schemes/crates from consideration (again, in no particular order):

- `lz4_compression`: `no_std`. Build failed when I tried it.

There are many more crates which I don't note here, probably many more that I haven't noticed, and some of them may be good.
If you want to run the comparison yourself, you can download the corpora at the above URLs.
To set up the file structure used by the test runner, place the archived corpora, `cantrbry.tar.gz`, `large.tar.gz`, and `silesia.zip`, at the root of the project folder and run `setup-corpora.sh`. This should unpack the data into the `corpora` subdirectory. You can delete the archives afterwards.