~shabbyrobe/go-puff

Go port of zlib's puff.c

# Puff

Go port of zlib's puff.c.

Install:

```
go get -u go.shabbyrobe.org/puff
```

This is not production-quality code; I haven't polished or finessed it at all. I wouldn't recommend using it directly without a very specific reason, and even then I'd suggest pasting it into your own project so you can hack on it in peace and isolation.

Puff is best explained by the comments at the top of puff.c:

> puff.c is a simple inflate written to be an unambiguous way to specify the deflate format. It is not written for speed but rather simplicity. As a side benefit, this code might actually be useful when small code is more important than speed, such as bootstrap applications. For typical deflate data, zlib's inflate() is about four times as fast as puff(). zlib's inflate compiles to around 20K on my machine, whereas puff.c compiles to around 4K on my machine (a PowerPC using GNU cc). If the faster decode() function here is used, then puff() is only twice as slow as zlib's inflate().
>
> All dynamically allocated memory comes from the stack. The stack required is less than 2K bytes. This code is compatible with 16-bit int's and assumes that long's are at least 32 bits. puff.c uses the short data type, assumed to be 16 bits, for arrays in order to conserve memory. The code works whether integers are stored big endian or little endian.
>
> In the comments below are "Format notes" that describe the inflate process and document some of the less obvious aspects of the format. This source code is meant to supplement RFC 1951, which formally describes the deflate format.

I ported puff at a hackathon ages ago while mucking around with something where the decompression had to happen with stack memory only. I don't remember what the precise constraints were at the time and I never ended up using it, but it was a fun little exercise and if nothing else it helped me get a bit better at porting stuff from C.

The error handling is not very Go-like at the moment. If I ever decide I need to use this again, I'll clean that up.

I also intended to futz around with it to make it decompress in-place (i.e. without requiring a separate output slice), but never got around to that either. Actually, that doesn't quite make sense, and I'm probably confusing it with something else; the constraint I remember was stack memory. I think. Haha!

Puff is generally slower than flate.Reader for common workloads, which is unsurprising. You can see for yourself by running ./tools.sh benchcmp. Here are the results in MB/s, on a cantankerous old 13" MacBook Pro from 2015, for decompressing data compressed at level 6, which is what the flate package's DefaultCompression currently uses:

Benchmark                                  puff MB/s    flate MB/s   flate/puff
Benchmark/in=natural/sz=64/lvl=6-4         16.16        6.45         0.40x
Benchmark/in=natural/sz=1024/lvl=6-4       29.74        24.47        0.82x
Benchmark/in=natural/sz=16384/lvl=6-4      21.60        28.02        1.30x
Benchmark/in=natural/sz=262144/lvl=6-4     6.45         11.07        1.72x
Benchmark/in=rand/sz=64/lvl=6-4            44.38        8.83         0.20x
Benchmark/in=rand/sz=1024/lvl=6-4          414.74       111.91       0.27x
Benchmark/in=rand/sz=16384/lvl=6-4         443.98       722.65       1.63x
Benchmark/in=rand/sz=262144/lvl=6-4        443.44       859.83       1.94x
Benchmark/in=repeat/sz=64/lvl=6-4          39.13        2.51         0.06x
Benchmark/in=repeat/sz=1024/lvl=6-4        8.70         2.86         0.33x
Benchmark/in=repeat/sz=16384/lvl=6-4       1.49         2.45         1.64x
Benchmark/in=repeat/sz=262144/lvl=6-4      0.79         1.65         2.09x

Puff can be quicker for very small inputs (less than around 2k), though that's likely below the threshold at which compression is a useful time/space trade in the first place. On the other hand, it always performs 0 additional allocations (beyond the input and output buffers) regardless of input size.