Unix Data Compression Shootout

Fri 23 May 2025 by R.L. Dane

I wanted to try a new-to-me compressor, lz4, but it turned into a full ADHD-fueled file compression shoot-out:

Dang, lz4 is crazy fast!

Data/setup

The corpus is a 2.29 GiB uncompressed tar file consisting of several years' worth of GPS data in various plain-text formats.
The computer is a ThinkPad X260 with the CPU governor set to performance. The CPU is an Intel i5-6200U.

Outcome

Chart: (grouped by compressor)

command/compressor  time (user)       size  ratio
none/cat              0.077     2462955520 
gzip                 57.283      338289587   7.28
gzip -1              22.682      400956710   6.14
gzip -9             113.047      325547190   7.57
bzip2               319.847      262857414   9.37
bzip2 -1            255.654      278217711   8.85
bzip2 -9            326.718      262857414   9.37
bzip3               205.822      231173201  10.65
zstd                 12.520      321229917   7.67
zstd -1               8.812      317234226   7.76
zstd -9              63.019      282940675   8.70
zstd -11            101.278      281894351   8.74
zstd --ultra -22   7317.944      230075751  10.70
xz                 1476.153      228082956  10.80
xz -1               201.569      290137816   8.49
xz -9e             4683.144      212748984  11.58
lz4                   5.744      549838913   4.48
lz4 -1                5.762      549838913   4.48
lz4 -9               74.670      434543206   5.67
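
For anyone wanting to rebuild a table row from the raw output at the bottom of the post: the time column is the user CPU time converted to seconds, and the ratio is the original size divided by the compressed size. Keep in mind that user time can exceed wall-clock time when a compressor runs multithreaded (the default xz run shows 6m15s real but 24m36s user). Below is a minimal sketch in POSIX shell plus awk. The function names to_seconds and ratio are made up for illustration and are not part of the original run.

```shell
# Hypothetical helpers (names are illustrative, not from the original run).

# Convert a `time`-style duration such as "5m19.847s" into seconds.
# The string is split on "m" and "s": field 1 is minutes, field 2 is seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Compression ratio: original size divided by compressed size,
# rounded to two decimals as in the tables.
ratio() {
    awk -v orig="$1" -v comp="$2" 'BEGIN { printf "%.2f\n", orig / comp }'
}
```

For example, `to_seconds 5m19.847s` and `ratio 2462955520 262857414` should give the time and ratio columns of the bzip2 row.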

Sorted by size: (descending)

command/compressor  time (user)       size  ratio
none/cat              0.077     2462955520 
lz4                   5.744      549838913   4.48
lz4 -1                5.762      549838913   4.48
lz4 -9               74.670      434543206   5.67
gzip -1              22.682      400956710   6.14
gzip                 57.283      338289587   7.28
gzip -9             113.047      325547190   7.57
zstd                 12.520      321229917   7.67
zstd -1               8.812      317234226   7.76
xz -1               201.569      290137816   8.49
zstd -9              63.019      282940675   8.70
zstd -11            101.278      281894351   8.74
bzip2 -1            255.654      278217711   8.85
bzip2               319.847      262857414   9.37
bzip2 -9            326.718      262857414   9.37
bzip3               205.822      231173201  10.65
zstd --ultra -22   7317.944      230075751  10.70
xz                 1476.153      228082956  10.80
xz -9e             4683.144      212748984  11.58

Sorted by time: (ascending)

command/compressor  time (user)       size  ratio
none/cat              0.077     2462955520 
lz4                   5.744      549838913   4.48
lz4 -1                5.762      549838913   4.48
zstd -1               8.812      317234226   7.76
zstd                 12.520      321229917   7.67
gzip -1              22.682      400956710   6.14
gzip                 57.283      338289587   7.28
zstd -9              63.019      282940675   8.70
lz4 -9               74.670      434543206   5.67
zstd -11            101.278      281894351   8.74
gzip -9             113.047      325547190   7.57
xz -1               201.569      290137816   8.49
bzip3               205.822      231173201  10.65
bzip2 -1            255.654      278217711   8.85
bzip2               319.847      262857414   9.37
bzip2 -9            326.718      262857414   9.37
xz                 1476.153      228082956  10.80
xz -9e             4683.144      212748984  11.58
zstd --ultra -22   7317.944      230075751  10.70

Chart: (compression ratio / time score)

command/compressor  time (user)       size  ratio  ratio/time
zstd --ultra -22    7317.944     230075751  10.70      0.0015
xz -9e              4683.144     212748984  11.58      0.0025
xz                  1476.153     228082956  10.80      0.0073
bzip2 -9             326.718     262857414   9.37      0.0287
bzip2                319.847     262857414   9.37      0.0293
bzip2 -1             255.654     278217711   8.85      0.0346
xz -1                201.569     290137816   8.49      0.0421
bzip3                205.822     231173201  10.65      0.0518
gzip -9              113.047     325547190   7.57      0.0669
lz4 -9                74.670     434543206   5.67      0.0759
zstd -11             101.278     281894351   8.74      0.0863
gzip                  57.283     338289587   7.28      0.1271
zstd -9               63.019     282940675   8.70      0.1381
gzip -1               22.682     400956710   6.14      0.2708
zstd                  12.520     321229917   7.67      0.6124
lz4 -1                 5.762     549838913   4.48      0.7774
lz4                    5.744     549838913   4.48      0.7798
zstd -1                8.812     317234226   7.76      0.8811
none/cat               0.077    2462955520   1.00     12.9870 (nonsensical)
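
The ratio/time column is just the compression ratio divided by the user time in seconds, so higher means more compression per second of CPU. The published numbers appear to be computed from the unrounded ratio rather than the two-decimal one shown in the table (bzip3 comes out as 0.0518 that way, versus 0.0517 from the rounded 10.65). Here is a small sketch of that arithmetic in shell plus awk. The function name score is hypothetical.

```shell
# Hypothetical helper (the name "score" is illustrative only).
# Score = (original size / compressed size) / user seconds, four decimals.
# Arguments: original size in bytes, compressed size in bytes, user seconds.
score() {
    awk -v orig="$1" -v comp="$2" -v secs="$3" \
        'BEGIN { printf "%.4f\n", (orig / comp) / secs }'
}
```

For example, `score 2462955520 317234226 8.812` should reproduce the zstd -1 row.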

Conclusion

lz4 is the fastest compressor... but zstd -1 still kicks butt, with the best ratio/time score. While it doesn't do well in the overall score, bzip3 still provides excellent compression in a reasonable amount of time.

Raw output

tmp $ lscpu |grep i5
Model name:                           Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
tmp $ time cat < corpus.tar |wc -c
2462955520

real    0m1.388s
user    0m0.077s
sys 0m1.497s
tmp $ time gzip < corpus.tar |wc -c
338289587

real    0m57.971s
user    0m57.283s
sys 0m0.633s
tmp $ time bzip2 < corpus.tar |wc -c
262857414

real    5m21.280s
user    5m19.847s
sys 0m1.192s
tmp $ time bzip3 < corpus.tar |wc -c
231173201

real    3m26.608s
user    3m25.822s
sys 0m0.712s
tmp $ time zstd < corpus.tar |wc -c
321229917

real    0m11.717s
user    0m12.520s
sys 0m1.278s
tmp $ time xz < corpus.tar |wc -c
228082956

real    6m15.579s
user    24m36.153s
sys 0m1.481s
tmp $ time lz4 < corpus.tar |wc -c
549838913

real    0m2.190s
user    0m5.744s
sys 0m0.833s
tmp $ time lz4 -9 < corpus.tar |wc -c
434543206

real    0m25.151s
user    1m14.670s
sys 0m0.869s
tmp $ time zstd -9 < corpus.tar |wc -c
282940675

real    1m2.564s
user    1m3.019s
sys 0m1.351s
tmp $ time zstd -11 < corpus.tar |wc -c
281894351

real    1m40.556s
user    1m41.278s
sys 0m1.292s
tmp $ time zstd --ultra -22 < corpus.tar |wc -c
230075751

real    122m1.384s
user    121m57.944s
sys 0m2.642s
tmp $ time xz -9e < corpus.tar |wc -c
212748984

real    78m3.870s
user    78m3.144s
sys 0m1.345s
tmp $ 
tmp $ time xz -1 < corpus.tar |wc -c
290137816

real    0m50.878s
user    3m21.569s
sys 0m1.083s
tmp $ time zstd -1 < corpus.tar |wc -c
317234226

real    0m8.282s
user    0m8.812s
sys 0m1.162s
tmp $ time gzip -1 < corpus.tar |wc -c
400956710

real    0m23.496s
user    0m22.682s
sys 0m0.721s
tmp $ time gzip -9 < corpus.tar |wc -c
325547190

real    1m55.453s
user    1m53.047s
sys 0m0.730s
tmp $ time bzip2 -1 < corpus.tar |wc -c
278217711

real    4m16.753s
user    4m15.654s
sys 0m1.376s
tmp $ time bzip2 -9 < corpus.tar |wc -c
262857414

real    5m27.726s
user    5m26.718s
sys 0m1.157s
tmp $ time lz4 -1 < corpus.tar |wc -c
549838913

real    0m2.212s
user    0m5.762s
sys 0m0.832s
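
The commands above were typed one at a time. If you wanted to repeat the size measurements over a list of compressors, something like the following sketch might do. It assumes POSIX sh, and the function names compressed_size and run_all are made up for illustration. It skips any compressor that isn't installed and does not do the timing itself.

```shell
# Hypothetical helpers (names are illustrative, not from the original run).

# Print the number of bytes a command writes to stdout when fed FILE on stdin.
# Usage: compressed_size FILE COMMAND [ARGS...]
compressed_size() {
    file=$1
    shift
    "$@" < "$file" | wc -c | awk '{ print $1 }'
}

# Read one compressor command per line on stdin (e.g. "zstd -9") and print
# "command<TAB>size" for each one that is installed.
# Usage: run_all FILE < list_of_commands
run_all() {
    file=$1
    while IFS= read -r cmd; do
        set -- $cmd                       # split "zstd -9" into words
        command -v "$1" >/dev/null 2>&1 || continue
        printf '%s\t%s\n' "$cmd" "$(compressed_size "$file" "$@")"
    done
}
```

Usage would look like `printf '%s\n' gzip 'zstd -1' lz4 | run_all corpus.tar`. To get timings like the ones above in bash, one option is to prefix a single call with the `time` keyword, as in `time compressed_size corpus.tar zstd -1`. That also counts the small overhead of wc and awk.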

100 Days to Offload 2025 - Day 32