Unix Data Compression Shootout

Fri 23 May 2025

I wanted to try a new-to-me compressor, lz4, but it turned into a full ADHD-fueled file compression shoot-out:

Dang, lz4 is crazy fast!

Data/setup

The corpus is a 2.29 GiB uncompressed tar file consisting of several years' worth of GPS data in various plain-text formats.
The computer is a ThinkPad X260 with the CPU governor set to performance. The CPU is an Intel i5-6200U.

Outcome

Chart: (grouped by compressor)

command/compressor  time (user)       size  ratio
none/cat              0.077     2462955520 
gzip                 57.283      338289587   7.28
gzip -1              22.682      400956710   6.14
gzip -9             113.047      325547190   7.57
bzip2               319.847      262857414   9.37
bzip2 -1            255.654      278217711   8.85
bzip2 -9            326.718      262857414   9.37
bzip3               205.822      231173201  10.65
zstd                 12.520      321229917   7.67
zstd -1               8.812      317234226   7.76
zstd -9              63.019      282940675   8.70
zstd -11            101.278      281894351   8.74
zstd --ultra -22   7317.944      230075751  10.70
xz                 1476.153      228082956  10.80
xz -1               201.569      290137816   8.49
xz -9e             4683.144      212748984  11.58
lz4                   5.744      549838913   4.48
lz4 -1                5.762      549838913   4.48
lz4 -9               74.670      434543206   5.67
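
The ratio column looks to be the uncompressed size divided by the compressed size. A minimal sketch of that arithmetic, in case anyone wants to check a row (the `ratio` helper is a name made up for illustration, not something from the original runs):

```shell
# ratio ORIGINAL_BYTES COMPRESSED_BYTES
# Hypothetical helper: prints original/compressed rounded to two decimals.
ratio() {
    awk -v orig="$1" -v comp="$2" 'BEGIN { printf "%.2f\n", orig / comp }'
}

# Sizes taken from the table: the uncompressed corpus vs. the gzip output.
ratio 2462955520 338289587
```

Bigger is better here: a ratio of 10 means the output is a tenth the size of the input.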

Sorted by size: (descending)

command/compressor  time (user)       size  ratio
none/cat              0.077     2462955520 
lz4                   5.744      549838913   4.48
lz4 -1                5.762      549838913   4.48
lz4 -9               74.670      434543206   5.67
gzip -1              22.682      400956710   6.14
gzip                 57.283      338289587   7.28
gzip -9             113.047      325547190   7.57
zstd                 12.520      321229917   7.67
zstd -1               8.812      317234226   7.76
xz -1               201.569      290137816   8.49
zstd -9              63.019      282940675   8.70
zstd -11            101.278      281894351   8.74
bzip2 -1            255.654      278217711   8.85
bzip2               319.847      262857414   9.37
bzip2 -9            326.718      262857414   9.37
bzip3               205.822      231173201  10.65
zstd --ultra -22   7317.944      230075751  10.70
xz                 1476.153      228082956  10.80
xz -9e             4683.144      212748984  11.58

Sorted by time: (ascending)

command/compressor  time (user)       size  ratio
none/cat              0.077     2462955520 
lz4                   5.744      549838913   4.48
lz4 -1                5.762      549838913   4.48
zstd -1               8.812      317234226   7.76
zstd                 12.520      321229917   7.67
gzip -1              22.682      400956710   6.14
gzip                 57.283      338289587   7.28
zstd -9              63.019      282940675   8.70
lz4 -9               74.670      434543206   5.67
zstd -11            101.278      281894351   8.74
gzip -9             113.047      325547190   7.57
xz -1               201.569      290137816   8.49
bzip3               205.822      231173201  10.65
bzip2 -1            255.654      278217711   8.85
bzip2               319.847      262857414   9.37
bzip2 -9            326.718      262857414   9.37
xz                 1476.153      228082956  10.80
xz -9e             4683.144      212748984  11.58
zstd --ultra -22   7317.944      230075751  10.70

Chart: (compression ratio / time score)

command/compressor  time (user)       size  ratio  ratio/time
zstd --ultra -22   7317.944      230075751  10.70      0.0015
xz -9e             4683.144      212748984  11.58      0.0025
xz                 1476.153      228082956  10.80      0.0073
bzip2 -9            326.718      262857414   9.37      0.0287
bzip2               319.847      262857414   9.37      0.0293
bzip2 -1            255.654      278217711   8.85      0.0346
xz -1               201.569      290137816   8.49      0.0421
bzip3               205.822      231173201  10.65      0.0518
gzip -9             113.047      325547190   7.57      0.0669
lz4 -9               74.670      434543206   5.67      0.0759
zstd -11            101.278      281894351   8.74      0.0863
gzip                 57.283      338289587   7.28      0.1271
zstd -9              63.019      282940675   8.70      0.1381
gzip -1              22.682      400956710   6.14      0.2708
zstd                 12.520      321229917   7.67      0.6124
lz4 -1                5.762      549838913   4.48      0.7774
lz4                   5.744      549838913   4.48      0.7798
zstd -1               8.812      317234226   7.76      0.8811
none/cat              0.077     2462955520   1.00     12.9870 (nonsensical)
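
The score in the last column appears to be the compression ratio divided by the user time in seconds, with the user time converted from the `XmY.YYYs` form that `time` prints. A rough sketch of both steps, assuming that reading is right (`to_seconds` and `score` are hypothetical helper names, not part of the original runs):

```shell
# to_seconds TIME
# Converts a `time`-style value such as 24m36.153s to plain seconds.
to_seconds() {
    echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# score ORIGINAL_BYTES COMPRESSED_BYTES USER_SECONDS
# Prints (original/compressed) / seconds, rounded to four decimals.
score() {
    awk -v o="$1" -v c="$2" -v t="$3" 'BEGIN { printf "%.4f\n", (o / c) / t }'
}

# The zstd -1 row: sizes from the table, user time as printed by `time`.
score 2462955520 317234226 "$(to_seconds 0m8.812s)"
```

Note that this divides the unrounded ratio by the time, which seems to be how the table was built; dividing the rounded two-decimal ratio instead can shift the last digit for a few rows.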

Conclusion

lz4 is the fastest compressor... but zstd -1 still kicks butt, with the best ratio/time score of the bunch. While it doesn't do well on the overall score, bzip3 still provides excellent compression in a reasonable amount of time.

Raw output

tmp $ lscpu |grep i5
Model name:                           Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
tmp $ time cat < corpus.tar |wc -c
2462955520

real    0m1.388s
user    0m0.077s
sys 0m1.497s
tmp $ time gzip < corpus.tar |wc -c
338289587

real    0m57.971s
user    0m57.283s
sys 0m0.633s
tmp $ time bzip2 < corpus.tar |wc -c
262857414

real    5m21.280s
user    5m19.847s
sys 0m1.192s
tmp $ time bzip3 < corpus.tar |wc -c
231173201

real    3m26.608s
user    3m25.822s
sys 0m0.712s
tmp $ time zstd < corpus.tar |wc -c
321229917

real    0m11.717s
user    0m12.520s
sys 0m1.278s
tmp $ time xz < corpus.tar |wc -c
228082956

real    6m15.579s
user    24m36.153s
sys 0m1.481s
tmp $ time lz4 < corpus.tar |wc -c
549838913

real    0m2.190s
user    0m5.744s
sys 0m0.833s
tmp $ time lz4 -9 < corpus.tar |wc -c
434543206

real    0m25.151s
user    1m14.670s
sys 0m0.869s
tmp $ time zstd -9 < corpus.tar |wc -c
282940675

real    1m2.564s
user    1m3.019s
sys 0m1.351s
tmp $ time zstd -11 < corpus.tar |wc -c
281894351

real    1m40.556s
user    1m41.278s
sys 0m1.292s
tmp $ time zstd --ultra -22 < corpus.tar |wc -c
230075751

real    122m1.384s
user    121m57.944s
sys 0m2.642s
tmp $ time xz -9e < corpus.tar |wc -c
212748984

real    78m3.870s
user    78m3.144s
sys 0m1.345s
tmp $ 
tmp $ time xz -1 < corpus.tar |wc -c
290137816

real    0m50.878s
user    3m21.569s
sys 0m1.083s
tmp $ time zstd -1 < corpus.tar |wc -c
317234226

real    0m8.282s
user    0m8.812s
sys 0m1.162s
tmp $ time gzip -1 < corpus.tar |wc -c
400956710

real    0m23.496s
user    0m22.682s
sys 0m0.721s
tmp $ time gzip -9 < corpus.tar |wc -c
325547190

real    1m55.453s
user    1m53.047s
sys 0m0.730s
tmp $ time bzip2 -1 < corpus.tar |wc -c
278217711

real    4m16.753s
user    4m15.654s
sys 0m1.376s
tmp $ time bzip2 -9 < corpus.tar |wc -c
262857414

real    5m27.726s
user    5m26.718s
sys 0m1.157s
tmp $ time lz4 -1 < corpus.tar |wc -c
549838913

real    0m2.212s
user    0m5.762s
sys 0m0.832s
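
The runs above all follow the same pattern (`time CMD < corpus.tar | wc -c`), so they could be wrapped in a small helper. A rough sketch, assuming a POSIX-ish shell; `measure` is a name invented here and was not used for the original runs:

```shell
# measure FILE COMMAND [ARGS...]
# Hypothetical helper: feeds FILE to COMMAND on stdin and prints the
# command line plus the number of bytes it wrote to stdout.
measure() {
    file=$1
    shift
    bytes=$("$@" < "$file" | wc -c)
    # Arithmetic expansion strips the padding some wc versions add.
    printf '%s\t%d\n' "$*" "$((bytes))"
}

# Demo on a throwaway file so the sketch runs anywhere; cat stands in
# for a real compressor such as gzip -1.
tmp=$(mktemp)
printf 'hello world\n' > "$tmp"
measure "$tmp" cat
rm -f "$tmp"
```

In bash, something like `time measure corpus.tar zstd -1` should then give the same kind of real/user/sys report as in the session above, since `time` is a shell keyword there and can wrap a function call.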

100 Days to Offload 2025 - Day 32

Category: Tech Tagged: 100DaysToOffload ADHD BSD Computing FOSS (Free and Open Source Software) Linux Non-religious post Productivity UNIX Unix Tips


Gathering Hashtags from the Fediverse

Thu 08 May 2025

Background

One minor foible of the fediverse instance I'm on is that searching (for accounts or hashtags) can be quite slow. As a workaround for now, I've saved a list of accounts I've followed for easy reference, but I also wanted some way of saving a list of hashtags to …

Category: Tech Tagged: 100DaysToOffload Computing Content Warning Federated Services FOSS (Free and Open Source Software) Linux Non-religious post Productivity Social Media


A Toast to the Prolific ones

Tue 06 May 2025

A screencap of the "toast" scene from The Wolf of Wall Street

I wanted to take some time out today to acknowledge some folks on the fediverse that are remarkably prolific, just for fun.

Prolific blogger — Rubenerd

Oh holy moly. This fella has words. Lots of words. Many very fine words. Just look at bro's output for 2024:

~ $ curl -s https://rubenerd …

Category: Life Tagged: 100DaysToOffload Beauty Computing Entertainment Federated Services FOSS (Free and Open Source Software) Hobbies Life Non-religious post Non-technical post Productivity Social Media Writing


Why I love Markdown

Fri 11 April 2025

...
...
...
Because it's cool! But first, a brief history of writing in the digital age!

Some History, or: I have ADHD and we're all aboard the unnecessary detail traaaaaainnnn!......

The very first computer I had at home was an Apple ][+ that my mom rented for a computer class in university. The …

Category: Tech Tagged: 100DaysToOffload ADHD Computing FOSS (Free and Open Source Software) Hobbies Humor Linux Non-religious post Philosophy Productivity Retrocomputing UNIX


The Case for AVIF

Thu 13 February 2025

Disclaimer: Right off the bat, I am NOT an imaging expert, a compression expert, or really an anything expert. I'm just speaking from my own experiences as a relatively ordinary user of image formats.

When I was a young-un in the early, early 90s, PICT was the day-to-day image format …

Category: Tech Tagged: 100DaysToOffload Computing Federated Services FOSS (Free and Open Source Software) Linux Non-religious post Philosophy Productivity Retrocomputing Social Media


"Online" documentation should be offline

Thu 30 January 2025

I'm noticing a troubling trend among FOSS projects, even terminal-only utilities: no manpages (or a useless 1-paragraph one), barely any help screens, and a link to a wiki site like a GitHub page or "readthedocs."

The thing is, the whole ethos behind so many terminal utilities is a hearkening back …

Category: Tech Tagged: 100DaysToOffload BSD Computing Ethics FOSS (Free and Open Source Software) FreeBSD Linux Non-religious post Polemic Productivity UNIX


You don't need to share that clip

Thu 09 January 2025

I recall once having a discussion with some friends online about which chat client/network to use for communication, and someone strongly endorsed one over another because it had a better selection of animated reactions/videos. I was pretty incredulous that that was the most important thing to them.

Imagine …

Category: Tech Tagged: 100DaysToOffload Computing Entertainment Humor Non-religious post Non-technical post Philosophy Productivity
