I wanted to try a new-to-me compressor, lz4, but it turned into a full ADHD-fueled file compression shoot-out:

Dang, lz4 is crazy fast!

Data/setup

The corpus is a 2.29 GiB uncompressed tar file consisting of several years' worth of GPS data in various plain-text formats.
The computer is a ThinkPad X260 with the CPU governor set to performance. The CPU is an Intel i5-6200U.
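
If you want to repeat this on your own data, here's a rough sketch of a measurement loop. It just wraps the same `time cmd < file | wc -c` pattern shown in the raw output at the bottom. The helper name `bench` and the output layout are made up for illustration, and it assumes bash (it relies on the `time` keyword and `TIMEFORMAT`), so treat it as a starting point rather than the exact procedure behind the tables.

```shell
#!/usr/bin/env bash
# Sketch only: "bench" is a hypothetical helper, not part of the original run.
# Usage: bench FILE COMMAND [ARGS...]
# Prints: the command, user CPU seconds for the whole pipeline, compressed bytes.
bench() {
    local file=$1; shift
    local tmp user bytes

    # Skip compressors that are not installed on this machine.
    command -v "$1" >/dev/null 2>&1 || { echo "skipping $*: not installed" >&2; return 0; }

    tmp=$(mktemp)
    # TIMEFORMAT=%3U makes bash's `time` keyword print only user CPU seconds.
    # It is set inside the command substitution so the caller's shell is untouched.
    # The compressor's own stderr is discarded so only the timing is captured.
    user=$( TIMEFORMAT=%3U; { time "$@" < "$file" 2>/dev/null | wc -c > "$tmp"; } 2>&1 )
    read -r bytes < "$tmp"
    rm -f "$tmp"

    printf '%-20s %10s %12s\n' "$*" "$user" "$bytes"
}

# Example, using a few of the commands from the tables:
#   for c in "gzip" "gzip -9" "zstd -1" "lz4"; do bench corpus.tar $c; done
```

As in the transcript, the timing covers the whole pipeline, so `wc -c` is included in the measurement, although its share is tiny.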

Outcome

Chart: (grouped by compressor)

command/compressor  time (user)       size  ratio
none/cat              0.077     2462955520 
gzip                 57.283      338289587   7.28
gzip -1              22.682      400956710   6.14
gzip -9             113.047      325547190   7.57
bzip2               319.847      262857414   9.37
bzip2 -1            255.654      278217711   8.85
bzip2 -9            326.718      262857414   9.37
bzip3               205.822      231173201  10.65
zstd                 12.520      321229917   7.67
zstd -1               8.812      317234226   7.76
zstd -9              63.019      282940675   8.70
zstd -11            101.278      281894351   8.74
zstd --ultra -22   7317.944      230075751  10.70
xz                 1476.153      228082956  10.80
xz -1               201.569      290137816   8.49
xz -9e             4683.144      212748984  11.58
lz4                   5.744      549838913   4.48
lz4 -1                5.762      549838913   4.48
lz4 -9               74.670      434543206   5.67
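
The ratio column is simply the uncompressed size divided by the compressed size. A minimal sketch of that arithmetic, assuming awk is available (the helper name `ratio` is made up for this example):

```shell
# ratio ORIGINAL_BYTES COMPRESSED_BYTES
# Prints the compression ratio rounded to two decimals.
# "ratio" is a hypothetical helper name.
ratio() {
    awk -v orig="$1" -v comp="$2" 'BEGIN { printf "%.2f\n", orig / comp }'
}

ratio 2462955520 338289587   # gzip   -> 7.28
ratio 2462955520 212748984   # xz -9e -> 11.58
```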

Sorted by size: (descending)

command/compressor  time (user)       size  ratio
none/cat              0.077     2462955520 
lz4                   5.744      549838913   4.48
lz4 -1                5.762      549838913   4.48
lz4 -9               74.670      434543206   5.67
gzip -1              22.682      400956710   6.14
gzip                 57.283      338289587   7.28
gzip -9             113.047      325547190   7.57
zstd                 12.520      321229917   7.67
zstd -1               8.812      317234226   7.76
xz -1               201.569      290137816   8.49
zstd -9              63.019      282940675   8.70
zstd -11            101.278      281894351   8.74
bzip2 -1            255.654      278217711   8.85
bzip2               319.847      262857414   9.37
bzip2 -9            326.718      262857414   9.37
bzip3               205.822      231173201  10.65
zstd --ultra -22   7317.944      230075751  10.70
xz                 1476.153      228082956  10.80
xz -9e             4683.144      212748984  11.58

Sorted by time: (ascending)

command/compressor  time (user)       size  ratio
none/cat              0.077     2462955520 
lz4                   5.744      549838913   4.48
lz4 -1                5.762      549838913   4.48
zstd -1               8.812      317234226   7.76
zstd                 12.520      321229917   7.67
gzip -1              22.682      400956710   6.14
gzip                 57.283      338289587   7.28
zstd -9              63.019      282940675   8.70
lz4 -9               74.670      434543206   5.67
zstd -11            101.278      281894351   8.74
gzip -9             113.047      325547190   7.57
xz -1               201.569      290137816   8.49
bzip3               205.822      231173201  10.65
bzip2 -1            255.654      278217711   8.85
bzip2               319.847      262857414   9.37
bzip2 -9            326.718      262857414   9.37
xz                 1476.153      228082956  10.80
xz -9e             4683.144      212748984  11.58
zstd --ultra -22   7317.944      230075751  10.70

Chart: (compression ratio / time score)

command/compressor  time (user)        size  ratio  ratio/time
zstd --ultra -22       7317.944   230075751  10.70      0.0015
xz -9e                 4683.144   212748984  11.58      0.0025
xz                     1476.153   228082956  10.80      0.0073
bzip2 -9                326.718   262857414   9.37      0.0287
bzip2                   319.847   262857414   9.37      0.0293
bzip2 -1                255.654   278217711   8.85      0.0346
xz -1                   201.569   290137816   8.49      0.0421
bzip3                   205.822   231173201  10.65      0.0518
gzip -9                 113.047   325547190   7.57      0.0669
lz4 -9                   74.670   434543206   5.67      0.0759
zstd -11                101.278   281894351   8.74      0.0863
gzip                     57.283   338289587   7.28      0.1271
zstd -9                  63.019   282940675   8.70      0.1381
gzip -1                  22.682   400956710   6.14      0.2708
zstd                     12.520   321229917   7.67      0.6124
lz4 -1                    5.762   549838913   4.48      0.7774
lz4                       5.744   549838913   4.48      0.7798
zstd -1                   8.812   317234226   7.76      0.8811
none/cat                  0.077  2462955520   1.00     12.9870 (nonsensical)
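
The ratio/time score is the compression ratio divided by the user CPU time in seconds, so higher means more compression per second of CPU. A small sketch of the calculation, again assuming awk (the helper name `score` is hypothetical). It works from the raw byte counts, not the rounded ratio:

```shell
# score ORIGINAL_BYTES COMPRESSED_BYTES USER_SECONDS
# Prints (original / compressed) / seconds, rounded to four decimals.
# "score" is a hypothetical helper name.
score() {
    awk -v orig="$1" -v comp="$2" -v secs="$3" \
        'BEGIN { printf "%.4f\n", (orig / comp) / secs }'
}

score 2462955520 338289587 57.283     # gzip -> 0.1271
score 2462955520 228082956 1476.153   # xz   -> 0.0073
```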

Conclusion

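lz4 is the fastest compressor... but zstd -1 still kicks butt: for about 3 more seconds of CPU time, its output is a bit over half the size of lz4's. While it doesn't do well in the overall ratio/time score, bzip3 still provides excellent compression (10.65) in a reasonable amount of time (under three and a half minutes).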

Raw output

tmp $ lscpu |grep i5
Model name:                           Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
tmp $ time cat < corpus.tar |wc -c
2462955520

real    0m1.388s
user    0m0.077s
sys 0m1.497s
tmp $ time gzip < corpus.tar |wc -c
338289587

real    0m57.971s
user    0m57.283s
sys 0m0.633s
tmp $ time bzip2 < corpus.tar |wc -c
262857414

real    5m21.280s
user    5m19.847s
sys 0m1.192s
tmp $ time bzip3 < corpus.tar |wc -c
231173201

real    3m26.608s
user    3m25.822s
sys 0m0.712s
tmp $ time zstd < corpus.tar |wc -c
321229917

real    0m11.717s
user    0m12.520s
sys 0m1.278s
tmp $ time xz < corpus.tar |wc -c
228082956

real    6m15.579s
user    24m36.153s
sys 0m1.481s
tmp $ time lz4 < corpus.tar |wc -c
549838913

real    0m2.190s
user    0m5.744s
sys 0m0.833s
tmp $ time lz4 -9 < corpus.tar |wc -c
434543206

real    0m25.151s
user    1m14.670s
sys 0m0.869s
tmp $ time zstd -9 < corpus.tar |wc -c
282940675

real    1m2.564s
user    1m3.019s
sys 0m1.351s
tmp $ time zstd -11 < corpus.tar |wc -c
281894351

real    1m40.556s
user    1m41.278s
sys 0m1.292s
tmp $ time zstd --ultra -22 < corpus.tar |wc -c
230075751

real    122m1.384s
user    121m57.944s
sys 0m2.642s
tmp $ time xz -9e < corpus.tar |wc -c
212748984

real    78m3.870s
user    78m3.144s
sys 0m1.345s
tmp $ 
tmp $ time xz -1 < corpus.tar |wc -c
290137816

real    0m50.878s
user    3m21.569s
sys 0m1.083s
tmp $ time zstd -1 < corpus.tar |wc -c
317234226

real    0m8.282s
user    0m8.812s
sys 0m1.162s
tmp $ time gzip -1 < corpus.tar |wc -c
400956710

real    0m23.496s
user    0m22.682s
sys 0m0.721s
tmp $ time gzip -9 < corpus.tar |wc -c
325547190

real    1m55.453s
user    1m53.047s
sys 0m0.730s
tmp $ time bzip2 -1 < corpus.tar |wc -c
278217711

real    4m16.753s
user    4m15.654s
sys 0m1.376s
tmp $ time bzip2 -9 < corpus.tar |wc -c
262857414

real    5m27.726s
user    5m26.718s
sys 0m1.157s
tmp $ time lz4 -1 < corpus.tar |wc -c
549838913

real    0m2.212s
user    0m5.762s
sys 0m0.832s
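
The time (user) column in the tables is the `user` line from each `time` report above, converted to seconds. It is CPU time, not wall-clock time; for xz and lz4 it is larger than `real`, which suggests those ran on more than one core. A minimal sketch of the conversion, assuming awk and the `NmS.SSSs` format shown in the transcript (the helper name `to_seconds` is made up):

```shell
# to_seconds 24m36.153s
# Converts bash `time` output (minutes + seconds) to plain seconds.
# "to_seconds" is a hypothetical helper name.
to_seconds() {
    # Split on "m" and "s": field 1 is the minutes, field 2 is the seconds.
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

to_seconds 24m36.153s   # xz  -> 1476.153
to_seconds 0m5.744s     # lz4 -> 5.744
```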

100 Days to Offload 2025 - Day 32


Gathering Hashtags from the Fediverse

Thu 08 May 2025 by R.L. Dane

Background

One minor foible of the fediverse instance I'm on is that searching (for accounts or hashtags) can be quite slow. As a workaround for now, I've saved a list of accounts I've followed for easy reference, but I also wanted some way of saving a list of hashtags to …

Device mini-review: One by Wacom

Wed 07 May 2025 by R.L. Dane

Partially for the sake of my daily doodles, which I've been posting to the Fediverse, and also because I've been learning the Persian alphabet, I recently purchased a small USB pen digitizer, the "One by Wacom," by Wacom (great branding 😄).

For those of you not familiar with digitizers, think of …

Used laptops are wonderful... except when you need batteries

Sun 04 May 2025 by R.L. Dane

When I was a kid, a new computer cost the equivalent of $3,000 in today's money, and a five year old computer was basically a dinosaur.

Nowadays you can get a brand new computer for $200 or less, and a ten-year-old computer can still be a viable daily-driver. You …

Why I love Markdown

Fri 11 April 2025 by R.L. Dane

...
...
...
Because it's cool! But first, a brief history of writing in the digital age!

Some History, or: I have ADHD and we're all aboard the unnecessary detail traaaaaainnnn!......

The very first computer I had at home was an Apple ][+ that my mom rented for a computer class in university. The …

Blog Questions Challenge: Technology Edition

Thu 10 April 2025 by R.L. Dane

I saw this update to the "Blog Questions Challenge" format on jnv's blog, and without reading* it first (in order to not taint my own answers — I will definitely go back and read his), I thought I'd write down some of my own thoughts, for fun.

*I used curl …

I miss the days of ubiquitous portable data storage

Wed 09 April 2025 by R.L. Dane

I was reading Clayton's blog post about his habit of buying larger and larger USB thumb drives back in the day (when they were getting rapidly larger every few months), and it reminded me of a wish I had in the 2000s that got only partially fulfilled.

As anyone who …

The Case for AVIF

Thu 13 February 2025 by R.L. Dane
Disclaimer: Right off the bat, I am NOT an imaging expert, a compression expert, or really an anything expert. I'm just speaking from my own experiences as a relatively ordinary user of image formats.

When I was a young-un in the early, early 90s, PICT was the day-to-day image format …

I kinda hate "apps"

Tue 11 February 2025 by R.L. Dane

I'm talking mostly about terminology, but of course, when in doubt, "I don't want your dumb app."

So, as I briefly alluded to in yesterday's blog post, most mobile apps are pretty terrible. They're generally chock full of spyware (trackers), and are inherently user-hostile.

But I want to talk about …

How I Mitigate Horrible "Apps"

Mon 10 February 2025 by R.L. Dane

This morning, I enjoyed a blog post by James Ashford about whether or not it's good to use a bible app.

I commented on a tangential issue: the fact that bible apps have so many trackers!

I mentioned that I do use YouVersion, even though it has many trackers (including …
