Here are the comparative numbers reported by the Arch devs, on which they based their decision to use this fast but resource-hungry compression tool. XZ still wins on size and loses on time, while zstd is a huge loser in memory use while compressing; decompression is comparable and equally fast. Zstd also relies heavily on current, powerful, server-grade machines to deliver its speed benefit and make up for what it lacks in quality. Compression software should primarily be judged on its ability to compress, and zstd fails miserably against that trusty old switchblade called xz, whose LZ77 roots go back more than 40 years. So we can conclude that Arch has an abundance of computing/building/packaging apparatus, with truckloads of spare RAM to process many packages in parallel.
My article (a link to it) was removed from r/linux yesterday for no good reason, despite being 100% Linux-related material, and when I complained I was permanently banned from posting there.
https://www.reddit.com/r/linux/comments/ejn5c5/arch_2020_welcomes_its_little_brothers_and/
In case you are wondering: I was reporting that Arch had, almost silently, started using this Facebook compression algorithm for its packaging. Here is their own test data supporting the decision:
https://lists.archlinux.org/pipermail/arch-dev-public/2019-March/029520.html
A different set of tests covering more compression utilities
From this article, which compares many compression tools in more depth (a more general comparison, unlike the tests above that were tailored to Arch’s use case and strive to make zstd look good), we isolated the two tables on xz and zstd.
The next test was with a much larger file/archive – this time using linux 5.1-rc5
Note in column 4 the % of CPU utilized (on a 4th-gen i7, 8-thread machine): the speed is due to multithreading. On a single- or dual-core machine the effect should scale roughly with the threads you are missing: 2s on an 8-thread machine becomes something like 16s on a single core, and the inverse holds for throughput, so 80MB/s with all cores is roughly 10MB/s single-threaded. On a lesser machine than the testers’, don’t expect the speeds to be as spectacular. And we cannot conclude from these numbers what zstd’s RAM-use deficiency would look like on a single- or dual-core machine. (A sketch for measuring the thread scaling yourself follows.)
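If you want to check that thread scaling on your own hardware instead of trusting anyone’s table, here is a minimal sketch, assuming a POSIX shell, coreutils’ nproc, and some large file of your own standing in for the hypothetical sample.tar:

#!/bin/sh
# Time single-threaded vs all-threads compression for xz and zstd.
# sample.tar is a placeholder; substitute any large-ish file.
f=sample.tar
for t in 1 0; do                # xz: -T1 = one thread, -T0 = all available
    echo "xz -6 -T$t:"
    time xz -6 -T"$t" -c "$f" > /dev/null
done
for t in 1 "$(nproc)"; do       # zstd: give an explicit thread count
    echo "zstd -19 -T$t:"
    time zstd -19 -T"$t" -c "$f" > /dev/null
done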
About “free space to distribute software”
The mirror space and bandwidth to distribute those compressed packages are paid for by others (us, in most cases, through public university servers). So while the Arch devs and their super build machines are relieved of the burden of compression time, and of the tons of additional memory that compression requires (which, if I interpret the data correctly, counterbalances the multithreading gains), the increase in package size and distribution bandwidth falls on the users and the mirrors feeding them.
On the question of why both r/linux and r/archlinux blocked content on the xz/zstd change:
In a late announcement on archlinux.org news, 8 days after the shift took effect, and AFTER our articles and the banned posts on r/linux and r/archlinux, they made the following statement to cover their “posteriors”.
Don’t make it personal to the r/linux and r/archlinux moderators. This is the real reflection of the status of Linux and its evolution. A year or two ago Google took the NSA’s Speck cipher and pushed Linux to adopt it. Linux did. And many distros left it enabled, to be used by unsuspecting users. A popular outcry was met with a silent decision to eventually dump it, so whining and cursing eventually works; or, in the case of Linus, should I call it whistle-blowing? I think it was around 4.17-4.18 that the kernel included Speck. Arch switched it off after several other distributions had already done so, but still included the code in the kernel.
So it is not Linux, it is not r/linux or Arch Linux: it is a problematic decision-making fashion across most of the Linux world. What I find even more problematic is the passive audience of “customers” who refrain from getting involved. They just care about their “free as in beer” software filling the empty cells of disk space on their PCs. I would recommend that more people get involved and influence the decisions being made, and not allow large multinational corporations to keep making all the decisions about their software, corroding the nature of open and free software and the freedom of users/sysadmins to choose their tools.
Based on Fedora’s and Arch’s decision to switch package compression tools, many more distributions will try to “catch up with the trend” without judgement or further research. Those limited by economic realities, who rely on cheaper, older machines in their network to do packaging, will soon find out the burdens of a tool like zstd, quite apart from our value judgement of rejecting it based on its origins rather than its performance.
Like your mommy told you when you were young: don’t accept candy from a stranger, or a needle from a cheap pusher! And Facebook is, and will always be, a stranger to the real world of open and free software, not to say an offender of our intelligence when it poses as a well-meaning contributor.
Enough? We will add more data and sources as they come in from friends and activists against the hydra of corporatism and domination.
Hello,
This is Dylan (KISS Linux). I’ve been reading your posts and I’m sorry to see what happened on Reddit. I’ve noticed this censorship trend for a long while now and I truly despise it.
Don’t worry too much about it as I and others are reading your posts, learning from them and sharing them around. See: https://github.com/kisslinux/community/commit/cd29cbd27e34a767378c9585b8964760909afd48 (zstd: Drop from community)
I removed zstd from KISS (it wasn’t used for anything other than btrfs-progs, which I will also be removing).
What’s funny is that I was talking to someone just this week about the zstd issue and the jump towards the next “new shiny thing”.
Keep it up. 🙂
Hi Dylan, I am not writing about KISS since you “are here” to write about it better than I could 🙂
Happy New Year!
I have been doing some xz/zstd tests, and when you use xz’s simple -T (--threads) option, not only does it kick butt, but the claims about it not reproducing the same checksums are crap. Only the single-threaded run gives a different sum: on an 8-thread machine, across compression levels 1 through 9 and from 2 to 8 threads, I got 100% identical sums at each compression level. xz’s level of compression is unbeatable by zstd, and where zstd has a speed advantage its compression ratio is mediocre.
So their reports ARE INTENTIONALLY skewed to justify the choice. To me this translates into motive. And the motive is to use users as guinea pigs to assist the development of Facebook’s toy.
I used a 15MB archive for my tests, and at such sizes these are sub-second processes. Only on the single-threaded test with max compression did it go up to 5s. A sketch of the reproducibility test follows.
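For anyone who wants to repeat the checksum test, here is a minimal sketch of it, with the hypothetical sample.tar standing in for your own archive:

# Compare xz output checksums across thread counts at each compression level.
# If multithreaded output were not reproducible, the sums below would differ.
f=sample.tar
for level in 1 2 3 4 5 6 7 8 9; do
    for t in 2 4 8; do
        sum=$(xz -"$level" -T"$t" -c "$f" | sha256sum | cut -d' ' -f1)
        echo "level=$level threads=$t $sum"
    done
done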
What ya think?
Void’s xbps was capable of zstd much earlier than Arch. They chose not to use it, but they give you the freedom to build and install your own from your repository if you want.
I have been banned from both r/linux and r/archlinux since the 3rd of Jan 2020. On the 4th they placed an announcement on their webpage about utilizing zstd now. They had been shipping .zst pkgs since 12/27!!!
This news story solidifies everything you say…
https://www.phoronix.com/scan.php?page=news_item&px=OpenMandriva-Zstd-RPMs
…
I also read somewhere that Debian will wait 5 years. The GnuPG developers are the first project I have seen say NO to zstd.
Dylan
from arsv via /r/initFreedom
Suggestion: instead of copying somebody else’s data, write a script to benchmark them on some easily available files. It’s very easy to do, and you would avoid getting called out instantly by the first person who bothers to check.
From my experience, these times are representative. It’s about this kind of difference for common package-related tasks. I’m not sure how Arch got the numbers they posted, no idea, and their dataset is not really what most people care about anyway. Nonetheless, even for what I think are common use cases, the effect is there and it’s quite noticeable. Zstd trades a bit of compression, something like 10% larger files, for something like an 8x decompression speed-up over LZMA.
And I must point out that it’s not only that Zstd is fast, it’s also that LZMA is unusually slow.
I’ve been messing around with LZMA, and I will be again very soon, specifically with package management applications in mind. It’s not a simple problem; it’s something that needs to be addressed properly. Just going around denying that Zstd exists will not get you anywhere. You’ll just piss people off and make them sneer every time the issue is brought up, making life very difficult for anyone who might hopefully come up with an actual valid alternative to Zstd.
fungalnet
Why, don’t you trust the guy that published them?
Max compression for zstd is 19, and for xz it is 9, right?
% time zstd -19k texlive-core-2019.52579-1-any.pkg.tar
texlive-core-2019.52579-1-any.pkg.tar : 33.25% (438732800 => 145889607 bytes, texlive-core-2019.52579-1-any.pkg.tar.zst)
zstd -20k texlive-core-2019.52579-1-any.pkg.tar 128.26s user 0.21s system 100% cpu 2:08.37 total
% time xz -9kT8 texlive-core-2019.52579-1-any.pkg.tar
xz -9kT8 texlive-core-2019.52579-1-any.pkg.tar 140.21s user 1.04s system 208% cpu 1:07.70 total
140M Jan 8 21:23 texlive-core-2019.52579-1-any.pkg.tar.zst
134M Jan 8 21:23 texlive-core-2019.52579-1-any.pkg.tar.xz
419M Jan 8 21:23 texlive-core-2019.52579-1-any.pkg.tar
xz took half the time to compress, and the end size was smaller by 4-5%.
To decompress zstd wins:
% time xz -kd texlive-core-2019.52579-1-any.pkg.tar.xz
xz -kd texlive-core-2019.52579-1-any.pkg.tar.xz 6.68s user 0.23s system 99% cpu 6.959 total
% time zstd -kd texlive-core-2019.52579-1-any.pkg.tar.zst
zstd -kd texlive-core-2019.52579-1-any.pkg.tar.zst 0.38s user 0.16s system 99% cpu 0.538 total
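Scripted, per arsv’s suggestion above, the whole comparison looks something like this minimal sketch (sample.tar stands in for any tarball you have around):

#!/bin/sh
# Compress the same tarball both ways, list sizes, then time decompression.
f=${1:-sample.tar}
time xz -9 -T8 -k "$f"               # produces $f.xz
time zstd -19 -k "$f"                # produces $f.zst
ls -l "$f" "$f.xz" "$f.zst"
time xz -dkc "$f.xz" > /dev/null
time zstd -dkc "$f.zst" > /dev/null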
Now, the average daily upgrade for a user is smaller than this, but let’s say it is this big. The difference in decompression time is about 6.5s, and installing the packages takes so much longer that this difference becomes negligible. Meanwhile, the size to download and store packages has increased by about 5%, which at 500KB/s is significant: for a 150MB pkg like this one, the ~6MB difference means about 12s of extra download time. So zstd’s 6.5s decompression gain is bought with 12s more downloading and 5% more disk space than xz (if you keep pkgs in case you need to reinstall). The arithmetic is sketched below.
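A back-of-the-envelope version of that arithmetic, where the 500KB/s link speed is my assumption as above:

# Net user-facing cost of zstd vs xz for this package at an assumed 500 KB/s link.
xz_mb=134; zst_mb=140                           # sizes from the listing above
extra_dl=$(( (zst_mb - xz_mb) * 1024 / 500 ))   # ~12s more download time
echo "extra download: ${extra_dl}s; decompression saved: ~6.4s (6.96s xz vs 0.54s zstd)"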
Now, you see, the big difference is in compression, not decompression (128s versus 67s). The user will never notice a 5-10s delay on a daily upgrade. Since you asked me to produce numbers, I am showing how Arch’s disk space could be cut by 5% and its compression time cut in half. So why are we doing this again?
Both xz and zstd are Arch packages.
232K Nov 13 02:53 /var/cache/pacman/pkg/xz-5.2.4-2-x86_64.pkg.tar.xz
392K Nov 28 08:25 /var/cache/pacman/pkg/zstd-1.4.4-1-x86_64.pkg.tar.xz
Ohhh,… wait, xz itself: this algorithm with roots in the 1970s ships in a package little more than half the size of this 3-year-old zstd Facebook marvel.
I am still going to question the motives. If modernization is the motive: the magnetic levitation train is an improvement over thousands of years of development of the wheel, yet for some reason I still see many wheels around. I’d love to have a magnetic skateboard to go around town, but for now I keep my 30-year-old bicycle well lubed.