Red Hat Bugzilla – Bug 864516
use xz compression by default
Last modified: 2014-01-21 18:24:14 EST
Description of problem:
Many Fedora users complain that yum is very slow. A large part of this perceived "slowness" is the time it takes to refresh repository metadata. To reduce that time, we should make the metadata as small as possible.
createrepo can already create repository metadata using xz compression, but it is not the default. I took some measurements. These are the sizes of the Fedora 17 master repository metadata:
> 1.9M ./orig/42ec1be1745e71753d57892fefff57fc16d98541662637e1a8f2c0b2f18870f6-comps-f17.xml
> 8.4M ./orig/53f38b1595ef2c93061264cbda1c6015873425768fbb51f84a306b1704409d51-other.xml.gz
> 508K ./orig/63d1bddad9470b822b2a9873cb9a047bf4f2a114222d2d374e10860601b9fc6d-prestodelta.xml.gz
> 9.2M ./orig/7009de56f1a1c399930fa72094a310a40d38153c96d0b5af443914d3d6a7d811-primary.xml.gz
> 7.8M ./orig/a65f7d7fe900ba34acc7cc7c45e728b6c9af5ce5cf0562a0ecc8fafd56d33ef9-other.sqlite.bz2
> 21M ./orig/a9c70025ee9577048b4aea69e1d10cede198546ab342bd508edb6e995d5908d4-filelists.xml.gz
> 436K ./orig/cd6b943c066d5eae4c407ca104128f3dc46ebeb5017f65a47709f299871de21d-comps-f17.xml.gz
> 22M ./orig/ddcb2f6c2ba6ca8e6f47f4a7df96bb920318bdf9991466a12923da38c04966bf-filelists.sqlite.bz2
> 15M ./orig/eda1f9b2d7da63ef28865a5d3d3c9ec8de8f10f8c101f07fea4fb8835c94c514-primary.sqlite.bz2
> 8.0K ./orig/repomd.xml
> 85M ./orig
I recompressed all gz/bz2 files with "xz --best". Here are the results:
> 1.9M ./new/42ec1be1745e71753d57892fefff57fc16d98541662637e1a8f2c0b2f18870f6-comps-f17.xml
> 2.9M ./new/53f38b1595ef2c93061264cbda1c6015873425768fbb51f84a306b1704409d51-other.xml.xz
> 376K ./new/63d1bddad9470b822b2a9873cb9a047bf4f2a114222d2d374e10860601b9fc6d-prestodelta.xml.xz
> 5.3M ./new/7009de56f1a1c399930fa72094a310a40d38153c96d0b5af443914d3d6a7d811-primary.xml.xz
> 5.0M ./new/a65f7d7fe900ba34acc7cc7c45e728b6c9af5ce5cf0562a0ecc8fafd56d33ef9-other.sqlite.xz
> 15M ./new/a9c70025ee9577048b4aea69e1d10cede198546ab342bd508edb6e995d5908d4-filelists.xml.xz
> 268K ./new/cd6b943c066d5eae4c407ca104128f3dc46ebeb5017f65a47709f299871de21d-comps-f17.xml.xz
> 17M ./new/ddcb2f6c2ba6ca8e6f47f4a7df96bb920318bdf9991466a12923da38c04966bf-filelists.sqlite.xz
> 11M ./new/eda1f9b2d7da63ef28865a5d3d3c9ec8de8f10f8c101f07fea4fb8835c94c514-primary.sqlite.xz
> 8.0K ./new/repomd.xml
> 58M ./new
The total metadata size went from 85 MB to 58 MB (68% of the original size). We can save 32% of the data just by changing the compression algorithm. That's amazing.
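To make the arithmetic easy to check, here is a small Python sketch that recomputes the ratios from the sizes printed in the two listings. The sizes are the rounded values reported by du, so the per-file percentages are only approximate; the dictionary keys and the helper name `percent_of_original` are just labels chosen for this example.

```python
# Sizes in MB as printed in the ./orig and ./new listings (rounded by du,
# so the derived percentages are approximate).
SIZES_MB = {
    "other.xml": (8.4, 2.9),
    "primary.xml": (9.2, 5.3),
    "other.sqlite": (7.8, 5.0),
    "filelists.xml": (21, 15),
    "filelists.sqlite": (22, 17),
    "primary.sqlite": (15, 11),
    "total": (85, 58),
}


def percent_of_original(old_size, new_size):
    """Return the new size as a whole-number percentage of the old size."""
    return round(100 * new_size / old_size)


for name, (old, new) in SIZES_MB.items():
    pct = percent_of_original(old, new)
    print(f"{name:18s} {old:>5} MB -> {new:>5} MB  "
          f"({pct}% of original, {100 - pct}% saved)")
```

For the totals, 58 / 85 is roughly 0.68, which is where the 68% and 32% figures come from.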
I propose changing the default compression algorithm in createrepo from gz/bz2 to xz (using the --best variant).
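For anyone who wants to repeat the measurement, the recompression step could be sketched in Python roughly as follows. This is not how createrepo does it; it is a standalone illustration using the standard library's `lzma` module with preset 9, which is the level that `xz --best` selects. The function names `recompress_to_xz` and `recompress_dir` are made up for this example, and files that are not .gz or .bz2 are simply skipped. Note that it does not regenerate repomd.xml, so the output is only useful for comparing sizes, not as a publishable repository.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Decompressors keyed by the file suffixes that appear in the original listing.
DECOMPRESSORS = {".gz": gzip.decompress, ".bz2": bz2.decompress}


def recompress_to_xz(src: Path, dst_dir: Path) -> Path:
    """Recompress one .gz/.bz2 file into dst_dir as .xz at preset 9."""
    raw = DECOMPRESSORS[src.suffix](src.read_bytes())
    # Swap the old compression suffix for .xz, keeping the rest of the name.
    dst = dst_dir / (src.name[: -len(src.suffix)] + ".xz")
    dst.write_bytes(lzma.compress(raw, preset=9))
    return dst


def recompress_dir(orig: Path, new: Path) -> None:
    """Recompress every .gz/.bz2 file in orig into new and print the sizes."""
    new.mkdir(parents=True, exist_ok=True)
    for src in sorted(orig.iterdir()):
        if src.suffix in DECOMPRESSORS:
            dst = recompress_to_xz(src, new)
            print(f"{src.name}: {src.stat().st_size} -> "
                  f"{dst.stat().st_size} bytes")
```

Calling `recompress_dir(Path("./orig"), Path("./new"))` would mirror the directory layout used in the listings. The whole file is read into memory, which is fine for metadata of this size but would need streaming for much larger inputs.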
I talked to Zdenek Pavlas (CC'd), a yum developer, and he said yum has supported .xz files since 2010, so there should be no problems switching to it.
There is one drawback: xz compression is much slower than gz/bz2. It also doesn't support multi-threading at the moment, so it cannot take advantage of multiple cores. Still, repository metadata is created just once on the server, but it is downloaded by thousands of users afterwards. It makes sense to "waste" a few more minutes on the server to reduce bandwidth and waiting time for all Fedora users out there.
Version-Release number of selected component (if applicable):
I'm quite sure this is a duplicate of bug #700020
Fedora Bugzappers volunteer triage team
Yes, marking as such.
*** This bug has been marked as a duplicate of bug 700020 ***