Metadata is too big. I don't know why it is so bloated, but IMO using xz instead of gz/bz2 will decrease the size of the metadata.

To demonstrate my point: I downloaded the biggest metadata file from http://mirror.isoc.org.il/pub/fedora/updates/14/x86_64/repodata/?C=S;O=D (Fedora 14 updates, x86_64), which is (some long hash)-filelists.sqlite.bz2. I extracted the sqlite file and compressed it with xz -9ezk:

[elad@elephant 1]$ ls -hs
total 60M
 46M 39da471ad24f9173ac063f73aac90c4353df23234624c02f559a39a54795ff5f-filelists.sqlite
8.6M 39da471ad24f9173ac063f73aac90c4353df23234624c02f559a39a54795ff5f-filelists.sqlite.bz2
5.8M 39da471ad24f9173ac063f73aac90c4353df23234624c02f559a39a54795ff5f-filelists.sqlite.xz

See the difference: 5.8M instead of 8.6M, roughly a third smaller! And to prove I'm not cheating, with the default options:

6.7M 39da471ad24f9173ac063f73aac90c4353df23234624c02f559a39a54795ff5f-filelists.sqlite.xz

Still a huge improvement (about 22% smaller than the bz2).

Benefits to Fedora:
1. Less space will be needed for metadata on mirrors.
2. Less bandwidth will be needed for mirrors.

User experience:
1. Updates/searches/installations/removals in PackageKit or yum will start faster and reach the dep-solving stage sooner, because the metadata will download faster.
2. Less bandwidth will be needed for those package operations, so users who pay per amount of data or who have slow connections will benefit.

Support for xz-compressed metadata needs to be in yum and createrepo for at least one release before this compression method is used in Fedora repositories.
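For anyone who wants to repeat this comparison on another metadata file, here is a rough sketch using Python 3's standard bz2 and lzma modules instead of the command-line tools. Assumptions: the bz2 level used for the real repodata isn't stated here, so level 9 is only a guess; lzma preset `9 | PRESET_EXTREME` is the library counterpart of `xz -9e`, and preset 6 matches plain `xz`. The function name is hypothetical, and absolute sizes may differ slightly from what the CLI tools report.

```python
import bz2
import lzma
from typing import Dict


def compare_compression(data: bytes) -> Dict[str, int]:
    """Return the compressed size in bytes of `data` for bz2 and two xz presets."""
    return {
        # bzip2 at its largest block size (assumed; the real repodata level isn't known here)
        "bz2-9": len(bz2.compress(data, compresslevel=9)),
        # xz default preset, like running plain `xz`
        "xz-6": len(lzma.compress(data, preset=6)),
        # preset 9 plus the "extreme" flag, like `xz -9e`
        "xz-9e": len(lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)),
    }
```

Call it with the raw bytes of the extracted .sqlite file and compare each value against the uncompressed length.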
Assigning to yum, as it's really their call (and the createrepo code is owned by the same maintainers). This will presumably bring in another dependency (pyliblzma), and it won't make the repodata stored *in repos* any smaller, since the bz2 files will have to be kept for backwards compat.
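A minimal sketch of why the bz2 copy has to stay: a hypothetical client-side selection in which each client picks the best format it can read from whatever the repo publishes. `choose_variant`, the format-to-location mapping and the preference order are illustrative only, not yum's actual code. An old client whose supported set lacks "xz" keeps working only as long as the bz2 copy is still published.

```python
from typing import Dict, Set

# Hypothetical preference order: smallest format first, bz2/gz as fallbacks older clients can read.
PREFERENCE = ("xz", "bz2", "gz")


def choose_variant(published: Dict[str, str], supported: Set[str]) -> str:
    """Pick the best metadata copy this client can read.

    published: hypothetical mapping of format -> location of that compressed copy.
    supported: formats this client is able to decompress.
    """
    for fmt in PREFERENCE:
        if fmt in published and fmt in supported:
            return published[fmt]
    raise LookupError("no published metadata variant this client can read")
```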
(In reply to comment #1)
> and it won't make the repodata stored *in repos* any smaller, as
> the bz2 will have to be kept for backwards compat.

For how long? AFAIK we don't have to leave it there; we could just modify preupgrade to parse it, so upgrades would work ("yum upgrades" wouldn't, but that's not officially supported anyway). This is the kind of change you introduce to the repos in a new release, not something that would go into F15 updates. The support itself can land as an update, but it shouldn't be used by the repositories until at least F16.
*** Bug 864516 has been marked as a duplicate of this bug. ***
Please read bug 864516; it has some additional information (e.g. that no changes should be required in yum). Changing the component to createrepo, as Zdenek Pavlas advised me earlier.
I do remember discussing this with some yum developers in #fedora-devel shortly after I filed this bug. If I recall correctly, they said yum supports xz-compressed RPMs because librpm supports them, but the metadata is handled by yum itself, and the Python libraries currently available for xz compression and decompression are not stable enough. xz support was added to the Python standard library in Python 3 (the lzma module, since 3.3), so a prerequisite for fixing this bug would be either finding a stable enough xz (de)compression library or porting yum to Python 3.

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers
As I wrote on IRC, yum handles .gz, .bz2 and .xz compressed metadata files uniformly. .xz support was added more than 2 years ago. The only requirement is that the pyliblzma package (about 45k) has to be installed. It isn't installed by default (sorry, I only just found this out). If pyliblzma isn't stable enough (do you have any references?) and there's demand, I can change the decompression code to exec the 'xz' binary directly; that seems simple.
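To illustrate the "uniform handling" and the proposed fallback, here is a sketch that picks the decompressor from the file suffix and can optionally pipe .xz data through the `xz` binary (`xz -dc`, reading from stdin). It uses Python 3's stdlib gzip/bz2/lzma rather than pyliblzma, so it is not yum's code, and `decompress_metadata`/`read_metadata` are hypothetical names.

```python
import bz2
import gzip
import lzma
import subprocess


def decompress_metadata(name: str, payload: bytes, use_xz_binary: bool = False) -> bytes:
    """Decompress a repodata payload, choosing the codec from the file name's suffix."""
    if name.endswith(".gz"):
        return gzip.decompress(payload)
    if name.endswith(".bz2"):
        return bz2.decompress(payload)
    if name.endswith(".xz"):
        if use_xz_binary:
            # Alternative to a Python binding: run `xz -dc`, feeding the data on stdin.
            result = subprocess.run(["xz", "-dc"], input=payload, check=True, capture_output=True)
            return result.stdout
        return lzma.decompress(payload)
    # Anything else is treated as uncompressed.
    return payload


def read_metadata(path: str, use_xz_binary: bool = False) -> bytes:
    """Read a metadata file from disk and decompress it based on its name."""
    with open(path, "rb") as f:
        return decompress_metadata(path, f.read(), use_xz_binary)
```

Keeping the dispatch in one place means adding or swapping a codec (binding vs. external binary) touches a single branch.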
My only reference is a vague recollection of an IRC conversation, more than a year ago, with a yum developer whose name I don't recall, so you shouldn't take it seriously. I can't think of any major reason not to implement this, and it would be a great improvement, especially for people with slow internet connections. As for multi-threading: it's true that you can't compress or decompress a single xz archive with more than one thread, but since the metadata is spread across numerous files, you could decompress each file in its own thread and so make use of multi-core CPUs, which are common enough these days. Does this sound reasonable?
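A sketch of the one-thread-per-file idea, assuming Python 3's stdlib lzma and concurrent.futures (yum itself would need pyliblzma or the xz binary). Each file is decompressed as a separate task. As far as I know, CPython's lzma module releases the GIL while decompressing, so the tasks can overlap on multiple cores; `decompress_all` and the file names are hypothetical.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Optional


def decompress_all(blobs: Dict[str, bytes], max_workers: Optional[int] = None) -> Dict[str, bytes]:
    """Decompress independent .xz payloads concurrently, one task per file."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every file first so the decompressions can run side by side.
        futures = {name: pool.submit(lzma.decompress, data) for name, data in blobs.items()}
        # result() waits for each task and re-raises any decompression error for that file.
        return {name: fut.result() for name, fut in futures.items()}
```

Each archive is still decompressed by a single thread; the parallelism comes only from having several independent files.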
(In reply to comment #7)
> As for multi-threading, it's true that you can't compress or decompress a
> single xz archive using more than one thread, but since we have a lot of
> metadata spread across numerous files, you could decompress each file in
> it's own thread thus utilizing multi-core CPUs which are common enough this
> day.

That would be a fine improvement, but I believe the basic single-threaded implementation is totally OK at this point. gzip doesn't support multi-threading either. The best place to handle this would be inside xz itself (7z can do multi-threading, so why not xz?). But let's discuss this once we have xz support up and running.
According to everyone in #fedora-devel, we already support this in yum and createrepo, so we should file a ticket with rel-eng. Closing this as NOTABUG.
Elad, can you paste the discussion here? Also, I know createrepo already supports xz, but my intention was to change the _default_. If we change the default, the switch is completely transparent for Fedora infrastructure and they don't have to adjust anything. Also, Fedora emphasizes using upstream defaults. I believe we should try to get the defaults changed first, and try to convince Fedora Infra to switch only if that fails. Reopening the bug. Maybe the upstream issue tracker would be better for this request?
Elad created the rel-eng ticket here: https://fedorahosted.org/rel-eng/ticket/5362
James, we need your response. The upstream Trac is full of spam and apparently no one uses it (really, guys, you should disable it and redirect to RH Bugzilla, at least). So I consider this bug an upstream bug. James, what are the chances of changing the default to xz, as the RelEng guys would like (less work for them; see the ticket from comment 11)?
So ... I believe we could change it in Fedora 18, and everything would work. However, I'm also sure that it'd break RHEL-6 ... and people still create repos for RHEL-6 on the latest Fedora by running "createrepo my-dir-of-rpms". The failure mode kind of sucks, too. After RHEL-7 goes GA, it might be worth making "xz" the default upstream/rawhide ... but for the next year or so, just tweaking what mash does seems much easier.