Turning an email from Alberto to a bug for better tracking:

> For some time, Peter, Richard and I have been looking into
> improving the dnf workflow wrt the amount of network bandwidth
> and CPU that it takes to update the cache.
>
> We come at it from different perspectives:
>
> - UX degradation for people coming from debian/ubuntu/apt
> - Network efficiency towards our mirrors
> - CPU usage while parsing the XML blob is extremely slow (i.e. minutes)
>   in RaspberryPi and slow ARM devices
> etc...
>
> We've been looking into the possibility of preloading the libsolv file
> into the repository and the repomd.xml and then have dnf just download that.

This is not the first time I hear about an idea of using solv based repodata instead of XML.

DNF team has several concerns:

Network efficiency
------------------
The benefit is not clear.
Solv files seem to be even bigger than XMLs.

$ du -shc /var/cache/dnf/fedora-f21308f6293b3270/repodata/*
45M     ...filelists.xml.gz
4.0K    ...repomd.xml
440K    ...comps-Everything.x86_64.xml.gz
16M     ...primary.xml.gz
61M     total

$ du -shc /var/cache/dnf/fedora*solv*
47M     /var/cache/dnf/fedora-filenames.solvx
21M     /var/cache/dnf/fedora.solv
67M     total

CPU usage on slow devices
-------------------------
Parsing XMLs is really slow here.
We will take a look at whether we can improve any bottleneck to speed it up.

Format stability
----------------
I consider solv files as a cache.
Not sure how stable and arch independent it is.
We definitely need to understand more details before switching from XML to solv.

(In reply to Daniel Mach from comment #1)
> This is not the first time I hear about an idea of using solv based repodata
> instead of XML.
>
> DNF team has several concerns:
>
> Network efficiency
> ------------------
> The benefit is not clear.
> Solv files seem to be even bigger than XMLs.
>
> $ du -shc /var/cache/dnf/fedora-f21308f6293b3270/repodata/*
> 45M     ...filelists.xml.gz
> 4.0K    ...repomd.xml
> 440K    ...comps-Everything.x86_64.xml.gz
> 16M     ...primary.xml.gz
> 61M     total
>
> $ du -shc /var/cache/dnf/fedora*solv*
> 47M     /var/cache/dnf/fedora-filenames.solvx
> 21M     /var/cache/dnf/fedora.solv
> 67M     total

That's an unfair comparison: those xml files are compressed. With gzip this is the result:

36M     fedora-filenames.solvx.gz
12M     fedora.solv.gz

And with xz:

30M     fedora-filenames.solvx.xz
9.6M    fedora.solv.xz

Though it is true that:

a) This is still the same ballpark
b) Not all the data in the xml files is in the solv files

So perhaps not the strongest point indeed.

> CPU usage on slow devices
> -------------------------
> Parsing XMLs is really slow here.
> We will take a look at whether we can improve any bottleneck to speed it up.

I've done measurements with libxml2 and expat; both are about as slow as each other, and I see hardly any difference in CPU time between them.

I think we won't single-handedly make libexpat or libxml2 any faster.

> Format stability
> ----------------
> I consider solv files as a cache.
> Not sure how stable and arch independent it is.
> We definitely need to understand more details before switching from XML to
> solv.

You have one repo per architecture, so cross-platform support is not really that important. Take into account that you can't even cross bootstrap across arches, since all the post/pre rpm scripts will not work. There might be some cases dealing with just downloading rpms where this might be necessary, but I'd say in those cases you can still use the xml data.

I think we need to take into account that we should try our best to optimize the most common case here. Running dnf is a rather common operation on each machine, and I don't think we should compromise everybody's most common operation for the sake of the cross-platform repo introspection use case.

As for stability, I think that is a fair point and one we should clarify with libsolv upstream.
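For anyone who wants to redo the gzip/xz comparison above on their own cache, a minimal Python sketch follows. It only uses the standard library; the helper names (`compressed_sizes`, `report`) are my own and the compression levels are an assumption, so the numbers will not necessarily match the ones quoted in this comment.

```python
import gzip
import lzma
from pathlib import Path


def compressed_sizes(data: bytes) -> dict:
    """Return the size in bytes of data raw, gzip-compressed and xz-compressed."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),  # level is an assumption
        "xz": len(lzma.compress(data, preset=9)),           # preset is an assumption
    }


def report(paths):
    """Print one line per existing file with its raw/gzip/xz sizes in MiB."""
    for path in map(Path, paths):
        if not path.is_file():
            continue  # cache file not present on this machine
        sizes = compressed_sizes(path.read_bytes())
        cols = "  ".join(f"{name}={size / 2**20:.1f}M" for name, size in sizes.items())
        print(f"{path.name}: {cols}")


if __name__ == "__main__":
    # Paths taken from the du output quoted above; adjust to the local cache.
    report([
        "/var/cache/dnf/fedora.solv",
        "/var/cache/dnf/fedora-filenames.solvx",
    ])
```

Note that this reads each file fully into memory, which is fine for files of a few tens of megabytes but is not how one would do it for much larger inputs.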
> > Parsing XMLs is really slow here.
> > We will take a look at whether we can improve any bottleneck to speed it up.
>
> I've done measurements with libxml2 and expat; both are about as slow as
> each other, and I see hardly any difference in CPU time between them.
>
> I think we won't single-handedly make libexpat or libxml2 any faster.

This is a big win in cloud images, containers as well as ARM SBCs.

> > Format stability
> > ----------------
> > I consider solv files as a cache.
> > Not sure how stable and arch independent it is.
> > We definitely need to understand more details before switching from XML to
> > solv.
>
> You have one repo per architecture, so cross-platform support is not really that
> important. Take into account that you can't even cross bootstrap across

Not entirely true: in most cases this is currently done architecture-agnostically on the generation side, so it would be a change/regression. That said, all the information is at the moment shoved into the xml as well as the sqlite DB without issue, so there's no reason the libsolv DB should be any different.

> I think we need to take into account that we should try our best to optimize
> the most common case here. Running dnf is a rather common operation on each
> machine, and I don't think we should compromise everybody's most common
> operation for the sake of the cross-platform repo introspection use case.

Yes, generating this once per repo on the "server side" would be a huge win in constrained environments, and even without much of a win in terms of size it would overall be a huge win for end users.

BTW, I wonder whether this should be an RFE/bug against createrepo_c, with a matching client-side one for dnf to check for and retrieve the libsolv DB before falling back to the xml.
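To make the client-side half of that concrete, here is a rough Python sketch of the "check for a solv entry, otherwise use the xml" decision. It is an illustration only, not how dnf works today: the `solv` data type is hypothetical (no such entry exists in repomd.xml at the moment), and `pick_metadata` is a made-up helper name. The only thing taken from the real format is the repomd.xml layout of `<data type="...">` elements with a `<location href="..."/>` child in the `http://linux.duke.edu/metadata/repo` namespace.

```python
import xml.etree.ElementTree as ET

REPO_NS = "{http://linux.duke.edu/metadata/repo}"
SOLV_TYPE = "solv"  # hypothetical: no such repomd data type is defined today


def pick_metadata(repomd_xml: str, preferred=(SOLV_TYPE, "primary")):
    """Return (type, href) of the first preferred data entry, or None."""
    root = ET.fromstring(repomd_xml)
    available = {}
    for data in root.findall(f"{REPO_NS}data"):
        location = data.find(f"{REPO_NS}location")
        if location is not None:
            available[data.get("type")] = location.get("href")
    # Walk the preference list in order, so xml is only used as a fallback.
    for wanted in preferred:
        if wanted in available:
            return wanted, available[wanted]
    return None
```

With a repo that only ships xml this returns the `primary` entry, so older repositories would keep working unchanged.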
Main problem here is that distributing binary files which do not have stable interface is risky. Moreover, there might be some vulnerabilities in solv parser (since it segfaults when reading fails).
(In reply to Igor Gnatenko from comment #5)
> Main problem here is that distributing binary files which do not have stable
> interface is risky. Moreover, there might be some vulnerabilities in solv
> parser (since it segfaults when reading fails).

I wouldn't have thought the risk is any worse in terms of vulnerabilities whether it's processed client or server side, the consequences should be the same and ultimately the solv parser should be fixed.

(In reply to Peter Robinson from comment #6)
> (In reply to Igor Gnatenko from comment #5)
> > Main problem here is that distributing binary files which do not have stable
> > interface is risky. Moreover, there might be some vulnerabilities in solv
> > parser (since it segfaults when reading fails).
>
> I wouldn't have thought the risk is any worse in terms of vulnerabilities
> whether it's processed client or server side, the consequences should be the
> same and ultimately the solv parser should be fixed.

Agreed, I think the one risk we do have is on-disk format stability across releases. Who's a good contact from libsolv upstream to ask about these stability guarantees?

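One way a client could guard against format drift is to look at a version field before trusting a downloaded solv file, and fall back to the xml otherwise. The Python sketch below assumes, without confirmation from libsolv upstream, that a solv file starts with the four bytes `SOLV` followed by a big-endian 32-bit version number. Both the header layout and the helper names are assumptions for illustration, and the set of supported versions is something the caller would have to supply.

```python
import struct

SOLV_MAGIC = b"SOLV"  # assumption: first four bytes of a .solv file


def solv_header_version(header: bytes):
    """Return the version number from a .solv header, or None if it is not one."""
    if len(header) < 8 or header[:4] != SOLV_MAGIC:
        return None
    (version,) = struct.unpack(">I", header[4:8])  # assumption: big-endian u32
    return version


def can_use_solv(header: bytes, supported_versions) -> bool:
    """True only if the header carries a version the client says it can read."""
    version = solv_header_version(header)
    return version is not None and version in supported_versions
```

A version check like this only tells the client whether it recognises the file; it does not answer the question of what upstream actually guarantees across releases.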
Libsolv upstream is:
https://github.com/openSUSE/libsolv
Michael Schroeder <mls>

I've verified that the problem truly is with the CPU performance.
My educated guess is that XML parsing most likely isn't the bottleneck.
I think it's indexing the data from XML into the solv file.

Once the DNF team has enough free capacity, we can make a more detailed analysis and eventually improve the performance.
I don't think this will happen sooner than in October because Modularity and YUM3 compatibility have top priority and keep us busy.

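To separate the two costs mentioned here, one can time the XML parsing alone and compare it with the total time of a cache refresh. The following is a minimal Python sketch using the standard library's expat-backed `iterparse`; it measures CPU time for a streaming parse and does no indexing at all. The function name is my own, and the result only approximates what libsolv's own parser spends, since the parsing code path is different.

```python
import gzip
import sys
import time
import xml.etree.ElementTree as ET


def time_xml_parse(fileobj):
    """Stream-parse XML from fileobj; return (element_count, cpu_seconds)."""
    start = time.process_time()
    count = 0
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        count += 1
        elem.clear()  # drop already-seen content to keep memory use low
    return count, time.process_time() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the path to a gzip-compressed metadata file,
    # e.g. the primary.xml.gz from the local dnf cache.
    with gzip.open(sys.argv[1], "rb") as fh:
        elements, seconds = time_xml_parse(fh)
    print(f"{elements} elements parsed in {seconds:.2f}s CPU time")
```

If the figure printed here is small compared with the full cache update, that would support the guess that the indexing step, not the parsing, dominates.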
Created attachment 1448388
callgrind on dnf makecache for reference
This bug appears to have been reported against 'rawhide' during the Fedora 29 development cycle. Changing version to '29'.
I agree with Igor's explanation in comment#5. DNF team has no plans to support distributing the solv files at the moment.