Bug 1579927 - Distribute solv files as part of repodata
Summary: Distribute solv files as part of repodata
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: libdnf
Version: 29
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Assignee: rpm-software-management
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-05-18 17:25 UTC by Daniel Mach
Modified: 2019-03-06 18:22 UTC (History)
CC List: 8 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-03-02 13:52:38 UTC
Type: Bug
Embargoed:


Attachments
callgrind on dnf makecache for reference (1.60 MB, text/plain)
2018-06-06 17:27 UTC, Alberto Ruiz

Description Daniel Mach 2018-05-18 17:25:06 UTC
Turning an email from Alberto into a bug for better tracking:

> For some time, Peter, Richard and I have been looking into
> improving the dnf workflow wrt the amount of network bandwidth
> and CPU that it takes to update the cache.
> 
> We come at it from different perspectives:
> 
> - UX degradation for people coming from debian/ubuntu/apt
> - Network efficiency towards our mirrors
> - CPU usage: parsing the XML blob is extremely slow (i.e. minutes)
>   on Raspberry Pi and other slow ARM devices
> etc...
> 
> We've been looking into the possibility of pregenerating the libsolv file,
> publishing it in the repository and listing it in repomd.xml, and then
> having dnf just download that.

Comment 1 Daniel Mach 2018-05-18 17:54:17 UTC
This is not the first time I have heard the idea of using solv-based repodata instead of XML.

The DNF team has several concerns:

Network efficiency
------------------
The benefit is not clear.
Solv files seem to be even bigger than XMLs.

$ du -shc /var/cache/dnf/fedora-f21308f6293b3270/repodata/*
45M     ...filelists.xml.gz
4.0K    ...repomd.xml
440K    ...comps-Everything.x86_64.xml.gz
16M     ...primary.xml.gz
61M     total

$ du -shc /var/cache/dnf/fedora*solv*
47M     /var/cache/dnf/fedora-filenames.solvx
21M     /var/cache/dnf/fedora.solv
67M     total


CPU usage on slow devices
-------------------------
Parsing XMLs is really slow here.
We will take a look at whether we can remove any bottlenecks to speed it up.


Format stability
----------------
I consider solv files a cache.
I'm not sure how stable and architecture-independent the format is.
We definitely need to understand more details before switching from XML to solv.

Comment 2 Alberto Ruiz 2018-05-18 18:24:37 UTC
(In reply to Daniel Mach from comment #1)
> This is not the first time I have heard the idea of using solv-based
> repodata instead of XML.
> 
> The DNF team has several concerns:
> 
> Network efficiency
> ------------------
> The benefit is not clear.
> Solv files seem to be even bigger than XMLs.
> 
> $ du -shc /var/cache/dnf/fedora-f21308f6293b3270/repodata/*
> 45M     ...filelists.xml.gz
> 4.0K    ...repomd.xml
> 440K    ...comps-Everything.x86_64.xml.gz
> 16M     ...primary.xml.gz
> 61M     total
> 
> $ du -shc /var/cache/dnf/fedora*solv*
> 47M     /var/cache/dnf/fedora-filenames.solvx
> 21M     /var/cache/dnf/fedora.solv
> 67M     total

That's an unfair comparison: those XML files are compressed, while the solv files are not. Compressing the solv files with gzip gives:
36M	fedora-filenames.solvx.gz
12M	fedora.solv.gz

And with xz:
30M	fedora-filenames.solvx.xz
9.6M	fedora.solv.xz

Though it is true that:
a) This is still in the same ballpark
b) Not all the data in the xml files is in the solv files

So perhaps not the strongest point indeed.
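
A minimal sketch of how such a like-for-like size comparison could be scripted, assuming only Python's standard gzip and lzma modules. The helper name compressed_sizes and the sample payload are hypothetical; real solv files will compress differently.

```python
import gzip
import lzma


def compressed_sizes(data: bytes) -> dict:
    """Return the size in bytes of data as-is, gzip-compressed and xz-compressed."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data)),
        "xz": len(lzma.compress(data)),  # lzma.compress defaults to the .xz container
    }


if __name__ == "__main__":
    # Hypothetical stand-in payload; in practice, read a .solv file from disk.
    sample = b"bash-4.4.19-2.fc28.x86_64\n" * 10000
    for name, size in compressed_sizes(sample).items():
        print(f"{name}: {size} bytes")
```

Comparing the "gzip" figure against the existing .xml.gz files keeps the compression method the same on both sides.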

> CPU usage on slow devices
> -------------------------
> Parsing XMLs is really slow here.
> We will take a look at whether we can remove any bottlenecks to speed it up.

I've done measurements with libxml2 and expat, and both are about equally slow; I see hardly any difference in CPU time between them.

I think we won't single-handedly make libexpat or libxml2 any faster.
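
For anyone wanting to reproduce this kind of measurement, here is a rough sketch that times expat alone, using Python's xml.parsers.expat binding. It is not the harness used for the numbers above; the synthetic document and the helper names make_sample_xml and count_packages are made up for illustration.

```python
import time
import xml.parsers.expat


def make_sample_xml(n_packages: int) -> bytes:
    """Build a synthetic document loosely shaped like primary.xml (made-up data)."""
    parts = ["<metadata>"]
    for i in range(n_packages):
        parts.append(f'<package type="rpm"><name>pkg{i}</name></package>')
    parts.append("</metadata>")
    return "".join(parts).encode("utf-8")


def count_packages(xml_bytes: bytes) -> tuple:
    """Parse with expat; return (number of <package> elements, CPU seconds used)."""
    count = 0

    def on_start(name, attrs):
        nonlocal count
        if name == "package":
            count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    start = time.process_time()
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return count, time.process_time() - start


if __name__ == "__main__":
    n, seconds = count_packages(make_sample_xml(100000))
    print(f"parsed {n} packages in {seconds:.3f}s of CPU time")
```

Because the handler does almost no work, the reported time is close to the pure parsing cost, which makes it easier to tell parsing apart from whatever is done with the data afterwards.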

> Format stability
> ----------------
> I consider solv files as a cache.
> I'm not sure how stable and architecture-independent the format is.
> We definitely need to understand more details before switching from XML to
> solv.

You have one repo per architecture, so being cross-platform is not really that important. Take into account that you can't even cross-bootstrap across arches, since none of the pre/post rpm scripts will work. There might be some cases that only involve downloading rpms where this is necessary, but I'd say in those cases you can still use the XML data.

I think we should try our best to optimize the most common case here. Running dnf is a rather common operation on each machine, and I don't think we should compromise everybody's most common operation because of the cross-platform repo introspection use case.

As for stability, I think that is a fair point and one we should clarify with libsolv upstream.

Comment 3 Peter Robinson 2018-05-19 11:35:05 UTC
> > Parsing XMLs is really slow here.
> > We will take a look at whether we can remove any bottlenecks to speed it up.
> 
> I've done measurements with libxml2 and expat, and both are about equally
> slow; I see hardly any difference in CPU time between them.
> 
> I think we won't single-handedly make libexpat or libxml2 any faster.

Avoiding this parsing cost would be a big win for cloud images and containers as well as ARM SBCs.

> > Format stability
> > ----------------
> > I consider solv files as a cache.
> > I'm not sure how stable and architecture-independent the format is.
> > We definitely need to understand more details before switching from XML to
> > solv.
> 
> You have one repo per architecture, so being cross-platform is not really
> that important. Take into account that you can't even cross-bootstrap across

Not entirely true: in most cases the generation side is currently architecture-agnostic, so this would be a change/regression. That said, all the information is currently put into the XML as well as the sqlite DB without issue, so there's no reason the libsolv DB should be any different.

> I think we should try our best to optimize the most common case here.
> Running dnf is a rather common operation on each machine, and I don't think
> we should compromise everybody's most common operation because of the
> cross-platform repo introspection use case.

Yes, generating this once per repo on the "server side" would be a huge win in constrained environments, and even without much of a win in terms of size it would overall be a huge win for end users.

Comment 4 Peter Robinson 2018-05-19 11:36:35 UTC
BTW, I'm not sure whether this should instead be an RFE/bug against createrepo_c, with a matching client-side one for dnf to check for and retrieve the libsolv DB before falling back to the XML.
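
The client-side check could look roughly like the sketch below, which reads repomd.xml and prefers a pregenerated solv entry when one is listed. The "solv" data type and the file names in the sample are assumptions made for illustration; createrepo_c is not known to emit such an entry, and pick_metadata is a hypothetical helper, not a dnf or libdnf API.

```python
import xml.etree.ElementTree as ET

# Namespace used by repomd.xml.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}

# Hypothetical repomd.xml excerpt; the "solv" entry is an assumption.
SAMPLE_REPOMD = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary">
    <location href="repodata/primary.xml.gz"/>
  </data>
  <data type="solv">
    <location href="repodata/primary.solv.xz"/>
  </data>
</repomd>
"""


def pick_metadata(repomd_xml: str, preferred=("solv", "primary")):
    """Return (type, href) for the first preferred data type listed, else None."""
    root = ET.fromstring(repomd_xml)
    locations = {}
    for data in root.findall("repo:data", REPO_NS):
        loc = data.find("repo:location", REPO_NS)
        if loc is not None:
            locations[data.get("type")] = loc.get("href")
    for wanted in preferred:
        if wanted in locations:
            return wanted, locations[wanted]
    return None


if __name__ == "__main__":
    print(pick_metadata(SAMPLE_REPOMD))
```

Repositories that do not publish the extra entry would keep working unchanged, since the lookup falls through to the primary XML.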

Comment 5 Igor Gnatenko 2018-05-20 13:50:48 UTC
The main problem here is that distributing binary files that do not have a stable interface is risky. Moreover, there might be some vulnerabilities in the solv parser (since it segfaults when reading fails).

Comment 6 Peter Robinson 2018-05-21 08:16:28 UTC
(In reply to Igor Gnatenko from comment #5)
> The main problem here is that distributing binary files that do not have a
> stable interface is risky. Moreover, there might be some vulnerabilities in
> the solv parser (since it segfaults when reading fails).

I wouldn't have thought the risk in terms of vulnerabilities is any worse whether the data is processed on the client or the server side; the consequences should be the same, and ultimately the solv parser should be fixed.

Comment 7 Alberto Ruiz 2018-05-21 12:38:39 UTC
(In reply to Peter Robinson from comment #6)
> (In reply to Igor Gnatenko from comment #5)
> > The main problem here is that distributing binary files that do not have a
> > stable interface is risky. Moreover, there might be some vulnerabilities in
> > the solv parser (since it segfaults when reading fails).
> 
> I wouldn't have thought the risk in terms of vulnerabilities is any worse
> whether the data is processed on the client or the server side; the
> consequences should be the same, and ultimately the solv parser should be fixed.

Agreed, I think the one risk we do have is on-disk format stability across releases.

Who's a good contact from libsolv upstream to ask about these stability guarantees?
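
To make the stability concern concrete, here is a sketch of the kind of guard a client could apply before trusting a downloaded binary file. It assumes, purely for illustration, a header made of a 4-byte magic value followed by a big-endian 32-bit version number; the magic value, the accepted versions and the helper can_use_prebuilt are all assumptions, not a description of libsolv's actual on-disk format.

```python
import struct

MAGIC = b"SOLV"              # assumed magic value, for illustration only
SUPPORTED_VERSIONS = {8, 9}  # hypothetical set of versions this client understands


def can_use_prebuilt(header: bytes) -> bool:
    """Return True only if the header has the expected magic and a known version."""
    if len(header) < 8 or header[:4] != MAGIC:
        return False
    (version,) = struct.unpack(">I", header[4:8])
    return version in SUPPORTED_VERSIONS


if __name__ == "__main__":
    print(can_use_prebuilt(b"SOLV" + struct.pack(">I", 9)))
    print(can_use_prebuilt(b"SOLV" + struct.pack(">I", 99)))
```

When the check fails, the client could fall back to the XML metadata, so a format bump on the server would cost performance rather than correctness.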

Comment 8 Daniel Mach 2018-06-05 06:47:01 UTC
Libsolv upstream is:
https://github.com/openSUSE/libsolv
Michael Schroeder <mls>

I've verified that the problem truly is with the CPU performance.
My educated guess is that XML parsing is most likely not the bottleneck.
I think it's the indexing of data from the XML into the solv file.

Once the DNF team has enough free capacity, we can make a more detailed
analysis and eventually improve the performance.
I don't think this will happen before October, because Modularity and YUM3 compatibility have top priority and keep us busy.

Comment 9 Alberto Ruiz 2018-06-06 17:27:44 UTC
Created attachment 1448388
callgrind on dnf makecache for reference

Comment 10 Jan Kurik 2018-08-14 10:30:09 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 29 development cycle.
Changing version to '29'.

Comment 11 Daniel Mach 2019-03-02 13:52:38 UTC
I agree with Igor's explanation in comment#5.
The DNF team has no plans to support distributing the solv files at the moment.

