Bug 1619368 - RFE: lazy loading filelists.xml.gz repodata
Summary: RFE: lazy loading filelists.xml.gz repodata
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: dnf
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: rpm-software-management
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-08-20 15:52 UTC by Kevin Fenzi
Modified: 2023-08-23 06:50 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2023-08-23 06:50:12 UTC
Type: Bug
Embargoed:



Description Kevin Fenzi 2018-08-20 15:52:22 UTC
See: 

https://pagure.io/fesco/issue/1955
https://pagure.io/packaging-committee/issue/714

It would be great if dnf could download the filelists.xml.gz repodata only when it needs data from it, and not bother the rest of the time.

This would save users a LOT of downloaded data. For example, the rawhide one is currently 47MB. Not downloading it and loading it into memory would also make dnf start much quicker.

Comment 1 Daniel Mach 2018-08-27 11:24:03 UTC
This is something we'd like to implement, although we have different priorities (F29 blockers, modularity etc.). We could probably deliver it for F30.

Comment 2 Kevin Fenzi 2018-08-27 17:23:42 UTC
Thanks.

Comment 3 Jaroslav Mracek 2019-07-24 15:55:13 UTC
This is a tricky request. How does dnf know that the filelists are not required? Filelists are used not only for dependency resolution but also by users who want to install or discover a package that provides a certain path.

Some examples:
dnf install foo
# The foo package is found, but it requires foo1, and it requires /etc/dnf/dnf.conf. Some package could still provide that path, but at this phase resolution fails. DNF then tries to download the filelists for all repositories and put them into the sack, but one filelist cannot be downloaded because new metadata is available. Now it gets really messy, because dnf would have to drop the sack, re-download the problematic repository, load the sack again, and resolve all arguments again.

dnf makecache
dnf repoquery -C /etc/dnf/dnf.conf - With cache only: should dnf report nothing found, fail, or do something else?

Most commands will have similar problems:
1. We cannot predict in advance whether the filelists will be required.
2. We don't know whether downloading the filelists will help to resolve the request.
3. A failed filelists download could result in re-downloading the complete metadata from the repository and recreating the sack, which uses a lot of CPU, and then querying all arguments again (starting from the beginning).
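
To illustrate problem 3, here is a minimal Python sketch of the retry flow described above. It is a toy simulation, not dnf code: the Repo class, MetadataChanged exception, and resolve_lazily function are hypothetical names invented for this illustration, and no real download or sack is involved.

```python
from dataclasses import dataclass, field


class MetadataChanged(Exception):
    """Hypothetical error: cached repodata no longer matches the server."""


@dataclass
class Repo:
    # Toy stand-in for a repository; not a real dnf or libdnf class.
    name: str
    filelists_fetchable: bool = True
    loaded: set = field(default_factory=set)

    def load(self, kind):
        # Simulate a filelists download that fails because the repo was updated.
        if kind == "filelists" and not self.filelists_fetchable:
            raise MetadataChanged(self.name)
        self.loaded.add(kind)

    def refresh(self):
        # Drop everything cached for this repo and start over with new metadata.
        self.loaded.clear()
        self.filelists_fetchable = True


def resolve_lazily(repos, needs_filelists):
    """Return a log of the steps a lazy-loading client would go through."""
    log = []
    for repo in repos:
        repo.load("primary")
    log.append("resolve with primary only")
    if not needs_filelists:
        return log
    for repo in repos:
        try:
            repo.load("filelists")
        except MetadataChanged:
            # The expensive path: re-download the repo and reload both files.
            repo.refresh()
            repo.load("primary")
            repo.load("filelists")
            log.append(f"refreshed {repo.name}")
    log.append("resolve again with filelists")
    return log
```

In this sketch the cheap path ends after the first resolution, while the expensive path resolves twice and may refresh a whole repository in between, which is the cost the list above describes.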

What it brings:
1. less download
2. lower disk requirements
3. faster resolution and dnf loading

The cost:
1. more downloads - zchunk
2. slower resolution and dnf loading (load 2x)
3. The space that was saved by skipping the filelists cannot be used anyway, because the filelists could still be required later.
4. Working in cache only mode gets really ugly or unreliable.

Comment 4 Kevin Fenzi 2019-08-16 00:40:43 UTC
Yeah, understood... 

> 2. We don't know whether downloading the filelists will help to resolve the request

Can't dnf tell this from the /path in the requires and download the filelists then?
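
A minimal Python sketch of that idea, under stated assumptions: a dependency string starting with "/" is treated as a file dependency, and a file dependency only needs filelists if primary.xml would not already list that path. The patterns below (paths under /etc/, paths containing "bin/", and /usr/lib/sendmail) are an assumption about what createrepo-style tools put into primary.xml. The function names are hypothetical and this is not how dnf or DNF5 actually decides.

```python
import re

# Assumption: file paths matching these patterns are already listed in
# primary.xml, so they can be resolved without filelists.
PRIMARY_FILE_PATTERNS = (
    re.compile(r"^/etc/"),
    re.compile(r"bin/"),
    re.compile(r"^/usr/lib/sendmail$"),
)


def is_file_dependency(requirement: str) -> bool:
    # Simplification: rich dependencies such as "(/usr/bin/foo if bar)"
    # are not handled here.
    return requirement.startswith("/")


def covered_by_primary(path: str) -> bool:
    return any(pattern.search(path) for pattern in PRIMARY_FILE_PATTERNS)


def needs_filelists(requires) -> bool:
    """True if any file dependency cannot be answered from primary.xml alone."""
    return any(
        is_file_dependency(req) and not covered_by_primary(req)
        for req in requires
    )
```

For example, needs_filelists(["/usr/share/foo/data"]) is true under these assumptions, while a package requiring only /usr/bin/python3 would not trigger a filelists download. This only covers dependency resolution; it does not help with user queries for arbitrary paths, which Comment 3 also mentions.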

Comment 5 Jaroslav Mracek 2023-08-23 06:50:12 UTC
We implemented the feature in DNF5. We detected that there are nearly 80 source packages in Fedora that have a problem with the feature, but at present fewer than 10 packages (Fedora 39) conflict with the feature.

I am closing the issue because we don't plan to modify DNF's behavior, and DNF5, where the feature is present, will replace dnf (Fedora 41).

