Bug 1195036 - Something about metadata like npm
Summary: Something about metadata like npm
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Fedora
Classification: Fedora
Component: dnf
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Packaging Maintenance Team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-02-22 11:40 UTC by Megh Parikh
Modified: 2015-03-17 10:41 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-03-12 13:49:16 UTC
Type: Bug
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 850896 0 medium CLOSED [rfe] [metadata] smarter metadata format for quicker sync (delta/per-package/...) 2021-02-22 00:41:40 UTC

Description Megh Parikh 2015-02-22 11:40:42 UTC
Description of problem:
Users often invoke the package manager to install a single piece of software weighing, say, 5 MB, yet the metadata downloaded for it is 14 MB. Since installing software is a relatively infrequent operation, the extra burden of downloading full metadata wastes both the server's and the user's resources and time.

This could be solved with the following approach:

AFAIK the npm package manager for nodejs does not work by downloading the full metadata upfront.
Instead it downloads the metadata of only the required packages.

- For example, if you request package A (which is not installed), its spec is first downloaded from http://fedorarepo/f21/spec/A.
- We then read the metadata of A and see that it requires packages B and C.
- In parallel, the binary download is also started (from http://fedorarepo/f21/binary/A-version-x86_64.rpm).
- B is already installed, so we query http://fedorarepo/f21/version/B to check whether we have the latest version; if not, we repeat the process for B.
- C is not installed, so we repeat the process for C. But C requires libLIB.so, so we query http://fedorarepo/f21/provides/libLIB and install the required dependencies.
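The steps above could be sketched roughly like this. The per-package endpoints (spec/, version/, provides/) are hypothetical, taken from the example URLs in this description, not real Fedora repo paths; `fetch` stands in for an HTTP GET against the mirror.

```python
# Minimal sketch of on-demand, per-package metadata resolution.
# `fetch(path)` stands in for an HTTP GET of a small per-package file
# from the (hypothetical) http://fedorarepo/f21/ endpoints.

def resolve(name, installed, fetch, visited=None):
    """Return the list of package downloads needed to install `name`,
    querying each package/library at most once."""
    if visited is None:
        visited = set()           # so shared deps of B and C are queried once
    if name in visited:
        return []
    visited.add(name)
    if name in installed:
        # already installed: a tiny version check instead of full metadata
        if installed[name] == fetch(f"version/{name}"):
            return []
    spec = fetch(f"spec/{name}")  # small per-package metadata file
    result = [f"{name}-{spec['version']}"]
    for dep in spec.get("requires", []):
        result += resolve(dep, installed, fetch, visited)
    return result
```

With an in-memory stand-in for the repo, requesting A (not installed) with B already up to date would download only A and C, touching three small metadata files instead of the full repo metadata.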

Considerations:
- For system-wide updates such as `sudo dnf update`, it may still make sense to download the full metadata, since otherwise the server would have to process a large number of queries.
- A static site generator should be used which overwrites only the files that changed relative to the previous metadata, so that the changes can be deployed to mirrors and the server is not burdened with a scripting language or a database.
- Not all packages need to be handled like this. We could separate out system packages and libraries, whose metadata is downloaded every time, but that metadata is only 1-2 MB.
- While a database is not needed by the mirrors, it may be needed for the static site generator. I think that the generator can be built atop the current metadata sqlite format.
- This is a detailed suggestion for bug 850896.
- If both B and C depend on the same package/library, it should be queried only once; thus we also need a smart client side.
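The "overwrite only the needed files" point could work like this sketch: regenerate every per-package file but write to disk only those whose content changed, so a plain rsync to the mirrors transfers a minimal delta. The paths are illustrative.

```python
# Sketch of the incremental static-generator idea: touch a file on disk
# only when its regenerated content differs from the deployed copy, so
# mirror syncing (e.g. rsync) transfers only the changed files.
import hashlib
from pathlib import Path

def write_if_changed(path: Path, content: bytes) -> bool:
    """Write `content` to `path` only if it differs; return True if written."""
    if path.exists():
        old = hashlib.sha256(path.read_bytes()).digest()
        new = hashlib.sha256(content).digest()
        if old == new:
            return False          # unchanged: leave the file (and mtime) alone
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(content)
    return True
```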

Comment 1 Megh Parikh 2015-02-22 11:44:21 UTC
I may try to work on the generator in my spare time, but currently I am waiting for somebody to provide feedback on this.

Comment 2 Hedayat Vatankhah 2015-02-22 12:07:37 UTC
Notice that there are file-based dependencies like /usr/share/foo/bar.data; so every file in every package might be required by some other package. Do you want to add all files, one by one, to the /provides/ section of the server?

I'm also in favor of such a scalable metadata format, and talked about it on the yum mailing list a long time ago. Also, there were some suggestions in the yum wiki. Unfortunately, I've not found enough time to start working on a prototype. BTW, since such an idea will probably need some refinement, dnf/yum developers should participate in the discussion and talk about its different aspects.

Comment 3 Megh Parikh 2015-02-22 12:14:16 UTC
We could have a mechanism in which we list all the required files, count the number of packages requiring each of them, and record the package providing them. Then we could move the packages providing files that are required by many packages into the system-packages metadata, which is downloaded every time.

Here I assume that multiple packages don't provide the same file and that there are not many packages which would have to be moved into the system packages. So essentially we need some statistics.
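The statistics step could be sketched as below; the data shapes and the threshold are illustrative, and per the assumption above each file has a unique provider.

```python
# Sketch of the proposed statistics: count how many packages require
# each file, then promote the providers of widely-required files into
# the always-downloaded "system" metadata set.  Inputs are illustrative.
from collections import Counter

def popular_providers(requires, provides, threshold):
    """requires: pkg -> set of file paths it needs.
    provides: file path -> providing pkg (assumed unique, see above).
    Return providers of files required by at least `threshold` packages."""
    demand = Counter()
    for files in requires.values():
        demand.update(files)
    return {provides[f] for f, n in demand.items()
            if n >= threshold and f in provides}
```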

Comment 4 Jan Zeleny 2015-02-23 09:38:13 UTC
The situation is actually not as simple as it might seem. The rpm dependency model is orders of magnitude more complex than what npm uses. Dnf uses a SAT solver (libsolv) to resolve all dependencies. The SAT solver is actually one of the main reasons for dnf to exist and to replace yum in future releases of Fedora. This dependency solver is also the main reason why the approach you suggest is unlikely to work. If I understand your suggestion correctly, it requires the dependencies to be resolved "on the fly" - that is not possible, as libsolv needs all the dependencies to be "final" at the time it starts its calculations. Having the dependencies pre-resolved for downloads and then really resolved by libsolv might be an option, but not a good one for a number of reasons (redundant calculations, not 100% reliable, ...).

A few years ago we discussed something similar. The core idea of having metadata per package was the same; the difference was that all the metadata whose checksums did not match would be downloaded at once (no depsolving). In the end we decided to defer that idea in favor of the delta metadata described in bug 850896.

Comment 5 Jan Pazdziora 2015-03-03 12:39:44 UTC
(In reply to Megh Parikh from comment #0)
> 
> - For eg., if you request package A (which is not installed), its spec is
> first downloaded from http://fedorarepo/f21/spec/A.
> - Now we read metadata of A and see it requests package B and C.
> - In parallel the binary download is also started (from
> http://fedorarepo/f21/binary/A-version-x86_64.rpm).
> - But B is already installed so we query http://fedorarepo/f21/version/B to
> check if we have the latest version if not repeat similarly for B.

Why does installation of package A require upgrade of B to latest version (unless there is a versioned dependency)?

Comment 6 Jan Pazdziora 2015-03-03 12:44:49 UTC
(In reply to Jan Zeleny from comment #4)
> libsolv needs to have all the dependencies to be "final" at the
> time it starts its calculations.

This seems like a weak argument. The library can fail to find a feasible transaction, backtrack (or abort the computation), fetch the missing metadata (or update those that were used in their cached version during the first run), and start anew.

If the move to dnf means we will now have to download all the metadata every time, without having a way to minimize those download sizes, we are heading for unpleasant regressions.
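The fail-fetch-retry loop suggested here could look roughly like the sketch below. `solve` and `fetch_metadata` are illustrative stand-ins for the solver and the mirror, not the libsolv API; comment 9 below argues this loop would be painfully slow in practice.

```python
# Rough sketch of the retry loop proposed in comment 6: solve with
# cached metadata, and on failure fetch what was missing or stale and
# start anew.  `solve`/`fetch_metadata` are hypothetical stand-ins.

def solve_with_refetch(request, cache, solve, fetch_metadata, max_rounds=5):
    """Try to find a transaction with cached metadata; on failure fetch
    the metadata the solver lacked and restart the computation."""
    for _ in range(max_rounds):
        result = solve(request, cache)
        if result.ok:
            return result.transaction
        # the solver reports which package metadata it was missing
        for name in result.missing:
            cache[name] = fetch_metadata(name)
    raise RuntimeError("no feasible transaction after refetching metadata")
```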

Comment 7 Michael Mráka 2015-03-03 12:53:30 UTC
> Why does installation of package A require upgrade of B to latest version
> (unless there is a versioned dependency)?

E.g. because B.latest contains libB.so.latest required by A but currently installed version of B contains libB.so.previous.

Comment 8 Jan Pazdziora 2015-03-03 13:02:13 UTC
(In reply to Michael Mráka from comment #7)
> > Why does installation of package A require upgrade of B to latest version
> > (unless there is a versioned dependency)?
> 
> E.g. because B.latest contains libB.so.latest required by A but currently
> installed version of B contains libB.so.previous.

Hmm, libB.so is required by C, not by A.

Comment 9 Jan Zeleny 2015-03-03 13:46:01 UTC
(In reply to Jan Pazdziora from comment #6)
> (In reply to Jan Zeleny from comment #4)
> > libsolv needs to have all the dependencies to be "final" at the
> > time it starts its calculations.
> 
> This seems like a weak argument. The library can fail to find feasible
> transaction, backtrace (or abort the computation), fetch the metadata
> missing (or update those that were used in their cache version during the
> first run), and start anew.

In which case this will be painfully slow, as it will download the missing metadata, try again, fail, download the missing metadata, ... while rebuilding its internal database in every iteration ...

> If the move to dnf means we will now have to download all the metadata every
> time, without having a way to minimize those download sizes, we are heading
> for unpleasant regressions.

How so? Yum does the same thing ...

Comment 10 Jan Pazdziora 2015-03-03 13:54:02 UTC
(In reply to Jan Zeleny from comment #9)
> 
> > If the move to dnf means we will now have to download all the metadata every
> > time, without having a way to minimize those download sizes, we are heading
> > for unpleasant regressions.
> 
> How so? Yum does the same thing ...

IIRC, at least filelists only get downloaded when a path dependency is hit, not upfront.

Comment 11 Jan Zeleny 2015-03-04 07:43:38 UTC
(In reply to Jan Pazdziora from comment #10)
> (In reply to Jan Zeleny from comment #9)
> > 
> > > If the move to dnf means we will now have to download all the metadata every
> > > time, without having a way to minimize those download sizes, we are heading
> > > for unpleasant regressions.
> > 
> > How so? Yum does the same thing ...
> 
> IIRC, at least filelists only get downloaded when a path dependency is hit,
> not upfront.

Fair enough, but that has nothing to do with the new format requested in this bug. I can see the validity of this specific request. To be honest I was under the impression that dnf does the same thing yum does even in this respect but now I'm not that sure. If it doesn't, it can be looked into. I'm not sure there is an official request for this though.

Comment 12 Honza Silhan 2015-03-12 13:49:16 UTC
This is not gonna change.

(In reply to Jan Zeleny from comment #11)
> Fair enough, but that has nothing to do with the new format requested in
> this bug. I can see the validity of this specific request. To be honest I
> was under the impression that dnf does the same thing yum does even in this
> respect but now I'm not that sure. If it doesn't, it can be looked into. I'm
> not sure there is an official request for this though.

This is bug 968006 - it has a better chance of getting fixed than this one. As was said in comment 4, all packages and relations are indexed and need to be known beforehand. That's why the depsolver is so fast - it accesses efficiently stored data. Switching to the concept proposed in this bug would require rewriting 90% of the depsolver. This is not gonna happen. To bring at least some light to your lives, you can take a look at this project for server-side dependency solving [1] - no metadata is downloaded at all on the client side.

[1] https://github.com/rh-lab-q/server-side-dependency-solving

Comment 13 Jan Pazdziora 2015-03-12 17:46:18 UTC
(In reply to Jan Silhan from comment #12)
> 
> This is bug 968006 - it has better chance of getting fixed than this one. As

How does bug 968006 have any chance when it already was closed?

Comment 14 Honza Silhan 2015-03-17 10:41:19 UTC
If we find the bug critical, there are no other bugs left, and there is more demand (comments) for fixing this, we will reconsider.

