Description of problem: During a network install, the download and installation phases strictly alternate. Each can take considerable time, during which the other resource (network or disk) sits relatively idle (=> wasted). It would be much more efficient (=> faster installation) if the process were pipelined: downloads could proceed continuously, in a thread independent of installation (rpm -i). This would keep both the machine and the network busy, as they should be.
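As a rough illustration of the pipelining requested here, the following is a minimal sketch and not yum's actual code. The names `run_pipelined`, `download`, and `install` are hypothetical; the two callables stand in for whatever fetches a package and whatever runs `rpm -i` on it. A background thread downloads continuously while the calling thread installs whatever has already arrived.

```python
import queue
import threading


def run_pipelined(packages, download, install):
    """Download in a background thread while this thread installs.

    `download(pkg)` returns something installable (e.g. a local path);
    `install(item)` consumes it. Both are caller-supplied placeholders.
    """
    ready = queue.Queue()
    done = object()  # sentinel marking the end of the download stream

    def downloader():
        try:
            for pkg in packages:
                ready.put(download(pkg))
        finally:
            # Always signal completion so the installer loop cannot hang.
            ready.put(done)

    worker = threading.Thread(target=downloader)
    worker.start()

    installed = []
    while True:
        item = ready.get()  # blocks only when nothing has been downloaded yet
        if item is done:
            break
        install(item)
        installed.append(item)

    worker.join()
    return installed
```

This sketch installs packages in download order and ignores dependencies entirely, so it only shows how the network and the disk can be kept busy at the same time.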
Yeah, I also thought of that. The only problem is packages with dependencies: either they should all be installed with rpm --force, or they should only be installed once their dependencies are also there. The first would probably be the fastest, but what if a user tries to run a package before its deps have been satisfied? This would hopefully make yum twice as fast, which is quite a lot, as it sometimes takes half an hour or so if you haven't run it for a while. BTW, shouldn't the bug be changed from fc3 to fc4, and from normal to high priority?
I don't see how making it high priority makes any sense.
If the downloads are ordered in a suitable dependency order (topological sort), there may well be points where it is correct to install subsets of the entire session, without resorting to --nodeps etc.
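The topological-sort idea can be sketched with Python's standard `graphlib` module (Python 3.9+). This is an illustration under assumptions, not yum's resolver: the `requires` mapping and the package names are made up, and `install_batches` is a hypothetical helper. Each batch contains packages whose requirements are all in earlier batches, so each batch is a point where a subset could be installed correctly.

```python
from graphlib import TopologicalSorter


def install_batches(requires):
    """Group packages into batches that are safe to install in order.

    `requires` maps each package to the set of packages it depends on.
    """
    sorter = TopologicalSorter(requires)
    sorter.prepare()  # raises graphlib.CycleError on looping dependencies
    batches = []
    while sorter.is_active():
        # Everything returned here has all of its requirements satisfied.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    example = {
        "app": {"libfoo", "libbar"},
        "libfoo": {"libc"},
        "libbar": {"libc"},
        "libc": set(),
    }
    for number, batch in enumerate(install_batches(example), start=1):
        print(number, batch)
```

Packages with circular dependencies would make `prepare()` raise `CycleError`; a real implementation would have to collapse such a cycle into a single unit that is installed together.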
I don't know any of the guts of yum's resolver, but it must already create an ordered to-do list. Perhaps an extra field could be added that identifies a single rpm transaction, numbered from 1 upward, starting with packages whose transaction only requires the package itself to be updated (and whose currently installed version no other package requires). Do this throughout the list, leaving the harder ones (like the kernel with its looping dependencies) at a higher value in the list. Meanwhile (back in Gotham City), a second thread has begun downloading the lowest-numbered packages. Might as well get them - gonna need 'em in a few minutes anyway - and the network/disk is idle while the dependency calculation runs the CPU near max. Once the dependency list is resolved, the first of the downloaded packages can be installed (since it had no other deps/reqs that would have been broken). (I assume that yum performs single rpm transactions, rather than a single command with all the packages it has worked out as needed for success.) Also, if the user has given the -y "go do it" switch (or scheduled the install), then there is no reason to wait before downloading any rpm: if the repo says there is an update and you already have an earlier version installed, then yum must download it, the earlier the better (perhaps even skip fetching the headers separately for these directly mentioned packages, since the user has already said "yes mate, stop stuffing around and get on with the job" ;) ). Extract the header from the rpm if need be. Does the resolver open the rpm database to check the requires etc.? If not, or if it doesn't block rpm, then yum could install, in another thread, each single package that completes download and doesn't require other package changes (keeping the disk/CPU revving) while the resolver is doing its thing. When yum gets to a point where it needs, say, 10 packages to complete a single update, the rpm process would pause until all rpms in that group are downloaded.
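The "extra field" proposal could look roughly like the sketch below. It is purely hypothetical: `GroupTracker`, the group numbers, and the package names are invented for illustration and do not correspond to anything in yum. Each package carries a transaction group number; a group is released for installation only once every package in it has been downloaded, so a single-package group installs immediately while a larger group waits.

```python
class GroupTracker:
    """Release a transaction group once all of its packages are downloaded."""

    def __init__(self, group_of):
        # group_of: package name -> transaction group number (hypothetical field)
        self.group_of = dict(group_of)
        self.pending = {}
        for pkg, grp in self.group_of.items():
            self.pending.setdefault(grp, set()).add(pkg)
        # Remember the full membership so it can be returned on release.
        self.members = {grp: sorted(pkgs) for grp, pkgs in self.pending.items()}

    def downloaded(self, pkg):
        """Record a finished download.

        Returns the sorted package list of the group if this download
        completed it, otherwise None.
        """
        grp = self.group_of[pkg]
        remaining = self.pending.get(grp)
        if remaining is None:
            return None  # group was already released
        remaining.discard(pkg)
        if remaining:
            return None  # still waiting for other packages in this group
        del self.pending[grp]
        return self.members[grp]


if __name__ == "__main__":
    tracker = GroupTracker({"vim": 1, "hal": 2, "udev": 2, "initscripts": 2})
    for name in ["hal", "vim", "udev", "initscripts"]:
        print(name, "->", tracker.downloaded(name))
```

The sketch assumes that groups are independent of one another. If one group required another, the released groups would additionally have to be installed in dependency order.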
This process could even help with rawhide when newly built packages conflict with other packages. Instead of all users being unable to complete yum update until the repo is "brought into line", yum would simply install only the packages whose dependencies are resolvable, rather than bailing out entirely (e.g. a couple of times in 2006-01 with initscripts vs. udev vs. hal). This would be a nice improvement, and would help keep community test machines more up to date, so that resolvable updated packages get tested sooner.
To keep the yum CLI simple, downloading and installing in parallel isn't going to happen (it makes the progress bar UI way too odd). The infrastructure is there and is used in cases where it helps a lot, such as anaconda.