Bug 143254

Summary: RFE: "patch" rpms to reduce downloads / unnecessary "updates"
Product: Red Hat Enterprise Linux 3
Reporter: David Anderson <david>
Component: rpm
Assignee: Jeff Johnson <jbj>
Status: CLOSED WONTFIX
QA Contact:
Severity: medium
Docs Contact:
Priority: medium
Version: 3.0
CC: nobody+pnasrat
Target Milestone: ---
Keywords: FutureFeature
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2004-12-29 09:58:23 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description David Anderson 2004-12-17 19:07:11 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.2) (KHTML, like Gecko)

Description of problem:
I've just registered a new RHEL machine. On running up2date for the first time, there are 477 MB of updates available. That'll take me 4 hours to download on our 512 kbps connection (I'm grateful this isn't dialup!).

This update set includes a complete replacement of all the XFree86 packages - 18 in my installation, including XFree86-doc. All this to fix just a handful of bugs in very select parts of the XFree86 installation.

Now, to some extent that's a result of the way the packager of XFree86 has set things up (why should a new release of XFree86-doc be necessary when fixing a bug in the server?). But the implementation of "patch" RPMs would have big benefits:

- Much reduced bandwidth usage; people on dialup might be able to actually download updates (*gasp* !)
- Avoid upgrading RPMs that already work and don't need fixing (as happens e.g. when a new openssh-client is released for no other reason than that openssh-server had a bug fix and the version numbers need to stay in sync); stop violating the principle "don't fix what isn't broken - especially on 'enterprise' production systems!"
- Reduced time to install updates; probably 95 out of every 100 files that RPM touches when doing an up2date haven't actually been changed...

SUSE have had "patch" RPMs implemented for some time. I don't know how they've done it (I'm not even a C coder), but as a sysadmin it seems to me like an obvious and basic feature for package management (our Solaris systems all have it with Sun's package manager).

The simplest way to implement "patch" RPMs would avoid having the concept of a "patch" in the actual RPM database itself. It should be done so that the result of installing a "patch" RPM is indistinguishable from downloading a complete copy of the complete updated RPM and updating to that.

In a first attempt, the patch RPM could contain a list of files that are new, files that should go from the original, and files that have been changed from the original. The payload would only need to contain the changed and new files. A more refined version could contain binary diffs for the changed files, but this requires rpm to have access to the original RPM when creating the patch. The "patch" RPM would also need extra headers - to indicate both the checksums of the "patch" RPM and of the result when the patch is installed over the original. It'd also need to have the original as a hard dependency (hard as in "can't be overridden by --nodeps").
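
The first-attempt format described above (lists of new, removed and changed files, with a payload holding only the new and changed ones) might be sketched as follows. This is only an illustration under stated assumptions: a package is modelled as a plain dict mapping paths to file contents, and `build_patch_manifest`, the manifest keys and the example paths are hypothetical names, not part of rpm or of SUSE's implementation.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-256 of a file's contents (the choice of hash is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def build_patch_manifest(old_pkg: dict, new_pkg: dict) -> dict:
    """Classify paths as added, removed or changed between two packages."""
    old_sums = {path: checksum(data) for path, data in old_pkg.items()}
    new_sums = {path: checksum(data) for path, data in new_pkg.items()}

    added = sorted(set(new_sums) - set(old_sums))
    removed = sorted(set(old_sums) - set(new_sums))
    changed = sorted(
        path for path in set(old_sums) & set(new_sums)
        if old_sums[path] != new_sums[path]
    )
    return {
        "added": added,
        "removed": removed,
        "changed": changed,
        # Only new and changed files need to travel in the patch payload.
        "payload": {path: new_pkg[path] for path in added + changed},
        # Checksums of the result once the patch is applied to the original.
        "result_checksums": new_sums,
    }


# Hypothetical package contents, for illustration only.
old_pkg = {
    "/usr/bin/server": b"v1",
    "/usr/share/doc/README": b"docs",
    "/etc/old.conf": b"x",
}
new_pkg = {
    "/usr/bin/server": b"v2",
    "/usr/share/doc/README": b"docs",
    "/usr/bin/helper": b"new",
}
manifest = build_patch_manifest(old_pkg, new_pkg)
print(manifest["added"], manifest["removed"], manifest["changed"])
```

The unchanged README never enters the payload, which is where the bandwidth saving would come from.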

This request seems so obvious that I can't believe it's not been asked for before; couldn't find any open or closed bugs for it for RHEL 3 though. Sorry if I didn't look hard enough!

An obvious place to look for doing this would be to see how SUSE have done it. To give a couple of examples of the bandwidth savings from http://www.suse.com/us/private/download/updates/92_i386.html :

Update RPM:  mysql 4.0.21-4.2 (i586): 8326 kB
Patch RPM: mysql 4.0.21-4.2-patch (i586): 53 kB

Update RPM: cups 1.1.21-5.3 (i586): 6760 kB
Patch RPM: cups 1.1.21-5.3-patch (i586): 365 kB

Not all savings are that good, but that's a 14 MB saving for just two updates.
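
The saving quoted above can be checked from the four sizes listed (all in kB); the variable names here are just for illustration:

```python
# Sizes in kB, copied from the SUSE listing quoted above: (full update, patch).
updates = {
    "mysql 4.0.21-4.2": (8326, 53),
    "cups 1.1.21-5.3": (6760, 365),
}

# Saving per package is the full update size minus the patch size.
saved_kb = sum(full - patch for full, patch in updates.values())
full_kb = sum(full for full, _ in updates.values())

print(f"saved {saved_kb} kB out of {full_kb} kB")
```

That comes to 14668 kB saved out of 15086 kB, i.e. a little over 14 MB.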

Version-Release number of selected component (if applicable):
4.2.2-0.14

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL3
2. Run up2date

    

Actual Results:  Marvel at the amount of bandwidth/CPU time you have to burn because of the deficiencies of Red Hat's package manager.

Expected Results:  Purr and coo at the efficiency of Red Hat's package manager as it only downloads updates to bits that have been fixed and not whole swathes of unchanged material.

Additional info:

Comment 1 David Anderson 2004-12-17 19:25:51 UTC
I've now done a more extensive search in Bugzilla; other requests 
are: 
 
7121 : 1999-11-18 : WONTFIX 
64053 : 2002-04-24 : DEFERRED 
103205 : 2003-08-27 : NEEDINFO 
 
The last is interesting as it has a patch from SUSE attached. 
 
Objections seem to cluster around the following: 
1. Packagers may not be competent enough to handle the extra 
complexity. (My response: I have a suggestion below that has little 
complexity - no changes to spec files and no extra skills needed). 
2. SUSE's patch for RPM 4 wasn't "well tested" (this is 16 months 
ago), and there are extra complexities to deal with. (My response: I 
don't know if it is now well tested or not). 
3. Philosophical objections to the whole concept. (My response: I 
believe philosophical objections would only apply against a "patch" 
concept that integrates to RPM at the wrong level). 
 
I have an idea for how to implement patch RPMs that wouldn't 
necessitate major re-working of RPM itself; no changes to spec files, 
no new skills necessary to learn. Just reduced downloads and CPU 
churn! 
 
a) Create a new program to generate a "patch RPM"; its input would be 
the old RPM, a new RPM, and a list of files that are changed (i.e. in 
both RPMs, but different in substance - e.g. as a result of bug fix). 
The program would then auto-discover (by examining the RPMs) the list 
of files that have been deleted, and those that have been added. 
The created "patch" could then also contain binary diffs to save
further bandwidth.
 
b) Modify rpm installation/updating as follows: if one of the RPMs to
install is a patch, do this:
* Verify that installed package has correct checksums, etc - 
otherwise, complain. (This is only necessary if binary diffs are 
being used). 
* Create a new RPM in a temporary directory by combining the 
installed RPM and the patch to recreate the original "new" RPM. 
* Behave as if the request was to update to the recreated new RPM, 
and forget that it originated from a patch. 
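
The three install-time steps above might look roughly like this. It is a sketch under assumptions, not rpm code: packages are modelled as dicts of path to contents, the patch layout (`base_checksums`, `removed`, `payload`, `result_checksums`) and the function name are hypothetical, and the verification is done unconditionally here for simplicity even though it is only strictly needed with binary diffs.

```python
import hashlib


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def reconstruct_new_package(installed: dict, patch: dict) -> dict:
    """Rebuild the full new package from the installed one plus a patch."""
    # Step 1: the installed files must match the original package exactly.
    for path, expected in patch["base_checksums"].items():
        if path not in installed or sha256(installed[path]) != expected:
            raise ValueError(f"installed file differs from original: {path}")

    # Step 2: drop removed files, then overlay the new and changed files.
    rebuilt = {p: d for p, d in installed.items() if p not in patch["removed"]}
    rebuilt.update(patch["payload"])

    # Check the result against the checksums of the complete new package.
    actual = {p: sha256(d) for p, d in rebuilt.items()}
    if actual != patch["result_checksums"]:
        raise ValueError("reconstructed package does not match the new package")

    # Step 3: the caller updates to `rebuilt` as if it were an ordinary RPM.
    return rebuilt


# Hypothetical package contents, for illustration only.
old_pkg = {"/usr/bin/server": b"v1", "/etc/old.conf": b"x",
           "/usr/share/doc/README": b"docs"}
new_pkg = {"/usr/bin/server": b"v2", "/usr/bin/helper": b"new",
           "/usr/share/doc/README": b"docs"}
patch = {
    "base_checksums": {p: sha256(d) for p, d in old_pkg.items()},
    "removed": {"/etc/old.conf"},
    "payload": {"/usr/bin/server": b"v2", "/usr/bin/helper": b"new"},
    "result_checksums": {p: sha256(d) for p, d in new_pkg.items()},
}
rebuilt = reconstruct_new_package(old_pkg, patch)
print(sorted(rebuilt))
```

Once `rebuilt` exists, nothing downstream needs to know a patch was involved.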
 
Under this method, there's little complexity; the update RPM is 
re-constructed at an "early" stage - the core RPM software never 
needs to know or care that there was originally a patch, or that 
patches even exist. 
The downside to this method is that there's no way to roll back just 
the patch; you'd have to download the original RPM. However, that's 
the way it is now in any case, so there's no loss - just not an 
optimal gain. 

Comment 2 James Olin Oden 2004-12-17 20:31:06 UTC
Hi, 

Here is conversation on the rpm-devel list that gives some insight 
about what Jeff Johnson is thinking of doing regarding patches:

   https://lists.dulug.duke.edu/pipermail/rpm-devel/2004-December/000154.html

Concerning your last statement, you can presently roll back rpms
without downloading the version you wish to roll back to.  At least
for my company this is a very important feature.  That said, I believe
it is possible to provide what you're looking for without overly
complicated rollback mechanisms.

That said, I am only a hacker in the shadows, and not a RH employee.

Cheers...james

Comment 3 David Anderson 2004-12-21 11:09:34 UTC
Regarding rollback - I ought to read the man page more carefully to 
learn how to do it properly! 
 
I removed an rpm with --repackage last week (on Fedora Core 2), but
trying to reinstall it gave errors about the signature; I tried to
install without signature or package checksums, but it complained
about the checksums of individual files within the package, at which
point I gave up and downloaded the original RPM again to save time in
working out what I should be doing. :-(

Comment 4 Jeff Johnson 2004-12-29 09:58:23 UTC
Patch packages add a great deal of complexity to package installs
and are unlikely to be able to be used generally or widely.

The far better solution is deltas on *.rpm packages, which can
be done with all existing packages right now.
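
To illustrate the general idea of a delta between two existing files (not any particular tool, which this comment does not name), here is a minimal sketch using Python's standard `difflib`. Real binary-delta tools use far more efficient algorithms; the function names and the sample byte strings are made up for the example.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copies from `old` plus literal new data."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Bytes already present in the old file: store only the range.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Bytes that exist only in the new file must be shipped.
            ops.append(("data", new[j1:j2]))
        # "delete" needs no entry: those old bytes are simply not copied.
    return ops


def apply_delta(old: bytes, delta: list) -> bytes:
    """Rebuild the new file from the old file and the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


# Stand-ins for the contents of an old and a new package file.
old = b"header|unchanged payload bytes|bugged code|trailer"
new = b"header|unchanged payload bytes|fixed code|trailer"
delta = make_delta(old, new)
shipped = sum(len(op[1]) for op in delta if op[0] == "data")
print(f"{shipped} literal bytes shipped instead of {len(new)}")
```

Because the delta is computed from two finished files, neither package has to be rebuilt or specially prepared.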

Comment 5 David Anderson 2005-01-05 19:29:05 UTC
OK... I'm using up2date to get updates from RHN on RHEL 3 - how do I
get it to do "deltas on *.rpm packages" ?? I'd love to reduce my
500 MB download on a 512 kbps line.

Comment 6 David Anderson 2005-01-08 12:32:46 UTC
In response to #4 I'd also say: 
 
- The complexity is all yours! This is a RHEL bug; to update, we 
just run up2date and click a few boxes. Isn't it RH's job to deal 
with the complexity rather than make the users suffer? Isn't that 
what we pay for? 
 
- If patch RPMs were able to be used by up2date, then they would be 
used everywhere where up2date is used - i.e. everywhere where RHEL 
is updated. 
 
I don't really understand how to use "deltas on *.rpm packages" to 
update our RHEL 3 install, so if anyone can enlighten me, I'd be 
very thankful! Cheers. 

Comment 7 David Anderson 2005-01-12 20:33:33 UTC
I've now read a bit more on this problem... I think I now understand
that by "right now" you don't mean that I can do it right now on
RHEL 3, but that it can be done on RPMs that already exist (i.e. they
don't need to be rebuilt). Sorry for my ignorance/timewasting...