200008 – RFE: rpm/rpmlib should allow for better recovery options on failed transactions

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	rpm
Sub Component:
Version:	3.0
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Panu Matilainen
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:	191841
Blocks:

Reported:	2006-07-24 20:20 UTC by Jason Vas Dias
Modified:	2007-11-30 22:07 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-10-19 18:42:16 UTC
Target Upstream Version:
Embargoed:

Attachments
screen grab from RHEL3 U7 to U8 up2date (28.98 KB, application/octet-stream) 2006-07-31 12:11 UTC, Mark Komarinski
output from failed up2date run (24.53 KB, application/octet-stream) 2006-08-17 14:54 UTC, Jack Neely

Comment 1 Jason Vas Dias 2006-07-24 20:24:54 UTC

I agree it seems most remiss of up2date to abort the entire upgrade, leaving
essential packages half / not updated, if the upgrade of just one RPM on which
nothing else depends fails.

I've now cloned this issue as an up2date bug ; the pvm bug that caused the 
original issue is now fixed with pvm-3.4.5-8_EL3, but perhaps the up2date team
could look into fixing this nasty up2date behaviour in a future release.

Comment 2 Phil Moses 2006-07-24 20:30:55 UTC

As mentioned earlier, I had reported a similar case in August of 2004 with
other packages that failed causing up2date to halt its progress and the result
was similar, packages not being installed and services being affected. 

This was reported as bug 129294

It was largely ignored and this has now reared its ugly head again, creating
quite a bit of frustration. 

I feel that this point needs to be pushed: had it been addressed originally,
it would not have been a problem this time around, and it is relatively
disheartening to see that others are affected by it.

Comment 3 Bret McMillan 2006-07-26 20:10:34 UTC

up2date pretty much just builds up a single rpm transaction, and relies upon
that being successful.  Fatal errors to rpm are fatal errors to up2date.

What this really is is an rfe against rpm/rpmlib to provide better recovery
against failed transactions.  It could be something totally internal to rpm, but
it may be something that tools using rpm, such as up2date or yum, will have to
be extended once the fundamental bits are in place in rpmlib.

Resetting summary, component, and owner.

Comment 4 Jeff Johnson 2006-07-27 11:56:56 UTC

What exactly failed?

Comment 5 Mark Komarinski 2006-07-27 18:10:45 UTC

Had the exact same problem today going from RHEL U7 to U8.  PVM was not
installed on my system.

Comment 6 Jeff Johnson 2006-07-29 14:12:31 UTC

What *exactly* failed? "Me too" doesn't help any more than incoherent ramblings
about "better recovery" features do.

Comment 7 Mark Komarinski 2006-07-31 12:11:17 UTC

Created attachment 133311
screen grab from RHEL3 U7 to U8 up2date

Comment 8 Mark Komarinski 2006-07-31 13:34:59 UTC

Find attached.  After the up2date ran, just about every RPM that was to be
upgraded was instead erased or in an off state.  RPM itself was hosed with
packages (libelf) missing.

I was able to get around this by going through the console, copying over libelf
from another RHEL system, using that to get the RPM packages installed from the
up2date spool directory, then put the rest of the packages back as best I could.
Fortunately it's a demo system, but it still took the better part of an hour to
get things mostly back to where they should have been.

Comment 9 Jeff Johnson 2006-08-04 10:59:55 UTC

Was libgcc actually installed twice or is that an artifact of cut-n-paste?

Comment 10 Jeff Johnson 2006-08-04 11:04:26 UTC

In fact, why are identical packages duplicated in the same transaction?

That will surely hose an rpmdb, and should have been detected and
prevented when adding packages to the transaction.
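
The check described above, rejecting identical packages at the moment they are added to a transaction, can be sketched roughly as follows. This is only an illustration under assumptions: `Pkg`, `find_duplicates` and `add_all` are hypothetical names, not rpmlib or up2date APIs, and package identity is assumed to be name, version, release and arch.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Pkg(NamedTuple):
    """Hypothetical stand-in for one transaction element (not an rpmlib type)."""
    name: str
    version: str
    release: str
    arch: str


def find_duplicates(elements: Iterable[Pkg]) -> List[Pkg]:
    """Return every package identity that appears more than once."""
    counts = Counter(elements)
    return sorted(pkg for pkg, seen in counts.items() if seen > 1)


def add_all(elements: Iterable[Pkg]) -> List[Pkg]:
    """Build a transaction list, refusing identical packages up front."""
    elements = list(elements)
    dupes = find_duplicates(elements)
    if dupes:
        # Format each duplicate as name-version-release.arch for the message.
        names = ", ".join("-".join(p[:3]) + "." + p.arch for p in dupes)
        raise ValueError("duplicate packages in transaction: " + names)
    return elements
```

Failing before anything is installed or erased is the point of the sketch: the caller gets an error while the system is still untouched, rather than partway through the transaction.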

Comment 11 Mark Komarinski 2006-08-04 12:43:44 UTC

> Was libgcc actually installed twice or is that an artifact of cut-n-paste?

Not an artifact.  In fact, duplicate versions of libgcc were in
/var/spool/up2date (along with other packages).  I've started upgrading other
systems and checked the up2date spool directory and they had nothing in them. 
All those upgrades have completed without issue, so my guess is this is the root
cause.

BTW, rpmdb was not hosed.  Once I copied over the libelf files from another
system and reinstalled the libelf and rpm rpms, I was able to piece the system
back together by hand.

Comment 12 Jeff Johnson 2006-08-05 07:16:28 UTC

Identical packages being erased in the same transaction should never ever happen, and
will exhibit error messages like
    error: error(-30990) setting header #1431 record for Packages removal
because already erased entries in indices are being removed again.

Duplicate identical packages need to be investigated.

Sure the rpmdb is not hosed, but it may not contain accurate information either.

Comment 13 Jeff Johnson 2006-08-05 08:42:00 UTC

This should be reassigned back to up2date to investigate how multiple
identical packages are being added to a transaction.

Comment 14 Mark Komarinski 2006-08-05 11:12:59 UTC

No, this is definitely an rpm bug.  It might have been triggered by a 
different bug in up2date, and a new bug report can be opened for the up2date 
bug.

Comment 15 Jeff Johnson 2006-08-07 18:20:58 UTC

What is the bug?

rpmlib is data driven.  If an application chooses to construct a transaction with
multiple erasures of identical packages -- even though this is not supported by rpmlib --
then it's an application bug, not an rpmlib bug.

Or a feature request for rpm.

Now which is it?

Comment 16 Mark Komarinski 2006-08-07 19:18:05 UTC

Normally Linux should be considered something like a loaded gun pointed at your
foot and you have an itchy trigger finger.  And with a lot of other
applications, I'd agree that if you really want to do something stupid, then by
all means go for it.

RPM is not and should not be considered one of those packages.  Every other time
you try to do something stupid with RPM it prevents you from doing it, or at
least makes you confirm that yes, in fact, you want to do this dumb thing. 
Allowing this case is now inconsistent with RPM's normal behavior, so either RPM
should allow you to do dumb things without warning or fix cases where dumb
things are allowed without warning.  Given RPM's past history, it would be
easier to do the latter.

To answer your question:  Neither.  It's a bug in rpm.  There may be bugs in
other applications (up2date) that trigger it, but there's no way that rpm should
accept a transaction like this in the first place.

Comment 17 Jeff Johnson 2006-08-08 04:50:21 UTC

So fix the bug in rpm. Or convince RH to fix the bug.

I see something broken in up2date, not rpmlib, your expectations notwithstanding.

Comment 18 Mark Komarinski 2006-08-08 13:01:56 UTC

> Or convince RH to fix the bug.

Right.  That's why this was filed.

Can I get a response on this from someone with a redhat.com address?  I'm
getting a bit tired of hearing excuses from what appears to be a professional staller.

Comment 19 Jack Neely 2006-08-17 14:51:35 UTC

I've been able to reproduce this bug on RHEL 4 while applying Update 4. 
(However, this case was not due to a bad package from RHN.)  The examples I have
are from an x86_64 machine.

The administrator of these machines had previously run the following commands in
their kickstart:

  /bin/mv -f /usr/share/ssl/certs /usr/share/ssl/localcerts
  /bin/ln -s /afs/bp/contrib/openssl/ssl/certs /usr/share/ssl/certs

So /usr/share/ssl/certs was now a symlink rather than the original directory
that the package put there.  Update 4 contained an errata for openssl and the
following errors occurred:

Installing /var/spool/up2date/openssl-0.9.7a-43.10.x86_64.rpm...
error: unpacking of archive failed on file /usr/share/ssl/certs: cpio: chown
Shutting down NFS mountd: [FAILED]
Shutting down NFS daemon: [FAILED]
Shutting down NFS quotas: [FAILED]
Shutting down NFS services:  [  OK  ]
Shutting down RPC idmapd: [  OK  ]
Stopping NFS statd: [  OK  ]
groupdel: group rpm does not exist
There was a fatal RPM install error. The message was:
There was a rpm unpack error installing the package: openssl-0.9.7a-43.10

Because this happened in the middle of the entire Update 4 transaction, all of
the affected machines were completely hosed.  Most services (like sshd and crond)
were no longer running after the update.  Most of the system was nonfunctional
due to missing files and libraries.  The RPM database was inconsistent with what
was applied to the file system.

This issue has become a very serious issue for NCSU and we need to see some
movement about getting this condition fixed.
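
A site hit by the symlink case above could look for such paths before starting a large update. The following is a minimal sketch under assumptions: `symlinked_dirs` is a hypothetical helper, not part of rpm or up2date, and the caller is assumed to supply the list of paths that packages own as real directories.

```python
import os
from typing import Iterable, List


def symlinked_dirs(expected_dirs: Iterable[str]) -> List[str]:
    """Return the paths expected to be real directories that are symlinks on disk."""
    # os.path.islink() does not follow the link, so a symlink that points
    # at a directory is still reported here.
    return [path for path in expected_dirs if os.path.islink(path)]


if __name__ == "__main__":
    # Path taken from the kickstart commands quoted above.
    for path in symlinked_dirs(["/usr/share/ssl/certs"]):
        print("warning: %s is a symlink, package expects a directory" % path)
```

This only reports the mismatch so that an administrator can resolve it before the update runs; it does not make a failed transaction recoverable.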

Comment 20 Jack Neely 2006-08-17 14:54:36 UTC

Created attachment 134389
output from failed up2date run

This attachment contains the output from up2date showing all packages being
upgraded and the error in the transaction.

Comment 21 RHEL Program Management 2007-10-19 18:42:16 UTC

This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet that criteria, it is now being closed.
 
For more information of the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.