Red Hat Bugzilla – Bug 200008
RFE: rpm/rpmlib should allow for better recovery options on failed transactions
Last modified: 2007-11-30 17:07:10 EST
I agree it seems most remiss of up2date to abort the entire upgrade, leaving
essential packages half / not updated, if the upgrade of just one RPM on which
nothing else depends fails.
I've now cloned this issue as an up2date bug ; the pvm bug that caused the
original issue is now fixed with pvm-3.4.5-8_EL3, but perhaps the up2date team
could look into fixing this nasty up2date behaviour in a future release.
As mentioned earlier, I had reported a similar case in August of 2004, where
other packages failed and caused up2date to halt its progress. The result
was similar: packages not being installed and services being affected.
This was reported as bug 129294
It was largely ignored and this has now reared its ugly head again creating
quite a bit of frustration.
I feel that this point needs to be pushed because, had it been
addressed originally, it would not have been a problem this time around, and it is
rather disheartening to see others affected by it.
up2date pretty much just builds up a single rpm transaction, and relies upon
that being successful. Fatal errors to rpm are fatal errors to up2date.
What this really is is an RFE against rpm/rpmlib to provide better recovery
from failed transactions. It could be something totally internal to rpm, but
tools using rpm, such as up2date or yum, may have to be extended once the
fundamental bits are in place in rpmlib.
Resetting summary, component, and owner.
What exactly failed?
Had the exact same problem today going from RHEL U7 to U8. PVM was not
installed on my system.
What *exactly* failed? "Me too" doesn't help any more than incoherent ramblings about "better recovery"
Created attachment 133311
screen grab from RHEL3 U7 to U8 up2date
Find attached. After the up2date ran, just about every RPM that was to be
upgraded was instead erased or in an off state. RPM itself was hosed with
packages (libelf) missing.
I was able to get around this by going through the console, copying over libelf
from another RHEL system, using that to get the RPM packages installed from the
up2date spool directory, then put the rest of the packages back as best I could.
Fortunately it's a demo system, but it still took the better part of an hour to
get things mostly back to where it should have been.
Was libgcc actually installed twice or is that an artifact of cut-n-paste?
In fact, why are identical packages duplicated in the same transaction?
That will surely hose an rpmdb, and should have been detected and
prevented when adding packages to the transaction.
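The kind of check described here, rejecting an identical package when it is added to a transaction, can be sketched as follows. This is a toy illustration only: the `Package` tuple, the `Transaction` class, and `DuplicatePackageError` are hypothetical names made up for this sketch and are not part of rpmlib or up2date.

```python
from collections import namedtuple

# Hypothetical package identity (name, epoch, version, release, arch).
Package = namedtuple("Package", "name epoch version release arch")


class DuplicatePackageError(ValueError):
    """Raised when the same operation on the same package is added twice."""


class Transaction:
    """Toy stand-in for a transaction set; this is NOT the rpmlib API."""

    def __init__(self):
        self._elements = {}  # (operation, package) -> package

    def add(self, operation, package):
        # Identical (operation, package) pairs are refused at add time,
        # before anything touches the file system or the database.
        key = (operation, package)
        if key in self._elements:
            raise DuplicatePackageError(
                "%s of %s-%s-%s.%s is already in the transaction"
                % (operation, package.name, package.version,
                   package.release, package.arch)
            )
        self._elements[key] = package

    def elements(self):
        return list(self._elements)


# Usage with placeholder version numbers (not taken from the attachment):
tx = Transaction()
libgcc = Package("libgcc", 0, "1.0", "1", "i386")
tx.add("install", libgcc)
try:
    tx.add("install", libgcc)
except DuplicatePackageError as err:
    print("rejected:", err)
```

Whether such a check belongs in rpmlib or in the calling tool is exactly what the rest of this thread argues about; the sketch takes no side on that.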
> WAs libgcc actually installed twice or is that an artifact of cut-n-paste?
Not an artifact. In fact, duplicate versions of libgcc were in
/var/spool/up2date (along with other packages). I've started upgrading other
systems and checked the up2date spool directory and they had nothing in them.
All those upgrades have completed without issue, so my guess is this is the root
BTW, rpmdb was not hosed. Once I copied over the libelf files from another
system and reinstalled the libelf and rpm rpms, I was able to piece the system
back together by hand.
Identical packages being erased in the same transaction should never ever happen, and
will exhibit error messages like
error: error(-30990) setting header #1431 record for Packages removal
because already erased entries in indices are being removed again.
Duplicate identical packages need to be investigated.
Sure the rpmdb is not hosed, but it may not contain accurate information either.
This should be reassigned back to up2date to investigate how multiple
identical packages are being added to a transaction.
No, this is definitely an rpm bug. It might have been triggered by a
different bug in up2date, and a new bug report can be opened for the up2date
What is the bug?
rpmlib is data driven. If an application chooses to construct a transaction with
multiple erasures of identical packages -- even though this is not supported by rpmlib --
then it's an application bug, not an rpmlib bug.
Or a feature request for rpm.
Now which is it?
Normally Linux should be considered something like a loaded gun pointed at your
foot and you have an itchy trigger finger. And with a lot of other
applications, I'd agree that if you really want to do something stupid, then by
all means go for it.
RPM is not and should not be considered one of those packages. Every other time
you try to do something stupid with RPM it prevents you from doing it, or at
least makes you confirm that yes, in fact, you want to do this dumb thing.
Allowing this case is inconsistent with RPM's normal behavior, so either RPM
should allow you to do dumb things without warning or fix the cases where dumb
things are allowed without warning. Given the past history of RPM, it would be
easier to do the latter.
To answer your question: Neither. It's a bug in rpm. There may be bugs in
other applications (up2date) that trigger it, but there's no way that rpm should
accept a transaction like this in the first place.
So fix the bug in rpm. Or convince RH to fix the bug.
I see something broken in up2date, not rpmlib, your expectations notwithstanding.
> Or convince RH to fix the bug.
Right. That's why this was filed.
Can I get a response on this from someone with a redhat.com address? I'm
getting a bit tired hearing excuses from what appears to be a professional staller.
I've been able to reproduce this bug on RHEL 4 while applying Update 4.
(However, this case was not due to a bad package from RHN.) The examples I have
are from an x86_64 machine.
The administrator of these machines had previously run the following commands in
/bin/mv -f /usr/share/ssl/certs /usr/share/ssl/localcerts
/bin/ln -s /afs/bp/contrib/openssl/ssl/certs /usr/share/ssl/certs
So the /usr/share/ssl/certs was now a symlink rather than the original directory
that the package put there. Update 4 contained an errata for openssl and the
following errors occurred:
error: unpacking of archive failed on file /usr/share/ssl/certs: cpio: chown
Shutting down NFS mountd: [FAILED]
Shutting down NFS daemon: [FAILED]
Shutting down NFS quotas: [FAILED]
Shutting down NFS services: [ OK ]
Shutting down RPC idmapd: [ OK ]
Stopping NFS statd: [ OK ]
groupdel: group rpm does not exist
There was a fatal RPM install error. The message was:
There was a rpm unpack error installing the package: openssl-0.9.7a-43.10
Because this happened in the middle of the entire Update 4 transaction, all of the
affected machines were completely hosed. Most services (like sshd and crond) were no
longer running after the update. Most of the system was nonfunctional due to
missing files and libraries. The RPM database was inconsistent with what was
applied to the file system.
This issue has become a very serious issue for NCSU and we need to see some
movement about getting this condition fixed.
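As a stopgap on the administrator's side, the condition reported above (a path the package ships as a directory having been replaced by a symlink) could be looked for before running an update. The following is a minimal sketch under that assumption; `find_symlinked_directories` is a hypothetical helper, not something rpm or up2date provides, and the list of expected directories has to be supplied by the caller.

```python
import os


def find_symlinked_directories(expected_dirs, root="/"):
    """Return those paths in expected_dirs that are symlinks under root.

    expected_dirs: absolute paths that a package is expected to own as
    real directories (supplied by the caller; this sketch does not query
    the rpm database).
    """
    conflicts = []
    for path in expected_dirs:
        full = os.path.join(root, path.lstrip("/"))
        # islink() is true for a symlink even if it points at a directory.
        if os.path.islink(full):
            conflicts.append(path)
    return conflicts


# Usage: check the path from this report on the running system.
for path in find_symlinked_directories(["/usr/share/ssl/certs"]):
    print("warning: %s is a symlink, expected a directory" % path)
```

This only detects the situation; it does nothing about recovering a transaction that has already failed partway through.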
Created attachment 134389
output from failed up2date run
This attachment contains the output from up2date showing all packages being
upgraded and the error in the transaction.
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
For more information on the RHEL errata support policy, please visit:
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.