Bug 97019

Summary: Database corruption upgrading rpm-4.2, glibc-2.3.2
Product: [Retired] Red Hat Linux Reporter: Ralph Siemsen <ralphs>
Component: rpm Assignee: Jeff Johnson <jbj>
Status: CLOSED DEFERRED QA Contact: Mike McLean <mikem>
Severity: medium Docs Contact:
Priority: medium    
Version: 9   
Target Milestone: ---   
Target Release: ---   
Hardware: strongarm   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2003-08-21 15:32:00 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Ralph Siemsen 2003-06-09 01:30:01 UTC
Description of problem:
Attempting to get rpm-4.2 working on armv4l architecture.  Before installation
is completed, database appears to be corrupted.  All operations result in:

rpmdb: fatal region error detected; run recovery
error: db4 error(-30982) from db->sync: DB_RUNRECOVERY: Fatal error, run
database recovery

Attempting to extract the database manually fails as well:
# /usr/lib/rpm/rpmdb_dump Packages-ORIG                                           
VERSION=3
format=bytevalue
type=hash
db_dump: DB->stat: DB_PAGE_NOTFOUND: Requested page not found

Version-Release number of selected component (if applicable):
elfutils-0.76-3
rpm-4.2-0.69
glibc-2.3.2-27.9

How reproducible:
100% so far... (two tries out of two)

Steps to Reproduce:
1. rpm -Uvh glibc-2.3.2-27.9_nw2.armv4l.rpm
    
Actual results:
During the upgrade, lots of errors from RPM telling you to "run database
recovery".  Not sure if the install completed correctly - rpm is unusable
afterwards.

Expected results:
Package should have installed, after which we would have installed new elfutils
and rpm packages.

Additional info:
Backups of /var/lib/rpm/Packages before and after the "upgrade" are available.
http://www.netwinder.org/autobuild/rpm-debug/Packages-old-4.1.gz
http://www.netwinder.org/autobuild/rpm-debug/Packages-new-4.2.gz

Thanks Jeff for all your assistance so far!

Comment 1 Jeff Johnson 2003-06-11 13:10:16 UTC
I'm getting there, you haven't been forgotten, currently running
    cd gcc-3_3
    make check-gcc
to avoid the java test failures on the netwinder.

Comment 2 Ralph Siemsen 2003-06-11 15:37:37 UTC
Thanks for the update :)  I also have some additional information.

I used the exact same binary packages (glibc, rpm, ...) and did a "fresh"
installation, i.e. create a subdirectory, rpm --root /newdir --initdb, then
install the packages there.  Then I chrooted into this directory and ran various
tests with rpm.  Everything worked fine.  Installed and removed some packages,
again no problems.  Also tried booting this image (not chroot) and worked fine
as well.

So, the tools seem to work fine.  The problem occurs only during the "upgrade"
while glibc is being switched.  It is failing reproducibly there.

Since "fresh" install is preferable to me anyways, this problem is not really
holding me up now.  It still needs to be fixed, since users who are upgrading
will "feel the pain".
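
For anyone wanting to repeat the "fresh" install test described above, here is a
minimal sketch of the command sequence, wrapped in Python so it can be printed
before it is run.  Only "rpm --root /newdir --initdb" is taken from this report;
the mkdir step, the -Uvh install into the new root, and the helper names
fresh_install_commands and run_all are assumptions made for illustration.

```python
import subprocess


def fresh_install_commands(root, packages):
    """Build the commands for a fresh install into an empty root directory.

    Assumption: packages is a list of .rpm file paths chosen by the caller.
    """
    root_args = ["rpm", "--root", root]
    return [
        ["mkdir", "-p", root],                  # create the subdirectory
        root_args + ["--initdb"],               # create an empty rpm database
        root_args + ["-Uvh"] + list(packages),  # install the packages there
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Calling run_all(fresh_install_commands("/newdir", [...])) with the default
dry_run=True only prints the commands, which is a cheap way to check them
before touching a build node.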

Comment 3 Jeff Johnson 2003-06-11 17:30:37 UTC
Thanks for pinning it down. If rpm is statically linked,
you might start looking for an alignment problem in some
glibc data structure. That's a killer if static+strongarm.

A backtrace might shed some light ...

Comment 4 Ralph Siemsen 2003-06-11 17:50:56 UTC
Yes, rpm is statically linked.  However I'm not seeing a segfault or anything
like that that I could backtrace on.  What happens is I do "rpm -Uvh glibc
glibc-common" and it begins installing files, then (in the middle of all the
hash marks) it starts spewing: 

rpmdb: fatal region error detected; run recovery
error: db4 error(-30982) from db->sync: DB_RUNRECOVERY: Fatal error, run
database recovery

It does eventually finish and has gotten the new glibc installed on the disk,
although the rpm database is corrupted.

So the only thing I could do is run the same thing with extra -vv and maybe
under strace to try and narrow down where the corruption is happening. 
However it costs me a build node each time I do this so I would prefer to do as
few as possible :)
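
If the output of such a -vv run is saved to a file, a small script can point at
the first place the database error shows up, so a single run yields as much as
possible.  This is only a sketch: the two error signatures are copied from the
messages quoted in this report, while the function name first_db_error and the
idea of a saved log file are assumptions.

```python
import re

# Error signatures copied from the rpm output quoted in this report.
ERROR_PATTERNS = (
    re.compile(r"rpmdb: fatal region error detected"),
    re.compile(r"DB_RUNRECOVERY"),
)


def first_db_error(lines, context=5):
    """Locate the first Berkeley DB error in captured rpm output.

    Returns (line_number, preceding_lines), where line_number is 1-based and
    preceding_lines holds up to `context` lines printed just before the error.
    Returns None when no error signature is found.
    """
    for index, line in enumerate(lines):
        if any(pattern.search(line) for pattern in ERROR_PATTERNS):
            start = max(0, index - context)
            before = [text.rstrip("\n") for text in lines[start:index]]
            return index + 1, before
    return None
```

Feeding it the result of open(logfile).readlines() gives the line number of the
first error plus the few lines of debug output leading up to it, which should
show which package or file rpm was handling when the region error appeared.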

Comment 5 Jeff Johnson 2003-06-11 20:21:28 UTC
Be lazy, I believe I already have enough info to
reproduce the problem after I get a toolchain
together ...

Comment 6 Ralph Siemsen 2003-07-23 00:00:19 UTC
Looks like this may have worked itself out.  I repeated the upgrade from
dm-3.9-28 image in two stages, first going to rpm-4.0.4 and converting the
database, then upgrading simultaneously to rpm-4.2 and glibc-2.3.2.  This isn't
any different from what I did previously, except that some half dozen full
rebuilds have elapsed.

So far the database seems intact.


Comment 7 Jeff Johnson 2003-08-21 15:32:00 UTC
I suspect this problem may have fallen off critical paths now.