Bug 101161

Summary: rpm 4.2 --rebuilddb hangs and rpm database is corrupted
Product: [Retired] Red Hat Linux Reporter: Jari Aalto <jari.aalto>
Component: rpm Assignee: Jeff Johnson <jbj>
Status: CLOSED WORKSFORME QA Contact: Mike McLean <mikem>
Severity: high Docs Contact:
Priority: high    
Version: 9   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2003-08-19 16:03:06 UTC Type: ---

Description Jari Aalto 2003-07-29 18:59:37 UTC
This bug relates to:

        Bugzilla Bug 73097
        rpm-4.1 hangs, can't be killed: READ THIS FIRST
        http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=73097

        
Description of problem:

rpm --rebuilddb cannot rebuild the database. It does not respond to
the C-c key. Removing the lock files from
/var/lib/rpm does not help.

Version-Release number of selected component (if applicable):

rpm 4.2

How reproducible:

It's impossible to reproduce the situation on demand, because it happened
during a standard RH9 install followed by a call to up2date, which hung
while updating 'ypserv'. After that no rpm calls worked any more.
The database was corrupt.

The up2date process had to be killed with -KILL, since it did not respond to
the C-c key.


Steps to Reproduce:

1. Install RH9 with kickstart + HTTP install (desktop)
2. Arrange up2date to be ready to update packages
3. Start up2date -u 
4. At the server side, generate link error, so that up2date hangs
    
Actual results:

5. The rpm database is corrupt after that
6. rpm --rebuilddb does not restore the situation

Expected results:

rpm --rebuilddb restores a working database.

Additional info:

$ rm -f /var/lib/rpm/__db*
$ rpm --rebuilddb

Neither of these restored the state. The rebuild hung and couldn't be
stopped with C-c. For the record, for users with similar problems:
I tried to restore the database state with:

$ scp /var/lib/rpm/Packages debian.machine:/tmp/fixit/

I logged in to the Debian machine:

$ apt-get install db4.1-util db4.1-doc
$ cd /tmp/fixit
$ db4.1_dump -r -f dumped Packages
$ db4.1_load -f dumped Packages.ok

I transferred that "Packages.ok" back to the Red Hat machine, but
a simple

$ rpm -qa

gave a huge list of errors, like:

error: rpmdbnextIterator: skipping h#xxxxxx blob size(8): BAD 8 + 16 * il(xxxxxx) + dl(xxxxxx)
memory alloc (xxxxx bytes) returned NULL.

According to the db4.1_load(1) manual page, if the database uses user-defined
prefix or comparison functions, it is impossible to dump and restore the
database with the standard tools.

Is this what the RPM utilities are doing: using custom settings, so that standard
tools cannot repair the damage? If so, please change to a standard
hash database.
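
The dump and reload cycle described above can be written down as a reviewable plan. This is only a sketch: `salvage_plan` is a hypothetical helper, the `db4.1_` prefix is the Debian naming used in this report, and the final verify step is an addition that the report itself did not run. The function only prints the commands, so nothing touches the database until the output is inspected.

```shell
# Hypothetical helper: print (not run) the salvage commands for a
# damaged Berkeley DB file. Pipe the output to sh to execute it.
# Assumes the file name contains no spaces or shell metacharacters.
salvage_plan() {
    db=$1
    prefix=${2:-db4.1_}    # tool prefix; an assumption taken from the Debian package naming
    # Work on a copy so the damaged original stays untouched.
    echo "cp -p $db $db.orig"
    # -r salvages whatever data can still be read from the file.
    echo "${prefix}dump -r -f $db.dump $db.orig"
    # Load the dumped records into a fresh database file.
    echo "${prefix}load -f $db.dump $db.ok"
    # Check the structure of the reloaded file.
    echo "${prefix}verify $db.ok"
}
```

For example, `salvage_plan Packages` prints four commands ending in `db4.1_verify Packages.ok`. A structurally valid file can still contain damaged rpm headers, as the errors quoted above show.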

Comment 1 Jeff Johnson 2003-07-31 17:56:52 UTC
No, there are no "custom" utilities, only an internal Berkeley db-4.1.25
compiled with --with-uniquename=_rpmdb to
    a) make an rpm build easier (i.e. don't have to configure/build db4)
    b) unique symbols to avoid symbol collisions.
In fact, this is what was recommended by Sleepycat.

All I can tell from above is that you have some damage, I'd
need the xxxxx to guess what was damaged.

Meanwhile, fix is probably possible if you give me a pointer
(i.e. URL, attachments won't work) to the earliest possible
(i.e. least "fixed") version of /var/lib/rpm that you still have.
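
Since the least "fixed" copy is the most useful one, it may help to snapshot /var/lib/rpm before any repair attempt. A minimal sketch, assuming tar and gzip are available; `backup_rpmdb` and its default destination are hypothetical, not part of rpm:

```shell
# Hypothetical helper: archive an rpm database directory before any
# repair attempt. Usage: backup_rpmdb [dbdir] [destdir]
backup_rpmdb() {
    dbdir=${1:-/var/lib/rpm}
    destdir=${2:-/root}            # default destination is an assumption
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$destdir/rpmdb-$stamp.tar.gz"
    # -C makes the paths inside the archive relative to the parent directory.
    tar -czf "$archive" -C "$(dirname "$dbdir")" "$(basename "$dbdir")" || return 1
    # Print the archive name so the caller can record or upload it.
    echo "$archive"
}
```

Running `backup_rpmdb` as root before `rm -f /var/lib/rpm/__db*` or `rpm --rebuilddb` keeps the original state available for later analysis.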

Comment 2 Jari Aalto 2003-08-02 10:40:17 UTC
Further info: As the problem persisted, I did the following with standard RH
9:

1. Deleted /var/lib/rpm/Packages
2. rpm --initdb

Then I used an awk script on the /root/install.log file to get the
list of installed packages (that were supposed to be in the RPM database) and wrote a
shell for-loop to manually install all packages again into the newly created RPM DB.

cd /tmp/redhat/all-rpms-here/    # downloaded from a mirror site

for package in ....script to feed names ...
do
    rpm --nodeps --force -Uvh "$package"
done
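
The loop above leaves out the script that feeds the package names. One possible shape for it is sketched below. The log format (one `Installing <name>-<version>-<release>.` line per package) and the `<nvr>.<arch>.rpm` file naming are assumptions, not confirmed by this report, and both function names are hypothetical.

```shell
# Assumed log format (not confirmed by this report), one line per package:
#   Installing <name>-<version>-<release>.
# Prints each package name-version-release with the trailing dot removed.
list_installed() {
    awk '$1 == "Installing" { sub(/\.$/, "", $2); print $2 }' "$1"
}

# Reinstall every listed package whose .rpm file is present in rpmdir.
# The <nvr>.<arch>.rpm file naming is also an assumption.
reinstall_all() {
    log=$1
    rpmdir=$2
    list_installed "$log" | while read -r nvr; do
        for file in "$rpmdir/$nvr".*.rpm; do
            # Skip names with no matching file (the glob stays unexpanded).
            [ -f "$file" ] || continue
            rpm --nodeps --force -Uvh "$file"
        done
    done
}
```

Usage would be something like `reinstall_all /root/install.log /tmp/redhat/all-rpms-here`, run as root.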

It took a night to install all RPMs into the "database" again, although the actual
packages were already on my machine. The information was just lost. However,
even this method did not restore the database. rpm -qa locked up just like before.

After that I reinstalled whole RH9. I can't provide much further details 
for this error situation any more, so you can close this bug report after this 
message.

However, I do have the original situation left on my Debian disk:

1. The initially corrupted database (Packages)
2. The one that was the result of the db_dump + db_load attempt to fix it, which
   gave the errors I mentioned (Packages.ok)

You can download these from below. Hopefully you get something out of those to 
prevent similar lock up and corruptions in the future.

http://tierra.dyndns.org:81/rh-data

This link will cease to exist after some time after I have posted it.

Comment 3 Jeff Johnson 2003-08-19 16:03:06 UTC
This problem appears resolved. There's little that can
be identified from looking at the database post mortem.

The messages indicate lots of headers failing simple sanity
checks. How that information got in the database cannot be
determined.
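
The "8 + 16 * il + dl" in the quoted error suggests what one of those sanity checks compares: the stored blob size against the size implied by the header's own index count (il) and data length (dl). A minimal sketch of that arithmetic, assuming 8 bytes for the two counts and 16 bytes per index entry; `header_size_ok` is a hypothetical name, and this is not rpm's actual code:

```shell
# Hypothetical check: succeed when a header blob of `size` bytes matches
# the size implied by its own counts: 8 bytes for il and dl themselves,
# 16 bytes per index entry, plus dl bytes of data.
header_size_ok() {
    size=$1
    il=$2
    dl=$3
    [ "$size" -eq $((8 + 16 * il + dl)) ]
}
```

With 5 index entries and 36 data bytes, only a 124-byte blob passes; an 8-byte blob with the same counts fails, which matches the kind of mismatch reported in the error.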