Bug 54802

Summary: rpm-4.0.3-1.04 database corruption, db_dump cannot fix
Product: [Retired] Red Hat Public Beta
Reporter: Need Real Name <rhbz>
Component: rpm
Assignee: Jeff Johnson <jbj>
Status: CLOSED WORKSFORME
Severity: high
Priority: medium    
Version: roswell   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2001-10-19 12:48:27 UTC

Description Need Real Name 2001-10-19 12:48:22 UTC
Description of Problem:

The rpm database gets corrupted by normal operations
(rpm -i, rpm -e, rpm -U).
`db_dump Packages' will hang at 100% CPU, stuck in an endless loop.

Version-Release number of selected component (if applicable):
rpm-4.0.3-1.04
db31-3.1.17-1

How Reproducible:
The db_dump hang is 100% reproducible.
As for the database corruption itself, I don't know. I only have weekly
snapshots of /var/lib/rpm/* and cannot reconstruct the order in which
packages were installed. I know they are logged in /var/log/rpmpkgs.*,
but a sequence such as rpm -i foo.rpm, rpm -i bar.rpm, rpm -e foo.rpm
leaves no trace of foo.rpm in /var/log/rpmpkgs.

Steps to Reproduce:
Run `db_dump Packages'.
It will loop in hash/hash_stat.c:85-94
with pgno = 2922, 2923, 2922, 2923, etc.:
        for (sp->hash_free = 0, pgno = hcp->hdr->dbmeta.free;
            pgno != PGNO_INVALID;) {
                ++sp->hash_free;

                if ((ret = memp_fget(dbp->mpf, &pgno, 0, &h)) != 0)
                        goto err;

                pgno = h->next_pgno;
                (void)memp_fput(dbp->mpf, h, 0);
        }

Hmm, which makes sense, because the trouble started with these messages:
error: db3 error(-30998) from db->close: DB_INCOMPLETE: Cache flush was
unable to complete
rpmdb: Overflow page 3401 of invalid type
rpmdb: Non-invalid page 2922 on free list
error: db3 error(-30985) from db->verify: DB_VERIFY_BAD: Database
verification failed

Looks like page 2922 was reused in spite of the warning
that it was invalid?
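
For illustration only, here is a minimal sketch of how a free-list walk
like the one quoted above could refuse to spin on a cycle such as
2922 -> 2923 -> 2922. It is not Berkeley DB code. The array next_pgno,
the function count_free_pages and the PGNO_INVALID value used here are
assumptions of this sketch: the array stands in for the pages that
memp_fget would return. The idea is simply that a free list can never
hold more pages than the file has, so the walk is bounded by the page
count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t pgno_t;

/* End-of-list sentinel for this sketch (assumed value). */
#define PGNO_INVALID 0

/*
 * Walk a free list held in next_pgno[], where next_pgno[p] is the page
 * that follows page p. Hypothetical stand-in for reading pages from disk.
 *
 * Returns 0 and stores the list length in *count on success.
 * Returns -1 if a link points outside the file or if the walk takes more
 * hops than there are pages, which can only happen when the list loops.
 */
int count_free_pages(const pgno_t *next_pgno, size_t npages,
                     pgno_t head, size_t *count)
{
    size_t seen = 0;
    pgno_t pgno = head;

    assert(next_pgno != NULL && count != NULL);

    while (pgno != PGNO_INVALID) {
        if (pgno >= npages)
            return -1;          /* link leaves the file */
        if (++seen > npages)
            return -1;          /* more hops than pages: cycle */
        pgno = next_pgno[pgno];
    }
    *count = seen;
    return 0;
}
```

With a list whose last page points back at an earlier one, the function
returns -1 after at most npages + 1 hops instead of looping forever.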

Additional Information:
This has happened with all rpm-4.0.3 versions.
I use the rpm panacea `rm -f /var/lib/rpm/__db*' whenever
I see `Program version X doesn't match environment version Y'
or `Invalid argument'

I use this shell script whenever I get the
`error: db3 error(...)' message
============================================================
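#!/bin/sh
# Dump and reload every rpm database file, then rebuild the indices.
RPMDBDIR=/var/lib/rpm
# Keep a full backup first, suffixed with this shell's PID.
cp -pr "$RPMDBDIR" "$RPMDBDIR.$$" || exit 1
cd "$RPMDBDIR" || exit 1
for file in  Basenames Conflictname Dirnames Group \
             Installtid Name Packages Providename \
             Provideversion Requirename \
             Requireversion Triggername
do
 mv "$file" "$file.maybedamaged" || continue
 # Note: the && only sees db31_load's exit status, not db31_dump's.
 ( db31_dump "$file.maybedamaged" | db31_load "$file" ) && rm -f "$file.maybedamaged"
done
rpm --rebuilddb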
=============================================================

Now even *that* failed.

Comment 1 Jeff Johnson 2001-10-23 15:17:32 UTC
If db_dump cannot fix it, then this is a problem with db3, not rpm.

FWIW, rpm-4.0.3 uses an internal copy of db-3.3.11, not db-3.1.17.

Again FWIW, only Packages is critical; all the other indices can be
rebuilt with --rebuilddb.

You might try db-3.2.9 or later, which might have a better db_dump.
However, w/o a reproducible test case, or a broken database,
or some angle on the problem.