Bug 52043

Summary: Weird RPM errors
Product: [Retired] Red Hat Linux
Reporter: Chris Ricker <chris.ricker>
Component: rpm
Assignee: Jeff Johnson <jbj>
Status: CLOSED WORKSFORME
QA Contact: David Lawrence <dkl>
Severity: high
Priority: high    
Version: 7.3   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2001-08-21 06:27:46 UTC

Description Chris Ricker 2001-08-19 16:54:38 UTC
A tarball of my RPM db can be found at
<http://horton.gatech.edu/kaboom/rpmdb.tar.gz>

With roswell2, I've been getting lots of RPM errors.  Any operation
modifying the db results in messages like:

rpmdb: Item 0 on page 1274 hashes incorrectly
rpmdb: Item 0 on page 406 hashes incorrectly
rpmdb: Item 0 on page 574 hashes incorrectly
error: db3 error(-30985) from db->verify: DB_VERIFY_BAD: Database verification failed

though the operation then seems to complete.

Attempting a db_verify fails miserably:

[root@verdande rpm]# db_verify Packages      
db_verify: __db_pgin: Unknown db type: 0x20
db_verify: Packages: pgin failed for page 1816
db_verify: __db_pgin: Unknown db type: 0x64
db_verify: Packages: pgin failed for page 1817
db_verify: __db_pgin: Unknown db type: 0x74
db_verify: Packages: pgin failed for page 1818
db_verify: __db_pgin: Unknown db type: 0x73
db_verify: Packages: pgin failed for page 1820
db_verify: __db_pgin: Unknown db type: 0x50
db_verify: Packages: pgin failed for page 1821
db_verify: __db_pgin: Unknown db type: 0x57
db_verify: Packages: pgin failed for page 1822
db_verify: __db_pgin: Unknown db type: 0x69
db_verify: Packages: pgin failed for page 1823
db_verify: Overflow page 1823 has bogus prev_pgno value
db_verify: Overflow item incomplete on page 1823
db_verify: Overflow page 1818 has bogus prev_pgno value
db_verify: Overflow item incomplete on page 1818
db_verify: Overflow page 1820 has bogus prev_pgno value
db_verify: Overflow item incomplete on page 1820
db_verify: DB->verify: Packages: DB_VERIFY_BAD: Database verification failed


Attempting to dump and restore the db fails equally horribly:

[root@verdande rpm]# db_dump Packages-ORIG | db_load Packages      
db_dump: __db_pgin: Unknown db type: 0x69
db_dump: Packages-ORIG: pgin failed for page 1823
db_dump: DB->stat: Invalid argument
[root@verdande rpm]# 


Any ideas?

Comment 1 Jeff Johnson 2001-08-20 15:36:25 UTC
Hmmm, you're hosed pretty good, I'd suggest a fresh install.
Do you have any idea where the damage came from?

Running
    strace -o /tmp/xxx db_verify /var/lib/rpm/Packages
shows
    pread(7, "\0\0\0\0\0\0\0\0\232\0\0\0!\0\0\0B\0\0\0\20\0@\1\1\5\370"..., 1024, 157696) = 1024
    pwrite(7, "\0\0\0\0\0\0\0\0\222\0\0\0q\0\0\0\5\1\0\0\f\0\360\1\1\5"..., 1024, 149504) = 1024
    pread(4, ". [builtins]         (1)  - bash"..., 4096, 7438336) = 4096
    write(2, "db_verify: ", 11)             = 11
    write(2, "__db_pgin: Unknown db type: 0x20", 32) = 32
    write(2, "\n", 1)                       = 1

(Note the pread to location 7438336 == 0x718000)
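
To make the arithmetic in that note explicit, here is a minimal Python sketch. It assumes that the 4096-byte read size in the strace line is also the page size of Packages; page_for_offset is a hypothetical helper, not part of rpm or Berkeley DB.

```python
def page_for_offset(offset, page_size):
    """Return (page number, byte offset within that page) for a file offset."""
    return offset // page_size, offset % page_size


# Values taken from the strace line: pread(4, ..., 4096, 7438336)
offset = 7438336
page_size = 4096  # assumption: the pread size equals the database page size

print(hex(offset))                         # 0x718000
print(page_for_offset(offset, page_size))  # (1816, 0)
```

Under that assumption the offset lands exactly on the start of page 1816, which is the first page db_verify complained about in the original report.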

Examining /var/lib/rpm/Packages with hexedit at location 0x718000
shows the "damage" (this stuff shouldn't be in Packages).

00718000   2E 20 5B 62  75 69 6C 74  69 6E 73 5D  20 20 20 20  . [builtins]
00718010   20 20 20 20  20 28 31 29  20 20 2D 20  62 61 73 68       (1)  - bash
00718020   20 62 75 69  6C 74 2D 69  6E 20 63 6F  6D 6D 61 6E   built-in comman
00718030   64 73 2C 20  73 65 65 20  62 61 73 68  28 31 29 0A  ds, see bash(1).
00718040   2E 2E 20 67  72 6F 66 66  5F 6D 61 6E  20 5B 67 72  .. groff_man [gr
00718050   6F 66 66 5F  6D 61 6E 5D  20 28 37 29  20 20 2D 20  off_man] (7)  -
00718060   67 72 6F 66  66 20 60 6D  61 6E 27 20  6D 61 63 72  groff `man' macr
00718070   6F 73 20 74  6F 20 73 75  70 70 6F 72  74 20 67 65  os to support ge
00718080   6E 65 72 61  74 69 6F 6E  20 6F 66 20  6D 61 6E 20  neration of man
00718090   70 61 67 65  73 20 2E 0A  2E 6C 64 61  70 72 63 20  pages ...ldaprc
007180A0   5B 6C 64 61  70 5D 20 20  20 20 20 20  20 28 35 29  [ldap]       (5)
007180B0   20 20 2D 20  6C 64 61 70  20 63 6F 6E  66 69 67 75    - ldap configu
007180C0   72 61 74 69  6F 6E 20 66  69 6C 65 0A  2E 6E 65 74  ration file..net
007180D0   72 63 20 5B  6E 65 74 72  63 5D 20 20  20 20 20 20  rc [netrc]
007180E0   20 28 35 29  20 20 2D 20  75 73 65 72  20 63 6F 6E   (5)  - user con
007180F0   66 69 67 75  72 61 74 69  6F 6E 20 66  6F 72 20 66  figuration for f
00718100   74 70 0A 2F  65 74 63 2F  61 6E 61 63  72 6F 6E 74  tp./etc/anacront
00718110   61 62 20 5B  61 6E 61 63  72 6F 6E 74  61 62 5D 20  ab [anacrontab]
00718120   28 35 29 20  20 2D 20 63  6F 6E 66 69  67 75 72 61  (5)  - configura
00718130   74 69 6F 6E  20 66 69 6C  65 20 66 6F  72 20 61 6E  tion file for an
00718140   61 63 72 6F  6E 0A 2F 65  74 63 2F 61  75 74 6F 2E  acron./etc/auto.
00718150   6D 61 73 74  65 72 20 5B  61 75 74 6F  5D 20 28 35  master [auto] (5
00718160   29 20 20 2D  20 4D 61 73  74 65 72 20  4D 61 70 20  )  - Master Map
00718170   66 6F 72 20  61 75 74 6F  6D 6F 75 6E  74 65 72 0A  for automounter.
00718180   2F 65 74 63  2F 68 6F 73  74 73 2E 65  71 75 69 76  /etc/hosts.equiv
00718190   20 5B 68 6F  73 74 73 5D  20 28 35 29  20 20 2D 20   [hosts] (5)  -
007181A0   6C 69 73 74  20 6F 66 20  68 6F 73 74  73 20 61 6E  list of hosts an
007181B0   64 20 75 73  65 72 73 20  74 68 61 74  20 61 72 65  d users that are
007181C0   20 67 72 61  6E 74 65 64  20 74 72 75  73 74 65 64   granted trusted
007181D0   20 72 20 63  6F 6D 6D 61  6E 64 20 61  63 63 65 73   r command access
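
The dump above was found by hand with hexedit. A rough way to look for other pages overwritten with text is sketched below in Python. This is a heuristic under stated assumptions, not a replacement for db_verify: the 4096-byte page size and the 0.95 threshold are guesses, printable_ratio and suspicious_pages are hypothetical helpers, and legitimate pages may also contain a lot of text.

```python
import string

# Byte values considered printable ASCII (letters, digits, punctuation, whitespace).
PRINTABLE = set(string.printable.encode("ascii"))


def printable_ratio(page):
    """Fraction of bytes in `page` that are printable ASCII."""
    if not page:
        return 0.0
    return sum(b in PRINTABLE for b in page) / len(page)


def suspicious_pages(path, page_size=4096, threshold=0.95):
    """Return page numbers whose contents are almost entirely printable text."""
    hits = []
    with open(path, "rb") as f:
        pgno = 0
        while True:
            page = f.read(page_size)
            if not page:
                break
            if printable_ratio(page) >= threshold:
                hits.append(pgno)
            pgno += 1
    return hits
```

Running suspicious_pages on a copy of the Packages file (never the live database) would list candidate pages to inspect in a hex editor.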




Comment 2 Chris Ricker 2001-08-21 06:27:42 UTC
I did re-install from scratch, and now I'm working again.

Most likely, this was due to the sporadic disk corruption I've been seeing on
this system with both ext2 and ext3 ever since RH 7.1 (see bug 35981).

I did notice that a bit of brokenness on the installer's part caused it to
install sendmail even though postfix was already installed, and that left both
sendmail and postfix in a broken state (the same binaries were owned by two
different packages, and only a partial set of binaries from each package was on
the system).  I'm not sure how relevant that is, or whether /usr/sbin/sendmail
being owned by two different packages could have caused the breakage you see; I
think the disk corruption is a more likely suspect.

Comment 3 Jeff Johnson 2001-08-21 12:40:23 UTC
Disk corruption seems more likely than sendmail <-> postfix problems.