Bug 89738
Summary: | rpm -e causes (or reveals??) RPM database corruption under some circumstances | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Barry K. Nathan <barryn> |
Component: | rpm | Assignee: | Jeff Johnson <jbj> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike McLean <mikem> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 9 | CC: | mitr |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
URL: | http://math.uci.edu/~bnathan/.vlr/ | ||
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2005-02-07 23:40:10 UTC | Type: | --- |
Description
Barry K. Nathan
2003-04-27 07:52:12 UTC
Just to confirm: This happens on Red Hat 9 with LD_ASSUME_KERNEL=2.2.5? I believe I know what's happening if so. There's a window between the database being opened O_RDONLY and O_RDWR that intense concurrent erase/install access can exercise. Thanks muchly for the QA work!

Yes, it (the corruption with simultaneous installs/erases) happens on Red Hat 9 with LD_ASSUME_KERNEL=2.2.5. However, the more interesting aspect of this bug IMO is that you can have a database that rpmdb_verify sees nothing wrong with -- and then, after rpm -e (with or without LD_ASSUME_KERNEL), rpmdb_verify suddenly sees problems, even though rpm -e showed no error messages.

The easiest way to reproduce this is:
1. Move your existing copy of /var/lib/rpm elsewhere.
2. Make a new directory /var/lib/rpm.
3. Extract http://math.uci.edu/~bnathan/.vlr/vlr4.tar.bz2 (or .gz) into /var/lib/rpm.
4. Run rpmdb_verify; no error messages.
5. rpm -e --justdb kernel-utils; no error messages.
6. Run rpmdb_verify again; error messages appear now.

Reproduced:

    # /usr/lib/rpm/rpmdb_verify Packages
    # rpm -e --justdb --noscripts --notriggers kernel-utils
    # /usr/lib/rpm/rpmdb_verify Packages
    db_verify: Page 5427: overflow page of invalid type 0
    db_verify: DB->verify: Packages: DB_VERIFY_BAD: Database verification failed

And not cache related:

    # rm __db*
    rm: remove regular file `__db.001'? y
    rm: remove regular file `__db.002'? y
    rm: remove regular file `__db.003'? y
    # /usr/lib/rpm/rpmdb_verify Packages
    # rpm -e --justdb --noscripts --notriggers kernel-utils
    # /usr/lib/rpm/rpmdb_verify Packages
    db_verify: Page 5427: overflow page of invalid type 0
    db_verify: DB->verify: Packages: DB_VERIFY_BAD: Database verification failed

Fix is pretty simple however:

    # mv Packages Packages-ORIG
    # /usr/lib/rpm/rpmdb_dump Packages-ORIG | /usr/lib/rpm/rpmdb_load Packages
    # /usr/lib/rpm/rpmdb_verify Packages

I'll try to take a look at an strace, but I suspect that there's a bug, probably because of the use of a hash to store headers; large parts of each header are kept in overflow pages. Here's what I'm talking about:

    # /usr/lib/rpm/rpmdb_stat -d Packages
    61561    Hash magic number.
    8        Hash version number.
    Flags:
    4096     Underlying database page size.
    0        Specified fill factor.
    679      Number of keys in the database.
    1        Number of data items in the database.
    3        Number of hash buckets.
    1024     Number of bytes free on bucket pages (92% ff).
    5427     Number of overflow pages.
    1395134  Number of bytes free in overflow pages (94% ff).
    1        Number of bucket overflow pages.
    1004     Number of bytes free in bucket overflow pages (75% ff).
    0        Number of duplicate pages.
    0        Number of bytes free in duplicate pages (0% ff).
    0        Number of pages on the free list.

Lots and lots of data on overflow pages. But, yes, there's a bug here too.

After 2+ years of thrashing this problem around, it turns out that indeed, rpm as built since RHL 9 works *only* on NPTL systems because, well, that's how rpm is built. There's some hackery in RHEL packages to work around the problem for those who *must* run with LD_ASSUME_KERNEL, but there plain and simply ain't no reason to try to fix this problem in FC, since NPTL is in kernel-2.6.x.

The precise reproducer is (and was) gratefully received. You're also more than welcome on <rpm-devel.duke.edu> even if you prefer lurking ;-)
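To put the rpmdb_stat -d figures quoted above in perspective, here is a small sketch of the arithmetic behind the "ff" percentages. It assumes that "ff" is simply bytes in use divided by total page bytes, rounded to the nearest integer; that is a guess that happens to match the three percentages in the output, not something checked against how Berkeley DB computes it. The function name fill_pct is made up for illustration.

```shell
#!/bin/sh
# fill_pct PAGES PAGE_SIZE FREE_BYTES
# Prints the share of page bytes in use, rounded to the nearest percent.
# Assumption: "ff" = (total - free) / total. Berkeley DB may compute it
# differently (for example by accounting for page header overhead).
fill_pct() {
    pages=$1
    page_size=$2
    free=$3
    total=$((pages * page_size))
    used=$((total - free))
    # Adding total/2 before the integer division rounds to nearest.
    echo $(( (used * 100 + total / 2) / total ))
}

# Figures taken from the rpmdb_stat -d output (4096-byte pages).
echo "overflow pages:        $(fill_pct 5427 4096 1395134)% full"
echo "bucket pages:          $(fill_pct 3 4096 1024)% full"
echo "bucket overflow pages: $(fill_pct 1 4096 1004)% full"
echo "bytes in use on overflow pages: $((5427 * 4096 - 1395134))"
```

With those inputs the overflow pages hold 20833858 bytes, roughly 20 MB, against about 11 KB on the three bucket pages, which is the "lots and lots of data on overflow pages" point in numbers.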