Description of problem:
After updating packages yesterday, the RPM database is corrupt, and --rebuilddb can't rebuild it.

Version-Release number of selected component (if applicable):
rpm-4.7.0-0.beta1.7.fc11.x86_64

How reproducible:
Always fails; not sure how it got into this state

Steps to Reproduce:
1. Any rpm query

Actual results:
>> rpm -qa
rpmdb: Thread/process 3408/140496957273840 failed: Thread died in Berkeley DB library
error: db4 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db3 - (-30974)
error: cannot open Packages database in /var/lib/rpm
rpmdb: Thread/process 3408/140496957273840 failed: Thread died in Berkeley DB library
error: db4 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages database in /var/lib/rpm

>> rpm --rebuilddb
rpmdb: Thread/process 3408/140496957273840 failed: Thread died in Berkeley DB library
error: db4 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db3 - (-30974)

Expected results:
RPM works normally

Additional info:
>> cd /var/lib/rpm; ls -l
total 85548
-rw-r--r--. 1 root root  5771264 Apr  1 14:53 Basenames
-rw-r--r--. 1 root root    12288 Apr  1 14:53 Conflictname
-rw-r--r--. 1 root root  2342912 Apr  1 14:53 Dirnames
-rw-r--r--. 1 root root 10608640 Apr  1 14:53 Filedigests
-rw-r--r--. 1 root root    32768 Apr  1 14:53 Group
-rw-r--r--. 1 root root    20480 Apr  1 14:53 Installtid
-rw-r--r--. 1 root root    86016 Apr  1 14:53 Name
-rw-r--r--. 1 root root 67358720 Apr  1 14:53 Packages
-rw-r--r--. 1 root root   671744 Apr  1 14:53 Providename
-rw-r--r--. 1 root root   229376 Apr  1 14:53 Provideversion
-rw-r--r--. 1 root root    12288 Mar 29 23:44 Pubkeys
-rw-r--r--. 1 root root   606208 Apr  1 14:53 Requirename
-rw-r--r--. 1 root root   421888 Apr  1 14:53 Requireversion
-rw-r--r--. 1 root root   167936 Apr  1 14:53 Sha1header
-rw-r--r--. 1 root root   159744 Apr  1 14:53 Sigmd5
-rw-r--r--. 1 root root    12288 Mar 31 23:22 Triggername
-rw-r--r--  1 root root        0 Apr  1 21:05 __db.000
-rw-r--r--  1 root root    24576 Apr  1 21:11 __db.001
-rw-r--r--  1 root root   229376 Apr  1 21:11 __db.002
-rw-r--r--  1 root root  1318912 Apr  1 21:11 __db.003
-rw-r--r--  1 root root   753664 Apr  1 20:25 __db.004
This isn't db corruption as such; it means that rpm died with (write) locks held and the automatic cleanup failed to work. 'rm -f /var/lib/rpm/__*' should clear up the situation. Did something unusual, like a yum traceback, happen in the last update?
There were no yum errors that I know of, but I did control-C out of a few large "yum update" downloads. A reboot (which I had to do for other reasons) seems to have removed the lock; the __* files are still there, but rpm commands now work fine. Instead of just closing this issue, can we use it to track a request to have better error reporting in this situation? Receiving a "could not get a write lock" error message would have been much better than "run database recovery"! I've bumped the severity down to "low".
DB_RUNRECOVERY means exactly that: the internal database structure is inconsistent. DB_RUNRECOVERY does not mean "could not acquire a write lock".

Comment #2 is (presumably) referring only to the beginning part of the message:

    rpmdb: Thread/process 3408/140496957273840 failed: Thread died in Berkeley DB library

which is quite common when, say, rpm has an exceptional exit; causes include segfaults, reboots, and kill -9 termination.

Whether the root cause of DB_RUNRECOVERY is "corruption" or something else cannot be answered without identifying the cause of the flaw.
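As a rough illustration of the distinction drawn in this comment, the two parts of the message can be told apart with a small filter over rpm's error output. This is only a sketch: the function name and the labels it prints are hypothetical, rpm itself prints nothing of the kind, and the matched strings are simply the ones quoted in this report.

```shell
#!/bin/sh
# Hypothetical helper: reads rpm's error output on stdin and prints a label.
# The labels are illustrative only and are not produced by rpm.
classify_rpmdb_error() {
    msg=$(cat)
    case "$msg" in
        *"Thread died in Berkeley DB library"*)
            # An earlier rpm process exited abnormally (segfault, reboot, kill -9).
            echo "abnormal-exit" ;;
        *DB_RUNRECOVERY*)
            # Berkeley DB asks for recovery; the root cause is not known from this alone.
            echo "run-recovery" ;;
        *)
            echo "unknown" ;;
    esac
}
```

For example, `rpm -qa 2>&1 >/dev/null | classify_rpmdb_error` would feed only the error stream to the filter. When both strings are present, as in the output reported here, the "Thread died" line takes priority.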
Would a copy of the /var/lib/rpm directory be useful to anyone? I have a copy saved while it was reporting errors.
I've never succeeded in doing forensics on damaged rpmdb's. YMMV. Even if the flaw triggering DB_RUNRECOVERY is diagnosed, one then has to work out how the flaw was triggered, and that's not at all easy. But if you have a reproducible issue, please report it.

FWIW, rpm's needs from a database are modest, and the usage of a primary key index with a secondary lookup of a header is demonstrably robust, or rpm would have melted down years ago. All that means (in short) is that

    rm -f /var/lib/rpm/__db*    # eliminate cache and locks
    rpm --rebuilddb -vv         # recreate primary key indices

is usually enough to repair damage sufficiently.
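The two commands above can be wrapped in a small script so that they are always run in the same order and against the same directory. This is a minimal sketch, not an official rpm tool: the function names and the optional directory argument are made up for illustration, and --dbpath is used only so the same steps can be tried against a saved copy of the database.

```shell
#!/bin/sh
# Sketch of the two-step repair described above.
# Function names and the directory argument are illustrative, not part of rpm.

clear_rpmdb_env() {
    dbdir="${1:-/var/lib/rpm}"
    # Remove only the Berkeley DB environment files (__db.000, __db.001, ...).
    # Packages and the index files (Name, Basenames, ...) are left untouched.
    rm -f "$dbdir"/__db*
}

repair_rpmdb() {
    dbdir="${1:-/var/lib/rpm}"
    clear_rpmdb_env "$dbdir" || return 1
    # Rebuild the indices from the Packages file in the given directory.
    rpm --rebuilddb -vv --dbpath "$dbdir"
}
```

Run as root, `repair_rpmdb` acts on /var/lib/rpm; `repair_rpmdb /path/to/copy` tries the same steps on a saved copy first. No rpm process should be running while the __db* files are removed.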
Although RPM seems to be working now, I ran rpm --rebuilddb -vv just in case. This problem doesn't seem to be reproducible, but I'll keep watching to see if the frequent package updates trigger it again.
10 days later, lots of package updates, and the problem hasn't occurred again. Let's close this one.