493520 – RPM reports database corrupt; rebuilddb can't rebuild

Summary: RPM reports database corrupt; rebuilddb can't rebuild

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	rpm
Sub Component:
Version:	rawhide
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	Panu Matilainen
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-04-02 04:14 UTC by Neil
Modified:	2009-04-13 20:41 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-04-13 20:41:32 UTC
Type:	---
Embargoed:
Dependent Products:

Description Neil 2009-04-02 04:14:35 UTC

Description of problem:
After updating packages yesterday, the RPM database is reported as corrupt, and rpm --rebuilddb can't rebuild it.

Version-Release number of selected component (if applicable):
rpm-4.7.0-0.beta1.7.fc11.x86_64

How reproducible:
Always fails; not sure how it got into this state

Steps to Reproduce:
1. Run any rpm query (for example, rpm -qa)
  
Actual results:
>> rpm -qa
rpmdb: Thread/process 3408/140496957273840 failed: Thread died in Berkeley DB library
error: db4 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db3 -  (-30974)
error: cannot open Packages database in /var/lib/rpm
rpmdb: Thread/process 3408/140496957273840 failed: Thread died in Berkeley DB library
error: db4 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages database in /var/lib/rpm

>> rpm --rebuilddb
rpmdb: Thread/process 3408/140496957273840 failed: Thread died in Berkeley DB library
error: db4 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db3 -  (-30974)

Expected results:
RPM works normally

Additional info:
>> cd /var/lib/rpm; ls -l
total 85548
-rw-r--r--. 1 root root  5771264 Apr  1 14:53 Basenames
-rw-r--r--. 1 root root    12288 Apr  1 14:53 Conflictname
-rw-r--r--. 1 root root  2342912 Apr  1 14:53 Dirnames
-rw-r--r--. 1 root root 10608640 Apr  1 14:53 Filedigests
-rw-r--r--. 1 root root    32768 Apr  1 14:53 Group
-rw-r--r--. 1 root root    20480 Apr  1 14:53 Installtid
-rw-r--r--. 1 root root    86016 Apr  1 14:53 Name
-rw-r--r--. 1 root root 67358720 Apr  1 14:53 Packages
-rw-r--r--. 1 root root   671744 Apr  1 14:53 Providename
-rw-r--r--. 1 root root   229376 Apr  1 14:53 Provideversion
-rw-r--r--. 1 root root    12288 Mar 29 23:44 Pubkeys
-rw-r--r--. 1 root root   606208 Apr  1 14:53 Requirename
-rw-r--r--. 1 root root   421888 Apr  1 14:53 Requireversion
-rw-r--r--. 1 root root   167936 Apr  1 14:53 Sha1header
-rw-r--r--. 1 root root   159744 Apr  1 14:53 Sigmd5
-rw-r--r--. 1 root root    12288 Mar 31 23:22 Triggername
-rw-r--r--  1 root root        0 Apr  1 21:05 __db.000
-rw-r--r--  1 root root    24576 Apr  1 21:11 __db.001
-rw-r--r--  1 root root   229376 Apr  1 21:11 __db.002
-rw-r--r--  1 root root  1318912 Apr  1 21:11 __db.003
-rw-r--r--  1 root root   753664 Apr  1 20:25 __db.004

Comment 1 Panu Matilainen 2009-04-02 06:00:45 UTC

This isn't db corruption as such, it means that rpm died with (write) locks held and automatic cleanup fails to work. 'rm -f /var/lib/rpm/__*' should clear up the situation.

Did something unusual, like a yum traceback, happen in the last update?
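
The cleanup suggested in this comment can be wrapped in a small helper. This is only a sketch: the function name clear_stale_rpm_locks and its optional directory argument are illustrative, it narrows the glob to the __db.* files seen in the directory listing above, and it assumes you have already made sure that no rpm or yum process is still running.

```shell
#!/bin/sh
# Remove leftover Berkeley DB environment files (__db.NNN) from an rpm
# database directory. The function name and the default path argument are
# illustrative; they are not part of rpm.
clear_stale_rpm_locks() {
    dbdir="${1:-/var/lib/rpm}"
    if [ ! -d "$dbdir" ]; then
        echo "no such directory: $dbdir" >&2
        return 1
    fi
    # Only the __db.* files are removed; Packages and the index files
    # (Name, Basenames, ...) are left untouched.
    for f in "$dbdir"/__db.*; do
        [ -e "$f" ] || continue    # the glob matched nothing
        rm -f -- "$f" && echo "removed $f"
    done
    return 0
}
```

Run as root it would be called as clear_stale_rpm_locks /var/lib/rpm. Taking the directory as a parameter makes it possible to try the helper on a copy of the database first.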

Comment 2 Neil 2009-04-03 18:02:56 UTC

There were no yum errors that I know of, but I did control-C out of a few large "yum update" downloads.

A reboot (which I had to do for other reasons) seems to have removed the lock; the __* files are still there, but rpm commands now work fine.

Instead of just closing this issue, can we use it to track a request to have better error reporting in this situation?  Receiving a "could not get a write lock" error message would have been much better than "run database recovery"!  I've bumped the severity down to "low".

Comment 3 Jeff Johnson 2009-04-03 20:49:11 UTC

DB_RUNRECOVERY means exactly that: the internal database
structure is inconsistent.

DB_RUNRECOVERY does not mean "could not acquire a write lock". Comment #2
is (presumably) referring only to the beginning of the message:
  rpmdb: Thread/process 3408/140496957273840 failed: Thread died in Berkeley DB library
which is quite common when, say, rpm has an exceptional exit; causes
include segfaults, reboots, and kill -9 termination.

Whether the root cause of DB_RUNRECOVERY is "corruption" or something else
cannot be answered without identifying the cause of the flaw.
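
As a rough illustration of the friendlier reporting requested in Comment 2, a wrapper could scan rpm's error output for the strings quoted in this report and print a hint. The function name explain_rpmdb_error and the hint wording are assumptions made for this sketch; they are not rpm output, and the hint deliberately names stale __db.* files only as one possible cause.

```shell
#!/bin/sh
# Read rpm's error output on stdin and print a one-line hint.
# The function name and the hint texts are illustrative, not rpm messages.
explain_rpmdb_error() {
    input=$(cat)
    case "$input" in
        *DB_RUNRECOVERY*)
            echo "hint: Berkeley DB requests recovery; stale __db.* files from an interrupted rpm are one possible cause"
            ;;
        *"cannot open Packages"*)
            echo "hint: the Packages database could not be opened"
            ;;
        *)
            echo "hint: no known rpmdb error pattern found"
            ;;
    esac
}
```

A possible invocation is rpm -qa 2>&1 >/dev/null | explain_rpmdb_error, which sends only the error stream through the function.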

Comment 4 Neil 2009-04-03 21:25:35 UTC

Would a copy of the /var/lib/rpm directory be useful to anyone?  I have a copy saved while it was reporting errors.

Comment 5 Jeff Johnson 2009-04-03 22:16:26 UTC

I've never succeeded in doing forensics on damaged rpmdb's. YMMV.

Even if the flaw triggering DB_RUNRECOVERY is diagnosed, one
then has to work out how the flaw was triggered, and
that's not at all easy.

But if you have a reproducible issue please report.

FWIW, rpm's needs from a database are modest, and
the usage of a primary key index with a secondary lookup
of a header is demonstrably robust, or rpm would have
melted down years ago.

All that means (in short) is that
    rm -f /var/lib/rpm/__db*    # eliminate cache and locks
    rpm --rebuilddb -vv         # recreate primary key indices
is usually enough to repair damage sufficiently.
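
Since Comment 4 mentions keeping a copy of the directory, one cautious way to run the two commands above is to snapshot the database first. The following is a sketch under assumptions: the function names backup_rpmdb and repair_rpmdb and the /var/tmp backup location are made up for illustration, and cp -a is assumed to be available, as it is on Fedora.

```shell
#!/bin/sh
# Copy an rpm database directory to a timestamped backup directory and
# print the backup path. Function names and the default backup location
# are illustrative.
backup_rpmdb() {
    src="${1:-/var/lib/rpm}"
    destroot="${2:-/var/tmp}"
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    dest="$destroot/rpmdb-backup-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest" || return 1
    # cp -a preserves ownership, modes and timestamps; the trailing "/."
    # copies the directory contents, including the __db.* files.
    cp -a "$src/." "$dest/" || return 1
    echo "$dest"
}

# The two commands from the comment above, run only if the backup succeeded.
repair_rpmdb() {
    backup_rpmdb /var/lib/rpm /var/tmp || return 1
    rm -f /var/lib/rpm/__db*    # eliminate cache and locks
    rpm --rebuilddb -vv         # recreate the indices
}
```

repair_rpmdb would be run as root with no other rpm or yum process active. If the rebuild makes things worse, the directory printed by backup_rpmdb still holds the files as they were.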

Comment 6 Neil 2009-04-03 22:43:09 UTC

Although RPM seems to be working now, I ran rpm --rebuilddb -vv
just in case.

This problem doesn't seem to be reproducible, but I'll keep watching to see if the frequent package updates trigger it again.

Comment 7 Neil 2009-04-13 20:41:32 UTC

10 days later, lots of package updates, and the problem hasn't occurred again.
Let's close this one.
