Bug 672932 - New rpm corrupts rpm database:error: db4 error(-30974) from dbenv->failchk
Summary: New rpm corrupts rpm database:error: db4 error(-30974) from dbenv->failchk
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: rpm
Version: 19
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Fedora Packaging Toolset Team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 680624 695599 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-01-26 18:12 UTC by Adam Williamson
Modified: 2013-05-20 10:19 UTC (History)
15 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-05-20 10:19:33 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)

Description Adam Williamson 2011-01-26 18:12:06 UTC
Since the new rpm landed in Rawhide, my database has been corrupted two days in a row. Clearing /var/lib/rpm/__db* and doing rpm --rebuilddb seems to fix it, but it's worrying / annoying.

Error is:

error: db4 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db4 -  (-30974)
error: cannot open Packages database in /var/lib/rpm

I don't do anything particularly odd with rpm, I just update my system and install stuff with yum, really.

I don't really know how to debug rpm issues, but I did take a copy of the /var/lib/rpm/__db* files this last time, I can provide that for download if it'd help diagnose the problem.

Comment 1 Panu Matilainen 2011-01-27 12:00:44 UTC
I suppose the full error message also has a "Thread died in Berkeley DB library" message? This is typically because of process getting killed unexpectly while accessing the rpmdb, either will eg kill -9 (PackageKit used to do this but it was fixed a long time ago) or by crashing. The message is an indication that something bad happened with whatever previously accessed the rpmdb, and is probably not database "corruption" in the traditional sense.

First see if there are any segfault indications in /var/log/messages* related to rpm, yum and the like - that would be a rather obvious cause for the errors.

Another possibility is that there's something in current rawhide that's causing bad things one way or the other. The new rpm has been stable for me throughout its development but this has been on F13 and F14, I haven't been brave enough to run rawhide at this point.

Comment 2 Adam Williamson 2011-01-27 17:17:22 UTC
"I suppose the full error message also has a "Thread died in Berkeley DB
library" message?"

I've rebooted since, but I don't remember that, no. I don't have any segfaults in rpm-related tools in /var/log/messages, only flash (quelle surprise) and gnome-settings-daemon (which crashes all over the place in Rawhide). I guess it *could* somehow be caused by the gnome-settings-daemon crashes, if they take down something else, maybe PackageKit's icon?

It doesn't seem to be corrupted today, so that's progress. =) If the corruption happens again I'll check for that.

Comment 3 Joachim Frieben 2011-02-05 06:52:07 UTC
Although running "rawhide" for a week now, db corruption only occurred yesterday (and again this morning).

  $ rpm -q kernel
  error: db4 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error,
    run database recovery
  error: cannot open Packages index using db4 -  (-30974)
  error: cannot open Packages database in /var/lib/rpm
  error: db4 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error,
    run database recovery
  error: cannot open Packages database in /var/lib/rpm
  package kernel is not installed

Comment 4 Joachim Frieben 2011-02-08 10:13:09 UTC
No db corruption during the last days. Currently running "rawhide" including 
rpm-4.9.0-0.beta1.7.fc15.

Comment 5 Adam Williamson 2011-02-08 16:56:34 UTC
yeah, doesn't seem to have happened to me again either. should we close it?

Comment 6 Panu Matilainen 2011-02-11 07:55:54 UTC
Lets keep this open for now, rpm hasn't been "fixed" as we don't even know what's causing it, just need to keep an eye on it.

Comment 7 Adam Williamson 2011-02-12 05:49:40 UTC
roger, I'll let you know if it happens again.

Comment 8 Joachim Frieben 2011-02-14 14:00:33 UTC
This is what I got today after reinstalling a system yesterday from scratch from the F15 tree plus updates-testing:

$ yum install librsvg2-devel
error: db4 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run d
atabase recovery
error: cannot open Packages index using db4 -  (-30974)
error: cannot open Packages database in /var/lib/rpm
CRITICAL:yum.main:

Installed packages include:
- db4-4.8.30-2.fc14.x86_64
- kernel-2.6.38-0.rc4.git7.1.fc15.x86_64
- rpm-4.9.0-0.beta1.7.fc15.x86_64

Comment 9 Panu Matilainen 2011-02-14 14:11:50 UTC
The important part would be finding out what happened /before/ you get that message. The message itself isn't that interesting, it's just BDB's way of telling there has been an unclean shutdown so the environment is in "recovery needed" state (which is much like "fsck needed" flag on filesystems from times before journaling and doesn't mean there's "corruption" in the usual sense, just that a sanity check should be done).

Comment 10 Andy Lawrence 2011-02-15 23:29:52 UTC
This has happened to me several times as well.  Today when it happened I rebooted while the mandb was rebuilding?  Had just finished a yum update, did a kernel git pull/build etc.  Was a clean shutdown.

No yum/rpm related activity, that was initiated by me anyhow.

Nothing interesting in any log file, but i'm happy to post anything by request.

Comment 11 Andy Lawrence 2011-02-15 23:31:57 UTC
^ Duh, sorry..

rpm-4.9.0-0.beta1.7.fc15.1.x86_64
db4-4.8.30-3.fc15.x86_64

Comment 12 Panu Matilainen 2011-02-21 09:49:10 UTC
(In reply to comment #10)
> This has happened to me several times as well.  Today when it happened I
> rebooted while the mandb was rebuilding?  Had just finished a yum update, did a
> kernel git pull/build etc.  Was a clean shutdown.

Note that with "clean shutdown" I mean cleanly shutting down the rpmdb from any programs that use it, not system shutdown.

Anyway, please update to https://admin.fedoraproject.org/updates/rpm-4.9.0-0.rc1.3.fc15. There's a db-handling related bug fix that might be related to the "corruption" seen here, and it also gives more detailed error messages on rpmdb open failures than the beta did (so if you do see more db open errors with the updated version above, please copy-paste the entire message)

Comment 13 Joachim Frieben 2011-02-22 07:46:30 UTC
Database corruption still occurs for current F15 including packages:
- glibc-2.13.90-4.x86_64
- kernel-2.6.38-0.rc5.git5.1.fc15
- rpm-4.9.0-0.rc1.3.fc15

Comment 14 Panu Matilainen 2011-02-22 11:44:14 UTC
What's the exact error message you're getting (all of it, please)?

Comment 15 Michal Schmidt 2011-02-23 09:33:55 UTC
# rpm -q rpm
error: rpmdb: Thread/process 1354/140304961525504 failed: Thread died in Berkeley DB library
error: db4 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db4 -  (-30974)
error: cannot open Packages database in /var/lib/rpm
error: rpmdb: Thread/process 1354/140304961525504 failed: Thread died in Berkeley DB library
error: db4 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages database in /var/lib/rpm
package rpm is not installed

This is with rpm-4.9.0-0.rc1.3.fc15.x86_64.

Comment 16 Panu Matilainen 2011-02-23 11:01:33 UTC
Right, so it's the "thread died in bdb" thing. Which basically means that while executing BDB code within rpm, a process either was killed or crashed.

Crashes should be visible in /var/log/messages*, please check those out for segfaults. If there's nothing there in the window between last successful use and the point where rpm started whining about DB_RUNRECOVERY, it should be a case of getting killed.

The trick is finding what pid 1354 was... hmm, audit seems to do the trick that inotify can't - tell you the pid (+ process name). To all of you seeing this, please do the following (if you've disabled auditd service, then re-enable it first):

# auditctl -w /var/lib/rpm/Packages -p war -k rpmdb

With that in place, the next time you get DB_RUNRECOVERY, the next time you get the "thread died" error it's possible to see what it was, for example in the case of comment #15 with

# ausearch -k rpmdb --pid 1354

That'll at least give a starting point. For example PackageKit used to do kill -9 on its backend on search cancel (long since fixed, but maybe there's a new rpm killer on the loose :), that would've been easy to track down had I realized the audit trick above back then.

Comment 17 Panu Matilainen 2011-02-24 12:43:05 UTC
(In reply to comment #16)
> 
> # auditctl -w /var/lib/rpm/Packages -p war -k rpmdb

Actually better yet, make that

# echo "-w /var/lib/rpm/Packages -p war -k rpmdb" >> /etc/audit/audit.rules
# service auditd restart

This will make the rule survive boots and other auditd restarts.

Comment 18 Joachim Frieben 2011-02-24 19:05:50 UTC
(In reply to comment #14)
% yum update
error: rpmdb: Thread/process 3552/140081155237664 failed: Thread died in Berkeley DB library
error: db4 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db4 -  (-30974)
error: cannot open Packages database in /var/lib/rpm
CRITICAL:yum.main:

Error: rpmdb open failed

System got installed yesterday from the F15 Alpha RC1 x86_64 live CD. Error was reported for current F15 including packages:
- glibc-2.13.90-4.x86_64
- kernel-2.6.38-0.rc6.git2.1.fc15.x86_64
- rpm-4.9.0-0.rc1.3.fc15.x86_64

Comment 19 Michal Schmidt 2011-02-25 12:08:30 UTC
Joachim,
did you have the audit rule activated as suggested by Panu? If yes, please do:
ausearch -k rpmdb --pid 3552

If not, please follow the advice given in comment #17 to catch the event next time.

(I have two Rawhide systems. Enabled the audit rule on both of them. The error has not appeared yet since then.)

Comment 20 Andy Lawrence 2011-02-26 16:23:57 UTC
*** Bug 680624 has been marked as a duplicate of this bug. ***

Comment 21 Michal Schmidt 2011-03-01 10:23:57 UTC
Cought one today:

[root@dhcp-29-224 ~]# rpm -q rpm
error: rpmdb: Thread/process 2003/139838881830688 failed: Thread died in Berkeley DB library
error: db4 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db4 -  (-30974)
error: cannot open Packages database in /var/lib/rpm
error: rpmdb: Thread/process 2003/139838881830688 failed: Thread died in Berkeley DB library
error: db4 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages database in /var/lib/rpm
package rpm is not installed
[root@dhcp-29-224 ~]# ausearch -k rpmdb --pid 2003
----
time->Tue Mar  1 11:17:22 2011
type=PATH msg=audit(1298974642.741:104): item=0 name="/var/lib/rpm/Packages" inode=1440550 dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00
type=CWD msg=audit(1298974642.741:104):  cwd="/"
type=SYSCALL msg=audit(1298974642.741:104): arch=c000003e syscall=2 success=yes exit=8 a0=2a2c060 a1=0 a2=0 a3=0 items=1 ppid=1999 pid=2003 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="gdb" exe="/usr/bin/gdb" key="rpmdb"
----
time->Tue Mar  1 11:17:22 2011
type=PATH msg=audit(1298974642.741:105): item=0 name="/var/lib/rpm/Packages" inode=1440550 dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00
type=CWD msg=audit(1298974642.741:105):  cwd="/"
type=SYSCALL msg=audit(1298974642.741:105): arch=c000003e syscall=2 success=yes exit=8 a0=2a2c060 a1=0 a2=0 a3=20 items=1 ppid=1999 pid=2003 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="gdb" exe="/usr/bin/gdb" key="rpmdb"


I did not run gdb by hand today. I suspect it was called from abrt.

Comment 22 Michal Schmidt 2011-03-01 10:28:36 UTC
I looked at /var/log/messages and the timestamp 11:17:22 corresponds to shutting down of the system before reboot. At that time NetworkManager had an assertion failure (there's a known bug about that), ended with signal 6. This may have invoked abrt.

Comment 23 Michal Schmidt 2011-03-01 10:33:30 UTC
Actually, the log says:

Mar  1 11:17:22 dhcp-29-224 kernel: [   44.938009] NetworkManager[857]: <warn> caught signal 6. Generating backtrace...

so it looks like it's NM itself doing the backtrace, not abrt.

Comment 24 Panu Matilainen 2011-03-01 11:03:20 UTC
Excellent, thanks! I need to take a closer look, but there are couple of immediate suspects here: gdb appears to be overriding rpm's signal handling, and also rpm doesn't catch and cleanup on SIGABRT anyway as it doesn't expect to end up dying with SIGABRT from some faraway unrelated process.

Comment 25 Panu Matilainen 2011-03-01 13:04:43 UTC
This must be one of the wackiest rpm bugs I've met so far: rpmdb getting into DB_RUNRECOVERY state because of an assert() in NetworkManager. Gdb disabling rpm's signal handling might well be related, or then at least in the case of comment #21 gdb might have gotten killed by the shutdown process (and also comment #10 seems to be related to reboot).

Another thing is that since the switch to systemd, there's no more automatic removal of /var/lib/rpm/__db* files on startup, so reboot doesn't clear the DB_RUNRECOVERY state like it used to. It's quite possible this has been happening in F14 too for a while now but just gone unnoticed because of the cleanup done from rc.sysinit.

Comment 26 Michal Schmidt 2011-03-03 09:42:40 UTC
Is it always safe to remove the /var/lib/rpm/__db* files? Couldn't rpm itself do that whenever it notices the inconsistency?

Comment 27 Peter H. Jones 2011-04-12 12:49:10 UTC
*** Bug 695599 has been marked as a duplicate of this bug. ***

Comment 28 alan 2011-09-30 09:53:13 UTC
Well it's not caused by shutdowns

It just happened on my main box which has been ticking along for 43 days

Comment 29 Panu Matilainen 2011-09-30 10:27:02 UTC
Oh, "thread died in..." error can be caused by any number of things, at the time of the report one such thing was the bizarre interaction with NetworkManager crashing at shutdown triggering a chain of events leading to unclean rpmdb shutdown. What the "thread died in..." message says is that the rpmdb was uncleanly shutdown on the *previous* invocation (whether due to crash, forced killing, bug in rpm or other such issue)

One possible (likely even) cause for it to show up recently was bug 739492, fixed in rpm >= 4.9.1.1-3 for rawhide + f16, rpm >= 4.9.1.1-2 for f15.

Comment 30 Sami Farin 2011-10-03 17:06:38 UTC
It's better to run "cd /var/lib/rpm && db_recover" than to rm -f __db*

Comment 31 Fedora Admin XMLRPC Client 2012-04-13 23:05:50 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 32 Fedora Admin XMLRPC Client 2012-04-13 23:08:47 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 33 Fedora End Of Life 2013-04-03 16:45:41 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 34 Panu Matilainen 2013-05-20 10:19:33 UTC
The "rpm is broken after reboot because NetworkManager crashes at shutdown invoking gdb which queries rpmdb and gets killed because of shutdown" hysteria was addressed in 4.9.0-0.rc1.4 a long, long time ago:

commit 15f0143ab5c032e6d3c57c49f5e392811a6f96bd
Author: Panu Matilainen <pmatilai>
Date:   Tue Mar 1 16:00:42 2011 +0200

    - wipe out BDB environment at boot via tmpfiles.d


Note You need to log in before you can comment on or make changes to this bug.