Red Hat Bugzilla – Bug 208674
RPM database is too easily corrupted
Last modified: 2008-08-02 19:40:34 EDT
Description of problem:
I've been bitten by this bug twice: I've got my computer busy building packages,
something crashes, and the next time I try to run rpm I get the error:
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from dbenv->open: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db3 - (-30977)
error: cannot open Packages database in /var/lib/rpm
If I follow the instructions noted on some mailing lists to remove the __db.00?
files and run "rpm --rebuilddb", it might make rpm usable again, but I find that
a great many packages that used to be in the database are now missing.
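For reference, the mailing-list procedure mentioned above amounts to roughly the following. This is only a sketch under some assumptions: the function name rebuild_rpmdb and the RPM override variable are made up for illustration (they are not part of rpm), no other rpm process is running, and it is run with enough privileges to write to the database directory. It is worth copying the directory somewhere safe before trying it.

```shell
#!/bin/sh
# Sketch of the commonly cited recovery steps; not an official procedure.
# rebuild_rpmdb and the RPM variable are illustrative names, not part of rpm.

rebuild_rpmdb() {
    dbdir="${1:-/var/lib/rpm}"

    # Refuse to continue if the directory does not look like an rpm database.
    if [ ! -f "$dbdir/Packages" ]; then
        echo "no Packages file in $dbdir" >&2
        return 1
    fi

    # Remove the Berkeley DB region files (__db.001, __db.002, ...).
    # They hold environment state such as locks, not package data.
    rm -f "$dbdir"/__db.00?

    # Ask rpm to rebuild the database in the given directory.
    "${RPM:-rpm}" --rebuilddb --dbpath "$dbdir"
}
```

Calling rebuild_rpmdb with no argument operates on /var/lib/rpm. As described above, a rebuild that completes without errors does not guarantee that every package record survived.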
I've encountered this problem both on Fedora Core 5 (with no updates) and Fedora
Core 6 RC2.
I should note that when I encountered the problem on FC6RC2, my computer was in
the middle of an rpmbuild when the system locked up (video driver issue). It
had not written any RPMs yet, but it appeared that the compile stage was
complete. No other rpm-related programs were running.
This is most likely the same problem that was reported in bug #115182, which had
been closed as not-a-bug. In my opinion however, rpm should *never* leave the
database in an unrecoverable state, especially if (as appeared to be the case)
it is not making any changes to the data. I don't know what the db's
capabilities are, but at a minimum I would suggest that a working copy of the
database files be made so that if something crashes, the original data can be
restored.
P.S.: If it would help, I can upload a copy of the rpm database files that had
gotten corrupted. After my prior experience with trying to fix the database
under FC5, when the same thing happened in FC6 I made a copy of the files
*before* trying to recover them. It's 29MB bzipped though.
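Taking that kind of copy before any recovery attempt could look something like the sketch below. The function name snapshot_rpmdb and the default destination naming are invented for illustration, and it assumes nothing is writing to the database while the copy is made.

```shell
#!/bin/sh
# Sketch: snapshot an rpm database directory before attempting recovery.
# snapshot_rpmdb is an illustrative name, not an rpm command.

snapshot_rpmdb() {
    dbdir="${1:-/var/lib/rpm}"
    dest="${2:-$dbdir.bak.$(date +%Y%m%d%H%M%S)}"

    # Never overwrite an earlier snapshot.
    if [ -e "$dest" ]; then
        echo "$dest already exists" >&2
        return 1
    fi

    # -R copies the whole directory; -p preserves modes and timestamps.
    # On success, print where the copy was written.
    cp -Rp "$dbdir" "$dest" && echo "$dest"
}
```

Run with no arguments, it copies /var/lib/rpm to a timestamped sibling directory and prints the destination. If the copy needs to be uploaded, it can be packed afterwards with tar -cjf.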
While you're at it, you should file this bug against the kernel too, since
losing power during disk operations can result in the need to fsck the
filesystem, and files may end up stuck in /lost+found instead of where they
belong. Definitely an ext2 bug.
Although disk corruption can and does happen after a kernel crash or power loss,
the corruption is usually limited to files which were being written at the
moment the system went down, not the entire filesystem. Plus, e2fsck more often
than not will successfully restore your filesystem to working order with minimal
or no loss of files. This is especially true of journaling filesystems. So no,
I don't see a problem with ext2. Just rpm.
(P.S.: if you lose more than just the files you were working on when your system
goes down, then that would be an ext2 or e2fsck bug, and you should report it.)
(In reply to comment #3)
> Plus, e2fsck more often than not will successfully restore your filesystem
> to working order with minimal or no loss of files. This is especially true
> of journaling filesystems.
The same is true of RPM. The appropriate tools used according to
well-publicized instructions almost never result in the loss of data.
> So no, I don't see a problem with ext2. Just rpm.
That's your experience. I've had far more data loss due to filesystem failures
Fedora Core 5 and Fedora Core 6 are, as we're sure you've noticed, no longer
test releases. We're cleaning up the bug database and making sure important bug
reports filed against these test releases don't get lost. It would be helpful if
you could test this issue with a released version of Fedora or with the latest
development / test release. Thanks for your help and for your patience.
[This is a bulk message for all open FC5/FC6 test release bugs. I'm adding
myself to the CC list for each bug, so I'll see any comments you make after this
and do my best to make sure every issue gets proper attention.]
Considering the timing, this is most likely yet-another-dupe of the mmap() issue
in the 2.6.18-19 kernels.
*** This bug has been marked as a duplicate of 213963 ***