Bug 169145

Summary: rpm - a nasty database corruption incident
Product: [Fedora] Fedora
Reporter: Michal Jaegermann <michal>
Component: rpm
Assignee: Paul Nasrat <nobody+pnasrat>
Status: CLOSED CANTFIX
QA Contact: Mike McLean <mikem>
Severity: medium
Priority: medium    
Version: 4   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-11-28 17:52:02 UTC

Description Michal Jaegermann 2005-09-23 17:05:33 UTC
Description of problem:

While attempting to install the latest kernel from updates, rpm refused
to cooperate and printed the following error messages:

rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from dbenv->open: DB_RUNRECOVERY: Fatal error, run
database recovery
error: cannot open Packages index using db3 -  (-30977)
error: cannot open Packages database in /var/lib/rpm

The first attempt at 'rpm --rebuilddb' resulted in the same error as above.
The second attempt destroyed the whole database, leaving only two random
packages in it.  In retrospect, "run recovery" probably meant something else,
although it is not clear what. 'db_dump' followed by 'db_load'?

Luckily, a fairly recent backup was available, and after restoring /var/lib/rpm/
the whole system was brought back into a consistent state.
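
Since the second rebuild attempt is what destroyed the database, one cautious
habit is to copy /var/lib/rpm aside before any recovery attempt. The Python
sketch below is only an illustration under stated assumptions: the helper
names (backup_rpmdb, region_files, recovery_plan) are hypothetical, and
removing the __db.* region files before 'rpm --rebuilddb' is common practice
for this kind of error rather than anything established in this report. The
sketch returns the commands instead of running them.

```python
import shutil
import time
from pathlib import Path


def backup_rpmdb(dbdir, backup_root):
    """Copy the whole database directory to a timestamped directory; return its path."""
    src = Path(dbdir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"
    # copytree raises if dest already exists, so an earlier backup is never overwritten.
    shutil.copytree(src, dest)
    return dest


def region_files(dbdir):
    """List Berkeley DB environment region files (__db.001, __db.002, ...)."""
    return sorted(Path(dbdir).glob("__db.*"))


def recovery_plan(dbdir, backup_root):
    """Back up first, then return commands an admin might run by hand.

    Assumption: dropping stale region files and then rebuilding is a
    reasonable first step. Nothing destructive is executed here.
    """
    backup = backup_rpmdb(dbdir, backup_root)
    plan = []
    stale = region_files(dbdir)
    if stale:
        plan.append(["rm", "-f", *[str(p) for p in stale]])
    plan.append(["rpm", "--rebuilddb", "--dbpath", str(dbdir)])
    return backup, plan
```

Calling recovery_plan("/var/lib/rpm", "/root/rpmdb-backups") (the backup path
is an arbitrary example) would leave an untouched copy to restore from if a
rebuild goes wrong, which is what saved the system in this case.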

Version-Release number of selected component (if applicable):
rpm-4.4.1-22

How reproducible:
Hopefully not often.

Comment 1 Paul Nasrat 2005-10-25 20:44:26 UTC
What was the initial system state? What packages were in the upgrade transaction?

Comment 2 Michal Jaegermann 2005-10-25 22:53:06 UTC
> What was the initial system state?

You mean what was installed?  Everything was updated to the level
available at that time.  My logs show that on Sept-22, before I got
hit by that error, the following packages were installed:

man-pages.noarch 1.67-8
xorg-x11-Mesa-libGL.i386 6.8.2-37.FC4.49.2
xorg-x11-Mesa-libGLU.i386 6.8.2-37.FC4.49.2
xorg-x11-deprecated-libs-devel.i386 6.8.2-37.FC4.49.2
xorg-x11-deprecated-libs.i386 6.8.2-37.FC4.49.2
xorg-x11-devel.i386 6.8.2-37.FC4.49.2
xorg-x11-font-utils.i386 6.8.2-37.FC4.49.2
xorg-x11-libs.i386 6.8.2-37.FC4.49.2
xorg-x11-tools.i386 6.8.2-37.FC4.49.2
xorg-x11-twm.i386 6.8.2-37.FC4.49.2
xorg-x11-xauth.i386 6.8.2-37.FC4.49.2
xorg-x11-xdm.i386 6.8.2-37.FC4.49.2
xorg-x11-xfs.i386 6.8.2-37.FC4.49.2
xorg-x11.i386 6.8.2-37.FC4.49.2

> What packages were in the upgrade transaction?

Again, from what I see in my logs, the next day these were:

shadow-utils.i386 2:4.0.12-4.FC4
kernel.i686 2.6.12-1.1456_FC4

and the problem struck during a new kernel installation.

After that I restored the rpm databases from a backup, resynchronized them
with the real state of the system, and so far, knock on wood, everything
in that area works fine. The 'shadow-utils' package has in the meantime been
replaced with 4.0.12-5.FC4, and the kernel is also not the same
(2.6.13-1.1532_FC4 at this moment).

Both rpm, 4.4.1-22, and db4, 4.3.27-3, are still the same as at the time
of that trouble.


Comment 3 Jeff Johnson 2005-11-14 03:03:18 UTC
This problem is unlikely to be reproducible. A fix is exactly as likely as a reproducer.

Comment 4 Michal Jaegermann 2005-11-14 07:25:52 UTC
> This problem is unlikely to be reproducible.
I think so too.  The main reason behind this report was that maybe somebody
has seen something similar.  I would actually suspect some rare gotcha in
the underlying database.

Comment 5 Paul Nasrat 2005-11-28 17:52:02 UTC
Closing out.  If you get a reproducible case please reopen.