Bug 143532 - rpm update from U4 does not delete /var/lib/rpm/__db*
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: up2date
Version: 4.0
Hardware: All Linux
Priority: medium, Severity: medium
Assigned To: Pradeep Kilambi
QA Contact: Brandon Perkins
Keywords: Reopened
Duplicates: 143563
Blocks: 218663

Reported: 2004-12-21 18:03 EST by Milan Kerslager
Modified: 2014-01-21 17:50 EST
Last Closed: 2008-07-22 11:52:29 EDT

Attachments
Install log (8.67 KB, text/plain), 2004-12-22 10:23 EST, Milan Kerslager
up2date log (4.75 KB, text/plain), 2004-12-22 10:26 EST, Milan Kerslager
up2date log from /var/log/up2date (43.60 KB, text/plain), 2004-12-22 16:41 EST, Scott Russell
pkgs before and after rpm db failure (5.20 KB, text/plain), 2004-12-22 18:15 EST, Scott Russell
Script to attempt to replicate (1.38 KB, text/plain), 2004-12-23 10:47 EST, Paul Nasrat
output of rpm --last -qa (53.69 KB, text/plain), 2005-01-05 20:12 EST, Scott Thistle

Description Milan Kerslager 2004-12-21 18:03:16 EST
After an automatic update by up2date I have a broken rpm, because the U4
update did not delete /var/lib/rpm/__db* even though %post tried to.

This could be related to my LANG=cs_CZ.UTF-8 in /etc/sysconfig/i18n,
but I have not tested that.

Manually removing /var/lib/rpm/__db* works.
Comment 1 Joe Orton 2004-12-21 18:07:37 EST
Reproduced on systems without Czech locale.
Comment 2 John Dickerson 2004-12-22 01:13:47 EST
We've been seeing problems related to this since U4 started
rolling out earlier today.  It appears that RHN is pushing out U4 in a
series of updates, rather than one mega update.

As best I can tell, if a machine is rebooted after the first wave of
updates (which includes kernel-2.4.21-27.EL and rpm-4.2.3-13) the
subsequent update bundles are fine.  But if a machine is still running
the 2.4.21-20.0.1.EL kernel after it has been installed with the first
wave of updates, then RPM will fail with a db4 error.  Which means
that up2date or rhn_check will also fail.

The 10 cent solution is to reboot into the latest kernel, then run
up2date or rhn_check, and everything is happy.

If you can't reboot, you can do this (bleh):
/bin/rm /var/lib/rpm/__db.00* ; rpm --rebuilddb

Comment 3 Henrik Schack Jensen 2004-12-22 02:08:50 EST
Well, rebooting doesn't help in all cases. I had all my servers
rebooted after the first wave of updates yesterday (including the new
kernel).

My guess is that
rm /var/lib/rpm/__db.00* ; rpm --rebuilddb
is the only thing that works.
Comment 4 Milan Kerslager 2004-12-22 04:17:49 EST
IMHO rpm --rebuilddb is only worthwhile for converting the whole database.
A simple rm /var/lib/rpm/__db.00* allowed me to use the rpm database again.
Comment 5 Paul Nasrat 2004-12-22 09:14:14 EST
The reason reboots work for some will be the rm -f /var/lib/rpm/__db*
in /etc/rc.d/rc.sysinit, I believe. But as noted, simply doing that
by hand should be enough.
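
A minimal sketch of the cleanup discussed in comments 2 through 5, written as a shell function. The name clear_rpm_env and its optional directory argument are hypothetical (they are not part of rpm or up2date); the directory argument only exists so the function can be tried on a scratch directory before pointing it at /var/lib/rpm. It removes the __db* environment files and nothing else.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with rpm or up2date.
# Removes the __db* environment files from an rpm database directory and
# leaves the database files themselves (Packages, Name, ...) untouched.
clear_rpm_env() {
    dbdir=${1:-/var/lib/rpm}
    if [ ! -d "$dbdir" ]; then
        echo "no such directory: $dbdir" >&2
        return 1
    fi
    # -f keeps rm quiet when there is nothing to remove
    rm -f "$dbdir"/__db*
}
```

Run as root with no argument, this is equivalent to the rm -f /var/lib/rpm/__db* quoted above. It should only be run while no other process has the rpm database open.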
Comment 6 Xander D Harkness 2004-12-22 09:33:13 EST
*** Bug 143563 has been marked as a duplicate of this bug. ***
Comment 7 Paul Nasrat 2004-12-22 09:34:06 EST
If anyone has a machine that just starts with this can you rm -f
/var/lib/rpm/__db* and attach the output of rpm --last -qa here please.
Comment 8 Paul Nasrat 2004-12-22 09:48:39 EST
Comment #2: John, was glibc also in that first batch of updates?
Comment 10 Milan Kerslager 2004-12-22 10:23:03 EST
Created attachment 109022
Install log

As you can see, the problem arose on Tue Dec 21 17:56:51 2004.
Comment 11 Milan Kerslager 2004-12-22 10:26:53 EST
Created attachment 109024
up2date log

This is the output of: grep install /var/log/up2date | tac
Comment 12 Milan Kerslager 2004-12-22 10:35:50 EST
The machine was rebooted on Tue Dec 21 01:28 (before the incident) and on
Wed Dec 22 00:40 (after the incident, because I had compiled my own kernel
without the siimage driver, right around the time I filed this bug; my
timezone is MET, just 1 hour ahead of GMT [with DST, so 2 hours for now]).

It seems that rhnsd kept working since its own restart, or something like
that, because there were some updates after the new rpm was installed.
Comment 13 Paul Nasrat 2004-12-22 12:51:32 EST
Was everyone experiencing this using rhnsd to get updates as they
appear, or polling fairly regularly?

I've tried on 2.4.21-20.0.1.EL doing the updates from the attachment
in comment #10 for the window where it stopped working, but I didn't
hit it. I'm going to continue trying to get a reliable reproducer.
Comment 14 Xander D Harkness 2004-12-22 13:10:12 EST
My servers were set to update automatically and connect every two
hours to RHN.

Approximately half the servers updated successfully, the other half
had exactly the same problem as described.
Comment 16 Milan Kerslager 2004-12-22 16:31:42 EST
The same for me. I'm using rhnsd in standard configuration with active
autoupdates.
Comment 17 Scott Russell 2004-12-22 16:38:55 EST
Seen at IBM on several RHEL 3 systems. All systems were configured to
use rhnsd + auto updates. /var/log/up2date attached.
Comment 18 Scott Russell 2004-12-22 16:41:42 EST
Created attachment 109051
up2date log from /var/log/up2date
Comment 19 Paul Nasrat 2004-12-22 17:17:04 EST
Scott, what time did the failure occur, so I can compare against the logs?
Comment 20 Adrian Likins 2004-12-22 17:25:33 EST
For folks who see the problem: any info about what state
the box was in before the issue might be useful, e.g. kernel
versions and approximate update level (U2/U3/etc).

Comment 22 Milan Kerslager 2004-12-22 18:03:01 EST
My systems were fully updated. I have some non-RH (or modified)
packages on the system too, but none related to rpm or glibc. The
connection to the Internet is permanent.
Comment 23 Scott Russell 2004-12-22 18:15:33 EST
Created attachment 109057
pkgs before and after rpm db failure

Comment #19: 

Paul - I'm not sure what time it failed. I went to look at the U4 update status
today and noted the RPM DB was fouled. 

From the looks of it U4 started coming down at Dec 20 15:05:39. The U4 rpm
package came down at Dec 21 11:06:34 and then later on, according to the log,
additional installs took place. I'm assuming that rpm was still functional at
that point unless this is a logging bug of up2date. All log entries on Dec 22
(today) were done by me after clearing /var/lib/rpm/__db*. I've attached a
cleaned up version of /var/log/up2date which should clearly show the packages
that were updated prior to and after the rpm db failure.

Comment #20:

The system for the attached log was booted to kernel-smp-2.4.21-20.EL and had
kernel-smp-2.4.21-20.0.1.EL installed but not yet booted. All other packages
were at U3 + current errata levels.
Comment 24 jeff 2004-12-22 21:37:59 EST
Same problem here on several servers...

Latest RHEL3, patched with plain RPMs from up2date;
the problem came right after the upgrade to the latest
batch of RPMs from up2date.

Even after a rebuilddb, the database appears to get
corrupted again later on by itself.

Comment 25 Paul Nasrat 2004-12-23 10:04:33 EST
Comment #24: did you rm -f /var/lib/rpm/__db*? rebuilddb should not be
needed.  Also, can you define "corrupt" with actual error messages so
we can confirm it's the same problem?
Comment 26 Paul Nasrat 2004-12-23 10:47:31 EST
Created attachment 109083
Script to attempt to replicate

This is what I've been using on RHEL3 U3 to try to reproduce the
transactions illustrated by various people's logs; however, I've not hit the
error.
Comment 27 Tomas Mraz 2004-12-26 16:29:40 EST
Same problem here. The update was scheduled from the RHN web interface. rhnsd
updated many packages, one of which was rpm. Then another bug came in
(the bind update stopped named, so that was the last update that happened).
After I logged in, rpm was broken, so I removed the db cache and
ran rpm --rebuilddb.
Comment 28 Lance Davis 2004-12-28 21:35:42 EST
I have had the same problem while test-upgrading CentOS 3.1 to 3.4 (which is
U4) using yum.

Will try to reproduce.
Comment 29 Milan Kerslager 2004-12-29 02:24:26 EST
I tried to reproduce this with yum, with about 1/3 of 10 attempts failing.
It seems that this may be related to multithreaded support in rpm. I.e.
when %post is executed, the outcome depends on the timing of writes
relative to the state of the RPM database (how far its update has got)...
Comment 30 Tom Diehl 2004-12-29 15:29:17 EST
FWIW, I just used yum to upgrade a machine to U4 and it left the
locks behind. Removing the locks appears to have resolved the issue
for me.
Comment 31 Lance Davis 2004-12-29 22:25:11 EST
This bug seems related to #115152 .

I am seeing the problem after just updating rpm from 4.2.3-10 to 4.2.3-13

kernel is   2.4.21-27.0.1.EL  glibc 2.3.2-95.30

Immediately after running rpm -Fhv rpm-* the rpm database cannot be
opened :-

---------------------------------------------------------------------
[root@centos-athlon 4build]# rpm -Fhv ../u4build/i386/RedHat/RPMS/rpm-*
warning: ../u4build/i386/RedHat/RPMS/rpm-4.2.3-13.i386.rpm: V3 DSA signature: NOKEY, key ID 025e513b
Preparing...                ########################################### [100%]
   1:rpm-libs               ########################################### [ 20%]
   2:rpm                    ########################################### [ 40%]
   3:rpm-build              ########################################### [ 60%]
   4:rpm-devel              ########################################### [ 80%]
   5:rpm-python             ########################################### [100%]
[root@centos-athlon 4build]# /usr/lib/rpm/rpmdb_stat -CA -h /var/lib/rpm
db_stat: DB_ENV->open: /var/lib/rpm: No such file or directory
---------------------------------------------------------------------

because the files do not exist

---------------------------------------------------------------------
[root@centos-athlon 4build]# ls /var/lib/rpm
Basenames     Dirnames  Group       Name      Providename     Pubkeys      Requireversion  Sigmd5
Conflictname  Filemd5s  Installtid  Packages  Provideversion  Requirename  Sha1header      Triggername
---------------------------------------------------------------------

but rpm doesn't work :-

---------------------------------------------------------------------

[root@centos-athlon 4build]# rpm -q rpm
rpmdb: Program version 4.2 doesn't match environment version
error: db4 error(22) from dbenv->open: Invalid argument
error: cannot open Packages index using db3 - Invalid argument (22)
error: cannot open Packages database in /var/lib/rpm
package rpm is not installed

---------------------------------------------------------------------

although the __db files have reappeared

---------------------------------------------------------------------

[root@centos-athlon 4build]# ls /var/lib/rpm
Basenames     __db.001  __db.003  Filemd5s  Installtid  Packages     Provideversion  Requirename     Sha1header  Triggername
Conflictname  __db.002  Dirnames  Group     Name        Providename  Pubkeys         Requireversion  Sigmd5

---------------------------------------------------------------------

But they are still no good :-

---------------------------------------------------------------------

[root@centos-athlon 4build]# rpm -q rpm
rpmdb: Program version 4.2 doesn't match environment version
error: db4 error(22) from dbenv->open: Invalid argument
error: cannot open Packages index using db3 - Invalid argument (22)
error: cannot open Packages database in /var/lib/rpm
package rpm is not installed

---------------------------------------------------------------------

But removing them now fixes it :-

[root@centos-athlon 4build]# rm /var/lib/rpm/__db*
rm: remove regular file `/var/lib/rpm/__db.001'? y
rm: remove regular file `/var/lib/rpm/__db.002'? y
rm: remove regular file `/var/lib/rpm/__db.003'? y
[root@centos-athlon 4build]# rpm -q rpm
rpm-4.2.3-13

---------------------------------------------------------------------

Hope this helps ...

Lance
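
The failure shown in this comment has a recognisable signature: the "doesn't match environment version" line printed by rpm. The sketch below checks for that line in the output of a query command. The function names has_env_mismatch and check_rpm_db, and the result strings they print, are made up for illustration; only the message text is taken from the transcript above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the text on stdin contains the
# version-mismatch message quoted in this report.
has_env_mismatch() {
    grep -q "doesn't match environment version"
}

# Hypothetical wrapper: run the given query command (for example: rpm -q rpm),
# capture stdout and stderr, and print one of two made-up result strings.
check_rpm_db() {
    out=$("$@" 2>&1) || true    # a failing query is expected here, keep going
    if printf '%s\n' "$out" | has_env_mismatch; then
        echo "stale-environment"
    else
        echo "no-mismatch"
    fi
}
```

For example, check_rpm_db rpm -q rpm would print stale-environment on a machine in the state shown above. A no-mismatch result only means this particular message was absent, not that the database is healthy.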
Comment 32 Lance Davis 2005-01-04 08:17:17 EST
I have now seen this behaviour on two real registered Rhel-3 boxes

When up2date updated rpm ...

Comment 33 Scott Thistle 2005-01-05 20:04:49 EST
I have the same problem with two RHEL-3 servers, as they did not check in
today for updates. I have rebooted one, and it is fixed. The other has
not been rebooted yet. I will run "rm -f /var/lib/rpm/__db*", attach
the output of rpm --last -qa, and post here if it helps now.
Comment 34 Scott Thistle 2005-01-05 20:08:45 EST
Actually, I was incorrect. It did not reboot. I have a call open as
the server is in another data center and I do not have access to see
the console. Good thing I picked the DEV server first :( I am now
really concerned about the PROD box. If it reboots, I know it will not
come back up without assistance. I will post when/if I find out anything.
Comment 35 Scott Thistle 2005-01-05 20:12:33 EST
Created attachment 109407
output of rpm --last -qa
Comment 36 WZ 2005-01-06 10:52:03 EST
The weird thing is that when logged in as a regular user there is no problem
at all, but as root there is. A reboot did solve the problem.
Comment 37 Scott Thistle 2005-01-06 10:56:25 EST
I had the issue with the DEV server, and a reboot did not fix it (the
server was sent init 0 instead of init 6, doh!). Anyway, I rebooted the
dev server, and up2date would then work. But after it ran, the same problem
appeared about halfway through. I ran "rm -f /var/lib/rpm/__db*", and then
up2date worked. So I continued on and it now works fine (all updated). The
PROD box also worked fine when I ran the rm command and then proceeded
with the updates.
Comment 38 Cristian Gafton 2005-01-12 09:55:20 EST
Ok, so rpm does have in its %post script a feeble attempt at cleaning up the
__db* files in preparation for the DB upgrade. That works perfectly when
installing the rpm updates from the command line. As a result, rhnsd/rhn_check
succeeds in installing this errata.

The problem is more subtle and shows up after that successful run of rhn_check:

- rhnsd starts up rhn_check, which loads the python bindings for rpm, thus
locking in memory the old version of librpm.

- the rpm upgrade happens, and the %post script in rpm cleans up (removes) the
__db* files. Keep in mind, this is all happening under rhn_check.

- now after the upgrade is done, rhn_check does not simply exit - it does
another query or two on the rpm database, to get and report transactions, or
whatever.

- not having any __db* files around, since rpm cleaned them up, the librpm
bindings will happily trigger the creation of another set of __db files, but
using the librpm it has locked in memory - the old one.

- that works perfectly, and rhn_check exits, leaving the __db* environment files
for the old db version around.

After that, the new rpm has an old environment to deal with. Instead of dealing,
it bails out in the most spectacularly uninteresting way, rendering rhn_check
disabled and requiring manual intervention for cleaning the __db* files.
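
The sequence described in this comment can be replayed as a toy model in shell. This is only an illustration of the ordering problem, not how librpm or Berkeley DB work internally: a plain text file stands in for the __db* environment, the version labels "old" and "new" are placeholders, and all function names are invented for the example.

```shell
#!/bin/sh
# Toy model only. A text file stands in for the __db* environment, and the
# version labels "old" and "new" are placeholders, not real library versions.
env_file() { echo "$1/__db.001"; }

# Opening the database: create the environment if it is missing, stamped with
# the caller's library version; otherwise compare the stamp with that version.
open_db() {    # usage: open_db DIR LIBVERSION
    f=$(env_file "$1")
    if [ ! -e "$f" ]; then
        echo "$2" > "$f"
        echo "created"
    elif [ "$(cat "$f")" = "$2" ]; then
        echo "opened"
    else
        echo "mismatch"
    fi
}

# The %post cleanup in this model: remove the environment file.
post_cleanup() { rm -f "$(env_file "$1")"; }

# Replay the order of events from the comment above.
simulate() {   # usage: simulate DIR
    open_db "$1" old > /dev/null   # long-running process, old library in memory
    post_cleanup "$1"              # rpm is upgraded, %post removes the environment
    open_db "$1" old > /dev/null   # same process queries again, recreates old env
    open_db "$1" new               # a fresh rpm with the new library opens the db
}
```

Calling simulate on an empty scratch directory prints mismatch, the model's stand-in for the "doesn't match environment version" error. Running post_cleanup once more and then open_db with "new" succeeds, which matches the manual rm workaround reported throughout this bug.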
Comment 40 Adrian Likins 2005-01-12 17:49:03 EST
More specifically, rhn_check runs multiple rpm transactions in
a single execution (i.e. one for each action scheduled).

So the transaction with rpm in it gets run, rpm changes
the db format, removes the old lock files, etc. Everything's
happy.

rhn_check, with the old rpm still in memory, then goes
and gets the next action, runs a new rpm transaction,
and in doing so creates lock files of the old format.

Since rhn_check is probably the only app (with the
possible exception of s-c-packages) that runs multiple
rpm transactions in the same process, it's the
first to see this.
Comment 41 Adrian Likins 2005-01-12 17:53:27 EST
More on the above...

The problem doesn't seem to be that merely using the rpm module
after installing the new rpm creates the stale locks (up2date
does this in many ways). Rather, it is the creation of a new rpm
transaction object and/or running it.

Which is why up2date doesn't see this: it only runs one
transaction in a single exec. rhn_check creates and runs
multiple rpm transactions in one exec.
Comment 42 Max Spevack 2005-01-18 17:00:24 EST
reproduction information:

1.  install (or kickstart) a rhel-3-u3 system
2.  register with RHN, and schedule an upgrade of rpm and at least one
other package (vim-common for example)
3.  run rhn_check, everything looks good
4.  run rpm -qa and you will see the database errors

NOTE: if you only schedule an upgrade of rpm and nothing else, it does
not break the database.  Scheduling another action and running
rhn_check a second time will pass, and nothing will break.  The bug
only happens when you upgrade rpm and something else within a single
execution of rhn_check

I have a box available for developers to look at.  It's a vmware box
snapshotted in such a manner that you can reproduce the bug as much as
you like and rollback very easily.
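
The note in this comment suggests a simple pre-check before scheduling: is rpm being upgraded together with something else? The sketch below is a hypothetical helper, not something rhn_check or RHN provides. Treating rpm-libs the same as rpm is an assumption, and the package list is only a rough proxy for "several transactions in one rhn_check run" as described in comments 40 and 41.

```shell
#!/bin/sh
# Hypothetical check: exit status 0 when the package names given as arguments
# include rpm (or rpm-libs, an assumption) plus at least one other package.
batch_is_risky() {
    has_rpm=0
    others=0
    for pkg in "$@"; do
        case $pkg in
            rpm|rpm-libs) has_rpm=1 ;;
            *) others=$((others + 1)) ;;
        esac
    done
    [ "$has_rpm" -eq 1 ] && [ "$others" -gt 0 ]
}
```

With the example from step 2, batch_is_risky rpm vim-common succeeds, while batch_is_risky rpm alone does not, mirroring the observation that an rpm-only upgrade did not break the database.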
Comment 43 Nic Doye 2005-01-21 10:16:23 EST
Bitten by this on 2 servers (so far). Just the simple "rm
/var/lib/rpm/__db*" worked though. (Phew!)
Comment 48 Joshua Jensen 2005-07-28 11:10:26 EDT
I can confirm that just "rm -f /var/lib/rpm/__db*" corrects this problem... sure
would be nice if the %post script of the new RPM packages would do this for us, no?
Comment 49 Milan Kerslager 2005-07-28 14:05:34 EDT
As per comment #38 from Cristian Gafton, this is a transaction problem (race
condition). The rpm package already has a %post script for removing the lock
files. Just install from current ISO images and you will be happy, because no
rpm update will be needed.
Comment 50 Need Real Name 2006-02-01 04:02:15 EST
Same problem here using RHEL AS 3 (Update 6).
I tried to remove the __db* files, but when I execute rpm -Fvh to update a
package the problem (error 22) appears again.
Comment 52 Milan Kerslager 2006-08-01 19:34:54 EDT
As per comment #1 and the header of the bug, this is related to RHEL3 U4 (18
months ago). This problem seems to be resolved, as librpm has been updated (see
comment #38 and #49).

The bug seems to still be present for users who install RHEL3 U3 or older and
then try to update via RHN.
Comment 54 Bret McMillan 2006-12-08 14:01:29 EST
Yes, it's an issue.  Reassigning to pkilambi.
Comment 55 Pradeep Kilambi 2006-12-14 13:37:57 EST
It works fine for me on rhel4, tried rhel4, rhel4u2, rhel4u4

steps:
- kickstarted a rhel4u2 box (also rhel4 and rhel4u4)
- scheduled an upgrade of:  rpm-4.3.3-18_nonptl, vim-common-6.3.046-0.40E.7:1
- on client ran rhn_check -vvv
- upgrade went fine
- then ran rpm -qa and no db errors.

Another test I did was to try
upgrading the rpmdb package itself:

- scheduled an upgrade of:  rpm-libs-4.3.3-18_nonptl, rpmdb-redhat-4-0.20060803,
rsh-0.17-25.4

- D: Package ['rsh', '0.17', '25.4', '', 'i386', '40586', 'rhel-i386-as-4']
Fetched via: get
Preparing              ########################################### [100%]

Installing...
   1:rpm-libs               ########################################### [100%]
   2:rpmdb-redhat           ########################################### [100%]
   3:rsh                    ########################################### [100%]
D: Sending back response (0, 'Packages were installed successfully', {})
D: do_call packages.checkNeedUpdate ('rhnsd=1',)
D: Called refresh_rpmlist
D: local action status:  (0, 'rpmlist refreshed', {})
[root@www2 ~]# rpm -qa |less
[root@www2 ~]# 

And it went fine, even when upgrading the rpm and rpmdb packages together with others.

As per comment #42 this should have given me an error. It may still be an
issue on 3u8. But for 4u5 this works fine in my tests above.
Comment 56 Pradeep Kilambi 2006-12-14 13:43:37 EST
The next set of tests i'm going to try are for rhel3, rhel3U4, rhel3U8 in
bug#218663 which is aligned to rhel3.9. But for 4.5 this works.
Comment 57 Pradeep Kilambi 2006-12-14 15:43:06 EST
I confirm from my tests that it's not an issue for 4.5. It probably still is an
issue for rhel-3, but that's a separate bug (bug#218663). Closing this as works.
Comment 58 Pradeep Kilambi 2007-01-02 13:55:17 EST
This issue happens only when there is an rpm db change, as there was from 3.3
to 3.4. The fix should probably be in rpm itself, so that whenever it is updated
to a newer version it also tries to clear the lock files.
Comment 59 Pradeep Kilambi 2007-01-02 14:10:43 EST
Though this is not an issue specific to 4.5, I'm reopening it until we know
which direction we want to go.
Comment 60 Fanny Augustin 2007-01-12 16:16:03 EST
Moving to rhn-uncommitted from 4.5 and 3.9 respectively. Bret and pkilambi think
that the course of action on this bug is unclear and that the fix will probably
have to go in a major release such as FC7/8 (not sure yet). It's better that we
move it to rhn-uncommitted until we have a concrete plan for this.
Comment 61 Red Hat Bugzilla 2007-04-11 20:09:00 EDT
User bnackash@redhat.com's account has been closed
