Bug 52379

Summary: rpm sticks which can prevent the system booting
Product: [Retired] Red Hat Public Beta Reporter: Michael Young <m.a.young>
Component: rpm Assignee: Jeff Johnson <jbj>
Status: CLOSED RAWHIDE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: roswell   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2001-08-24 22:16:02 UTC Type: ---

Description Michael Young 2001-08-23 11:18:18 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.75 [en] (X11; U; SunOS 5.6 sun4u)

Description of problem:
I have had the rpm command stick on me a few times, usually after an
up2date or similar failed for some reason (for example, I run rpm -q kernel
and it sits there until I get fed up of waiting). Having read some
other bug reports, I tried removing the __db files in /var/lib/rpm, which
fixes the problem. However I have discovered today that if rpm is sticking,
you can't boot the system (as rpm is called in /sbin/mkkerneldoth in the
startup scripts). This is still true if you try to boot single user, or use
a boot disk.

How reproducible:
Didn't try

Steps to Reproduce:

Comment 1 Jeff Johnson 2001-08-23 15:51:40 UTC
Removing /var/lib/rpm/__db* files at rpm startup is
done as of rpm-4.0.3-0.88.

Comment 2 Michael Young 2001-08-23 17:08:01 UTC
But I have rpm-4.0.3-0.93 on my system!

Comment 3 Jeff Johnson 2001-08-23 17:16:28 UTC
Do you have /var/lib/rpm/__db* files? They should always
be gone after the next successful execution. Yes,
they may be present if you ^C out of a command, but they
will be removed by the next rpm command.

Otherwise I have no idea how to reproduce "sticks".

Comment 4 Michael Young 2001-08-23 17:41:57 UTC
I did have before I deleted them (__db.001 and __db.002 I think), and from what
I remember of the time stamps, they had survived several successful upgrades of
batches of packages while I was upgrading roswell to roswell2 (with up2date). I
did have a problem upgrading one package (Bug 52302) which could plausibly
correspond to what I remember the time stamps were, so I will try to see if this
recreates the problem tomorrow.

Comment 5 Michael Young 2001-08-24 08:34:50 UTC
It doesn't seem to be connected to my problems with up2date libao, however ^C
out of an rpm -qa does leave these two __db files and subsequent runs of rpm
don't delete them.
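
The check described above can be sketched as a small shell function. This is only an illustration: the function name list_rpm_db_leftovers and its optional directory argument are assumptions, not part of rpm. It lists any __db* files in the rpm database directory with their time stamps, so the listing can be compared before and after interrupting rpm -qa.

```shell
#!/bin/sh
# Sketch: list leftover __db* files in the rpm database directory.
# The function name and the optional directory argument are illustrative.
list_rpm_db_leftovers() {
    dbdir="${1:-/var/lib/rpm}"
    for f in "$dbdir"/__db*; do
        # An unmatched glob stays literal, so check the file really exists
        [ -e "$f" ] && ls -l "$f"
    done
    # Report success whether or not any leftover file was found
    return 0
}
```

Running list_rpm_db_leftovers with no argument inspects /var/lib/rpm; empty output means no __db* files are present.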

Comment 6 Glen Foster 2001-08-24 15:54:02 UTC
We (Red Hat) should try to fix this before next release.

Comment 7 Jeff Johnson 2001-08-24 17:22:50 UTC
So the original problem, system doesn't reboot because
rpm -q is failing, has been fixed by removing the /var/lib/rpm/__db*
files.

And, even though the files are present, rpm is not hanging.

I call this WORKSFORME.

Comment 8 Michael Young 2001-08-24 17:59:35 UTC
Except I expect rpm to hang again in the future, since it is not removing the
__db files as you previously claimed. There was roughly 24 hours between the
__db files being created and the reboot hang, so I wouldn't be surprised if the
problem reoccurs when I go into work on Tuesday. Secondly, I actually tried the
^C trick twice on rpm -qa; the first time, a subsequent run of rpm -qa hung at
the same point.
I agree that my initial problem is solved, and I now know how to solve it when
it happens again, but if it can happen to me then it can happen to others less
skilled at picking up the pieces (it took me about 1/2 hr to find the problem
and I have a lot of experience of system administration and a spare linux
partition on the machine which made finding the problem considerably easier).

Comment 9 Jeff Johnson 2001-08-24 18:03:12 UTC
So find the place where the boot sequence is using rpm -q
and add
	rm -f /var/lib/rpm/__db*
just before. You can also add a daily cron script
to do the same.
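
A minimal sketch of that workaround as a reusable shell function follows. The function name clear_stale_rpm_db and its optional directory argument are assumptions made for illustration; neither rpm nor initscripts is known to ship anything by that name. The sketch removes only the __db* files and does not check whether an rpm process is running at the time.

```shell
#!/bin/sh
# Sketch: remove leftover __db* files from the rpm database directory.
# The function name and the optional directory argument are illustrative.
clear_stale_rpm_db() {
    dbdir="${1:-/var/lib/rpm}"
    # -f keeps rm quiet and successful when no __db* file exists
    rm -f "$dbdir"/__db*
}
```

Calling clear_stale_rpm_db with no argument targets /var/lib/rpm. The same call could sit just before the rpm -q invocation in the boot script, or in a script run daily from cron (for example under /etc/cron.daily, assuming the system uses that directory).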

Comment 10 Michael Young 2001-08-24 21:14:24 UTC
You are missing the point. If it can happen to me it can happen to someone else,
and you have said nothing to reassure me that this won't occur. I personally can
come up with several ways of avoiding the problem, but I doubt the next person
who suffers this problem will be as prepared. Remember, this problem gives you
an unbootable system, so even if you know to remove the __db files, it is quite
tricky to do so. I didn't report this problem because I needed to fix it; I had
already worked out how to do this; but I thought you might have some concern to
prevent other people having to suffer it.

Comment 11 Jeff Johnson 2001-08-24 22:13:05 UTC
This is *not* an rpm problem, but rather a problem with
using rpm -q on a boot critical pathway. Off to initscripts
for resolution ...

Comment 12 Bill Nottingham 2001-08-24 22:15:57 UTC
initscripts already *does* remove the files, as of 6.21-1 or so.

Comment 13 Michael Young 2001-08-28 14:06:36 UTC
A postscript: It seems the bug I reported is repeatable (at least with the
roswell2 initscripts-6.20-1) and that the __db files can cause rpm to stick
under normal usage if left around for long enough. Also the booting problem
isn't quite as bad as I thought; typing ^C a few times gets the boot moving
again.