Red Hat Bugzilla – Bug 52379
rpm sticks which can prevent the system booting
Last modified: 2008-05-01 11:38:00 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.75 [en] (X11; U; SunOS 5.6 sun4u)
Description of problem:
I have had the rpm command stick on me a few times, usually after an
up2date or similar failed for some reason (you run rpm -q kernel for
example and it sits there until I get fed up of waiting). Having read some
other bug reports, I tried removing the __db files in /var/lib/rpm, which
fixes the problem. However I have discovered today that if rpm is sticking,
you can't boot the system (as rpm is called in /sbin/mkkerneldoth in the
startup scripts). This is still true if you try to boot single user, or use
a boot disk.
Steps to Reproduce:
Removing /var/lib/rpm/__db* files at rpm startup is
done as of rpm-4.0.3-0.88.
But I have rpm-4.0.3-0.93 on my system!
Do you have /var/lib/rpm/__db* files? They should always
be gone after the next successful execution. Yes,
they may be present if you ^C out of a command, but they
will be removed by the next rpm command.
Otherwise I have no idea how to reproduce the "sticks".
I did have before I deleted them (__db.001 and __db.002 I think), and from what
I remember of the time stamps, they had survived several successful upgrades of
batches of packages while I was upgrading roswell to roswell2 (with up2date). I
did have a problem upgrading one package (Bug 52302) which could plausibly
correspond to what I remember the time stamps were, so I will try to see if this
recreates the problem tomorrow.
It doesn't seem to be connected to my problems with up2date libao. However, ^C
out of an rpm -qa does leave these two __db files, and subsequent runs of rpm
don't delete them.
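For anyone wanting to check the same thing, a minimal sketch of a helper that lists any leftover __db files with their time stamps is shown here. The function name list_rpm_db_files is made up for illustration, and the directory defaults to /var/lib/rpm as discussed in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of rpm): list leftover __db files,
# with time stamps, in an rpm database directory.
# The directory argument is optional and defaults to /var/lib/rpm.
list_rpm_db_files() {
    dbdir="${1:-/var/lib/rpm}"
    for f in "$dbdir"/__db*; do
        # If the glob matched nothing, $f is the literal pattern; skip it.
        [ -e "$f" ] && ls -l "$f"
    done
    return 0
}
```

Running list_rpm_db_files after an interrupted rpm -qa, and again after a later successful rpm command, would show whether the files were actually removed.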
We (Red Hat) should try to fix this before next release.
So the original problem, that the system doesn't reboot because
rpm -q is failing, has been fixed by doing rm /var/lib/rpm/__db*.
And, even though the files are present, rpm is not hanging.
I call this WORKSFORME.
Except I expect rpm to hang again in the future, since it is not removing the
__db files as you previously claimed. There was roughly 24 hours between the
__db files being created and the reboot hang, so I wouldn't be surprised if the
problem reoccurs when I go into work on Tuesday. Secondly, I actually tried the
^C trick twice on rpm -qa; the first time, a subsequent run of rpm -qa hung at
the same point.
I agree that my initial problem is solved, and I now know how to solve it when
it happens again, but if it can happen to me then it can happen to others less
skilled at picking up the pieces (it took me about 1/2 hr to find the problem
and I have a lot of experience of system administration and a spare linux
partition on the machine which made finding the problem considerably easier).
So find the place where the boot sequence is using rpm -q and add
rm -f /var/lib/rpm/__db*
just before. You can also add a daily cron script
to do the same.
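A minimal sketch of what such a cleanup script might look like follows. The function name clean_rpm_db_files is hypothetical, and where the script would be installed (for example a daily cron directory) is an assumption that depends on the system.

```shell
#!/bin/sh
# Hypothetical cleanup helper: remove leftover __db files from an
# rpm database directory. The directory defaults to /var/lib/rpm.
clean_rpm_db_files() {
    dbdir="${1:-/var/lib/rpm}"
    # -f keeps rm quiet (and successful) when no __db files exist.
    rm -f "$dbdir"/__db*
}
```

Calling clean_rpm_db_files with no argument just before the rpm -q call in the boot sequence, or from a daily cron script, would perform the same removal as the rm command above. It should not be run while another rpm command is in progress.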
You are missing the point. If it can happen to me it can happen to someone else,
and you have said nothing to reassure me that this won't occur. I personally can
come up with several ways of avoiding the problem, but I doubt the next person
who suffers this problem will be as prepared. Remember, this problem gives you
an unbootable system, so even if you know to remove the __db files, it is quite
tricky to do so. I didn't report this problem because I needed to fix it; I had
already worked out how to do this; but I thought you might have some concern to
prevent other people having to suffer it.
This is *not* an rpm problem, but rather a problem with
using rpm -q on a boot critical pathway. Off to initscripts
for resolution ...
initscripts already *does* remove the files, as of 6.21-1 or so.
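As a separate illustration of keeping a query off the boot critical pathway, here is a sketch of a time-limited wrapper. This is not what initscripts does; it assumes the GNU coreutils timeout utility is available, and the function name run_with_limit is made up for this example.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command but give up after a time limit,
# so that a stuck query cannot block the caller indefinitely.
# Assumes the GNU coreutils `timeout` utility is installed.
run_with_limit() {
    limit="$1"   # time limit in seconds
    shift
    timeout "$limit" "$@"
}
```

A boot script could then use something like run_with_limit 10 rpm -q kernel and carry on if the command fails or times out (GNU timeout exits with status 124 when the limit is reached).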
A postscript: It seems the bug I reported is repeatable (at least with the
roswell2 initscripts-6.20-1) and that the __db files can cause rpm to stick
under normal usage if left around for long enough. Also the booting problem
isn't quite as bad as I thought; typing ^C a few times gets the boot moving