Bug 74726 - rpm hangs
Status: CLOSED NOTABUG
Product: Red Hat Linux
Classification: Retired
Component: rpm
Version: 8.0
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Jeff Johnson
Duplicates: 75393

Reported: 2002-10-01 02:00 EDT by djh
Modified: 2007-04-18 12:46 EDT
CC List: 31 users

Last Closed: 2002-11-08 17:11:22 EST


Attachments
strace of rpm -qa (19.25 KB, text/plain)
2002-10-05 16:19 EDT, Need Real Name

Description djh 2002-10-01 02:00:32 EDT
After a fresh install of Psyche, rpm hung during the installation of an RPM.

# rpm -Uvh ~dave/rpmbuild/RPMS/qstat-2.5b-1.i386.rpm
Preparing...                ########################################### [100%]
   1:qstat                  ########################################### [100%]
<hung for hours>

This is all I got from stracing it..
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
...

Here's the backtrace from the hung rpm process...
(gdb) bt
#0  0x08196bbe in select ()
#1  0x082105f4 in _GLOBAL_OFFSET_TABLE_ ()
#2  0x080fc2a1 in __os_yield_rpmdb ()
#3  0x080c2f6f in __db_tas_mutex_lock_rpmdb ()
#4  0x080f3a87 in __lock_get_internal ()
#5  0x080f32bf in __lock_get_rpmdb ()
#6  0x080dd2a2 in __db_c_put_rpmdb ()
#7  0x0809285f in db3cput ()
#8  0x0808fadc in rpmdbAdd ()
#9  0x08062cba in rpmpsmStage ()
#10 0x080623c0 in rpmpsmStage ()
#11 0x080628d5 in rpmpsmStage ()
#12 0x0807d085 in rpmtsRun ()
#13 0x0806dbf1 in rpmInstall ()
#14 0x08048e4d in main ()
#15 0x0815ad62 in __libc_start_main ()

Had to SIGKILL and rm /var/lib/rpm/__* to use rpm again.


How Reproducible:
Don't know.  This bug was quite common in null, but I don't have a reliable
method of reproducing it.


Additional info:
rpm-4.1-1.06
Comment 1 Ali-Reza Anghaie 2002-10-01 07:14:12 EDT
Somebody pointed out this bug as a reference:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=68056

I don't see it added here, so I'll add it.

I too had the problem, but it's not readily repeatable, which makes it hard for
the developer/QA folk at RH to track down.
 
Cheers, -Ali
Comment 2 Benjamin Kosnik 2002-10-01 12:31:36 EDT
I've had this problem too, after a clean install of 8.0. I cannot install
abiword's RPM, for instance. 

% rpm -Uvh foo.rpm
Preparing...                ########################################### [100%]

just hangs

-benjamin
Comment 3 Need Real Name 2002-10-01 19:36:37 EDT
I have the exact same problem on a vanilla install of Psyche.  Now that
everything is installed, I cannot upgrade or install new packages, as RPM hangs
on a select() call that continuously times out ...  

After killing rpm (with -9), I have to remove /var/lib/rpm/__db* manually so
that I can successfully query the database again.  Rebuilding the database does
not seem to solve the problem, either.
Comment 4 Need Real Name 2002-10-02 04:09:43 EDT
Me too!
One note: while rpm hung as root, I could still "rpm -qa" as me.
Comment 5 john arkim 2002-10-02 04:12:53 EDT
Same problem here. I have to delete the __* files to get it to work. Red Hat 8.0.
Comment 6 Need Real Name 2002-10-02 23:48:16 EDT
I think I've found a temporary workaround.  I'm not sure how reliable it is, but
I've been able to install five or six packages now without any problems.  

Instead of doing a "rpm -Uvh blah", I used a more verbose output "rpm -Uvvvh
blah", and haven't had any problems.  I'm guessing that it's a race condition,
and by having rpm display a longer debugging trace, the race doesn't manifest
itself.

That said, I tried upgrading a package with just "rpm -U blah", and the crash
occurred.  After killing rpm and deleting the locks, I issued a "rpm -Uvvvh
blah" and it installed without any troubles.

HTH,
-Kris
Comment 7 Richard Allen 2002-10-03 07:38:52 EDT
I've also had this bug.  At first I thought it was an issue with my $HOME
being on an NFS-mounted filesystem (I typically download the rpms as me into my
$HOME, su - and install), but I've verified that this has nothing to do with it.

I also noticed that rpm began working again after a reboot (I had to reboot for
other reasons ;)   the reboot was not an attempt to fix rpm).

It's also possible that this is not an rpm problem.  I've seen evolution and WineX
hang in the exact same way (didn't do any tracing at the time, though).  WineX was so
severely hung that not even kill -9 managed to kill it.   It just sat there
eating up my CPU.
Comment 8 Warren Togami 2002-10-05 03:47:42 EDT
Red Hat 8.0, my RPM hung right after it installed a single RPM package with -ivh.

I attached strace to the pid and it repeats the following message forever:
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
Comment 9 Need Real Name 2002-10-05 16:19:10 EDT
Created attachment 78965 [details]
strace of rpm -qa
Comment 10 Need Real Name 2002-10-07 19:27:46 EDT
I'm reporting the same issue with RH 8.0 and the default
version of rpm that ships with it, version 4.1-1.06.

I run rpm -e to remove all the unnecessary software on
systems that will act as a server.  After removal of a
package or [2|3|4|...|n], rpm hangs.  The strace
eventually shows

<snipped>
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)

looping over and over (rinse, lather repeat)

As the others are reporting, if I remove the __db* files
after killing the process, I can use rpm again.  If I do
not remove them, I have to restart the system.  Taking
it to single user and back again does not fix it.

-Pat


Comment 11 Tom Wood 2002-10-07 21:47:10 EDT
*** Bug 75393 has been marked as a duplicate of this bug. ***
Comment 12 Need Real Name 2002-10-08 16:46:14 EDT
I've been getting the exact same strace output and the same problems. The hangs
are very frequent for me, however: rpm hangs more or less every second time.
Any news on when/if this particular issue might be resolved, or if it's indeed
being worked on? I do realize that it might not be as high priority since not
all users are experiencing it, but it seems to be serious enough a problem
worthy of more attention :)
Comment 13 Jeff Johnson 2002-10-08 17:06:15 EDT
Try rpm-4.1-9 packages from
	ftp://people.redhat.com/jbj/test-4.1

Please give me explicit WORKSFORME to expedite errata
release.
Comment 14 Jordan Russell 2002-10-09 02:30:56 EDT
rpm-4.1-9 WORKSFORME. I haven't encountered a single hang while upgrading 
hundreds of packages.
Comment 15 Scott Dowdle 2002-10-09 11:36:24 EDT
Updated to the rpm-4.1-9 test packages and I haven't had the problem since...
although I've only updated a few packages since and I wouldn't call my
experience a full test... but so far so good.
Comment 16 Need Real Name 2002-10-09 13:17:37 EDT
I also updated to version 4.1-9 test packages
but I'm still seeing the same problem (see previous
post above, RH 8.0).  I managed to successfully remove six
packages with rpm -e but then immediately tried to
remove two more and it hung again.  If I just kill
the proc with kill -9, rpm will not function.
Once I remove the __db* files, rpm will function again.

(strace follows)

...
open("/var/lib/rpm/Packages", O_RDONLY|O_LARGEFILE) = 3
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
fstat64(3, {st_mode=S_IFREG|0644, st_size=10727424, ...}) = 0
brk(0x8260000)                          = 0x8260000
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 4000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 8000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 16000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 64000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 128000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 256000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 512000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
[continues]

I guess I'd have to say version 4.1-9  WORKSFORME_NOT_

-Pat
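
The select() timeouts in the trace above double from 1000 microseconds up to 512000 and then stay at one second. That looks like a capped exponential backoff while waiting on a mutex; the backtrace in the original description shows select() being reached from __db_tas_mutex_lock_rpmdb via __os_yield_rpmdb. The Python sketch below only reproduces that sequence of sleep intervals. The function names and default values are assumptions chosen to match the trace, not Berkeley DB or rpm code.

```python
from itertools import islice


def backoff_delays(first_us=1000, cap_us=1_000_000):
    """Yield sleep intervals in microseconds: double each time, then stay at the cap.

    first_us and cap_us are illustrative defaults taken from the strace output,
    not values read from any library source.
    """
    delay = first_us
    while True:
        yield min(delay, cap_us)
        if delay < cap_us:
            delay *= 2


def first_delays(count):
    """Return the first `count` intervals as a list, for easy comparison with a trace."""
    return list(islice(backoff_delays(), count))


if __name__ == "__main__":
    for delay_us in first_delays(13):
        # Format like the strace timeval: {seconds, microseconds}
        print("{%d, %d}" % divmod(delay_us, 1_000_000))
```

Once the interval reaches the cap, the waiter polls once per second indefinitely, which would match the endless `{1, 0}` lines reported throughout this bug if the lock is never released.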
Comment 17 Jeff Johnson 2002-10-09 13:22:06 EDT
preich: you have a different problem, please
open a different bug
Comment 18 Jordan Russell 2002-10-09 15:08:59 EDT
I think I spoke too soon. :\

On a different system also running 4.1-9, this rpm command just hung: "rpm -ivh 
squid-2.4.STABLE7-4.i386.rpm". strace reports:

select(0, NULL, NULL, NULL, {0, 20000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
...

I deleted the __* files before upgrading to 4.1-9, but not after. Maybe I 
should have?
Comment 19 Need Real Name 2002-10-09 16:01:07 EDT
Still not working for me, same problem as the last poster (strace and all). I
did delete those particular files beforehand, however, since RPM hung while
trying to install the new version of RPM, if that makes any sense ;>
Comment 20 Jeff Johnson 2002-10-09 16:07:01 EDT
Go read #73097, and figger out which type of
hang you have. I'm trying to sort out the
missed SIGCHLD at the moment, other "Me too"
hang reports are not gonna help.
Comment 21 lukeh 2002-10-12 00:38:57 EDT
I have observed this lockup several times, on two separate Psyche installs (a
desktop Athlon machine and a Dell laptop).  I didn't know about deleting the __*
files (how many users actually would?), and so have had to resort to restarting
the machine each time.

I have *not* killed RPM before this happened, which would have resulted in a
stale lock.  I don't think I have ever killed RPM before it has finished, for
fear of something like this happening.

Thus if there is a stale lock problem, RPM is leaving the stale locks itself.

If stale locks are the problem, it would be nice if RPM at least (a) reported
that it was waiting for locks to be freed, and/or (b) didn't sit in an
un-killable state while waiting, and/or (c) checked to see if other instances of
RPM were running, to see if locks were actually valid.  (There are lots of
potential problems with (c) though.)
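
As a rough illustration of points (a) and (b), here is a minimal Python sketch of a lock wait that reports that it is waiting and gives up after a timeout instead of spinning forever. It uses an flock() advisory lock on a hypothetical lock file, which is an assumption made purely for the example: rpm 4.1 takes its locks through Berkeley DB, not this way. The function name, parameters and message text are likewise invented for illustration.

```python
import fcntl
import time


def lock_with_timeout(path, timeout_s=5.0, poll_s=0.1, report=print):
    """Take an exclusive flock() on `path`, reporting once if we have to wait.

    Returns the open file object holding the lock; closing it releases the lock.
    Raises TimeoutError if the lock is still held after `timeout_s` seconds.
    """
    fh = open(path, "a")  # "a" creates the file if needed without truncating it
    deadline = time.monotonic() + timeout_s
    reported = False
    while True:
        try:
            # LOCK_NB makes flock() fail immediately instead of blocking.
            fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fh
        except BlockingIOError:
            if not reported:
                report("waiting for lock on %s" % path)  # point (a)
                reported = True
            if time.monotonic() >= deadline:
                fh.close()
                raise TimeoutError("gave up waiting for lock on %s" % path)
            time.sleep(poll_s)  # an ordinary sleep, so signals still work: point (b)
```

One design difference worth noting: the kernel drops an flock() lock when the last descriptor for that open file is closed, which normally happens when the holding process exits, so a killed process does not leave this kind of lock behind (unless a child inherited the descriptor).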

It may not be a problem with this at all; perhaps for example when RPM does an
ldconfig, that is locking-up instead.
Comment 22 Need Real Name 2002-10-12 04:00:42 EDT
Kill -9 doesn't seem to want to do anything.
cannot rebuild the database (hangs)
deleting the _* files doesn't help
-----------------
All of the above both before and after installing the above-mentioned RPM
rpm-4.1-9 (which did NOT hang)
Comment 23 Benjamin Kosnik 2002-10-13 16:36:44 EDT
bkoz WORKSFORME thanks. 

After updating to all the new rpm's, I tried installing glibc-2.3.1-1 and
everything worked.

-benjamin
Comment 24 Warren Togami 2002-10-13 22:55:28 EDT
Just in case people think this problem is fixed, I ran into it several times on
my laptop and a dual processor Pentium3.  Fresh install + this test package.

Does not work for me.
Comment 25 Gene Czarcinski 2002-10-28 06:52:05 EST
Bad news: after no problems for quite a while, I just got a hang with 4.1-9.

I did a kill -9; rm /var/lib/rpm/__* and then reran the hung command which 
updates to small noarch rpm with no problems.

Note that this was NOT on a slow processor -- dual 933MHz PIII with 1GB ram.
Comment 26 Gene Czarcinski 2002-10-28 06:54:37 EST
Bad news: after no problems for quite a while, I just got a hang with 4.1-9.

I did a kill -9; rm /var/lib/rpm/__* and then reran the hung command which 
updates to small noarch rpm with no problems.

Note that this was NOT on a slow processor -- dual 933MHz PIII with 1GB ram.
Comment 27 Need Real Name 2002-10-28 17:29:17 EST
I have seen this behaviour before in other applications using BerkeleyDB.
BerkeleyDB uses on-disk memory regions for IPC. When the lock state gets out of
sync and there is no _detectable_ deadlock, new processes can deadlock on a
single lock held by a program that is no longer running (i.e. one that crashed
hard and did not clean up).  It _should_ be safe to remove the __db.00? files
as long as no other copies of RPM are running.  This is my experience anyway.
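
The "as long as no other copies of RPM are running" condition can be checked before touching anything. The Python sketch below is one hedged way to do that on Linux: it scans /proc for processes that still have a file open directly inside the database directory and only lists the __db* files if it finds none. The function names are hypothetical, and the approach is an assumption rather than anything rpm provides. It only sees open file descriptors, so it needs to run as root to inspect other users' processes, and it would miss a process that has mapped a region file and closed the descriptor.

```python
import glob
import os


def open_files_under(db_dir, proc_root="/proc"):
    """Map PID -> set of open file paths located directly inside db_dir."""
    db_dir = os.path.realpath(db_dir)
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to look
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were scanning
            if os.path.dirname(target) == db_dir:
                holders.setdefault(int(entry), set()).add(target)
    return holders


def removable_region_files(db_dir="/var/lib/rpm", proc_root="/proc"):
    """Return the __db* files in db_dir, refusing if any process has the db open.

    This only lists candidates; it deliberately does not delete anything.
    """
    holders = open_files_under(db_dir, proc_root)
    if holders:
        raise RuntimeError("database still open by PIDs: %s" % sorted(holders))
    return sorted(glob.glob(os.path.join(db_dir, "__db*")))
```

Separating the check from the removal leaves the final rm to the administrator, who can review the list first.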
Comment 28 djh 2002-11-01 01:42:09 EST
I've just reproduced the bug with the test-4.1 RPMs.

Same details as my initial report - stuck on select(), backtrace is the same.
Comment 29 Nathan G. Grennan 2002-11-04 11:51:22 EST
  I have seen this with the 4.1-9 test rpms and the newer rpm-4.2-0.5+glibc 2.3.1.
The more I upgrade the rarer it seems to become.

  I currently have two boxes running RedHat 8.0 with rpm-4.2-0.5 and glibc
2.3.1. My personal box has had the combination for a week and hasn't seen a hang
yet. The other, a server, I just installed the new packages yesterday and I have
seen a hang today.

  Is there any hope in sight for this bug? I have started recommending people
stay with RedHat 7.3 till this bug is fixed. I am also regretting upgrading
myself, because of this bug and other problems I have had.
Comment 30 Nathan Olla 2002-11-06 15:37:00 EST
We are seeing exactly the same thing.  We've not tried any of the workarounds,
but will be.  This bug is annoying enough to cause us to delay our 8.0
workstation rollout until it is resolved.
Comment 31 Dan Hollis 2002-11-08 16:27:15 EST
we are seeing exactly the same bug on nearly every 8.0 machine we have.
our 8.0 deployment is halted until this bug gets resolved...
Comment 32 Jeff Johnson 2002-11-08 16:37:21 EST
Again, lest it be lost in the noise:
Try rpm-4.1-9 packages from
         ftp://people.redhat.com/jbj/test-4.1

 Please give me explicit WORKSFORME to expedite errata
 release.

There are far too many bugs (with different root causes)
here to sort out. Feel free to reopen individual reports.
Comment 33 Jason Merrill 2002-11-08 17:01:03 EST
Er, there are several reports that the 4.1-9 packages do not, in fact, fix this bug.
Comment 34 Jordan Russell 2002-11-08 17:04:38 EST
Erm...where do you get the idea that we're all reporting different bugs here? 
I'm experiencing the problem *exactly* as djdave@bigpond.net.au originally 
reported it (same strace, same backtrace). And it has already been confirmed by 
djdave@bigpond.net.au and myself that rpm-4.1-9 does NOT fix it.
Comment 35 Nathan G. Grennan 2002-11-08 17:11:15 EST
A few hours after I last commented, the machine that hadn't experienced a hang in
a week did.
Comment 36 Jeff Johnson 2002-11-08 17:22:36 EST
Again, 

There are far too many bugs (with different root causes)
 here to sort out. Feel free to reopen individual reports.
Comment 37 Jordan Russell 2002-11-08 18:37:54 EST
Again, what makes you think people are discussing different/multiple bugs here? 
I'm not, and scanning over the comments I don't see anyone else doing so either.
(And even if someone did post a comment that wasn't related to the original 
problem, how can that possibly render the original report NOTABUG?)

Also, what does "Feel free to reopen individual reports" mean? This 
(djdave@bigpond.net.au's) report looks like an "individual report" to me, and 
it describes the problem I've been experiencing perfectly.
Comment 38 Warren Togami 2002-11-08 20:17:23 EST
I think he means open another Bugzilla bug # specific to 4.1-9.  I'll check if
it has already been done, and if not I'm opening it myself.  All of my systems
are experiencing this same problem, although more rarely, with 4.1-9.  This is
the only thing preventing me from telling people "RH 8.0 is ready."
Comment 39 djh 2002-11-08 20:38:23 EST
I've just opened a fresh bugreport for this - bug #77562.
Comment 40 Rusty Sasiain 2003-03-16 13:22:36 EST
Just an FYI: if anyone is using Synaptic while trying to run rpm from the CLI, try
stopping Synaptic/apt-get or red-carpet first. I have tested this scenario and had
several problems, and in most cases the hang was due to multiple calls to the db
lock files, which, as all of us know, do not like to play with other kids. Once I
had stopped all apps trying to access the db and removed all the _* files from
/var/lib/rpm, I was able to successfully remove/install whatever I wanted. Again,
this is one person's evaluation and may or may not shed any light on the issue at
hand, but it's definitely worth a try. Peace!
Comment 41 Wayne Schuller 2003-07-08 05:34:50 EDT
yes Rusty that is good advice.

I turned off the red-carpet daemon, and the problem stopped.

I think it is related to multiple clients querying the rpm database at the same
time.

thank you!
Comment 42 teuben 2003-12-21 22:57:24 EST
a reboot is a nice workaround, but if you remove the
stale lock files /var/lib/rpm/__db*  you will also 
get rpm to work again!
