Bug 85429 - mozilla hangs on futex(2)
Summary: mozilla hangs on futex(2)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: glibc
Version: 9
Hardware: i686
OS: Linux
medium
high
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2003-03-02 15:56 UTC by Kjetil T. Homme
Modified: 2016-11-24 15:24 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2004-05-26 11:00:06 UTC
Embargoed:



Description Kjetil T. Homme 2003-03-02 15:56:46 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030206

Description of problem:
mozilla hangs randomly every 15 minutes or so when using the stock Phoebe5
kernel and glibc.  name resolving seems to be an aggravating factor.  Mozilla
works fine with the Red Hat 8.0 errata kernel.  the problem was present in
Phoebe3 as well, but then the entire machine would eventually hang -- Phoebe5 is
a nice improvement in that respect :-)


Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
go to a busy web page with external adverts.  it seems like the combination of
CPU load and name resolving gives the best odds of hanging Mozilla.


Actual Results:  the arguments to the hanging futex call vary.

: [kjetilho@groucho ~]; strace -f -p 15964
futex(0x80d30f0, FUTEX_WAIT, 0, NULL)   = -1 EINTR (Interrupted system call)
--- SIGTERM (Terminated) @ 0 (0) ---

: [kjetilho@groucho ~]; strace -f -p 11622
futex(0x41d3bf38, FUTEX_WAIT, 11629, NULL) = -1 EINTR (Interrupted system call)
--- SIGTERM (Terminated) @ 0 (0) ---

: [kjetilho@groucho ~]; strace -f -p 13161
futex(0x42131300, FUTEX_WAIT, -2, NULL <unfinished ...>


Additional info:

my system is a BP6 with 2x366 MHz Celeron (not overclocked).

kernel-smp-2.4.20-2.48
glibc-2.3.1-46
mozilla-1.2.1-20

Comment 1 Kjetil T. Homme 2003-03-07 00:52:50 UTC
I made a snapshot of a page which seems to make Mozilla either crash or hang
consistently.

http://heim.ifi.uio.no/~kjetilho/tmp/mozillabug/www.nettavisen.no/servlets/page%3fsection=3&item=257901


Comment 2 Kjetil T. Homme 2003-03-07 00:54:29 UTC
I guess I should mention, with LD_ASSUME_KERNEL=2.2.5, mozilla works fine.


Comment 3 Ulrich Drepper 2003-04-14 07:11:08 UTC
Try glibc 2.3.2-27.9.

Comment 4 Kjetil T. Homme 2003-04-15 23:33:00 UTC
thanks, but it does not help.  the snapshot I made still makes Mozilla hang
rather consistently.

kernel-smp-2.4.20-2.48
glibc-2.3.2-27.9
mozilla-1.2.1-26

one thing I noticed is that this always happens:
# strace -p 10649
futex(0x42132320, FUTEX_WAIT, -3, NULL <unfinished ...>
# strace -p 10649
futex(0x42132320, FUTEX_WAIT, -5, NULL <unfinished ...>
# strace -p 10649
futex(0x42132320, FUTEX_WAIT, -7, NULL <unfinished ...>

i.e., the third argument is decremented by two each time I attach using
strace.  I have no idea if this is relevant at all :-)


Comment 5 Ihar Filipau 2003-09-10 17:45:35 UTC
I'm experiencing a similar problem.
I'm running up-to-date RHL9 (I have up2date in my crontab).
The only non-standard component is Mozilla 1.4, installed from the RPMs
available from www.mozilla.org.

The problem: twice a week (or so) Mozilla stops resolving names. After a
restart of Mozilla, name resolution works again. But it turns out that the
previous instance of Mozilla can still be hanging in memory. Previously I just
killed it, but today I decided to see what the problem was. To my surprise it
was (yet again, as in the hanging rpm story) futex(2):

[ifilipau@hera ~]$ strace -p 17876
futex(0x42934d78, FUTEX_WAIT, 17906, NULL <unfinished ...> # ^C
[ifilipau@hera ~]$

pid 17906 is already gone, but mozilla waits for something.
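
The check described above (is the ID shown as the third futex argument still a live thread of the hung process?) can be sketched in shell. This is only an illustration: tid_alive is a hypothetical helper, not part of strace or Mozilla, and it assumes a kernel that exposes per-thread directories under /proc/PID/task, which may not be true of the kernels discussed in this report. It also assumes, as the comment does, that the futex value is a thread ID.

```shell
#!/bin/sh
# Hypothetical helper, not part of any tool mentioned in this report.
# Succeeds if thread TID currently exists inside process PID, judged by
# the presence of the directory /proc/PID/task/TID.
tid_alive() {
    pid=$1
    tid=$2
    [ -d "/proc/$pid/task/$tid" ]
}

# Values taken from the strace output above:
# process 17876 is waiting on a futex whose value is 17906.
if tid_alive 17876 17906; then
    echo "thread 17906 still exists in process 17876"
else
    echo "thread 17906 is not running in process 17876"
fi
```

If the thread is gone while the process still sleeps in FUTEX_WAIT on that value, that matches the symptom reported here: the waiter was never woken.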

killall mozilla-bin helps, but this is not nice.
FYI.

P.S. BTW RHL9 is missing the man pages for futex(2)/(4)

Comment 6 Pádraig Brady 2003-09-11 09:16:50 UTC
Hi, me too.
mozilla dns thread seems to hang up at:

futex(0x42932c88, FUTEX_WAIT, 12418, NULL

mozilla-1.4-0
kernel-smp-2.4.20-20.9
glibc-2.3.2-11.9

Note the problem did NOT occur with
kernel-smp-2.4.20-8

I'll upgrade glibc to see if it helps

Comment 7 Pádraig Brady 2003-09-25 11:31:52 UTC
I upgraded glibc, which seemed to help actually,
but the problem just happened again. (First
time in 2 weeks).

mozilla-1.4-0
kernel-smp-2.4.20-20.9
glibc-2.3.2-27.9


Comment 8 Pádraig Brady 2003-10-03 17:40:27 UTC
I've started maxing out my CPU now with 2 math calculation
processes, and this mozilla/futex bug seems to trigger
much more frequently.


Comment 9 Pádraig Brady 2003-11-18 09:43:47 UTC
mozilla-1.5-1
glibc-2.3.2-27.9
2.4.20-20.9smp

Hmm I thought I resolved this a while ago, saying the new glibc
didn't cause it? Anyway it's much more difficult to reproduce
now, but it happened again with the above combination.

Comment 10 Alessandro Suardi 2004-05-25 18:37:55 UTC
Ximian Mozilla 1.4.2 / Galeon 1.3.7 under kernel 2.6.6 and later hang
 on futex() on a page containing Java code after upgrading to Sun JRE
 1.5.0-beta. JRE 1.4.2_03-fcs does not suffer from this issue.

Mozilla survives with LD_ASSUME_KERNEL=2.4.1, Galeon has java_vm going
 into a CPU spin even with LD_ASSUME_KERNEL=2.4.1.

Before adding more detail, I'd like to know whether you're interested
 in such detail, given that I do have a RH9 base distro but, as you see,
 I'm using XD2 and a beta JRE from Sun - so I'm unsure whether my
 environment is a suitable candidate for your investigation.

Comment 11 Pádraig Brady 2004-05-26 08:45:11 UTC
Have to say I've updated to mozilla-1.6-0.rh90.dag
as soon as it was available and have not noticed
the problem since.

Comment 12 Kjetil T. Homme 2004-05-26 11:00:06 UTC
it hasn't happened to me in a long time, and never in RHEL WS3, FC1 or
FC2.  RHL9 is discontinued anyway.  I'm taking the liberty of closing it.

Comment 13 Alessandro Suardi 2004-06-02 15:23:19 UTC
...and the problem went away for me after upgrading to Sun JRE 1.5.0-beta2.
So now Ximian Mozilla 1.4.2 works for me too under kernel 2.6.7-rc2 :)

