345301 – NFS client, NLM, java 1.4.2_08, readlock hang on futex

Bug 345301 - NFS client, NLM, java 1.4.2_08, readlock hang on futex

Summary: NFS client, NLM, java 1.4.2_08, readlock hang on futex

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.5
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Michal Schmidt
QA Contact:	Martin Jenner
Docs Contact:
URL:
Whiteboard:
Depends On:	166292
Blocks:	422551 430698
TreeView+	depends on / blocked

Reported:	2007-10-22 14:46 UTC by Guy Streeter
Modified:	2018-11-14 18:19 UTC (History)
CC List:	5 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-03-13 20:36:04 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Description Guy Streeter 2007-10-22 14:46:21 UTC

Bringing this RHEL3 BZ forward to RHEL4, since the problem still exists

+++ This bug was initially created as a clone of Bug #166292 +++

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050721
CentOS/1.0.6-1.4.1.centos4 Firefox/1.0.6

Description of problem:
A java based program heavily using file locking over NFS (NFS Server is RH AS
2.1, Veritas Cluster 2.2) hangs regularly (not allways) during nfs read and/or
write locks. 

The NFS Client (udp) is running RedHat EL3, kernel-smp-2.4.21-32.0.1.EL
nfs-utils-1.0.6-33EL, iptables (no rejects in log, all rejects logged), SUN
j2re1.4.2_08 (reproducable with IBM JRE, from RedHat EL Extras), bonding-1.0.4t-1. 
Hardware: HP Proliant DL580G3, Intel e100 dual port and e1000 dual port
(e1000-6.0.60-1 e100-3.1.4q-1)


A strace of the java software shows that the futex call sometimes just hangs for
10sec, 20sec, 30secs or even up to 60secs. The higher the nfs-load the more
often it hangs. 

  13:59:36 gettimeofday({1124366376, 393387}, NULL) = 0
  13:59:36 futex(0x807dbc4, FUTEX_WAIT, 5203, {0, 544848000}) = -1 ETIMEDOUT   
    (Connection timed out)
  13:59:36 futex(0x805bf14, FUTEX_WAKE, 1) = 0
  13:59:36 gettimeofday({1124366376, 952476}, NULL) = 0
> 13:59:36 futex(0x809248c, FUTEX_WAIT, 2, NULL) = 0    
  13:59:46 futex(0x809248c, FUTEX_WAIT, 2, NULL) = 0
  13:59:46 futex(0x809248c, FUTEX_WAIT, 2, NULL) = 0
  13:59:46 gettimeofday({1124366386, 977241}, NULL) = 0

During the hanging state only the following part of the marked line is displayed: 
> 13:59:36 futex(0x809248c, FUTEX_WAIT, 2, NULL

A simultaneous tcpdump on the NFS client shows that _all_ NLM Lock requests are
answered within max. 300msecs, typically within 1-5 msecs. Thus, the nfs server
does not seem to be the problem.

The NFS mount options are (rw,rsize=8192,wsize=8192,hard,intr,addr=10.x.x.x).

The problem occurs more often if there is more traffic. It can be reproduced on
two seperate, but equivalent environments (pre-live and live).

Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-32.0.1.EL nfs-utils-1.0.6-33EL

How reproducible:
Sometimes

Steps to Reproduce:
1. heavily use file locking from two nfs clients to the same nfs server, on the
same filesystem
2. measure the time required for a lock,
3. strace your program
  

Actual Results:  as above

Additional info:

Test program output
[      7ms] Read: 154854:nodeB - the time is: Thu Aug 18 13:59:35 CEST 2005
[     19ms] Aquired write-lock.
[     12ms] Released lock.
[    150ms] written line.
[     87ms] ReadLock
[  10365ms] Read: 154856:nodeB - the time is: Thu Aug 18 13:59:36 CEST 2005
[    169ms] ReadLock
[    184ms] Read: 154857:nodeB - the time is: Thu Aug 18 13:59:46 CEST 2005
[      4ms] Aquired write-lock.
[      8ms] Released lock.
[  10038ms] written line.
[    120ms] ReadLock
[    129ms] Read: 154858:nodeB - the time is: Thu Aug 18 13:59:46 CEST 2005
[    176ms] ReadLock
[    188ms] Read: 154859:nodeB - the time is: Thu Aug 18 13:59:47 CEST 2005
[      9ms] Aquired write-lock.
[     21ms] Released lock.
[     32ms] written line.
[      6ms] ReadLock
[     11ms] Read: 154860:nodeB - the time is: Thu Aug 18 13:59:47 CEST 2005

strace:
13:59:36 gettimeofday({1124366376, 393235}, NULL) = 0
13:59:36 clock_gettime(0, 0xbfffbbac)   = -1 ENOSYS (Function not implemented)
13:59:36 gettimeofday({1124366376, 393387}, NULL) = 0
13:59:36 futex(0x807dbc4, FUTEX_WAIT, 5203, {0, 544848000}) = -1 ETIMEDOUT (Conn
ection timed out)
13:59:36 futex(0x805bf14, FUTEX_WAKE, 1) = 0
13:59:36 gettimeofday({1124366376, 952476}, NULL) = 0
13:59:36 futex(0x809248c, FUTEX_WAIT, 2, NULL) = 0
13:59:46 futex(0x809248c, FUTEX_WAIT, 2, NULL) = 0
13:59:46 futex(0x809248c, FUTEX_WAIT, 2, NULL) = 0
13:59:46 gettimeofday({1124366386, 977241}, NULL) = 0
13:59:46 _llseek(3, 0, [8870654], SEEK_CUR) = 0
13:59:46 _llseek(3, 0, [8870654], SEEK_END) = 0
13:59:46 _llseek(3, 8870654, [8870654], SEEK_SET) = 0
13:59:46 fcntl64(3, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=8870654, len
=2138612993}) = 0

-- Additional comment from supermike on 2005-09-12 12:31 EST --
this may be related to 141282, I'm seeing this problem still in kernel
2.4.21-32.0.1.ELsmp
http://seer.support.veritas.com/docs/277241.htm

Process 3759 attached - interrupt to quit
futex(0xb65a5844, FUTEX_WAIT, 2, NULL

and also something similar to the above, on a different server with a java
thread (no nfs involved)

--- SIGUSR2 (User defined signal 2) @ 0 (0) ---
<... rt_sigsuspend resumed> )           = -1 EINTR (Interrupted system call)
rt_sigreturn(0x791fe28c)                = -1 EINTR (Interrupted system call)
rt_sigreturn(0x800)                     = 1098
--- SIGILL (Illegal instruction) @ 0 (0) ---
rt_sigreturn(0x800)                     = 1098
futex(0x805b174, FUTEX_WAIT, 2, NULL)   = 0
futex(0x88b309c, 0x4 /* FUTEX_??? */, 1) = 0
futex(0x805b174, FUTEX_WAKE, 1)         = 1
futex(0x805b1d4, FUTEX_WAIT, 2, NULL)   = 0
futex(0x805b1d4, FUTEX_WAKE, 1)         = 0
--- SIGUSR2 (User defined signal 2) @ 0 (0) ---
rt_sigprocmask(SIG_BLOCK, NULL, [QUIT USR2], 8) = 0
rt_sigsuspend([QUIT] <unfinished ...>
--- SIGUSR2 (User defined signal 2) @ 0 (0) ---
<... rt_sigsuspend resumed> )           = -1 EINTR (Interrupted system call)
rt_sigreturn(0x791fe28c)                = -1 EINTR (Interrupted system call)
rt_sigreturn(0x800)                     = 1098
--- SIGILL (Illegal instruction) @ 0 (0) ---
rt_sigreturn(0x800)                     = 1098
futex(0x805b174, FUTEX_WAIT, 2, NULL)   = 0
futex(0x805b174, FUTEX_WAKE, 1)         = 1
futex(0x805b1d4, FUTEX_WAIT, 2, NULL)   = 0
futex(0x805b1d4, FUTEX_WAKE, 1)         = 0


-- Additional comment from supermike on 2005-09-14 15:01 EST --
also please see Bug 153722 and Bug 145021
appears the same issue is in the 2.6 kernel, as the problem has been reported in
FC4, I've just seen it on 2.6.12-1.1447_FC4; range of apps now reported with
same issue include veritas netbackup, java, up2date, yum, evolution

-- Additional comment from supermike on 2005-10-04 12:27 EST --
Just found another case where I can reproduce it, this time on the latest kernel
from update6 2.4.21-37.ELsmp
 
running the java installer for oracle 9i:
 
/tmp/OraInstall2005-10-04_09-10-14AM/jre/bin/i386/native_threads/java
 
Process 5649 attached - interrupt to quit
futex(0xa2b6bbac, FUTEX_WAIT, 1211, NULL
 
"out of the box" install for oracle9i on unmodified rhel3 update6 

once again I got around this by reverting to linuxthreads (LD_ASSUME_KERNEL=2.4.19)
it works fine when not using nptl.


-- Additional comment from tao on 2005-10-04 16:18 EST --
From User-Agent: XML-RPC

I don't see any evidence that the patch provided by the developers in bug
141282 was proven to fix the issue. I'll go ahead and send this up, since
it does seem like the same problem.




This event sent from IssueTracker by streeter
 issue 80399

-- Additional comment from supermike on 2005-10-07 11:34 EST --
I ran into this issue again with oracle application server 10g (9.0.4) on rhel3u6
2.4.21-37.ELsmp on 3 servers, all default installs of os and oas.  Java process
hangs on futex_wait; the version shipped with oracle is:
java version "1.4.2_02"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_02-b03)
Java HotSpot(TM) Client VM (build 1.4.2_02-b03, mixed mode)
 
It happens exactly the same way each time, on 3 separate servers and I can
reproduce it at will.

-- Additional comment from mark.meierjohann on 2005-10-18 10:08 EST --
We have meanwhile switched to NFS over tcp and are seeing the same problem. This
bug seems to be somewhere in the backports of the 2.6 features to the 2.4 kernel
(nptl), etc.

We will now test a plain vanilla kernel.org kernel and see if that helps us.

Since this is a large enterprise customer we will have to re-open the calls we
had open, before.


-- Additional comment from steved on 2005-10-28 05:58 EST --
Ingo,

It really seem like this is a dup of bz141282 but
they are claiming the bug still exists.... do you 
think it still is a bug with futexs or something
else like NLM....

-- Additional comment from sarah.e.thompson on 2005-11-01 15:00 EST --
I also ran into this problem with Unidata's ldm.  I tried 2.4 and 2.6 kernels,
every flavor of OS.  The upstream ldm server would serve data for awhile, and
then the child process' would report as "defunct" and the downstream servers
couldn't reconnect, stopping all ldm communication.  When doing an strace on the
pid I see the futex wait.  Sometimes I could reproduce it in a day, other times
it would take weeks, but always the same result.  I put in the ld_assume_kernel
env. variable and it seems much more stable.  

-- Additional comment from sarah.e.thompson on 2005-11-01 15:00 EST --
I also ran into this problem with Unidata's ldm.  I tried 2.4 and 2.6 kernels,
every flavor of OS.  The upstream ldm server would serve data for awhile, and
then the child process' would report as "defunct" and the downstream servers
couldn't reconnect, stopping all ldm communication.  When doing an strace on the
pid I see the futex wait.  Sometimes I could reproduce it in a day, other times
it would take weeks, but always the same result.  I put in the ld_assume_kernel
env. variable and it seems much more stable.  

-- Additional comment from andy.fletcher on 2005-11-01 15:38 EST --
I was seeing the problem with Apache+PHP. I disabled nptl and updated everything
to the very latest packages from RHN and this appears to have solved the
problem, at least for me and at least for the moment (1 week+ and counting).

-- Additional comment from emmerson on 2005-11-02 12:14 EST --
(In reply to comment #9)

I wish to expand on Sarah Thompson's information.  The Unidata LDM is coded to
to the _XOPEN_SOURCE=500 standard and makes extensive use of read and write
locking (via fcntl()) of an mmap()ed file.  Fedora is the only known
operating-system on which the LDM exhibits this problem.

Regards,
Steve Emmerson
LDM Developer
University Corporation for Atmospheric Research

-- Additional comment from jsquires on 2005-11-08 11:49 EST --
I find this problem any time I run any version of java (tried a variety of 1.4.2
32-bit versions as well as 1.5.0_03 64-bit) in cron or batch; strangely, java
works fine interactively.  This is on AS3 update 6, kernel 2.4.21-37EL on x86_64.

For example, to reproduce:

jsquires@bobo:~$ batch
at> /opt/jdk1.4.2_09/bin/java -version
at> <EOT>
job 4 at 2005-11-08 14:15
<wait>
jsquires@bobo:~$ ps x | grep java
 4035 ?        SN     0:00 /opt/jdk1.4.2_09/bin/java -version
jsquires@bobo:~$ strace -p 4035
Process 4035 attached - interrupt to quit
[ Process PID=4035 runs in 32 bit mode. ]
futex(0x80a9b84, FUTEX_WAIT, 1, NULL <unfinished ...>
Process 4035 detached

Using LD_ASSUME_KERNEL=2.4.1 fixes this for me, but it would be nice to find out
what the root of the problem is.

-- Additional comment from tao on 2006-06-06 14:17 EST --
Follow on to yesterday's call... when we originally opened this case, we
had a reproducer.  Turned out to be poor programming (malloc in a signal
handler) that was causing the problem so we closed it.  Problem is that
we're now hitting this w/ Veritas netbackup and the signature looks the
same (but, of course, we can't confirm w/o code).

perl5.6 -le 'print $$; $SIG{CHLD}="ignore"; system "false"'

Hangs about 1 in 10 times (the typo, using ignore rather than IGNORE is
intentional) becuase 5.6 tries to alloc a string for the error message in
the signal handler.  5.8 does not do this.  Could probably hack up a C
example pretty quickly if you have a hard time w/ this example.


This event sent from IssueTracker by streeter 
 issue 81271

-- Additional comment from tao on 2006-06-06 14:17 EST --
File uploaded: sysrq.out

This event sent from IssueTracker by streeter 
 issue 81271
it_file 62490

-- Additional comment from tao on 2006-06-06 14:17 EST --
Attached, the perl backtrace is as follows (including the zombie which
inevitably occurs):

perl          S 00000001  3620  1448   1388  1449               (NOTLB)
Call Trace:   [<c013562c>] schedule_timeout [kernel] 0xbc (0xdbcbfe58)
[<c013d910>] futex_wait [kernel] 0x390 (0xdbcbfe90)
[<c013cca0>] futex_vcache_callback [kernel] 0x0 (0xdbcbfea0)
[<c0144668>] handle_mm_fault [kernel] 0x288 (0xdbcbfeb8)
[<c013cca0>] futex_vcache_callback [kernel] 0x0 (0xdbcbfef4)
[<c010c01e>] do_signal [kernel] 0x8e (0xdbcbff20)
[<c013ceb7>] do_futex [kernel] 0x157 (0xdbcbff58)
[<c0128137>] __mmdrop [kernel] 0x47 (0xdbcbff6c)
[<c013cf99>] sys_futex [kernel] 0xb9 (0xdbcbff88)
[<c02af06f>] no_timing [kernel] 0x7 (0xdbcbffc0)

false         Z DBCBE000     8  1449   1448                     (L-TLB)
Call Trace:   [<c012dc20>] do_exit [kernel] 0x370 (0xdbaedf90)
[<c012ddcb>] do_group_exit [kernel] 0x8b (0xdbaedfac)
[<c02af06f>] no_timing [kernel] 0x7 (0xdbaedfc0)



Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by streeter 
 issue 81271

-- Additional comment from streeter on 2006-06-06 14:20 EST --
customer has a simple perl script reproducer in comment 14.
sysrq-t while hung attached in comment 15.


-- Additional comment from staubach on 2006-06-16 15:54 EST --
This is not an NFS bug.  It is something to do with the futex support.
I also have a reproducer too, if another one would be desired.

-- Additional comment from emmerson on 2007-09-12 15:36 EST --
Prarit,

I can't speak for Mark, but at least one LDM user has recently encountered this
bug.  I've given him this URL and asked if I can post his email address.

Steve Emmerson

-- Additional comment from fhirtz on 2007-09-12 15:41 EST --
We have a reproduction case which shows it on RHEL3 and RHEL4 (though less
frequent there), so it's still existent.

-- Additional comment from emmerson on 2007-09-12 19:06 EST --
Prarit,

James M. Pelagatti <jamie.edu> is the one who encountered this bug.  You
may contact him.  He doesn't encounter it often.  Here's his "uname -a":

 Linux llwxldm2 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007 x86_64
x86_64 x86_64 GNU/Linux

--Steve Emmerson

Comment 3 Michal Schmidt 2008-03-05 15:38:20 UTC

Several different issues are mixed together in this BZ and related ITs.

The original report was about file locking over NFS (using NLM). In some cases
it takes tens of seconds to obtain the lock. It could be caused by a race
condition where NLM_GRANTED_MSG from the server arrives before the client
prepares for it. That would be bug 432855. A patched kernel for testing is
available from Jeff Layton's page: http://people.redhat.com/jlayton/
Could someone with the long NFS locking delays test it?

Then we have some futex-related reports here, e.g. the hangs with LDM. This
could be explained by bug 217067 which was closed in November. Could an LDM user
retest the futex problem with a current RHEL4 kernel?

I've been working on the NFS locking bug before bug 432855 came to my attention
today. During that, I wrote a testcase for NLM which shows another bug. When a
client waiting for NLM_GRANTED_MSG is interrupted by a signal, the file stays
locked forever. The bug is present in RHEL5 and current upstream. I suspect this
code in do_setlk() is harmful:
if (status == -EINTR || status == -ERESTARTSYS) {
do_vfs_lock(filp, fl);
}

Comment 4 Michal Schmidt 2008-03-13 19:27:24 UTC

The patch for bug 432855 is now present in current RHEL4 development builds:
http://people.redhat.com/vgoyal/rhel4/
It should fix the 30 seconds NFS locking delays. Please test.

The futex hang is something else completely and should have been in a separate
BZ. Possibly fixed already in kernel-2.6.9-67.EL.

I'm discussing the problem of files remaining locked forever on NFS on upstream
mailing list. This will be another BZ.

Devel NAK: this bug is either a duplicate of 432855 or 217067 (then it doesn't
really matter) or it's not (then I won't be able to provide a fix in 4.7).

Comment 5 RHEL Program Management 2008-03-13 20:36:04 UTC

Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.

Note You need to log in before you can comment on or make changes to this bug.