Description of problem:

When we started testing the 2.6.24-30.el5rt kernel we found we were getting a number of failures due to pthread_mutex_unlock returning EPERM. These are PRIO_INHERIT mutexes. Roland Westrelin did further investigation and found that a futex lock pi syscall fails with EFAULT. He states: "My understanding of the libc code is that this return code is ignored. So I guess the user code proceeds as if it had successfully locked the mutex and later when it tries to unlock it, the user app gets the EPERM."

One of the main fixes in the -30 kernel was for:
https://bugzilla.redhat.com/show_bug.cgi?id=400541
so that might be related.

There has been some discussion of this on the mailing list:
https://www.redhat.com/mailman/private/rhel-rt-sun/2008-May/msg00012.html

Version-Release number of selected component (if applicable):
2.6.24-30.el5rt
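For reference, a minimal sketch (not the Sun reproducer; names and structure here are illustrative only, compile with gcc -pthread) of the kind of PRIO_INHERIT mutex usage being described, checking the unlock return code the way the failing code path would observe the EPERM:

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    pthread_mutex_init(&m, &attr);

    /* the kernel PI path (FUTEX_LOCK_PI) is only taken under contention;
       an uncontended lock stays in the user space fast path */
    rc = pthread_mutex_lock(&m);
    printf("lock:   %s\n", rc ? strerror(rc) : "ok");

    rc = pthread_mutex_unlock(&m);   /* the reported failure shows up here as EPERM */
    printf("unlock: %s\n", rc ? strerror(rc) : "ok");
    return 0;
}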
Loaded the reproducer sent to us in email from Sun onto rhelrt-3.farm.hsv.redhat.com (Dual 2.0GHz Opterons w/12GB RAM) as well as the RT JVM, and ran on a 2.6.24.7-65 trace kernel. The reproducer was run using the attached run_test.sh and produced a series of the following tracebacks on the serial console:

WARNING: at kernel/futex.c:543 lookup_pi_state()
Pid: 2486, comm: java Not tainted 2.6.24.7-65.el5rttrace #1

Call Trace:
 [<ffffffff8105ee5c>] futex_lock_pi+0x35d/0x9b0
 [<ffffffff8108beee>] ? ftrace_now+0x9/0xb
 [<ffffffff8105f4c8>] ? do_futex+0x19/0xb47
 [<ffffffff810605ae>] ? compat_sys_futex+0xf2/0x110
 [<ffffffff8105f4c8>] ? do_futex+0x19/0xb47
 [<ffffffff8105ffd6>] do_futex+0xb27/0xb47
 [<ffffffff810604d2>] ? compat_sys_futex+0x16/0x110
 [<ffffffff81028e29>] ? cstar_do_call+0x8c/0x148
 [<ffffffff810604d2>] ? compat_sys_futex+0x16/0x110
 [<ffffffff8100c224>] ? ftrace_call+0x5/0x2b
 [<ffffffff810605ae>] compat_sys_futex+0xf2/0x110
 [<ffffffff81028df2>] ? cstar_do_call+0x55/0x148
 [<ffffffff81028e29>] cstar_do_call+0x8c/0x148
---------------------------
| preempt count: 00000000 ]
| 0-level deep critical section nesting:
----------------------------------------
Created attachment 308695 [details] script to run Sun Java testcase on rhelrt-3.farm.hsv.redhat.com
Created attachment 308696 [details] output from /debug/tracing/trace following test run by run_test.sh

This is the output of /debug/tracing/trace following a test run on rhelrt-3 using run_test.sh on a 2.6.24.7-65.el5rt trace kernel.
Clark, could you please create an attachment with the latency-trace output of that very same test? I guess the gurus out there prefer the latency_trace format. I learned that the hard way :)
Created attachment 308719 [details] Latency trace output from /debug/tracing/latency_trace for test run

This is the output from /debug/tracing/latency_trace for a test run of the reproducer (same test run as for the attached "trace" output).
David,

Am I correct in presuming you only see this on 64-bit systems? The reason I ask is that the stack trace is going through the compat layer, meaning it's a 32-bit app running on a 64-bit kernel. Or do you see this on 32-bit kernels too?

Clark
I'd be interested in a strace output which shows the EFAULT return.
Thomas,

I am working on it. Meanwhile, strace gives some interesting info:

...
[ Process PID=21729 runs in 64 bit mode. ]
[ Process PID=21731 runs in 32 bit mode. ]
[ Process PID=21729 runs in 64 bit mode. ]
[ Process PID=21731 runs in 32 bit mode. ]
[ Process PID=21729 runs in 64 bit mode. ]
[ Process PID=21731 runs in 32 bit mode. ]
[ Process PID=21729 runs in 64 bit mode. ]
[ Process PID=21731 runs in 32 bit mode. ]
[ Process PID=21729 runs in 64 bit mode. ]
[ Process PID=21731 runs in 32 bit mode. ]
...
Created attachment 308757 [details] Strace from the java reproducer thread that receives EPERM

This is one of the several threads created by the test. I would like to highlight this detail:

...
futex(0x4813406c, FUTEX_WAKE, 2147483647) = 0
...
futex(0xffc5e838, FUTEX_UNLOCK_PI, 0) = -1 EPERM (Operation not permitted)
...

This is the only reference to EPERM in all the logs. Interestingly enough, it is trying to unlock a lock apparently owned by another thread. I can try to gather more logs if you need better strace info.
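For illustration, a tiny standalone sketch (assuming plain syscall(2) access to the futex op; not the test's code) of why an unlock of a lock the caller does not own fails: FUTEX_UNLOCK_PI checks that the TID stored in the futex word matches the caller and returns EPERM otherwise. Here the word is 0, so nobody owns it:

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>

int main(void)
{
    int futex_word = 0;   /* no owner TID stored in the futex word */

    long ret = syscall(SYS_futex, &futex_word, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0);
    if (ret < 0)
        printf("FUTEX_UNLOCK_PI: %s\n", strerror(errno));   /* expect EPERM */
    return 0;
}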
Clark: this happens on 32-bit and 64-bit kernels. I'm currently reproducing with the -65 32-bit kernel.

Thomas: I'm trying to generate strace data for a failure. I will upload once I have it. Meanwhile I'm uploading the strace log for a successful run, as I noticed that there was an EPERM error in there too, which seemed odd - but I have no idea from where in the VM execution it occurs, and it did not trigger our fatal error handler, so I assume it's a low-level unlock, not a direct pthread_mutex_unlock.

Luis: As per the mail on the mailing list, the issue is the EFAULT on the original lock more than the (expected) EPERM on the subsequent unlock.
Created attachment 308759 [details] strace output for successful run showing a curious EPERM on FUTEX_UNLOCK_PI
The strace log I just attached is showing the same issue that Luis previously reported.
The EPERM really looks like some bogus unlock attempt. Can you find the corresponding lock operation when you strace with the -f option ?
Thomas,

Running the test under strace -f (instead of -ff) also shows that unlock returning EPERM and no corresponding lock operation.

Two other thoughts: that unlock happens at the very beginning of the test, while the hang seems to happen dozens of seconds after that. So it is not the unbalanced unlock operation that causes the hang. Maybe the other threads are waiting for the lock that was not unlocked (picturing that an unbalanced lock exists along with the unbalanced unlock).
I think the EPERM in the strace log is a red herring. I ran on a 2.6.22 kernel from a different vendor, as well as earlier RH kernels, and that EPERM is always there.

It may also be that the original problem description here is incorrect. The EPERM failure and the EFAULT failure occurred under different conditions. I'm doing further testing to clarify the two scenarios.

One strange thing I did encounter on the new kernel is that strace -f causes the test application to fail due to a strange "Unknown error 513" that seems to originate from the low-level Process.waitFor() code. This doesn't occur on an earlier 2.6.24.4-30 kernel.
Created attachment 308893 [details] strace output for failed run where pthread_mutex_unlock returned EPERM

It seems there was a mis-communication between Roland and myself when I put the bug report together. The EPERM and the EFAULT have not actually been seen at the same time.

In normal mode our code uses pthread_mutex_lock/unlock, and that code encountered the EPERM failure on the unlock, as reported. Examination of the code in question shows that the unlock only occurs after a previous lock.

We also have a "fast mutex" mode in which we replace pthread_mutex use with a small API built directly on top of the futex syscall. In this mode Roland observed an EFAULT from the futex call when trying to perform a lock. Roland hypothesized that if a pthread_mutex_lock encountered the same EFAULT from the futex call, then the EFAULT would be ignored and so the code would think it had locked a mutex that it had not, and hence the later unlock would (correctly) fail with EPERM. But we have not actually observed the EFAULT occurring when using pthread_mutex_lock. The "fast mutex" code can be enabled using -XX:+LinuxUsesFastMutex on the "java" invocation.

My apologies for the confusion this has caused.

I've attached strace output from a failing test case. The interesting part is near the end. Search for:

stat64("/.automount/mackdrive.france/root/export/jrts/work/dh198349/rt-tests/ExecTest$1.class", {st_mode=S_IFREG|0644, st_size=419, ...}) = 0

as this is where the loading of the application class occurs. After this the log shows mainly thread creation and starting - which involves timer_create, sched_setscheduler etc. as well as the actual "clone" system call. The crash occurs while one thread is starting another.

If we look at the futex operations we see a few EAGAIN "errors":

mmap2(NULL, 528384, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xa507b000
futex(0x466e65b8, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x466e65b8, FUTEX_WAKE, 1) = 0
clone(child_stack=0xa50fb4b4, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0xa50fbbd8, {entry_number:6, base_addr:0xa50fbb90, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xa50fbbd8) = 9500

Then the last syscall we see before the VM error handler has called abort is:

futex(0xa530c52c, FUTEX_WAIT, 3, {1152843388, 420991068}) = ? ERESTART_RESTARTBLOCK (To be restarted)
+++ killed by SIGABRT (core dumped) +++
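For context, a hypothetical sketch of what a "fast mutex" built directly on the PI futex syscall could look like - this is not Sun's implementation, only an illustration of the protocol in which an EFAULT from FUTEX_LOCK_PI would surface directly to the caller:

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <errno.h>

static inline long futex_op(int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static int fast_lock(int *futex_word)
{
    int tid = syscall(SYS_gettid);

    /* fast path: 0 -> TID means we own it without entering the kernel */
    if (__sync_bool_compare_and_swap(futex_word, 0, tid))
        return 0;

    /* contended: let the kernel arbitrate; this is where the EFAULT was seen */
    if (futex_op(futex_word, FUTEX_LOCK_PI, 0) < 0)
        return errno;               /* e.g. EFAULT - must not be ignored */
    return 0;
}

static int fast_unlock(int *futex_word)
{
    int tid = syscall(SYS_gettid);

    /* fast path: only valid if the word is exactly our TID (no waiters bit) */
    if (__sync_bool_compare_and_swap(futex_word, tid, 0))
        return 0;

    /* waiters present, or we don't own it: EPERM if the TIDs mismatch */
    if (futex_op(futex_word, FUTEX_UNLOCK_PI, 0) < 0)
        return errno;
    return 0;
}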
> futex(0x466e65b8, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)

I have a hard time seeing how EAGAIN can be returned from futex(FUTEX_WAIT).

> futex(0xa530c52c, FUTEX_WAIT, 3, {1152843388, 420991068}) = ? ERESTART_RESTARTBLOCK (To be restarted)
> +++ killed by SIGABRT (core dumped) +++

Can you read the stack back trace from the core dump ?
The stacktrace only shows the call from the failed mutex-locking wrapper to our fatal error handler (which encounters a secondary error):

Program terminated with signal 6, Aborted.
#0  0xb7f8c410 in __kernel_vsyscall ()
(gdb) where
#0  0xb7f8c410 in __kernel_vsyscall ()
#1  0x465d1ba0 in raise () from /lib/libc.so.6
#2  0x465d34b1 in abort () from /lib/libc.so.6
#3  0xb776658c in os::abort (dump_core=1)
    at /usr/re/jrts2.1/release/kereon/src/os/linux/vm/os_linux.cpp:1995
#4  0xb78b0767 in VMError::report_and_die ()
    at /usr/re/jrts2.1/release/kereon/src/share/vm/utilities/vmError.cpp:698
#5  0xb753c5fb in report_assertion_failure (
    file_name=0xb7a149c0 "/usr/re/jrts2.1/release/kereon/src/os/linux/vm/mutex_linux.cpp",
    line_no=874,
    message=0xb7a154a0 "assert(((1<<1) & self->_internal_state) == 0,\"already locking a PIP mutex\")")
    at /usr/re/jrts2.1/release/kereon/src/share/vm/utilities/debug.cpp:212
#6  0xb7731e5c in os::PIPMutex::lock (this=0x8073320, self=0xa530f3f0,
    interruptible= {_interrupt = 0, _async = 0, _aie = 0})
    at /usr/re/jrts2.1/release/kereon/src/os/linux/vm/mutex_linux.cpp:874
#7  0xb772bfb9 in Mutex::wait_for_lock_implementation ()
    at /usr/re/jrts2.1/release/kereon/src/os/linux/vm/mutex_linux.inline.hpp:34
#8  0xb772b104 in Mutex::lock_without_safepoint_check (this=0x80732a0)
    at /usr/re/jrts2.1/release/kereon/src/share/vm/runtime/mutex.cpp:303
#9  0xb73d1c2b in MutexLockerEx::MutexLockerEx (this=0xa4df4060, mutex=0x80732a0,
    no_safepoint_check=1, spin=0)
    at /usr/re/jrts2.1/release/kereon/src/share/vm/runtime/mutexLocker.hpp:173
#10 0xb7876783 in Threads::print_on_error (st=0xa4df41e0, current=0xa530cba0,
    buf=0xb7f5f900 "SIGTERM", buflen=2000)
    at /usr/re/jrts2.1/release/kereon/src/share/vm/runtime/thread.cpp:4445
#11 0xb78afc9e in VMError::report (this=0xa4df4250, st=0xa4df41e0)
    at /usr/re/jrts2.1/release/kereon/src/share/vm/utilities/vmError.cpp:406
#12 0xb78b04f7 in VMError::report_and_die ()
    at /usr/re/jrts2.1/release/kereon/src/share/vm/utilities/vmError.cpp:644
#13 0xb753c66b in report_fatal (
    file_name=0xb7977360 "/usr/re/jrts2.1/release/kereon/src/os/linux/vm/os_linux.hpp",
    line_no=198, message=0xa4df42b0 "pthread_mutex_unlock: Operation not permitted")
    at /usr/re/jrts2.1/release/kereon/src/share/vm/utilities/debug.cpp:218
#14 0xb753c6d3 in report_fatal_vararg (
    file_name=0xb7977360 "/usr/re/jrts2.1/release/kereon/src/os/linux/vm/os_linux.hpp",
    line_no=198, format=0xb7977ace "pthread_mutex_unlock: %s")
    at /usr/re/jrts2.1/release/kereon/src/share/vm/utilities/debug.cpp:227
#15 0xb7531fc3 in os::Linux::mutex_unlock ()
    at /usr/re/jrts2.1/release/kereon/src/os/linux/vm/os_linux.hpp:198
#16 0xb7531e44 in os::Linux::LLMutex_slow::unlock (this=0xa530c1e8)
    at /usr/re/jrts2.1/release/kereon/src/os/linux/vm/os_linux.hpp:452
#17 0xb7531933 in os::Linux::OSMutex::unlock ()
    at /usr/re/jrts2.1/release/kereon/src/os/linux/vm/os_linux.hpp:573
#18 0xb7732ce4 in os::PIPMutex::lock (this=0x8073250, self=0xa530f3f0,
    interruptible= {_interrupt = 0, _async = 0, _aie = 0})
    at /usr/re/jrts2.1/release/kereon/src/os/linux/vm/mutex_linux.cpp:1289
#19 0xb772bfb9 in Mutex::wait_for_lock_implementation ()
    at /usr/re/jrts2.1/release/kereon/src/os/linux/vm/mutex_linux.inline.hpp:34
#20 0xb772b104 in Mutex::lock_without_safepoint_check (this=0x80731d0)
    at /usr/re/jrts2.1/release/kereon/src/share/vm/runtime/mutex.cpp:303
#21 0xb73d1c2b in MutexLockerEx::MutexLockerEx (this=0xa4df45e0, mutex=0x80731d0,
    no_safepoint_check=1, spin=0)
    at /usr/re/jrts2.1/release/kereon/src/share/vm/runtime/mutexLocker.hpp:173
#22 0xb764600c in JVM_SetThreadPriority (env=0xa530ccf4, jthread=0xa4df4690, prio=5)
    at /usr/re/jrts2.1/release/kereon/src/share/vm/prims/jvm.cpp:3263
#23 0xb2246ea9 in ?? ()
#24 0xa530ccf4 in ?? ()
#25 0xa4df4690 in ?? ()
#26 0x00000005 in ?? ()
#27 0xa4df465c in ?? ()
#28 0x00000000 in ?? ()
> #13 0xb753c66b in report_fatal (file_name=0xb7977360
>     "/usr/re/jrts2.1/release/kereon/src/os/linux/vm/os_linux.hpp",
>     line_no=198, message=0xa4df42b0 "pthread_mutex_unlock: Operation not permitted")

So that's the EPERM again. But where is the EFAULT happening ? I still see no connection at all.

Can you please print out the futex value and the TID of the thread which gets the EPERM in your error handler? That should give us a hint which thread is holding the mutex and which one is trying to release it.
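A rough sketch of the requested instrumentation, assuming the error handler can reach the pthread_mutex_t in question (treating the futex word as the first int of the structure is a glibc-internal layout assumption, for debugging only):

#include <pthread.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

static void dump_mutex_on_eperm(pthread_mutex_t *m)
{
    unsigned int futex_val = *(volatile unsigned int *)m;  /* glibc __data.__lock */
    unsigned int owner_tid = futex_val & 0x3fffffff;       /* FUTEX_TID_MASK */
    int self = (int)syscall(SYS_gettid);

    /* the top bits are FUTEX_WAITERS (0x80000000) and FUTEX_OWNER_DIED (0x40000000) */
    fprintf(stderr,
            "EPERM on unlock: futex word 0x%08x (owner TID %u), caller TID %d\n",
            futex_val, owner_tid, self);
}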
I just read the comments again and I have to admit that I'm pretty confused.

According to the original bug report there is an EFAULT returned from a pthread_mutex_lock which is ignored by glibc, and later on the pthread_mutex_unlock returns EPERM. I have not yet seen any indication of that EFAULT happening; just the EPERM shows up.

Now you have your own fast mutex implementation which seems to have the same EPERM issue.

I'm a bit confused about the failure description. What's the common failure mode of glibc and your implementation? What are the differences?

Thanks,
	tglx
Created attachment 308982 [details] debug patch to get more info about the EPERM issue

Luis, can you please apply the attached patch to the -rt kernel and report the kernel output for the EPERM case including information about the threads (TIDs) involved in that test.

Thanks,
	tglx
Patchin' and Compilin' :)
Created attachment 308997 [details] Dmesg output for b4482614 using Glibc futex API
Created attachment 309012 [details] Dmesg output for b4482614 using Glibc FastFutexes API
Thomas,

As per my update yesterday there is in fact no observed connection between the EPERM and the EFAULT - they are two different failure modes:

- The EPERM shows up only with pthread_mutex_unlock.
- The EFAULT has only been seen on a futex lock using our own "fast mutex" that uses futex directly.

It was hypothesized that if an EFAULT also occurred inside a pthread_mutex_lock then glibc would ignore that error and act as if the lock had been successful. That would then explain why the subsequent unlock gave EPERM. But this has not actually been observed.

Again my apologies for the confusion here.

David
No problem. The hypothesis of the EFAULT was the misleading part. If the EFAULT happened we'd see it in the strace outputs.

Luis ran my debug patch and what we see is:

futex_unlock_pi: uval 0x00000000, uad 00000000ffef72a8 TID 3194

That means something is trying to unlock a futex which is not locked, or the pointer is simply wrong. The address is not used anywhere else, but we do not see the lock operation as it is in the fast path which does not enter the kernel. It might be worthwhile for debugging to disable the user space fast path so we can see all futex operations in strace.

What worries me more is the lookup_pi_state back trace in Luis's dmesg outputs:

WARNING: at kernel/futex.c:547 lookup_pi_state()
Pid: 14109, comm: java Not tainted 2.6.24.7-65_dbg #1

This looks like memory corruption or some completely confused state. I'll come up with a debug patch for that as well.

Thanks,
	tglx
Created attachment 309042 [details] add debug output for lookup_pi_state

Luis, can you please apply that on top of the previous one and provide the output ?

Thanks,
	tglx
Created attachment 309095 [details] Dmesg output for b4482614 using Glibc futex API

Thomas, I have applied futex-debug2.patch and this is the result of the first test.
Created attachment 309124 [details] Dmesg output for b4482614 using FastFutexes API

Interestingly enough, the test failed but didn't hit the new debug code.
Thanks Luis,

lookup_pi: pid mismatch uval 0x800066f6 pid 26358 ctid 26365 keyw 0x83be000 keyp ffff81037d44ba00 keyo 3990 pio ffff81027e1e2140 piop 26363

The TID stored in the user space futex is 26358, but the pi state is owned by TID 26363.

/me twists brain :)
What's the condition for the test to be considered failed?
I'm unclear whom the previous question is directed at. :)

Our test program in the current context should only "fail" if it encounters the EPERM error and so terminates the VM. The failure mode for that test was that the test hung rather than completing. It could hang under two conditions:

a) due to some incorrect logic in Process.waitFor, or
b) because the exec'ed program produced too much output, resulting in the pipe filling and blocking the process.

As the main thread was waiting for the process to finish and the process was blocked on a pipe that nothing was reading from, the test hung.
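For what it's worth, condition (b) can be shown in plain C (the Java test goes through Process.waitFor(), but the underlying deadlock is the same); this is only an illustration of the mechanism, not the test itself:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int pipefd[2];
    pipe(pipefd);

    pid_t pid = fork();
    if (pid == 0) {                         /* child: exec something verbose */
        dup2(pipefd[1], STDOUT_FILENO);
        close(pipefd[0]);
        close(pipefd[1]);
        execlp("sh", "sh", "-c", "yes | head -c 1000000", (char *)NULL);
        _exit(127);
    }

    close(pipefd[1]);
    /* BUG: nothing reads pipefd[0]; once the pipe (~64 KB) fills, the child
       side blocks in write() and never exits, so this waitpid() hangs forever */
    waitpid(pid, NULL, 0);
    return 0;
}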
The EPERM which shows with each run of the test program:

futex_unlock_pi: uval 0x00000000, uad 00000000ffef72a8 TID 3194

looks like memory corruption, which is either caused by a memset or a call to pthread_mutex_init() while the futex is held by a task which tries to release it later.

David, is there a chance that you can instrument this in your test code / java vm ?
Can you clarify exactly what you would like instrumented, please? I will try to run with such code in the morning.

The mutex in question is initialized when the VM is started and lasts as long as the VM does.

I'm suspicious of the fact we only see this with this test that involves fork/exec. Is there a possibility of some connection there?

David
> Can you clarify exactly what you would like instrumented please. I will try to
> run with such code in the morning.

Instrumentation of the pthread_mutex related calls.

> The mutex in question is initialized when the VM is started and lasts as long as
> the VM does.

Ah, so you know which particular mutex that is ?

> I'm suspicious of the fact we only see this with this test that involves
> fork/exec. Is there a possibility of some connection there?

Hmm, I think this one happens way before fork/exec happen.
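One possible shape for that instrumentation, sketched with hypothetical helper names: log every lock/unlock on the suspect mutex with the calling TID, the mutex address and the return code.

#include <pthread.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

static int traced_lock(pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock(m);
    fprintf(stderr, "[tid %ld] lock   %p = %d\n",
            (long)syscall(SYS_gettid), (void *)m, rc);
    return rc;
}

static int traced_unlock(pthread_mutex_t *m)
{
    int rc = pthread_mutex_unlock(m);
    fprintf(stderr, "[tid %ld] unlock %p = %d\n",
            (long)syscall(SYS_gettid), (void *)m, rc);
    return rc;
}

/* Given the memset/pthread_mutex_init suspicion above, any init or memset
   hitting the same mutex address would be worth logging the same way. */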
The EPERM which happens once per start is harmless.

glibc is built against 2.6.9 kernel headers, so __ASSUME_FUTEX_LOCK_PI is not set, as this was introduced in 2.6.18. pthread_mutex_init therefore does:

    case PTHREAD_PRIO_INHERIT << PTHREAD_MUTEXATTR_PROTOCOL_SHIFT:
#ifndef __ASSUME_FUTEX_LOCK_PI
      if (__builtin_expect (tpi_supported == 0, 0))
	{
	  int lock = 0;
	  INTERNAL_SYSCALL_DECL (err);
	  int ret = INTERNAL_SYSCALL (futex, err, 4, &lock, FUTEX_UNLOCK_PI, 0, 0);
	  assert (INTERNAL_SYSCALL_ERROR_P (ret, err));
	  tpi_supported = INTERNAL_SYSCALL_ERRNO (ret, err) == ENOSYS ? -1 : 1;
	}
      if (__builtin_expect (tpi_supported < 0, 0))
	return ENOTSUP;
#endif
      break;

which checks whether PI support is available in the kernel or not. The lock value is 0, so the EPERM is not a surprise.
The mutex always seems to be what we call the Safepoint_mutex, and it is used to protect various operations on threads. I'm trying to glean more information from the VM but for some reason today the bug refuses to reproduce on either of the machines I've been testing on.
The EFAULT return is really happening due to a missing fault fixup in the return path of futex_lock_pi, which later triggers the WARN_ON in lookup_pi_state. The fix for this problem is in mainline and in the latest -rt release.

David, Roland, thanks for the analysis and the reproducer.
Thomas,

I'm unclear whether this EFAULT relates to the EPERM we are seeing in pthread_mutex_unlock, or to the EFAULT we saw in the code that used futex directly?

Thanks for the clarification.

David
David,

Those are related. What happens is:

  T1, T2 are blocked on the futex.
  T3 releases the futex and T1 is the assigned pending owner.
  T2 acquires the rtmutex.
  A fork happens and maps the anon memory read-only for COW.
  Either T2 or T1 wants to fix up the user space variable and faults.

That leaves the internal state inconsistent. EFAULT is returned to user space, but user space has no way to fix that up. T2 thinks it got the futex and unlocks it, which leads to EPERM due to the mismatch of the user space variable.

The patch fixes the fault issue, which is the root cause of all of those problems.
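A rough user-space stress sketch of that window, under the assumption that hammering a contended PRIO_INHERIT mutex while another thread forks repeatedly raises the chance of hitting the COW write-protect phase during the kernel's futex-word fixup; it is not a deterministic reproducer:

#define _GNU_SOURCE
#include <pthread.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t pi_mutex;

static void *contender(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&pi_mutex);
        if (pthread_mutex_unlock(&pi_mutex) == EPERM)
            abort();                     /* would match the reported failure */
    }
    return NULL;
}

static void *forker(void *arg)
{
    (void)arg;
    for (;;) {
        pid_t pid = fork();              /* fork write-protects anon pages for COW */
        if (pid == 0)
            _exit(0);
        if (pid > 0)
            waitpid(pid, NULL, 0);
    }
    return NULL;
}

int main(void)
{
    pthread_mutexattr_t attr;
    pthread_t t[4];
    int i;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    pthread_mutex_init(&pi_mutex, &attr);

    pthread_create(&t[0], NULL, forker, NULL);
    for (i = 1; i < 4; i++)
        pthread_create(&t[i], NULL, contender, NULL);
    pause();
    return 0;
}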
Okay - but why didn't the EFAULT show up in the strace output ?
Because I focussed on the first EPERM which is ok due to glibc probing the availability of PI futex support in the kernel.
Created attachment 310758 [details] gdb stacktraces for all threads in hung execution of ExecTest application

I tested the new -69 kernel today to see if this bug had been resolved. I did not reproduce it in 15 runs of the ExecTest application but the 16th execution hung. I've attached the gdb thread dump. There seem to be a number of threads stuck in fork-related malloc/free calls.
Hmm, it would have been interesting to see where the threads are stuck in the kernel. If you can reproduce it, please do

	echo t > /proc/sysrq-trigger

and provide the kernel output.
I did not reproduce this in another 200 runs, but I still have the hung process active. I tried the suggested command, and after a long delay nothing happened. I had set /proc/sys/kernel/sysrq to 1 first. Does this command send its output somewhere other than the current terminal?
Created attachment 310767 [details] kernel trace of hung java process

Discovered the output via dmesg. Attached is the info for the Java process and the perl process it was execing (shown as defunct by ps). Thanks.
Have you had a chance to use any kernel newer than -67? I believe Clark has made -69 available for your testing. These kernels have Thomas' fixes for these futex issues.
As per comment #43 we tested -69. The EPERM issue seems to have gone, but we did run into the hang discussed above.
Thanks, I missed that comment.
I cannot reproduce the hang now that we have the -72 kernel. I think this bug can be closed.
closing...
Need to reopen and set state to Modified.
Created attachment 313695 [details] testcase .jar file and wrapper scripts for reproducing bug

Unpack after installing the RT JVM, adjust the scripts run_the_java_thing.sh and do_the_java_thing.sh to point to the installation directory for the JVM, and then run the scripts.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0585.html