Bug 443963

Summary: Java application hang with fastgup patches
Product: Red Hat Enterprise MRG
Reporter: IBM Bug Proxy <bugproxy>
Component: realtime-kernel
Assignee: Peter Zijlstra <pzijlstr>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: low
Version: beta
CC: bhu, lwang, mschmidt, srostedt, williams
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: All
Whiteboard:
Fixed In Version: 2.6.24.7-62.el5rt
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-06-02 22:33:33 UTC
Type: ---
Attachments:
  gdb debug log of the jxeinajar hang (flags: none)
  Tapset for futex_wait/futex_wake (flags: none)

Description IBM Bug Proxy 2008-04-24 12:40:41 UTC
=Comment: #0=================================================
Dinakar Guniguntala <Dinakar.G.com> - 2008-04-23 15:23 EDT
Problem description:
jxeinajar hangs just before application exit when running with the latest MRG
kernel. Without fastgup, jxeinajar runs to completion.
The following patches had to be removed from the 2.6.24.4-46.el5rt MRG kernel
for the application to work correctly:

#mm-introduce-fast_gup.patch
#futex-gup.patch
#futex-fast_gup.patch
#fast_gup_patchlet.patch

Provide output from "uname -a", if possible:

Linux rt-oak.austin.ibm.com 2.6.24.4-46ibmrt2.2 #1 SMP PREEMPT RT Tue Apr 22
18:45:43 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

2.6.24.4-46ibmrt2.2 is exactly the same as the MRG 2.6.24.4-46.el5rt kernel.

Hardware Environment
LS21 blade

Is this reproducible?
    If so, how long does it (did it) take to reproduce it?
    Describe the steps:

Run release-testing on the above mentioned kernel

Is the system (not just the application) hung?
    If so, describe how you determined this:

System is up and running. Only the test hangs.
=Comment: #1=================================================
Dinakar Guniguntala <Dinakar.G.com> - 2008-04-23 15:25 EDT

gdb debug log of the jxeinajar hang

We have 4 threads stuck in pthread_cond_wait, all on different condition
variables. Thread 1 is currently executing an exitJavaVM and is probably
waiting for all other threads to finish, but it looks like somebody missed a
cond_signal, and so we have an application hang.

Comment 1 IBM Bug Proxy 2008-04-24 12:40:43 UTC
Created attachment 303619 [details]
gdb debug log of the jxeinajar hang

Comment 2 IBM Bug Proxy 2008-04-28 18:24:38 UTC
------- Comment From Dinakar.G.com 2008-04-28 14:20 EDT-------
There are a couple of places in futex.c where, on error, we return without
calling put_futex_key (futex_wake_op and futex_requeue). However, changing
that did not help, and now I am not so sure we need to call put_futex_key
there anymore.
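
For illustration, a rough sketch of the error-path pattern being described, in
the style of the 2.6.24-era futex code. The labels and cleanup calls are
assumptions for this sketch, not the actual -rt source:

/* Schematic only: the real futex_wake_op()/futex_requeue() differ in detail. */
	ret = get_futex_key(uaddr1, &key1);
	if (unlikely(ret != 0))
		goto out;               /* nothing pinned yet, plain return is fine */

	ret = get_futex_key(uaddr2, &key2);
	if (unlikely(ret != 0))
		goto out_put_key1;      /* the change tried here: release key1 first */

	/* ... perform the wake/requeue operation ... */

	put_futex_key(&key2);
out_put_key1:
	put_futex_key(&key1);
out:
	return ret;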

Comment 3 IBM Bug Proxy 2008-04-29 08:32:44 UTC
------- Comment From sripathi.com 2008-04-29 04:28 EDT-------
1) I removed the requeue code from the equation by returning ENOSYS from
futex_requeue. This did not fix the problem.

2) I added a check for an error return from get_user_pages() in
get_futex_key(). The check never fired, so get_user_pages() appears to be
succeeding.

Comment 4 IBM Bug Proxy 2008-04-29 13:40:55 UTC
------- Comment From sripathi.com 2008-04-29 09:35 EDT-------
I looked at some strace outputs from hanging runs. (The test sometimes doesn't
hang while running under strace.)

I looked at the futex calls made by all threads in the process. Only the main
thread (tid 5176) seems to be interesting. Below is a snippet from strace:

<snip>
5193  futex(0x82b9720, FUTEX_UNLOCK_PI, 1257074676) = 0
5185  futex(0x805f034, FUTEX_WAIT, 1933, {0, 9948249} <unfinished ...>
5176  futex(0x80f24c8, FUTEX_WAIT, 1, NULL <unfinished ...>   <==== Last call by 5176
5193  futex(0x80f24c8, FUTEX_WAKE_OP, 1, 1, 0x80f24c4, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_EQ, 0}) = 0  <=== 5193 is doing WAKE_OP on the same futex.
5185  <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
5185  futex(0x805f060, FUTEX_UNLOCK_PI, 1257074676) = 0
<snip>

FUTEX_WAKE_OP returns the number of threads actually woken up. All other
FUTEX_WAKE_OP calls in my strace outputs returned 1, whereas this particular
one returned 0. So it looks like the code does not handle the case where
WAKE_OP returns 0 correctly. I will look at both the kernel and glibc code to
understand what might be going wrong.
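
For reference, a minimal standalone sketch of the FUTEX_WAKE_OP call seen in
the strace above, to make the argument encoding and the meaning of the return
value explicit. This is a hypothetical demo program, not code from the JVM or
glibc:

#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	uint32_t cond_futex = 1;   /* stands in for 0x80f24c8 in the trace */
	uint32_t lock = 0;         /* stands in for 0x80f24c4 in the trace */

	/* Atomically set 'lock' to 0, wake up to 1 waiter on 'cond_futex', and,
	 * if the old value of 'lock' was 0, also wake up to 1 waiter on 'lock'.
	 * The return value is the total number of waiters actually woken. */
	long woken = syscall(SYS_futex, &cond_futex, FUTEX_WAKE_OP, 1, 1,
			     &lock, FUTEX_OP(FUTEX_OP_SET, 0, FUTEX_OP_CMP_EQ, 0));

	/* With no waiters queued this prints 0; the suspicious call in the
	 * trace returns 0 even though thread 5176 is queued on the futex. */
	printf("FUTEX_WAKE_OP woke %ld waiter(s)\n", woken);
	return 0;
}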

Comment 5 IBM Bug Proxy 2008-04-30 06:40:46 UTC
------- Comment From sripathi.com 2008-04-30 02:33 EDT-------
I wanted to verify whether FUTEX_WAKE_OP is part of the problem.
pthread_cond_signal() first calls FUTEX_WAKE_OP and only falls back to
FUTEX_WAKE under some conditions. I recompiled 32-bit glibc with a modified
pthread_cond_signal() that calls FUTEX_WAKE directly instead of FUTEX_WAKE_OP.
With this glibc too, I recreated the problem. A snippet of the strace that
shows the problem:

10398 futex(0x80f1438, FUTEX_WAIT, 81, NULL <unfinished ...>
10390 futex(0x80f24b8, FUTEX_WAIT, 1, NULL <unfinished ...>  <== Main thread. Never wakes up after this.
10399 <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
10399 futex(0x805f050, FUTEX_UNLOCK_PI, 4160131060) = 0
10399 futex(0x805f024, FUTEX_WAIT, 2211, {0, 9965948} <unfinished ...>
10407 futex(0x80f24b8, FUTEX_WAKE, 1)   = 0                  <=== Wake called on the futex, but no thread is woken up.
10399 <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
10399 futex(0x805f050, FUTEX_UNLOCK_PI, 4160131060) = 0

From the above, it appears that even though the main thread was waiting on the
futex when the FUTEX_WAKE call was made, it did not wake up. Considering this,
I think we need to concentrate on why the main thread does not wake up.
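
To make the expected behaviour concrete, here is a minimal, hypothetical
FUTEX_WAIT/FUTEX_WAKE pair (not JVM or glibc code). On a correctly behaving
kernel the wake returns 1 here, because exactly one waiter is queued on the
futex word; the hang above corresponds to the same call returning 0 despite a
queued waiter:

#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static uint32_t futex_word = 1;

static long sys_futex(uint32_t *uaddr, int op, uint32_t val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void *waiter(void *arg)
{
	/* Blocks as long as futex_word still contains the expected value 1. */
	sys_futex(&futex_word, FUTEX_WAIT, 1);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, waiter, NULL);
	sleep(1);                          /* crude: give the waiter time to queue */

	futex_word = 0;                    /* change the value, then wake */
	long woken = sys_futex(&futex_word, FUTEX_WAKE, 1);
	printf("FUTEX_WAKE woke %ld waiter(s)\n", woken);   /* expected: 1 */

	pthread_join(t, NULL);
	return 0;
}

(Build with gcc -pthread; the sleep() is only a crude way to make sure the
waiter is queued in the kernel before the wake is issued.)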

Comment 6 IBM Bug Proxy 2008-04-30 08:24:33 UTC
------- Comment From sripathi.com 2008-04-30 04:18 EDT-------
As suggested by PeterZ, I am going to try this test on a setup that has private
futexes, but no fastgup.

Comment 7 IBM Bug Proxy 2008-04-30 11:32:33 UTC
------- Comment From sripathi.com 2008-04-30 07:26 EDT-------
To get a system with private futexes but no fastgup, I installed a kernel
without fastgup on Fedora8. With this, the testcase still hangs. So I can
conclude that the fastgup patches by themselves are not the cause of the
problem, but they may make it easier to hit.

I then installed this kernel without fastgup on our R2 system, and the problem
was not seen there. The R2 system is based on RHEL5.1, so its glibc does not
have private futexes.

Summarizing:

R2 system (based on RHEL5.1, no support for private futexes in glibc)
=====================================================================
2.6.24.4-46.el5rt             Problem seen
2.6.24.4-47 without fastgup   Problem not seen

Fedora8 system (support for private futexes in glibc)
===================================================================
2.6.24.4-46.el5rt             Problem seen
2.6.24.4-47 without fastgup   Problem seen

Based on the above results, it is not possible to put the entire blame on
either private futexes or fastgup alone.

Comment 8 Peter Zijlstra 2008-05-01 15:13:58 UTC
Actually it would point to an upstream problem as the fast_gup() futex patches
make the shared futexes have pretty much the same locking as private futexes
have upstream.

That is, upstream shared futexes take the mmap_sem (and will thus for threaded
programs have a big serializing lock) whereas private futexes will not. The
whole purpose of the futex fast_gup() work was to take out that mmap_sem usage.

I'll review the condition variable code's FUTEX_WAIT/FUTEX_WAKE{,_OP}
operations for race conditions.
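
For context, a rough sketch of the locking difference described here, assuming
the 2.6.24-era futex code structure. The helper name get_futex_key_fast is
illustrative only, not the exact patched source:

/* Upstream 2.6.24: a shared futex resolves its key while holding the
 * process's mmap_sem, so every shared futex operation in a threaded
 * program serializes on that semaphore. */
down_read(&current->mm->mmap_sem);
ret = get_futex_key(uaddr, &key);        /* may call get_user_pages() */
/* ... hash the key, queue or wake waiters ... */
up_read(&current->mm->mmap_sem);

/* With the futex fast_gup() patches: the page backing the futex is pinned
 * without taking mmap_sem, so shared futexes follow essentially the same
 * lock-free key lookup that private futexes already use upstream. */
ret = get_futex_key_fast(uaddr, &key);   /* illustrative name; uses fast_gup() */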

Comment 9 IBM Bug Proxy 2008-05-12 08:40:45 UTC
------- Comment From sripathi.com 2008-05-12 04:35 EDT-------
I have a good idea about this problem now. I used strace and some printks in
futex_wait/wake to understand what is going on. To keep FUTEX_WAKE_OP out of the
picture, I am running a modified glibc in which pthread_cond_signal skips
calling FUTEX_WAKE_OP and defaults to calling FUTEX_WAKE every time.

When the hang occurs, I see the following threads:

tid   pid   ppid
5093  5093  5092  /home/rtuser/linux-rt-tests/internal/java/jvm/ibm-ws-rt-i386-1.0-3.0/bin/jxeinajar -J-Xr
5096  5093  5092  /home/rtuser/linux-rt-tests/internal/java/jvm/ibm-ws-rt-i386-1.0-3.0/bin/jxeinajar -J-Xr
5101  5093  5092  /home/rtuser/linux-rt-tests/internal/java/jvm/ibm-ws-rt-i386-1.0-3.0/bin/jxeinajar -J-Xr
5102  5093  5092  /home/rtuser/linux-rt-tests/internal/java/jvm/ibm-ws-rt-i386-1.0-3.0/bin/jxeinajar -J-Xr
5103  5093  5092  /home/rtuser/linux-rt-tests/internal/java/jvm/ibm-ws-rt-i386-1.0-3.0/bin/jxeinajar -J-Xr
5104  5093  5092  /home/rtuser/linux-rt-tests/internal/java/jvm/ibm-ws-rt-i386-1.0-3.0/bin/jxeinajar -J-Xr
5105  5093  5092  /home/rtuser/linux-rt-tests/internal/java/jvm/ibm-ws-rt-i386-1.0-3.0/bin/jxeinajar -J-Xr
5106  5093  5092  /home/rtuser/linux-rt-tests/internal/java/jvm/ibm-ws-rt-i386-1.0-3.0/bin/jxeinajar -J-Xr

Snippets of strace output:

5093  <... futex resumed> )             = 0
5109  futex(0xf229dd80, FUTEX_WAKE, 1)  = 0
5093  futex(0x80f24d0, FUTEX_WAIT, 1, NULL <unfinished ...>  <== Last FUTEX_WAIT by 5093
5109  futex(0x80f24d0, FUTEX_WAKE, 1)   = 0 <== Wakeup on that futex by 5109
5102  <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
5102  futex(0x805f068, FUTEX_UNLOCK_PI, 4159623156) = 0
5102  futex(0x805f03c, FUTEX_WAIT, 271, {0, 9959177}) = -1 ETIMEDOUT (Connection timed out)
5102  futex(0x805f068, FUTEX_UNLOCK_PI, 4159623156) = 0

5093 is the main thread which is sleeping endlessly in the kernel, having done
FUTEX_WAIT. Later, thread 5109 does FUTEX_WAKE on that very futex (0x80f24d0).
However, 5093 does not wake up. This very likely leads to the hang that we are
seeing.

Now let's look at my printks from futex_wait and futex_wake in the kernel. I
have these printks in futex_wait() and futex_wake(), right after
get_futex_key() is called.

wait: pid = 5093, uaddr = 00000000080f24d0, key.shared.page = ffffe200077724c0, key.both.offset = 1233
wake: pid = 5109, uaddr = 00000000080f24d0, key.shared.page = ffffe200077981a0, key.both.offset = 1233
wake: pid 5109 did not wake up anyone on uaddr 00000000080f24d0

In line 1, we see 5093 entering futex_wait. In line 2, we see 5109 going into
futex_wake, but line 3 says it did not wake up anyone. I have printed the
shared.page and both.offset values from the futex_key used. Although the user
address is the same, the shared.page values are different. Because of this
anomaly, 5109 does not find a matching futex_q (match_futex) when it searches
through the list, and hence it does not wake up 5093.

I have verified a number of times that whenever a hang occurs, the above
condition is true. The 'page' address being different in the two calls to
get_user_pages/fast_gup seems to be the root cause of the problem.

Peter, any ideas?
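
A rough sketch of the kind of instrumentation described above, assuming the
patched futex_key layout with a shared.page pointer as shown in the output.
Placement, the 'woken' counter, and field names are illustrative, not the
exact 2.6.24-rt source:

/* In futex_wait(), right after get_futex_key() succeeds: */
printk(KERN_DEBUG "wait: pid = %d, uaddr = %p, key.shared.page = %p, key.both.offset = %d\n",
       current->pid, uaddr, key.shared.page, key.both.offset);

/* In futex_wake(), the same print, plus a note when the hash-bucket walk
 * finds no matching futex_q ('woken' is an illustrative counter): */
printk(KERN_DEBUG "wake: pid = %d, uaddr = %p, key.shared.page = %p, key.both.offset = %d\n",
       current->pid, uaddr, key.shared.page, key.both.offset);
if (!woken)
	printk(KERN_DEBUG "wake: pid %d did not wake up anyone on uaddr %p\n",
	       current->pid, uaddr);

/* match_futex() requires the keys to be identical, so two different
 * shared.page values for the same uaddr mean the waker never matches the
 * waiter's queue entry and wakes nobody. */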

Comment 10 IBM Bug Proxy 2008-05-12 21:00:37 UTC
------- Comment From dvhltc.com 2008-05-12 16:59 EDT-------
Sripathi, can you try this instrumentation on a kernel without fast_gup and a
glibc with private futexes? Peter would like to know if the symptom is the
same (different shared page value). If so, it could be that the mmap_sem was
serializing some race on this value, which removing it has now exposed.

Comment 11 Peter Zijlstra 2008-05-20 15:20:50 UTC
I might have an idea, but there have been stability issues with the fast_gup
stuff itself, so the whole queue will be dropped for now. I will work on
getting it back in and also address the futex issue.

Comment 12 Clark Williams 2008-05-22 20:52:38 UTC
fast_gup dropped for now; severity dropped to high

Comment 13 Clark Williams 2008-06-02 22:33:33 UTC
Any GUP-related patches will be applied to a new source base and will not
apply to the GA kernel, so they will not be relevant here.

Closing.

Comment 14 IBM Bug Proxy 2008-06-03 06:00:41 UTC
------- Comment From sripathi.com 2008-06-03 01:57 EDT-------
Closing on our side as well.

Comment 15 IBM Bug Proxy 2008-11-07 09:06:20 UTC
Created attachment 322831 [details]
Tapset for futex_wait/futex_wake