Bug 443963: Java application hang with fastgup patches

| Field | Value |
|---|---|
| Product | Red Hat Enterprise MRG |
| Component | realtime-kernel |
| Version | beta |
| Hardware | x86_64 |
| OS | All |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | low |
| Reporter | IBM Bug Proxy <bugproxy> |
| Assignee | Peter Zijlstra <pzijlstr> |
| QA Contact | --- |
| CC | bhu, lwang, mschmidt, srostedt, williams |
| Fixed In Version | 2.6.24.7-62.el5rt |
| Doc Type | Bug Fix |
| Last Closed | 2008-06-02 22:33:33 UTC |
Description (IBM Bug Proxy, 2008-04-24 12:40:41 UTC)

Created attachment 303619: gdb debug log of the jxeinajar hang
------- Comment From Dinakar.G.com 2008-04-28 14:20 EDT -------

There are a couple of places in futex.c where, on error, we return without doing a put_futex_key (futex_wake_op and futex_requeue). However, changing that did not help, and now I am not so sure we need to do a put_futex_key there anymore.

------- Comment From sripathi.com 2008-04-29 04:28 EDT -------

1) I removed the requeue code from the equation by returning ENOSYS from futex_requeue. This did not fix the problem.

2) I put a check for an error condition from get_user_pages() in get_futex_key(). I have not hit this check, so get_user_pages() seems to be succeeding.

------- Comment From sripathi.com 2008-04-29 09:35 EDT -------

I looked at some strace outputs from hanging runs. (The test sometimes doesn't hang while running under strace.) I looked at the futex calls made by all threads in the process. Only the main thread (tid 5176) seems to be interesting. Below is a snippet from strace:

```
<snip>
5193 futex(0x82b9720, FUTEX_UNLOCK_PI, 1257074676) = 0
5185 futex(0x805f034, FUTEX_WAIT, 1933, {0, 9948249} <unfinished ...>
5176 futex(0x80f24c8, FUTEX_WAIT, 1, NULL <unfinished ...>          <== Last call by 5176
5193 futex(0x80f24c8, FUTEX_WAKE_OP, 1, 1, 0x80f24c4, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_EQ, 0}) = 0   <== 5193 is doing WAKE_OP on the same futex
5185 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
5185 futex(0x805f060, FUTEX_UNLOCK_PI, 1257074676) = 0
<snip>
```

FUTEX_WAKE_OP returns the number of threads actually woken up. All other FUTEX_WAKE_OP calls in my strace outputs have returned 1, whereas this particular one has returned 0. So it looks like the code doesn't properly handle the case where WAKE_OP returns 0. I will look at both the kernel and glibc code to understand what might be going wrong.

------- Comment From sripathi.com 2008-04-30 02:33 EDT -------

I wanted to verify whether FUTEX_WAKE_OP is part of the problem. pthread_cond_signal() first calls FUTEX_WAKE_OP and only calls FUTEX_WAKE under some conditions. I recompiled 32-bit glibc with a modified pthread_cond_signal() which, instead of calling FUTEX_WAKE_OP, directly calls FUTEX_WAKE. With this glibc too, I recreated the problem. Snippet of strace that shows the problem:

```
10398 futex(0x80f1438, FUTEX_WAIT, 81, NULL <unfinished ...>
10390 futex(0x80f24b8, FUTEX_WAIT, 1, NULL <unfinished ...>    <== Main thread. Never wakes up after this.
10399 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
10399 futex(0x805f050, FUTEX_UNLOCK_PI, 4160131060) = 0
10399 futex(0x805f024, FUTEX_WAIT, 2211, {0, 9965948} <unfinished ...>
10407 futex(0x80f24b8, FUTEX_WAKE, 1) = 0                      <== Wake called on the futex, but no thread is woken up.
10399 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
10399 futex(0x805f050, FUTEX_UNLOCK_PI, 4160131060) = 0
```

From the above, it appears that even though the main thread was waiting on the futex when the FUTEX_WAKE call was made, it does not wake up. Considering this, I think we need to concentrate on why the main thread doesn't wake up.
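For reference, this is the raw protocol behind the strace lines above: a thread blocks in FUTEX_WAIT as long as the futex word still holds an expected value, and a peer changes the word and calls FUTEX_WAKE. The following is a minimal, self-contained sketch of that handshake; it is illustrative only and not code from this report (glibc has no futex() wrapper, so it goes through syscall(2)):

```c
/* Minimal sketch of the FUTEX_WAIT/FUTEX_WAKE pairing visible in the
 * strace snippets above; illustrative only, not code from this report.
 * Build with: gcc -pthread futex-demo.c */
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <pthread.h>
#include <stdio.h>

static int futex_word = 0;          /* the 32-bit word both threads share */

static long futex(int *uaddr, int op, int val)
{
    /* glibc has no wrapper; timeout, uaddr2 and val3 are unused here */
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void *waker(void *arg)
{
    (void)arg;
    sleep(1);                                   /* let main block first */
    __atomic_store_n(&futex_word, 1, __ATOMIC_SEQ_CST);
    /* FUTEX_WAKE returns the number of waiters woken; 1 is expected.
     * The bug discussed above is precisely this returning 0 even
     * though a waiter is queued on the same user address. */
    printf("FUTEX_WAKE woke %ld waiter(s)\n",
           futex(&futex_word, FUTEX_WAKE, 1));
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, waker, NULL);
    /* FUTEX_WAIT blocks only while the word still equals the expected
     * value (0); the kernel rechecks this atomically under its own
     * locking, which is what normally prevents lost wakeups. */
    while (__atomic_load_n(&futex_word, __ATOMIC_SEQ_CST) == 0)
        futex(&futex_word, FUTEX_WAIT, 0);
    pthread_join(t, NULL);
    puts("main thread woken");
    return 0;
}
```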
------- Comment From sripathi.com 2008-04-30 04:18 EDT -------

As suggested by PeterZ, I am going to try this test on a setup that has private futexes but no fastgup.

------- Comment From sripathi.com 2008-04-30 07:26 EDT -------

To get a system with private futexes but no fastgup, I installed a kernel without fastgup on Fedora 8. With this, the testcase still hangs. So I can conclude that the fastgup patches by themselves are not the cause of the problem, but they may be contributing to making it occur more easily.

I then installed this kernel without fastgup on our R2 system and saw that the problem does not appear. The R2 system is based on RHEL 5.1, so its glibc does not have private futexes. Summarizing:

```
R2 system (based on RHEL5.1, no support for private futexes in glibc)
=====================================================================
2.6.24.4-46.el5rt               Problem seen
2.6.24.4-47 without fastgup     Problem not seen

Fedora8 system (private futexes support in glibc)
=================================================
2.6.24.4-46.el5rt               Problem seen
2.6.24.4-47 without fastgup     Problem seen
```

Based on the above results, it is not possible to put the entire blame on either private futexes or fastgup.

Actually, it would point to an upstream problem, as the fast_gup() futex patches make the shared futexes have pretty much the same locking as private futexes have upstream. That is, upstream shared futexes take the mmap_sem (and will thus, for threaded programs, act as a big serializing lock) whereas private futexes will not. The whole purpose of the futex fast_gup() work was to take out that mmap_sem usage. I'll review the condition variable code's FUTEX_WAIT/FUTEX_WAKE{,_OP} operations for race conditions.

------- Comment From sripathi.com 2008-05-12 04:35 EDT -------

I have a good idea about this problem now. I used strace and some printks in futex_wait/futex_wake to understand what is going on. To keep FUTEX_WAKE_OP out of the picture, I am running a modified glibc in which pthread_cond_signal skips calling FUTEX_WAKE_OP and defaults to calling FUTEX_WAKE every time. When the hang occurs, I see the following threads:

```
 tid   pid  ppid
5093  5093  5092  /home/rtuser/linux-rt-tests/internal/java/jvm/ibm-ws-rt-i386-1.0-3.0/bin/jxeinajar -J-Xr
5096  5093  5092  /home/rtuser/linux-rt-tests/internal/java/jvm/ibm-ws-rt-i386-1.0-3.0/bin/jxeinajar -J-Xr
5101  5093  5092  /home/rtuser/linux-rt-tests/internal/java/jvm/ibm-ws-rt-i386-1.0-3.0/bin/jxeinajar -J-Xr
5102  5093  5092  /home/rtuser/linux-rt-tests/internal/java/jvm/ibm-ws-rt-i386-1.0-3.0/bin/jxeinajar -J-Xr
5103  5093  5092  /home/rtuser/linux-rt-tests/internal/java/jvm/ibm-ws-rt-i386-1.0-3.0/bin/jxeinajar -J-Xr
5104  5093  5092  /home/rtuser/linux-rt-tests/internal/java/jvm/ibm-ws-rt-i386-1.0-3.0/bin/jxeinajar -J-Xr
5105  5093  5092  /home/rtuser/linux-rt-tests/internal/java/jvm/ibm-ws-rt-i386-1.0-3.0/bin/jxeinajar -J-Xr
5106  5093  5092  /home/rtuser/linux-rt-tests/internal/java/jvm/ibm-ws-rt-i386-1.0-3.0/bin/jxeinajar -J-Xr
```

Snippets of strace output:

```
5093 <... futex resumed> ) = 0
5109 futex(0xf229dd80, FUTEX_WAKE, 1) = 0
5093 futex(0x80f24d0, FUTEX_WAIT, 1, NULL <unfinished ...>   <== Last FUTEX_WAIT by 5093
5109 futex(0x80f24d0, FUTEX_WAKE, 1) = 0                     <== Wakeup on that futex by 5109
5102 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
5102 futex(0x805f068, FUTEX_UNLOCK_PI, 4159623156) = 0
5102 futex(0x805f03c, FUTEX_WAIT, 271, {0, 9959177}) = -1 ETIMEDOUT (Connection timed out)
5102 futex(0x805f068, FUTEX_UNLOCK_PI, 4159623156) = 0
```

5093 is the main thread, which is sleeping endlessly in the kernel, having done FUTEX_WAIT. Later, thread 5109 does FUTEX_WAKE on that very futex (0x80f24d0); however, 5093 does not wake up. This very likely leads to the hang that we are seeing. Now let's look at my printks from futex_wait and futex_wake in the kernel. I have these printks in futex_wait() and futex_wake(), after get_futex_key() is called.
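A sketch of what that instrumentation could look like, reconstructed from the output shown below; the exact variable names (q.key, key, ret) and the key.shared.page field, which is specific to the fast_gup-patched futex_key, are assumptions rather than the actual patch:

```c
/* Reconstruction of the debug printks described above, not the actual
 * patch. Assumes the fast_gup-era futex_key layout, where shared.page
 * holds the struct page pointer that get_futex_key() resolved for a
 * shared futex. Each printk sits just after get_futex_key() succeeds. */

/* in futex_wait(), whose key lives in its on-stack futex_q: */
printk(KERN_DEBUG "wait: pid = %d, uaddr = %p, "
       "key.shared.page = %p, key.both.offset = %d\n",
       current->pid, uaddr, q.key.shared.page, q.key.both.offset);

/* in futex_wake(), which uses a local futex_key: */
printk(KERN_DEBUG "wake: pid = %d, uaddr = %p, "
       "key.shared.page = %p, key.both.offset = %d\n",
       current->pid, uaddr, key.shared.page, key.both.offset);

/* and after the hash-bucket walk, when no waiter matched: */
if (ret == 0)
    printk(KERN_DEBUG "wake: pid %d did not wake up anyone on uaddr %p\n",
           current->pid, uaddr);
```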
During a hang, the printks show:

```
wait: pid = 5093, uaddr = 00000000080f24d0, key.shared.page = ffffe200077724c0, key.both.offset = 1233
wake: pid = 5109, uaddr = 00000000080f24d0, key.shared.page = ffffe200077981a0, key.both.offset = 1233
wake: pid 5109 did not wake up anyone on uaddr 00000000080f24d0
```

In line 1, we see 5093 getting into futex_wait. In line 2, we see 5109 going into wake, but line 3 says it did not wake up anyone. I have printed the shared.page and both.offset values from the futex_key used. Though the user address is the same, the shared.page values are different. Because of this anomaly, 5109 does not find a matching futex_q (match_futex) when it searches through the list, and hence it won't wake up 5093. I have verified a number of times that whenever a hang occurs, the above condition is true. The 'page' address being different in the two calls to get_user_pages/fast_gup seems to be the root cause of the problem. Peter, any ideas?
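For context on why differing page values are fatal: a waker only finds a sleeping waiter if their futex_keys compare equal. Below is a simplified model of the kernel's match_futex() and the key layout, inferred from the printks above rather than copied from the source:

```c
/* Simplified model of futex key matching; modeled on the kernel code of
 * that era, not copied from it. struct page is left opaque here. */
struct page;

union futex_key {
    struct {
        struct page *page;     /* fast_gup variant: page backing the futex */
        int offset;            /* location of the 32-bit word in that page */
    } shared;
    struct {
        void *ptr;             /* type-erased view used for comparison */
        int offset;
    } both;
};

/* A FUTEX_WAKE only wakes a FUTEX_WAIT whose key is field-for-field equal. */
static inline int match_futex(const union futex_key *k1,
                              const union futex_key *k2)
{
    return k1->both.ptr == k2->both.ptr &&
           k1->both.offset == k2->both.offset;
}
```

In the output above, both keys carry offset 1233 but different page pointers (ffffe200077724c0 vs ffffe200077981a0), so this comparison fails, the waker walks past the sleeping futex_q in the hash bucket, and FUTEX_WAKE returns 0: a lost wakeup, and hence the hang.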
------- Comment From dvhltc.com 2008-05-12 16:59 EDT -------

Sripathi, can you try this instrumentation on a kernel without fast_gup and a glibc with private futexes? Peter would like to know if the symptom is the same (a different shared page value). If so, then it could be that removing the mmap_sem was serializing some race on this value.

I might have an idea, but there have been stability issues wrt the fast_gup stuff itself, so the whole queue will be dropped for now; will work on getting it back in and also address the futex issue.

fast_gup dropped for now; severity dropped to high.

Any GUP-related patches will be applied to a new source base and will not apply to the GA kernel, so they will not be relevant. Closing.

------- Comment From sripathi.com 2008-06-03 01:57 EDT -------

Closing on our side as well.

Created attachment 322831: Tapset for futex_wait/futex_wake