Bug 750419
Summary: | Hang/livelock in pthread_create | |
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | Todd Lipcon <todd>
Component: | kernel | Assignee: | Aristeu Rozanski <arozansk>
Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe>
Severity: | medium | Docs Contact: |
Priority: | unspecified | |
Version: | 6.0 | CC: | eli, mfranc
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2012-02-09 20:52:01 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Todd Lipcon
2011-11-01 03:23:45 UTC
Another odd datapoint is that in gdb, the 'stepi' command just hangs. If I then hit ^C or send SIGCONT, it breaks back into gdb at the exact same instruction.

What are the register contents and the memory map?

Can you let me know which gdb commands to run for you to get this info? I haven't done much assembly-level debugging in gdb. For the memory map, you just want /proc/<pid>/maps?

info registers
info proc mappings

(gdb) bt
#0 0x000000341a606ea0 in pthread_create@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#1 0x00007f5fbb9d1759 in os::create_thread(Thread*, os::ThreadType, unsigned long) () from /usr/java/jdk1.6.0_21/jre/lib/amd64/server/libjvm.so
#2 0x00007f5fbbacce0e in JavaThread::JavaThread(void (*)(JavaThread*, Thread*), unsigned long) () from /usr/java/jdk1.6.0_21/jre/lib/amd64/server/libjvm.so
#3 0x00007f5fbbad09eb in CompilerThread::CompilerThread(CompileQueue*, CompilerCounters*) () from /usr/java/jdk1.6.0_21/jre/lib/amd64/server/libjvm.so
#4 0x00007f5fbb69868f in CompileBroker::make_compiler_thread(char const*, CompileQueue*, CompilerCounters*, Thread*) () from /usr/java/jdk1.6.0_21/jre/lib/amd64/server/libjvm.so
#5 0x00007f5fbb698889 in CompileBroker::init_compiler_threads(int) () from /usr/java/jdk1.6.0_21/jre/lib/amd64/server/libjvm.so
#6 0x00007f5fbb69806e in CompileBroker::compilation_init() () from /usr/java/jdk1.6.0_21/jre/lib/amd64/server/libjvm.so
#7 0x00007f5fbbad15c4 in Threads::create_vm(JavaVMInitArgs*, bool*) () from /usr/java/jdk1.6.0_21/jre/lib/amd64/server/libjvm.so
#8 0x00007f5fbb7fefd0 in JNI_CreateJavaVM () from /usr/java/jdk1.6.0_21/jre/lib/amd64/server/libjvm.so
#9 0x00000000400035f8 in InitializeJVM ()
#10 0x000000004000206e in JavaMain ()
#11 0x000000341a6077e1 in start_thread () from /lib64/libpthread.so.0
#12 0x000000341a2e18ed in clone () from /lib64/libc.so.6

(gdb) info registers
rax 0x0 0
rbx 0x7f5fb8700700 140049092970240
rcx 0x341a2de3e7 223777514471
rdx 0x0 0
rsi 0x1000 4096
rdi 0x7f5fb8600000 140049091919872
rbp 0x7f5fbb3dd920 0x7f5fbb3dd920
rsp 0x7f5fbb3dd880 0x7f5fbb3dd880
r8 0x0 0
r9 0x0 0
r10 0x1000 4096
r11 0x246 582
r12 0x100000 1048576
r13 0x7f5fb87009c0 140049092970944
r14 0x4 4
r15 0x7 7
rip 0x341a606ea0 0x341a606ea0 <pthread_create@@GLIBC_2.2.5+1680>
eflags 0x10246 [ PF ZF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0

(gdb) info proc mappings
process 26942
cmdline = '/usr/java/jdk1.6.0_21/jre/bin/java'
cwd = '/data/4/todd/scratch/tt/local/taskTracker/todd/jobcache/job_201111041159_0002/attempt_201111041159_0002_m_006103_0/work'
exe = '/usr/java/jdk1.6.0_21/jre/bin/java'
Mapped address spaces:
Start Addr  End Addr  Size  Offset  objfile
0x40000000 0x40009000 0x9000 0 /usr/java/jdk1.6.0_21/jre/bin/java
0x40108000 0x4010a000 0x2000 0x8000 /usr/java/jdk1.6.0_21/jre/bin/java
0x40223000 0x40244000 0x21000 0 [heap]
0x3419a00000 0x3419a1e000 0x1e000 0 /lib64/ld-2.12.so
0x3419c1e000 0x3419c1f000 0x1000 0x1e000 /lib64/ld-2.12.so
0x3419c1f000 0x3419c20000 0x1000 0x1f000 /lib64/ld-2.12.so
0x3419c20000 0x3419c21000 0x1000 0
0x3419e00000 0x3419e02000 0x2000 0 /lib64/libdl-2.12.so
0x3419e02000 0x341a002000 0x200000 0x2000 /lib64/libdl-2.12.so
0x341a002000 0x341a003000 0x1000 0x2000 /lib64/libdl-2.12.so
0x341a003000 0x341a004000 0x1000 0x3000 /lib64/libdl-2.12.so
0x341a200000 0x341a375000 0x175000 0 /lib64/libc-2.12.so
0x341a375000 0x341a575000 0x200000 0x175000 /lib64/libc-2.12.so
0x341a575000 0x341a579000 0x4000 0x175000 /lib64/libc-2.12.so
0x341a579000 0x341a57a000 0x1000 0x179000 /lib64/libc-2.12.so
0x341a57a000 0x341a57f000 0x5000 0
0x341a600000 0x341a617000 0x17000 0 /lib64/libpthread-2.12.so
0x341a617000 0x341a817000 0x200000 0x17000 /lib64/libpthread-2.12.so
0x341a817000 0x341a818000 0x1000 0x17000 /lib64/libpthread-2.12.so
0x341a818000 0x341a819000 0x1000 0x18000 /lib64/libpthread-2.12.so
0x341a819000 0x341a81d000 0x4000 0
0x341aa00000 0x341aa83000 0x83000 0 /lib64/libm-2.12.so
0x341aa83000 0x341ac82000 0x1ff000 0x83000 /lib64/libm-2.12.so
0x341ac82000 0x341ac83000 0x1000 0x82000 /lib64/libm-2.12.so
0x341ac83000 0x341ac84000 0x1000 0x83000 /lib64/libm-2.12.so
0x341b200000 0x341b216000 0x16000 0 /lib64/libnsl-2.12.so
0x341b216000 0x341b415000 0x1ff000 0x16000 /lib64/libnsl-2.12.so
0x341b415000 0x341b416000 0x1000 0x15000 /lib64/libnsl-2.12.so
0x341b416000 0x341b417000 0x1000 0x16000 /lib64/libnsl-2.12.so
0x341b417000 0x341b419000 0x2000 0
0x37f5c00000 0x37f5c07000 0x7000 0 /lib64/librt-2.12.so
0x37f5c07000 0x37f5e06000 0x1ff000 0x7000 /lib64/librt-2.12.so
0x37f5e06000 0x37f5e07000 0x1000 0x6000 /lib64/librt-2.12.so
0x37f5e07000 0x37f5e08000 0x1000 0x7000 /lib64/librt-2.12.so
0x7f5f4a16f000 0x7f5f50000000 0x5e91000 0 /usr/lib/locale/locale-archive
0x7f5f50000000 0x7f5f50021000 0x21000 0
0x7f5f50021000 0x7f5f54000000 0x3fdf000 0
0x7f5f54000000 0x7f5f54021000 0x21000 0
0x7f5f54021000 0x7f5f58000000 0x3fdf000 0
0x7f5f58000000 0x7f5f58021000 0x21000 0
0x7f5f58021000 0x7f5f5c000000 0x3fdf000 0
0x7f5f5c000000 0x7f5f5c021000 0x21000 0
0x7f5f5c021000 0x7f5f60000000 0x3fdf000 0
0x7f5f60000000 0x7f5f60021000 0x21000 0
0x7f5f60021000 0x7f5f64000000 0x3fdf000 0
0x7f5f64000000 0x7f5f64021000 0x21000 0
0x7f5f64021000 0x7f5f68000000 0x3fdf000 0
0x7f5f68000000 0x7f5f68021000 0x21000 0
0x7f5f68021000 0x7f5f6c000000 0x3fdf000 0
0x7f5f6d400000 0x7f5f6e8c0000 0x14c0000 0
0x7f5f6e8c0000 0x7f5f72800000 0x3f40000 0
0x7f5f72800000 0x7f5f91ec0000 0x1f6c0000 0
0x7f5f91ec0000 0x7f5f9c2b0000 0xa3f0000 0
0x7f5f9c2b0000 0x7f5fabe10000 0xfb60000 0
0x7f5fabe10000 0x7f5fb1000000 0x51f0000 0
0x7f5fb1000000 0x7f5fb1270000 0x270000 0
0x7f5fb1270000 0x7f5fb40a2000 0x2e32000 0
0x7f5fb40a2000 0x7f5fb8000000 0x3f5e000 0
0x7f5fb8600000 0x7f5fb8601000 0x1000 0
0x7f5fb8601000 0x7f5fb8701000 0x100000 0
0x7f5fb8701000 0x7f5fb8704000 0x3000 0
0x7f5fb8704000 0x7f5fb8802000 0xfe000 0
0x7f5fb8802000 0x7f5fb8805000 0x3000 0
0x7f5fb8805000 0x7f5fb8903000 0xfe000 0
0x7f5fb8903000 0x7f5fb8906000 0x3000 0
0x7f5fb8906000 0x7f5fb8a04000 0xfe000 0
0x7f5fb8a04000 0x7f5fb8a05000 0x1000 0
0x7f5fb8a05000 0x7f5fb9947000 0xf42000 0
0x7f5fb9947000 0x7f5fb9ade000 0x197000 0x3014000 /usr/java/jdk1.6.0_21/jre/lib/rt.jar
0x7f5fb9ade000 0x7f5fb9b06000 0x28000 0
0x7f5fb9b06000 0x7f5fb9b07000 0x1000 0
0x7f5fb9b07000 0x7f5fb9c07000 0x100000 0
0x7f5fb9c07000 0x7f5fb9c08000 0x1000 0
0x7f5fb9c08000 0x7f5fb9d08000 0x100000 0
0x7f5fb9d08000 0x7f5fb9d09000 0x1000 0
0x7f5fb9d09000 0x7f5fb9e09000 0x100000 0
0x7f5fb9e09000 0x7f5fb9e0a000 0x1000 0
0x7f5fb9e0a000 0x7f5fb9f0a000 0x100000 0
0x7f5fb9f0a000 0x7f5fb9f0b000 0x1000 0
0x7f5fb9f0b000 0x7f5fba00b000 0x100000 0
0x7f5fba00b000 0x7f5fba00c000 0x1000 0
0x7f5fba00c000 0x7f5fba10c000 0x100000 0
0x7f5fba10c000 0x7f5fba10d000 0x1000 0
0x7f5fba10d000 0x7f5fba20d000 0x100000 0
0x7f5fba20d000 0x7f5fba20e000 0x1000 0
0x7f5fba20e000 0x7f5fba30e000 0x100000 0
0x7f5fba30e000 0x7f5fba30f000 0x1000 0
0x7f5fba30f000 0x7f5fba40f000 0x100000 0
0x7f5fba40f000 0x7f5fba410000 0x1000 0
0x7f5fba410000 0x7f5fba510000 0x100000 0
0x7f5fba510000 0x7f5fba511000 0x1000 0
0x7f5fba511000 0x7f5fba611000 0x100000 0
0x7f5fba611000 0x7f5fba612000 0x1000 0
0x7f5fba612000 0x7f5fba712000 0x100000 0
0x7f5fba712000 0x7f5fba713000 0x1000 0
0x7f5fba713000 0x7f5fba81e000 0x10b000 0
0x7f5fba81e000 0x7f5fba83d000 0x1f000 0
0x7f5fba83d000 0x7f5fba939000 0xfc000 0
0x7f5fba939000 0x7f5fba98b000 0x52000 0
0x7f5fba98b000 0x7f5fba996000 0xb000 0
0x7f5fba996000 0x7f5fba9b5000 0x1f000 0
0x7f5fba9b5000 0x7f5fbaab1000 0xfc000 0
0x7f5fbaab1000 0x7f5fbab02000 0x51000 0
0x7f5fbab02000 0x7f5fbab81000 0x7f000 0
0x7f5fbab81000 0x7f5fbaba9000 0x28000 0
0x7f5fbaba9000 0x7f5fbabb4000 0xb000 0
0x7f5fbabb4000 0x7f5fbac6a000 0xb6000 0
0x7f5fbac6a000 0x7f5fbac78000 0xe000 0 /usr/java/jdk1.6.0_21/jre/lib/amd64/libzip.so
0x7f5fbac78000 0x7f5fbad7a000 0x102000 0xe000 /usr/java/jdk1.6.0_21/jre/lib/amd64/libzip.so
0x7f5fbad7a000 0x7f5fbad7d000 0x3000 0x10000 /usr/java/jdk1.6.0_21/jre/lib/amd64/libzip.so
0x7f5fbad7d000 0x7f5fbad7e000 0x1000 0
0x7f5fbad7e000 0x7f5fbad8a000 0xc000 0 /lib64/libnss_files-2.12.so
0x7f5fbad8a000 0x7f5fbaf89000 0x1ff000 0xc000 /lib64/libnss_files-2.12.so
0x7f5fbaf89000 0x7f5fbaf8a000 0x1000 0xb000 /lib64/libnss_files-2.12.so
0x7f5fbaf8a000 0x7f5fbaf8b000 0x1000 0xc000 /lib64/libnss_files-2.12.so
0x7f5fbaf95000 0x7f5fbaf9c000 0x7000 0 /usr/java/jdk1.6.0_21/jre/lib/amd64/native_threads/libhpi.so
0x7f5fbaf9c000 0x7f5fbb09d000 0x101000 0x7000 /usr/java/jdk1.6.0_21/jre/lib/amd64/native_threads/libhpi.so
0x7f5fbb09d000 0x7f5fbb09f000 0x2000 0x8000 /usr/java/jdk1.6.0_21/jre/lib/amd64/native_threads/libhpi.so
0x7f5fbb09f000 0x7f5fbb0a0000 0x1000 0
0x7f5fbb0a0000 0x7f5fbb0c8000 0x28000 0 /usr/java/jdk1.6.0_21/jre/lib/amd64/libjava.so
0x7f5fbb0c8000 0x7f5fbb1c8000 0x100000 0x28000 /usr/java/jdk1.6.0_21/jre/lib/amd64/libjava.so
0x7f5fbb1c8000 0x7f5fbb1cf000 0x7000 0x28000 /usr/java/jdk1.6.0_21/jre/lib/amd64/libjava.so
0x7f5fbb1cf000 0x7f5fbb1dc000 0xd000 0 /usr/java/jdk1.6.0_21/jre/lib/amd64/libverify.so
0x7f5fbb1dc000 0x7f5fbb2db000 0xff000 0xd000 /usr/java/jdk1.6.0_21/jre/lib/amd64/libverify.so
0x7f5fbb2db000 0x7f5fbb2de000 0x3000 0xc000 /usr/java/jdk1.6.0_21/jre/lib/amd64/libverify.so
0x7f5fbb2de000 0x7f5fbb2e1000 0x3000 0
0x7f5fbb2e1000 0x7f5fbb3df000 0xfe000 0
0x7f5fbb3df000 0x7f5fbbbb9000 0x7da000 0 /usr/java/jdk1.6.0_21/jre/lib/amd64/server/libjvm.so
0x7f5fbbbb9000 0x7f5fbbcbb000 0x102000 0x7da000 /usr/java/jdk1.6.0_21/jre/lib/amd64/server/libjvm.so
0x7f5fbbcbb000 0x7f5fbbe4c000 0x191000 0x7dc000 /usr/java/jdk1.6.0_21/jre/lib/amd64/server/libjvm.so
0x7f5fbbe4c000 0x7f5fbbe8a000 0x3e000 0
0x7f5fbbe8a000 0x7f5fbbe91000 0x7000 0 /usr/java/jdk1.6.0_21/jre/lib/amd64/jli/libjli.so
0x7f5fbbe91000 0x7f5fbbf92000 0x101000 0x7000 /usr/java/jdk1.6.0_21/jre/lib/amd64/jli/libjli.so
0x7f5fbbf92000 0x7f5fbbf94000 0x2000 0x8000 /usr/java/jdk1.6.0_21/jre/lib/amd64/jli/libjli.so
0x7f5fbbf94000 0x7f5fbbf95000 0x1000 0
0x7f5fbbf95000 0x7f5fbbf9d000 0x8000 0 /tmp/hsperfdata_todd/26939
0x7f5fbbf9d000 0x7f5fbbf9e000 0x1000 0
0x7f5fbbf9e000 0x7f5fbbf9f000 0x1000 0
0x7f5fbbf9f000 0x7f5fbbfa0000 0x1000 0
0x7fff90992000 0x7fff909a8000 0x16000 0 [stack]
0x7fff909ff000 0x7fff90a00000 0x1000 0 [vdso]
0xffffffffff600000 0xffffffffff601000 0x1000 0 [vsyscall]

strace?

Looking for an strace after the process is hung, or before it hangs? Given the low percentage of startups during which it hangs, it would be very difficult for me to get the latter, but I can get the former for you if it's useful.

Anything is useful.

Since RHEL 6.2 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

After it's hung, strace on the hung thread doesn't give anything:
[todd@p0117 ~]$ strace -p 26942
Process 26942 attached - interrupt to quit
<hangs>

Let me see if I can write a simpler reproducer. We are also going to try to reproduce this on a different JVM version to see if it has something to do with the specific combo of Java 1.6.0u21 with RHEL 6.0.

Created attachment 533599 [details]
strace log of hung process
Here's strace output from a hung JVM launch, this time with 64-bit JDK 6u23. The spinning thread was pid 18207.
Created attachment 533602 [details]
another strace
Another strace from a different node just in case it's helpful: 6661 is the hung process here.
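For readers following the gdb output earlier in this report, a small Python sketch of how the register values can be cross-referenced against the `info proc mappings` rows. The helper names (`Mapping`, `parse_mappings`, `find_mapping`) are made up for illustration and are not part of gdb; the rows and register values are copied from the dump in this bug, and only a handful of mappings are included.

```python
from typing import List, NamedTuple, Optional


class Mapping(NamedTuple):
    start: int
    end: int      # exclusive upper bound
    objfile: str  # empty string for anonymous mappings


def parse_mappings(text: str) -> List[Mapping]:
    """Parse 'Start End Size Offset [objfile]' rows as printed by gdb."""
    mappings = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith("0x"):
            continue  # skip blank lines and the column header
        objfile = fields[4] if len(fields) > 4 else ""
        mappings.append(Mapping(int(fields[0], 16), int(fields[1], 16), objfile))
    return mappings


def find_mapping(mappings: List[Mapping], addr: int) -> Optional[Mapping]:
    """Return the mapping that contains addr, or None if it is unmapped."""
    for m in mappings:
        if m.start <= addr < m.end:
            return m
    return None


# A few rows copied from the dump in this report (not the full map).
MAPPINGS_EXCERPT = """
0x341a200000 0x341a375000 0x175000 0 /lib64/libc-2.12.so
0x341a600000 0x341a617000 0x17000 0 /lib64/libpthread-2.12.so
0x7f5fb8600000 0x7f5fb8601000 0x1000 0
0x7f5fb8601000 0x7f5fb8701000 0x100000 0
0x7f5fbb2e1000 0x7f5fbb3df000 0xfe000 0
"""

# Register values copied from the 'info registers' output in this report.
REGISTERS = {
    "rip": 0x341A606EA0,
    "rcx": 0x341A2DE3E7,
    "rdi": 0x7F5FB8600000,
    "rbx": 0x7F5FB8700700,
    "rsp": 0x7F5FBB3DD880,
}

maps = parse_mappings(MAPPINGS_EXCERPT)
for name, value in REGISTERS.items():
    hit = find_mapping(maps, value)
    where = "unmapped in excerpt" if hit is None else (hit.objfile or "anonymous")
    print(f"{name} = {value:#x}: {where}")
```

With the excerpt above, rip resolves to libpthread and rcx to libc, while rdi, rbx and rsp fall in anonymous mappings. rdi and rsi also happen to equal the start address and size of the one-page anonymous mapping at 0x7f5fb8600000; that is only an observation from the dump, not a diagnosis.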
Is there any way you can get us a method for reproducing this failure? I've got several pthread/mutex related fixes that are queued up for RHEL. It'd be good to be able to test them against this problem.

Unfortunately not - I tried to write a standalone reproducer but couldn't get one to work. We can reproduce it easily, though, so if you have a candidate glibc RPM we can probably deploy it and give it a shot on a large QA cluster.

Todd, I just realized you reported this against a Red Hat Enterprise Linux 6.0 glibc. It's probably worth testing if some of the fixes, particularly those which went into Red Hat Enterprise Linux 6.2, help. It's also possible this is a kernel issue of some kind. Hard to really know right now. We'd kind of be just shooting in the dark hoping to find something. The backtrace & straces don't really show any significant commonality. I'm assuming you've got the right spinning PID, though the straces would tend to indicate the PID should be sleeping in the kernel, not spinning. The backtrace doesn't make a lot of sense for a process that is spinning; the backtrace looks more like the process is sitting in the kernel. I guess perhaps it could be spinning in the kernel. I'll note that we don't necessarily need a standalone reproducer, though it obviously helps. Even a reproducer on top of the JVM would be a step forward. Even if it just triggers 1 in 100 times, I can reserve a machine in our farm and just have it keep restarting the JVM until it hangs.

Definitely got the right pid - I saw this on a bunch of machines, so unless I'm systematically typoing, it's right ;-) I'm trying to put together a reproducer tarball which would include Hadoop but could run on one machine. But before I send it I want to make sure I can still repro on one of our boxes here that originally exhibited the problem. We've since upgraded to 2.6.32-71.14.1.el6.x86_64 so if it's a kernel bug it may be that we can't reproduce it anymore.

Sounds good. Anxiously waiting...
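The "keep restarting the JVM until it hangs" idea mentioned above could be scripted roughly as follows. This is a minimal sketch, not the reproducer used in this bug: the function name `run_until_hang` and the timeout heuristic are assumptions, and it only makes sense for a command that normally exits on its own within the timeout.

```python
import subprocess
from typing import List, Optional


def run_until_hang(cmd: List[str], timeout_s: float,
                   max_attempts: int) -> Optional[subprocess.Popen]:
    """Launch cmd repeatedly; return the first process that outlives timeout_s.

    The suspect process is deliberately left running so it can be inspected
    afterwards (for example with gdb or strace). Returns None if every
    attempt exited within the timeout.
    """
    for attempt in range(1, max_attempts + 1):
        proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        try:
            proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # Assumption: "still running after timeout_s" is treated as a hang.
            print(f"attempt {attempt}: pid {proc.pid} still running "
                  f"after {timeout_s}s")
            return proc
    return None
```

A call might look like `run_until_hang(["java", "-version"], timeout_s=60, max_attempts=1000)`, where the command line and numbers are placeholders. A timeout cannot tell a hang from a slow start, so any process it returns still needs to be checked by hand.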
Well, in the process of putting together the repro for you, I actually determined this is in fact a kernel bug. After running the repro overnight on 2.6.32-71.el6 I had about 8 or so processes in the hung spinning state. When I tried again on 2.6.32-71.14.1.el6, the bug didn't reproduce. So, I suppose this can be resolved as fixed somewhere between those two kernel versions. Let me know if you need anything else from me, thanks for the help. (I'll also leave the repro job running on the new kernel for a few more days just to be entirely sure)

I did a quick scan over the kernel changes in that range to see if any stood out as obvious candidates. In 2.6.32-71.11.1.el6 there's a rework of posix-cpu-timers related to mt exec. The rest of the changes don't appear to be likely candidates. I'm going to go ahead and close; if your reproducer trips, we can reopen and dive further in. I'm going to use ERRATA as the resolution since it appears this was fixed in a kernel errata. Thanks for your time, Jeff

Yep, I thought the same thing when I skimmed the changelog yesterday. Will keep you posted, thanks!

Several days later, still no hung tasks; the kernel upgrade definitely fixed it.
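To check whether a given machine's kernel falls in the range discussed above (hang reproduced on 2.6.32-71.el6, not reproduced on 2.6.32-71.14.1.el6), something like the following Python sketch could be used. The helper names are illustrative, and the comparison is a naive numeric one over the leading dotted fields; it is an assumption that this is good enough for these particular strings, and it is not RPM's own version comparison.

```python
from typing import Tuple

LAST_KNOWN_BAD = "2.6.32-71.el6"        # hang reproduced here (from this report)
FIRST_KNOWN_GOOD = "2.6.32-71.14.1.el6"  # hang did not reproduce here


def kernel_key(release: str) -> Tuple[int, ...]:
    """Turn e.g. '2.6.32-71.14.1.el6.x86_64' into (2, 6, 32, 71, 14, 1)."""
    version, _, rest = release.partition("-")
    nums = [int(part) for part in version.split(".")]
    for part in rest.split("."):
        if not part.isdigit():
            break  # stop at the dist tag, e.g. 'el6'
        nums.append(int(part))
    return tuple(nums)


def fix_window_contains(release: str) -> bool:
    """True if release is after the known-bad kernel, up to the known-good one."""
    key = kernel_key(release)
    return kernel_key(LAST_KNOWN_BAD) < key <= kernel_key(FIRST_KNOWN_GOOD)


def at_least_known_good(release: str) -> bool:
    """True if release is the known-good kernel or newer (naive comparison)."""
    return kernel_key(release) >= kernel_key(FIRST_KNOWN_GOOD)
```

For the running kernel, `platform.release()` from the standard library returns a string in this format on Linux. Kernels older than 2.6.32-71.el6 were not tested in this report, so the sketch says nothing about them.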