| Summary: | JVM exits and ESA tool stops abnormally while accessing ESA web UI for 2-5 minutes (using IBM Java 7 SR5) (glibc) | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | IBM Bug Proxy <bugproxy> | ||||
| Component: | glibc | Assignee: | Carlos O'Donell <codonell> | ||||
| Status: | CLOSED NOTABUG | QA Contact: | qe-baseos-tools-bugs | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 6.5 | CC: | ashankar, fweimer, hannsj_uhl, pfrankli, spoyarek | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | ppc64 | ||||||
| OS: | All | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2013-11-20 09:02:55 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
Description
IBM Bug Proxy
2013-11-20 08:53:05 UTC
---Problem Description---
Electronic Service Agent (ESA) is a tool shipped as part of IBMIT. It is a web application that runs under Java 7. The JVM crashes and generates a core dump after the ESA web UI has been accessed for 2-5 minutes. The issue does not exist on RHEL 6.3 and 6.4 and occurs consistently on RHEL 6.5.

---uname output---
Linux kofi07.austin.ibm.com 2.6.32-421.el6.ppc64 #1 SMP Mon Sep 30 12:06:35 EDT 2013 ppc64 ppc64 ppc64 GNU/Linux

To analyze the core, I installed the corresponding glibc debuginfo packages. The frame in question shows:

(gdb) frame 13
#13 __nptl_deallocate_tsd () at pthread_create.c:154
154 __pthread_keys[idx].destr (data);

This line invokes a user-provided destructor because the stored value was not NULL. The address of the destructor to invoke is the one supplied in an earlier call to pthread_key_create().

(gdb) info registers
r0 0x1 1
r1 0xfffa4afe6e0 17590654068448
r2 0x8075b890f0 551730843888
r3 0xfff3042f4d0 17588700771536
r4 0x0 0
r5 0x0 0
r6 0xfffa4afea70 17590654069360
r7 0xfffa4afb638 17590654055992
r8 0xfffa4afe880 17590654068864
r9 0x8075e6a050 551733862480
r10 0xfff30000030 17588696383536
r11 0x1 1
r12 0x44404424 1145062436
r13 0xfffa4b06910 17590654101776
r14 0x5c36d8 6043352
r15 0x3d5400 4019200
r16 0xfffa4ac0000 17590653812736
r17 0x4 4
r18 0x7 7
r19 0xfff6868f110 17589642785040
r20 0xfff6868ec90 17589642783888
r21 0xfffa7aef7b0 17590704338864
r22 0xfffa4aff910 17590654073104
r23 0x40000 262144
r24 0x8075b85bb0 551730830256
r25 0x0 0
r26 0x0 0
r27 0x0 0
r28 0x0 0
r29 0xfffa4aff310 17590654071568
---Type <return> to continue, or q <return> to quit---
r30 0x8075b81be0 551730813920
r31 0x50 80
pc 0x8075b5a660 0x8075b5a660 <__nptl_deallocate_tsd+192>
msr 0x800000000000d032 9223372036854829106
cr 0x0 0
lr 0x8075b5a680 0x8075b5a680 <__nptl_deallocate_tsd+224>
ctr 0x80759e6250 551729128016
xer 0x0 0
orig_r3 0xfffa002cfd0 17590575615952
trap 0x300 768

(gdb) x/20i 0x8075b5a650
   0x8075b5a650 <__nptl_deallocate_tsd+176>: bne cr7,0x8075b5a620 <__nptl_deallocate_tsd+128>
   0x8075b5a654 <__nptl_deallocate_tsd+180>: ld r9,8(r30)
   0x8075b5a658 <__nptl_deallocate_tsd+184>: cmpdi cr7,r9,0
   0x8075b5a65c <__nptl_deallocate_tsd+188>: beq cr7,0x8075b5a620 <__nptl_deallocate_tsd+128>
=> 0x8075b5a660 <__nptl_deallocate_tsd+192>: ld r0,0(r9)
   0x8075b5a664 <__nptl_deallocate_tsd+196>: std r2,40(r1)
   0x8075b5a668 <__nptl_deallocate_tsd+200>: addi r31,r31,16
   0x8075b5a66c <__nptl_deallocate_tsd+204>: addi r30,r30,16
   0x8075b5a670 <__nptl_deallocate_tsd+208>: mtctr r0
   0x8075b5a674 <__nptl_deallocate_tsd+212>: ld r11,16(r9)
   0x8075b5a678 <__nptl_deallocate_tsd+216>: ld r2,8(r9)
   0x8075b5a67c <__nptl_deallocate_tsd+220>: bctrl
   0x8075b5a680 <__nptl_deallocate_tsd+224>: ld r2,40(r1)
   0x8075b5a684 <__nptl_deallocate_tsd+228>: cmpdi cr7,r31,512
   0x8075b5a688 <__nptl_deallocate_tsd+232>: bne cr7,0x8075b5a630 <__nptl_deallocate_tsd+144>
   0x8075b5a68c <__nptl_deallocate_tsd+236>: nop
   0x8075b5a690 <__nptl_deallocate_tsd+240>: mr r11,r13
   0x8075b5a694 <__nptl_deallocate_tsd+244>: addi r9,r13,-30480
   0x8075b5a698 <__nptl_deallocate_tsd+248>: addi r27,r27,1
   0x8075b5a69c <__nptl_deallocate_tsd+252>: cmpdi cr7,r27,32

(gdb) x/20x 0x8075e6a050
0x8075e6a050: Cannot access memory at address 0x8075e6a050

The address is not NULL, but it is not valid either. It is the value held in r9, and the faulting instruction at pc is ld r0,0(r9).

Thanks. The new core shows the identical failure to the previous one, with the same 0x8075e6a050 address as before.

I am going to try to recreate this with a breakpoint set on pthread_key_create to see whether anyone registers this particular destructor address.

My theory is that some part of the code loads a library, much like a plugin, which registers two destructors, and that the library is then unloaded, leaving the destructor addresses in the __pthread_keys table. When __nptl_deallocate_tsd() is finally invoked, it crashes on the user-supplied destructor because the library the destructor belonged to is no longer mapped in the address space.
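The mechanism the theory relies on, a destructor registered through pthread_key_create() being called when a thread exits with a non-NULL value stored under the key, can be illustrated with a minimal C sketch. This is not code from ESA, glib, or glibc. The names demo_key, demo_destructor, worker, and count_destructor_calls_after_thread_exit are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
static pthread_key_t demo_key;
static int destructor_calls = 0;

/* Called by the threading library at thread exit for each key whose
 * value in the exiting thread is non-NULL. */
static void demo_destructor(void *value)
{
    (void)value;
    destructor_calls++;
}

static void *worker(void *arg)
{
    /* Storing a non-NULL value is what arms the destructor for this thread. */
    int rc = pthread_setspecific(demo_key, arg);
    assert(rc == 0);
    return NULL;
}

/* Returns how many times the destructor ran for one exiting thread. */
int count_destructor_calls_after_thread_exit(void)
{
    static int payload = 42;
    pthread_t tid;
    int rc;

    destructor_calls = 0;

    rc = pthread_key_create(&demo_key, demo_destructor);
    assert(rc == 0);

    rc = pthread_create(&tid, NULL, worker, &payload);
    assert(rc == 0);

    /* The destructor runs during thread exit, before the join completes. */
    rc = pthread_join(tid, NULL);
    assert(rc == 0);

    int calls = destructor_calls;
    pthread_key_delete(demo_key);
    return calls;
}
```

If the code containing demo_destructor were unmapped between pthread_setspecific() and thread exit, the same call would jump to an address that is no longer valid, which is the failure the theory describes.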
The only thing I am trying to find out now is who is registering those destructors.

With Sowjanya's help and the breakpoint set on pthread_key_create, I was able to catch who was registering the destructor, and it appears to be glib (which is unrelated to glibc). It happens as a consequence of this PAM module:

#6 0x00000fff0af52de4 in .pam_sm_authenticate () from /lib64/security/pam_fprintd.so

which apparently relies on the glib library. Once execution reaches g_thread_init_glib(), it uses a glib function to store a couple of private keys. The gthread implementation calls pthread_key_create() to create the key and provide the destructor.

(gdb) bt
#0 __pthread_key_create (key=0xffea0255b90, destr=@0x803428a050: 0x80341d2a50) at pthread_key_create.c:33
#1 0x00000fff0ade3758 in ?? () from /lib64/libgthread-2.0.so.0
#2 0x00000080341d2964 in .g_thread_init_glib () from /lib64/libglib-2.0.so.0
#3 0x00000fff0ade3f2c in .g_thread_init () from /lib64/libgthread-2.0.so.0
#4 0x00000fff0ae46ff4 in .g_type_init_with_debug_flags () from /lib64/libgobject-2.0.so.0
#5 0x00000fff0ae47134 in .g_type_init () from /lib64/libgobject-2.0.so.0
#6 0x00000fff0af52de4 in .pam_sm_authenticate () from /lib64/security/pam_fprintd.so
#7 0x00000fff7e9e5b24 in ?? () from /lib64/libpam.so.0
#8 0x00000fff7e9e5048 in .pam_authenticate () from /lib64/libpam.so.0
#9 0x00000fff70751ac8 in .simple_pam () from /opt/ibm/esa/lwi/runtime/core/../../native/liblwisecurity.Linux.ppc64.so
#10 0x00000fff707526ec in .Java_com_ibm_lwi_security_nativeproviders_LwiNativeProviderImpl_checkUserPAM () from /opt/ibm/esa/lwi/runtime/core/../../native/liblwisecurity.Linux.ppc64.so
#11 0x00000fff7ddaa18c in L201 () from /opt/ibm/java-ppc64-70/jre/lib/ppc64/compressedrefs/libj9vm26.so
#12 0x00000fff7c651010 in .JVM_InvokeMethod () from /opt/ibm/java-ppc64-70/jre/lib/ppc64/compressedrefs/libjclse7b_26.so
#13 0x00000fff7de9f068 in .JVM_InvokeMethod () from /opt/ibm/java-ppc64-70/jre/lib/ppc64/compressedrefs/libjvm.so
#14 0x00000fff7e8ebd38 in .JVM_InvokeMethod () from /opt/ibm/java-ppc64-70/jre/lib/ppc64/j9vm/libjvm.so
#15 0x00000fff7087ea7c in .Java_sun_reflect_NativeMethodAccessorImpl_invoke0 () from /opt/ibm/java-ppc64-70/jre/lib/ppc64/libjava.so
#16 0x00000fff70ea20e0 in ?? ()
#17 0x00000fff7ddf32e0 in .javaProtectedThreadProc () from /opt/ibm/java-ppc64-70/jre/lib/ppc64/compressedrefs/libj9vm26.so
#18 0x00000fff7dc8ddfc in .j9sig_protect () from /opt/ibm/java-ppc64-70/jre/lib/ppc64/compressedrefs/libj9prt26.so
#19 0x00000fff7ddf3158 in .javaThreadProc ()
---Type <return> to continue, or q <return> to quit---q
   from /opt/ibm/java-ppc64-70/jre/lib/ppc64/compressedrefs/libj9
Quit

However, I believe that once the pam_fprintd PAM module finishes, it gets unloaded, and so does the glib library it relies on, but the destructor addresses remain in the __pthread_keys table and nothing cleans them up. I don't see where in glibc one would call to remove a key destructor. It almost seems that the destructor is expected to remain valid for the life of the process, until the last thread exits.
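On the question of removing a key destructor: POSIX does provide pthread_key_delete(), after which the destructor associated with the key is no longer called at thread exit. The sketch below illustrates only that general behaviour. Whether glib or pam_fprintd could or should call it before being unloaded is not established in this report. The names plugin_destructor, plugin_like_worker, and destructor_calls_after_key_delete are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
static int plugin_destructor_calls = 0;

static void plugin_destructor(void *value)
{
    (void)value;
    plugin_destructor_calls++;
}

/* Stands in for code that registers a key and then cleans up after itself,
 * as a library might do before it is unloaded. */
static void *plugin_like_worker(void *arg)
{
    pthread_key_t key;
    int rc;

    rc = pthread_key_create(&key, plugin_destructor);
    assert(rc == 0);

    rc = pthread_setspecific(key, arg);
    assert(rc == 0);

    /* Deleting the key does not invoke the destructor; the caller remains
     * responsible for releasing any per-thread values itself. */
    rc = pthread_key_delete(key);
    assert(rc == 0);

    return NULL;
}

/* Returns how many times the destructor ran once the thread has exited. */
int destructor_calls_after_key_delete(void)
{
    static int payload = 7;
    pthread_t tid;
    int rc;

    plugin_destructor_calls = 0;

    rc = pthread_create(&tid, NULL, plugin_like_worker, &payload);
    assert(rc == 0);

    rc = pthread_join(tid, NULL);
    assert(rc == 0);

    return plugin_destructor_calls;
}
```

Because the key is deleted before the thread exits, the stale destructor address is never followed, even though a non-NULL value had been stored under the key.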
I have suggested that the submitter remove the pam_fprintd PAM module with the following command:

authconfig --disablefingerprint --update

This removes the entry in /etc/pam.d/system-auth and should keep this PAM module from being loaded during the authentication chain invocations.

Sending this bug on to Red Hat to see whether they have seen something like this before and have any insight into how it should be fixed.

Created attachment 826469
skip empty slot
Closing this Red Hat bugzilla due to mirroring problems on the IBM side. Sorry for the noise.