Bug 1032447 - JVM exits and ESA tool stops abnormally while accessing ESA web UI for 2-5 minutes (using IBM Java 7 SR5) (glibc)
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: glibc
Version: 6.5
Hardware: ppc64
OS: All
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Carlos O'Donell
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-11-20 08:53 UTC by IBM Bug Proxy
Modified: 2016-11-24 15:41 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-11-20 09:02:55 UTC
Target Upstream Version:


Attachments (Terms of Use)
skip empty slot (1.02 KB, text/plain)
2013-11-20 08:53 UTC, IBM Bug Proxy


Links
System: IBM Linux Technology Center
ID: 98796
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: Never

Description IBM Bug Proxy 2013-11-20 08:53:05 UTC

Comment 1 IBM Bug Proxy 2013-11-20 08:53:11 UTC
---Problem Description---
Electronic Service Agent (ESA) is a tool shipped as part of IBMIT. It is a web application that runs under Java 7. The JVM crashes and generates a core after the ESA web UI has been accessed for 2-5 minutes. The issue does not occur on RHEL 6.3 or 6.4, but it occurs consistently on RHEL 6.5.
 
---uname output---
Linux kofi07.austin.ibm.com 2.6.32-421.el6.ppc64 #1 SMP Mon Sep 30 12:06:35 EDT 2013 ppc64 ppc64 ppc64 GNU/Linux
 
To analyze the core, I installed the corresponding glibc debuginfo packages and the particular frame in question shows:

(gdb) frame 13
#13 __nptl_deallocate_tsd () at pthread_create.c:154
154				    __pthread_keys[idx].destr (data);

which invokes a user-provided destructor because the stored value was not NULL. The address of that destructor function would have been supplied by a call to pthread_key_create().
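
For reference, the mechanism described above can be sketched in a few lines of C. This is only an illustration of how a destructor gets registered with pthread_key_create() and later invoked at thread exit. The names demo_key, demo_destructor, demo_worker and run_tsd_demo are hypothetical and do not come from the crashing application.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names for illustration only. */
static pthread_key_t demo_key;
static int demo_destructor_calls;

/* Called at thread exit for every thread whose value for demo_key is non-NULL. */
static void demo_destructor(void *data)
{
    (void)data;
    demo_destructor_calls++;
}

static void *demo_worker(void *arg)
{
    /* Storing a non-NULL value is what makes the exit path call the destructor. */
    pthread_setspecific(demo_key, arg);
    return NULL;
}

/* Returns how many times the destructor ran for one worker thread, or -1 on error. */
int run_tsd_demo(void)
{
    pthread_t tid;
    int token = 42;

    demo_destructor_calls = 0;
    if (pthread_key_create(&demo_key, demo_destructor) != 0)
        return -1;
    if (pthread_create(&tid, NULL, demo_worker, &token) != 0)
        return -1;
    pthread_join(tid, NULL);      /* the worker has run its destructors by now */
    pthread_key_delete(demo_key);
    return demo_destructor_calls;
}
```

Calling run_tsd_demo() from a small main() (built with -pthread) should report one destructor call, which corresponds to the call site at pthread_create.c:154 shown in frame 13.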

(gdb) info registers
r0             0x1	1
r1             0xfffa4afe6e0	17590654068448
r2             0x8075b890f0	551730843888
r3             0xfff3042f4d0	17588700771536
r4             0x0	0
r5             0x0	0
r6             0xfffa4afea70	17590654069360
r7             0xfffa4afb638	17590654055992
r8             0xfffa4afe880	17590654068864
r9             0x8075e6a050	551733862480
r10            0xfff30000030	17588696383536
r11            0x1	1
r12            0x44404424	1145062436
r13            0xfffa4b06910	17590654101776
r14            0x5c36d8	6043352
r15            0x3d5400	4019200
r16            0xfffa4ac0000	17590653812736
r17            0x4	4
r18            0x7	7
r19            0xfff6868f110	17589642785040
r20            0xfff6868ec90	17589642783888
r21            0xfffa7aef7b0	17590704338864
r22            0xfffa4aff910	17590654073104
r23            0x40000	262144
r24            0x8075b85bb0	551730830256
r25            0x0	0
r26            0x0	0
r27            0x0	0
r28            0x0	0
r29            0xfffa4aff310	17590654071568
r30            0x8075b81be0	551730813920
r31            0x50	80
pc             0x8075b5a660	0x8075b5a660 <__nptl_deallocate_tsd+192>
msr            0x800000000000d032	9223372036854829106
cr             0x0	0
lr             0x8075b5a680	0x8075b5a680 <__nptl_deallocate_tsd+224>
ctr            0x80759e6250	551729128016
xer            0x0	0
orig_r3        0xfffa002cfd0	17590575615952
trap           0x300	768
(gdb) x/20i  0x8075b5a650
   0x8075b5a650 <__nptl_deallocate_tsd+176>:	bne     cr7,0x8075b5a620 <__nptl_deallocate_tsd+128>
   0x8075b5a654 <__nptl_deallocate_tsd+180>:	ld      r9,8(r30)
   0x8075b5a658 <__nptl_deallocate_tsd+184>:	cmpdi   cr7,r9,0
   0x8075b5a65c <__nptl_deallocate_tsd+188>:	beq     cr7,0x8075b5a620 <__nptl_deallocate_tsd+128>
=> 0x8075b5a660 <__nptl_deallocate_tsd+192>:	ld      r0,0(r9)
   0x8075b5a664 <__nptl_deallocate_tsd+196>:	std     r2,40(r1)
   0x8075b5a668 <__nptl_deallocate_tsd+200>:	addi    r31,r31,16
   0x8075b5a66c <__nptl_deallocate_tsd+204>:	addi    r30,r30,16
   0x8075b5a670 <__nptl_deallocate_tsd+208>:	mtctr   r0
   0x8075b5a674 <__nptl_deallocate_tsd+212>:	ld      r11,16(r9)
   0x8075b5a678 <__nptl_deallocate_tsd+216>:	ld      r2,8(r9)
   0x8075b5a67c <__nptl_deallocate_tsd+220>:	bctrl
   0x8075b5a680 <__nptl_deallocate_tsd+224>:	ld      r2,40(r1)
   0x8075b5a684 <__nptl_deallocate_tsd+228>:	cmpdi   cr7,r31,512
   0x8075b5a688 <__nptl_deallocate_tsd+232>:	bne     cr7,0x8075b5a630 <__nptl_deallocate_tsd+144>
   0x8075b5a68c <__nptl_deallocate_tsd+236>:	nop
   0x8075b5a690 <__nptl_deallocate_tsd+240>:	mr      r11,r13
   0x8075b5a694 <__nptl_deallocate_tsd+244>:	addi    r9,r13,-30480
   0x8075b5a698 <__nptl_deallocate_tsd+248>:	addi    r27,r27,1
   0x8075b5a69c <__nptl_deallocate_tsd+252>:	cmpdi   cr7,r27,32
(gdb) x/20x 0x8075e6a050
0x8075e6a050:	Cannot access memory at address 0x8075e6a050

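The destructor pointer in r9 (0x8075e6a050) is not NULL, but it does not point to mapped memory either, so the load at __nptl_deallocate_tsd+192 faults.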

Thanks. The new core shows a failure identical to the previous one, with the same 0x8075e6a050 address as before.

I am going to try and recreate this while setting a breakpoint on pthread_key_create and see if anyone registers this particular destructor address.

Basically, my theory is that some part of the code loads a library, much like a plugin, which registers two destructors. The library is then unloaded, leaving the destructor addresses in the __pthread_keys table. When __nptl_deallocate_tsd() is finally invoked, it crashes on the user-supplied destructor because the library that the destructor belonged to is no longer mapped. The only thing I am trying to find out now is who is registering those destructors.
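
To make the theory concrete, here is a deliberately simplified model of a key table and its deallocation loop. It is not glibc's code: model_keys, owner_loaded and every function name below are assumptions made for illustration. The point it shows is that the loop only checks for NULL, so a destructor pointer left behind by an unloaded library still gets called. The model counts that case instead of jumping to unmapped memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_KEYS 4

typedef void (*destr_fn)(void *);

/* Simplified stand-in for one key slot; not the real glibc layout. */
struct model_key {
    destr_fn destr;     /* destructor registered for this key, or NULL */
    bool owner_loaded;  /* model only: is the registering library still mapped? */
};

static struct model_key model_keys[MODEL_KEYS];
static int model_safe_calls;
static int model_stale_calls;

static void model_plugin_destructor(void *data)
{
    (void)data;
    model_safe_calls++;
}

static void model_register(size_t idx, destr_fn fn)
{
    model_keys[idx].destr = fn;
    model_keys[idx].owner_loaded = true;
}

/* Models unloading the library: the slot keeps its destructor pointer. */
static void model_unload_owner(size_t idx)
{
    model_keys[idx].owner_loaded = false;
}

/* Models the thread-exit loop: only NULL checks guard the call. */
static void model_deallocate_tsd(void *values[])
{
    for (size_t idx = 0; idx < MODEL_KEYS; idx++) {
        void *data = values[idx];
        destr_fn fn = model_keys[idx].destr;

        if (data == NULL || fn == NULL)
            continue;
        if (model_keys[idx].owner_loaded)
            fn(data);
        else
            model_stale_calls++;  /* a real process would fault here */
    }
}

/* Register, unload, then exit a thread: returns the number of stale calls. */
int model_stale_calls_after_unload(void)
{
    int value = 1;
    void *values[MODEL_KEYS] = { &value, NULL, NULL, NULL };

    model_safe_calls = 0;
    model_stale_calls = 0;
    model_register(0, model_plugin_destructor);
    model_unload_owner(0);
    model_deallocate_tsd(values);
    return model_stale_calls;
}
```

In this model the scenario yields one stale call. In the real process there is no owner_loaded flag to consult, which is why the equivalent call ends in a fault.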

With Sowjanya's help and a breakpoint set on pthread_key_create, I was able to catch who was registering the destructor, and it appears to be glib (which is totally unrelated to glibc). It happens as a consequence of this PAM module:

#6  0x00000fff0af52de4 in .pam_sm_authenticate () from /lib64/security/pam_fprintd.so

which apparently relies on the glib library. It seems that once execution reaches the g_thread_init_glib() code, it uses a glib function to store away a couple of private keys. The gthread implementation calls pthread_key_create() to create each key and supply the destructor.

(gdb) bt
#0  __pthread_key_create (key=0xffea0255b90, destr=@0x803428a050: 0x80341d2a50) at pthread_key_create.c:33
#1  0x00000fff0ade3758 in ?? () from /lib64/libgthread-2.0.so.0
#2  0x00000080341d2964 in .g_thread_init_glib () from /lib64/libglib-2.0.so.0
#3  0x00000fff0ade3f2c in .g_thread_init () from /lib64/libgthread-2.0.so.0
#4  0x00000fff0ae46ff4 in .g_type_init_with_debug_flags () from /lib64/libgobject-2.0.so.0
#5  0x00000fff0ae47134 in .g_type_init () from /lib64/libgobject-2.0.so.0
#6  0x00000fff0af52de4 in .pam_sm_authenticate () from /lib64/security/pam_fprintd.so
#7  0x00000fff7e9e5b24 in ?? () from /lib64/libpam.so.0
#8  0x00000fff7e9e5048 in .pam_authenticate () from /lib64/libpam.so.0
#9  0x00000fff70751ac8 in .simple_pam ()
   from /opt/ibm/esa/lwi/runtime/core/../../native/liblwisecurity.Linux.ppc64.so
#10 0x00000fff707526ec in .Java_com_ibm_lwi_security_nativeproviders_LwiNativeProviderImpl_checkUserPAM ()
   from /opt/ibm/esa/lwi/runtime/core/../../native/liblwisecurity.Linux.ppc64.so
#11 0x00000fff7ddaa18c in L201 () from /opt/ibm/java-ppc64-70/jre/lib/ppc64/compressedrefs/libj9vm26.so
#12 0x00000fff7c651010 in .JVM_InvokeMethod ()
   from /opt/ibm/java-ppc64-70/jre/lib/ppc64/compressedrefs/libjclse7b_26.so
#13 0x00000fff7de9f068 in .JVM_InvokeMethod ()
   from /opt/ibm/java-ppc64-70/jre/lib/ppc64/compressedrefs/libjvm.so
#14 0x00000fff7e8ebd38 in .JVM_InvokeMethod () from /opt/ibm/java-ppc64-70/jre/lib/ppc64/j9vm/libjvm.so
#15 0x00000fff7087ea7c in .Java_sun_reflect_NativeMethodAccessorImpl_invoke0 ()
   from /opt/ibm/java-ppc64-70/jre/lib/ppc64/libjava.so
#16 0x00000fff70ea20e0 in ?? ()
#17 0x00000fff7ddf32e0 in .javaProtectedThreadProc ()
   from /opt/ibm/java-ppc64-70/jre/lib/ppc64/compressedrefs/libj9vm26.so
#18 0x00000fff7dc8ddfc in .j9sig_protect ()
   from /opt/ibm/java-ppc64-70/jre/lib/ppc64/compressedrefs/libj9prt26.so
#19 0x00000fff7ddf3158 in .javaThreadProc ()
   from /opt/ibm/java-ppc64-70/jre/lib/ppc64/compressedrefs/libj9
[output truncated at the gdb pager prompt]

However, I believe that once the pam_fprintd PAM module finishes, it is unloaded along with the glib it relies on, but the destructor addresses remain in the __pthread_keys table and nothing cleans them up. I actually don't see where in glibc one calls to remove a key destructor. It almost seems like the destructor is expected to remain valid for the life of the process, until the last thread exits.

I have suggested that the submitter disable the pam_fprintd PAM module with the following command:

authconfig --disablefingerprint --update

which will remove the entry from /etc/pam.d/system-auth and should keep this PAM module from being loaded when the authentication chain is invoked.

Sending this bug on to Red Hat to see whether they have seen something like this before and have any insight into how it should be fixed.

Comment 2 IBM Bug Proxy 2013-11-20 08:53:27 UTC
Created attachment 826469
skip empty slot

Comment 3 Hanns-Joachim Uhl 2013-11-20 09:02:55 UTC
... due to mirroring problems at the IBM side closing this Red Hat bugzilla .. sorry for the noise ...

