Bug 1897150 - OpenJDK 11 crashes when calling jstack on the JVM process
Summary: OpenJDK 11 crashes when calling jstack on the JVM process
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: java-11-openjdk
Version: 8.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
: 8.0
Assignee: Andrew John Hughes
QA Contact: OpenJDK QA
URL:
Whiteboard:
Depends On: 1832121
Blocks:
 
Reported: 2020-11-12 12:45 UTC by Paulo Andrade
Modified: 2024-12-20 19:23 UTC (History)
10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1832121
Environment:
Last Closed: 2021-09-28 10:09:23 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments
case_02694877-without-eclipse.tar (10.83 MB, application/x-tar)
2020-11-12 19:48 UTC, Paulo Andrade
no flags Details

Comment 1 Paulo Andrade 2020-11-12 19:48:46 UTC
Created attachment 1728854 [details]
case_02694877-without-eclipse.tar

Reproducer that works on all java-11-openjdk environments tested.

Steps to reproduce:

$ tar xf case_02694877-without-eclipse.tar
$ cd case_02694877-without-eclipse.tar

Download http://archive.eclipse.org/eclipse/downloads/drops4/R-4.16-202006040540/download.php?dropFile=eclipse-platform-4.16-linux-gtk-x86_64.tar.gz

$ ./prepare.sh
$ ./test.sh

Eclipse is not included in this tarball due to the attachment size limit.

On the customer support case page there is a tarball with a bundled Eclipse,
named case_02694877.tar, which makes the reproducer self-contained.

Comment 2 Simeon Andreev 2020-12-11 11:32:25 UTC
Any news here? Work on Eclipse is made difficult by this: every time Eclipse hangs or its UI freezes (which is not infrequent), calling jstack to see what is wrong can crash Eclipse, losing both the work and any information about the hang (and with it any chance to fix it).

Comment 4 Andrey Loskutov 2020-12-16 23:09:59 UTC
Description of problem from the original bug 1832121 (so this bug can be found by searching on the stack trace):

OpenJDK 11 crashes on jstack invocation.
Version-Release number of selected component (if applicable):

OpenJDK 11.0.5
OpenJDK 11.0.7

Additional info:

Stack: [0x00007fffb70a4000,0x00007fffb71a4000],  sp=0x00007fffb71a28f8,  free space=1018k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libpthread.so.0+0xce90]  pthread_getcpuclockid+0x0
V  [libjvm.so+0xe8a751]  Thread::print_on(outputStream*, bool) const+0x51
V  [libjvm.so+0xe8cfc6]  JavaThread::print_on(outputStream*, bool) const+0xe6
V  [libjvm.so+0xe8fd18]  Threads::print_on(outputStream*, bool, bool, bool, bool)+0x668
V  [libjvm.so+0xf04cd0]  VM_Operation::evaluate()+0xe0
V  [libjvm.so+0xf0269f]  VMThread::evaluate_operation(VM_Operation*)+0x11f
V  [libjvm.so+0xf02af5]  VMThread::loop()+0x265
V  [libjvm.so+0xf0302c]  VMThread::run()+0x7c
V  [libjvm.so+0xe90fe5]  Thread::call_run()+0x155
V  [libjvm.so+0xc1a878]  thread_native_entry(Thread*)+0xf8

Note: see also https://bugs.eclipse.org/bugs/show_bug.cgi?id=569757 and https://bugs.openjdk.java.net/browse/JDK-8258027

Comment 5 Mario Torre 2020-12-17 14:10:37 UTC
We suspect this may be an issue with Eclipse itself; it also affects JMC: https://bugs.openjdk.java.net/browse/JMC-6749

Comment 6 Alex Macdonald 2020-12-18 16:02:08 UTC
We are seeing something similar with JMC, and have the bug tracked in a number of places. The current thought is that this is related to GTK in some way, see the comment: https://bugs.openjdk.java.net/browse/JDK-8258027?focusedCommentId=14389444&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14389444

> Based on the analysis I am closing this as an "External" issue. The SWT callback code calls jni_AttachAsDaemon, performs a java upcall and then detaches from the VM again. The suspicion is that a GTK error causes the attached thread to terminate abruptly without ever detaching from the VM. 

For what it's worth, on the JMC side we noticed this immediately after Eclipse updated to 2020-03 from 2019-12, so that's where we've been looking the most.

JMC JIRA: https://bugs.openjdk.java.net/browse/JMC-6749
OpenJDK JIRA: https://bugs.openjdk.java.net/browse/JDK-8258027
Eclipse Bugzilla: https://bugs.eclipse.org/bugs/show_bug.cgi?id=569757

Comment 7 Simeon Andreev 2020-12-19 15:45:57 UTC
No reproduction when running the script over a day with:

Eclipse SDK
Version: 2019-12 (4.14)
Build id: I20191210-0610

Reproduced almost right away (2nd iteration of the script) with:

Eclipse SDK
Version: 2020-03 (4.15)
Build id: I20191219-1800

Unfortunately I have no builds saved in-between.

I'll see if I can find something that changed in SWT in that range that might be causing the problem. GTK+ is not printing any warnings/errors during reproduction, so I can't confirm GTK+ errors during SWT native code (GTK+ is usually very vocal on stderr when something goes wrong or is unexpected).

I guess I'll also try adding prints to the thread attach and detach code (to verify if there really is a "dangling" attach), though that might be less trivial.

Comment 8 Simeon Andreev 2020-12-19 15:47:02 UTC
Do note that jstack should not crash the JVM regardless. We would like a fix even if this is an issue with Eclipse native code; jstack must not be able to crash the JVM when it is used to find out e.g. why the application froze.

Comment 9 Simeon Andreev 2020-12-21 09:12:05 UTC
First SWT commit I can reproduce the jstack crash with is: https://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=558be9a19c6de3c914ccbed0ac541d5c849bf1f5 (I don't see thread attach/detach in that code, though maybe it's hidden behind a macro).

The crash does not occur with Java 8 as runtime.

Comment 10 Simeon Andreev 2020-12-21 09:28:35 UTC
I've reported a bug for SWT: https://bugs.eclipse.org/bugs/show_bug.cgi?id=569853

I assume jstack crashing the JDK is only a symptom of some bigger problem with the native code added for: https://bugs.eclipse.org/bugs/show_bug.cgi?id=540060

Comment 11 Simeon Andreev 2020-12-21 14:48:27 UTC
We've updated https://bugs.eclipse.org/bugs/show_bug.cgi?id=569853 with our findings.

The reason for the missing detach is listed here: https://bugs.eclipse.org/bugs/show_bug.cgi?id=262985#c4

It's unclear whether adding a detach will cause the same problem again.

Comment 12 Andrey Loskutov 2020-12-21 14:49:32 UTC
I would like to post here comments entered on the private RH issue https://access.redhat.com/support/cases/#/case/02694877?attachmentId=a092K000025JDNEQA4 that make sense to be public:

Comment from Andrade, Paulo:

Hi,

  As previously guessed, the thread pointer is invalid:

(gdb) bt
#0  0x00007ffff72031f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff72048e8 in __GI_abort () at abort.c:90
#2  0x00007ffff5e4fd47 in os::abort (dump_core=true, siginfo=0x7fffb6d377b0, context=0x7fffb6d37680) at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/os/linux/os_linux.cpp:1504
#3  0x00007ffff619cec3 in VMError::report_and_die (id=11, message=0x0, detail_fmt=0x7ffff687c7e9 "%s", detail_args=0x7fffb6d37350, thread=0x7ffff051a800, pc=0x7ffff79b5e90 <pthread_getcpuclockid> "\213\207\320\002", 
    siginfo=0x7fffb6d377b0, context=0x7fffb6d37680, filename=0x0, lineno=0, size=0) at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/share/utilities/vmError.cpp:1603
#4  0x00007ffff619be58 in VMError::report_and_die (thread=0x7ffff051a800, sig=11, pc=0x7ffff79b5e90 <pthread_getcpuclockid> "\213\207\320\002", siginfo=0x7fffb6d377b0, context=0x7fffb6d37680, detail_fmt=0x7ffff687c7e9 "%s")
    at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/share/utilities/vmError.cpp:1270
#5  0x00007ffff619bebf in VMError::report_and_die (thread=0x7ffff051a800, sig=11, pc=0x7ffff79b5e90 <pthread_getcpuclockid> "\213\207\320\002", siginfo=0x7fffb6d377b0, context=0x7fffb6d37680)
    at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/share/utilities/vmError.cpp:1276
#6  0x00007ffff5e5ed2b in JVM_handle_linux_signal (sig=11, info=0x7fffb6d377b0, ucVoid=0x7fffb6d37680, abort_if_unrecognized=1)
    at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp:616
#7  0x00007ffff5e57da4 in signalHandler (sig=11, info=0x7fffb6d377b0, uc=0x7fffb6d37680) at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/os/linux/os_linux.cpp:4650
#8  <signal handler called>
#9  pthread_getcpuclockid (threadid=140733637961472, clockid=0x7fffb6d383b0) at ../nptl/sysdeps/unix/sysv/linux/pthread_getcpuclockid.c:35
#10 0x00007ffff5e5bd15 in os::Linux::pthread_getcpuclockid (tid=140733637961472, clock_id=0x7fffb6d383b0) at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/os/linux/os_linux.hpp:214
#11 0x00007ffff5e5a8e7 in fast_cpu_time (thread=0x555557459000) at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/os/linux/os_linux.cpp:5796
#12 0x00007ffff5e5aa4e in os::thread_cpu_time (thread=0x555557459000, user_sys_cpu_time=true) at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/os/linux/os_linux.cpp:5842
#13 0x00007ffff6103b13 in Thread::print_on (this=0x555557459000, st=0x7fff6fee1b20, print_extended_info=false) at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/share/runtime/thread.cpp:902
#14 0x00007ffff610b158 in JavaThread::print_on (this=0x555557459000, st=0x7fff6fee1b20, print_extended_info=false) at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/share/runtime/thread.cpp:2998
#15 0x00007ffff610fc6c in Threads::print_on (st=0x7fff6fee1b20, print_stacks=true, internal_format=false, print_concurrent_locks=false, print_extended_info=false)
    at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/share/runtime/thread.cpp:4651
#16 0x00007ffff61a0147 in VM_PrintThreads::doit (this=0x7fff6fee19d0) at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/share/runtime/vmOperations.cpp:215
#17 0x00007ffff619f866 in VM_Operation::evaluate (this=0x7fff6fee19d0) at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/share/runtime/vmOperations.cpp:67
#18 0x00007ffff61e36fe in VMThread::evaluate_operation (this=0x7ffff051a800, op=0x7fff6fee19d0) at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/share/runtime/vmThread.cpp:413
#19 0x00007ffff61e3e11 in VMThread::loop (this=0x7ffff051a800) at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/share/runtime/vmThread.cpp:548
#20 0x00007ffff61e31ee in VMThread::run (this=0x7ffff051a800) at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/share/runtime/vmThread.cpp:310
#21 0x00007ffff6102c1e in Thread::call_run (this=0x7ffff051a800) at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/share/runtime/thread.cpp:379
#22 0x00007ffff5e4e0f1 in thread_native_entry (thread=0x7ffff051a800) at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/os/linux/os_linux.cpp:786
#23 0x00007ffff79b0e25 in start_thread (arg=0x7fffb6d39700) at pthread_create.c:308
#24 0x00007ffff72c634d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) f 9
#9  pthread_getcpuclockid (threadid=140733637961472, clockid=0x7fffb6d383b0) at ../nptl/sysdeps/unix/sysv/linux/pthread_getcpuclockid.c:35
35	  if (INVALID_TD_P (pd))
(gdb) p pd
$1 = (struct pthread *) 0x7fff1a7fa700
(gdb) p *pd
Cannot access memory at address 0x7fff1a7fa700

  The check is:
# define INVALID_TD_P(pd) __builtin_expect ((pd)->tid <= 0, 0)

$2 = (pid_t *) 0x7fff1a7fa9d0

that matches the fault address:

(gdb) f 7
#7  0x00007ffff5e57da4 in signalHandler (sig=11, info=0x7fffb6d377b0, uc=0x7fffb6d37680) at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/os/linux/os_linux.cpp:4650
4650	  JVM_handle_linux_signal(sig, info, uc, true);
(gdb) p info._sifields._sigfault
$3 = {si_addr = 0x7fff1a7fa9d0}

  Checking a bit more:

(gdb) f 11
#11 0x00007ffff5e5a8e7 in fast_cpu_time (thread=0x555557459000) at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/os/linux/os_linux.cpp:5796
5796	                                              &clockid);
(gdb) list
5791	static jlong slow_thread_cpu_time(Thread *thread, bool user_sys_cpu_time);
5792	
5793	static jlong fast_cpu_time(Thread *thread) {
5794	    clockid_t clockid;
5795	    int rc = os::Linux::pthread_getcpuclockid(thread->osthread()->pthread_id(),
5796	                                              &clockid);
5797	    if (rc == 0) {
5798	      return os::Linux::fast_thread_cpu_time(clockid);
5799	    } else {
5800	      // It's possible to encounter a terminated native thread that failed
(gdb) p thread._osthread
$4 = (OSThread *) 0x555555e746f0
(gdb) p* thread._osthread
$5 = {<CHeapObj<(MemoryType)2>> = {<AllocatedObj> = {_vptr.AllocatedObj = 0x7ffff70ce970 <vtable for OSThread+16>}, <No data fields>}, _start_proc = 0x0, _start_parm = 0x0, _state = RUNNABLE, _interrupted = 0, _thread_type = -235802127, 
  _pthread_id = 140733637961472, _caller_sigmask = {__val = {4, 140737327925148, 4145621761, 93825001722400, 140733637954672, 140737318800606, 8589934592, 93825001801456, 140733637954672, 152, 0, 93825001801456, 140733637954720, 
      140737318799079, 140733637954720, 7407211869416263168}}, sr = {_state = os::SuspendResume::SR_RUNNING}, _siginfo = 0x0, _ucontext = 0x0, _expanding_stack = 0, _alt_sig_stack = 0x0, _startThread_lock = 0x555555e61220, 
  _thread_id = 20384}
(gdb) p/x thread._osthread._pthread_id
$5 = 0x7fff1a7fa700

  The bad pointer appeared in frame 13:

(gdb) f 13
#13 0x00007ffff6103b13 in Thread::print_on (this=0x555557459000, st=0x7fff6fee1b20, print_extended_info=false) at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/share/runtime/thread.cpp:902
902	              os::thread_cpu_time(const_cast<Thread*>(this), true) / 1000000.0

  The bad thread is the 4th in the global list:

(gdb) p Threads::_thread_list._next._next._next
$14 = (JavaThread *) 0x555557459000

Following the list, apparently only this thread's native thread pointer got corrupted somehow:

(gdb) p ((struct pthread *)(Threads::_thread_list._osthread._pthread_id)).tid
$18 = 32733
(gdb) p ((struct pthread *)(Threads::_thread_list._next._osthread._pthread_id)).tid
$19 = 32690
(gdb) p ((struct pthread *)(Threads::_thread_list._next._next._osthread._pthread_id)).tid
$20 = 20649
(gdb) p ((struct pthread *)(Threads::_thread_list._next._next._next._osthread._pthread_id)).tid
Cannot access memory at address 0x7fff1a7fa9d0
(gdb) p ((struct pthread *)(Threads::_thread_list._next._next._next._next._osthread._pthread_id)).tid
$21 = 20347
(gdb) p ((struct pthread *)(Threads::_thread_list._next._next._next._next._next._osthread._pthread_id)).tid
$22 = 20346
(gdb) p ((struct pthread *)(Threads::_thread_list._next._next._next._next._next._next._osthread._pthread_id)).tid
$23 = 20228

  The (struct pthread *) value 0x7fff1a7fa700 looks like the pattern of
a valid thread pointer.

  All we can see is that somehow the native (real) thread is gone, and there
is a java wrapper pointing to the dangling data.

  I will make my test environment available to JVM experts; hopefully they can
better understand the root cause of the problem, and/or when/why the native
thread is gone.

Comment 13 Andrey Loskutov 2020-12-21 14:51:19 UTC
I would like to post here comments entered on the private RH issue https://access.redhat.com/support/cases/#/case/02694877?attachmentId=a092K000025JDNEQA4 that make sense to be public:

Comment from Basant, Divya:

Hello 

I dug a little further into the provided dump file and my findings are below:

The problem here is that the computed thread id is corrupt:

(gdb) f 10
#10 0x00007ffff5e5bd15 in os::Linux::pthread_getcpuclockid (
    tid=140733637961472, clock_id=0x7fffb6d383b0)
    at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/os/linux/os_linux.hpp:214
214	    return _pthread_getcpuclockid ? _pthread_getcpuclockid(tid, clock_id) : -1;
(gdb) p tid
$3 = 140733637961472


The thread id here is passed in from the lower frame and is computed in frame #11:

(gdb) f 11
#11 0x00007ffff5e5a8e7 in fast_cpu_time (thread=0x555557459000)
    at /usr/src/debug/java-11-openjdk-11.0.8.10-5.el7.x86_64/openjdk/src/hotspot/os/linux/os_linux.cpp:5796
5796	                                              &clockid);
(gdb) list
5791	static jlong slow_thread_cpu_time(Thread *thread, bool user_sys_cpu_time);
5792	
5793	static jlong fast_cpu_time(Thread *thread) {
5794	    clockid_t clockid;
5795	    int rc = os::Linux::pthread_getcpuclockid(thread->osthread()->pthread_id(),  <<======== HERE 
5796	                                              &clockid);
5797	    if (rc == 0) {
5798	      return os::Linux::fast_thread_cpu_time(clockid);
5799	    } else {
5800	      // It's possible to encounter a terminated native thread that failed

Thread structure looks intact here, as shown below: 

(gdb) p *thread
$7 = {<ThreadShadow> = {<CHeapObj<(MemoryType)2>> = {<AllocatedObj> = {
        _vptr.AllocatedObj = 0x7ffff70d8b10 <vtable for JavaThread+16>}, <No data fields>}, _pending_exception = 0x0, _exception_file = 0x0, 
    _exception_line = 0}, static _thr_current = 0x7ffff051a800, _gc_data = {
    140737338223344, 140737338888768, 17433981653976416256, 0, 0, 0, 0, 
    140737338110560, 140737338888928, 17433981653976416257, 0, 0, 0, 0, 
    17433981653976478193, 17433981653976478193, 17433981653976478193, 
    17433981653976478193}, _real_malloc_address = 0x555557458c80, 
  _threads_hazard_ptr = 0x0, _threads_list_ptr = 0x0, 
  _nested_threads_hazard_ptr_cnt = 0, _SR_lock = 0x555555e74620, 
  _suspend_flags = 0, _num_nested_signal = 0, _suspendible_thread = false, 
  _active_handles = 0x5555587ba7a0, _free_handle_block = 0x7fffa68ee640, 
  _last_handle_mark = 0x555558a91920, _oops_do_parity = 1, _rcu_counter = 0, 
  _allow_safepoint_count = 0, _allow_allocation_count = 0, 
  _skip_gcalot = false, _polling_page = 0x7ffff7ff6008, 
  _tlab = {<CHeapObj<(MemoryType)2>> = {<AllocatedObj> = {
        _vptr.AllocatedObj = 0x7ffff70d90b0 <vtable for ThreadLocalAllocBuffer+16>}, <No data fields>}, _start = 0x0, _top = 0x0, _pf_top = 0x0, _end = 0x0, 
    _allocation_end = 0x0, _desired_size = 63629, _refill_waste_limit = 994, 
    _allocated_before_last_gc = 888, 
    _bytes_since_last_sample_point = 17433981653976478193, 
    static _max_size = 131072, static _reserve_for_allocation_prefetch = 72, 
    static _target_refills = 50, _number_of_refills = 0, 
    _fast_refill_waste = 0, _slow_refill_waste = 0, _gc_waste = 0, 
    _slow_allocations = 0, _allocated_size = 0, 
    _allocation_fraction = {<CHeapObj<(MemoryType)5>> = {<AllocatedObj> = {
          _vptr.AllocatedObj = 0x7ffff70bcff0 <vtable for AdaptiveWeightedAverage+16>}, <No data fields>}, _average = 0.0454543307, _sample_count = 2, 
      _weight = 35, _is_old = false, static OLD_THRESHOLD = 100, 
      _last_sample = 0}, static _global_stats = 0x7ffff03ebb30}, 
  _allocated_bytes = 888, _heap_sampler = {_bytes_until_sample = 0, 
    static _rnd = 1459933632, static _enabled = 0, 
    static _sampling_interval = 524288, _collectors_present = 0}, 
  _statistical_info = {_start_time_stamp = 1596793964511, 
    _define_class_count = 0}, _jfr_thread_local = {_java_event_writer = 0x0, 
    _java_buffer = 0x0, _native_buffer = 0x0, _shelved_buffer = 0x0, 
    _stackframes = 0x0, _trace_id = 188, _thread = {_ptr = 0x0}, 
    _data_lost = 0, _stack_trace_id = 18446744073709551615, _user_time = 0, 
    _cpu_time = 0, _wallclock_time = 5546200411505827, _stack_trace_hash = 0, 
    _stackdepth = 0, _entering_suspend_flag = 0, _dead = false}, 
  _vm_operation_started_count = 0, _vm_operation_completed_count = 0, 
  _current_pending_monitor = 0x0, 
  _current_pending_monitor_is_from_java = true, 
  _current_waiting_monitor = 0x0, omFreeList = 0x0, omFreeCount = 0, 
  omFreeProvision = 32, omInUseList = 0x0, omInUseCount = 0, 
  _visited_for_critical_count = true, _unhandled_oops = 0xf1f1f1f1f1f1f1f1, 
  _osthread = 0x555555e746f0, _resource_area = 0x55555641d9c0, 
  _current_resource_mark = 0x0, _handle_area = 0x555558a918a0, 
  _metadata_handles = 0x5555564215b0, _stack_base = 0x7fff1a7fb000 "", 
  _stack_size = 10485760, _self_raw_id = 0, _lgrp_id = -1, _owned_locks = 0x0, 
  _jvmti_env_iteration_count = 0, _Stalled = 0, _TypeTag = 11181, 
  _ParkEvent = 0x555558f3eb00, _SleepEvent = 0x555558f3e900, 
  _MutexEvent = 0x555558f3e700, _MuxEvent = 0x5555565cce00, 
  NativeSyncRecursion = -235802127, _OnTrap = 0, _hashStateW = 273326509, 
  _hashStateX = 2059348338, _hashStateY = 842502087, _hashStateZ = 34663, 
  _schedctl = 0x0, rng = {-235802127, -235802127, -235802127, -235802127}}

Interestingly, if I track the thread details returned by "thread->osthread()->pthread_id()", I find that this specific thread is no longer running: it is not listed in `info threads`:

(gdb) p thread->_osthread
$8 = (OSThread *) 0x555555e746f0
(gdb) p *thread->_osthread
$9 = {<CHeapObj<(MemoryType)2>> = {<AllocatedObj> = {
      _vptr.AllocatedObj = 0x7ffff70ce970 <vtable for OSThread+16>}, <No data fields>}, _start_proc = 0x0, _start_parm = 0x0, _state = RUNNABLE, 
  _interrupted = 0, _thread_type = -235802127, _pthread_id = 140733637961472, 
  _caller_sigmask = {__val = {4, 140737327925148, 4145621761, 93825001722400, 
      140733637954672, 140737318800606, 8589934592, 93825001801456, 
      140733637954672, 152, 0, 93825001801456, 140733637954720, 
      140737318799079, 140733637954720, 7407211869416263168}}, sr = {
    _state = os::SuspendResume::SR_RUNNING}, _siginfo = 0x0, _ucontext = 0x0, 
  _expanding_stack = 0, _alt_sig_stack = 0x0, 
  _startThread_lock = 0x555555e61220, _thread_id = 20384}
(gdb) p thread->_osthread->_pthread_id
$10 = 140733637961472

(gdb) p thread->_osthread->_thread_id
$11 = 20384

There is no thread running with ID 20384. The complete list of threads appears below:

(gdb) p clock_id
$4 = (clockid_t *) 0x7fffb6d383b0
(gdb) info threads
  Id   Target Id         Frame 
  81   Thread 0x7ffff4561700 (LWP 17236) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  80   Thread 0x7fff6fee2700 (LWP 32733) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  79   Thread 0x7fff8dd8d700 (LWP 17990) 0x00007ffff79b7a9b in __libc_recv (
    fd=133, buf=0x7fffa482a188, n=8, flags=16384)
    at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  78   Thread 0x7fff7e2d5700 (LWP 20155) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  77   Thread 0x7fff8e78e700 (LWP 17989) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  76   Thread 0x7fff8d38c700 (LWP 17993) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  75   Thread 0x7fff94c48700 (LWP 17965) 0x00007ffff72bba3d in poll ()
    at ../sysdeps/unix/syscall-template.S:81
  74   Thread 0x7fffd6b94700 (LWP 17237) 0x00007ffff79b6a0b in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0, 
    futex=0x7ffff0041180)
    at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43
  73   Thread 0x7fffaa4b5700 (LWP 18026) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  72   Thread 0x7fff83230700 (LWP 32690) pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  71   Thread 0x7fff8b3f7700 (LWP 18030) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  70   Thread 0x7fff83b3f700 (LWP 18074) pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  69   Thread 0x7fff95649700 (LWP 17963) 0x00007ffff72bba3d in poll ()
    at ../sysdeps/unix/syscall-template.S:81
  68   Thread 0x7fff8aef6700 (LWP 18031) 0x00007ffff79b7a9b in __libc_recv (
    fd=137, buf=0x7fff8aee4f50, n=8192, flags=0)
    at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  67   Thread 0x7fffab0bb700 (LWP 17325) 0x00007ffff79b6a0b in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0, 
    futex=0x7ffff0041180)
    at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43
  66   Thread 0x7fff86044700 (LWP 18069) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  65   Thread 0x7fffa95ae700 (LWP 17885) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  64   Thread 0x7fffab1bc700 (LWP 17324) 0x00007ffff79b6a0b in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0, 
    futex=0x7ffff0041180)
    at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43
  63   Thread 0x7fffac2d0700 (LWP 17318) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  62   Thread 0x7fff86547700 (LWP 18125) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  61   Thread 0x7fffae606700 (LWP 17252) 0x00007ffff79b7e4d in nanosleep ()
    at ../sysdeps/unix/syscall-template.S:81
  60   Thread 0x7fff84c41700 (LWP 18596) 0x00007ffff79b7a9b in __libc_recv (
    fd=95, buf=0x7fff84c30150, n=8192, flags=0)
    at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  59   Thread 0x7fffaf109700 (LWP 17249) pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  58   Thread 0x7fff8320c700 (LWP 18579) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  57   Thread 0x7fff80b08700 (LWP 18886) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  56   Thread 0x7fff86a48700 (LWP 18580) 0x00007ffff79b7a9b in __libc_recv (
    fd=90, buf=0x7fff86a37050, n=8192, flags=0)
    at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  55   Thread 0x7fff81a09700 (LWP 19422) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  54   Thread 0x7fff94247700 (LWP 18595) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  53   Thread 0x7fff6b404700 (LWP 20649) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  52   Thread 0x7fff80407700 (LWP 20149) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  51   Thread 0x7fff81f0a700 (LWP 18844) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  50   Thread 0x7fffabdcf700 (LWP 17319) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  49   Thread 0x7fff7d48a700 (LWP 20151) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  48   Thread 0x7fffaaeb9700 (LWP 17398) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  47   Thread 0x7fff730ec700 (LWP 20176) 0x00007ffff79b770d in read ()
    at ../sysdeps/unix/syscall-template.S:81
  46   Thread 0x7fff71ce8700 (LWP 20180) 0x00007ffff79b7a9b in __libc_recv (
    fd=107, buf=0x7fff71cd6fd0, n=8192, flags=0)
    at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  45   Thread 0x7fffaabb6700 (LWP 17456) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  44   Thread 0x7fff721e9700 (LWP 20179) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  43   Thread 0x7fff6efdf700 (LWP 20218) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  42   Thread 0x7fffad6d4700 (LWP 17558) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  41   Thread 0x7fff6eade700 (LWP 20219) 0x00007ffff79b7a9b in __libc_recv (
    fd=69, buf=0x7fff6eacd050, n=8192, flags=0)
    at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  40   Thread 0x7fff602ea700 (LWP 20303) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  39   Thread 0x7fffae0d6700 (LWP 17568) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  38   Thread 0x7fff6e5dd700 (LWP 20220) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  37   Thread 0x7fff1f4ff700 (LWP 20304) pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  36   Thread 0x7fffa8bac700 (LWP 17978) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  35   Thread 0x7fff6b905700 (LWP 20228) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  34   Thread 0x7fff8c98b700 (LWP 17994) 0x00007ffff79b7a9b in __libc_recv (
    fd=134, buf=0x7fffa482a3d8, n=8, flags=16384)
    at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  33   Thread 0x7fff1e0fd700 (LWP 20306) 0x00007ffff72bba3d in poll ()
    at ../sysdeps/unix/syscall-template.S:81
  32   Thread 0x7fff1eafe700 (LWP 20305) 0x00007ffff72bba3d in poll ()
    at ../sysdeps/unix/syscall-template.S:81
  31   Thread 0x7fffa90ad700 (LWP 18027) 0x00007ffff79b7a9b in __libc_recv (
    fd=136, buf=0x7fffa909bd10, n=8192, flags=0)
    at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  30   Thread 0x7fff1d6fc700 (LWP 20346) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  29   Thread 0x7fff1bbfc700 (LWP 20381) 0x00007ffff72bba3d in poll ()
    at ../sysdeps/unix/syscall-template.S:81
  28   Thread 0x7fff89dc0700 (LWP 18041) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  27   Thread 0x7fff1cfff700 (LWP 20347) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  26   Thread 0x7fff88a80700 (LWP 18058) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  25   Thread 0x7fffaacb7700 (LWP 17438) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  24   Thread 0x7fff1b1fb700 (LWP 20382) 0x00007ffff72bba3d in poll ()
    at ../sysdeps/unix/syscall-template.S:81
  23   Thread 0x7fff8807f700 (LWP 18059) 0x00007ffff79b7a9b in __libc_recv (
    fd=140, buf=0x7fffa4568498, n=8, flags=16384)
    at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  22   Thread 0x7fffaadb8700 (LWP 17437) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  21   Thread 0x7fff85643700 (LWP 18070) 0x00007ffff79b7a9b in __libc_recv (
    fd=142, buf=0x7fffa4792778, n=8, flags=16384)
    at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  20   Thread 0x7ffff7fcf740 (LWP 17235) 0x00007ffff79b1f57 in pthread_join (
    threadid=140737292670720, thread_return=0x7fffffff2658)
    at pthread_join.c:92
  19   Thread 0x7fffab6bd700 (LWP 17323) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  18   Thread 0x7fffb9dc2700 (LWP 17238) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  17   Thread 0x7fff84040700 (LWP 18441) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  16   Thread 0x7fffac7d1700 (LWP 17311) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  15   Thread 0x7fffb9cc1700 (LWP 17239) 0x00007ffff79b6a0b in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0, 
    futex=0x7ffff009b210)
    at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43
  14   Thread 0x7fff7f906700 (LWP 20154) pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  13   Thread 0x7fffaccd2700 (LWP 17308) pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  12   Thread 0x7fffb77bc700 (LWP 17240) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  11   Thread 0x7fff75df5700 (LWP 20160) pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  10   Thread 0x7fffb76bb700 (LWP 17241) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  9    Thread 0x7fffaeb07700 (LWP 17251) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  8    Thread 0x7fffaf80c700 (LWP 17246) pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  7    Thread 0x7fffaafba700 (LWP 17326) 0x00007ffff79b6a0b in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0, 
    futex=0x7ffff0041180)
    at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43
  6    Thread 0x7fffaf70b700 (LWP 17247) pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  5    Thread 0x7fffb6c38700 (LWP 17243) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  4    Thread 0x7fffafd0d700 (LWP 17245) 0x00007ffff79b6a0b in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0, 
    futex=0x7ffff0012a50)
    at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43
  3    Thread 0x7fffb6737700 (LWP 17244) pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  2    Thread 0x7fffaf60a700 (LWP 17248) pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
* 1    Thread 0x7fffb6d39700 (LWP 17242) 0x00007ffff72031f7 in __GI_raise (
    sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56

So it appears that the thread no longer exists, but a reference to it remains in memory; jstack crashes when it tries to process details for this non-existent, no-longer-running thread. At this point it looks more like an application bug to me, and incorrect use of threads may well be the cause of the application entering a hung state.


Regards, 
Divya

Comment 14 Andrey Loskutov 2020-12-21 14:52:43 UTC
I would like to repost here comments that were entered on the private RH support case https://access.redhat.com/support/cases/#/case/02694877?attachmentId=a092K000025JDNEQA4
but that make sense to be public:

Comment from Andrade, Paulo

Hi,

  Not certain whether this was suggested before, but the problem can likely
be worked around with

-XX:-UseLinuxPosixThreadCPUClocks


  There might be a race condition in the JVM, where jstack attaches to the
JVM while it is in an inconsistent state. With the flag above it should
still attach in an inconsistent state, but because it should then read its
data from /proc, it is unlikely to crash; instead it should just print some
warning or error messages.
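As a sketch, the workaround above would be applied on the command line of the JVM being inspected before taking the thread dump. The application jar name and the PID below are placeholders, not values from this case:

```shell
# Hypothetical invocation: start the target JVM with the POSIX
# CPU-clock code path (enabled by default) switched off ...
java -XX:-UseLinuxPosixThreadCPUClocks -jar app.jar &

# ... then take the thread dump as usual, substituting the real PID.
jstack <pid>
```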

Comment 15 Andrey Loskutov 2020-12-21 15:01:22 UTC
(In reply to Andrey Loskutov from comment #14)
>   Not certain if suggested before, but likely the problem can be worked
> around with
> 
> -XX:-UseLinuxPosixThreadCPUClocks


I see the flag is enabled by default in both Java 8 and Java 11, for good reason: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6888526 and, for example, https://blog.packagecloud.io/eng/2017/03/14/using-strace-to-understand-java-performance-improvement/

So disabling this flag is not an option for us (Advantest).

Note: although we've identified the code in Eclipse that "forgets" to call DetachCurrentThread after AttachCurrentThreadAsDaemon (see https://bugs.eclipse.org/bugs/show_bug.cgi?id=569853), we believe that the JVM should be fixed, because the Eclipse code in question is not the only native code that may "forget" about DetachCurrentThread. Also, https://docs.oracle.com/en/java/javase/11/docs/specs/jni/invocation.html#attachcurrentthreadasdaemon does not state that a thread *must* call DetachCurrentThread after using AttachCurrentThreadAsDaemon, and there are plenty of ways in which an application can "forget" to do so.

To sum up:

1) Java 8 has UseLinuxPosixThreadCPUClocks and no issues/crashes with jstack and Eclipse 4.15.
2) Java 11 crashes with jstack and the same Eclipse 4.15.
3) This is an obvious regression in the JVM and should be addressed, regardless of whether the code in Eclipse is fixed.

Comment 21 Mario Torre 2021-09-28 10:09:23 UTC
It appears that this bug can only be properly addressed in the JVM by a major rewrite of the thread initialization and teardown logic, which would be very invasive for OpenJDK 8 and 11 at this stage. Later versions of OpenJDK, including 17, contain this rewrite; it remains, however, the user's responsibility to ensure the proper pairing of AttachCurrentThreadAsDaemon and DetachCurrentThread (in this case, in Eclipse).

Comment 22 Andrew Dinn 2023-01-26 14:31:22 UTC
(In reply to Andrey Loskutov from comment #15)

> Note: although we've identified the code in Eclipse that "forgets" to call
> DetachCurrentThread after AttachCurrentThreadAsDaemon (see
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=569853), we believe that JVM
> should be fixed, because Eclipse code in question is not the only native
> code that may "forget" about DetachCurrentThread. Also
> https://docs.oracle.com/en/java/javase/11/docs/specs/jni/invocation.
> html#attachcurrentthreadasdaemon doesn't state that a thread *must* call
> DetachCurrentThread after using AttachCurrentThreadAsDaemon, and there are
> plenty ways how an application can "forget" to do so.

You are right that the documentation does not explicitly state that a client of this API *must* call DetachCurrentThread after using AttachCurrentThreadAsDaemon. I agree that it probably ought to do so. However, I don't think that justifies your conclusion that the JVM *must* be 'fixed' to handle some or all possible consequences of clients not calling that method.

Firstly, note that the API was not designed to cater for such a circumstance, even if this is not explicitly stated. I believe it is a fairly small and uncontroversial inference from the facts that this API is i) provided and ii) documented next to the Attach API to the conclusions that i) it is a good idea to use it and ii) things might very likely go more or less wrong if you don't. Yes, this is not explicit in the documentation, but I think the requirement is clearly visible in the design of the API.

The case is the same for most JNI/JVM APIs. The JNI documentation rarely offers a fully comprehensive set of warnings to spell out every possible thing that could go wrong. Just as not every cliff has a signpost saying "Do not jump off this cliff as you may well die." The docs assume a certain amount of drawing of the obvious conclusion on the part of the developer.

Which implies (modulo the ever present possibility of latent bugs) that the implementation is not actually 'broken'. As far as we are aware, clients which use the API as intended suffer no problems. What, arguably, is 'broken' is the documentation. It would be fair to offer the more guarded critique that the JVM is not *robust* in the face of misuse of this API. Failure on a client's part to detach a thread clearly can lead to catastrophic consequences.

While true, that is not an abnormal situation when it comes to incorrect use of JNI APIs or, indeed, even for apps that use JNI correctly but suffer from their own bugs. There are many ways in which apps that use JNI can derail JVM operation. That's because JNI explicitly allows them to bypass the reliability guarantees offered by the JVM's implementation of a managed runtime (an 'opportunity' that is explicitly documented). That does not mean that the JVM should not, and does not, take steps to minimize any harm that may be caused, but it means we are in a world where harm cannot always be avoided. Given that, the question becomes which measures to take and at what cost.

Clearly, there have been improvements in later JVM releases that attempt to make the JVM more robust against a failure to detach a thread. However, while such changes may be appropriate in a development branch, where their impact on reliability and performance can be tested before release, that does not mean they are automatically appropriate for backport into a maintenance tree, where the imperative is to keep the platform stable across all existing deployments. The need for stability grows over time, and it even comes at the cost of not fixing bugs that affect only a small fraction of existing deployments. In such cases, the more complex the fix, the less likely a backport becomes. The case is stronger still when the error arises because an API is used in a way that was never intended. As an OpenJDK backports project reviewer, I would be extremely unwilling to propose the relevant fixes here for backport to jdk11u.

