Created attachment 1791450 [details]
Crash log from JDK 16.

Description of problem:
OpenJDK 16 and OpenJDK 17 early access crash while an application running on the JDK is being debugged by Eclipse.

Version-Release number of selected component (if applicable):
openjdk version "16.0.1" 2021-04-20
OpenJDK Runtime Environment (build 16.0.1+9-24)
OpenJDK 64-Bit Server VM (build 16.0.1+9-24, mixed mode, sharing)

openjdk version "17-ea" 2021-09-14
OpenJDK Runtime Environment (build 17-ea+25-2252)
OpenJDK 64-Bit Server VM (build 17-ea+25-2252, mixed mode, sharing)

How reproducible:
We observe the crash during one of our UI performance tests. So far we have no steps to reproduce outside of our product (reducing the UI test to a Java application that is driven by e.g. jdb would be time-consuming and not trivial).

Actual results:
OpenJDK 16 and OpenJDK 17 early access crash with the following trace:

Current thread (0x00007fff501390c0): JavaThread "JDWP Transport Listener: dt_socket" daemon [_thread_in_vm, id=66520, stack(0x00007ffee9451000,0x00007ffee9552000)]

Stack: [0x00007ffee9451000,0x00007ffee9552000], sp=0x00007ffee9550b80, free space=1022k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xbc8ae8] OopStorage::Block::release_entries(unsigned long, OopStorage*)+0x38
V [libjvm.so+0xbc8d4d] OopStorage::release(oopDesc* const*)+0x8d
V [libjvm.so+0x87af3d] jni_DeleteGlobalRef+0x8d
C [libjdwp.so+0xe8da] deleteNode+0x7a
C [libjdwp.so+0xeedf] commonRef_reset+0x6f
C [libjdwp.so+0x1058c] debugInit_reset+0x6c
C [libjdwp.so+0x267d2] acceptThread+0xa2
V [libjvm.so+0x9fb243] JvmtiAgentThread::call_start_function()+0x83
V [libjvm.so+0xd7c9db] JavaThread::thread_main_inner()+0x11b
V [libjvm.so+0xd816cd] Thread::call_run()+0xfd
V [libjvm.so+0xbd86d7] thread_native_entry(Thread*)+0xe7

Expected results:
No crash.
Additional info:
We no longer observe the crash if we check out OpenJDK 16 (https://github.com/openjdk/jdk16), revert this commit: https://github.com/openjdk/jdk/commit/79f1dfb8d3941377da77e73f7bbab93baef29b8e, and then build and use the modified JDK 16.

RHEL support case: https://access.redhat.com/support/cases/#/case/02965947
Core dump: https://access.redhat.com/support/cases/#/case/02965947/discussion?attachmentId=a092K00002aGmLLQA0

We were required to do a preliminary performance evaluation with OpenJDK 17 and Shenandoah GC (OpenJDK 17 has Shenandoah fixes that we need in order to use the GC, which we assume will improve performance in parts of our product). During the evaluation we observed the crash.
JDK 16 is not in RHEL 7; it is available only via EPEL. Or are your binaries from a different source? Unfortunately, JDK 17 will not go into RHEL 7 either, due to its lifecycle phase. Do you observe the issue also in el8?
Jiri, we don't care if that is in RHEL 7 or not, please move to appropriate RHEL version (8 or 9?). We simply want to make sure Java 17 bug is fixed *in some* RHEL release.
> Do you observe the issue also in el8?

We won't be able to test on RHEL 8, as our entire infrastructure is still on RHEL 7. We don't have a RHEL 8 environment in which our product has been run. If OpenJDK developers/maintainers are not able to figure out a fix, and we are asked for a reproducer, and we manage to create one outside of our product (which may well not succeed), we could try that reproducer on RHEL 8 (we do have RHEL 8 test machines).
(In reply to Andrey Loskutov from comment #4)
> Jiri, we don't care if that is in RHEL 7 or not, please move to appropriate
> RHEL version (8 or 9?).
> We simply want to make sure Java 17 bug is fixed *in some* RHEL release.

Sure. I was trying to figure out whether it was indeed java-latest-openjdk from epel7. As there is no reproducer, and epel7 has a pretty old gcc, I was wondering whether you observe the same for java-latest-openjdk built for epel8, as a much newer gcc is used to build it, and that would eliminate some possibilities.
(In reply to jiri vanek from comment #6)
> Sure. I was trying to figure out if it was indeed java-latest-openjdk from
> epel7. As there is no reproducer, and epel7 have pretty old gcc, I was
> wondering if you observe the same also for java-latest-openjdk built for
> epel8, as there is much newer gcc used to built it, and that would eliminate
> some possibilities.

Jiri, since RHEL doesn't provide Java 16 and Java 17 is not there yet, there is no RHEL release where you would see the problem anyway. It is unlikely to be a gcc issue, because we have already identified the concrete commit that causes the crash: https://github.com/openjdk/jdk/commit/79f1dfb8d3941377da77e73f7bbab93baef29b8e, and the software works fine after reverting it. It would be really strange if this were the only gcc-related issue in the entire Java 16 code base.
Adding OpenJDK bug reference which is suspected to cause this issue.
Created attachment 1793059 [details]
Crash log file with OpenJDK 16 fast debug build.

I ran our UI tests with a fast debug build of OpenJDK 16. With it, an assertion is hit:

#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/tmp/jep2604_fastdebug_jdk16/jdk16/src/hotspot/share/runtime/jniHandles.cpp:148), pid=81289, tid=56880
# assert(!is_jweak(handle)) failed: wrong method for detroying jweak
#
# JRE version: OpenJDK Runtime Environment (16.0) (fastdebug build 16-internal+0-adhoc.sandreev.jdk16)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 16-internal+0-adhoc.sandreev.jdk16, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x1043cb4] JNIHandles::destroy_global(_jobject*)+0x224
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I %h" (or dumping to /tmp/st8-gui-automation-test/GUIPerfTest/workspaces/Lahaina_GUIPerf_deviceFromSVN/core.81289)
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
#

--------------- S U M M A R Y ------------

Command Line: ...

Host: socbm775, Intel(R) Xeon(R) W-2145 CPU @ 3.70GHz, 16 cores, 125G, Red Hat Enterprise Linux Workstation release 7.4 (Maipo)
Time: Tue Jun 22 14:11:27 2021 CEST elapsed time: 7011.035246 seconds (0d 1h 56m 51s)

--------------- T H R E A D ---------------

Current thread (0x00007fff58104d50): JavaThread "JDWP Transport Listener: dt_socket" daemon [_thread_in_vm, id=56880, stack(0x00007ffe3c4f5000,0x00007ffe3c5f6000)]

Stack: [0x00007ffe3c4f5000,0x00007ffe3c5f6000], sp=0x00007ffe3c5f4be0, free space=1022k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x1043cb4] JNIHandles::destroy_global(_jobject*)+0x224
V [libjvm.so+0xfafd9f] jni_DeleteGlobalRef+0x10f
C [libjdwp.so+0xf11a] deleteNode+0x7a
C [libjdwp.so+0xf71f] commonRef_reset+0x6f
C [libjdwp.so+0x10dec] debugInit_reset+0x6c
C [libjdwp.so+0x270a2] acceptThread+0xa2
V [libjvm.so+0x127c054] JvmtiAgentThread::call_start_function()+0x1d4
V [libjvm.so+0x1a51e9a] JavaThread::thread_main_inner()+0x5ba
V [libjvm.so+0x1a589e0] Thread::call_run()+0x100
V [libjvm.so+0x15fbd26] thread_native_entry(Thread*)+0x116
The upstream bug has now been created: https://bugs.openjdk.java.net/browse/JDK-8269232
OK, with the suggestions provided on the mailing list and on https://bugs.openjdk.java.net/browse/JDK-8269232, I was able to write an Eclipse-based reproducer. As suggested, enabling collection more often than it was disabled (enableCollection() calls that do not match the disableCollection() calls) triggers the crash:

    IJavaClassType jdiClassType = (IJavaClassType) javaClass[0];
    IJavaObject instance = jdiClassType.newInstance("()V", null, javaThread);
    try {
        instance.disableCollection();
        if (!monitor.isCanceled()) {
            instance.sendMessage("isEmpty", "()Z", null, javaThread, false);
        }
        try {
            Thread.sleep(1_000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    } finally {
        instance.enableCollection();
        instance.enableCollection();
    }

Looking at the crashing code, I had previously tried not calling enableCollection() at all (in the code above), but that didn't reproduce the issue.

I'll see if I can reproduce this with jdb, since the steps in Eclipse might not be easy for everyone to use. Meanwhile, let me know if I should attach the Eclipse plug-in that is used to reproduce the crash and provide the steps for this.

I'll also try to trace the enable/disable collection calls in our product (possibly we have some faulty code either in it or in Eclipse). And I'll try the patch suggested by Roman:

diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/commonRef.c b/src/jdk.jdwp.agent/share/native/libjdwp/commonRef.c
index 054b736e46b..be52477e44c 100644
--- a/src/jdk.jdwp.agent/share/native/libjdwp/commonRef.c
+++ b/src/jdk.jdwp.agent/share/native/libjdwp/commonRef.c
@@ -205,7 +205,9 @@ weakenNode(JNIEnv *env, RefNode *node)
         }
         return weakRef;
     } else {
-        node->strongCount--;
+        if (node->strongCount > 0) {
+            node->strongCount--;
+        }
         return node->ref;
     }
 }
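To make the effect of the patch above easier to follow, here is a minimal Java model of the counter logic. It is a sketch under stated assumptions, not the libjdwp source: the class RefNodeModel and all of its members are hypothetical, the `strongCount == 1` branch is assumed (the diff only shows the else branch), and it is assumed that node deletion picks the strong or weak delete routine from the counter, which the deleteNode -> jni_DeleteGlobalRef frames and the jweak assertion seem to suggest.

```java
// Illustrative model only: RefNodeModel and its members are hypothetical
// names, not the libjdwp source. It assumes the agent picks the delete
// routine from the counter (non-zero means "strong global ref").
final class RefNodeModel {
    enum Kind { STRONG, WEAK }

    private final boolean guarded; // true = apply the strongCount > 0 check
    private int strongCount = 0;   // outstanding disableCollection requests
    private Kind actualKind = Kind.WEAK;

    RefNodeModel(boolean guarded) {
        this.guarded = guarded;
    }

    // Models disableCollection: the first request makes the reference strong.
    void strengthen() {
        if (strongCount == 0) {
            actualKind = Kind.STRONG;
        }
        strongCount++;
    }

    // Models enableCollection. The else branch mirrors the patched lines;
    // the strongCount == 1 branch is an assumption about the hidden context.
    void weaken() {
        if (strongCount == 1) {
            actualKind = Kind.WEAK;
            strongCount = 0;
        } else if (!guarded || strongCount > 0) {
            strongCount--;
        }
    }

    int strongCount() {
        return strongCount;
    }

    // Models node deletion: true when the routine chosen from the counter
    // matches the kind of reference the node actually holds.
    boolean deleteIsConsistent() {
        Kind assumedKind = (strongCount != 0) ? Kind.STRONG : Kind.WEAK;
        return assumedKind == actualKind;
    }

    // Replays a run of disable calls followed by a run of enable calls.
    static RefNodeModel replay(boolean guarded, int disables, int enables) {
        RefNodeModel node = new RefNodeModel(guarded);
        for (int i = 0; i < disables; i++) {
            node.strengthen();
        }
        for (int i = 0; i < enables; i++) {
            node.weaken();
        }
        return node;
    }

    public static void main(String[] args) {
        // One disable followed by two enables, as in the Eclipse reproducer.
        RefNodeModel unpatched = replay(false, 1, 2);
        RefNodeModel patched = replay(true, 1, 2);
        System.out.println("unpatched: count=" + unpatched.strongCount()
                + " consistent=" + unpatched.deleteIsConsistent());
        System.out.println("patched: count=" + patched.strongCount()
                + " consistent=" + patched.deleteIsConsistent());
    }
}
```

In this model, one disable followed by two enables leaves the unguarded counter at -1, so a node that holds a weak reference would be deleted as if it were strong. With the guard the counter stays at 0 and the two views agree.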
Adding a link to JDK-8269232 which is the upstream bug where this is being tracked.
(In reply to Simeon Andreev from comment #12)
> Meanwhile, let me know if I should attach the
> Eclipse plug-in that is used to reproduce the crash and provide the steps
> for this.

Yes, please. That would be great! Having something to reproduce the issue would be a lot better than having nothing ;-)
Created attachment 1793914 [details]
Eclipse plug-in that does non-matching enable/disable collection calls.

OK, I ran our UI performance tests with the suggested patch:

diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/commonRef.c b/src/jdk.jdwp.agent/share/native/libjdwp/commonRef.c
index 054b736e46b..be52477e44c 100644
--- a/src/jdk.jdwp.agent/share/native/libjdwp/commonRef.c
+++ b/src/jdk.jdwp.agent/share/native/libjdwp/commonRef.c
@@ -205,7 +205,9 @@ weakenNode(JNIEnv *env, RefNode *node)
         }
         return weakRef;
     } else {
-        node->strongCount--;
+        if (node->strongCount > 0) {
+            node->strongCount--;
+        }
         return node->ref;
     }
 }

With the patch, I no longer see the crash, either in our tests or with the Eclipse reproduction steps listed below.

I was unable to reproduce with jdb and a snippet, as I was unable to create objects in the debuggee JVM. Calling enablegc on local variables or fields did not result in a crash.

So here are the steps to reproduce with Eclipse (using an unpatched JDK):

1. Run Eclipse with the plug-in from the attached archive "TestJdiAccess.zip".
2. Run the following snippet with the debug agent enabled, so that JDT can attach: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8080

    public class Test {
        public static void main(String[] args) {
            while (true) {
                System.out.println("sleep");
                try {
                    Thread.sleep(1_000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }
    }

3. Attach to the running snippet, using a "Remote Java Application" launch, targeting the same port as above (8080).
4. Run the command for the handler from the attached plug-in, e.g. by clicking on the new toolbar button.
5. Disconnect the attach launch.
6. 
Observe crash:

#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007ffff6d5cae8, pid=16469, tid=16491
#
# JRE version: OpenJDK Runtime Environment (16.0.1+9) (build 16.0.1+9-24)
# Java VM: OpenJDK 64-Bit Server VM (16.0.1+9-24, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0xbc8ae8] OopStorage::Block::release_entries(unsigned long, OopStorage*)+0x38
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I %h" (or dumping to /data/tmp/runtimeeclipse_local_ws_java16/Test1/core.16469)
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
#

--------------- S U M M A R Y ------------

Command Line: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8080 -Dfile.encoding=UTF-8 -XX:+ShowCodeDetailsInExceptionMessages test.Test

Host: Intel(R) Xeon(R) W-2145 CPU @ 3.70GHz, 16 cores, 125G, Red Hat Enterprise Linux Workstation release 7.4 (Maipo)
Time: Thu Jun 24 08:56:21 2021 CEST elapsed time: 7.442819 seconds (0d 0h 0m 7s)

--------------- T H R E A D ---------------

Current thread (0x00007ffff0229fd0): JavaThread "JDWP Transport Listener: dt_socket" daemon [_thread_in_vm, id=16491, stack(0x00007fff824f9000,0x00007fff825fa000)]

Stack: [0x00007fff824f9000,0x00007fff825fa000], sp=0x00007fff825f8b80, free space=1022k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xbc8ae8] OopStorage::Block::release_entries(unsigned long, OopStorage*)+0x38
V [libjvm.so+0xbc8d4d] OopStorage::release(oopDesc* const*)+0x8d
V [libjvm.so+0x87af3d] jni_DeleteGlobalRef+0x8d
C [libjdwp.so+0xe8da] deleteNode+0x7a
C [libjdwp.so+0xeedf] commonRef_reset+0x6f
C [libjdwp.so+0x1058c] debugInit_reset+0x6c
C [libjdwp.so+0x267d2] acceptThread+0xa2
V [libjvm.so+0x9fb243] JvmtiAgentThread::call_start_function()+0x83
V [libjvm.so+0xd7c9db] JavaThread::thread_main_inner()+0x11b
V [libjvm.so+0xd816cd] Thread::call_run()+0xfd
V [libjvm.so+0xbd86d7] thread_native_entry(Thread*)+0xe7
(In reply to Simeon Andreev from comment #15)
>
> So here are the steps to reproduce with Eclipse:
>
> 1. Run Eclipse with the plug-in from the attached archive
> "TestJdiAccess.zip".
> 2. Run the snippet following snippet with debug agent, so that JDT can
> attach: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8080
>
> public class Test {
>     public static void main(String[] args) {
>         while (true) {
>             System.out.println("sleep");
>             try {
>                 Thread.sleep(1_000);
>             } catch (InterruptedException e) {
>                 e.printStackTrace();
>             }
>         }
>     }
> }

2.a: Set a breakpoint in the above snippet somewhere so that the debugger actually suspends.

> 3. Attach to the running snippet, using a "Remote Java Application" launch,
> targeting the same port as above (8080).
> 4. Run the command for the handler from the attached plug-in, e.g. by
> clicking on the new toolbar button.
> 5. Disconnect the attach launch.
> 6. Observe crash:
>
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x00007ffff6d5cae8, pid=16469, tid=16491
> #
> # JRE version: OpenJDK Runtime Environment (16.0.1+9) (build 16.0.1+9-24)
> # Java VM: OpenJDK 64-Bit Server VM (16.0.1+9-24, mixed mode, sharing,
> tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
> # Problematic frame:
> # V [libjvm.so+0xbc8ae8] OopStorage::Block::release_entries(unsigned
> long, OopStorage*)+0x38
> #
> # Core dump will be written. Default location: Core dumps may be
> processed with "/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I %h"
> (or dumping to /data/tmp/runtimeeclipse_local_ws_java16/Test1/core.16469)
> #
> # If you would like to submit a bug report, please visit:
> # https://bugreport.java.com/bugreport/crash.jsp
> #
>
> --------------- S U M M A R Y ------------
>
> Command Line:
> -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8080
> -Dfile.encoding=UTF-8 -XX:+ShowCodeDetailsInExceptionMessages test.Test
>
> Host: Intel(R) Xeon(R) W-2145 CPU @ 3.70GHz, 16 cores, 125G, Red Hat
> Enterprise Linux Workstation release 7.4 (Maipo)
> Time: Thu Jun 24 08:56:21 2021 CEST elapsed time: 7.442819 seconds (0d
> 0h 0m 7s)
>
> --------------- T H R E A D ---------------
>
> Current thread (0x00007ffff0229fd0): JavaThread "JDWP Transport
> Listener: dt_socket" daemon [_thread_in_vm, id=16491,
> stack(0x00007fff824f9000,0x00007fff825fa000)]
>
> Stack: [0x00007fff824f9000,0x00007fff825fa000], sp=0x00007fff825f8b80,
> free space=1022k
> Native frames: (J=compiled Java code, A=aot compiled Java code,
> j=interpreted, Vv=VM code, C=native code)
> V [libjvm.so+0xbc8ae8] OopStorage::Block::release_entries(unsigned
> long, OopStorage*)+0x38
> V [libjvm.so+0xbc8d4d] OopStorage::release(oopDesc* const*)+0x8d
> V [libjvm.so+0x87af3d] jni_DeleteGlobalRef+0x8d
> C [libjdwp.so+0xe8da] deleteNode+0x7a
> C [libjdwp.so+0xeedf] commonRef_reset+0x6f
> C [libjdwp.so+0x1058c] debugInit_reset+0x6c
> C [libjdwp.so+0x267d2] acceptThread+0xa2
> V [libjvm.so+0x9fb243] JvmtiAgentThread::call_start_function()+0x83
> V [libjvm.so+0xd7c9db] JavaThread::thread_main_inner()+0x11b
> V [libjvm.so+0xd816cd] Thread::call_run()+0xfd
> V [libjvm.so+0xbd86d7] thread_native_entry(Thread*)+0xe7

OK, thanks. Reproduced with the above 2.a addition.
> 2.a: Set a breakpoint in the above snippet somewhere so that the debugger > actually suspends. Yes, sorry, forgot to mention this.
(In reply to Simeon Andreev from comment #15)
> I was unable to reproduce with jdb and a snippet, as I was unable to create
> objects in the debuggee JVM. Calling enablegc on local variables or fields
> did not result in a crash.

Yes, neither was I. The reason for this is that Eclipse comes with its own implementation of JDI. The OpenJDK implementation guards against unbalanced calls to enable/disableCollection()[1], while Eclipse's implementation doesn't[2]. This means the JDWP implementation of OpenJDK needs to be extra defensive to account for this case.

[1] Note gcDisableCount guards in:
https://github.com/openjdk/jdk/blob/e515873f887ce4071ab4878a4bafca8eea67afea/src/jdk.jdi/share/classes/com/sun/tools/jdi/ObjectReferenceImpl.java#L435..L444
https://github.com/openjdk/jdk/blob/e515873f887ce4071ab4878a4bafca8eea67afea/src/jdk.jdi/share/classes/com/sun/tools/jdi/ObjectReferenceImpl.java#L447..L461

[2] https://github.com/eclipse/eclipse.jdt.debug/blob/9d267305a774f5db042c71412370c915e83208f4/org.eclipse.jdt.debug/jdi/org/eclipse/jdi/internal/ObjectReferenceImpl.java#L99..L108
https://github.com/eclipse/eclipse.jdt.debug/blob/9d267305a774f5db042c71412370c915e83208f4/org.eclipse.jdt.debug/jdi/org/eclipse/jdi/internal/ObjectReferenceImpl.java#L114..L123
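A debugger-side guard of the kind described above could look roughly like the following sketch. It is written in the spirit of a gcDisableCount-style counter but is not copied from either JDI implementation. CollectionControl is a hypothetical stand-in for the two JDI calls, and BalancedCollectionControl, RecordingControl and BalancedCollectionDemo are names invented for this illustration.

```java
// Hypothetical stand-in for the two JDI calls; not the real JDI interface.
interface CollectionControl {
    void disableCollection();
    void enableCollection();
}

// Forwards only balanced state changes to the wrapped object, so the
// debuggee never sees more enable requests than disable requests.
final class BalancedCollectionControl implements CollectionControl {
    private final CollectionControl target;
    private int disableCount = 0;

    BalancedCollectionControl(CollectionControl target) {
        this.target = target;
    }

    @Override
    public synchronized void disableCollection() {
        if (disableCount == 0) {
            target.disableCollection(); // first request goes over the wire
        }
        disableCount++;
    }

    @Override
    public synchronized void enableCollection() {
        if (disableCount == 0) {
            return; // unmatched enable: nothing to undo, so drop it
        }
        disableCount--;
        if (disableCount == 0) {
            target.enableCollection(); // last matching enable goes out
        }
    }
}

// Test double that records what would reach the debuggee.
final class RecordingControl implements CollectionControl {
    int disables = 0;
    int enables = 0;

    @Override
    public void disableCollection() {
        disables++;
    }

    @Override
    public void enableCollection() {
        enables++;
    }
}

final class BalancedCollectionDemo {
    public static void main(String[] args) {
        RecordingControl wire = new RecordingControl();
        CollectionControl guarded = new BalancedCollectionControl(wire);
        // Same unbalanced pattern as the Eclipse reproducer.
        guarded.disableCollection();
        guarded.enableCollection();
        guarded.enableCollection();
        System.out.println("forwarded disables=" + wire.disables
                + ", enables=" + wire.enables);
    }
}
```

Dropping the unmatched call silently is one possible policy; a real implementation might prefer to log or reject it instead.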
Adding the Eclipse bug for getting their JDI implementation fixed (count disable/enableCollection() calls).
I see the suggested patch was integrated: https://github.com/openjdk/jdk17/pull/165 When is the next early access OpenJDK 17 build planned? We would like to double check with the fix.
(In reply to Simeon Andreev from comment #20)
> I see the suggested patch was integrated:
> https://github.com/openjdk/jdk17/pull/165
>
> When is the next early access OpenJDK 17 build planned? We would like to
> double check with the fix.

Watch the upstream bug[1] until it says "Resolved In Build: XX", which should happen in the next week or two. Currently it says "master" for the build number, so the fix is not in any EA build yet. Once you know the build number, go to https://jdk.java.net/17/ and verify that the latest build there is >= the build number mentioned in the bug. Then download the EA build and test :)

[1] https://bugs.openjdk.java.net/browse/JDK-8269232
OK, thanks. And thanks for the quick resolution!
Just a heads-up: JDK 17+29, which includes the fix for JDK-8269232, is available: https://jdk.java.net/17/
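For anyone scripting the "is my build new enough" check, the comparison can be sketched with the standard Runtime.Version parser (available since Java 9). The class name FixedBuildCheck and its method are hypothetical; the feature release 17 and build 29 come from the comment above, and the sketch deliberately only compares builds within the same feature release.

```java
import java.util.List;

// Hypothetical helper: checks whether a JDK version string names a build
// of the given feature release at or after a required build number.
final class FixedBuildCheck {

    static boolean hasBuildAtLeast(String versionString, int feature, int requiredBuild) {
        Runtime.Version version = Runtime.Version.parse(versionString);
        List<Integer> numbers = version.version(); // e.g. [17] or [16, 0, 1]
        boolean sameFeature = numbers.get(0) == feature;
        // A version string without a "+N" build part counts as build 0 here.
        int build = version.build().orElse(0);
        return sameFeature && build >= requiredBuild;
    }

    public static void main(String[] args) {
        // The running JVM, checked against the build named in the comment above.
        String current = Runtime.version().toString();
        System.out.println(current + " -> " + hasBuildAtLeast(current, 17, 29));
        // The early access build from the original report.
        System.out.println("17-ea+25-2252 -> " + hasBuildAtLeast("17-ea+25-2252", 17, 29));
    }
}
```

Whether later feature releases also carry the fix is not something this check tries to answer.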
FEDORA-2021-09a7e126e5 has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2021-09a7e126e5
FEDORA-2021-09a7e126e5 has been pushed to the Fedora 35 stable repository. If the problem still persists, please make a note of it in this bug report.