Bug 2162365
Summary: | JVM crash in PhaseIdealLoop::spinup with OpenJDK 17
---|---
Product: | Red Hat Enterprise Linux 9
Component: | java-17-openjdk
Version: | unspecified
Hardware: | x86_64
OS: | Linux
Status: | CLOSED CURRENTRELEASE
Severity: | high
Priority: | unspecified
Reporter: | Simeon Andreev <simeon.andreev>
Assignee: | Roland Westrelin <rwestrel>
QA Contact: | OpenJDK QA <java-qa>
CC: | asaji, loskutov, rwestrel
Target Milestone: | rc
Type: | Bug
Last Closed: | 2023-03-02 13:18:38 UTC
Created attachment 1939137 [details]
Replay log from the crash.
Stack trace:

    Stack: [0x00007fffc4427000,0x00007fffc4528000],  sp=0x00007fffc4522c40,  free space=1007k
    Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
    V  [libjvm.so+0xd8ed42]  PhaseIdealLoop::spinup(Node*, Node*, Node*, Node*, Node*, small_cache*) [clone .part.0]+0x52
    V  [libjvm.so+0xd8f3b2]  PhaseIdealLoop::handle_use(Node*, Node*, small_cache*, Node*, Node*, Node*, Node*, Node*)+0x72
    V  [libjvm.so+0xd9046e]  PhaseIdealLoop::do_split_if(Node*)+0xf8e
    V  [libjvm.so+0xad6b4b]  PhaseIdealLoop::split_if_with_blocks(VectorSet&, Node_Stack&)+0x17b
    V  [libjvm.so+0xacdc3a]  PhaseIdealLoop::build_and_optimize()+0x101a
    V  [libjvm.so+0x5d061f]  PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x16f
    V  [libjvm.so+0x5ce282]  Compile::Optimize()+0xb92
    V  [libjvm.so+0x5cfe75]  Compile::Compile(ciEnv*, ciMethod*, int, bool, bool, bool, bool, bool, DirectiveSet*)+0xe65
    V  [libjvm.so+0x50ffb9]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xe9
    V  [libjvm.so+0x5d8ea8]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0xf68
    V  [libjvm.so+0x5d9b38]  CompileBroker::compiler_thread_loop()+0x508
    V  [libjvm.so+0xe59204]  JavaThread::thread_main_inner()+0x184
    V  [libjvm.so+0xe5c98e]  Thread::call_run()+0xde
    V  [libjvm.so+0xc162c1]  thread_native_entry(Thread*)+0xe1

Hi Simeon,

Thanks for reporting this. Something is wrong with a key compile graph structure here (the dominator tree). It is not going to be possible to identify the cause for that without a reproducer, I am afraid. In case we do come up with one, I am noting down what I have been able to deduce from the info in the crash log. Don't worry if it does not make any sense to you. If it does, enjoy!

A disassembly of the code leading up to the faulting address indicates this error is happening in the dominator tree search. Here's the relevant source and code disassembly:

    Node *PhaseIdealLoop::spinup( Node *iff_dom, Node *new_false, Node *new_true, Node *use_blk, Node *def, small_cache *cache ) {
      if (use_blk->is_top())      // Handle dead uses
        return use_blk;
      Node *prior_n = (Node*)((intptr_t)0xdeadbeef);
      Node *n = use_blk;          // Get path input
      assert( use_blk != iff_dom, "" );
      // Here's the "spinup" the dominator tree loop.  Do a cache-check
      // along the way, in case we've come this way before.
      while( n != iff_dom ) {     // Found post-dominating point?
        prior_n = n;
        n = idom(n);              // Search higher   <=== error here
        . . .
    0x7ffff6e29cf0: push   %rbp
    0x7ffff6e29cf1: mov    $0xdeadbeef,%r10d      <== r10 == prior_n = 0xdeadbeef
    0x7ffff6e29cf7: mov    %rsp,%rbp
    0x7ffff6e29cfa: push   %r15
    0x7ffff6e29cfc: push   %r14
    0x7ffff6e29cfe: push   %r13
    0x7ffff6e29d00: mov    %r8,%r13               <== move use_blk to r13 == n
    0x7ffff6e29d03: push   %r12
    0x7ffff6e29d05: mov    %rsi,%r12              <== move iff_dom to r12
    0x7ffff6e29d08: push   %rbx
      : various other input arg saves
    0x7ffff6e29d09: mov    %rdi,%rbx
    0x7ffff6e29d0c: sub    $0x38,%rsp
    0x7ffff6e29d10: mov    %rdx,-0x40(%rbp)
    0x7ffff6e29d14: mov    0x10(%rbp),%r14
    0x7ffff6e29d18: mov    %rcx,-0x48(%rbp)
    0x7ffff6e29d1c: mov    %r9,-0x38(%rbp)
    0x7ffff6e29d20: mov    %r8,-0x50(%rbp)
    0x7ffff6e29d24: cmp    %r12,%r13              <== compare n != iff_dom
    0x7ffff6e29d27: je     0x7ffff6e29d94
    0x7ffff6e29d29: nopl   0x0(%rax)
      : inlined code from PhaseIdealLoop.idom()
    0x7ffff6e29d30: mov    0x9f8(%rbx),%rax       <== load _idom array from PhaseIdealLoop (offset 0x9f8 is right)
    0x7ffff6e29d37: mov    0x28(%r13),%edx        <== load node _idx
    0x7ffff6e29d3b: lea    (%rax,%rdx,8),%rdi     <== index node in _idom array
    0x7ffff6e29d3f: mov    (%rdi),%r15            <== load _in array field of node
    0x7ffff6e29d42: mov    0x8(%r15),%rax         <== index entry _in[0]   !!!!! CRASH !!!!!
    0x7ffff6e29d46: cmpq   $0x0,(%rax)
    0x7ffff6e29d4a: jne    0x7ffff6e29d72
    0x7ffff6e29d4c: mov    0x28(%rbx),%ecx
    0x7ffff6e29d4f: nop
    0x7ffff6e29d50: mov    0x28(%r15),%eax
    0x7ffff6e29d54: cmp    %ecx,%eax
    0x7ffff6e29d56: jae    0x7ffff6e2d740

The _idom entry associated with the index for use_blk is null. That indicates that the dominator tree has not been correctly derived: given any block node, there ought to be a dominating node for that block.

N.B. there is no code for the use_blk->is_top() check. However, when spinup is called from handle_use, use_blk is known to be non-null. The only other call is a recursive one internal to spinup, so it seems the top-level check has been pushed down to the point of recursion.

Please provide a reproducer for this bug if you can.

Andrew, thanks for the analysis. Unfortunately we can't reproduce. The crashes are reported during compilation tasks in our automated build, and only very seldom, so we can neither pinpoint whether they are related to compilation of a specific module nor narrow them down otherwise.

One point may be related: I don't see from the crash dump how long it took until the crash. Typically during compilation of our product we don't have a single JVM build daemon like gradle, but start *a lot* of short-lived JVM processes with ant compile tasks (I would guess over 300 per product build).

So just a wild guess: could it be that the JVM runs into the crash right before or during shutdown, so that the code here is running in an "unexpected" JVM state? And we observe the crash only during build time simply because the probability of getting a crash on JVM shutdown is much higher there, with so many short-lived processes?

Also, while we can't reproduce (in a reasonable amount of time), we can add any code you wish or enable diagnostics and run our compile with that. Maybe this can help narrow down the problem...

(In reply to Andrey Loskutov from comment #6)
> Andrew, thanks for the analysis. Unfortunately we can't reproduce. The
> crashes are reported during compilation tasks in our automated build, and
> only very seldom, so we can neither pinpoint whether they are related to
> compilation of a specific module nor narrow them down otherwise.

In that case I am not sure there is anything we can do here.

> One point may be related: I don't see from the crash dump how long it took
> until the crash.
> Typically during compilation of our product we don't have a single JVM
> build daemon like gradle, but start *a lot* of short-lived JVM processes
> with ant compile tasks (I would guess over 300 per product build).
>
> So just a wild guess: could it be that the JVM runs into the crash right
> before or during shutdown, so that the code here is running in an
> "unexpected" JVM state? And we observe the crash only during build time
> simply because the probability of getting a crash on JVM shutdown is much
> higher there, with so many short-lived processes?

I think that is unlikely. The compiler thread is a VM thread operating on VM data. When the JVM shuts down it should get stopped cleanly as part of JVM shutdown. There should be no danger of that overwriting or freeing/unmapping data on which the compiler operates. Yet that is what we are seeing.

What I think is more likely is that the dominator tree is being incorrectly computed, possibly because the underlying graph is not in the expected format. That may well be to do with the use of ecj to produce the bytecode. There is a great deal of room for bytecode compilers to transform the same Java source into different bytecode representations (e.g. ecj and OpenJDK javac 'model' loops as do {...} while (...) vs while (...) {...}). The compiler may be making an unwarranted assumption about the shape of the bytecode which then manifests in an unexpected graph shape. The lack of consistent reproducibility could easily be because it depends on decisions that are timing or execution profile dependent (e.g. what code to inline, argument type profile optimizations etc).

(In reply to Andrew Dinn from comment #8)
> What I think is more likely is that the dominator tree is being
> incorrectly computed, possibly because the underlying graph is not in the
> expected format. That may well be to do with the use of ecj to produce the
> bytecode.

Sure, we use ecj and that definitely produces different class files compared to javac :-)

> The lack
> of consistent reproducibility could easily be because it depends on
> decisions that are timing or execution profile dependent (e.g. what code to
> inline, argument type profile optimizations etc).

Any idea how we could "force" wrong decisions? We fully control our environment, so we can run the JVM with any settings you want. Extra bytecode validation or whatever is needed to better diagnose the issue.

> Any idea how we could "force" wrong decisions? We fully control our
> environment, so we can run the JVM with any settings you want.
> Extra bytecode validation or whatever is needed to better diagnose the issue.
I am not really sure what would help with diagnosis. The only obviously related flags that you could play with are:

1. SplitIfBlocks (product : true)
2. PrintDominators (develop : false)
3. VerifyLoopOptimizations (notproduct : false)
Setting the first option to false will bypass the problem by avoiding the calls to split_if that are blowing up. That might be useful for getting the compile to finish (albeit with lower quality compiled code) where it currently crashes. It would only help to clarify the problem if we still see errors when it is false; that would indicate that the problem is not just in the dominator computation.
Option 2 requires running with a debug build. It will produce a *lot* of output in the scenario you describe where the error manifests. It really needs to be used with a reliable reproducer and, preferably, enabled from the debugger when you are about to compile a method that is known to cause the problem, with only one active compile thread.
Option 3 also requires running with a debug build. It might possibly catch some problems but no guarantees. You can combine it with PrintOpto to get detailed info about the loop transforms, including the SplitIf transformation. However, that will lead to the same information overload outcome as option 2. So, again best used with a reproducer under debug.
You are probably getting all the bytecode verification you need (flag BytecodeVerificationRemote defaults to true).
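For reference, a rough sketch of how those switches might be passed to the build JVMs; the launch command shown is only illustrative, and (as noted above) the last two flags are only accepted by debug (fastdebug/slowdebug) builds:

```sh
# Product build: bypass the split-if transformation that is crashing.
java -XX:-SplitIfBlocks <your usual build/launch arguments>

# Debug build only: dominator/loop diagnostics, optionally combined with PrintOpto.
java -XX:+PrintDominators -XX:+VerifyLoopOptimizations -XX:+PrintOpto <your usual build/launch arguments>
```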
I've discussed this in my team; we don't have the resources to try to reproduce. We are undergoing a RHEL 9 update, with which we'll update to the latest available OpenJDK 17. This will likely take at least a few months. After the move, if we still see the issue when compiling our product (and it's not fixed by the OpenJDK 17 update), we'll look into reproducing the problem.

Due to problems in our IT infrastructure we are not sure how often the issue occurs. If that problem is fixed and we learn that the crash is "too frequent", we might change priority and look into reproducing the crash sooner.

(In reply to Andrew Dinn from comment #8)
> What I think is more likely is that the dominator tree is being
> incorrectly computed, possibly because the underlying graph is not in the
> expected format. That may well be to do with the use of ecj to produce the
> bytecode.

We've got the same crash with the JVM running spotbugs code *compiled by javac*.

    SIGSEGV (0xb) at pc=0x00007ffff6e29d42, pid=9275, tid=9302
    Problematic frame:
    V  [libjvm.so+0xd8ed42]  PhaseIdealLoop::spinup(Node*, Node*, Node*, Node*, Node*, small_cache*) [clone .part.0]+0x52

    Current thread (0x00007ffff01e5770):  JavaThread "C2 CompilerThread0" daemon [_thread_in_native, id=9302, stack(0x00007fffa47fc000,0x00007fffa48fd000)]

    Current CompileTask:
    C2:  17367 10126   !   4  edu.umd.cs.findbugs.detect.FindPuzzlers::sawOpcode (4108 bytes)

    Stack: [0x00007fffa47fc000,0x00007fffa48fd000],  sp=0x00007fffa48f7c40,  free space=1007k
    Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
    V  [libjvm.so+0xd8ed42]  PhaseIdealLoop::spinup(Node*, Node*, Node*, Node*, Node*, small_cache*) [clone .part.0]+0x52
    V  [libjvm.so+0xd8f3b2]  PhaseIdealLoop::handle_use(Node*, Node*, small_cache*, Node*, Node*, Node*, Node*, Node*)+0x72
    V  [libjvm.so+0xd9046e]  PhaseIdealLoop::do_split_if(Node*)+0xf8e
    V  [libjvm.so+0xad6b4b]  PhaseIdealLoop::split_if_with_blocks(VectorSet&, Node_Stack&)+0x17b
    V  [libjvm.so+0xacdc3a]  PhaseIdealLoop::build_and_optimize()+0x101a
    V  [libjvm.so+0x5d061f]  PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x16f
    V  [libjvm.so+0x5ce282]  Compile::Optimize()+0xb92
    V  [libjvm.so+0x5cfe75]  Compile::Compile(ciEnv*, ciMethod*, int, bool, bool, bool, bool, bool, DirectiveSet*)+0xe65
    V  [libjvm.so+0x50ffb9]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xe9
    V  [libjvm.so+0x5d8ea8]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0xf68
    V  [libjvm.so+0x5d9b38]  CompileBroker::compiler_thread_loop()+0x508
    V  [libjvm.so+0xe59204]  JavaThread::thread_main_inner()+0x184
    V  [libjvm.so+0xe5c98e]  Thread::call_run()+0xde
    V  [libjvm.so+0xc162c1]  thread_native_entry(Thread*)+0xe1

Beside the identical crash stack, the other common thing here is that the code being optimized is *huge*. Both org.eclipse.jdt.internal.compiler.lookup.BinaryTypeBinding::createMethod (which we saw in the crash reported before) and edu.umd.cs.findbugs.detect.FindPuzzlers::sawOpcode (which we see now) are pretty huge "spaghetti-style" methods. I wonder if that complex method code is contributing to the dominator tree being incorrectly computed.
See:

- https://github.com/spotbugs/spotbugs/blob/30884910e0b72e114a85d98a5b7b17b40d2d684a/spotbugs/src/main/java/edu/umd/cs/findbugs/detect/FindPuzzlers.java#L174
- https://github.com/eclipse-jdt/eclipse.jdt.core/blob/7f8a17fea31dbd5361e54f274d09866c5e7982a0/org.eclipse.jdt.core.compiler.batch/src/org/eclipse/jdt/internal/compiler/lookup/BinaryTypeBinding.java#L928

With the new crash reported in spotbugs code we are working on creating a reproducer. We've managed so far to reproduce it in 15 out of 3000 executions (which takes ~8 hours), which is a kind of "stable" reproducer (I will attach crash logs in a moment). The crash happens while we start spotbugs analysis via ant on the application code in one of our projects. There is no Eclipse/ecj involved, just a pure standalone ant task that is supposed to do a static code analysis via spotbugs and usually completes in ~20 seconds.

An interesting point we observed so far: the crash happens only on a "small" workstation with 64 GB RAM / 12 cores (and the original one was reported on a 32 GB / 4-core VM). It is not reproducible so far on 128+ GB / 16-core workstations.

Note that in the crash cases the JVM used a max heap below the magic 32 GB border, so it was using "compressed pointers". Not sure if that could be one of the important factors contributing to the crash, but so far we haven't received bug reports from "big" workstations.

We plan to continue improving the reproducer code / changing the test environment so we can provide something we can share, or gain insight into what exactly contributes to the crash. If we can instrument something that would help with analysis of this issue, please give us a pointer.

Created attachment 1945451 [details]
crash logs from spotbugs execution
(In reply to Andrey Loskutov from comment #12)
. . .
> We've got same crash with JVM running spotbugs code *compiled by javac*.

Thanks for pursuing this. Good to know that it is not ecj, and even better that you now have a (relatively) reliable reproducer. Are you able to reproduce the failure any more or less consistently using the replay file to rerun the compilation up to FindPuzzlers.sawOpcode?

> Beside the identical crash stack, the other common thing here is that the
> code that is being optimized is *huge*.
>
> Both
> org.eclipse.jdt.internal.compiler.lookup.BinaryTypeBinding::createMethod (we
> saw in the reported crash before) and
> edu.umd.cs.findbugs.detect.FindPuzzlers::sawOpcode (we see now) are pretty
> huge "spaghetti-style" methods.
>
> I wonder if that complex method code is contributing to the dominator tree
> being incorrectly computed.

Either method size or complexity could be the immediate or indirect cause. However, it could also be many other things. The best way to find out would be to reproduce the problem in a debugger.

> Interesting point we observed so far: the crash happens only on a "small"
> workstation with 64 GB RAM / 12 cores (and original one was reported on 32
> GB / 4 core VM). It is not reproducible so far on 128+ GB / 16 core
> workstations.
>
> Note, that in the crash cases JVM used max heap below magic 32 GB border, so
> it was using "compressed pointers". Not sure if that could be one of
> important factors contributing to the crash, but so far we haven't received
> bug reports from "big" workstations.

It may relate to the use of compressed oops. One thing you could maybe usefully try, to check that hypothesis, is to explicitly disable compressed oops while running with a heap below 32 GB and see if the problem still happens. Of course, the test is asymmetrical, given the relatively low failure rate you are currently seeing -- if you don't see a failure, that doesn't guarantee compressed oops is the culprit.

> We plan to continue improving reproducer code / changing test environment so
> we can provide something we can share or have insight what exactly
> contributes to the crash.

Ok, thanks for pursuing it.

> If we can instrument something that would help with analysis of this issue,
> please give us a pointer.

The best thing would be for us to be able to reproduce the bug reliably, preferably in a debug build of OpenJDK, but even the ability to do so in a product release would be a big help. It doesn't really matter whether we achieve that by running your job or by rerunning compiles using the replay file.

(In reply to Andrew Dinn from comment #14)
> Are you able to reproduce the failure any more or less consistently
> using the replay file to rerun the compilation up to FindPuzzlers.sawOpcode?

Please provide instructions on how to do that and I will try. I remember doing that via Stack Overflow a few years ago (see my own answer https://stackoverflow.com/questions/33759206/java-replay-log-diagnosing-out-of-memory-error), but the links I put there are gone.

> The best way to find out
> would be to reproduce the problem in a debugger.

:) I'm working towards a better reproducer...

> It may relate to the use of compressed oops. One thing you could maybe
> usefully try to check that hypothesis

Looks like it is not heap size dependent. I've run tests with smaller heaps with no crashes.
However, using some VM flags that were set by the JVM on the smaller workstation, I was able to reproduce the crash on the 16-core machine, with compressed oops disabled (I believe so, since the heap size was 32 GB). The flags I set on the 16-core machine were taken from a diff of "java -XX:+PrintFlagsFinal -version" output on the 12- vs. 16-core machines:

    -XX:CICompilerCount=4 -XX:NonNMethodCodeHeapSize=5839372 -XX:NonProfiledCodeHeapSize=122909434
    -XX:ProfiledCodeHeapSize=122909434 -XX:G1ConcRefinementThreads=10 -XX:ParallelGCThreads=10
    -XX:AllocatePrefetchInstr=0

> > If we can instrument something that would help with analysis of this issue,
> > please give us a pointer.
>
> The best thing would be for us to be able to reproduce the bug reliably,
> preferably in a debug build of OpenJDK

I will see if I can get a debug build somehow. Simeon is on vacation; he usually manages these JVM builds.

BTW, I've also got a core file - do you want/need it? That is about 200 MB packed, so I guess I can put it somewhere on the web if you need it.
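For reference, the replay mechanism asked about above is normally driven by HotSpot's diagnostic replay flags together with the replay_pid<pid>.log file written next to the hs_err log (DumpReplayDataOnError is on by default). Roughly, and treating this as a sketch rather than exact instructions:

```sh
# Re-run just the failing C2 compilation recorded in the replay file.
# The classpath must match the crashed run so the classes referenced in the
# replay file can be resolved; <entry point> stands for whatever main class
# or jar the original run used.
java -XX:+UnlockDiagnosticVMOptions \
     -XX:+ReplayCompiles \
     -XX:ReplayDataFile=replay_pid<pid>.log \
     -cp <same classpath as the crashed run> <entry point>
```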
> packed, so I guess I can put it somewhere on the web if you need.
Yes, please.
If that doesn't help, we can look at the replay thing next unless you find a reproducer in the meantime.
(In reply to Roland Westrelin from comment #18)
> > BTW, I've got also a core file - do you want/need it? That is about 200 MB
> > packed, so I guess I can put it somewhere on the web if you need.
>
> Yes, please.
> If that doesn't help, we can look at the replay thing next unless you find a
> reproducer in the meantime.

Here it is: https://drive.google.com/file/d/1NrWkj0aOztD8basGtFP-vUuWv-pft0EF/view?usp=sharing

Meanwhile I have ~5+ core files (and growing); if that one is not good enough, I can give you more :-)

> Here it is:
> https://drive.google.com/file/d/1NrWkj0aOztD8basGtFP-vUuWv-pft0EF/view?usp=sharing

Thanks.

> Meanwhile I have ~5+ core files (and growing); if that one is not good enough, I
> can give you more :-)

Could you share one of them from an Eclipse crash?

(In reply to Roland Westrelin from comment #20)
> > Meanwhile I have ~5+ core files (and growing); if that one is not good enough, I
> > can give you more :-)
>
> Could you share one of them from an Eclipse crash?

By "Eclipse" you probably mean ecj compiler task crashes? No, the crashes I can reproduce (and from which I have core files) are all from spotbugs task execution, not from the Eclipse compiler.

I'm close to the state where I can share a reproducer, because I've managed to crash it without using our internal code yesterday. Give me a day more to polish that.

The crash rate is still 1/1000 executions, but with the script & -XX:OnError command one can attach a debugger or do something else.
> crash it without using our internal code yesterday.
> Give me a day more to polish that.
>
> The crash rate is still 1/1000 executions, but with the script & -XX:OnError
> command one can attach debugger or do something else.
Excellent! Thanks for taking the time to put a reproducer together.
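For reference, a minimal sketch of the -XX:OnError hook mentioned above; the gdb attach is just one possible action, and %p is expanded by the JVM to the pid of the crashing process:

```sh
# Attach gdb automatically when the JVM hits a fatal error; any shell command
# (e.g. copying hs_err/core files aside) can be used here instead of gdb.
java -XX:OnError="gdb -p %p" <your usual arguments>
```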
Created attachment 1945931 [details]
reproducer application

I've attached bug_2162365_reproducer.zip, which contains all required binaries except the JVM itself, and a script that will run the loop. Extract it somewhere with enough space for future core files and see the README for more details on how to run the reproducer.

The script just runs the ant "findbugs" task in a loop, and the task just runs spotbugs over the spotbugs & bcel libraries using a single FindPuzzlers bug detector (see https://github.com/spotbugs/spotbugs/blob/30884910e0b72e114a85d98a5b7b17b40d2d684a/spotbugs/src/main/java/edu/umd/cs/findbugs/detect/FindPuzzlers.java#L174).

It crashes 1 to 3 times out of 1000 executions in our environment. I've got it to crash on a 4-core virtual machine running RHEL 9 / Java 17.0.2 from RHEL 9, and on 12- and 16-core bare-metal machines running RHEL 7.9 / Java 17.0.4.

> I've attached bug_2162365_reproducer.zip that contains all required binaries
> except JVM itself and a script that will run the loop.
Thanks. I will try it.
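The driver described above is essentially a retry loop around the ant target; a rough sketch of its shape (the real script and target names are the ones shipped in bug_2162365_reproducer.zip, so this is only illustrative):

```sh
# Run the "findbugs" ant target repeatedly until the forked spotbugs JVM crashes;
# a crash makes the task, and therefore ant, exit with a non-zero status.
for i in $(seq 1 1000); do
    ant findbugs || { echo "failure/crash on iteration $i"; break; }
done
```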
I could reproduce it and analyze it. Thanks again for the reproducer, that was very helpful.

I believe it's a known issue: https://bugs.openjdk.org/browse/JDK-8280696, which was backported to 17.0.5. With the patch for that bug fix, I can't reproduce the crash anymore. Can you confirm you haven't seen it with 17.0.5 or newer?

(In reply to Roland Westrelin from comment #25)
> I could reproduce it and analyze it. Thanks again for the reproducer, that
> was very helpful.
>
> I believe it's a known issue: https://bugs.openjdk.org/browse/JDK-8280696
> that was backported to 17.0.5. With the patch for that bug fix, I can't
> reproduce the crash anymore.

Since you've analyzed that, and bug JDK-8280696 provides zero information about crash preconditions, could you please elaborate on which preconditions need to be met to encounter it, and whether any workarounds are possible to avoid the crash? This would be helpful to decide whether we can "live" with the crash on 17.0.4 until we get a fix for https://bugzilla.redhat.com/show_bug.cgi?id=2138897, or whether we have to plan & evaluate an update to 17.0.6 to get the crash fix ASAP.

> Can you confirm you haven't seen it with 17.0.5
> or newer?

Tests are running on 3 workstations using the reproducer / Java 17.0.6+10; I will give an update after a crash-free day.

> Since you've analyzed that, and bug JDK-8280696 provides zero information
> about crash preconditions, could you please elaborate which preconditions
> need to be met to encounter it and if there are any workarounds possible to
> avoid the crash?

The only reliable way to avoid the crash is to exclude the method from JIT compilation:

    -XX:CompileCommand=exclude,edu.umd.cs.findbugs.detect.FindPuzzlers::sawOpcode

Unless that method is critical for performance, that could be good enough.

I can't answer the question about preconditions. The transformation that triggers the crash is used routinely; disabling it entirely would likely have a significant performance impact. Some code pattern triggers this. Figuring out what it is would take a while (in your crash it shows up after the compiler has extensively transformed the code, and tracing back what happens is complicated). It's also unlikely to help: the sawOpcode method would then need to be modified somehow so that the code pattern is removed from it. The JIT compiler trims the method it compiles, inlines some others, and duplicates parts of the code, so it's quite possible it wouldn't even be possible to locate where that code pattern is in sawOpcode.

> Tests are running on 3 workstations using the reproducer / Java 17.0.6+10, I
> will give an update after a crash free day.

Thanks.

I haven't seen crashes so far, after ~4000x5 executions of the reproducer on 3 different workstations using the Java 17.0.6+10 OpenJDK build from Adoptium, so it looks like this bug can be closed. Thank you for the analysis.

(In reply to Andrey Loskutov from comment #28)
> I haven't seen crashes so far, after ~4000x5 executions of reproducer on 3
> different workstations using Java 17.0.6+10 OpenJDK build from Adoptium, so
> it looks like this bug can be closed. Thank you for analysis.

Thanks for doing the runs.
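For completeness, one way the exclusion directive above might be applied to the ant-forked spotbugs JVMs is via the JAVA_TOOL_OPTIONS environment variable, which every HotSpot JVM started from that shell picks up; this is a sketch, not part of the original thread's instructions:

```sh
# Exclude the problematic method from C2 compilation in all JVMs started from this shell,
# including the ones forked by ant/spotbugs. Each JVM prints a notice that the variable
# was picked up.
export JAVA_TOOL_OPTIONS="-XX:CompileCommand=exclude,edu.umd.cs.findbugs.detect.FindPuzzlers::sawOpcode"
ant findbugs
```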
Created attachment 1939136 [details]
Crash log from OpenJDK.

Description of problem:
We have an OpenJDK crash while compiling our source code with ECJ (the Eclipse Java Compiler).

Version-Release number of selected component (if applicable):

    openjdk version "17.0.4" 2022-07-19
    OpenJDK Runtime Environment Temurin-17.0.4+8 (build 17.0.4+8)
    OpenJDK 64-Bit Server VM Temurin-17.0.4+8 (build 17.0.4+8, mixed mode, sharing)

How reproducible:
We don't have steps to reproduce; so far we have seen the crash twice during builds.

Additional info:
We are still on RHEL 7.9, using the Eclipse Temurin JDK 17 builds for Linux. I'm unable to open an OpenJDK 17 bug for RHEL 7 though, so I'm opening one for RHEL 9. Fixing the crash on RHEL 9 is enough for us, since the fix will be in OpenJDK 17, which we can roll out.