Bug 2183436

Summary: Crash during debug, after pop, redefine, step into and StackFrame.thisObject()
Product: Red Hat Enterprise Linux 9 Reporter: Simeon Andreev <simeon.andreev>
Component: java-17-openjdkAssignee: Andrew John Hughes <ahughes>
Status: CLOSED MIGRATED QA Contact: OpenJDK QA <java-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 9.0CC: adinn, asaji, loskutov, mmillson, neugens
Target Milestone: rcKeywords: MigratedToJIRA
Target Release: ---Flags: pm-rhel: mirror+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-09-12 23:18:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Crash log when reproducing with Eclipse.
none
Crash log when reproducing with IntelliJ.
none
Screen recording when reproducing with IntelliJ.
none
Screen recording when reproducing with Eclipse.
none
Screen recording when reproducing with IntelliJ.
none
Screen recordign when reproducing with Eclipse. none

Description Simeon Andreev 2023-03-31 06:23:12 UTC
Created attachment 1954832 [details]
Crash log when reproducing with Eclipse.

Description of problem:

We are running into a JDK crash, I can reproduce the same crash with Eclipse and IntelliJ. The crash occurs after code hotswap.

I wasn't able to reproduce the crash with javac and jdb: the crash is seen during a call to StackFrame.thisObject() and I didn't find a way to call this with jdb.

The crash log lists the following stack trace:

Stack: [0x00007f6fc1626000,0x00007f6fc1726000],  sp=0x00007f6fc1723c30,  free space=1015k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xa9e82b]  GenerateOopMap::error_work(char const*, __va_list_tag*)+0x125
V  [libjvm.so+0xa9e92c]  GenerateOopMap::report_error(char const*, ...)+0xa6
V  [libjvm.so+0xa9e9a7]  GenerateOopMap::verify_error(char const*, ...)+0x77
V  [libjvm.so+0xa9a8ed]  GenerateOopMap::init_basic_blocks()+0x243
V  [libjvm.so+0xa9a630]  GenerateOopMap::do_interpretation()+0xb8
V  [libjvm.so+0xa9e678]  GenerateOopMap::compute_map(Thread*)+0x4be
V  [libjvm.so+0x103d106]  OopMapForCacheEntry::compute_map(Thread*)+0xfe
V  [libjvm.so+0x103dd39]  OopMapCacheEntry::fill(methodHandle const&, int)+0xdd
V  [libjvm.so+0x103ebed]  OopMapCache::compute_one_oop_map(methodHandle const&, int, InterpreterOopMap*)+0x4d
V  [libjvm.so+0xf99f24]  Method::mask_for(int, InterpreterOopMap*)+0x78
V  [libjvm.so+0x13c296d]  interpretedVFrame::stack_data(bool) const+0x77
V  [libjvm.so+0x13c28d3]  interpretedVFrame::locals() const+0x1d
V  [libjvm.so+0xdd9d2d]  VM_GetOrSetLocal::check_slot_type_no_lvt(javaVFrame*)+0xbf
V  [libjvm.so+0xdd9fa2]  VM_GetOrSetLocal::doit()+0x118
V  [libjvm.so+0x13e4031]  VM_Operation::evaluate()+0xe1
V  [libjvm.so+0x14484c4]  VMThread::evaluate_operation(VM_Operation*)+0x62
V  [libjvm.so+0x1448cb9]  VMThread::inner_execute(VM_Operation*)+0x30f
V  [libjvm.so+0x1449065]  VMThread::loop()+0x117
V  [libjvm.so+0x1448062]  VMThread::run()+0xfc
V  [libjvm.so+0x133b207]  Thread::call_run()+0x195
V  [libjvm.so+0x104fc85]  thread_native_entry(Thread*)+0x199

From what I've debugged in Eclipse (I cannot say about IntelliJ), the operations that are done would look like this in jdb:

stop at EclipseHotswapCrash$GlyphLayout:25
run
# change "ab" to "abc" in the source file, recompile
pop
redefine EclipseHotswapCrash$GlyphLayout EclipseHotswapCrash$GlyphLayout.class
step into
# StackFrame.thisObject() is called to update stack frame labels

Version-Release number of selected component (if applicable):

OpenJDK 17.0.6

How reproducible:

IntelliJ steps to Reproduce:
   1. Debug the snippet with breakpoint at line 25.
   2. On the main thread, at the paused stack frame, right-click -> Reset Frame.
   3. Change the string at line 22 from "ab" to "abc", save.
   4. Build, accept reloading classes.
   5. Step into.



Eclipse steps to Reproduce:
   1. Debug the snippet with breakpoint at line 25.
   2. Change the string at line 22 from "ab" to "abc", save.


Actual results:

The JDK crashes.

Expected results:

No crash, debugging works as expected.

Additional info:


See my comment here (there is some more info on the ticket as well): https://github.com/eclipse-jdt/eclipse.jdt.debug/issues/227#issuecomment-1491007209

Originally the bug was reported here: https://bugs.eclipse.org/bugs/show_bug.cgi?id=565537

I've reported a bug via: https://bugreport.java.com/bugreport/

Comment 1 Simeon Andreev 2023-03-31 06:23:30 UTC
Created attachment 1954833 [details]
Crash log when reproducing with IntelliJ.

Comment 2 Simeon Andreev 2023-03-31 06:23:53 UTC
Created attachment 1954835 [details]
Screen recording when reproducing with IntelliJ.

Comment 3 Simeon Andreev 2023-03-31 06:24:15 UTC
Created attachment 1954836 [details]
Screen recording when reproducing with Eclipse.

Comment 4 Simeon Andreev 2023-03-31 06:28:48 UTC
For some reason vokoscreen is not recording context menus on my desktop, if you have trouble following the reproduction steps (due to not seeing exactly what is being clicked where) let me know and I'll record again so that context menus are also visible.

Comment 5 Simeon Andreev 2023-03-31 06:55:54 UTC
Created attachment 1954838 [details]
Screen recording when reproducing with IntelliJ.

Comment 6 Simeon Andreev 2023-03-31 06:57:00 UTC
Created attachment 1954839 [details]
Screen recordign when reproducing with Eclipse.

Comment 7 Andrew Dinn 2023-04-24 09:11:19 UTC
Hi Simeon,

I have managed to reproduce this in a debug build of OpenJDK17 and am currently narrowing down the source of the problem. I'll provide more info as I proceed however here is a brief summary of the problem just for the record.

The error happens when reanalyzing the bytecode for the modified class in order to generate a map that identifies the location of live objects during interpretation. The problem is a disparity detected between two stages of the analysis, specifically those implemented by GenerateOopMap::mark_bbheaders_and_count_gc_points and GenerateOopMap::init_basic_blocks, the latter called from GenerateOopMap::do_interpretation.

The control-flow analysis performed in the first stage identifies 84 basic blocks (this is *the* unique, minimal partition of the bytecode where 1) no block contains internal control flow nodes and 2) the partition segments the sequence at all control transfer target locations). An ensuing dataflow pass is performed by GenerateOopMap::do_interpretation. It calls GenerateOopMap::init_basic_blocks to create the required block records and it only detects 83 basic block headers. The assumption when these counts do not match (an assumption that appears legitimate at first glance) is that the the bytecode is malformed in some way e.g. it contains an orphaned block or an invalid control transfer target. n.b. this would be a disparity in the modified version of the bytecode received by the JVM after reload -- the original version will have passed through this same code.

I have yet to determine whether the analysis or the bytecode received is faulty. If it is the latter then I will also need to see whether the fault lies only in the bytecode received or is also present at the dispatching (debugger) end.

Comment 8 Andrew Dinn 2023-04-24 13:02:07 UTC
Hi Simeon,

It turns out that the control analysis is operating on garbage method bytecode for method GlyphLayout.setText (although it is consistently the same garbage bytecode). For example, the original bytecode and the updated bytecode written by the javac compiler both include an if branch at bytecode 6 that branches to bytecode 15

    Code:
       0: aload_2
       1: ldc           #18                 // String abc
       3: invokevirtual #20                 // Method java/lang/Object.equals:(Ljava/lang/Object;)Z
       6: ifeq          15
       9: getstatic     #24                 // Field java/lang/System.out:Ljava/io/PrintStream;
      12: invokevirtual #30                 // Method java/io/PrintStream.println:()V
      15: aload_0
      16: getfield      #12                 // Field runs:Ljava/util/ArrayList;
          . . .

The corresponding trace output after setting TraceNewOopMapGeneration (n.b. I have labelled each line with the relevant bytecode index and instruction) is as follows:

    Method name: setText
    Basicblock#0 begins at: 0   # reported at bci  0
    Basicblock#1 begins at: 15  # reported at bci 6 (ifne goto 15)
    Basicblock#2 begins at: 9   # also reported at bci 6 (drop through)
    . . .


When the control flow analysis is performed the trace output is

    Method name: setText
    Basicblock#0 begins at: 0    # reported at bci  0
    Basicblock#1 begins at: 209  # reported at bci 7 (ifne goto 209)
    Basicblock#2 begins at: 10   # also reported at bci 7 (drop through)
    . . .

I will dig deeper to see if I can find out where this invalid bytecode originates. It is possible that the new bytecode (presented to the JVM by the debugger) has been corrupted in transit or, possibly, mangled by the JVM itself after receipt (the disk file is valid). However, I think that is unlikely. My suspicion is that the analysis is employing a stale handle for the method, or maybe just the constant pool, associated with the current frame (i.e. the setText frame that the debugger reverted to when the drop frame was performed) and as a result picking up bytecode info from memory that has been freed and reused.

However, before investigating that I am going to see if the error also reproduces on JDK head. If so I will raise an upstream JIRA and then attempt to fix the problem in head before returning to the jdk17 problem.

Comment 9 Andrew Dinn 2023-04-27 08:21:42 UTC
Hi Simeon,

I discovered that the problem does reproduce on JDK head. However, I also worked out what is wrong and it appears that the problem is due to an error in the JVMTI client. I am not yet clear whether the issue is in the JDWP agent or in the debugger on the other end of the JDWP connection (I was driving the test using IntelliJ Idea but I assume it is the same issue when using Eclipse to drive the test).

The sequence of events is as follows

  startup:
  0.  Client sets breakpoint in GlyphLayout.setText
     . . .
  Run to break
     . . .
  1 Client drops frame from GlyphLayout.setText to EclipseHotswapCrash.main
  2 Client clears breakpoint in GlyphLayout.setText
  3 Client redefines 6 app classes including inner class GlyphLayout
  4 Client sets breakpoint in GlyphLayout.setText
  5 Client single steps into GlyphLayout.setText (stopping at bci 0)
  6 Client reads locals from GlyphLayout.setText

The problem happens at step 4 and manifests at step 6. It happens because the breakpoint location supplied at 4 is invalid. It is actually the same location as was set during startup at step 0. However, thanks to the redefine the location does not now identify a valid bytecode operation index. The client needs to recalculate the line to bci translation using the redefined bytecode.

Here is how I found out what is happening

The original bytecode from a hex dump of the class file starts at 00000ec8:

  00000ec0: ---- ---- ---- ---- 2c12 12b6 0014 9900  ........,.......
  00000ed0: 09b2 0018 b600 1e2a b400 0c3a 0ab2 0023  .......*...:...#
  00000ee0: 190a b600 2719 0ab6 002d 2bb4 0030 3a0b  ....'....-+..0:.
  . . .

That's
  0x2c                 0: aload_2
  0x12 0x12            1: ldc           #18                 // String abc
  0xb6 0x00 0x14       3: invokevirtual #20                 // Method java/lang/Object.equals:(Ljava/lang/Object;)Z
  0x99 0x00 0x09       6: ifeq          15
  0xb2 0x00 0x18       9: getstatic     #24                 // Field java/lang/System.out:Ljava/io/PrintStream;
  0xb6 0x00 0x1e       12: invokevirtual #30                 // Method java/io/PrintStream.println:()V
  0x2a                 15: aload_0
  0xb4 0x00 0x0c       16: getfield      #12                 // Field runs:Ljava/util/ArrayList;

At steps 0 and 2 the break point location is correctly passed as 9 i.e. the index of the get_static bytecode.

At step 3 when RedefineClasses is executed the old and new bytecode for this method is as follows:

  (gdb) x/s _new_methods->_data[4]->_name->_body
  0x80068b416:	"setText"
  (gdb) x/20xb _old_methods->_data[4]->_constMethod + 1
  0x7fffbc402038:	0x2c	0xe6	0x00	0xb6	0x03	0x00	0x99	0x00
  0x7fffbc402040:	0x09	0xb2	0x04	0x00	0xb6	0x05	0x00	0x2a
  0x7fffbc402048:	0xb4	0x02	0x00	0x3a
  (gdb) x/20xb _new_methods->_data[4]->_constMethod + 1
  0x7fffbc407668:	0x2c	0xe7	0x01	0x00	0xb6	0x03	0x00	0x99
  0x7fffbc407670:	0x00	0x09	0xb2	0x04	0x00	0xb6	0x05	0x00
  0x7fffbc407678:	0x2a	0xb4	0x02	0x00

Note first of all that the old bytecode differs slightly from the code on disk thanks to some rewriting done by the JVM. Most notably, the two bytecodes at 0x7fffbc402039 and 0x7fffbc40203a are 0xe6 and 0x00 because the ldc has been rewritten to a fast load instruction (fast_aldc) that bypasses the initial link resolution check. The single trailing operand is also changed because the relevant value is available in the interpreter's link resolution cache.

  (gdb) p (Bytecodes::Code)0x12
  $454 = Bytecodes::_ldc
  (gdb) p (Bytecodes::Code)0xe6
  $452 = Bytecodes::_fast_aldc

Also, note that the new bytecode includes a wide fast load bytecode 0xe7 (fast_aldc_w) which has two operands. This causes all subsequent operands to be shifted along by one byte. That means that the bytecode submitted to the JVM before the link optimization rewrite was an ldc_w, also with two operands:

  (gdb) p (Bytecodes::Code)0x13
  $455 = Bytecodes::_ldc_w
  (gdb) p (Bytecodes::Code)0xe7
  $453 = Bytecodes::_fast_aldc_w

At step 6, the point where the crash occurs, the bytecode is as follows:

  (gdb) x/20b 0x7fffbc407668
  0x7fffbc407668:	0x2c	0xe7	0x01	0x00	0xb6	0x03	0x00	0x99
  0x7fffbc407670:	0x00	0xca	0xb2	0x04	0x00	0xb6	0x05	0x00
  0x7fffbc407678:	0x2a	0xb4	0x02	0x00

Notice that the offset associated with the ifeq has been changed from 0x09 to 0xca i.e. the if target bci has been modified from 16 to 209. In the new bytecode that actually indexes an operand byte of a goto rather than an operation bytecode. This is why the control flow analysis and the linear block scan show a disparity which manifests as a verify error. The bytecode is indeed invalid.

So, where did that 0xca come from? It's actually a misplaced breakpoint.

  (gdb) p (Bytecodes::Code)0xca
  $457 = Bytecodes::_breakpoint

At step 4 the client requested a break to be reset at location 9 i.e. what it assumes is the bci for the get_static bytecode that is the first operation marked as belonging to line 24. However, it ought to be passing location 10 -- the transform has adjusted the bytecode and associated line number mapping. The set breakpoint method simply does what it is asked and places a breakpoint bytecode at bci 9 i.e. overwrites the operand of the ifeq.

So, the problem is that whatever is driving the JVMTI APIs is not updating the line map and adjusting the bci for line breakpoint locations after the retransform. Since the debugger  on the end of the JDWP connection is responsible for supplying the rewritten bytecode I am assuming that it is also the responsibility of the debugger, rather than the JDWP agent code, to ensure that the breakpoint locations are correctly re-computed. Before I look further into how the jdwp code works can you confirm whether that assumption valid -- or if not explain what the debugger (Eclipse/IntelliJ) does and does not know about line numbers and bytecode locations?

Comment 10 Simeon Andreev 2023-05-01 10:08:57 UTC
Hi Andrew,

sorry, I was out-of-office. Great to hear you are looking into the issue!

> Before I look further into how the jdwp code works can you confirm whether that assumption valid -- or if not explain what the debugger (Eclipse/IntelliJ) does and does not know about line numbers and bytecode locations?

I cannot speak for IntelliJ as I don't know the code base - I used it to check whether the issue is seen only in Eclipse. I wasn't able to reproduce the problem in jdb and so needed an alternative.

For Eclipse, after hotswapping code, the breakpoint at line 24 is reinstalled. I do see an index 9 in the request written to the JVM, when installing the breakpoint. I also see index 9 when initially installing the breakpoint, when starting to debug. So Eclipse behavior is similar to what you have observed in IntelliJ.

After a look at where the 9 is coming from, it looks like Eclipse is asking the JVM (sorry if its the debug agent and not the JVM, I'm not too familiar with the distinction) about the line table of the method. The code is found here:

https://github.com/eclipse-jdt/eclipse.jdt.debug/blob/master/org.eclipse.jdt.debug/jdi/org/eclipse/jdi/internal/MethodImpl.java#L164

For the method EclipseHotswapCrash$GlyphLayout.setText(), I see line 24 is still mapped to index 9 after this call. The line table information is asked after Eclipse sends a request to redefine the changed classes, see:

https://github.com/eclipse-jdt/eclipse.jdt.debug/blob/master/org.eclipse.jdt.debug/model/org/eclipse/jdt/internal/debug/core/hcr/JavaHotCodeReplaceManager.java#L611 (redefine call)
https://github.com/eclipse-jdt/eclipse.jdt.debug/blob/master/org.eclipse.jdt.debug/model/org/eclipse/jdt/internal/debug/core/hcr/JavaHotCodeReplaceManager.java#L666 (reinstalling breakpoints, including reading line table information from methods) 

Stack trace when reading the line table, if you are interested in browsing the Eclipse code:

"Worker-41: Debug Event Dispatch" #396 prio=5 os_prio=0 cpu=71.50ms elapsed=834.09s tid=0x00007f7c1812ad10 nid=0xe8de at breakpoint [0x00007f7d1c5fd000]
  java.lang.Thread.State: RUNNABLE
       at org.eclipse.jdi.internal.MethodImpl.getLineTable(MethodImpl.java:211)
       at org.eclipse.jdi.internal.MethodImpl.javaStratumLineToCodeIndexes(MethodImpl.java:137)
       at org.eclipse.jdi.internal.MethodImpl.javaStratumLocationsOfLines(MethodImpl.java:882)
       at org.eclipse.jdi.internal.ReferenceTypeImpl.locationsOfLine(ReferenceTypeImpl.java:2033)
       at org.eclipse.jdi.internal.ReferenceTypeImpl.locationsOfLine(ReferenceTypeImpl.java:1662)
       at org.eclipse.jdt.internal.debug.core.breakpoints.JavaLineBreakpoint.determineLocations(JavaLineBreakpoint.java:387)
       at org.eclipse.jdt.internal.debug.core.breakpoints.JavaLineBreakpoint.newRequests(JavaLineBreakpoint.java:292)
       at org.eclipse.jdt.internal.debug.core.breakpoints.JavaBreakpoint.createRequest(JavaBreakpoint.java:517)
       at org.eclipse.jdt.internal.debug.core.breakpoints.JavaBreakpoint.createRequests(JavaBreakpoint.java:657)
       at org.eclipse.jdt.internal.debug.core.breakpoints.JavaBreakpoint.addToTarget(JavaBreakpoint.java:615)
       at org.eclipse.jdt.internal.debug.core.breakpoints.JavaLineBreakpoint.addToTarget(JavaLineBreakpoint.java:173)
       at org.eclipse.jdt.internal.debug.core.model.JDIDebugTarget.breakpointAdded(JDIDebugTarget.java:1636)
       at org.eclipse.jdt.internal.debug.core.model.JDIDebugTarget.reinstallBreakpointsIn(JDIDebugTarget.java:1039)
       at org.eclipse.jdt.internal.debug.core.hcr.JavaHotCodeReplaceManager.redefineTypesJDK(JavaHotCodeReplaceManager.java:666)
       at org.eclipse.jdt.internal.debug.core.hcr.JavaHotCodeReplaceManager.doHotCodeReplace(JavaHotCodeReplaceManager.java:471)
       at org.eclipse.jdt.internal.debug.core.hcr.JavaHotCodeReplaceManager.lambda$0(JavaHotCodeReplaceManager.java:317)
       at org.eclipse.jdt.internal.debug.core.hcr.JavaHotCodeReplaceManager$$Lambda$1491/0x0000000800cd8208.run(Unknown Source)
       at org.eclipse.debug.core.DebugPlugin$AsynchRunner.run(DebugPlugin.java:1187)
       at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:45)
       at org.eclipse.debug.core.DebugPlugin$AsynchRunner.async(DebugPlugin.java:1174)
       at org.eclipse.debug.core.DebugPlugin$EventDispatchJob.run(DebugPlugin.java:490)
       at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)

For the class file, I see:

  // Method descriptor #52 (Ltest/EclipseHotswapCrash$BitmapFont;Ljava/lang/CharSequence;IILtest/EclipseHotswapCrash$Color;FIZLjava/lang/String;)V
  // Stack: 7, Locals: 29
  public void setText(test.EclipseHotswapCrash.BitmapFont font, java.lang.CharSequence str, int start, int end, test.EclipseHotswapCrash.Color color, float targetWidth, int halign, boolean wrap, java.lang.String truncate);
      0  aload_2 [str]
      1  ldc <String "abc"> [53]
      3  invokevirtual java.lang.Object.equals(java.lang.Object) : boolean [55]
      6  ifeq 15
      9  getstatic java.lang.System.out : java.io.PrintStream [59]
     12  invokevirtual java.io.PrintStream.println() : void [65]
     15  aload_0 [this]
     16  getfield test.EclipseHotswapCrash$GlyphLayout.runs : java.util.ArrayList [47]
     19  astore 10 [runs]

This line seems to remain unchanged before and after changing "ab" to "abc":

      9  getstatic java.lang.System.out : java.io.PrintStream [59]

Unfortunately I'm out of my depth here. Is the Eclipse debugger supposed to read some other information here and not ask the JVM for the line table information, to retrieve the index of line 24 in the byte code? What is retrieved over JDI seems to match the disk state. I'm not sure what transformation you refer to with this paragraph:

> At step 4 the client requested a break to be reset at location 9 i.e. what it assumes is the bci for the get_static bytecode that is the first operation marked as belonging to line 24. However, it ought to be passing location 10 -- the transform has adjusted the bytecode and associated line number mapping. The set breakpoint method simply does what it is asked and places a breakpoint bytecode at bci 9 i.e. overwrites the operand of the ifeq.

Are there extra operations done by the JVM during the redefine? Eclipse triggers the redefine with the disk state of the changed classes (the disk contents are sent as a byte array, essentially).

Best regards,
Simeon

Comment 11 Andrew Dinn 2023-05-02 13:51:27 UTC
Hi Simeon,

(In reply to Simeon Andreev from comment #10)

> For Eclipse, after hotswapping code, the breakpoint at line 24 is
> reinstalled. I do see an index 9 in the request written to the JVM, when
> installing the breakpoint. I also see index 9 when initially installing the
> breakpoint, when starting to debug. So Eclipse behavior is similar to what
> you have observed in IntelliJ.

Right, thanks for confirming.

> After a look at where the 9 is coming from, it looks like Eclipse is asking
> the JVM (sorry if its the debug agent and not the JVM, I'm not too familiar
> with the distinction) about the line table of the method. The code is found
> here:
> 
> https://github.com/eclipse-jdt/eclipse.jdt.debug/blob/master/org.eclipse.jdt.
> debug/jdi/org/eclipse/jdi/internal/MethodImpl.java#L164

Great, that's very helpful to know.

> For the method EclipseHotswapCrash$GlyphLayout.setText(), I see line 24 is
> still mapped to index 9 after this call. The line table information is asked
> after Eclipse sends a request to redefine the changed classes, see:
> . . .

Ok, so a round trup to the JVM should be using the line info in the uploaded bytecode. So, I'll try to intercept the upload on the JVM side to see if 1) the ldc_w is present at upload (rather than added by the JVM) and if so 2) whether the line map has been updated correctly or not. If we have no ldc_w or we have ldc_w and updated line info then the fault will lie in the JVM. If we have an ldc_w but the odl line number then the problem is in Eclipse/IntelliJ (or in the bytecode transform library it is using).

> For the class file, I see:
> 
>   // Method descriptor #52
> (Ltest/EclipseHotswapCrash$BitmapFont;Ljava/lang/CharSequence;IILtest/
> EclipseHotswapCrash$Color;FIZLjava/lang/String;)V
>   // Stack: 7, Locals: 29
>   public void setText(test.EclipseHotswapCrash.BitmapFont font,
> java.lang.CharSequence str, int start, int end,
> test.EclipseHotswapCrash.Color color, float targetWidth, int halign, boolean
> wrap, java.lang.String truncate);
>       0  aload_2 [str]
>       1  ldc <String "abc"> [53]
>       3  invokevirtual java.lang.Object.equals(java.lang.Object) : boolean
> [55]
>       6  ifeq 15
>       9  getstatic java.lang.System.out : java.io.PrintStream [59]
>      12  invokevirtual java.io.PrintStream.println() : void [65]
>      15  aload_0 [this]
>      16  getfield test.EclipseHotswapCrash$GlyphLayout.runs :
> java.util.ArrayList [47]
>      19  astore 10 [runs]
> 
> This line seems to remain unchanged before and after changing "ab" to "abc":
> 
>       9  getstatic java.lang.System.out : java.io.PrintStream [59]

I'm assuming you mean the class file on disk after the rebuild? This suggests that the ldc_w is being inserted by the debug agent code on the JVM side. . .

> Unfortunately I'm out of my depth here. Is the Eclipse debugger supposed to
> read some other information here and not ask the JVM for the line table
> information, to retrieve the index of line 24 in the byte code? What is
> retrieved over JDI seems to match the disk state. I'm not sure what
> transformation you refer to with this paragraph:
> 
> > At step 4 the client requested a break to be reset at location 9 i.e. what it assumes is the bci for the get_static bytecode that is the first operation marked as belonging to line 24. However, it ought to be passing location 10 -- the transform has adjusted the bytecode and associated line number mapping. The set breakpoint method simply does what it is asked and places a breakpoint bytecode at bci 9 i.e. overwrites the operand of the ifeq.
> 
> Are there extra operations done by the JVM during the redefine? Eclipse
> triggers the redefine with the disk state of the changed classes (the disk
> contents are sent as a byte array, essentially).

The JVM interpreter does a small amount of rewriting of any loaded bytecode, whether it arrives via a normal class load or via a JDWP redefine. It only does this for a few special cases, switching real bytecodes for pseudo bytecodes that are not defined in the JVM spec. One example is the breakpoint bytecode I mentioned, 0x99. If you check the spec you will see that 0x99 is an unused opcode. When a debugger requests a breakpoint to be set at some line in a method the JDWP agent overwrites the first operation bytecode mapped to that line with a breakpoint bytecode. When the interpreter sees this it responds by trapping out and invoking the agent to handle the break.

The interpreter standardly replaces many ldc and ldc_w bytecodes with fast_aldc and fast_aldc_w at the first point where it executes them. Executing an ldc requires looking up a value in the constant pool of the class the method belongs to, using the operand (2 operands for ldc_w) to index the relevant constant. However, with object targets this lookup may involve several translation steps e.g. if the target is Foo.class then lookup the index of a class name string in the constant pool, locate the symbol table entry for that class relative to the current class's loader, if the class is not loaded or fully resolved then link load and resolve it, then go form the JVM internal class info to a java.lang.Class instance. Likewise, looking up a String may involve creating a java.lang.String object. So, after doing this the first time the interpreter caches the result in a cache local to the current class/method and overwrites the bytecode with a fast_aldc. It also updates the operand to be the local cache index.

There are a few other rewrites that can also happen but, as far as I am aware, the rewritten bytecode always expects to use exactly the same number of bytes as the original bytecode. So, the fast_aldc_w with two byte operands that is visible at the point of crash ought only have been added because the interpreter found an ldc_w with two operands. Clearly, that ldc_w is not present in the code on disk.

So, the question that I now need to address is how that ldc_w got into the bytecode the JVM is seeing. The follow-up question is why the line number info failed to be adjusted when the ldc_w was added -- or maybe it was changed but the JDWP agent/JVM is passing back stale line info? I'll let you know what I find.

Comment 12 Andrew Dinn 2023-05-02 14:04:04 UTC
Correction: The breakpoint bytecode is 0xca not 0x99.

Comment 13 Simeon Andreev 2023-05-02 15:09:36 UTC
> I'm assuming you mean the class file on disk after the rebuild? This suggests that the ldc_w is being inserted by the debug agent code on the JVM side. . .

Yes, those are the disk contents of the .class file. I did try a breakpoint in code Eclipse, before the .class file bytes are read, then replacing with the output of javac, and resuming Eclipse. The crash still occurred.

> If we have an ldc_w but the odl line number then the problem is in Eclipse/IntelliJ (or in the bytecode transform library it is using).

I'm not aware of Eclipse using any byte code transformation for/when hotswapping code, though I can be wrong. From what I debugged, changing the source (and saving) triggers a compile, a redefine of the affected classes (using the byte contents of the .class files written by the compile) and then breakpoints in affected sources are set again.

> The JVM interpreter does a small amount of rewriting of any loaded bytecode, whether it arrives via a normal class load or via a JDWP redefine. It only does this for a few special cases, switching real bytecodes for pseudo bytecodes that are not defined in the JVM spec. One example is the breakpoint bytecode I mentioned, 0x99. If you check the spec you will see that 0x99 is an unused opcode. When a debugger requests a breakpoint to be set at some line in a method the JDWP agent overwrites the first operation bytecode mapped to that line with a breakpoint bytecode. When the interpreter sees this it responds by trapping out and invoking the agent to handle the break.

This is very interesting, I didn't know how breakpoints are actually handled/implemented. Thank you for explaining!

Comment 14 Andrew Dinn 2023-05-02 16:26:55 UTC
Hi Simeon,

The bytecode loaded by the external debugger is correct and does not include a ldc_w for the load of String "abc". So, th eproblem is in the JVM.

I have identified the place where the error is being introduced by JVM code. It is in the implementation of class VM_RedefineClasses. This is an subclass of (C++) class VM_Operation which performs redefinition of a class at a safepoint. The problem occurs under set up method doit_prologue (the VM thread calls this before calling the action method doit). doit_prologue is responsible for preparing a new (C++) Klass to be switched into in the JVM Klass base in place of the current class.

The rewriting is happening because doit_prologue decides to play safe and merge the old and new constant pools from the original and new class rather than just use the newly presented constant pool (by calling VM_RedefineClasses::merge_constant_pools()). That means "ab" is kept at the original index and "abc" is added at the end of the pool (at index 264).

This also means that references to "abc" at index 0x12 in the new bytecode (e.g. the ldc at the start of setText) have to be rewritten to index 0x108 in the new bytecode (bu calling VM_RedefineClasses::rewrite_cp_refs_in_method()). Of course, that's now a 2 byte index so the ldc supplied in the new bytecode is rewritten as an ldc_w in the rewritten bytecode and subsequent instructions gets moved along one.

Of course, when the bytecode includes line number info then any line mappings at or beyond the rewritten ldc also need to be adjusted by 1 (or possibly by more than 1 should other ldc -> ldc_w rewrites occur). No other rewrites afect indices, just this one.

So, I'll raise an upstream bug and propose a fix for it. I'll also see if I can get it backported to jdk17 (possible) and maybe even jdk11 (probably unlikely).

Comment 15 Andrew Dinn 2023-05-03 08:19:35 UTC
Hi Simeon,

I raised the following upstream bug

  https://bugs.openjdk.org/browse/JDK-8307331

Comment 16 Simeon Andreev 2023-05-03 08:41:43 UTC
Sounds great, thank you for the exhaustive analysis!

Comment 17 Andrew Dinn 2023-05-04 09:41:50 UTC
Hi Simeon,

I have raised a pull request (PR) to fix this in OpenJDK head.

  https://github.com/openjdk/jdk/pull/13795

The problem is slightly more subtle than it appears and I have updated the JIRA accordingly.

VM_RedefineClass does include machinery to update the line number table (in a class called Relocator). This works when only one bytecode requires its index to be changed from 8 to 16 bytes, with a consequent offset of 1 applied to line numbers. The problem only occurs when multiple adjustments are needed which it turns out is the case in your reproducer for the bug.

In the latter case each adjustment is made in isolation to the original line number table, with the result that all but the last adjustment gets dropped (this also means the last adjustment may be applied at the wrong location). The fix is to apply the updates cumulatively. When my patch is applied your reproducer no longer crashes.

The patch in the PR should apply cleanly to jdk17 and fix the problem there too. I will schedule a backport once I have managed to get this PR reviewed and committed to head.

Comment 18 Simeon Andreev 2023-05-04 09:52:10 UTC
Hi Andrew,

sounds great, thank you very much!

Best regards,
Simeon

Comment 19 Andrew Dinn 2023-05-10 08:45:09 UTC
Hi Simeon,

The PR for jdk head has been integrated. I also opened a backport PR for jdk17u

  https://github.com/openjdk/jdk17u-dev/pull/1324

and this has been integrated.

I am not proposing to backport the change to jdk11u or jdk8u. The patch does not just affect the debugger. It can also affect apps that are using agents, potentially causing reported line numbers to change. In most cases this will only have a cosmetic effect on program output such as e.g. stack backtraces. However, I still am concerned that a change in reported line numbers, even if only in an exception backtrace,  may upset some apps notwithstanding the fact that hte current values may be wrong. Given the greater priority of maintaining stability for these older releases vs jdk17u I think the risk of backporting outweighs the potential gain in fixing a relatively rare debugger error, even if it is one which leads to a crash.

So, to summarize you will obtain a fix for this issue in the next jdk17u update release and also in jdk21 when it becomes available.

Comment 20 Simeon Andreev 2023-05-10 10:21:52 UTC
Sounds great! We are aiming to drop support for Java 11 soon, and instead offer Java 17 and Java 21 support in our product. So no downport to Java 11 and Java 8 is fine for us.

Once again thank you for fixing this!

Comment 21 RHEL Program Management 2023-09-12 23:17:04 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 22 RHEL Program Management 2023-09-12 23:18:11 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.