Bug 1602008
Summary: | java-1.8.0-openjdk: jstack: mixed mode stack trace issue for i686 | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | zzambers | ||||
Component: | java-1.8.0-openjdk | Assignee: | Severin Gehwolf <sgehwolf> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | OpenJDK QA <java-qa> | ||||
Severity: | unspecified | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 7.5 | CC: | ahughes, dbhole, jvanek | ||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | java-1.8.0-openjdk-1.8.0.212.b02-1.el7 | Doc Type: | If docs needed, set a value | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2020-06-09 17:53:24 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
zzambers
2018-07-17 15:14:36 UTC
This is also reproducible in a plain i686 mock. Hence I've adjusted the synopsis of this bug. This looks to be an issue with glibc not being compiled with -fno-omit-frame-pointer and the serviceability code isn't being smart enough to figure it out without the frame pointer being compiled in. In this case glibc. The serviceability code currently pretty much relies on a valid ebp register value. From LinuxCDebugger.topFrameForThread(): String cpu = dbg.getCPU(); if (cpu.equals("x86")) { X86ThreadContext context = (X86ThreadContext) thread.getContext(); Address ebp = context.getRegisterAsAddress(X86ThreadContext.EBP); if (ebp == null) return null; Address pc = context.getRegisterAsAddress(X86ThreadContext.EIP); if (pc == null) return null; return new LinuxX86CFrame(dbg, ebp, pc); Then, when getting the next CFrame this trips over an implicit is-address-mapped-in-check for ebp (zero-offset) value 'f0' in LinuxX86CFrame.sender(): Address nextEBP = ebp.getAddressAt( 0 * ADDRESS_SIZE); Hence, the exception. Not sure what to do about this. # jps 935 Jps 825 Foo # jstack -m 825 [...] ----------------- 834 ----------------- 0xf771f430 ???????? sun.jvm.hotspot.debugger.UnmappedAddressException: f0 at sun.jvm.hotspot.debugger.PageCache.checkPage(PageCache.java:208) at sun.jvm.hotspot.debugger.PageCache.getData(PageCache.java:63) at sun.jvm.hotspot.debugger.DebuggerBase.readBytes(DebuggerBase.java:225) at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.readCInteger(LinuxDebuggerLocal.java:498) at sun.jvm.hotspot.debugger.DebuggerBase.readAddressValue(DebuggerBase.java:462) at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.readAddress(LinuxDebuggerLocal.java:433) at sun.jvm.hotspot.debugger.linux.LinuxAddress.getAddressAt(LinuxAddress.java:74) at sun.jvm.hotspot.debugger.linux.x86.LinuxX86CFrame.sender(LinuxX86CFrame.java:69) at sun.jvm.hotspot.tools.PStack.run(PStack.java:161) at sun.jvm.hotspot.tools.PStack.run(PStack.java:58) at sun.jvm.hotspot.tools.PStack.run(PStack.java:53) at sun.jvm.hotspot.tools.JStack.run(JStack.java:66) at sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:260) at sun.jvm.hotspot.tools.Tool.start(Tool.java:223) at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118) at sun.jvm.hotspot.tools.JStack.main(JStack.java:92) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.tools.jstack.JStack.runJStackTool(JStack.java:140) at sun.tools.jstack.JStack.main(JStack.java:106) Note that '834' above is the LWP id of the thread. Then, doing the same in gdb, attaching to 825 we see for LWP 834, Thread 7, this backtrace: # gdb -ex 'set confirm off' -ex 'handle SIGSEGV pass nostop noprint' -ex 'set breakpoint pending on' -ex 'thread apply all bt' -ex 'quit' -p 825 [...] Thread 7 (Thread 0xa3863b40 (LWP 834)): #0 0xf771f430 in __kernel_vsyscall () #1 0xf7703acc in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=1, futex=0xf770f000) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43 #2 do_futex_wait (sem=0xf770f000, sem@entry=0xf70ea854 <sig_sem>, abstime=0x0) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:226 #3 0xf7703bb7 in __new_sem_wait_slow (sem=0xf70ea854 <sig_sem>, abstime=0x0) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:407 #4 0xf6cc18d4 in check_pending_signals (wait=true) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.171-8.b10.el7_5.i386/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:2522 #5 0xf6cbc632 in signal_thread_entry (thread=0xa37a4800, __the_thread__=0xa37a4800) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.171-8.b10.el7_5.i386/openjdk/hotspot/src/share/vm/runtime/os.cpp:250 #6 0xf6e0f5f7 in JavaThread::thread_main_inner (this=0xa37a4800) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.171-8.b10.el7_5.i386/openjdk/hotspot/src/share/vm/runtime/thread.cpp:1699 #7 0xf6cc336a in java_start (thread=0xa37a4800) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.171-8.b10.el7_5.i386/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:796 #8 0xf76fdb1c in start_thread (arg=0xa3863b40) at pthread_create.c:308 #9 0xf75fd45e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131 [...] So we are indeed in glibc code where jstack fails to provide a backtrace. Looking at the registers for that thread in that moment we see that ebp holds value 'f0' which doesn't look like a reasonable frame pointer address. Yet, the serviceability agent code assumes that. When it tries to verify if the address at ebp is mapped in, it fails and the exception is being thrown. # gdb -ex 'set confirm off' -ex 'handle SIGSEGV pass nostop noprint' -ex 'set breakpoint pending on' -ex 'thread apply 7 info registers' -ex 'quit' -p 825 Thread 7 (Thread 0xa3863b40 (LWP 834)): eax 0xfffffe00 -512 ecx 0x80 128 edx 0x1 1 ebx 0xf70ea854 -150034348 esp 0xa3862fa0 0xa3862fa0 ebp 0xf0 0xf0 esi 0x0 0 edi 0xf770f000 -143593472 eip 0xf771f430 0xf771f430 <__kernel_vsyscall+16> eflags 0x282 [ SF IF ] cs 0x23 35 ss 0x2b 43 ds 0x2b 43 es 0x2b 43 fs 0x0 0 gs 0x63 99 Looking some more into the ebp register values via gdb we see that there are more of the dont-look-like-valid-frame-pointer-values kind: # for i in $(seq 1 15); do gdb -ex 'set confirm off' -ex 'handle SIGSEGV pass nostop noprint' -ex 'set breakpoint pending on' -ex "thread apply $i info registers" -ex 'quit' -p 825; done > gdb_register_info.825.log # grep ebp gdb_register_info.825.log ebp 0xf642ab40 0xf642ab40 ebp 0xa2ec91b0 0xa2ec91b0 ebp 0x3 0x3 ebp 0xa2f9b0f0 0xa2f9b0f0 ebp 0xa301c0f0 0xa301c0f0 ebp 0xa309d0f0 0xa309d0f0 ebp 0xf0 0xf0 ebp 0x3 0x3 ebp 0x3 0x3 ebp 0xa38e40a0 0xa38e40a0 ebp 0x3 0x3 ebp 0x3 0x3 ebp 0x3 0x3 ebp 0x3 0x3 ebp 0xf6429e40 0xf6429e40 However, only 'f0' is throwing the exception because 0x3 isn't 4-bytes aligned and the serviceablity code checks for this. From LinuxX86CFrame.java: // Check alignment of ebp if ( dbg.getAddressValue(ebp) % ADDRESS_SIZE != 0) { return null; } What's curious is that there should be a check for "valid" frame pointers, and there is from LinuxX86CFrame.sender(): X86ThreadContext context = (X86ThreadContext) thread.getContext(); Address esp = context.getRegisterAsAddress(X86ThreadContext.ESP); if ( (ebp == null) || ebp.lessThan(esp) ) { return null; } So given that esp is supposed to be "0xa3862fa0", it's strange that this check would pass. Debugging this reveals: main[1] print thread.getContext().getRegisterAsAddress(sun.jvm.hotspot.debugger.x86.X86ThreadContext.ESP) thread.getContext().getRegisterAsAddress(sun.jvm.hotspot.debugger.x86.X86ThreadContext.ESP) = null main[1] print thread.getContext().getRegisterAsAddress(sun.jvm.hotspot.debugger.x86.X86ThreadContext.SP) thread.getContext().getRegisterAsAddress(sun.jvm.hotspot.debugger.x86.X86ThreadContext.SP) = "0xa3862fa0" So we are getting a null and falsely pass this check. Digging some deeper I've traced it down to the native ptrace() system call for LWP thread 834. At that point we seem to be getting the correct values for ebp and esp: (gdb) p user $1 = (struct user_regs_struct *) 0xa34feee8 (gdb) next 142 return true; (gdb) list 137 #ifdef PTRACE_GETREGS_REQ 138 if (ptrace_getregs(PTRACE_GETREGS_REQ, pid, user, NULL) < 0) { 139 print_debug("ptrace(PTRACE_GETREGS, ...) failed for lwp %d\n", pid); 140 return false; 141 } 142 return true; 143 #elif defined(PTRACE_GETREGSET) 144 struct iovec iov; 145 iov.iov_base = user; 146 iov.iov_len = sizeof(*user); (gdb) p user->ebp $2 = 240 (gdb) p user->esp $3 = -1551487072 (gdb) p/x user->esp $4 = 0xa3862fa0 (gdb) p pid $5 = 834 Question is where things get wrong... Aha! The native function to fill in native register values uses SP as the index into the data array holding register values. From agent/src/os/linux/LinuxDebuggerLocal.c around line 378: #ifdef i386 #define REG_INDEX(reg) sun_jvm_hotspot_debugger_x86_X86ThreadContext_##reg regs[REG_INDEX(GS)] = (uintptr_t) gregs.xgs; regs[REG_INDEX(FS)] = (uintptr_t) gregs.xfs; regs[REG_INDEX(ES)] = (uintptr_t) gregs.xes; regs[REG_INDEX(DS)] = (uintptr_t) gregs.xds; regs[REG_INDEX(EDI)] = (uintptr_t) gregs.edi; regs[REG_INDEX(ESI)] = (uintptr_t) gregs.esi; regs[REG_INDEX(FP)] = (uintptr_t) gregs.ebp; regs[REG_INDEX(SP)] = (uintptr_t) gregs.esp; regs[REG_INDEX(EBX)] = (uintptr_t) gregs.ebx; regs[REG_INDEX(EDX)] = (uintptr_t) gregs.edx; regs[REG_INDEX(ECX)] = (uintptr_t) gregs.ecx; regs[REG_INDEX(EAX)] = (uintptr_t) gregs.eax; regs[REG_INDEX(PC)] = (uintptr_t) gregs.eip; regs[REG_INDEX(CS)] = (uintptr_t) gregs.xcs; regs[REG_INDEX(SS)] = (uintptr_t) gregs.xss; #endif /* i386 */ Note: SP is aliased to UESP in serviceability agent Java code. But neither are being used in the frame pointer validity check in LinuxX86CFrame.sender. ESP is! (gdb) info frame Stack level 0, frame at 0xa34fef40: eip = 0xe721b1f0 in Java_sun_jvm_hotspot_debugger_linux_LinuxDebuggerLocal_getThreadIntegerRegisterSet0 (/usr/src/debug/java-1.8.0-openjdk-1.8.0.171-8.b10.el7_5.i386/openjdk/hotspot/agent/src/os/linux/LinuxDebuggerLocal.c:370); saved eip 0xe769339a called by frame at 0xa34fef80 source language c. Arglist at 0xa34fef38, args: env=0xa3cf8d2c, this_obj=0xa34fef8c, lwp_id=834 Locals at 0xa34fef38, Previous frame's sp is 0xa34fef40 Saved registers: ebx at 0xa34fef30, ebp at 0xa34fef38, esi at 0xa34fef34, eip at 0xa34fef3c (gdb) p/x gregs->esp $12 = 0xa3862fa0 (gdb) p/x gregs->ebp $13 = 0xf0 So the native bits fill in values correctly, but the Java code then uses the wrong value. Created attachment 1469966 [details]
Proposed patch, fixing the FP validity check on i686
I've filed upstream bug JDK-8208091 for this, since it's reproducible on latest JDK sources too. The upstream bug has been fixed: http://hg.openjdk.java.net/jdk/jdk/rev/2e98c7737d8f What's pending is a backport to 8. I've asked for JDK 11 push approval but it wasn't deemed critical enough for that. The fix has been pushed to JDK 8u: http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/dbcf38c78dae This was fixed over a year ago in 8u211-b01 and first shipped in RHEL 7 in the 8u212 updates. Closing. |