Bug 1850710

Summary: "PC not saved" message, no disassembly on attempt to stepi through JIT-compiled code
Product: Red Hat Enterprise Linux 8
Reporter: Ben Crocker <bcrocker>
Component: gdb
gdb sub component: system-version
Assignee: Keith Seitz <keiths>
QA Contact: Michal Kolar <mkolar>
Status: CLOSED MIGRATED
Severity: high
Priority: unspecified
CC: aph, gdb-bugs, keiths, ohudlick, sergiodj
Version: 8.3
Keywords: MigratedToJIRA, Reopened, Triaged
Target Milestone: rc
Target Release: 8.0
Hardware: s390x
OS: Linux
Related bug: 1852580
Last Closed: 2023-12-13 17:07:00 UTC
Type: Bug
Attachments:
  mesa-meson.sh -- script to do what rhpkg does to build Mesa (flags: none)

Description Ben Crocker 2020-06-24 19:12:26 UTC
Description of problem:

When I attempt to single-step into JIT-compiled code on S390x,
my gdb process gets stopped and backgrounded as soon as I execute
the basr instruction that called the JIT-compiled function.
I can fg the process, at which point gdb reports "PC not saved."
If I stepi again, exactly the same thing happens.

Each time I fg the process, I can see the pc incrementing, and
instruction effects (e.g. changes to contents of general registers)
have clearly taken place.  The command "disp/i $pc" works as
expected.  The "disassemble" command works fine, e.g.
disassemble $pc, $pc+10.

But stepi is broken and essentially unusable.

The gdb from GCC Toolset 10 (gcc-toolset-10-gdb) is marginally
better, in that it does not get stopped and drop me back to the
shell, but it still reports "PC not saved" and otherwise behaves
as described above.


Version-Release number of selected component (if applicable):

gdb 8.2-12.el8
gcc-toolset-10-gdb 9.2-2.el8
kernel 4.18.0-214.el8.s390x

How reproducible:

100%


Steps to Reproduce:
1. Install tigervnc, metacity, and all mesa packages including debuginfo
2. Do NOT install gnome-shell at this time; use metacity as the window manager:
3. cd /usr/bin ; ln -s metacity twm
4. Install Mesa demos from https://gitlab.freedesktop.org/mesa/demos and build
5. Start the VNC server on a display, e.g., :1
6. Start the window manager if it hasn't already started: twm
7. export DISPLAY=:1
8. Start an X application like xclock or xterm to verify operation of server
   and window manager
9. In <mesa demos>/src/trivial, run: gdb tri
10. Set a breakpoint in llvm_pipeline_generic.
11. Once you hit that breakpoint, look for a line that looks like this:
    clipped = fpme->current_variant->jit_func(&fpme->llvm->jit_context,...
    It should be around line 617.
12. Set a breakpoint on that line (or use "until 617") and continue until
    you reach it.
13. (gdb) set disassemble-next-line on
14. stepi until you're about to execute the basr instruction to call
    the JIT-compiled function;
15. stepi one more time (i.e., execute the basr); the bad behavior described above occurs.

Actual results:

As described above

Expected results:

Execution and disassembly continue seamlessly in the new code.

Additional info:

Comment 1 Ben Crocker 2020-06-25 14:33:51 UTC
I have now seen the "PC not saved" misbehavior when I attempt to
step OVER the JIT-compiled function.

Comment 2 Keith Seitz 2020-06-25 14:40:18 UTC
I am working on your reproducer, but I don't have high hopes. s390x boxes are in VERY short supply, and reservations are seldom "permitted" for more than a day. [Every time I try to keep an s390x box for more than a day, I get emails from others wanting access to it.]

Was there a question in here somewhere? Clearing NEEDINFO.

Comment 3 Keith Seitz 2020-06-25 16:07:22 UTC
I have reproduced the problem by borrowing the reporter's Beaker machine.

Comment 4 Ben Crocker 2020-06-26 00:55:41 UTC
Created attachment 1698869 [details]
mesa-meson.sh -- script to do what rhpkg does to build Mesa

Attached is mesa-meson.sh, which is a script I developed
to do what rhpkg local does to build Mesa on non-X86
architectures.

First, you will need a Mesa repo:

# git clone https://gitlab.freedesktop.org/mesa/mesa

Then you need to get to the correct version of Mesa:

# cd mesa
# git checkout mesa-20.1.1

Then you need to build and install Mesa:

# mesa-meson.sh debug install

(This actually does a meson step, a ninja step, and a ninja-install step.)

If you happen to make any modifications to Mesa code, you should
be able to rebuild and reinstall by doing this:

# mesa-meson.sh ninja install

If, for some reason, you want to start over with a clean slate:

# mesa-meson.sh clean debug install

Comment 5 Ben Crocker 2020-06-26 01:18:06 UTC
Running the 'tri' demo program in the Mesa demo tree that you cloned
and built earlier:

First, cd to .../demos/src/trivial, and make sure your DISPLAY
environment variable points to the display you want, e.g.

# export DISPLAY=:1

to display to your VNC server.

Run 'tri' as described above.

Comment 6 Keith Seitz 2020-06-29 16:32:49 UTC
Some notes I took while investigating this bug, for the archive.

To set up:

On s390x box:

dnf install -y mesa-* tigervnc-server metacity xterm
dnf builddep -y mesa-demos
pushd /usr/bin ; ln -s metacity twm ; popd
git clone https://gitlab.freedesktop.org/mesa/demos.git;
cd demos
./autogen.sh
make -j2 CFLAGS="-g -O0"
vncpasswd # enter "redhat"
echo ":1=root" >> /etc/tigervnc/vncserver.users
echo "session=gnome" >> /etc/tigervnc/vncserver-config-defaults
firewall-cmd --permanent --zone=public --add-port=5901/tcp
firewall-cmd --reload
/usr/bin/Xvnc :1 -desktop `hostname`:1 -fp catalogue:/etc/X11/fontpath.d -pn -rfbauth /root/.vnc/passwd -rfbport 5901 -rfbwait 30000 -noreset >& /dev/null &
twm &
xterm &

Locally:

vncviewer BEAKER_MACHINE:1
# enter above vncpasswd into dialog ("redhat")

# Follow gdb install/setup and build gdb

Simple reproducer:

./gdb --data-directory data-directory -nx -iex "set height 0" \
  ../../demos/src/trivial/tri \
  -ex "set breakpoint pending on" \
  -ex "b llvm_pipeline_generic" \
  -ex r \
  -ex "until 617" \
  -ex "disp/i \$pc" \
  -ex "stepi 20" \
  -ex "printf \"stepi here\n\""


After looking at this bug, it is clear that something is not quite right about the inline frame sniffer.

The sniffer (eventually) calls frame_unwind_pc, which throws an exception ("PC not saved"). That *surely* must be caught
somewhere, but it is not. Because it is unhandled, it goes through the generic error() handler, which longjmps, prints
the exception string, and prints a prompt, skipping the displays.

A quick hack around this is to catch OPTIMIZED_OUT_ERROR. I've used the following patch successfully, and while I
know it is not correct, it does cause gdb to behave better:

--- frame.c	2020-06-26 15:05:26.214596623 -0400
+++ frame.c.save	2020-06-26 15:05:16.914596623 -0400
@@ -2125,9 +2125,11 @@
     }
   catch (const gdb_exception_error &ex)
     {
-      if (ex.error == MEMORY_ERROR)
+      if (ex.error == MEMORY_ERROR || ex.error == OPTIMIZED_OUT_ERROR)
 	{
-	  this_frame->stop_reason = UNWIND_MEMORY_ERROR;
+	  this_frame->stop_reason = (ex.error == MEMORY_ERROR
+				     ? UNWIND_MEMORY_ERROR
+				     : UNWIND_NO_SAVED_PC);
 	  if (ex.message != NULL)
 	    {
 	      char *stop_string;

Adding devel score.

I will investigate the terminal problems for 8.4.

Comment 14 RHEL Program Management 2023-12-13 17:01:58 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 15 RHEL Program Management 2023-12-13 17:07:00 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.