1789594 – kernel: Wrong FE0/FE1 MSR restore in signal handlers on ppc64le


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	8.1
Hardware:	ppc64le
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	rc
Target Release:	8.2
Assignee:	Steve Best
QA Contact:	Eirik Fuller
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1711971

Reported:	2020-01-09 21:16 UTC by bob.huemmer
Modified:	2023-03-24 17:05 UTC
CC List:	12 users
Fixed In Version:	kernel-4.18.0-171.el8
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-04-28 16:37:27 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:
Flags:	pm-rhel: mirror+

Attachments
Standalone 'C' program which exhibits problem described above (1.19 KB, text/plain) 2020-01-09 21:16 UTC, bob.huemmer

Links
System	ID	Priority	Status	Summary	Last Updated
IBM Linux Technology Center	183168	None	None	None	2020-01-10 16:53:16 UTC
Red Hat Issue Tracker	RHELPLAN-31994	None	None	None	2023-03-24 17:05:42 UTC
Red Hat Product Errata	RHSA-2020:1769	None	None	None	2020-04-28 16:37:47 UTC

Description bob.huemmer 2020-01-09 21:16:23 UTC

Created attachment 1651068
Standalone 'C' program which exhibits problem described above

Description of problem:
The attached 

Version-Release number of selected component (if applicable):


How reproducible: Reproduces consistently on ppc64le


Steps to Reproduce:
   1. Compile attached 'C' code as follows: gcc -g -o d.out -D_GNU_SOURCE d.c -lm
   2. Run the resultant executable: d.out

Actual results:
ecnt = 0
.
.
.
ecnt = 255
Segmentation fault (core dumped)

Expected results:
ecnt should be printed out 0-5000

Additional info:

This program fails on ppc64le when compiled and executed on RHEL 7.6alt, 8.0, 8.1 and 8.2beta.

This program works fine when compiled and executed on RHEL 7.6, 8.0 for Intel x86_64 and ARM.

Comment 1 Florian Weimer 2020-01-10 10:44:40 UTC

I assume you see this in the kernel logs, too:

[   41.980461] d.out[2164]: bad frame in setup_rt_frame: 00007fffc4ddfc80 nip 000000001000097c lr 00007fff95c704d8

I'm trying to figure out where exactly setup_rt_frame fails.
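
For anyone checking their own logs for this message, the line can be picked apart mechanically. The following is a minimal sketch, not kernel or distribution tooling: the struct and function names are hypothetical, and the field layout is assumed only from the single log line quoted above.

```c
#include <stdio.h>
#include <string.h>

/* Fields of a "bad frame in setup_rt_frame" kernel log line.
 * Hypothetical helper type, named here for illustration only. */
struct bad_frame {
    unsigned long long frame; /* user address of the signal frame */
    unsigned long long nip;   /* next instruction pointer */
    unsigned long long lr;    /* link register */
};

/* Returns 1 and fills *out if the line carries the message, else 0. */
static int parse_bad_frame(const char *line, struct bad_frame *out)
{
    static const char tag[] = "bad frame in setup_rt_frame: ";
    const char *p = strstr(line, tag);

    if (p == NULL)
        return 0;
    p += sizeof(tag) - 1; /* skip past the tag to the first hex field */
    return sscanf(p, "%llx nip %llx lr %llx",
                  &out->frame, &out->nip, &out->lr) == 3;
}
```

Feeding each line of the kernel log through parse_bad_frame() yields the frame address and the nip at which the kernel gave up delivering the signal; the nip can then be matched against the test binary's symbols.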

Comment 2 bob.huemmer 2020-01-10 11:35:00 UTC

Yes, I do see that message in the log file. Apologies for not mentioning that with the report.
Also, just for clarity, this is a standalone test case that demonstrates the problem and is not representative of production code.

Comment 3 IBM Bug Proxy 2020-01-10 16:50:35 UTC

------- Comment From pacman.com 2020-01-10 11:45 EDT-------
It appears to get into a situation where a SIGFPE is being _generated_ within the signal handler, resulting in infinite recursion, which will exhaust the stack.  Running on RHEL 8.0 / POWER9 here within gdb, the last "ecnt" I see is "126", then I start recursing in the handler.
--
...
Program received signal SIGFPE, Arithmetic exception.
0x0000000010000af4 in main ()
(gdb)
Continuing.
ecnt = 124

Program received signal SIGFPE, Arithmetic exception.
0x0000000010000af4 in main ()
(gdb)
Continuing.
ecnt = 125

Program received signal SIGFPE, Arithmetic exception.
0x0000000010000af4 in main ()
(gdb)
Continuing.
ecnt = 126

Program received signal SIGFPE, Arithmetic exception.
0x0000000010000af4 in main ()
(gdb)
Continuing.

Program received signal SIGFPE, Arithmetic exception.
0x000000001000097c in handler ()
(gdb)
Continuing.

Program received signal SIGFPE, Arithmetic exception.
0x000000001000097c in handler ()
(gdb)
Continuing.

Program received signal SIGFPE, Arithmetic exception.
0x000000001000097c in handler ()
(gdb) bt
#0  0x000000001000097c in handler ()
#1  <signal handler called>
#2  0x000000001000097c in handler ()
#3  <signal handler called>
#4  0x000000001000097c in handler ()
#5  <signal handler called>
#6  0x0000000010000af4 in main ()
--

Comment 4 Florian Weimer 2020-01-10 16:59:00 UTC

I believe this is an artifact of how the setup_rt_frame error is reported.

The failing call is in arch/powerpc/kernel/signal_64.c, handle_rt_signal64, which I find a bit strange:

        err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));

Can you reproduce this with upstream kernels later than Linux 5.1?

I'm trying to bisect it to find the commit that caused the problem to disappear.

Comment 5 IBM Bug Proxy 2020-01-10 21:30:25 UTC

------- Comment From pacman.com 2020-01-10 16:20 EDT-------
Some muddy results...

I was NOT able to reproduce on 5.4.0-2-powerpc64.  (Note: BigEndian)
I was able to reproduce the problem on a system with 4.19.0-248916-g6a81548889f9 (ppc64le).
I was NOT able to reproduce on 4.19.0-6-powerpc64le.

(None of these are Red Hat Enterprise Linux systems, FYI.)

Emulators like QEMU and Mambo are also inconsistent.  I don't have easy access to a system on which I can boot distro kernels, but I can try a system in our Beaker instance (next week at the earliest).

Comment 6 Florian Weimer 2020-01-13 13:55:43 UTC

(In reply to IBM Bug Proxy from comment #5)
> ------- Comment From pacman.com 2020-01-10 16:20 EDT-------
> Some muddy results...
> 
> I was NOT able to reproduce on 5.4.0-2-powerpc64.  (Note: BigEndian)
> I was able to reproduce the problem on a system with
> 4.19.0-248916-g6a81548889f9 (ppc64le).
> I was NOT able to reproduce on 4.19.0-6-powerpc64le.
> 
> (None of these are Red Hat Enterprise Linux systems, FYI.)
> 
> Emulators like QEMU and Mambo are also inconsistent.  I don't have easy
> access to a system on which I can boot distro kernels, but I can try a
> system in our Beaker instance (next week at the earliest).

I can reproduce it under KVM, on a POWER9 host. I'm using a custom initrd with a statically-linked test. Bisecting is much faster this way. I should have results pretty soon.

Comment 7 Florian Weimer 2020-01-13 14:30:12 UTC

Bisecting points to this upstream commit:

commit fe1ef6bcdb4fca33434256a802a3ed6aacf0bd2f
Author: Mark Cave-Ayland <mark.cave-ayland.uk>
Date:   Fri Feb 8 14:33:19 2019 +0000

    powerpc: Fix 32-bit KVM-PR lockup and host crash with MacOS guest
    
    Commit 8792468da5e1 "powerpc: Add the ability to save FPU without
    giving it up" unexpectedly removed the MSR_FE0 and MSR_FE1 bits from
    the bitmask used to update the MSR of the previous thread in
    __giveup_fpu() causing a KVM-PR MacOS guest to lockup and panic the
    host kernel.
    
    Leaving FE0/1 enabled means unrelated processes might receive FPEs
    when they're not expecting them and crash. In particular if this
    happens to init the host will then panic.
    
    eg (transcribed):
      qemu-system-ppc[837]: unhandled signal 8 at 12cc9ce4 nip 12cc9ce4 lr 12cc9ca4 code 0
      systemd[1]: unhandled signal 8 at 202f02e0 nip 202f02e0 lr 001003d4 code 0
      Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
    
    Reinstate these bits to the MSR bitmask to enable MacOS guests to run
    under 32-bit KVM-PR once again without issue.
    
    Fixes: 8792468da5e1 ("powerpc: Add the ability to save FPU without giving it up")
    Cc: stable.org # v4.6+
    Signed-off-by: Mark Cave-Ayland <mark.cave-ayland.uk>
    Signed-off-by: Michael Ellerman <mpe.au>

I verified that applying this change on top of v5.0 fixes the reproducer.
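
The effect of the mask change described in the commit message can be modelled in a few lines of user-space C. This is only a sketch under assumptions: the bit positions are what I believe arch/powerpc/include/asm/reg.h defines (MSR_FP at bit 13, MSR_FE0 at bit 11, MSR_FE1 at bit 8), and the two function names are hypothetical stand-ins for the MSR update in __giveup_fpu(), not the kernel code itself.

```c
#include <stdint.h>

/* Assumed bit positions (believed to match the kernel's reg.h). */
#define MSR_FP  (UINT64_C(1) << 13) /* floating point available */
#define MSR_FE0 (UINT64_C(1) << 11) /* FP exception mode bit 0 */
#define MSR_FE1 (UINT64_C(1) << 8)  /* FP exception mode bit 1 */

/* Hypothetical model of the regressed mask: only FP is cleared,
 * so FE0/FE1 stay set in the saved MSR of the previous thread. */
static uint64_t giveup_fpu_msr_regressed(uint64_t msr)
{
    return msr & ~MSR_FP;
}

/* Hypothetical model of the mask after the fix: FP, FE0 and FE1
 * are all cleared. */
static uint64_t giveup_fpu_msr_fixed(uint64_t msr)
{
    return msr & ~(MSR_FP | MSR_FE0 | MSR_FE1);
}
```

With an input of MSR_FP | MSR_FE0 | MSR_FE1, the regressed variant returns a value that still has both FE bits set, while the fixed variant returns 0 for those three bits.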

The commit subject is a bit misleading, but the FE0/FE1 update clearly matters because feenableexcept calls prctl (PR_SET_FPEXC), which changes the FE0/FE1 bits in the MSR.
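
To see which floating-point exception mode a process currently runs in, the mode can be queried from user space. The sketch below assumes that the relevant prctl option on powerpc is PR_GET_FPEXC (the counterpart of PR_SET_FPEXC) and that the PR_FP_EXC_* constants come from <sys/prctl.h>; the two helper names are hypothetical. On other architectures the kernel is expected to reject the request, in which case the helper returns NULL.

```c
#include <stddef.h>
#include <sys/prctl.h>

/* Map a PR_FP_EXC_* mode value to a readable name. */
static const char *fpexc_mode_name(unsigned int mode)
{
    switch (mode) {
    case PR_FP_EXC_DISABLED: return "disabled";
    case PR_FP_EXC_NONRECOV: return "async non-recoverable";
    case PR_FP_EXC_ASYNC:    return "async recoverable";
    case PR_FP_EXC_PRECISE:  return "precise";
    default:                 return "unknown";
    }
}

/* Returns the name of the calling thread's current mode, or NULL if
 * the kernel rejects the request (assumed to happen off powerpc). */
static const char *current_fpexc_mode(void)
{
    unsigned int mode = 0;

    if (prctl(PR_GET_FPEXC, (unsigned long)&mode, 0UL, 0UL, 0UL) != 0)
        return NULL;
    return fpexc_mode_name(mode);
}
```

Calling current_fpexc_mode() before and after feenableexcept() in a test program would show whether the mode changed, which is the state the FE0/FE1 bits encode.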

Comment 9 Bruno Meneguele 2020-01-20 12:52:21 UTC

Patch(es) available on kernel-4.18.0-171.el8

Comment 12 Eirik Fuller 2020-01-20 23:27:43 UTC

A modified test program crashed with a segmentation fault under 4.18.0-170.el8 and succeeded under 4.18.0-171.el8; the modifications to the test program follow.

--- d.c 2020-01-20 17:08:20.607776489 -0500
+++ test.c      2020-01-20 17:08:23.394633053 -0500
@@ -1,6 +1,6 @@

-/* Compile: gcc -g -o d.out -D_GNU_SOURCE d.c -lm */
-/* Good: The value of ecnt is printed out 5000 times */   
+/* Compile: gcc -g -o test.out -D_GNU_SOURCE test.c -lm */
+/* Good: The value of ecnt is incremented 5000 times */
 /* Bad:  A crash occurs after the 255 iteration of the loop ecnt=256 */
 /* ONLY failes on ppc64le - RHEL 7.6alt, 8.0, 8.1, 8.2beta */

@@ -21,7 +21,7 @@
 {   
     feclearexcept(FE_ALL_EXCEPT);
     feenableexcept(FE_INVALID|FE_OVERFLOW|FE_DIVBYZERO);
-    printf("ecnt = %d\n",++ecnt);
+    ++ecnt;
     siglongjmp(jb,1);
 }

@@ -58,5 +58,5 @@
         }
     }

-    exit(0);
+    exit(ecnt != LIMIT);
 }

In the modified test the counter is incremented but not printed, and the exit status is a sanity check on its expected value. In actual testing that sanity check does not fail because the exit status comes from the segmentation fault.
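
The distinction between a failed sanity check and a crash can be made explicit by a small wrapper that runs the test in a child process and inspects the wait status. This is a sketch using only POSIX calls; run_in_child() and the three child_* functions are hypothetical stand-ins, not part of the attached test.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs fn in a child process. Returns the exit status (0..255) if the
 * child exited normally, -signo if a signal killed it, -1000 on error. */
static int run_in_child(void (*fn)(void))
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1000;
    if (pid == 0) {
        fn();
        _exit(0); /* reached only if fn returns */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1000;
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1000;
}

/* Hypothetical stand-ins for a passing, a failing, and a crashing test. */
static void child_pass(void)  { _exit(0); }
static void child_fail(void)  { _exit(1); }
static void child_crash(void) { signal(SIGSEGV, SIG_DFL); raise(SIGSEGV); }
```

A counter mismatch would show up as a return value of 1, whereas the segmentation fault seen under 4.18.0-170.el8 would show up as -SIGSEGV, so the two outcomes cannot be confused.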

As mentioned in comment 1 and other comments here, the 4.18.0-170.el8 testing included the following kernel message.

[   81.322242] test.out[19893]: bad frame in setup_rt_frame: 00007fffd443f740 nip 000000001000091c lr 00007fff84c504d8 

As expected, no such kernel message occurred in the 4.18.0-171.el8 testing.

Moving to VERIFIED based on the test results.

Comment 16 errata-xmlrpc 2020-04-28 16:37:27 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1769
