Bug 1789594

Summary:

kernel: Wrong FE0/FE1 MSR restore in signal handlers on ppc64le

Product:

Red Hat Enterprise Linux 8

Reporter:

bob.huemmer

Component:

kernel

Assignee:

Steve Best <sbest>

kernel sub component:

ppc64

QA Contact:

Eirik Fuller <efuller>

Status:

CLOSED ERRATA

Severity:

high

Priority:

unspecified

CC:

ashankar, bmarson, bob.huemmer, bugproxy, codonell, dj, fweimer, hannsj_uhl, mnewsome, pfrankli, rvr, sbest

Version:

8.1

Flags:

pm-rhel: mirror+

Target Release:

8.2

Hardware:

ppc64le

OS:

Linux

Whiteboard:

Fixed In Version:

kernel-4.18.0-171.el8

Last Closed:

2020-04-28 16:37:27 UTC

Type:

Bug

Bug Blocks:

1711971

Attachments:

Description	Flags
Standalone 'C' program which exhibits problem described above	none

Description bob.huemmer 2020-01-09 21:16:23 UTC

Created attachment 1651068
Standalone 'C' program which exhibits problem described above

Description of problem:
The attached 

Version-Release number of selected component (if applicable):


How reproducible: Reproduces consistently on ppc64le


Steps to Reproduce:
   1. Compile attached 'C' code as follows: gcc -g -o d.out -D_GNU_SOURCE d.c -lm
   2. Run the resultant executable: d.out

Actual results:
ecnt = 0
.
.
.
ecnt = 255
Segmentation fault (core dumped)

Expected results:
ecnt should be printed out 0-5000

Additional info:

This program fails on ppc64le when compiled and executed on RHEL 7.6alt, 8.0, 8.1 and 8.2beta.

This program works fine when compiled and executed on RHEL 7.6, 8.0 for Intel x86_64 and ARM.

Comment 1 Florian Weimer 2020-01-10 10:44:40 UTC

I assume you see this in the kernel logs, too:

[   41.980461] d.out[2164]: bad frame in setup_rt_frame: 00007fffc4ddfc80 nip 000000001000097c lr 00007fff95c704d8

I'm trying to figure out where exactly setup_rt_frame fails.

Comment 2 bob.huemmer 2020-01-10 11:35:00 UTC

Yes, I do see that message in the log file. Apologies for not mentioning that with the report.
Also, just for clarity, this is a standalone test case that demonstrates the problem and is not representative of production code.

Comment 3 IBM Bug Proxy 2020-01-10 16:50:35 UTC

------- Comment From pacman.com 2020-01-10 11:45 EDT-------
It appears to get into a situation where a SIGFPE is being generated within the signal handler, resulting in infinite recursion, which will exhaust the stack.  Running on RHEL 8.0 / POWER9 here within gdb, the last "ecnt" I see is "126", then I start recursing in the handler.
--
...
Program received signal SIGFPE, Arithmetic exception.
0x0000000010000af4 in main ()
(gdb)
Continuing.
ecnt = 124

Program received signal SIGFPE, Arithmetic exception.
0x0000000010000af4 in main ()
(gdb)
Continuing.
ecnt = 125

Program received signal SIGFPE, Arithmetic exception.
0x0000000010000af4 in main ()
(gdb)
Continuing.
ecnt = 126

Program received signal SIGFPE, Arithmetic exception.
0x0000000010000af4 in main ()
(gdb)
Continuing.

Program received signal SIGFPE, Arithmetic exception.
0x000000001000097c in handler ()
(gdb)
Continuing.

Program received signal SIGFPE, Arithmetic exception.
0x000000001000097c in handler ()
(gdb)
Continuing.

Program received signal SIGFPE, Arithmetic exception.
0x000000001000097c in handler ()
(gdb) bt
#0  0x000000001000097c in handler ()
#1  <signal handler called>
#2  0x000000001000097c in handler ()
#3  <signal handler called>
#4  0x000000001000097c in handler ()
#5  <signal handler called>
#6  0x0000000010000af4 in main ()
--

Comment 4 Florian Weimer 2020-01-10 16:59:00 UTC

I believe this is an artifact of how the setup_rt_frame error is reported.

The failing call is in arch/powerpc/kernel/signal_64.c, handle_rt_signal64, which I find a bit strange:

        err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));

Can you reproduce this with upstream kernels later than Linux 5.1?

I'm trying to bisect it to find the commit that caused the problem to disappear.

Comment 5 IBM Bug Proxy 2020-01-10 21:30:25 UTC

------- Comment From pacman.com 2020-01-10 16:20 EDT-------
Some muddy results...

I was NOT able to reproduce on 5.4.0-2-powerpc64.  (Note: BigEndian)
I was able to reproduce the problem on a system with 4.19.0-248916-g6a81548889f9 (ppc64le).
I was NOT able to reproduce on 4.19.0-6-powerpc64le.

(None of these are Red Hat Enterprise Linux systems, FYI.)

Emulators like QEMU and Mambo are also inconsistent.  I don't have easy access to a system on which I can boot distro kernels, but I can try a system in our Beaker instance (next week at the earliest).

Comment 6 Florian Weimer 2020-01-13 13:55:43 UTC

(In reply to IBM Bug Proxy from comment #5)
> ------- Comment From pacman.com 2020-01-10 16:20 EDT-------
> Some muddy results...
> 
> I was NOT able to reproduce on 5.4.0-2-powerpc64.  (Note: BigEndian)
> I was able to reproduce the problem on a system with
> 4.19.0-248916-g6a81548889f9 (ppc64le).
> I was NOT able to reproduce on 4.19.0-6-powerpc64le.
> 
> (None of these are Red Hat Enterprise Linux systems, FYI.)
> 
> Emulators like QEMU and Mambo are also inconsistent.  I don't have easy
> access to a system on which I can boot distro kernels, but I can try a
> system in our Beaker instance (next week at the earliest).

I can reproduce it under KVM, on a POWER9 host. I'm using a custom initrd with a statically-linked test. Bisecting is much faster this way. I should have results pretty soon.

Comment 7 Florian Weimer 2020-01-13 14:30:12 UTC

Bisecting points to this upstream commit:

commit fe1ef6bcdb4fca33434256a802a3ed6aacf0bd2f
Author: Mark Cave-Ayland <mark.cave-ayland.uk>
Date:   Fri Feb 8 14:33:19 2019 +0000

    powerpc: Fix 32-bit KVM-PR lockup and host crash with MacOS guest
    
    Commit 8792468da5e1 "powerpc: Add the ability to save FPU without
    giving it up" unexpectedly removed the MSR_FE0 and MSR_FE1 bits from
    the bitmask used to update the MSR of the previous thread in
    __giveup_fpu() causing a KVM-PR MacOS guest to lockup and panic the
    host kernel.
    
    Leaving FE0/1 enabled means unrelated processes might receive FPEs
    when they're not expecting them and crash. In particular if this
    happens to init the host will then panic.
    
    eg (transcribed):
      qemu-system-ppc[837]: unhandled signal 8 at 12cc9ce4 nip 12cc9ce4 lr 12cc9ca4 code 0
      systemd[1]: unhandled signal 8 at 202f02e0 nip 202f02e0 lr 001003d4 code 0
      Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
    
    Reinstate these bits to the MSR bitmask to enable MacOS guests to run
    under 32-bit KVM-PR once again without issue.
    
    Fixes: 8792468da5e1 ("powerpc: Add the ability to save FPU without giving it up")
    Cc: stable.org # v4.6+
    Signed-off-by: Mark Cave-Ayland <mark.cave-ayland.uk>
    Signed-off-by: Michael Ellerman <mpe.au>

I verified that applying this change on top of v5.0 fixes the reproducer.
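
To make the mask change concrete, here is a small user-space sketch of the bit arithmetic the commit message describes. The bit positions (FP = bit 13, FE0 = bit 11, FE1 = bit 8) and the pre-fix mask are written from memory of arch/powerpc/include/asm/reg.h and the upstream diff, so treat them as assumptions. The function names are hypothetical and this is not the kernel code itself.

```c
#include <stdint.h>

/* Assumed MSR bit positions on powerpc (see lead-in). */
#define MSR_FP   (UINT64_C(1) << 13)  /* floating point available */
#define MSR_FE0  (UINT64_C(1) << 11)  /* FP exception mode 0 */
#define MSR_FE1  (UINT64_C(1) << 8)   /* FP exception mode 1 */

/* Hypothetical model of the mask before the fix: only FP is cleared,
 * so FE0/FE1 stay set in the saved MSR. */
static uint64_t giveup_fpu_msr_before_fix(uint64_t msr)
{
    return msr & ~MSR_FP;
}

/* Hypothetical model of the mask after the fix: FP, FE0 and FE1 are
 * all cleared together. */
static uint64_t giveup_fpu_msr_after_fix(uint64_t msr)
{
    return msr & ~(MSR_FP | MSR_FE0 | MSR_FE1);
}
```

With all three bits set on input, the first function leaves FE0 and FE1 behind while the second clears them, which is the difference the bisected commit restores.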

The commit subject is a bit misleading, but the FE0/FE1 update clearly matters here: feenableexcept calls prctl(PR_SET_FPEXC), which changes the FE0/FE1 bits in the MSR.
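
For anyone who wants to observe the per-process floating-point exception mode from user space, the following sketch reads it back with prctl(PR_GET_FPEXC). That request and the PR_FP_EXC_* constants come from <linux/prctl.h>; the helper names are hypothetical. On architectures that do not implement the request (x86_64, for example) the call is expected to fail, which the sketch reports as -1.

```c
#include <sys/prctl.h>

/* Hypothetical helper: return the PR_FP_EXC_* mode of the calling
 * process, or -1 if the kernel or architecture does not support it. */
static int get_fpexc_mode(void)
{
    unsigned int mode = 0;

    if (prctl(PR_GET_FPEXC, (unsigned long)&mode, 0UL, 0UL, 0UL) != 0)
        return -1;
    return (int)mode;
}

/* Hypothetical helper: human-readable name for a mode value. */
static const char *fpexc_mode_name(int mode)
{
    switch (mode) {
    case PR_FP_EXC_DISABLED: return "disabled";
    case PR_FP_EXC_NONRECOV: return "async non-recoverable";
    case PR_FP_EXC_ASYNC:    return "async recoverable";
    case PR_FP_EXC_PRECISE:  return "precise";
    default:                 return "unsupported or unknown";
    }
}
```

Printing fpexc_mode_name(get_fpexc_mode()) before and after a feenableexcept call is one way to see the mode change on ppc64le.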

Comment 9 Bruno Meneguele 2020-01-20 12:52:21 UTC

Patch(es) available on kernel-4.18.0-171.el8

Comment 12 Eirik Fuller 2020-01-20 23:27:43 UTC

A modified test program crashed with a segmentation fault under 4.18.0-170.el8 and succeeded under 4.18.0-171.el8; the modifications to the test program follow.

--- d.c 2020-01-20 17:08:20.607776489 -0500
+++ test.c      2020-01-20 17:08:23.394633053 -0500
@@ -1,6 +1,6 @@

-/* Compile: gcc -g -o d.out -D_GNU_SOURCE d.c -lm */
-/* Good: The value of ecnt is printed out 5000 times */   
+/* Compile: gcc -g -o test.out -D_GNU_SOURCE test.c -lm */
+/* Good: The value of ecnt is incremented 5000 times */
 /* Bad:  A crash occurs after the 255 iteration of the loop ecnt=256 */
 /* ONLY failes on ppc64le - RHEL 7.6alt, 8.0, 8.1, 8.2beta */

@@ -21,7 +21,7 @@
 {   
     feclearexcept(FE_ALL_EXCEPT);
     feenableexcept(FE_INVALID|FE_OVERFLOW|FE_DIVBYZERO);
-    printf("ecnt = %d\n",++ecnt);
+    ++ecnt;
     siglongjmp(jb,1);
 }

@@ -58,5 +58,5 @@
         }
     }

-    exit(0);
+    exit(ecnt != LIMIT);
 }

In the modified test the counter is incremented but not printed, and the exit status is a sanity check on its expected value. In actual testing that sanity check does not fail because the exit status comes from the segmentation fault.
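
The distinction between a normal exit status and death by signal can be sketched as follows. This is an illustration only, assuming a harness that forks the test and inspects the wait status; the helper names are hypothetical and this is not the harness used for the verification above.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run child_fn in a forked child and classify the
 * outcome. Returns the exit code (0..255) for a normal exit, -signo if
 * a signal killed the child, or -1000 on fork/wait failure. */
static int run_and_classify(void (*child_fn)(void))
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1000;
    if (pid == 0) {
        child_fn();
        _exit(0);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1000;
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);   /* e.g. the segmentation fault case */
    if (WIFEXITED(status))
        return WEXITSTATUS(status); /* e.g. exit(ecnt != LIMIT) */
    return -1000;
}

/* Stand-in for a test whose counter check fails: exit status 1. */
static void exits_with_mismatch(void)
{
    _exit(1);
}

/* Stand-in for a test that crashes before reaching exit(). */
static void dies_with_segv(void)
{
    signal(SIGSEGV, SIG_DFL);
    raise(SIGSEGV);
}
```

A failing sanity check shows up as exit code 1, whereas the crash on the unfixed kernel shows up as termination by SIGSEGV, so the two failure modes stay distinguishable.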

As mentioned in comment 1 and other comments here, the 4.18.0-170.el8 testing included the following kernel message.

[   81.322242] test.out[19893]: bad frame in setup_rt_frame: 00007fffd443f740 nip 000000001000091c lr 00007fff84c504d8 

As expected, no such kernel message occurred in the 4.18.0-171.el8 testing.

Moving to VERIFIED based on the test results.

Comment 16 errata-xmlrpc 2020-04-28 16:37:27 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1769