Bug 1789594
| Summary: | kernel: Wrong FE0/FE1 MSR restore in signal handlers on ppc64le | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | bob.huemmer |
| Component: | kernel | Assignee: | Steve Best <sbest> |
| kernel sub component: | ppc64 | QA Contact: | Eirik Fuller <efuller> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | ashankar, bmarson, bob.huemmer, bugproxy, codonell, dj, fweimer, hannsj_uhl, mnewsome, pfrankli, rvr, sbest |
| Version: | 8.1 | | |
| Target Milestone: | rc | | |
| Target Release: | 8.2 | | |
| Hardware: | ppc64le | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-4.18.0-171.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-04-28 16:37:27 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1711971 | | |
| Attachments: | | | |
I assume you see this in the kernel logs, too:

[ 41.980461] d.out[2164]: bad frame in setup_rt_frame: 00007fffc4ddfc80 nip 000000001000097c lr 00007fff95c704d8

I'm trying to figure out where exactly setup_rt_frame fails.

Yes, I do see that message in the log file. Apologies for not mentioning that with the report. Also, just for clarity, this is a standalone test case that demonstrates the problem and is not representative of production code.

------- Comment From pacman.com 2020-01-10 11:45 EDT-------
It appears to get into a situation where a SIGFPE is being generated within the signal handler, resulting in infinite recursion, which will exhaust the stack.

Running on RHEL 8.0 / POWER9 here within gdb, the last "ecnt" I see is "126", then I start recursing in the handler.
--
...
Program received signal SIGFPE, Arithmetic exception.
0x0000000010000af4 in main ()
(gdb) Continuing.
ecnt = 124

Program received signal SIGFPE, Arithmetic exception.
0x0000000010000af4 in main ()
(gdb) Continuing.
ecnt = 125

Program received signal SIGFPE, Arithmetic exception.
0x0000000010000af4 in main ()
(gdb) Continuing.
ecnt = 126

Program received signal SIGFPE, Arithmetic exception.
0x0000000010000af4 in main ()
(gdb) Continuing.

Program received signal SIGFPE, Arithmetic exception.
0x000000001000097c in handler ()
(gdb) Continuing.

Program received signal SIGFPE, Arithmetic exception.
0x000000001000097c in handler ()
(gdb) Continuing.

Program received signal SIGFPE, Arithmetic exception.
0x000000001000097c in handler ()
(gdb) bt
#0 0x000000001000097c in handler ()
#1 <signal handler called>
#2 0x000000001000097c in handler ()
#3 <signal handler called>
#4 0x000000001000097c in handler ()
#5 <signal handler called>
#6 0x0000000010000af4 in main ()
--

I believe this is an artifact of how the setup_rt_frame error is reported.
The failing call is in arch/powerpc/kernel/signal_64.c, handle_rt_signal64, which I find a bit strange:

err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));

Can you reproduce this with upstream kernels later than Linux 5.1? I'm trying to bisect it to find the commit that caused the problem to disappear.

------- Comment From pacman.com 2020-01-10 16:20 EDT-------
Some muddy results...

I was NOT able to reproduce on 5.4.0-2-powerpc64. (Note: BigEndian)
I was able to reproduce the problem on a system with 4.19.0-248916-g6a81548889f9 (ppc64le).
I was NOT able to reproduce on 4.19.0-6-powerpc64le.

(None of these are Red Hat Enterprise Linux systems, FYI.)

Emulators like QEMU and Mambo are also inconsistent. I don't have easy access to a system on which I can boot distro kernels, but I can try a system in our Beaker instance (next week at the earliest).

(In reply to IBM Bug Proxy from comment #5)
> ------- Comment From pacman.com 2020-01-10 16:20 EDT-------
> Some muddy results...
>
> I was NOT able to reproduce on 5.4.0-2-powerpc64. (Note: BigEndian)
> I was able to reproduce the problem on a system with
> 4.19.0-248916-g6a81548889f9 (ppc64le).
> I was NOT able to reproduce on 4.19.0-6-powerpc64le.
>
> (None of these are Red Hat Enterprise Linux systems, FYI.)
>
> Emulators like QEMU and Mambo are also inconsistent. I don't have easy
> access to a system on which I can boot distro kernels, but I can try a
> system in our Beaker instance (next week at the earliest).

I can reproduce it under KVM, on a POWER9 host. I'm using a custom initrd with a statically-linked test. Bisecting is much faster this way. I should have results pretty soon.
Bisecting points to this upstream commit:

commit fe1ef6bcdb4fca33434256a802a3ed6aacf0bd2f
Author: Mark Cave-Ayland <mark.cave-ayland.uk>
Date: Fri Feb 8 14:33:19 2019 +0000

    powerpc: Fix 32-bit KVM-PR lockup and host crash with MacOS guest

    Commit 8792468da5e1 "powerpc: Add the ability to save FPU without
    giving it up" unexpectedly removed the MSR_FE0 and MSR_FE1 bits from
    the bitmask used to update the MSR of the previous thread in
    __giveup_fpu() causing a KVM-PR MacOS guest to lockup and panic the
    host kernel.

    Leaving FE0/1 enabled means unrelated processes might receive FPEs
    when they're not expecting them and crash. In particular if this
    happens to init the host will then panic.

    eg (transcribed):

    qemu-system-ppc[837]: unhandled signal 8 at 12cc9ce4 nip 12cc9ce4 lr 12cc9ca4 code 0
    systemd[1]: unhandled signal 8 at 202f02e0 nip 202f02e0 lr 001003d4 code 0
    Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

    Reinstate these bits to the MSR bitmask to enable MacOS guests to run
    under 32-bit KVM-PR once again without issue.

    Fixes: 8792468da5e1 ("powerpc: Add the ability to save FPU without giving it up")
    Cc: stable.org # v4.6+
    Signed-off-by: Mark Cave-Ayland <mark.cave-ayland.uk>
    Signed-off-by: Michael Ellerman <mpe.au>

I verified that applying this change on top of v5.0 fixes the reproducer. The commit subject is a bit misleading, but the FE0/FE1 update clearly matters because the prctl (PR_SET_FPEXC) call made by feenableexcept changes the MSR.

Patch(es) available on kernel-4.18.0-171.el8

A modified test program crashed with a segmentation fault under 4.18.0-170.el8 and succeeded under 4.18.0-171.el8; the modifications to the test program follow.
--- d.c 2020-01-20 17:08:20.607776489 -0500
+++ test.c 2020-01-20 17:08:23.394633053 -0500
@@ -1,6 +1,6 @@
-/* Compile: gcc -g -o d.out -D_GNU_SOURCE d.c -lm */
-/* Good: The value of ecnt is printed out 5000 times */
+/* Compile: gcc -g -o test.out -D_GNU_SOURCE test.c -lm */
+/* Good: The value of ecnt is incremented 5000 times */
 /* Bad: A crash occurs after the 255 iteration of the loop ecnt=256 */
 /* ONLY failes on ppc64le - RHEL 7.6alt, 8.0, 8.1, 8.2beta */


@@ -21,7 +21,7 @@
 {
 feclearexcept(FE_ALL_EXCEPT);
 feenableexcept(FE_INVALID|FE_OVERFLOW|FE_DIVBYZERO);
- printf("ecnt = %d\n",++ecnt);
+ ++ecnt;
 siglongjmp(jb,1);
 }

@@ -58,5 +58,5 @@
 }
 }

- exit(0);
+ exit(ecnt != LIMIT);
 }

In the modified test the counter is incremented but not printed, and the exit status is a sanity check on its expected value. In actual testing the sanity check itself never fails; on the affected kernel the nonzero exit status comes from the segmentation fault.

As mentioned in comment 1 and other comments here, the 4.18.0-170.el8 testing included the following kernel message.

[ 81.322242] test.out[19893]: bad frame in setup_rt_frame: 00007fffd443f740 nip 000000001000091c lr 00007fff84c504d8

As expected, no such kernel message occurred in the 4.18.0-171.el8 testing.

Moving to VERIFIED based on the test results.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1769
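For readers following the bisect result, the masking difference described in the upstream commit message can be modelled in a few lines of C. This is only a simplified sketch, not the kernel source: the function names are hypothetical, and the bit positions are assumed to match the MSR_FP, MSR_FE0 and MSR_FE1 definitions in arch/powerpc/include/asm/reg.h.

```c
#include <stdint.h>

/* MSR bit positions, assumed to match arch/powerpc/include/asm/reg.h. */
#define MSR_FP  (UINT64_C(1) << 13)  /* floating point available */
#define MSR_FE0 (UINT64_C(1) << 11)  /* FP exception mode bit 0 */
#define MSR_FE1 (UINT64_C(1) << 8)   /* FP exception mode bit 1 */

/*
 * Hypothetical helper: map a 2-bit FP exception mode (0..3) onto the
 * FE0/FE1 bits. Mode 3 (both bits set) corresponds to precise mode.
 */
uint64_t pack_fe01(unsigned int mode)
{
    return (((uint64_t)mode << 10) & MSR_FE0) |
           (((uint64_t)mode << 8) & MSR_FE1);
}

/* Model of the regressed masking: only MSR_FP is cleared. */
uint64_t giveup_fpu_msr_regressed(uint64_t msr)
{
    return msr & ~MSR_FP;
}

/* Model of the reinstated masking: FE0/FE1 are cleared along with MSR_FP. */
uint64_t giveup_fpu_msr_fixed(uint64_t msr)
{
    return msr & ~(MSR_FP | MSR_FE0 | MSR_FE1);
}
```

With an input of MSR_FP | pack_fe01(3), the regressed variant leaves FE0 and FE1 set after the FPU is given up, while the fixed variant clears them. That matches the "Leaving FE0/1 enabled" wording in the commit message.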
Created attachment 1651068 [details]
Standalone 'C' program which exhibits problem described above

Description of problem:
The attached

Version-Release number of selected component (if applicable):

How reproducible:
Reproduces consistently on ppc64le

Steps to Reproduce:
1. Compile attached 'C' code as follows: gcc -g -o d.out -D_GNU_SOURCE d.c -lm
2. Run the resultant executable: d.out

Actual results:
ecnt = 0
.
.
.
ecnt = 255
Segmentation fault (core dumped)

Expected results:
ecnt should be printed out 0-5000

Additional info:
This program fails on ppc64le when compiled and executed on RHEL 7.6alt, 8.0, 8.1 and 8.2beta.
This program works fine when compiled and executed on RHEL 7.6, 8.0 for Intel x86_64 and ARM.