Bug 1899805

Summary: [5.8.9 -> 5.9 REGRESSION] Constant hard freezes with "BUG: Bad page state in process swapper/8", works fine with previous kernel
Product: [Fedora] Fedora
Reporter: ell1e <el>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED WORKSFORME
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Priority: unspecified
Version: 33
CC: acaringi, adscvr, airlied, arnik, bskeggs, hdegoede, itamar, jarodwilson, jdobes, jeremy, jglisse, jonathan, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, mjg59, msandova, ptalbert, steved
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Last Closed: 2021-09-16 19:50:32 UTC
Type: Bug
Attachments:
Description | Flags
/var/log/messages.txt (includes the "Bad page state" error at the top, then the follow-up reboot info) | none
journalctl -r (includes the "Bad page state" error at the bottom, then the follow-up reboot info above) | none
/proc/cpuinfo output on affected machine | none

Description ell1e 2020-11-20 04:57:14 UTC
Created attachment 1731137
/var/log/messages.txt (includes the "Bad page state" error at the top, then the follow-up reboot info)

Description of problem:
Under Fedora 33 I am constantly getting hard kernel freezes (screen output freezes, sound either cuts out or keeps repeating the last buffer in a stutter loop) with system log messages like this: BUG: Bad page state in process swapper/8  pfn:17ba8a

Under Fedora 32 this was fine, and I think initially on 33 as well. The freezes started showing up when packagekitd started running after I applied this workaround: https://bugzilla.redhat.com/show_bug.cgi?id=1461313#c92 packagekitd also segfaults immediately every time I boot now: [   49.028851] packagekitd[2045]: segfault at 8 ip 0000561599ab93ea sp 00007fffddea8650 error 4 in packagekitd[561599ab5000+28000]. When I launch packagekitd, the freeze happens immediately. When I disable packagekitd, the freeze comes later, but it still happens after a few hours.

packagekitd might be unrelated to the root cause and just be the trigger, though. I'm mentioning it in case it is relevant.
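For readers triaging the packagekitd crash separately, the segfault line quoted above can be decoded mechanically. The following is a minimal sketch, not a tool used in this report: `decode_segfault` and the regular expression are hypothetical helpers, and the interpretation of the error code assumes the standard x86 page-fault bits (bit 0 = page present, bit 1 = write access, bit 2 = user mode).

```python
import re

# Layout of the kernel's user-space segfault message as quoted in this report.
# All numeric fields are hexadecimal.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def decode_segfault(line):
    """Hypothetical helper: split a segfault log line into its fields."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        # Offset of the faulting instruction inside the mapped object,
        # which is what a disassembler or addr2line would need.
        "offset": ip - base,
        "in_mapping": 0 <= ip - base < int(m["size"], 16),
        # x86 page-fault error code bits (assumed standard meaning).
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }

line = ("[   49.028851] packagekitd[2045]: segfault at 8 ip 0000561599ab93ea "
        "sp 00007fffddea8650 error 4 in packagekitd[561599ab5000+28000]")
info = decode_segfault(line)
print(hex(info["offset"]), info)
```

For the line above this yields an instruction offset of 0x43ea inside packagekitd, and "error 4" decodes as a user-mode read of a page that is not mapped. A fault address of 8 is typical of reading a field through a null pointer, although that is only a guess without a backtrace.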

I ran filesystem checks on all partitions and memchecker for hours with no problems coming up. I already tried disabling zram, since that was enabled with 33, and setenforce 0; neither seemed to prevent the freeze.

Version-Release number of selected component (if applicable):
Linux falcon 5.9.8-200.fc33.x86_64 #1 SMP Tue Nov 10 21:58:19 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux


How reproducible:
100%, just wait for a few hours and a freeze is guaranteed

Steps to Reproduce:
1. Use computer for a few hours

Actual results: freezes; after reboot I find the "Bad page state" error in the log
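Since the only evidence left after a freeze is the "Bad page state" line in the log, it can help to pull those lines out of a saved log automatically. This is a minimal sketch assuming the message layout quoted in this report (process name, then `pfn:` followed by a hex page frame number); `find_bad_pages` is a hypothetical helper, and the sample text consists of the two log lines quoted in this ticket.

```python
import re
from collections import Counter

# Message layout as it appears in this report. The process name is matched
# lazily because kernel task names may contain spaces.
BAD_PAGE_RE = re.compile(
    r"BUG: Bad page state in process (?P<proc>.+?)\s+pfn:(?P<pfn>[0-9a-f]+)"
)

def find_bad_pages(log_text):
    """Hypothetical helper: return (process, pfn) for each matching line."""
    return [(m["proc"], int(m["pfn"], 16)) for m in BAD_PAGE_RE.finditer(log_text)]

sample = """\
BUG: Bad page state in process swapper/8  pfn:17ba8a
nov 20 16:14:32 alpha kernel: BUG: Bad page state in process swapper/8  pfn:3aa55a
"""

hits = find_bad_pages(sample)
by_process = Counter(proc for proc, _ in hits)

for proc, pfn in hits:
    print(f"{proc}: pfn {pfn:#x}")
print(by_process)
```

To run it against a real log, read the text from a saved copy of /var/log/messages or from journalctl output and pass it to `find_bad_pages`. Tallying by process shows whether the reports always name the same task or several different ones.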


Expected results: doesn't freeze


Additional info:

Comment 1 ell1e 2020-11-20 06:16:06 UTC
Created attachment 1731159
journalctl -r (includes the "Bad page state" error at the bottom, then the follow-up reboot info above)

Comment 2 msandova 2020-11-20 15:27:11 UTC
I am also affected by this bug. Without my doing anything in particular, the machine freezes, and in the logs I can see:

    nov 20 16:14:32 alpha kernel: BUG: Bad page state in process swapper/8  pfn:3aa55a

This started happening today after an update in Fedora Silverblue 33 (I don't know when I last updated, but it was probably not more than a week ago). I have a Ryzen 2700X CPU, and from talking with ell1e it seems they are also on a Ryzen CPU.

Comment 3 ell1e 2020-11-20 15:39:15 UTC
Created attachment 1731290
/proc/cpuinfo output on affected machine

Since this seems to be possibly AMD CPU-related, I'm hereby also attaching my /proc/cpuinfo output.

Comment 4 ell1e 2020-11-21 02:04:23 UTC
Ok, I have been using 5.8.18-300.fc33.x86_64 for about 12 hours now, including the packagekitd uses that previously were good at triggering the freeze, and so far nothing. Given that I didn't see this a few days ago, apparently neither did msandova, and Fedora upgraded to 5.9 roughly a few days ago, I feel pretty confident suggesting that this is a 5.9 regression.

I think at this point it might also be interesting to know how many AMD CPUs are really affected, and whether there is maybe any point in reverting the main repos back to an older kernel until the cause of this is found.

Comment 5 Ali 2020-11-22 16:31:30 UTC
This bug also happens for me. There is no specific trigger that I can identify, but the system crashes randomly after a varying number of minutes. It started after I upgraded to kernel 5.9.

Comment 6 ell1e 2020-12-03 03:32:59 UTC
Was there ever any consideration of a rollback until this is investigated? This seems like quite a disruptive regression, especially for an innocuous mid-cycle update.

Comment 7 ell1e 2021-01-23 14:02:16 UTC
I have now tested an upstream kernel, where it also happened, and filed an upstream ticket: https://bugzilla.kernel.org/show_bug.cgi?id=211317

Comment 8 Jan Dobes 2021-01-27 10:18:34 UTC
This also happens to me (CPU Ryzen 3200G) after upgrading to F33. It's not always the swapper process; I also saw a chrome process named in the error. I've now updated to 5.10.10-200.fc33.x86_64: no freeze yet, but I'm getting a kerneloops with the reason "BUG: Bad page state in process swapper/0" about every minute. There is definitely some change here, even if it doesn't seem fully fixed yet.

Comment 9 ell1e 2021-01-27 17:43:22 UTC
My exact CPU is AMD Ryzen 5 1600, and I tested 5.11.0-0.rc4.129.vanilla.1.fc33.x86_64 which I assume is newer? So maybe you seeing no freeze yet might have just been coincidence or something, who knows. (Or maybe it affects different Ryzen gens differently?) I do hope there will be some sort of kernel bugzilla response soon.

Comment 10 ell1e 2021-02-04 21:28:31 UTC
Will a package pin on 5.8 in dnf get removed by an upgrade to Fedora 34 (since I assume that one no longer ships 5.8 officially)? If yes then this is actually kind of a significant upgrade blocker... I'm actually not sure if even the installer would run without a freeze, possibly messing up the install, since it boots into its own initramfs thing :(

Comment 11 ell1e 2021-03-29 00:24:48 UTC
Just a reminder that this basically makes affected computers unusable. I might as well switch to FreeBSD or Debian Stable at this point once Fedora 33 support runs out. Is there ANY plan to have this looked into?

Comment 12 ell1e 2021-04-15 06:24:26 UTC
This appears to have been fixed in 5.11.11-200.fc33.x86_64 (or earlier); I can no longer reproduce it after running the machine for days. Unless somebody protests soon, I will close the ticket.
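To check where a given kernel falls relative to the reports in this ticket, the release strings can be compared numerically. This is a rough sketch under stated assumptions: the boundaries `FIRST_BAD = 5.9.0` and `FIRST_GOOD = 5.11.11` are inferred only from the comments here (5.8.18 was fine, 5.9.8 and 5.11.0-rc4 froze, 5.11.11 "or earlier" no longer reproduced), not from an identified upstream commit, and `kernel_version` and `classify` are hypothetical helpers.

```python
import re

def kernel_version(release):
    """Parse the leading 'X.Y.Z' of a `uname -r` string into a tuple of ints."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in m.groups())

# Assumed boundaries, taken from the reports in this ticket only.
FIRST_BAD = (5, 9, 0)
FIRST_GOOD = (5, 11, 11)

def classify(release):
    """Hypothetical helper: place a release relative to the reported range."""
    v = kernel_version(release)
    if v < FIRST_BAD:
        return "before regression"
    if v < FIRST_GOOD:
        return "reported affected"
    return "reported fixed"

for rel in (
    "5.8.18-300.fc33.x86_64",
    "5.9.8-200.fc33.x86_64",
    "5.10.10-200.fc33.x86_64",
    "5.11.0-0.rc4.129.vanilla.1.fc33.x86_64",
    "5.11.11-200.fc33.x86_64",
):
    print(rel, "->", classify(rel))
```

Passing `platform.release()` from the standard library to `classify` checks the running kernel. Because the real fix may have landed before 5.11.11, a "reported affected" result means only that the version falls in the range where freezes were observed here.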

Comment 13 Jan Dobes 2021-04-29 14:52:38 UTC
I have been running this kernel for several weeks and can no longer reproduce it either.