Bug 1899805 - [5.8.9 -> 5.9 REGRESSION] Constant hard freezes with "BUG: Bad page state in process swapper/8", works fine with previous kernel
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 33
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-11-20 04:57 UTC by ell1e
Modified: 2021-09-16 19:50 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-16 19:50:32 UTC
Type: Bug
Embargoed:


Attachments
/var/log/messages.txt (includes the "Bad page state" error at the top, then the follow-up reboot info) (535.49 KB, text/plain)
2020-11-20 04:57 UTC, ell1e
journalctl -r (includes the "Bad page state" error at the bottom, then the follow-up reboot info above) (488.20 KB, text/plain)
2020-11-20 06:16 UTC, ell1e
/proc/cpuinfo output on affected machine (16.19 KB, text/plain)
2020-11-20 15:39 UTC, ell1e

Description ell1e 2020-11-20 04:57:14 UTC
Created attachment 1731137 [details]
/var/log/messages.txt (includes the "Bad page state" error at the top, then the follow-up reboot info)

Description of problem:
Under Fedora 33 I am constantly getting hard kernel freezes (screen output freezes, sound either cuts out or repeats the last buffer in a stutter loop) with system log messages like this one: BUG: Bad page state in process swapper/8  pfn:17ba8a

Under Fedora 32 this was fine, and I think initially on 33 as well. It started showing up when packagekitd started after I applied this workaround: https://bugzilla.redhat.com/show_bug.cgi?id=1461313#c92 packagekitd now also segfaults immediately on every boot: [   49.028851] packagekitd[2045]: segfault at 8 ip 0000561599ab93ea sp 00007fffddea8650 error 4 in packagekitd[561599ab5000+28000]. When I launch packagekitd, the kernel freeze happens immediately. When I disable packagekitd, the freeze comes later, but it still happens after a few hours.

Packagekitd might be unrelated though and just the trigger, I'm just mentioning it in case it might be relevant.

I ran filesystem checks on all partitions and memchecker for hours with no problems coming up. I already tried disabling zram, since that was enabled with 33, and setenforce 0. Neither seemed to prevent the freeze.

Version-Release number of selected component (if applicable):
Linux falcon 5.9.8-200.fc33.x86_64 #1 SMP Tue Nov 10 21:58:19 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux


How reproducible:
100%, just wait for a few hours and a freeze is guaranteed

Steps to Reproduce:
1. Use computer for a few hours

Actual results: freezes, after reboot I find the "Bad page state" error in log


Expected results: doesn't freeze


Additional info:
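For anyone who wants to check a saved log for the same message, here is a minimal Python sketch. The function name find_bad_page_states and the regular expression are only an illustration, assuming the message format quoted in this report ("BUG: Bad page state in process <name>  pfn:<hex>"); it is not an official tool and other kernel versions may format the line differently.

```python
import re

# Assumed message format, taken from the log line quoted in this report:
#   BUG: Bad page state in process swapper/8  pfn:17ba8a
BAD_PAGE_RE = re.compile(
    r"BUG: Bad page state in process (.+?)\s+pfn:([0-9a-fA-F]+)"
)


def find_bad_page_states(lines):
    """Return a list of (process, pfn) tuples, one per matching log line.

    `lines` is any iterable of strings, for example an open text file.
    The pfn is parsed from hexadecimal into an int.
    """
    hits = []
    for line in lines:
        match = BAD_PAGE_RE.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2), 16)))
    return hits
```

It could be run against a copy of /var/log/messages by passing the open file, e.g. find_bad_page_states(open("messages.txt", errors="replace")), where messages.txt is a placeholder file name.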

Comment 1 ell1e 2020-11-20 06:16:06 UTC
Created attachment 1731159 [details]
journalctl -r (includes the "Bad page state" error at the bottom, then the follow-up reboot info above)

Comment 2 msandova 2020-11-20 15:27:11 UTC
I am also affected by this bug. Without my doing anything in particular, the machine freezes, and in the logs I can see:

    nov 20 16:14:32 alpha kernel: BUG: Bad page state in process swapper/8  pfn:3aa55a

This started happening today after an update in Fedora Silverblue 33 (I don't know when I last updated, but it was probably not more than a week ago). I have a Ryzen 2700X CPU, and from talking with ell1e it seems they are also on a Ryzen CPU.

Comment 3 ell1e 2020-11-20 15:39:15 UTC
Created attachment 1731290 [details]
/proc/cpuinfo output on affected machine

Since this seems to be possibly AMD CPU-related, I'm hereby also attaching my /proc/cpuinfo output.

Comment 4 ell1e 2020-11-21 02:04:23 UTC
Ok, I have been using 5.8.18-300.fc33.x86_64 for about 12 hours now, including the packagekitd uses that previously triggered the freeze reliably, and so far nothing. Given that I didn't see this a few days ago, that msandova apparently didn't either, and that Fedora upgraded to 5.9 roughly a few days ago, I feel pretty confident suggesting that this is a 5.9 regression.

I think at this point it might also be interesting to know how many AMD CPUs are really affected, and whether there is maybe any point in reverting the main repos back to an older kernel until the cause of this is found.

Comment 5 Ali 2020-11-22 16:31:30 UTC
This bug also happens for me. There is no specific cause that I can identify, but the system crashes randomly after an unpredictable number of minutes. It started after I upgraded to kernel 5.9.

Comment 6 ell1e 2020-12-03 03:32:59 UTC
Was there ever any consideration of a rollback until this is investigated? This seems like quite a disruptive regression, especially for an innocuous mid-cycle update.

Comment 7 ell1e 2021-01-23 14:02:16 UTC
I tested an upstream kernel now where it also happened, and made an upstream ticket: https://bugzilla.kernel.org/show_bug.cgi?id=211317

Comment 8 Jan Dobes 2021-01-27 10:18:34 UTC
This also happens to me (CPU Ryzen 3200G) after upgrading to F33. It's not always the swapper process; I have also seen a chrome process trigger it. I have now updated to 5.10.10-200.fc33.x86_64: no freeze yet, but I'm getting a kerneloops with reason "BUG: Bad page state in process swapper/0" about every minute. Something has definitely changed here, even if it doesn't seem fully fixed yet.

Comment 9 ell1e 2021-01-27 17:43:22 UTC
My exact CPU is AMD Ryzen 5 1600, and I tested 5.11.0-0.rc4.129.vanilla.1.fc33.x86_64 which I assume is newer? So maybe you seeing no freeze yet might have just been coincidence or something, who knows. (Or maybe it affects different Ryzen gens differently?) I do hope there will be some sort of kernel bugzilla response soon.

Comment 10 ell1e 2021-02-04 21:28:31 UTC
Will a package pin on 5.8 in dnf get removed by an upgrade to Fedora 34 (since I assume that one no longer ships 5.8 officially)? If yes then this is actually kind of a significant upgrade blocker... I'm actually not sure if even the installer would run without a freeze, possibly messing up the install, since it boots into its own initramfs thing :(

Comment 11 ell1e 2021-03-29 00:24:48 UTC
Just a reminder that this basically renders affected computers unusable. I might as well switch to FreeBSD or Debian Stable at this point once Fedora 33 support runs out. Is there ANY plan to have this looked into?

Comment 12 ell1e 2021-04-15 06:24:26 UTC
This appears to have been fixed in 5.11.11-200.fc33.x86_64 (or earlier). I can no longer reproduce it after running the machine for days. Unless somebody protests soon, I will close the ticket.

Comment 13 Jan Dobes 2021-04-29 14:52:38 UTC
I've been running this kernel for several weeks and can no longer reproduce it either.

