Bug 2253789

Summary: [regression] erratic state with kernel version 6.6.5
Product: [Fedora] Fedora Reporter: Baptiste Mille-Mathias <baptiste.millemathias>
Component: kernel Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED DUPLICATE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 39 CC: acaringi, adscvr, airlied, alciregi, bskeggs, dimitris.on.linux, hdegoede, hpa, ian.s.mcinerney, jarod, jforbes, josef, kernel-maint, linville, masami256, mchehab, nixuser, ptalbert, steved
Target Milestone: --- Keywords: Regression
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-12-11 12:40:31 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2240811    
Attachments:
Description Flags
journal kernel log none

Description Baptiste Mille-Mathias 2023-12-09 18:41:02 UTC
1. Please describe the problem:
My machine at some point loses its network access. NetworkManager takes 100% and a lot of processes are killed by the OOM killer. After that the laptop is not usable anymore; a proper shutdown takes 7 minutes because no unit stops properly and each has to be killed when it reaches its shutdown timeout.
Model: Framework laptop with AMD CPU.
```
➜  ~ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 7 7840U w/ Radeon  780M Graphics
```

2. What is the Version-Release number of the kernel:
kernel-6.6.5-200.fc39.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
Yes, it worked until I received the new kernel version 6.6.5.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
Yes, I get the same behaviour after a while each time.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:


6. Are you running any modules that are not shipped directly with Fedora's kernel?:
Nope

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

done

I saw that another Framework AMD owner with the same kernel reported an issue in bug #2253756.

Reproducible: Always

Comment 1 Baptiste Mille-Mathias 2023-12-09 18:41:29 UTC
Created attachment 2003449
journal kernel log

Comment 2 Dimitris 2023-12-09 20:09:33 UTC
I have a very similar symptom on the same hardware, reported in bug 2253756. In my case I get into this takes-forever-to-shut-down, resort-to-holding-the-power-switch state after an attempted suspend fails. I may just not have been running 6.6.5 long enough to see this manifest before the suspend attempt.

My stack trace for wpa_supplicant looks the same as yours here.

So I suspect we're hitting the same issue.

I've tried 6.7.0-0.rc4.20231208git5e3f5b81de80.38.fc40 from rawhide and my issue seemed resolved. I'm now going through the kernel bugzilla looking for possibly already-fixed issues that might be worth backporting to 6.6.

Comment 3 Justin M. Forbes 2023-12-11 12:40:31 UTC

*** This bug has been marked as a duplicate of bug 2253756 ***