Bug 2184048

Summary: System hangs on suspend
Product: [Fedora] Fedora
Reporter: Douglas <doug.hs>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: NEW
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 37
CC: acaringi, adscvr, airlied, alciregi, bskeggs, doug.hs, hdegoede, hpa, jarodwilson, jglisse, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, ptalbert, steved, voj-tech
Hardware: Unspecified
OS: Unspecified
Type: Bug
Attachments (Description / Flags):
kernel logs from previous boot (6.2.8-200.fc37) / none
kernel logs from rawhide kernel / none
kernel logs from rawhide kernel after failed suspend / none
Boot which first succeeded to suspend and resume but subsequently failed to suspend / none

Description Douglas 2023-04-03 13:48:03 UTC
Created attachment 1955452 [details]
kernel logs from previous boot (6.2.8-200.fc37)

1. Please describe the problem:
I left the system idle, and after 15 minutes it entered suspend (to RAM). I noticed all LEDs turned off except one, and the next day I came close to the computer case and heard the fans were still running. It seemed like the system never really suspended correctly.

I pressed a key on the keyboard, which is what I do to resume, but it didn't do anything other than turn on its LEDs. I tried the REISUB combination, but it didn't work. I pressed the "reset" button on the case, but there was no response. The only way to reboot was to toggle the PSU on/off switch.

After the reboot, with the desktop loaded, the problem reporting tool popped up showing non-reportable errors in kernel-core. Their reason was:

> traps: gldriverquery[26260] general protection fault ip:7fdfc85bf43d sp:7ffcba4f2520 error:0 in libLLVM-15.so[7fdfc823e000+33d2000]

2. What is the Version-Release number of the kernel:
6.2.8-200.fc37.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
Fedora 36 worked correctly. I believe it had kernel 6.1 or 6.0.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
I only reproduced it once.

1. Leave the system idle on the desktop until it suspends.
2. Check if all LEDs are off and that the fans have stopped.
3. Attempt to resume.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
No, the problem is fixed in the rawhide kernel I tested: 6.3.0-0.rc4.20230331git62bad54b26db.39.fc39.x86_64

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.
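
Because the hang forces a hard power-off, the relevant log is normally the one from the previous boot. A minimal sketch of the command above wrapped in a helper; the function name `save_kernel_log` and the output file name are illustrative assumptions, and the boot offset is passed straight to journalctl's ``-b`` flag (0 is the current boot, -1 the previous one):

```shell
# Save the kernel log of one boot to a file.
# $1: boot offset for journalctl -b (0 = current boot, -1 = previous boot)
# $2: output file (defaults to dmesg.txt)
# save_kernel_log is a hypothetical helper name, not part of any tool.
save_kernel_log() {
    boot="${1:-0}"
    out="${2:-dmesg.txt}"
    journalctl --no-hostname -k -b "$boot" > "$out"
}

# Example: capture the boot that ended in the failed suspend.
#   save_kernel_log -1 dmesg-previous-boot.txt
```

If several reboots happened since the hang, ``journalctl --list-boots`` shows the available boot offsets.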

Comment 1 Douglas 2023-04-03 23:30:45 UTC
Created attachment 1955630 [details]
kernel logs from rawhide kernel

Comment 2 Douglas 2023-04-03 23:34:58 UTC
Update: I can reproduce this 100% of the time, even with a manually triggered suspension. The computer being idle or not doesn't matter.

Comment 3 Douglas 2023-04-04 15:28:57 UTC
Created attachment 1955706 [details]
kernel logs from rawhide kernel after failed suspend

Update: The rawhide kernel is also failing to suspend, although not as often as the current F37 kernel. It shows the same symptoms. I'm sure it's the same bug. Will now try an older kernel version to pinpoint where this started.

Comment 4 Douglas 2023-04-08 15:01:22 UTC
I cannot reproduce the problem on kernel 6.0.18-300.fc37.x86_64. The problem started in 6.1.

Comment 5 Vojtech Sobota 2023-07-08 20:19:32 UTC
Created attachment 1974798 [details]
Boot which first succeeded to suspend and resume but subsequently failed to suspend

Attached logs of boot which first succeeded to suspend and resume but
subsequently failed to suspend. You can see the 'Filesystems sync' log message
is only present for the first suspend but not for the subsequent one, which
might be relevant.
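
The observation about the missing 'Filesystems sync' message can be checked mechanically on a saved kernel log. The following is a rough sketch, not a diagnostic tool: it assumes each suspend attempt is logged with a line containing "PM: suspend entry" (an assumption about the kernel's message text, not something stated in this report), and the function name `check_suspend_sync` is made up for illustration.

```shell
# For every suspend attempt in a kernel log, report whether a
# "Filesystems sync" message appeared before the next attempt.
# $1: path to a kernel log such as dmesg.txt
# Assumption: each attempt starts with a line containing "PM: suspend entry".
check_suspend_sync() {
    awk '
        function report() {
            printf "suspend attempt %d: %s\n", attempt,
                (synced ? "Filesystems sync seen" : "no Filesystems sync")
        }
        /PM: suspend entry/ {
            if (attempt > 0) report()   # close out the previous attempt
            attempt++
            synced = 0
        }
        /Filesystems sync:/ { if (attempt > 0) synced = 1 }
        END { if (attempt > 0) report() }
    ' "$1"
}
```

Running `check_suspend_sync dmesg.txt` on a log like the attached one would list each attempt on its own line, which makes it easy to see whether the hang always coincides with a missing sync message.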

Comment 6 Vojtech Sobota 2023-07-08 20:22:50 UTC
I experience the same issue, although it doesn't always happen; it appears to
be random in my case.

1. Please describe the problem:

   The system doesn't properly suspend and hangs whilst keeping the case fan
   and power LED on (HDDs are shut down). Only a hard power off and a boot from
   scratch is possible when it hangs during suspend.

2. What is the Version-Release number of the kernel:

   6.3.8-100.fc37.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

   Haven't tried older kernels to see at what point this issue started
   happening but it certainly only started happening in the last year or so.
   I've had this Fedora installation for 5+ years.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

   Yes, but it cannot be reproduced reliably in my case. It only happens
   sometimes (about 50% of the time).

   1. Simply attempt to suspend the system.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

   Did not try.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

   No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

   Attached logs of boot which first succeeded to suspend and resume but
   subsequently failed to suspend. You can see the 'Filesystems sync' log
   message is only present for the first suspend but not for the subsequent
   one, which might be relevant.

Comment 7 Douglas 2023-07-08 20:34:29 UTC
(In reply to Vojtech Sobota from comment #6)

Sorry, I don't know why I said I can reproduce it 100% of the time. It is as you said, more like 50% of the time. I can usually suspend successfully 2 times before a failure occurs. It's totally random.

I encourage you to test kernel 6.0. In my tests it doesn't have this problem, but 6.1 does. We need attention from the maintainers to proceed. They will probably ask us to bisect between 6.0 and 6.1 to find the exact commit that introduced the bug.
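
For reference, a bisect of this kind could be started roughly as follows. This is only a sketch under stated assumptions: it presumes a git checkout of the upstream kernel source in which the tags `v6.0` and `v6.1` correspond to the good and bad versions reported here, and `start_suspend_bisect` is a hypothetical helper name. Building and booting each candidate kernel is not shown.

```shell
# Start a bisect session inside a kernel git checkout.
# $1: last known-good revision, $2: first known-bad revision
# start_suspend_bisect is a hypothetical helper name.
start_suspend_bisect() {
    good="$1"
    bad="$2"
    git bisect start &&
    git bisect bad "$bad" &&
    git bisect good "$good"
}

# Example (tags assumed from the versions reported in this bug):
#   start_suspend_bisect v6.0 v6.1
#
# After building and booting each kernel that git checks out, record the result:
#   git bisect bad     # the system hung on suspend
#   git bisect good    # several suspend/resume cycles in a row succeeded
```

Since the hang occurs only about half of the time, a single successful suspend is weak evidence; it seems safer to mark a kernel good only after several consecutive successful cycles.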

Comment 8 Douglas 2023-07-12 00:44:12 UTC
I have been able to reproduce this on shutdown as well, although not as often as on suspend. In this case the system doesn't shut down completely, and some fans and LEDs remain on. A forced power-off is needed to recover from this.

Comment 9 Vojtech Sobota 2023-07-17 20:24:14 UTC
I can confirm the issue disappears when I use kernel 6.0.

Happy to help with bisecting and/or testing.