Bug 2184048 - System hangs on suspend
Summary: System hangs on suspend
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 39
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-04-03 13:48 UTC by Douglas
Modified: 2024-06-02 21:53 UTC
CC List: 19 users

Fixed In Version:
Doc Type: ---
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-06-02 15:42:29 UTC
Type: Bug
Embargoed:


Attachments
kernel logs from previous boot (6.2.8-200.fc37) (123.15 KB, text/plain)
2023-04-03 13:48 UTC, Douglas
kernel logs from rawhide kernel (119.12 KB, text/plain)
2023-04-03 23:30 UTC, Douglas
kernel logs from rawhide kernel after failed suspend (122.69 KB, text/plain)
2023-04-04 15:28 UTC, Douglas
Boot which first succeeded to suspend and resume but subsequently failed to suspend (108.50 KB, text/plain)
2023-07-08 20:19 UTC, Vojtech Sobota

Description Douglas 2023-04-03 13:48:03 UTC
Created attachment 1955452 [details]
kernel logs from previous boot (6.2.8-200.fc37)

1. Please describe the problem:
I left the system idle, and after 15 minutes it entered suspend (to RAM). I noticed all LEDs turned off except one, and the next day I came close to the computer case and heard the fans were still running. It seemed like the system never really suspended correctly.

I pressed a key on the keyboard, which is how I normally resume, but it did nothing other than turn on the keyboard's LEDs. I tried the REISUB combination, but it didn't work. I pressed the "reset" button on the case, but got no response. The only way to reboot was to toggle the PSU on/off switch.

After the reboot, with the desktop loaded, the problem reporting tool popped up showing non-reportable errors in kernel-core. Their reason was:

> traps: gldriverquery[26260] general protection fault ip:7fdfc85bf43d sp:7ffcba4f2520 error:0 in libLLVM-15.so[7fdfc823e000+33d2000]

2. What is the Version-Release number of the kernel:
6.2.8-200.fc37.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
Fedora 36 worked correctly. I believe it had kernel 6.1 or 6.0.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
I only reproduced it once.

1. Leave the system idle on the desktop until it suspends.
2. Check if all LEDs are off and that the fans have stopped.
3. Attempt to resume.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
No, the problem is fixed in the rawhide kernel I tested: 6.3.0-0.rc4.20230331git62bad54b26db.39.fc39.x86_64

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.
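
As a sketch of the log-collection step above: `journalctl -b` accepts a relative boot offset, so `-b -1` selects the boot before the current one. The helper name `save_kernel_log` and its default output file are illustrative and not part of the Fedora template.

```shell
# save_kernel_log is a hypothetical helper name, not a Fedora tool.
# Usage: save_kernel_log [BOOT_OFFSET] [OUTPUT_FILE]
#   BOOT_OFFSET  0 = current boot (default), -1 = previous boot, and so on.
save_kernel_log() {
    local boot="${1:-0}"
    local out="${2:-dmesg.txt}"
    # -k limits output to kernel messages; -b selects which boot to read.
    journalctl --no-hostname -k -b "$boot" > "$out"
}
```

For a hang that forced a power cycle, `save_kernel_log -1 dmesg.txt` run after the reboot would capture the boot in which the suspend failed, assuming persistent journal storage is enabled.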

Comment 1 Douglas 2023-04-03 23:30:45 UTC
Created attachment 1955630 [details]
kernel logs from rawhide kernel

Comment 2 Douglas 2023-04-03 23:34:58 UTC
Update: I can reproduce this 100% of the time, even with a manually triggered suspension. The computer being idle or not doesn't matter.

Comment 3 Douglas 2023-04-04 15:28:57 UTC
Created attachment 1955706 [details]
kernel logs from rawhide kernel after failed suspend

Update: The rawhide kernel is also failing to suspend, although not as often as the current F37 kernel. It shows the same symptoms. I'm sure it's the same bug. Will now try an older kernel version to pinpoint where this started.

Comment 4 Douglas 2023-04-08 15:01:22 UTC
Cannot reproduce problem on kernel 6.0.18-300.fc37.x86_64. The problem started in 6.1.

Comment 5 Vojtech Sobota 2023-07-08 20:19:32 UTC
Created attachment 1974798 [details]
Boot which first succeeded to suspend and resume but subsequently failed to suspend

Attached logs of boot which first succeeded to suspend and resume but
subsequently failed to suspend. You can see the 'Filesystems sync' log message
is only present for the first suspend but not for the subsequent one, which
might be relevant.
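
The observation above can be checked mechanically. The following is a minimal sketch that walks a saved kernel log and reports, for each suspend attempt, whether a sync message followed it. It assumes each attempt is logged with the text "PM: suspend entry" and each completed sync with "Filesystems sync"; the function name `check_suspend_sync` is hypothetical, and the patterns may need adjusting for a given log.

```shell
# check_suspend_sync is a hypothetical helper name.
# Assumption: every suspend attempt logs "PM: suspend entry" and a completed
# sync logs "Filesystems sync"; adjust the patterns if your log differs.
# Usage: check_suspend_sync dmesg.txt
check_suspend_sync() {
    awk '
        /PM: suspend entry/ {
            # A new attempt starts: report the previous one, if any.
            if (seen) print "attempt " n ": " (synced ? "sync logged" : "sync MISSING")
            n++; seen = 1; synced = 0
        }
        /Filesystems sync/ { synced = 1 }
        END {
            # Report the last attempt in the file.
            if (seen) print "attempt " n ": " (synced ? "sync logged" : "sync MISSING")
        }
    ' "$1"
}
```

A "sync MISSING" result is only a hint: the message may be absent because the sync never finished, or because the line never reached disk before the hang.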

Comment 6 Vojtech Sobota 2023-07-08 20:22:50 UTC
I experience the same issue, although it doesn't always happen; it appears to
be random in my case.

1. Please describe the problem:

   The system doesn't properly suspend and hangs whilst keeping the case fan
   and power LED on (HDDs are shut down). Only a hard power off and a boot from
   scratch is possible when it hangs during suspend.

2. What is the Version-Release number of the kernel:

   6.3.8-100.fc37.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

   Haven't tried older kernels to see at what point this issue started
   happening but it certainly only started happening in the last year or so.
   I've had this Fedora installation for 5+ years.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

   Yes, but it cannot be reproduced reliably in my case. It only happens
   sometimes (about 50% of the time).

   1. Simply attempt to suspend the system.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

   Did not try.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

   No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

   Attached logs of boot which first succeeded to suspend and resume but
   subsequently failed to suspend. You can see the 'Filesystems sync' log
   message is only present for the first suspend but not for the subsequent
   one, which might be relevant.

Comment 7 Douglas 2023-07-08 20:34:29 UTC
(In reply to Vojtech Sobota from comment #6)

Sorry, I don't know why I said I can reproduce it 100% of the time. It is as you said, more like 50% of the time. I can usually suspend successfully 2 times before a failure occurs. It's totally random.

I encourage you to test kernel 6.0. From my tests, it doesn't have this problem, but 6.1 does. We need attention from the maintainers to proceed. They will probably ask us to perform a bisect to locate the exact commit that introduced the bug.
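
A bisect between the two versions named above could be started roughly as follows. This is a sketch only: it assumes a local clone of the mainline kernel tree in which the release tags are named `v6.0` and `v6.1`. The function name `start_kernel_bisect` and the clone path are placeholders.

```shell
# start_kernel_bisect is a hypothetical helper; its argument is the path to
# an existing clone of the mainline Linux tree (an assumption, not something
# stated in this report).
start_kernel_bisect() {
    local tree="$1"
    # Mark v6.1 as bad (suspend hangs) and v6.0 as good (no hang observed).
    git -C "$tree" bisect start v6.1 v6.0
}
# After each step: build and boot the checked-out kernel, test suspend, then
# run `git bisect good` or `git bisect bad` inside the tree.
```

Because the hang shows up only about half the time, each step would need several suspend cycles before a kernel is marked good; a single successful suspend does not prove much.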

Comment 8 Douglas 2023-07-12 00:44:12 UTC
I have been able to reproduce this on shutdown as well, although not as often as on suspend. In this case the system doesn't completely shut down, and some fans and LEDs remain on. A forced power-off is needed to recover from this.

Comment 9 Vojtech Sobota 2023-07-17 20:24:14 UTC
I can confirm the issue disappears when I use kernel 6.0.

Happy to help with bisecting and/or testing.

Comment 10 Douglas 2023-09-03 23:10:57 UTC
@voj-tech Are you running an AMD or Intel graphics card?

I own an AMD RX 6600. Tired of waiting for a fix, a few days ago I installed FreeBSD 13 and OpenBSD 7.3 on this machine. Suspend to RAM worked okay, but then I was surprised to find that this bug is also reproducible on both of these systems! The one thing I know that Linux and BSDs have in common is the DRM (Direct Rendering Manager) code that enables our graphics cards. See here: https://man.openbsd.org/drm.4

Comment 11 Vojtech Sobota 2023-09-04 00:06:00 UTC
I'm running the integrated Intel graphics.

Interesting that it's reproducible on BSD too, oh well... I've actually switched to the kernel 5.15 LTS branch for now. It works fine for me as I'm not missing anything from the newer kernel versions. Hoping the bug gets fixed at some point and if not then getting new HW will "fix" it for me I guess.

Comment 12 Douglas 2023-09-18 15:42:15 UTC
Reported upstream to AMD DRM: https://gitlab.freedesktop.org/drm/amd/-/issues/2857

Comment 13 Douglas 2023-09-18 15:46:00 UTC
I couldn't find a more generic repository (for both Intel and AMD graphics), but I'll mention that this happens with Intel as well.

Comment 14 Douglas 2023-09-27 05:14:01 UTC
So, it seems a BIOS upgrade I did yesterday fixed it. I went through about 10 suspend cycles with no issues - previously it would fail somewhere around the third suspension. It's a very confusing outcome, since the problem was connected to a specific Linux kernel version.

The changelog [1] between F5 and F6 does not say anything relevant. Either they omitted this information or they really don't know they just fixed my issue. It was released in June this year.

[1] https://www.gigabyte.com/us/Motherboard/B460M-DS3H-rev-10/support#support-dl-bios

@voj-tech Which motherboard do you have? I suggest you check for updates too.

Comment 15 Vojtech Sobota 2023-09-27 21:09:15 UTC
That's interesting but good news! I've got GA-B250M-D3H [1]. Sadly for me I've been on the latest version (F10) all along. It's an old motherboard which has been unsupported for a while now.

It can't be a coincidence that we both have a Gigabyte DS?3H motherboard :-)

[1] https://www.gigabyte.com/Motherboard/GA-B250M-D3H-rev-10

Comment 16 Douglas 2023-09-28 18:48:40 UTC
The problem is back after a reboot :(
I'm really tired of this, you know. I don't even know for sure what causes this problem. I don't know what hardware to replace in order to avoid the problem. I don't know what to do. Maybe I should avoid Gigabyte motherboards.

Comment 17 Aoife Moloney 2023-11-23 01:37:14 UTC
This message is a reminder that Fedora Linux 37 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 37 on 2023-12-05.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '37'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 37 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 18 Douglas 2024-06-02 03:37:44 UTC
@voj-tech Do you still have this motherboard and this problem? Could you try to disable the BIOS option "Race To Halt (RTH)"? I believe this fixed it for me.

Comment 19 Vojtech Sobota 2024-06-02 09:43:46 UTC
Good to know that there is a workaround. I've actually switched hardware since then, partly because of this issue, so I cannot verify on my side.

Comment 20 Douglas 2024-06-02 21:53:18 UTC
Never mind. It happened again.

