Bug 1602808 - System fails to complete a suspend to ram
Summary: System fails to complete a suspend to ram
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 28
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-07-18 14:35 UTC by Robert Moskowitz
Modified: 2018-10-21 09:02 UTC (History)
19 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-21 07:24:44 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Robert Moskowitz 2018-07-18 14:35:39 UTC
Description of problem:

Running F28 with Xfce desktop.  Use the Xfce system tray Suspend choice.  Depending on what is running on the system, the suspend fails and I have to hard power cycle and do a complete boot.



Version-Release number of selected component (if applicable):

I have had this problem since kernel 4.17.4.  4.17.3 does not have the problem.


How reproducible:

I suspend many times a day at a conference, perhaps a dozen times per day.  It is unclear what causes the failure, as I can go through a number of suspend/resume cycles before it fails.  I *THINK* it occurs when I have QEMU running with a VM running and active connections.  If you look at the messages, the last message on a hang is before a normal suspend's stopping the network.


Steps to Reproduce:
1.  Get lots of things running.  Particularly QEMU with a VM with an active IP connection.
2.  Suspend using the Xfce system tray suspend (I have not tried command line)
3.  Suspend locks and have to physically power off

Actual results:


Expected results:


Additional info:

Here are the last messages on a failed suspend:

Jul 18 08:08:17 lx121e audit: NETFILTER_CFG table=mangle family=2 entries=41
Jul 18 08:08:17 lx121e audit: NETFILTER_CFG table=raw family=2 entries=28
Jul 18 08:08:17 lx121e systemd[1]: Reached target Sleep.
Jul 18 08:08:17 lx121e systemd[1]: Starting Suspend...
Jul 18 08:08:17 lx121e audit: NETFILTER_CFG table=filter family=10 entries=86
Jul 18 08:08:17 lx121e audit: NETFILTER_CFG table=nat family=10 entries=52
Jul 18 08:08:17 lx121e audit: NETFILTER_CFG table=mangle family=10 entries=40
Jul 18 08:08:17 lx121e audit: NETFILTER_CFG table=raw family=10 entries=31
Jul 18 08:08:17 lx121e systemd-sleep[15277]: Suspending system...
Jul 18 08:08:17 lx121e kernel: PM: suspend entry (deep)

Here are the similar messages from an earlier successful suspend/resume:

Jul 15 09:54:11 lx121e systemd[1]: Started Network Manager Script Dispatcher Service.
Jul 15 09:54:12 lx121e kernel: IPv6: ADDRCONF(NETDEV_UP): wlp4s0: link is not ready
Jul 15 09:54:12 lx121e dnsmasq[1197]: no servers found in /etc/resolv.conf, will retry
Jul 15 09:54:12 lx121e NetworkManager[714]: <warn>  [1531662852.1546] sup-iface[0x55af9d3d6c30,wlp4s0]: connection disconnected (reason -3)
Jul 15 09:54:12 lx121e NetworkManager[714]: <info>  [1531662852.1610] device (wlp4s0): state change: disconnected -> unmanaged (reason 'sleeping', sys-iface-state: 'managed')
Jul 15 09:54:12 lx121e NetworkManager[714]: <info>  [1531662852.1823] device (wlp4s0): set-hw-addr: reset MAC address to EC:55:F9:C7:8C:DF (unmanage)
Jul 15 09:54:12 lx121e nm-dispatcher[3232]: req:2 'down' [wlp4s0]: new request (5 scripts)
Jul 15 09:54:12 lx121e wpa_supplicant[901]: nl80211: deinit ifname=wlp4s0 disabled_11b_rates=0
Jul 15 09:54:12 lx121e systemd[1]: Reached target Sleep.
Jul 15 09:54:12 lx121e systemd[1]: Starting Suspend...
Jul 15 09:54:12 lx121e systemd-sleep[3252]: Suspending system...
Jul 15 09:54:12 lx121e kernel: PM: suspend entry (deep)
Jul 15 09:54:12 lx121e nm-dispatcher[3232]: req:2 'down' [wlp4s0]: start running ordered scripts...
Jul 15 09:54:12 lx121e kernel: PM: Syncing filesystems ... done.
Jul 15 09:54:12 lx121e chronyd[611]: Source 162.213.2.253 offline
Jul 15 09:54:12 lx121e chronyd[611]: Source 195.21.137.209 offline
Jul 15 09:54:12 lx121e chronyd[611]: Source 199.233.236.226 offline
Jul 15 09:54:12 lx121e chronyd[611]: Source 24.248.217.23 offline
Jul 15 09:54:12 lx121e chronyd[611]: Source 45.33.84.208 offline
Jul 15 09:54:12 lx121e chronyd[611]: Source 64.113.44.54 offline
Jul 15 09:54:12 lx121e chronyd[611]: Source 69.89.207.199 offline

...

Jul 15 14:16:49 lx121e kernel: Freezing user space processes ... (elapsed 0.005 seconds) done.
Jul 15 14:16:49 lx121e kernel: OOM killer disabled.
Jul 15 14:16:49 lx121e kernel: Freezing remaining freezable tasks ... (elapsed 0.105 seconds) done.
Jul 15 14:16:49 lx121e kernel: Suspending console(s) (use no_console_suspend to debug)
Jul 15 14:16:49 lx121e kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Jul 15 14:16:49 lx121e kernel: sd 0:0:0:0: [sda] Stopping disk
Jul 15 14:16:49 lx121e kernel: PM: suspend devices took 0.730 seconds
Jul 15 14:16:49 lx121e kernel: ACPI: EC: interrupt blocked
Jul 15 14:16:49 lx121e kernel: ACPI: Preparing to enter system sleep state S3
Jul 15 14:16:49 lx121e kernel: ACPI: EC: event blocked
Jul 15 14:16:49 lx121e kernel: ACPI: EC: EC stopped
Jul 15 14:16:49 lx121e kernel: PM: Saving platform NVS memory
Jul 15 14:16:49 lx121e kernel: Disabling non-boot CPUs ...
Jul 15 14:16:49 lx121e kernel: IRQ 17: no longer affine to CPU1
Jul 15 14:16:49 lx121e kernel: smpboot: CPU 1 is now offline
Jul 15 14:16:49 lx121e kernel: ACPI: Low-level resume complete
Jul 15 14:16:49 lx121e kernel: ACPI: EC: EC started
Jul 15 14:16:49 lx121e kernel: PM: Restoring platform NVS memory
Jul 15 14:16:49 lx121e kernel: LVT offset 0 assigned for vector 0x400
Jul 15 14:16:49 lx121e kernel: microcode: CPU0: new patch_level=0x05000029

Comment 1 Tony Nelson 2018-07-26 12:11:27 UTC
Suspend to Disk (Hibernate) by the power button fails to turn off the computer on my Acer Aspire E15, but the data is on disk and after a forcible power-off, the session resumes from disk when the computer is powered on.  This worked on F25, and works again when booted from the F25 kernel.  Fully updated F28, XFCE, 4.17.7 kernel.

This seems to me to be a manifestation of the same problem, of not completing the operation of suspending or hibernating, after doing all the setup for it.

Comment 2 Robert Moskowitz 2018-08-23 10:53:48 UTC
Seems to have been fixed since the 4.17.11 kernel.  Running on the 4.17.14 kernel now and still successfully suspending with one caveat:

It is common that the suspend does not 'take'.  That is, shortly after going into suspend, the system restarts.  I unlock the system screensaver, suspend again, and this time it stays suspended.

Just plain weird.

Comment 3 Laura Abbott 2018-10-01 21:24:22 UTC
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 28 kernel bugs.
 
Fedora 28 has now been rebased to 4.18.10-300.fc28.  Please test this kernel update (or newer) and let us know if you issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 29, and are still experiencing this issue, please change the version to Fedora 29.
 
If you experience different issues, please open a new bug report for those.

Comment 4 Tony Nelson 2018-10-01 23:27:46 UTC
Suspend to RAM has completed for a while now, so it is fixed for me with a kernel update (don't remember which one).  I just assumed y'all had fixed it on purpose and didn't remember to speak up and say it worked again.

Comment 5 Robert Moskowitz 2018-10-15 22:26:06 UTC
Seems all is well now, and has been for a few kernel releases.

The first suspend after a new boot always seems to take time.  Here is the suspend I did yesterday afternoon.  Note that later, I did 2 suspends and they went fast to suspend state:

Oct 14 16:46:49 lx121e kernel: Freezing user space processes ... (elapsed 0.028 
seconds) done.
Oct 14 16:46:49 lx121e kernel: OOM killer disabled.
Oct 14 16:46:49 lx121e kernel: Freezing remaining freezable tasks ... (elapsed 0
.060 seconds) done.
Oct 14 16:46:49 lx121e kernel: Suspending console(s) (use no_console_suspend to 
debug)
Oct 14 16:46:49 lx121e kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Oct 14 16:46:49 lx121e kernel: sd 0:0:0:0: [sda] Stopping disk
Oct 14 16:46:49 lx121e kernel: PM: suspend devices took 42.800 seconds
Oct 14 16:46:49 lx121e kernel: ------------[ cut here ]------------
Oct 14 16:46:49 lx121e kernel: Component: suspend devices, time: 42800
Oct 14 16:46:49 lx121e kernel: WARNING: CPU: 0 PID: 20112 at kernel/power/suspen
d_test.c:55 suspend_test_finish+0x6b/0x70
Oct 14 16:46:49 lx121e kernel: Modules linked in: fuse cp210x xfs btrfs xor zstd
_compress raid6_pq zstd_decompress xxhash vhost_net vhost tap xt_CHECKSUM ipt_MA
SQUERADE tun ccm rfcomm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack de
vlink ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf
_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table
_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_con
ntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebta
bles ip6table_filter ip6_tables bnep sunrpc vfat fat arc4 rtl8192ce rtl_pci rtl8
192c_common rtlwifi mac80211 kvm_amd kvm wmi_bmof cfg80211 irqbypass uvcvideo vi
deobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev snd_hd
a_codec_conexant snd_hda_codec_generic
Oct 14 16:46:49 lx121e kernel: snd_hda_codec_hdmi btusb k10temp snd_hda_intel sn
d_hda_codec media btrtl btbcm btintel bluetooth snd_hda_core joydev snd_hwdep sn
d_seq snd_seq_device snd_pcm thinkpad_acpi ecdh_generic snd_timer sp5100_tco i2c
_piix4 snd soundcore rfkill wmi video pcc_cpufreq acpi_cpufreq radeon i2c_algo_b
it drm_kms_helper serio_raw ums_realtek uas ttm usb_storage drm r8169 mii
Oct 14 16:46:50 lx121e kernel: CPU: 0 PID: 20112 Comm: systemd-sleep Not tainted
 4.18.12-200.fc28.x86_64 #1
Oct 14 16:46:50 lx121e kernel: Hardware name: LENOVO 0611AH9/0611AH9, BIOS 8FET3
3WW (1.17 ) 11/07/2012
Oct 14 16:46:50 lx121e kernel: RIP: 0010:suspend_test_finish+0x6b/0x70
Oct 14 16:46:50 lx121e kernel: Code: 06 69 c2 e8 03 00 00 29 c1 e8 0f a4 00 00 8
1 fd 10 27 00 00 77 03 5b 5d c3 89 ea 48 89 de 48 c7 c7 b9 29 0c 8e e8 ff 77 fa 
ff <0f> 0b eb e8 90 0f 1f 44 00 00 0f b6 05 f9 6d 89 01 c3 0f 1f 00 0f 
Oct 14 16:46:50 lx121e kernel: RSP: 0018:ffffb6814543fd30 EFLAGS: 00010286
Oct 14 16:46:50 lx121e kernel: RAX: 0000000000000000 RBX: ffffffff8e0c274e RCX: 
0000000000000006
Oct 14 16:46:50 lx121e kernel: RDX: 0000000000000007 RSI: 0000000000000082 RDI: 
ffff9ff11ec16930
Oct 14 16:46:50 lx121e kernel: RBP: 000000000000a730 R08: 00005bb8bfc22db8 R09: 
0000000000000027
Oct 14 16:46:50 lx121e kernel: R10: 00000000000004ee R11: 0000000000023ae4 R12: 
0000000000000000
Oct 14 16:46:50 lx121e kernel: R13: 0000000000000004 R14: ffffffff8e99ce80 R15: 
0000000000000003
Oct 14 16:46:50 lx121e kernel: FS:  00007f0375c39940(0000) GS:ffff9ff11ec00000(0
000) knlGS:0000000000000000
Oct 14 16:46:50 lx121e kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 14 16:46:50 lx121e kernel: CR2: 00007f6931ed988a CR3: 0000000008f06000 CR4: 
00000000000006f0
Oct 14 16:46:50 lx121e kernel: Call Trace:
Oct 14 16:46:50 lx121e kernel: suspend_devices_and_enter+0x11e/0x7f0
Oct 14 16:46:50 lx121e kernel: pm_suspend.cold.4+0x33c/0x392
Oct 14 16:46:50 lx121e kernel: state_store+0x80/0xe0
Oct 14 16:46:50 lx121e kernel: kernfs_fop_write+0x116/0x190
Oct 14 16:46:50 lx121e kernel: __vfs_write+0x36/0x190
Oct 14 16:46:50 lx121e kernel: ? selinux_file_permission+0xf0/0x130
Oct 14 16:46:50 lx121e kernel: vfs_write+0xa5/0x1a0
Oct 14 16:46:50 lx121e kernel: ksys_write+0x4f/0xb0
Oct 14 16:46:50 lx121e kernel: do_syscall_64+0x5b/0x160
Oct 14 16:46:50 lx121e kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 14 16:46:50 lx121e kernel: RIP: 0033:0x7f03750ebef4
Oct 14 16:46:50 lx121e kernel: Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 8
4 00 00 00 00 00 66 90 48 8d 05 f1 3a 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 
05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53 
Oct 14 16:46:50 lx121e kernel: RSP: 002b:00007fff86030c18 EFLAGS: 00000246 ORIG_
RAX: 0000000000000001
Oct 14 16:46:50 lx121e kernel: RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 
00007f03750ebef4
Oct 14 16:46:50 lx121e kernel: RDX: 0000000000000004 RSI: 00005637a6841ed0 RDI: 
0000000000000004
Oct 14 16:46:50 lx121e kernel: RBP: 00005637a6841ed0 R08: 00005637a6840390 R09: 
00007f0375c39940
Oct 14 16:46:50 lx121e kernel: R10: 000000000000000a R11: 0000000000000246 R12: 
00005637a68402b0
Oct 14 16:46:50 lx121e kernel: R13: 0000000000000004 R14: 00007f03753b6760 R15: 
0000000000000004
Oct 14 16:46:50 lx121e kernel: ---[ end trace 10ba2307caba4d26 ]---
Oct 14 16:46:50 lx121e kernel: ACPI: EC: interrupt blocked
Oct 14 16:46:50 lx121e kernel: ACPI: Preparing to enter system sleep state S3
Oct 14 16:46:50 lx121e kernel: ACPI: EC: event blocked
Oct 14 16:46:50 lx121e kernel: ACPI: EC: EC stopped
Oct 14 16:46:50 lx121e kernel: PM: Saving platform NVS memory
Oct 14 16:46:50 lx121e kernel: Disabling non-boot CPUs ...
Oct 14 16:46:50 lx121e kernel: IRQ 19: no longer affine to CPU1
Oct 14 16:46:50 lx121e kernel: smpboot: CPU 1 is now offline
Oct 14 16:46:50 lx121e kernel: ACPI: Low-level resume complete
Oct 14 16:46:50 lx121e kernel: ACPI: EC: EC started

Comment 6 Robert Moskowitz 2018-10-15 22:37:36 UTC
Forgot to include, this is Fedora 28, kernel 4.18.12-200.fc28.x86_64

Comment 7 Laura Abbott 2018-10-21 07:24:44 UTC
Since it seems like things are mostly working, I'm going to close this bug. That warning just means suspend took a long time.

Comment 8 Robert Moskowitz 2018-10-21 09:02:19 UTC
Fine with me.  Thank you.


Note You need to log in before you can comment on or make changes to this bug.