Description of problem:
After booting kernel 4.17.2-200.fc28.x86_64 and suspending by closing the laptop lid, the system resumes normally. A subsequent poweroff turns the screen black, but the system does not shut down (fan still running, power button LED still on).

Version-Release number of selected component (if applicable):
kernel 4.17.2-200.fc28.x86_64

How reproducible:
It happens all the time.

Steps to Reproduce:
1. start Fedora 28
2. suspend
3. resume
4. poweroff

Actual results:
The system freezes and does not power off.

Expected results:
The system powers off.

Additional info:
watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:3:583]
Modules linked in: fuse rfcomm ccm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack devlink ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep sunrpc vfat fat arc4 snd_soc_skl snd_soc_skl_ipc joydev snd_hda_ext_core snd_soc_sst_dsp snd_soc_sst_ipc snd_soc_acpi snd_hda_codec_hdmi intel_rapl snd_soc_core snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp hid_multitouch spi_pxa2xx_platform snd_compress kvm_intel snd_pcm_dmaengine iwlmvm iTCO_wdt ac97_bus iTCO_vendor_support kvm mac80211 irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf iwlwifi btusb btrtl btbcm uvcvideo btintel videobuf2_vmalloc bluetooth videobuf2_memops videobuf2_v4l2 cfg80211 videobuf2_common asus_nb_wmi videodev asus_wmi snd_hda_intel snd_hda_codec sparse_keymap wmi_bmof media snd_hda_core ecdh_generic rfkill snd_hwdep snd_seq snd_seq_device snd_pcm int3403_thermal snd_timer mei_me snd int3400_thermal acpi_thermal_rel idma64 mei soundcore processor_thermal_device i2c_i801 intel_lpss_pci int340x_thermal_zone intel_lpss intel_pch_thermal intel_soc_dts_iosf shpchp asus_wireless acpi_pad binfmt_misc nouveau i915 ttm i2c_algo_bit drm_kms_helper mxm_wmi drm serio_raw i2c_hid crc32c_intel wmi video
CPU: 2 PID: 583 Comm: kworker/2:3 Tainted: G W L 4.17.2-200.fc28.x86_64 #1
Hardware name: ASUSTeK COMPUTER INC. X510UQR/X510UQR, BIOS X510UQR.301 09/25/2017
Workqueue: rcu_gp wait_rcu_exp_gp
RIP: 0010:smp_call_function_single+0x88/0xf0
RSP: 0018:ffffc27a81197de0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000001 RBX: ffffffffb525dd00 RCX: 0000000000000000
RDX: ffffffffb525dd00 RSI: ffffffffb411f1f0 RDI: 0000000000000006
RBP: ffffc27a81197e28 R08: ffff9db92eca1e00 R09: 0000000000000040
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9db92eda1b80
R13: 0000000000000040 R14: 0000000000000006 R15: 0000000000000040
FS: 0000000000000000(0000) GS:ffff9db92ec80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fda7e9aec90 CR3: 000000023220a005 CR4: 00000000003606e0
Call Trace:
I forgot to mention that powering off without suspending first works properly.
I have the same problem. A second suspend also causes the issue. Switching back to 4.16.16-300 fixes it for now. I would raise the severity of this issue: not thinking about it, I put my laptop into my backpack and went home, only to find it really hot some hours later.
As a side note: I do not get any watchdog output in my logs. I do get several ACPI errors, but I get these with both 4.16 and 4.17.
I tried some of the techniques from https://www.kernel.org/doc/Documentation/power/basic-pm-debugging.txt

echo devices > /sys/power/pm_test
echo mem > /sys/power/state

This works on the first try, but not on the next one. Sometimes the screen stays on, but is frozen. I will do some more testing later.
My system is a Dell Latitude E7440 with Intel graphics.
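The two commands above can be extended into a loop over the test levels that basic-pm-debugging.txt documents (freezer, devices, platform, processors, core). The following is a minimal sketch and not part of the original report: the function names and the POWER_DIR variable are made up for illustration, it assumes a kernel built with CONFIG_PM_DEBUG so that /sys/power/pm_test exists, and on a real system it has to run as root. Each level is exercised twice because the failure described in this bug only shows up on the second cycle.

```shell
#!/bin/sh
# POWER_DIR is an illustrative knob; it defaults to the real sysfs directory
# and can point at a scratch directory for a dry run.
POWER_DIR=${POWER_DIR:-/sys/power}

# Run one test-mode suspend cycle at the given pm_test level.
pm_test_cycle() {
    lvl=$1
    echo "$lvl" > "$POWER_DIR/pm_test" || return 1
    echo mem > "$POWER_DIR/state" || return 1
}

# Walk through all suspend test levels, two cycles each, then switch
# test mode off again by writing "none".
pm_test_all() {
    for level in freezer devices platform processors core; do
        for attempt in 1 2; do
            echo "pm_test=$level attempt=$attempt"
            pm_test_cycle "$level" || {
                echo "failed at $level (attempt $attempt)"
                return 1
            }
        done
    done
    echo none > "$POWER_DIR/pm_test"
}
```

Calling pm_test_all as root and noting the last line printed before the machine stops responding gives a rough idea of how far the suspend sequence gets before it hangs.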
I have a very similar issue on a Dell XPS 9570; the symptoms are exactly the same. Suspend stopped working following the upgrade, as described by the OP. However, I am running the third-party NVIDIA drivers for the NVIDIA GeForce GTX 1050 included in the laptop. The drivers were installed from the GNOME software center using the third-party repo rpmfusion.
As a troubleshooting step, I did a fresh install of Fedora 28 on the laptop. From this point on, using the base Intel video card, suspend was working well again. I upgraded all the packages, including the kernel, and without the NVIDIA drivers installed, suspend still worked fine on kernel 4.17.2-200. I then proceeded to install the NVIDIA drivers as described above, and suspend stopped working.
Just to be clear, this was working fine with the NVIDIA drivers on the previous kernel version, 4.16.3-301. Even now, when I boot back into 4.16.3-301, the NVIDIA drivers are enabled and suspend works fine.
4.17.4-200 does not solve it.
I got some more info on this.
1. Boot kernel 4.17.7-200.fc28.x86_64
2. Switch to VT2 and log in
3. Suspend by closing the lid
4. Wake up by opening the lid -> messages are printed on the console:
[ 406.976821] mei_wdt mei::05b79a6f-4628-4d7f-899d-a91514cb32ab:01: get hw module failed
[ 406.976822] mei_wdt mei::05b79a6f-4628-4d7f-899d-a91514cb32ab:01: Could not enable cl device
5. lsmod | grep mei
pn544_mei 16384 0
mei_phy 16384 1 pn544_mei
pn544 20480 1 pn544_mei
hci 53248 2 mei_phy,pn544
mei_wdt 16384 0
mei_me 45056 -1
mei 110592 5 mei_wdt,mei_phy,mei_me,pn544_mei

I tried to unload all these modules. When unloading pn544, I get the following error:
[ 200.015393] Removing pn544
[ 200.015761] WARNING: CPU: 2 PID: 2424 at kernel/module.c:1142 module_put+0x80/0x90
[ 200.015764] Modules linked in: ccm xt_CHECKSUM tun ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_addrtype br_netfilter qcserial usb_wwan devlink ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables overlay bnep vfat fat arc4 pn544_mei(-) mei_phy pn544 hci nfc iTCO_wdt iTCO_vendor_support mei_wdt intel_rapl dell_wmi wmi_bmof sparse_keymap x86_pkg_temp_thermal intel_powerclamp ppdev coretemp dell_laptop kvm_intel dell_smbios iwlmvm dell_wmi_descriptor dcdbas dell_smm_hwmon
[ 200.015883] kvm mac80211 irqbypass intel_cstate intel_uncore intel_rapl_perf btusb iwlwifi uvcvideo btrtl snd_hda_codec_realtek btbcm btintel videobuf2_vmalloc snd_hda_codec_hdmi videobuf2_memops videobuf2_v4l2 snd_hda_codec_generic videobuf2_common bluetooth cfg80211 snd_hda_intel joydev videodev snd_hda_codec i2c_i801 snd_hda_core ecdh_generic snd_hwdep snd_seq media snd_seq_device shpchp lpc_ich snd_pcm mei_me snd_timer mei snd soundcore wmi parport_pc parport dell_smo8800 dell_rbtn rfkill binfmt_misc dm_crypt cdc_mbim cdc_wdm cdc_ncm usbnet mii i915 uas usb_storage crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit ghash_clmulni_intel drm_kms_helper sdhci_pci cqhci drm sdhci e1000e serio_raw mmc_core video i2c_dev
[ 200.016017] CPU: 2 PID: 2424 Comm: rmmod Not tainted 4.17.7-200.fc28.x86_64 #1
[ 200.016020] Hardware name: Dell Inc. Latitude E7440/07F3F4, BIOS A25 02/01/2018
[ 200.016028] RIP: 0010:module_put+0x80/0x90
[ 200.016032] RSP: 0018:ffffb198089d3df0 EFLAGS: 00010297
[ 200.016037] RAX: ffffffffc07ae850 RBX: ffff8e5d7b734800 RCX: 00000000ffffffff
[ 200.016040] RDX: 0000000000000000 RSI: ffffeff1d016af00 RDI: ffffffffc07aefc0
[ 200.016044] RBP: ffff8e5d94993e00 R08: ffff8e5dc5abc360 R09: 00000001802a0028
[ 200.016047] R10: ffffeff1d016af00 R11: 00000000ffffff00 R12: ffff8e5dc72c1900
[ 200.016050] R13: ffff8e5dc72c1828 R14: 0000000000000000 R15: 0000000000000000
[ 200.016056] FS: 00007f64f0f120c0(0000) GS:ffff8e5ddeb00000(0000) knlGS:0000000000000000
[ 200.016060] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 200.016070] CR2: 0000559b97861e08 CR3: 0000000409fc0004 CR4: 00000000001606e0
[ 200.016073] Call Trace:
[ 200.016099] mei_cldev_disable+0x5d/0xd0 [mei]
[ 200.016111] nfc_mei_phy_free+0x11/0x20 [mei_phy]
[ 200.016119] pn544_mei_remove+0x2b/0x2f [pn544_mei]
[ 200.016133] mei_cl_device_remove+0x37/0x70 [mei]
[ 200.016146] device_release_driver_internal+0x15a/0x220
[ 200.016154] driver_detach+0x32/0x5f
[ 200.016162] bus_remove_driver+0x74/0xc6
[ 200.016178] mei_cldev_driver_unregister+0xe/0x30 [mei]
[ 200.016186] __x64_sys_delete_module+0x139/0x270
[ 200.016197] do_syscall_64+0x5b/0x160
[ 200.016208] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 200.016220] RIP: 0033:0x7f64f04199e7
[ 200.016223] RSP: 002b:00007ffc43ffcc38 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 200.016229] RAX: ffffffffffffffda RBX: 0000559b97857860 RCX: 00007f64f04199e7
[ 200.016232] RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559b978578c8
[ 200.016236] RBP: 0000000000000000 R08: 00007ffc43ffbbb1 R09: 0000000000000000
[ 200.016239] R10: 00007f64f0489f00 R11: 0000000000000206 R12: 00007ffc43ffce60
[ 200.016242] R13: 00007ffc43ffdcdc R14: 0000559b97857260 R15: 0000559b97857860
[ 200.016246] Code: 74 23 48 8b 45 00 48 89 fb 48 8b 7d 08 48 83 c5 18 4c 89 e2 48 89 de e8 bf 1f ac 00 48 8b 45 00 48 85 c0 75 e4 5b 5d 41 5c c3 c3 <0f> 0b eb a5 89 c2 eb 8c 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[ 200.016354] ---[ end trace 17ee821c1ea0ca46 ]---

When I unload all the modules (including mei_me) BEFORE doing the suspend, I get no error and the subsequent poweroff works.
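The workaround described at the end of this comment (unloading the mei modules before suspending) could be scripted roughly as follows. This is an illustrative sketch under stated assumptions, not something taken from the report: list_mei_related_modules and unload_modules are hypothetical helper names. The first helper mimics `lsmod | grep mei` and keeps only the module name column, so it also picks up modules such as pn544 and hci whose users include a mei module.

```shell
#!/bin/sh
# Hypothetical helper: read lsmod-style output on stdin and print the module
# name (first column) of every line that mentions "mei".
list_mei_related_modules() {
    awk '/mei/ { print $1 }'
}

# Hypothetical helper: read module names on stdin and try to unload each one
# with modprobe -r (needs root). Failures are reported and skipped.
unload_modules() {
    while read -r module; do
        modprobe -r "$module" || echo "could not unload $module" >&2
    done
}
```

A possible invocation before suspending would be `lsmod | list_mei_related_modules | unload_modules`. Whether every module can be removed cleanly depends on the kernel; the warning shown above was triggered by exactly such a removal on 4.17.7.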
Created attachment 1468511
mei error

Steps to reproduce this:
1. Boot
2. Switch to VT2 and log in
3. Suspend once by closing the lid
4. Open the lid
5. Execute "poweroff"
Blacklisting mei_me via /etc/modprobe.d solves this issue with the second suspend or poweroff. But with mei_me disabled, I am running into some other issues after resume, such as the xhci driver not starting:
[ 72.614840] xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead
[ 72.619297] xhci_hcd 0000:00:14.0: HC died; cleaning up
With kernel 4.17.11-200.fc28.x86_64, the oops is gone (and the USB problems described in my last comment have also disappeared), but without blacklisting mei_me, the system still hangs on the second suspend. So I will continue blacklisting mei_me.
4.17.12-200 does not solve it. I also have to switch back to 4.16.16-301 to fix it.
Does blacklisting mei_me solve it for you?

cat > /etc/modprobe.d/blacklist-mei.conf << EOF
blacklist mei_me
EOF
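After a reboot, one way to confirm that the blacklist entry above is in place and that the module really stayed unloaded is sketched below. The helper names and the MODPROBE_DIR and PROC_MODULES variables are assumptions made for this illustration. Note that a `blacklist` line only stops the module from being loaded automatically through its aliases, so an explicit `modprobe mei_me` would still load it.

```shell
#!/bin/sh
# Illustrative knobs; the defaults are the real system locations.
MODPROBE_DIR=${MODPROBE_DIR:-/etc/modprobe.d}
PROC_MODULES=${PROC_MODULES:-/proc/modules}

# Succeeds if a *.conf file in MODPROBE_DIR contains "blacklist <module>".
is_blacklisted() {
    grep -qsE "^[[:space:]]*blacklist[[:space:]]+$1[[:space:]]*\$" "$MODPROBE_DIR"/*.conf
}

# Succeeds if the module is currently loaded, i.e. its name appears in the
# first column of /proc/modules.
is_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "$PROC_MODULES"
}
```

For example, `is_blacklisted mei_me && ! is_loaded mei_me && echo "mei_me is blacklisted and not loaded"` prints the message only when both conditions hold.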
4.17.14-202 does not solve it. Blacklisting the mei_me module fixes the issue for me, thanks @Georg Müller!
Ok. Good to hear. The question is whether it is mei itself or one of the modules depending on it. With mei_me blacklisted, I also lose the pn544 NFC driver, which I currently do not use.
There were only three commits in the mei subdirectory in the 4.17 development cycle:

$ git shortlog v4.16..v4.17 -- drivers/misc/mei/
Alexander Usyskin (1):
      mei: limit the number of queued writes

Colin Ian King (1):
      mei: remove dev_err message on an unsupported ioctl

Tomas Winkler (1):
      mei: make module referencing local to the bus.c

One of them just removed a log message; the other two are a bit bigger. I will try to revert both of them in a local build and see what happens. If this fixes it, I will try to undo only one of them.
My guess would be the commit "mei: limit the number of queued writes". The commit message states: "Limit the number of queued writes per client. Writes above this threshold are blocked till place in the transmit queue is available." Maybe it blocks because the transmitting is already suspended. But hey, just a guess. I will check.
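The revert-and-test plan described above could look roughly like this in a kernel git tree. It is a sketch under assumptions: list_path_commits and revert_path_commits are hypothetical helper names, the reverts are expected to apply without conflicts, and building and booting the resulting kernel is left out entirely.

```shell
#!/bin/sh
# Print "<short hash> <subject>" for every non-merge commit in RANGE that
# touched PATHSPEC, newest first.
list_path_commits() {
    git log --no-merges --format='%h %s' "$1" -- "$2"
}

# Revert each of those commits, newest first, one revert commit per original.
# Stops at the first revert that fails (for example on a conflict).
revert_path_commits() {
    for hash in $(git log --no-merges --format='%H' "$1" -- "$2"); do
        git revert --no-edit "$hash" || return 1
    done
}
```

For this bug, the range would be v4.16..v4.17 and the path drivers/misc/mei/. To test a single suspect instead, `git revert --no-edit <hash>` can be run by hand with one of the hashes printed by list_path_commits.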
Related: https://bugzilla.redhat.com/show_bug.cgi?id=1597481
I think this is not just related, this looks like a duplicate. As mentioned in bug 1597481, reverting commit 257355a44b9929e55d6fd47bfff66971dc4de948 (mei: make module referencing local to the bus.c) solved it for me.
Just a question: I didn't get a chance to try blacklisting mei, but was wondering if the revert of the above commit will be included in a future kernel build? I assume so...? Thanks!
There are already patches sent to lkml and stable, so they are hopefully included in 4.17.20. Please see bug 1597481.
Thanks Georg, I am not yet familiar with the method to determine which version of the kernel a patch is included in (I'm reading up on it). Thanks again for the version number - and for troubleshooting this issue.
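One common way to answer this with git, assuming a local clone of the kernel tree that carries the release tags, is `git tag --contains <commit>`, which lists every tag whose history includes the commit. The helper below is a hypothetical wrapper that prints only the earliest such tag in version order. Keep in mind that a fix backported to a stable release gets a new commit ID there, so the lookup has to use the ID from the tree the release was tagged in.

```shell
#!/bin/sh
# Hypothetical helper: print the lowest-versioned tag that contains COMMIT.
# Must be run inside a git work tree that has the release tags.
first_tag_containing() {
    git tag --sort=version:refname --contains "$1" | head -n 1
}
```

For example, `first_tag_containing 257355a44b9929e55d6fd47bfff66971dc4de948` run in a kernel clone shows the first tagged release that includes the commit named in this thread.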
The patches are now 2 weeks old and still nobody has merged them into the mainline kernel or stable. Can the patches please be included in the next 4.18.x release of Fedora?
https://bugzilla.kernel.org/attachment.cgi?id=278055
https://bugzilla.kernel.org/attachment.cgi?id=278057
Kernel 4.18.10 contains the necessary patches, which solve the issue.
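A quick way to check whether a machine already runs a kernel at or above that version is sketched below. kernel_at_least is a hypothetical helper, it relies on the -V (version sort) option of GNU sort, and it compares only the upstream version in front of the first dash. It therefore cannot tell whether a distribution kernel with a lower version number carries the fix as a backport.

```shell
#!/bin/sh
# Succeeds if the kernel version (default: the running kernel) is at least
# REQUIRED. Only the upstream part before the first "-" is compared.
kernel_at_least() {
    required=$1
    current=${2:-$(uname -r)}
    current=${current%%-*}
    lowest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)
    [ "$lowest" = "$required" ]
}
```

For example, `kernel_at_least 4.18.10 && echo "fix should be included"` checks the running kernel, while a second argument such as 4.17.14-202.fc28.x86_64 checks an explicit version string.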
Thanks for the update!
*********** MASS BUG UPDATE **************
We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 28 kernel bugs.
Fedora 28 has now been rebased to 4.20.5-100.fc28. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
If you have moved on to Fedora 29, and are still experiencing this issue, please change the version to Fedora 29.
If you experience different issues, please open a new bug report for those.

*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.