Bug 1834277 - Dell XPS 13 9300 (2020) hangs with black screen on shutdown or reboot and 5.7.0-0.rc4
Summary: Dell XPS 13 9300 (2020) hangs with black screen on shutdown or reboot and 5.7...
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 31
Hardware: x86_64
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-05-11 12:46 UTC by Marc C
Modified: 2020-11-24 17:21 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-24 17:21:35 UTC
Type: Bug
Embargoed:


Attachments
Kernel log @ ticket creation (101.54 KB, text/plain)
2020-05-11 12:46 UTC, Marc C
kernel log for s2idle suspend test (comment 6) (91.95 KB, text/plain)
2020-05-11 21:30 UTC, Marc C


Links
System ID Private Priority Status Summary Last Updated
Linux Kernel 206571 0 None None None 2020-05-14 20:29:18 UTC

Internal Links: 1848224

Description Marc C 2020-05-11 12:46:17 UTC
Created attachment 1687305 [details]
Kernel log @ ticket creation

1. Please describe the problem:

At the moment a reboot or shutdown won't fully complete.  Systemctl finishes properly without any errors, then the laptop just doesn't turn off (or reset).  Black screen, keyboard backlight will respond when touched.  Press and hold power for >8 seconds and the machine is forced off.  It's like Linux can't get the laptop to reset or power off!

Creating new "me too" bug as per Steve on bug # 1825298


2. What is the Version-Release number of the kernel:

5.6.8-200.fc31.x86_64


3. Did it work previously in Fedora? 

Unknown.  Brand new laptop from March with fresh FC31 installed.  Shutdown and reboot have not worked with kernel 5.4 or 5.5, and still do not with 5.6.


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

It can be reproduced simply by attempting to reboot or shut down the laptop, as described above.


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

Unknown, but can attempt and report back.


6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Attached as dmesg.txt

Comment 1 Steve 2020-05-11 16:31:56 UTC
Thanks for your report and for attaching the log. As expected, the log is truncated, however it does show:

$ egrep -n 'PM: suspend entry|serio1: Failed' dmesg.txt 
1182:May 08 17:51:33 kernel: PM: suspend entry (s2idle)
1189:May 09 12:51:42 kernel: psmouse serio1: Failed to disable mouse on isa0060/serio1

There is another bug report* with a Dell system suspending with "s2idle" instead of with "deep". Could you post the output from:

$ grep -s . /sys/power/*

And "serio1" is the touchpad:

$ grep 'input:.*serio1' dmesg.txt 
May 08 09:56:59 kernel: input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio1/input/input5

For the record:

$ grep 'DMI:' dmesg.txt 
May 08 09:56:59 kernel: DMI: Dell Inc. XPS 13 9300/08MT7P, BIOS 1.0.7 02/26/2020

* Bug 1812010 - Kernel 5.5.7-200 does not suspend on Dell XPS-13

Comment 2 Steve 2020-05-11 17:16:59 UTC
Do you know what this USB device could be? The vendor/device ID, [9636:9311], is not in the database here:

$ grep '9636' /usr/share/hwdata/usb.ids

$ fgrep -n 'usb 3-8.2:' dmesg.txt 
842:May 08 09:56:59 kernel: usb 3-8.2: new full-speed USB device number 7 using xhci_hcd
844:May 08 09:56:59 kernel: usb 3-8.2: not running at top speed; connect to a high speed hub
845:May 08 09:56:59 kernel: usb 3-8.2: New USB device found, idVendor=9636, idProduct=9311, bcdDevice= 2.01
846:May 08 09:56:59 kernel: usb 3-8.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
847:May 08 09:56:59 kernel: usb 3-8.2: Product: USB C Video Adaptor      
848:May 08 09:56:59 kernel: usb 3-8.2: Manufacturer: USB C  
849:May 08 09:56:59 kernel: usb 3-8.2: SerialNumber: 000000000001
1126:May 08 16:03:42 kernel: usb 3-8.2: USB disconnect, device number 7
1223:May 09 12:51:42 kernel: usb 3-8.2: new full-speed USB device number 10 using xhci_hcd
1224:May 09 12:51:42 kernel: usb 3-8.2: not running at top speed; connect to a high speed hub
1225:May 09 12:51:42 kernel: usb 3-8.2: New USB device found, idVendor=9636, idProduct=9311, bcdDevice= 2.01
1226:May 09 12:51:42 kernel: usb 3-8.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
1227:May 09 12:51:42 kernel: usb 3-8.2: Product: USB C Video Adaptor      
1228:May 09 12:51:42 kernel: usb 3-8.2: Manufacturer: USB C  
1229:May 09 12:51:42 kernel: usb 3-8.2: SerialNumber: 000000000001

Comment 3 Marc C 2020-05-11 19:41:55 UTC
(In reply to Steve from comment #1)
> Thanks for your report and for attaching the log. As expected, the log is
> truncated, however it does show:
> 
> $ egrep -n 'PM: suspend entry|serio1: Failed' dmesg.txt 
> 1182:May 08 17:51:33 kernel: PM: suspend entry (s2idle)
> 1189:May 09 12:51:42 kernel: psmouse serio1: Failed to disable mouse on
> isa0060/serio1
> 
> There is another bug report* with a Dell system suspending with "s2idle"
> instead of with "deep". Could you post the output from:
> 
> $ grep -s . /sys/power/*

grep -s . /sys/power/*

/sys/power/disk:[platform] shutdown reboot suspend test_resume 
/sys/power/image_size:3143057408
/sys/power/mem_sleep:[s2idle] deep
/sys/power/pm_async:1
/sys/power/pm_debug_messages:0
/sys/power/pm_freeze_timeout:20000
/sys/power/pm_print_times:0
/sys/power/pm_test:[none] core processors platform devices freezer
/sys/power/pm_trace:0
/sys/power/pm_trace_dev_match:platform
/sys/power/pm_trace_dev_match:acpi
/sys/power/pm_wakeup_irq:9
/sys/power/reserved_size:1048576
/sys/power/resume:253:2
/sys/power/resume_offset:0
/sys/power/state:freeze mem disk
/sys/power/sync_on_suspend:1
/sys/power/wakeup_count:25870

Comment 4 Marc C 2020-05-11 19:46:56 UTC
(In reply to Steve from comment #2)
> Do you know what this USB device could be? The vendor/device ID,
> [9636:9311], is not in the database here:
> 
> $ grep '9636' /usr/share/hwdata/usb.ids
> 
> $ fgrep -n 'usb 3-8.2:' dmesg.txt 
> 842:May 08 09:56:59 kernel: usb 3-8.2: new full-speed USB device number 7
> using xhci_hcd
> 844:May 08 09:56:59 kernel: usb 3-8.2: not running at top speed; connect to
> a high speed hub
> 845:May 08 09:56:59 kernel: usb 3-8.2: New USB device found, idVendor=9636,
> idProduct=9311, bcdDevice= 2.01
> 846:May 08 09:56:59 kernel: usb 3-8.2: New USB device strings: Mfr=1,
> Product=2, SerialNumber=3
> 847:May 08 09:56:59 kernel: usb 3-8.2: Product: USB C Video Adaptor      
> 848:May 08 09:56:59 kernel: usb 3-8.2: Manufacturer: USB C  
> 849:May 08 09:56:59 kernel: usb 3-8.2: SerialNumber: 000000000001
> 1126:May 08 16:03:42 kernel: usb 3-8.2: USB disconnect, device number 7
> 1223:May 09 12:51:42 kernel: usb 3-8.2: new full-speed USB device number 10
> using xhci_hcd
> 1224:May 09 12:51:42 kernel: usb 3-8.2: not running at top speed; connect to
> a high speed hub
> 1225:May 09 12:51:42 kernel: usb 3-8.2: New USB device found, idVendor=9636,
> idProduct=9311, bcdDevice= 2.01
> 1226:May 09 12:51:42 kernel: usb 3-8.2: New USB device strings: Mfr=1,
> Product=2, SerialNumber=3
> 1227:May 09 12:51:42 kernel: usb 3-8.2: Product: USB C Video Adaptor      
> 1228:May 09 12:51:42 kernel: usb 3-8.2: Manufacturer: USB C  
> 1229:May 09 12:51:42 kernel: usb 3-8.2: SerialNumber: 000000000001


That USB device is a small aftermarket USB-C dongle with USB-C, USB-A, and HDMI ports for hooking up a mouse and monitor and for charging through USB-C.  I have used the laptop without it, with no such changes to the shutdown/restart behaviour.  If you'd like me to do a complete startup and attempted restart or shutdown without the dongle plugged in, and then capture the dmesg logs, just let me know.

The dongle on Amazon: https://www.amazon.ca/gp/product/B0829M7Z5H/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1

Comment 5 Steve 2020-05-11 20:49:21 UTC
> I have used the laptop without it, with no such changes to the shutdown/restart behaviour.

Thanks for reporting that.

> /sys/power/mem_sleep:[s2idle] deep

That confirms that you are seeing the same thing as in Bug 1812010.*

What you should be getting is this:

$ cat /sys/power/mem_sleep 
s2idle [deep]

(That's on my prime system with 5.6.10-100.fc30.x86_64.)
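
For comparing machines, the active mode is the entry in square brackets. A minimal sketch for pulling it out of a mem_sleep-style string follows; `active_sleep_mode` is a made-up helper name, and the two sample strings are the ones quoted in this report.

```shell
#!/bin/sh
# active_sleep_mode is a hypothetical helper: print the bracketed (active)
# entry from a string in the format used by /sys/power/mem_sleep.
active_sleep_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

active_sleep_mode "[s2idle] deep"   # value reported on the XPS 13 9300
active_sleep_mode "s2idle [deep]"   # value reported on the comparison system
```

On a live system the argument could be `"$(cat /sys/power/mem_sleep)"` instead of a literal string.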

I'm not sure if it is directly related to the shutdown problem, but could you disconnect all external devices and try a suspend/resume cycle?

Also, Steven's results, which are summarized in Bug 1812010, Comment 24, show the expected behavior with 5.3.16-300. If you have that kernel installed, could you test both shutdown and suspend/resume with it?

* See, also: Bug 1812055 - Suspend cancelled

Comment 6 Marc C 2020-05-11 21:28:16 UTC
OK I can confirm that a full reboot cycle with nothing plugged in makes no difference.

With still nothing plugged in, I also tried a full suspend cycle via sudo systemctl suspend and while the screen went black, I really don't know if the laptop was suspended or not.  It did return me to the lock screen instead of directly to the desktop, but conversely, I only tapped the space bar to wake it back up?  I didn't need to use the power button at all.

The issue with "me & suspend" is that (a) the laptop has no power light to indicate it's ON vs ASLEEP, and (b) I don't use suspend at all.  Even on my old Zenbook I never suspended it.  I just shutdown at night and went home with it powered off.  I wouldn't know what suspend would look like in Linux if it punched me in the face :P


--

From this conversation:

https://www.dell.com/community/XPS/XPS-13-9300-Wake-from-sleep/td-p/7523159

People are disabling the "Sign of Life" options in the BIOS which gets their laptop to switch to [deep] instead of [s2idle].  I'm going to try that now and report back.  In the meantime, here's the dmesg output from the suspend cycle with s2idle.. 


I'll also see if I can pull down old 5.3 kernel and new 5.7 kernel for testing..


-Marc

Comment 7 Marc C 2020-05-11 21:30:07 UTC
Created attachment 1687475 [details]
kernel log for s2idle suspend test (comment 6)

Comment 8 Marc C 2020-05-11 21:46:45 UTC
As per the trailing remarks on comment 6,

I attempted to disable both "Sign of Life" settings in the BIOS to see if the default suspend mode would boot into "deep" as opposed to "s2idle".  It did not.

--

I then attempted to set the mode to [deep] by setting the value (as root) with 'echo deep > /sys/power/mem_sleep' and then 'systemctl suspend' only to have the laptop go completely dark and completely unresponsive.  I had to long-press hard reset it to come back.

Checking on 5.3 and 5.7 kernel next.

Comment 9 Steve 2020-05-11 22:41:59 UTC
> It did return me to the lock screen instead of directly to the desktop, ...

That should be configurable in the desktop settings (under "Privacy", with gnome). Also, with gnome, you have to press the "Alt" key to change the power button to a suspend button.

> (a) the laptop has no power light to indicate it's ON vs ASLEEP, and (b) I don't use suspend at all.  Even on my old Zenbook I never suspended it.  I just shutdown at night and went home with it powered off.  I wouldn't know what suspend would look like in Linux if it punched me in the face :P

There is a light on the front of the case below the touchpad.* When the system is suspended, the usual indication is a slowly blinking light.

In Bug 1812010, Comment 5, Steven says that sometimes the system resumes immediately after suspending. That would be a suspend/resume bug.

* See page 6 here:

Dell XPS 13 9300 Setup and Specifications
https://topics-cdn.dell.com/pdf/xps-13-9300-laptop_reference-guide_en-us.pdf

BTW, suspend can also be useful when moving the system a short distance, since there is less risk of rattling a hard drive, and for power-saving when not using the system for a while.

Comment 10 Steve 2020-05-11 22:57:10 UTC
> https://www.dell.com/community/XPS/XPS-13-9300-Wake-from-sleep/td-p/7523159
> People are disabling the "Sign of Life" options in the BIOS which gets their laptop to switch to [deep] instead of [s2idle].

Thanks for the tip! That sounds like a good, if arcane, workaround. The "Sign of Life" options are documented here:

Dell XPS 13 9300 Service Manual, page 46
https://topics-cdn.dell.com/pdf/xps-13-9300-laptop_service-manual_en-us.pdf

Comment 11 Steve 2020-05-11 23:17:35 UTC
(In reply to Marc C from comment #8)
> As per the trailing remarks on comment 6,
> 
> I attempted to disable both "Sign of Life" settings in the BIOS to see if the default suspend mode would boot into "deep" as opposed to "s2idle".  It did not.
> 
> --
> 
> I then attempted to set the mode to [deep] by setting the value (as root) with 'echo deep > /sys/power/mem_sleep' and then 'systemctl suspend' only to have the laptop go completely dark and completely unresponsive.  I had to long-press hard reset it to come back.
> 
> Checking on 5.3 and 5.7 kernel next.

Obviously, I didn't read your test results before posting Comment 10. :-) Thanks for running those tests.

Comment 12 Marc C 2020-05-11 23:42:40 UTC
OK, bringing all the thread topics together (it's 7:30PM, I'm done fussing with my Dell for the time being)..

Recap..
* Use kernels 5.4/5.5/5.6 and the machine doesn't reboot/shutdown (original ticket description)
* Even with all dongles/usb/power attachments unplugged [tested w/ 5.6.8]
* Regardless of BIOS "sign of life" settings, I can't get the sleep mode to [deep].  It always uses [s2idle]. [tested w/ 5.3.7, 5.6.8]
* Forcing the setting to [deep] and doing a systemctl suspend on 5.6.8 kernel puts the computer into a dead state.  Hard reset via long-power-press is the only way to get it back.


New Info..
* Installing and trying kernel 5.3.7 from FC31 and it does reboot/shutdown properly!  Wifi also works on that kernel version.  Graphics are broken though. Weird fuzz.  I digress since wifi+graphics are not the point of this ticket.
* Suspend in 5.3.7 is also s2idle after boot and exhibits the same behaviour as 5.6.8.  No indication that the light under the touchpad "pulses" because it is suspended.
* kernel 5.7 from Rawhide failed to install.  Post install scripts (something to do with depmod) just got stuck.  I ended up having to "kill -9" the processes and then uninstall the rawhide 5.7 kernel.  Perhaps trying FC33 kernel on FC31 was a bit too ambitious?

-Marc

Comment 13 Steve 2020-05-12 00:07:43 UTC
Thanks for testing with 5.3.7. It sounds like we have two regressions on Dells since 5.3.7 (reboot/shutdown, wifi).

> kernel 5.7 from Rawhide failed to install.  Post install scripts (something to do with depmod) just got stuck.

That was supposed to have been fixed.* A possible workaround is:

$ cd /usr/sbin
# mv -i weak-modules weak-modules.DISABLE

BTW, did you mean to leave "acpi_rev_override=1" on the kernel command-line?** That's a non-standard option, and ACPI plays a part in power management, so that option could affect test results.

* Bug 1825940 - kernel-core-5.7.0-0.rc1.20200416git9786cab67457.1 took very long time to install 

** Per attachment in Comment 7.

Comment 14 Steve 2020-05-12 00:43:55 UTC
> No indication that the light under the touchpad "pulses" because it is suspended.

Thanks for reporting that. There are Dell-specific device drivers that are supposed to control the LEDs, so that sounds like a third bug.

$ lsmod | grep dell

$ find /lib/modules/$(uname -r) -name dell\*

Comment 15 Marc C 2020-05-12 03:23:12 UTC
re: comment 13

* I renamed /usr/bin/weak-modules and successfully installed 5.7 Kernel from rawhide.  
* I also reverted the kernel command line to standard options (see below) -- no acpi rev override.
Suspend via s2idle did the usual screen blanking, no pulsing light bar, AFAIK it suspended properly?   I'm using "sudo systemctl suspend" to trigger the suspend.  Let me know if you want something from journalctl sent over.

$ uname -r
5.7.0-0.rc4.20200508git79dede78c057.1.fc33.x86_64

$ cat /sys/power/mem_sleep 
[s2idle] deep

$ cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_SAVEDEFAULT=true
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="resume=/dev/mapper/fedora-swap rd.lvm.lv=fedora/root rd.luks.uuid=luks-3dccadab-6ea1-4e2f-a8a0-1e809f2f50d4 rd.lvm.lv=fedora/swap rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true



re: comment #14, here's the output..

$ lsmod | grep dell
dell_laptop            32768  0
ledtrig_audio          16384  4 snd_hda_codec_generic,snd_hda_codec_realtek,snd_sof,dell_laptop
dell_wmi               20480  0
dell_smbios            36864  2 dell_wmi,dell_laptop
dcdbas                 20480  1 dell_smbios
dell_wmi_descriptor    20480  2 dell_wmi,dell_smbios
rfkill                 32768  8 bluetooth,dell_laptop,cfg80211
sparse_keymap          16384  2 intel_hid,dell_wmi
wmi                    36864  5 intel_wmi_thunderbolt,dell_wmi,wmi_bmof,dell_smbios,dell_wmi_descriptor
video                  53248  3 dell_wmi,dell_laptop,i915


$ find /lib/modules/$(uname -r) -name dell\*
/lib/modules/5.7.0-0.rc4.20200508git79dede78c057.1.fc33.x86_64/kernel/drivers/hwmon/dell-smm-hwmon.ko.xz
/lib/modules/5.7.0-0.rc4.20200508git79dede78c057.1.fc33.x86_64/kernel/drivers/platform/x86/dell-laptop.ko.xz
/lib/modules/5.7.0-0.rc4.20200508git79dede78c057.1.fc33.x86_64/kernel/drivers/platform/x86/dell-rbtn.ko.xz
/lib/modules/5.7.0-0.rc4.20200508git79dede78c057.1.fc33.x86_64/kernel/drivers/platform/x86/dell-smbios.ko.xz
/lib/modules/5.7.0-0.rc4.20200508git79dede78c057.1.fc33.x86_64/kernel/drivers/platform/x86/dell-smo8800.ko.xz
/lib/modules/5.7.0-0.rc4.20200508git79dede78c057.1.fc33.x86_64/kernel/drivers/platform/x86/dell-wmi-aio.ko.xz
/lib/modules/5.7.0-0.rc4.20200508git79dede78c057.1.fc33.x86_64/kernel/drivers/platform/x86/dell-wmi-descriptor.ko.xz
/lib/modules/5.7.0-0.rc4.20200508git79dede78c057.1.fc33.x86_64/kernel/drivers/platform/x86/dell-wmi-led.ko.xz
/lib/modules/5.7.0-0.rc4.20200508git79dede78c057.1.fc33.x86_64/kernel/drivers/platform/x86/dell-wmi.ko.xz

Comment 16 Steve 2020-05-12 04:38:31 UTC
> * I renamed /usr/bin/weak-modules and successfully installed 5.7 Kernel from rawhide.  
> * I also reverted the kernel command line to standard options (see below) -- no acpi rev override.
> Suspend via s2idle did the usual screen blanking, no pulsing light bar, ...

Thanks for testing with 5.7. Did you try a shutdown test?

> ... AFAIK it suspended properly?

That's a good question. On my desktop system, the fans stop spinning and the hard drive head parks. And the power light slowly blinks. On resume, the fans start and the optical drive cycles.

The Dell appears to have an SSD*, but can you hear a fan?

If you have Windows installed, what does it do?

> I'm using "sudo systemctl suspend" to trigger the suspend.

OK. I mentioned the way it is done with gnome because pressing the "Alt" key to suspend is not at all intuitive.

> Let me know if you want something from journalctl sent over.


* May 08 09:57:00 kernel:  nvme0n1: p1 p2 p3 p4 p5 p6 p7 p8

Comment 17 Steve 2020-05-12 13:38:08 UTC
Re Comment 15:

dell-wmi-led isn't being loaded, yet the kernel source code* appears to show that it controls the blinking LED when suspended.

See if loading it gets the LED to blink:

# modprobe dell-wmi-led

* https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/platform/x86/dell-wmi-led.c?h=v5.7-rc4#n155

Comment 18 Marc C 2020-05-12 15:12:35 UTC
(In reply to Steve from comment #16)
> > * I renamed /usr/bin/weak-modules and successfully installed 5.7 Kernel from rawhide.  
> > * I also reverted the kernel command line to standard options (see below) -- no acpi rev override.
> > Suspend via s2idle did the usual screen blanking, no pulsing light bar, ...
> 
> Thanks for testing with 5.7. Did you try a shutdown test?

Yeah, shutdown, reboot, with no dongles or USB devices connected (running on battery) and it does all the work except actually power off.

> 
> > ... AFAIK it suspended properly?
> 
> That's a good question. On my desktop system, the fans stop spinning and the
> hard drive head parks. And the power light slowly blinks. On resume, the
> fans start and the optical drive cycles.
> 
> The Dell appears to have an SSD*, but can you hear a fan?

I tried a suspend this AM when the network and CPUs were under a bit of load, and I can report this: the CPU cooler continued to function (I could hear it), but my home wifi and VPN connection to the office definitely disconnected.

> 
> If you have Windows installed, what does it do?

I did try this last night.  When on battery, suspend in Windows 10 shows no pulsing power bar.  When plugged into AC, the bar is lit up, which I think is the correct effect since I had been on battery for hours and it wanted to charge!

> 
> > I'm using "sudo systemctl suspend" to trigger the suspend.
> 
> OK. I mentioned the way it is done with gnome because pressing the "Alt" key
> to suspend is not at all intuitive.

Yes that is appreciated, because I had no idea.  I'm a long time KDE user, but I figured I'd give Gnome a try with this new laptop.  I had no idea how to suspend in Gnome.


> 
> > Let me know if you want something from journalctl sent over.
> 
> 
> * May 08 09:57:00 kernel:  nvme0n1: p1 p2 p3 p4 p5 p6 p7 p8

Comment 19 Marc C 2020-05-12 15:15:34 UTC
(In reply to Steve from comment #17)
> Re Comment 15:
> 
> dell-wmi-led isn't being loaded, yet the kernel source code* appears to show
> that it controls the blinking LED when suspended.
> 
> See if loading it gets the LED to blink:
> 
> # modprobe dell-wmi-led
> 
> *
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/
> drivers/platform/x86/dell-wmi-led.c?h=v5.7-rc4#n155


I'm sorry Steve, I'm afraid I can't do that.


$ sudo modprobe dell-wmi-led
modprobe: ERROR: could not insert 'dell_wmi_led': No such device

$ sudo modprobe -vv dell-wmi-led
modprobe: INFO: custom logging function 0x564432284a20 registered
insmod /lib/modules/5.7.0-0.rc4.20200508git79dede78c057.1.fc33.x86_64/kernel/drivers/platform/x86/dell-wmi-led.ko.xz 
modprobe: INFO: Failed to insert module '/lib/modules/5.7.0-0.rc4.20200508git79dede78c057.1.fc33.x86_64/kernel/drivers/platform/x86/dell-wmi-led.ko.xz': No such device
modprobe: ERROR: could not insert 'dell_wmi_led': No such device
modprobe: INFO: context 0x5644333c5460 released

$ sudo lsmod |grep led
ledtrig_audio          16384  4 snd_hda_codec_generic,snd_hda_codec_realtek,snd_sof,dell_laptop

$ sudo lsmod |grep wmi
dell_wmi               20480  0
dell_smbios            36864  2 dell_wmi,dell_laptop
dell_wmi_descriptor    20480  2 dell_wmi,dell_smbios
wmi_bmof               16384  0
intel_wmi_thunderbolt    20480  0
sparse_keymap          16384  2 intel_hid,dell_wmi
wmi                    36864  5 intel_wmi_thunderbolt,dell_wmi,wmi_bmof,dell_smbios,dell_wmi_descriptor
video                  53248  3 dell_wmi,dell_laptop,i915

Comment 20 Steve 2020-05-12 16:46:51 UTC
Thanks for the additional test results. Some of these issues should probably be opened as separate bug reports.

> Yeah, shutdown, reboot, with no dongles or USB devices connected (running on battery) and it does all the work except actually power off.

ATM, the only concrete suggestion I can make is to update the bug summary to mention the latest kernel you tested:

"Dell XPS 13 9300 (2020) hangs with black screen on shutdown or reboot and 5.7.0-0.rc4"
                                                                      ^^^^^^^^^^^^^^^^

I truncated the kernel version for brevity. What follows is FYI:

The hex number in the kernel version indicates a Fedora pre-release snapshot build. The number is the git commit ID of HEAD when the git repo was cloned for the build:

5.7.0-0.rc4.20200508git79dede78c057.1.fc33.x86_64
                       ^^^^^^^^^^^^

If you scroll down starting here in the kernel mainline git repo, you will see the v5.7-rc4 tag. Fedora snapshot builds are sometimes done between "-rcN" kernel releases:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?qt=range&q=79dede78c057
                                                                                   ^^^^^^^^^^^^
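
A rough sketch of extracting that commit ID in a script follows. `snapshot_commit` is a made-up helper name, and the pattern (an 8-digit date, then "git", then a hex ID) is an assumption based only on the single version string shown above.

```shell
#!/bin/sh
# snapshot_commit is a hypothetical helper: print the abbreviated git commit ID
# embedded in a snapshot kernel version string, or nothing if none is found.
# Assumed layout: <8-digit date>git<hex id>.<rest>
snapshot_commit() {
    printf '%s\n' "$1" |
        sed -n 's/.*[0-9]\{8\}git\([0-9a-f]\{7,40\}\)\..*/\1/p'
}

snapshot_commit "5.7.0-0.rc4.20200508git79dede78c057.1.fc33.x86_64"
```

A release kernel such as 5.6.8-200.fc31.x86_64 has no embedded commit ID, so the helper prints nothing for it.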

Comment 21 Steve 2020-05-12 17:21:14 UTC
(In reply to Steve from comment #20)
> Some of these issues should probably be opened as separate bug reports.

The wifi problem appears to have been reported here:

Bug 1825051 - Killer Wi-Fi 6 AX1650i 160MHz Wireless Network Adapter (201NGW) does not function on F31 

That is identical to what the attached log shows:

May 08 14:01:40 kernel: iwlwifi 0000:00:14.3: Detected Killer(R) Wi-Fi 6 AX1650i 160MHz Wireless Network Adapter (201NGW), REV=0x338

Comment 22 Steve 2020-05-12 18:50:36 UTC
Re Comment 12:

> Installing and trying kernel 5.3.7 from FC31 and it does reboot/shutdown properly!

Actually, there is more that can be done. See if you can find a pair of kernels that bracket the regression as closely as possible.

Old kernels can be downloaded from Koji:

https://koji.fedoraproject.org/koji/packageinfo?packageID=8

However, the "koji" command can be used to do everything from the command-line:

# dnf install koji

This will give you a list of F32 kernels built after 5.3 was released:*

$ koji list-builds --package=kernel --state=COMPLETE --after=2019-09-15 --quiet | fgrep '.fc32' | sort -Vr

Since you found 5.3.7 is "good" and said 5.4 is "bad" (Comment 0), I suggest testing the 5.4.0 release kernel:**

In an empty directory:

$ koji download-build --rpm kernel-core-5.4.0-2.fc32.x86_64
$ koji download-build --rpm kernel-modules-5.4.0-2.fc32.x86_64

Install with:

# dnf install kernel*.rpm

* https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tag/?h=v5.3

** This includes some Fedora-specific patches (per changelog):

kernel-5.4.0-2.fc32
https://koji.fedoraproject.org/koji/buildinfo?buildID=1416758

Comment 23 Steve 2020-05-12 19:21:53 UTC
(In reply to Steve from comment #22)
> See if you can find a pair of kernels that bracket the regression as closely as possible.

Where this is headed is toward doing a kernel bisection, which is the process of systematically building and testing kernels until a specific commit in the git history is identified as causing or exposing the bug.

To do a kernel bisection you need three things:

1. A computer that can build kernels without overheating, with enough disk space to hold the kernel source code, some of its commit history, and all the compiled files (~4 GB).

2. A basic familiarity with a few git commands, specifically with "git clone" and "git bisect":

https://git-scm.com/docs/git-clone
https://git-scm.com/docs/git-bisect

3. A basic familiarity with using "make" to build software.
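
To get a feel for "git bisect" before pointing it at a kernel tree, it can be tried on a throwaway repository. The following is only a toy sketch: the repository, file names, commit messages, and the `bisect_demo` helper are all made up, and the "regression" is simply a file whose content changes in the fifth of eight commits.

```shell
#!/bin/sh
# Toy "git bisect run" demonstration on a throwaway repository.
# Everything here (bisect_demo, state, counter, check.sh) is hypothetical.
bisect_demo() (
    set -e
    work=$(mktemp -d)
    mkdir "$work/repo"
    cd "$work/repo"
    git init -q .
    git config user.name "Bisect Demo"
    git config user.email "demo@example.invalid"

    # Eight commits; the pretend regression lands in commit 5.
    for i in 1 2 3 4 5 6 7 8; do
        if [ "$i" -ge 5 ]; then echo bad > state; else echo good > state; fi
        echo "$i" > counter
        git add state counter
        git commit -q -m "commit $i"
    done

    # Test script kept outside the repository: exit 0 means good, 1 means bad.
    echo 'grep -q good state' > "$work/check.sh"

    # HEAD (commit 8) is known bad; HEAD~7 (commit 1) is known good.
    git bisect start HEAD HEAD~7 >/dev/null 2>&1
    git bisect run sh "$work/check.sh" >/dev/null 2>&1

    # After the run, refs/bisect/bad names the first bad commit.
    subject=$(git log -1 --format=%s refs/bisect/bad)
    cd /
    rm -rf "$work"
    printf '%s\n' "$subject"
)

bisect_demo
```

With a real kernel the test step cannot be automated this easily, since each candidate has to be built, booted, and shut down by hand; in that case "git bisect good" and "git bisect bad" are entered manually instead of using "git bisect run".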

Comment 24 Marc C 2020-05-13 12:50:25 UTC
Steve,

re: comment #20: done.  Subject of ticket adjusted to include "and 5.7.0-0.rc4"

re: comment #21: yeah, I found the Wifi problem all over the Internet, including a workaround.  WiFi is fixed in 5.6, so I'm actually not concerned with it anymore.   I used a workaround in the 5.4/5.5 kernels until 5.6 came out and fixed the problem.  I was actually watching bug # 1788150

re: comment #22: I think I could swing this.  The only question I have is.. in your koji command you're searching for fc32 kernels, but I'm actually still running fc31.  Does that matter?  Part of me thinks not, but then again, developers are finicky beasts and sometimes it's the little details that matter.

I'll leave the choice up to you:
 
option 1. I can update to fc32, re-test, then start to use koji to find the kernel version that introduced the behaviour, or  
option 2. stay on fc31 and check kernels from koji delivered for fc31
option 3. stay on fc31 and check kernels from koji delivered for fc32

Comment 25 Steve 2020-05-13 14:32:34 UTC
> in your koji command you're searching for fc32 kernels, but I'm actually still running fc31.  Does that matter?

No. The Fedora release number in the kernel version basically indicates which Fedora repo the kernel came from. When you download from Koji, there is no Fedora repo, so you can "mix and match". :-)

However, there is a definite reason for testing F32 kernels. The early F32 builds are from the kernel mainline repo -- they have "-rcN" in the version strings. Later F32 kernels are from the kernel stable repo -- they have versions with three parts: 5.4.y.

So what we are looking for is a pair of kernels between 5.3.0 and 5.4.0. However, 5.3.7 is not in that range because it is on a different branch. I suggest comparing the output from the koji command in Comment 22 with "f32" and with "f31" in the fgrep command to see what I mean. (Or you can remove the fgrep command entirely to see everything at once.)

See, also, the upstream kernel repos at kernel.org (Click "refs" to see branches and tags.):

mainline: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
stable:   https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/

In conclusion, it is fine to stay with F31.

BTW, I am not a "developer", but I have spent a lot of time figuring out how all of these repos and versions and builds are related. :-)

Comment 26 Steve 2020-05-13 14:47:09 UTC
Caution: When testing kernels, it is a good idea to have a backup of anything important on the system under test, because some of the kernels may not have been thoroughly tested.

BTW, the process of downloading and testing kernels from Koji is called "lite" bisection, because no kernel builds are required.

Comment 27 Marc C 2020-05-13 15:26:57 UTC
OK, well, I went ahead with using fc31 kernels instead of fc32 from koji and here's what I found out:

All of 5.4 works: reboot and power off both function.  I also devised a test for suspend that shows it is working.
As soon as I jump to the first 5.5 kernel for fc31, it stops shutting down and rebooting properly.  So my initial comment in the ticket isn't quite correct.

kernel 5.3/5.4 = good. 
kernel 5.5/5.6/5.7 = bad.

I'll now use fc32 kernels with more granularity via koji in an attempt to pinpoint a kernel package.


^^^^^^ THIS ENDS THE TL;DR. ^^^^^^^^^^^^^^^^^



My testing methodology is as follows:

* boot into 5.7-rc4 (has working WiFi drivers) 
* download and install kernel to check
* reboot
* check wifi
* check for video artifacts (glitches/snow on screen, especially under activity)
* record "cat /sys/power/mem_sleep" output
* start an infinite loop of "date; sleep 2;" in Terminal
** suspend system
** wait 10+ seconds
** wake system
** record if dates had a lapse in times, indicating the system paused the process properly
* sudo reboot
<once back in test kernel>
* sudo poweroff
<go to top, starting in 5.7 kernel again to download new kernel version>


Note: the reason to check the WiFi and graphics fidelity was to attempt to rule out one thing affecting the other, and there's no correlation.
* [5.3] Graphics were glitchy and Wifi worked in 5.3, but reboot/poweroff still functioned.
* [5.4] Graphics glitched at the beginning of 5.4 but repaired by the end of 5.4 series kernel.  Meanwhile, WiFi didn't work & reboot/poweroff did!
* [5.5] This is where reboot/shutdown breaks for the first time, wifi remains broken, but graphics are good.
* [5.6, 5.7] This is where Wifi gets repaired, graphics remain stable, but reboot/poweroff is a no-go.

So do graphics or wifi issues come&go with reboot/shutdown issues?  No.
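
The suspend check in the methodology above (the "date; sleep 2" loop) can also be evaluated mechanically instead of by eye. A small sketch, assuming the loop prints epoch seconds (for example with "date +%s") rather than a formatted date; `find_gaps` is a made-up helper name and the sample numbers are invented.

```shell
#!/bin/sh
# find_gaps is a hypothetical helper: read one epoch timestamp per line and
# report every interval between consecutive lines longer than max_gap seconds.
find_gaps() {
    awk -v max_gap="${1:-5}" '
        NR > 1 && ($1 - prev) > max_gap {
            printf "gap of %d s before line %d\n", $1 - prev, NR
        }
        { prev = $1 }
    '
}

# Invented sample: ticks every 2 s, with a 14 s pause before the 4th tick.
printf '%s\n' 100 102 104 118 120 | find_gaps 5
```

On the test machine the ticks could be collected with something like "while true; do date +%s; sleep 2; done | tee ticks.log" and checked after resume with "find_gaps 5 < ticks.log". A reported gap suggests the process was frozen for that long; no gap suggests the system never really suspended.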


---
 

Here are the results from testing, less the "SUDO DNF" and "KOJI" commands I kept in my log.  They are in chronological order.


$ grep -E -v '^sudo|koji' checking-kernels.log

5.4.2-300.fc31
--------------
"Beginning of 5.4 version for FC31"

Reboot?  YES
Poweroff?  YES
Suspend?  YES
cat /sys/power/mem_sleep?  s2idle

Wifi?  NO
Graphic Artifacts?  YES



kernel-5.4.20-200.fc31
----------------------
"End of 5.4 version for FC31"

Reboot? YES  
Poweroff? YES
Suspend? YES 
cat /sys/power/mem_sleep? s2idle

Wifi?  NO
Graphic Artifacts?  NO



kernel-5.5.2-200.fc31
---------------------
"Start of 5.5 version for FC31"

Reboot? NO 
Poweroff? NO
Suspend? YES
cat /sys/power/mem_sleep? s2idle

Wifi? NO
Graphic Artifacts? NO

Comment 28 Steve 2020-05-13 16:50:34 UTC
(In reply to Marc C from comment #27)

Very nice work on the testing and reporting. I really like your suspend test with the "date" command.

> I'll now use fc32 kernels with more granularity via koji in an attempt to pinpoint a kernel package.

> kernel-5.5.2-200.fc31
...
> Reboot? NO 
> Poweroff? NO
...

You may not need to test everything. The ".rc1" release is at the end of the kernel mainline merge window. That is where most of the commits are merged and where the bugs are introduced :-)

This is the one that I would suggest testing next:

kernel-5.5.0-0.rc1.git0.1.fc32                           jforbes           COMPLETE

gitN, where N=0, is a kernel mainline release candidate (rc).

gitN, where N>0, is a Fedora snapshot build.
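
The gitN naming rule above can be expressed as a small classifier. This is a sketch that assumes the build names follow the pattern shown in this comment; the function name classify_build is hypothetical and it is not a Koji tool.

```shell
#!/bin/sh
# classify_build NAME (hypothetical helper): classify a kernel build name
# by its gitN field, following the rule described above.
classify_build() {
    case "$1" in
        *.git0.*)    echo "mainline release candidate" ;;
        *.git[1-9]*) echo "Fedora snapshot build" ;;
        *)           echo "no gitN field" ;;
    esac
}

classify_build kernel-5.5.0-0.rc1.git0.1.fc32
```

Names without a gitN field, such as kernel-5.5.2-200.fc31, fall through to the last branch.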

Comment 29 Marc C 2020-05-13 17:23:02 UTC
Understood.  I'm going to test this afternoon, I'll start with that one.

Comment 30 Steve 2020-05-13 19:49:00 UTC
(In reply to Steve from comment #28)
> The ".rc1" release is at the end of the kernel mainline merge window. That is where most of the commits are merged ...

Here is Linus's release announcement for Linux 5.5-rc1:
https://lkml.org/lkml/2019/12/8/242

"But with 12,500+ non-merge commits, there's obviously a little bit of everything going on."

A word search for "power" will find some mentions in the appended "mergelog".

Comment 31 Marc C 2020-05-14 13:22:09 UTC
Ok, so the behaviour was as expected..

kernel-5.5.0-0.rc1.git0.1.fc32 = :(

Comment 32 Steve 2020-05-14 14:06:52 UTC
(In reply to Marc C from comment #31)
> Ok, so the behaviour was as expected..
> 
> kernel-5.5.0-0.rc1.git0.1.fc32 = :(

Thanks for your report. I suggest moving on to a kernel bisection.

Do you have a computer that can build kernels without overheating? It doesn't have to be the same computer that you are testing on.

In a bisection for a USB bug, William arranged for a Fedora 31 VM to be configured on his company's server. He then used ssh to build his kernels and scp to copy his kernels to his test system (a Sony Vaio laptop).

This has extensive details (and several wrong turns :-)):

Bug 1818952 - [BISECTED] built-in laptop webcam no longer found on Sony Vaio on Fedora 31

Comment 33 Steve 2020-05-14 14:34:42 UTC
(In reply to Steve from comment #32)
...
> Thanks for your report. I suggest moving on to a kernel bisection.

That may not be necessary. An upstream search for "dell shutdown" found this:

Bug_206571 - [Bisected] On kernel 5.5 system hangs on shutdown or reboot 
https://bugzilla.kernel.org/show_bug.cgi?id=206571

The commit that Edoardo identified* appears to have gone into 5.5-rc1:

$ git describe --contains 6c3a44ed3c553c324845744f30bcd1d3b07d61fd
v5.5-rc1~77^2^9

> Do you have a computer that can build kernels without overheating? It doesn't have to be the same computer that you are testing on.

The commit revert would still need to be tested, so at least one kernel build would be needed.

* iommu/vt-d: Turn off translations at shutdown
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6c3a44ed3c553c324845744f30bcd1d3b07d61fd

Comment 34 Marc C 2020-05-14 15:39:54 UTC
OK, I do have a home server running FC31, so I could load up dev tools and build away on it.  Or slap the tools on this machine and then remove them later.  I have not built a kernel in years, and certainly not for Redhat (fedora/rhel/centos), so I'm going to need some steps on what I'm doing here.



But before you provide any help on that topic, I just want to understand.. is the end goal to 100% confirm that my bug is the same as that kernel bug?  _OR_ is that truth already known and we're looking to get me a stable kernel?

I ask because if it's the latter option, then we're wasting each other's time.  I've operated with this problem for weeks now, and I can continue to operate like this until it gets fixed in a new kernel release.  Much like how wifi was broken, was corrected in the kernel eventually, and I just needed to wait for that kernel to land one day..


And thanks for all your help so far.  I know "please & thanks" sometimes go by the wayside when a flurry of back-and-forths happen, but I just wanted to state that your help and super quick replies have been.. refreshing!


-Marc

Comment 35 Marc C 2020-05-14 15:54:11 UTC
Oh snap.  Someone's got a working laptop that can shutdown/reboot again!

Going into the BIOS and disabling the "VT for Direct I/O" option under "Virtualization" gets the machine to shut down and reboot properly.


The option is described as "Enables the computer to perform Virtualization Technology for Direct I/O (VT-d). VT-d is an Intel method that provides virtualization for memory map I/O." and it is basically directly related to the kernel bug you referenced in comment # 33

So we're now officially in "workaround" mode and functioning fine!?!



Oh, and I'm using fc31's latest installed version from the other day.. 
 
$ uname -r
5.6.11-200.fc31.x86_64
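
One way to double-check from the running system that the BIOS change took effect is to look at what the kernel registered under /sys/class/iommu. This is a sketch under assumptions: the helper name iommu_units is hypothetical, and the expectation that an Intel system with VT-d active lists entries such as dmar0 there (and none with VT-d disabled) was not verified on this laptop.

```shell
#!/bin/sh
# iommu_units [DIR] (hypothetical helper): list the IOMMU units found under
# DIR (default /sys/class/iommu); print "none" when the directory is empty
# or missing.
iommu_units() {
    dir="${1:-/sys/class/iommu}"
    found=0
    for entry in "$dir"/*; do
        [ -e "$entry" ] || continue   # unmatched glob stays literal; skip it
        found=1
        basename "$entry"
    done
    [ "$found" -eq 1 ] || echo "none"
}

iommu_units
```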

Comment 36 Steve 2020-05-14 17:11:20 UTC
> So we're now officially in "workaround" mode and functioning fine!?!

Good work finding that BIOS option. What you do next is up to you.

My suggestion would be to post your workaround upstream as a comment on Edoardo's bug* and link to this bug. If your workaround works for Edoardo (with an unmodified kernel), that would be fairly good evidence that the two bugs are the same.

You could also post a link to Edoardo's bug in the "Links" section near the top of this bug report.

> ... I just wanted to state that your help and super quick replies have been.. refreshing!

Thanks for saying that. It is especially nice when a reporter finds a workaround or even a fix. :-)

* Bug_206571 - [Bisected] On kernel 5.5 system hangs on shutdown or reboot 
https://bugzilla.kernel.org/show_bug.cgi?id=206571

Comment 37 Steve 2020-05-14 17:48:50 UTC
> I have not built a kernel in years, and certainly not for Redhat (fedora/rhel/centos), ...

FYI, the source for the kernel being built would be cloned from the upstream mainline git repo.* However, the ".config" file would be copied from a Fedora config file in /boot and tweaked for the build.

The build itself is easy: "make", wait, "make binrpm-pkg". The latter make target builds an rpm package, so you can install and remove the kernel with the "dnf" command.

* https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
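
Put together, testing a revert might look roughly like the sketch below. It is not a verified recipe: the helper names run and build_with_revert are hypothetical, "make olddefconfig" is an assumed way of doing the config "tweak", and reverting on top of the current mainline tip may need conflict resolution. By default the script only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# run CMD... (hypothetical helper): print the command, and execute it only
# when DRY_RUN=0.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

# build_with_revert COMMIT (hypothetical helper): clone mainline, revert
# COMMIT, reuse the running Fedora kernel's config, and build an rpm.
build_with_revert() {
    run git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git &&
    run cd linux &&
    run git revert --no-edit "$1" &&
    run cp "/boot/config-$(uname -r)" .config &&
    run make olddefconfig &&
    run make -j"$(nproc)" &&
    run make binrpm-pkg
}

# The commit identified in the upstream bug (see comment 33).
build_with_revert 6c3a44ed3c553c324845744f30bcd1d3b07d61fd
```

The resulting rpm could then be installed and removed with "dnf" like any other kernel package.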

Comment 38 Marc C 2020-05-14 20:43:51 UTC
Steve,

Done and done.  I updated the kernel.org ticket with my workaround and also adjusted this ticket's links.  Here's hoping someone else with a Dell XPS laptop can benefit from this! 

Also, what about the other person who I originally replied to?  Should something be posted to that ticket?  # 1825298


-Marc

Comment 39 Steve 2020-05-14 21:04:52 UTC
> Also, what about the other person who I originally replied to?  Should something be posted to that ticket?  # 1825298

Thanks for posting your results upstream and for adding the link to the upstream bug in the "Links" section.

I updated Mohan on your workaround in Bug 1825298, Comment 10.

BTW, if you prefix a BZ bug number with "Bug", BZ will automatically create a link to the bug. That's called "autolinkification":

https://bugzilla.redhat.com/docs/en/html/using/tips.html?highlight=autolinkification

Comment 40 Ben Cotton 2020-11-03 16:41:27 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 31 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 41 Ben Cotton 2020-11-24 17:21:35 UTC
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

