Bug 2219024 - second nvme ssd disappears after suspend
Summary: second nvme ssd disappears after suspend
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 38
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-07-01 03:43 UTC by Eric M
Modified: 2023-07-08 16:45 UTC
CC List: 16 users

Fixed In Version:
Doc Type: ---
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-08 16:45:04 UTC
Type: ---
Embargoed:



Description Eric M 2023-07-01 03:43:00 UTC
I have a Lemur Pro (lemp12) from System76 and installed the Fedora 38 KDE spin. It has two NVMe drives:

# nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme1n1          /dev/ng1n1            S6S1NS0W314806L      Samsung SSD 970 EVO Plus 1TB             0x1          3.11  GB /   1.00  TB    512   B +  0 B   4B2QEXM7
/dev/nvme0n1          /dev/ng0n1            23085Q800011         WD_BLACK SN850X 4000GB                   0x1          4.00  TB /   4.00  TB    512   B +  0 B   624311WD

The default install put them in a single btrfs filesystem spanning both drives (metadata in RAID1, data single):
# btrfs filesystem show
Label: 'fedora_localhost-live'  uuid: ffb47540-848e-40c3-b0a8-32cb0886d093
        Total devices 2 FS bytes used 141.87GiB
        devid    1 size 3.64TiB used 175.02GiB path /dev/nvme0n1p3
        devid    2 size 931.51GiB used 3.01GiB path /dev/nvme1n1p1

# btrfs filesystem df /
Data, single: total=172.01GiB, used=139.82GiB
System, RAID1: total=8.00MiB, used=48.00KiB
Metadata, RAID1: total=3.00GiB, used=2.05GiB
GlobalReserve, single: total=264.06MiB, used=0.00B

Sleep is set to deep:
# cat /sys/power/mem_sleep 
s2idle [deep]
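
For anyone triaging this: the sleep mode can be switched at runtime via the standard /sys/power/mem_sleep interface, to test whether s2idle avoids the problem (I have not confirmed that it does on this machine; the change is not persistent across reboots):

# echo s2idle > /sys/power/mem_sleep
# cat /sys/power/mem_sleep
[s2idle] deep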

After closing the lid the system goes to sleep as expected. However, on waking, the second SSD is missing:
# nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            23085Q800011         WD_BLACK SN850X 4000GB                   0x1          4.00  TB /   4.00  TB    512   B +  0 B   624311WD
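
Whether the controller itself is still visible on the PCI bus can be checked with lspci (NVMe controllers are listed as "Non-Volatile memory controller"); I am noting the command for completeness, not claiming a particular result here:

# lspci -nn | grep -i 'non-volatile'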


Relevant part of the kernel log, showing the system trying to wake up:

kernel: ACPI: PM: Waking up from system sleep state S3
kernel: ACPI: EC: interrupt unblocked
kernel: pcieport 0000:00:1d.0: Unable to change power state from D3hot to D0, device inaccessible
kernel: nvme 0000:2e:00.0: Unable to change power state from D3hot to D0, device inaccessible
kernel: ACPI: EC: event unblocked
kernel: xhci_hcd 0000:00:0d.0: xHC error in resume, USBSTS 0x401, Reinit
kernel: usb usb1: root hub lost power or was reset
kernel: usb usb2: root hub lost power or was reset
kernel: nvme 0000:2e:00.0: Unable to change power state from D3cold to D0, device inaccessible
kernel: nvme nvme1: Disabling device after reset failure: -19
kernel: pcieport 0000:00:06.0: can't derive routing for PCI INT A
kernel: nvme 0000:01:00.0: PCI INT A: no GSI

This is a well-known problem with NVMe drives on Linux. The fix for most people is to set the kernel parameter
nvme_core.default_ps_max_latency_us=0, but that fails here. Setting the latency to *any* of the Ex_latency values for the
power states of either SSD (as in the "solved" answers to many of the identical NVMe problems reported elsewhere) also fails. Another suggestion was setting iommu=soft or iommu=pt; neither works.
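
For anyone wanting to reproduce these attempts: on Fedora the parameter can be set persistently with grubby, and the per-power-state Ex_latency values can be read from the controller with nvme-cli (standard tooling, nothing specific to this bug):

# grubby --update-kernel=ALL --args="nvme_core.default_ps_max_latency_us=0"
# nvme id-ctrl -H /dev/nvme1 | grep -i exlat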

I've also tried re-seating the SSD, as suggested in several other reports. That does not work either.

Given the numerous other reports out there (just google "linux nvme disappear suspend"), this looks like a problem with the nvme driver, but I don't know for sure.
The most relevant link I think I can post is an interesting patch suggested here:
https://lore.kernel.org/lkml/20230309093657.GA24373@lst.de/T/
But the kernel parameter it suggests is not in my nvme_core module, so I can't test it.
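
Which parameters a given nvme_core build actually exposes can be checked with either of the following; both are standard and should work on the stock Fedora kernel:

# modinfo nvme_core | grep -i parm
# ls /sys/module/nvme_core/parameters/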



Reproducible: Always

Steps to Reproduce:
1. close lid to suspend
2. open lid to wake
3. run 'nvme list' and see that the drive is missing.
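
(Suspend can also be triggered from a terminal instead of the lid switch; I assume this enters the same deep sleep state:)

# systemctl suspend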
Actual Results:  
The second drive is not visible after waking.

Expected Results:  
The second drive should still be present after waking from sleep.

kernel: 6.3.8-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jun 15 02:15:40 UTC 2023 x86_64 GNU/Linux
Running the latest updates on Fedora 38.

# inxi --admin --verbosity=7 --filter --no-host --width
System:
  Kernel: 6.3.8-200.fc38.x86_64 arch: x86_64 bits: 64 compiler: gcc
    v: 2.39-9.fc38 parameters: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.3.8-200.fc38.x86_64
  Console: pty pts/1 wm: kwin_wayland DM: SDDM Distro: Fedora release 38
    (Thirty Eight)
Machine:
  Type: Laptop System: System76 product: Lemur Pro v: lemp12 serial: <filter>
    Chassis: type: 9 serial: N/A
  Mobo: System76 model: Lemur Pro v: lemp12 serial: <filter> UEFI: coreboot
    v: 2023-05-16_e9b9ea8 date: 05/16/2023

Comment 1 Eric M 2023-07-08 16:45:04 UTC
I ended up pulling the Samsung SSD and replacing it with a WD drive. The problem went away.
I'm marking this as closed.

