Description of problem:
NVMe IO hang after offlining CPUs.

Version-Release number of selected component (if applicable):
4.16.1-300.fc28.x86_64

How reproducible:
100%

Steps to Reproduce:
# fio -filename=/dev/nvme0n1p1 -iodepth=1 -thread -rw=randwrite -ioengine=psync -bssplit=5k/10:9k/10:13k/10:17k/10:21k/10:25k/10:29k/10:33k/10:37k/10:41k/10 -direct=1 -runtime=120 -size=-group_reporting -name=mytest -numjobs=60 &
# sleep 10
# echo 0 > /sys/devices/system/cpu/cpu1/online
# echo 0 > /sys/devices/system/cpu/cpu2/online
# echo 0 > /sys/devices/system/cpu/cpu3/online

Actual results:
fio never finishes.

Expected results:
fio finishes.

Additional info:
[  122.024084] smpboot: CPU 1 is now offline
[  122.055263] smpboot: CPU 2 is now offline
[  122.299090] smpboot: CPU 3 is now offline
[  153.009736] nvme nvme0: I/O 898 QID 2 timeout, completion polled
[  153.016460] nvme nvme0: I/O 900 QID 2 timeout, aborting
[  153.022310] nvme nvme0: I/O 901 QID 2 timeout, aborting
[  153.028135] nvme nvme0: Abort status: 0x0
[  153.032627] nvme nvme0: I/O 902 QID 2 timeout, aborting
[  153.038458] nvme nvme0: Abort status: 0x0
[  153.042948] nvme nvme0: I/O 765 QID 7 timeout, completion polled
[  153.049652] nvme nvme0: Abort status: 0x0
[  153.054146] nvme nvme0: I/O 767 QID 7 timeout, aborting
[  153.059986] nvme nvme0: I/O 768 QID 7 timeout, aborting
[  153.065817] nvme nvme0: Abort status: 0x0
[  153.070304] nvme nvme0: I/O 769 QID 7 timeout, aborting
[  153.076133] nvme nvme0: Abort status: 0x0
[  153.080619] nvme nvme0: I/O 1015 QID 8 timeout, completion polled
[  153.087420] nvme nvme0: Abort status: 0x0
[  153.091913] nvme nvme0: I/O 1018 QID 8 timeout, completion polled
[  153.098728] nvme nvme0: I/O 1019 QID 8 timeout, completion polled
[  183.650509] nvme nvme0: I/O 900 QID 2 timeout, reset controller
[  186.020822] nvme nvme0: I/O 901 QID 2 timeout, disable controller
[  186.038595] nvme nvme0: I/O 902 QID 2 timeout, disable controller
[  186.054499] nvme nvme0: I/O 767 QID 7 timeout, disable controller
[  186.072495] nvme nvme0: I/O 768 QID 7 timeout, disable controller
[  186.087494] nvme nvme0: I/O 769 QID 7 timeout, disable controller
[  218.659194] nvme nvme0: I/O 900 QID 2 timeout, completion polled
[  218.665926] nvme nvme0: I/O 901 QID 2 timeout, disable controller

# ps aux | grep fio
root      1471  1.1  9.6 6428504 1568544 pts/0   Sl+  01:06   0:14 fio -filename=/dev/nvme0n1p1 -iodepth=1 -thread -rw=randwrite -ioengine=psync -bssplit=5k/10:9k/10:13k/10:17k/10:21k/10:25k/10:29k/10:33k/10:37k/10:41k/10 -direct=1 -runtime=600 -size=-group_reporting -name=mytest -numjobs=60

# cat /proc/1471/status
Name:	fio
Umask:	0022
State:	S (sleeping)
Tgid:	1471
Ngid:	1471
Pid:	1471
PPid:	1470
TracerPid:	0
Uid:	0	0	0	0
Gid:	0	0	0	0
FDSize:	64
Groups:	0
NStgid:	1471
NSpid:	1471
NSpgid:	1272
NSsid:	1093
VmPeak:	6494040 kB
VmSize:	6428504 kB
VmLck:	0 kB
VmPin:	0 kB
VmHWM:	1568544 kB
VmRSS:	1568544 kB
RssAnon:	1157260 kB
RssFile:	3284 kB
RssShmem:	408000 kB
VmData:	1673888 kB
VmStk:	132 kB
VmExe:	632 kB
VmLib:	3908 kB
VmPTE:	3700 kB
VmSwap:	0 kB
HugetlbPages:	0 kB
CoreDumping:	0
Threads:	62
SigQ:	3/62985
SigPnd:	0000000000000000
ShdPnd:	0000000000000000
SigBlk:	0000000000000000
SigIgn:	0000000000000000
SigCgt:	0000000180004202
CapInh:	0000000000000000
CapPrm:	0000003fffffffff
CapEff:	0000003fffffffff
CapBnd:	0000003fffffffff
CapAmb:	0000000000000000
NoNewPrivs:	0
Seccomp:	0
Cpus_allowed:	fff
Cpus_allowed_list:	0-11
Mems_allowed:	00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list:	0-1
voluntary_ctxt_switches:	122885
nonvoluntary_ctxt_switches:	20

# cat /proc/1471/stack
[<0>] hrtimer_nanosleep+0xd4/0x1e0
[<0>] SyS_nanosleep+0x75/0xa0
[<0>] do_syscall_64+0x74/0x180
[<0>] 0xffffffffffffffff
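When triaging kernel logs like the ones above, it helps to tally the timeout events per NVMe queue (QID) to see which hardware queues got stuck after the CPU offline. A minimal helper sketch — the function name `count_nvme_timeouts` is my own, and it assumes dmesg-style text on stdin:

```shell
# count_nvme_timeouts: tally NVMe I/O timeout events per queue ID (QID)
# from kernel log text read on stdin. Helper name is hypothetical, not
# part of any tool referenced in this report.
count_nvme_timeouts() {
    awk '/nvme nvme[0-9]+: I\/O [0-9]+ QID [0-9]+ timeout/ {
        # Find the "QID" token and count the queue ID that follows it.
        for (i = 1; i <= NF; i++)
            if ($i == "QID") count[$(i + 1)]++
    }
    END {
        for (q in count)
            printf "QID %s: %d timeout(s)\n", q, count[q]
    }' | sort
}
```

For example, `dmesg | count_nvme_timeouts` would print one line per queue; in the log above, QIDs 2 and 7 accumulate the most timeouts, matching the queues whose CPUs were offlined.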
*********** MASS BUG UPDATE **************
We apologize for the inconvenience. There are a large number of bugs to go through, and several of them have gone stale. Because of this, we are doing a mass bug update across all of the Fedora 28 kernel bugs.

Fedora 28 has now been rebased to 4.17.7-200.fc28. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you experience different issues, please open a new bug report for those.
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 5 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running, along with any data that might have been requested previously.