Bug 1699161 - kernel-5.x creates unkillable processes after suspend with Intel Wireless 8260
Summary: kernel-5.x creates unkillable processes after suspend with Intel Wireless 8260
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 28
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-04-11 23:29 UTC by Stuart D Gathman
Modified: 2019-05-29 01:11 UTC

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-28 23:57:01 UTC
Type: Bug
Embargoed:


Attachments
kernel logs (809.64 KB, text/plain)
2019-04-11 23:29 UTC, Stuart D Gathman

Description Stuart D Gathman 2019-04-11 23:29:05 UTC
Created attachment 1554665
kernel logs

1. Please describe the problem:
With every 5.x kernel tried so far, when a Dell Latitude 3570 laptop resumes from suspend, seemingly random processes are frozen and unkillable.  Systemd takes 30 minutes to shut down, patiently sending SIGTERM and then SIGKILL - and ultimately failing to unmount filesystems used by those processes.

2. What is the Version-Release number of the kernel:
kernel-5.0.5-100.fc28.x86_64
kernel-5.0.6-100.fc28.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Yes, works perfectly with 4.x kernels.  I am currently running 4.20.17-100.fc28.x86_64

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

While it is probably hardware specific, the steps are:
1) boot with kernel-5.0.x
2) suspend laptop
3) resume laptop
4) if no frozen processes are immediately evident (a frozen gui process is visibly obvious - background ones, not so much), try to shut down.  Systemd will complain about the frozen processes.

Sometimes, the number of frozen processes is 0 - so you might have to suspend/resume more than once.
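
One way to check for frozen processes after step 3 without attempting a shutdown is to list processes in uninterruptible sleep. This is only a sketch: it assumes the unkillable processes appear in state "D" in ps output, which this report does not confirm, and the helper name list_dstate is made up for illustration.

```shell
#!/bin/sh
# Assumption: the frozen processes show up in uninterruptible sleep (STAT
# beginning with "D"); such processes ignore SIGKILL until the kernel call
# they are blocked in returns.

# list_dstate (hypothetical helper): reads "PID STAT WCHAN COMMAND" lines on
# stdin and prints only those whose STAT field starts with "D".
list_dstate() {
  awk '$2 ~ /^D/'
}

# The trailing "=" on each column name suppresses the header line.
ps -eo pid=,stat=,wchan=,comm= | list_dstate
```

An empty result after a resume would suggest that this particular suspend/resume cycle did not trigger the problem.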

My intuitive hunch is that it has to do with a network driver, as all the unkillable processes do something with the wifi interface - whether displaying activity, or often NetworkManager itself becomes unkillable.


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

yes.  Although it just got pushed to stable after I retested.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 Stuart D Gathman 2019-04-11 23:32:07 UTC
Oh sorry, I didn't actually try kernel from rawhide, just updates-testing.

Comment 2 Stuart D Gathman 2019-04-11 23:32:48 UTC
Workaround - don't use suspend.

Comment 3 Stuart D Gathman 2019-04-14 14:22:28 UTC
kernel-5.0.7-100.fc28.x86_64 is also broken.  It definitely seems network related.  The unkillable processes always come from a list of usual suspects - all doing something with the network, specifically wireless.  (Intel 8260)

Comment 4 Stuart D Gathman 2019-04-20 16:29:09 UTC
Notes:  kernel-5.0.8-100.fc28.x86_64 is also broken.

While sometimes the first suspend works, it nearly always crashes on the second suspend/resume.

One thing that is always hung is sudo and su - neither can be used after the failed resume, although the terminal, graphics, window manager otherwise continue to work normally.

Comment 5 Stuart D Gathman 2019-04-20 19:58:57 UTC
A better workaround:

Before suspending, turn off wireless, then sudo modprobe -r iwlmvm
After resuming, sudo modprobe iwlmvm, then turn on wireless

This should be able to be automated.  It also narrows down the problem to something in the iwlmvm (or subsidiary) driver.
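
Written out as commands, the manual steps might look like the sketch below. It assumes NetworkManager manages the wireless device, so that "nmcli radio wifi" is how wireless gets switched off and on; the comment above does not say which tool was used. The RUN variable and the two function names are illustrative only.

```shell
#!/bin/sh
# Manual form of the workaround from comment 5.
# Assumption: wireless is toggled through NetworkManager's nmcli.
# RUN is the command prefix; set RUN=echo for a dry run that only prints
# the commands instead of executing them.
RUN=${RUN:-sudo}

# Run before suspending: disable wireless, then unload the driver.
before_suspend() {
  $RUN nmcli radio wifi off
  $RUN modprobe -r iwlmvm
}

# Run after resuming: reload the driver, then re-enable wireless.
after_resume() {
  $RUN modprobe iwlmvm
  $RUN nmcli radio wifi on
}
```

Sourcing this file and calling before_suspend, suspending, and then calling after_resume follows the same order as the steps above.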

Comment 6 Stuart D Gathman 2019-04-20 21:45:47 UTC
Ok, to automate the workaround, I added this script:

# cat /usr/lib/systemd/system-sleep/iwlmvm.sh
#!/bin/sh
# systemd runs this hook with $1 set to "pre" before sleep and "post" after resume.
if [ "${1}" = "pre" ]; then
  # iwlmvm driver creates unkillable processes on suspend on kernel-5.0.x
  modprobe -r iwlmvm
elif [ "${1}" = "post" ]; then
  # Reload the driver after resume
  modprobe iwlmvm
fi

Comment 7 Stuart D Gathman 2019-04-24 14:36:20 UTC
The workaround reliably prevents the unkillable processes.  However, the resume doesn't always successfully reload the module - there seems to be a timing problem.  When this happens, the module can be loaded manually.
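
If the reload failure really is a transient timing issue, retrying the modprobe a few times in the "post" branch of the hook may be enough. The following is an untested sketch under that assumption; the retry helper, the attempt count, and the delay are illustrative choices, not something verified on the affected hardware.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds or ATTEMPTS are
# used up, sleeping DELAY seconds between attempts (hypothetical helper).
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Only wait if another attempt is still to come.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# In the "post" branch of iwlmvm.sh, in place of the bare "modprobe iwlmvm":
#   retry 5 2 modprobe iwlmvm
```

If the module still fails to load after the last attempt, loading it manually remains the fallback.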

Comment 8 Ben Cotton 2019-05-02 19:39:47 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 28 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 9 Ben Cotton 2019-05-28 23:57:01 UTC
Fedora 28 changed to end-of-life (EOL) status on 2019-05-28. Fedora 28 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 10 Stuart D Gathman 2019-05-29 01:11:46 UTC
Just tested again, and this is fixed in current kernels.

