Bug 1759879 - System hang up when memory swapping (kswapd deadlock)
Summary: System hang up when memory swapping (kswapd deadlock)
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 31
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-10-09 10:12 UTC by Mirek Svoboda
Modified: 2020-04-07 17:21 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-25 22:25:15 UTC
Type: Bug
Embargoed:


Attachments
kernel 5.3.5 log (87.43 KB, text/plain)
2019-10-09 11:58 UTC, Mirek Svoboda
kernel log 5.4.0-0.rc1.git1.1.fc32.x86_64 (177.88 KB, text/plain)
2019-10-09 12:40 UTC, Mirek Svoboda

Description Mirek Svoboda 2019-10-09 10:12:24 UTC
1. Please describe the problem:
I run Fedora 31 with KDE Plasma DE.
Swapping with 5.2.17 works fine, even when using 4GB of swap for several days. With 5.3.2, the system freezes completely, usually soon after swapping starts: the screen freezes, the mouse pointer does not move, and there is no TTY access. I did not try the SysRq keys.

System under test:
HP Elitebook 850 G4
CPU: Intel i5-7200U with embedded GPU
RAM: 4GB unbuffered, memtest OK
Disk: SSD Samsung PM961 (256GB), LVM+LUKS, root volume with XFS filesystem
Swap: swapping to file of size 20GB at path /swapfile

2. What is the Version-Release number of the kernel:
kernel-5.3.2-300.fc30.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
It works with 5.2.17

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
- enable swap
- allocate enough RAM that the system starts swapping, e.g. by opening multiple web browser tabs with memory-consuming webpages
- after a while, usually in less than two hours, the system freezes
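
A rough way to script the second step without a web browser is sketched below. It is not the workload used in this report: it allocates plain anonymous memory, and whether that is enough to trigger the hang is untested. The function name and its parameters are made up for illustration.

```python
import time


def allocate_and_touch(total_mib, chunk_mib=64, page_size=4096, pause_s=0.0):
    """Allocate total_mib MiB in chunks and write one byte to every page.

    Writing to each page makes sure the kernel actually backs it with
    memory, so the allocation adds real memory pressure.
    The 4096-byte page size is an assumption (typical for x86_64).
    """
    chunks = []
    remaining = total_mib
    while remaining > 0:
        size_mib = min(chunk_mib, remaining)
        buf = bytearray(size_mib * 1024 * 1024)
        for offset in range(0, len(buf), page_size):
            buf[offset] = 1
        chunks.append(buf)
        remaining -= size_mib
        if pause_s:
            time.sleep(pause_s)  # optional pause between chunks
    return chunks
```

On the 4GB machine described above, a call such as `held = allocate_and_touch(6000, pause_s=0.5)` should push the system into swap. Keep a reference to the returned list so the memory stays allocated.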

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
I did not try yet.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
Yes, kmod-VirtualBox, in both 5.2.17 and 5.3.2.
The issue happens in 5.3.2 even when VirtualBox is never used.
The issue does not happen in 5.2.17 even when VirtualBox is used and causes a lot of swapping.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.
There is no message in the kernel log at the time of the freeze.
Will try to reproduce with 5.3.5 and attach the log.
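
Since the only way out of the freeze is a reboot, the log of interest is usually the one from the previous boot. A minimal sketch of collecting it with the `journalctl` command quoted above follows; the helper names and the output path are made up for illustration.

```python
import subprocess


def kernel_log_command(boot_offset=0):
    """Build the journalctl argument list for the kernel log of one boot.

    boot_offset=0 means the current boot, -1 the previous boot, and so on.
    """
    cmd = ["journalctl", "--no-hostname", "-k"]
    if boot_offset != 0:
        cmd += ["-b", str(boot_offset)]
    return cmd


def save_kernel_log(path, boot_offset=0):
    """Run journalctl and write the kernel log to the given file."""
    with open(path, "w") as out:
        subprocess.run(kernel_log_command(boot_offset), stdout=out, check=True)
```

For example, `save_kernel_log("dmesg.txt", boot_offset=-1)` would save the kernel log of the boot that ended in the freeze, provided the journal is persistent.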

Comment 1 Mirek Svoboda 2019-10-09 11:03:51 UTC
The issue also happens on the non-tainted kernel-5.3.5-300.fc31.x86_64.

Comment 2 Mirek Svoboda 2019-10-09 11:57:26 UTC
I/O Scheduler: BFQ

While the issue is happening, the disk LED indicates no disk activity, although the LED works otherwise.

Attaching kernel log dmesg-5.3.5.txt from an affected run of 5.3.5 kernel.
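
For anyone who wants to compare their setup, the active I/O scheduler can be read from sysfs, where the kernel marks it with square brackets. A small sketch follows; the device name `nvme0n1` in the usage note is only an assumed example, not taken from this report.

```python
from pathlib import Path


def active_scheduler(scheduler_text):
    """Return the scheduler name shown in square brackets, or None."""
    for token in scheduler_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_active_scheduler(device):
    """Read the active scheduler of a block device from sysfs."""
    text = Path(f"/sys/block/{device}/queue/scheduler").read_text()
    return active_scheduler(text)
```

A call such as `read_active_scheduler("nvme0n1")` would be expected to return `"bfq"` on a system configured like the one described here.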

Comment 3 Mirek Svoboda 2019-10-09 11:58:19 UTC
Created attachment 1623777
kernel 5.3.5 log

Comment 4 Mirek Svoboda 2019-10-09 12:39:59 UTC
I tried the rawhide kernel 5.4.0-0.rc1.git1.1.fc32.x86_64. It is also affected by this bug. I see the following warning in dmesg:

<truncated>
Oct 09 13:47:08 kernel: WARNING: possible circular locking dependency detected
Oct 09 13:47:08 kernel: 5.4.0-0.rc1.git1.1.fc32.x86_64 #1 Not tainted

<truncated>

*** DEADLOCK ***
Oct 09 13:47:08 kernel: 4 locks held by kswapd0/157:
Oct 09 13:47:08 kernel:  #0: ffffffff83781540 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
Oct 09 13:47:08 kernel:  #1: ffffffff837743d8 (shrinker_rwsem){++++}, at: shrink_slab+0x134/0x2b0
Oct 09 13:47:08 kernel:  #2: ffff8e2e0dd920e8 (&type->s_umount_key#56){++++}, at: trylock_super+0x16/0x50
Oct 09 13:47:08 kernel:  #3: ffff8e2e0dd57a58 (&pag->pag_ici_reclaim_lock){+.+.}, at: xfs_reclaim_inodes_ag+0x95/0x450 [xfs]

Full dmesg is attached.
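
To compare lock chains between logs, the "locks held" lines quoted above can be pulled out with a small parser. This is only a sketch: the regular expression is written against the four lines shown here and may not cover every lockdep output variant.

```python
import re

# Matches lines such as:
#   #3: ffff8e2e0dd57a58 (&pag->pag_ici_reclaim_lock){+.+.}, at: func+0x95/0x450 [xfs]
LOCK_LINE = re.compile(
    r"#(?P<index>\d+):\s+(?P<addr>[0-9a-f]+)\s+"
    r"\((?P<name>.*?)\)\{(?P<usage>[^}]*)\},\s+at:\s+"
    r"(?P<site>\S+)(?:\s+\[(?P<module>\w+)\])?"
)


def parse_held_locks(log_text):
    """Return one dict per held-lock line, in the order they appear."""
    locks = []
    for line in log_text.splitlines():
        match = LOCK_LINE.search(line)
        if match:
            locks.append({
                "index": int(match["index"]),
                "name": match["name"],
                "site": match["site"],
                "module": match["module"],  # None for built-in code
            })
    return locks
```

Applied to the excerpt above, this yields four entries, with the innermost lock taken in `xfs_reclaim_inodes_ag` from the `xfs` module.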

Comment 5 Mirek Svoboda 2019-10-09 12:40:47 UTC
Created attachment 1623793
kernel log 5.4.0-0.rc1.git1.1.fc32.x86_64

Comment 6 Mirek Svoboda 2019-10-09 12:52:20 UTC
I have opened an upstream bug https://bugzilla.kernel.org/show_bug.cgi?id=205135

Comment 7 Mirek Svoboda 2019-10-10 08:57:32 UTC
The only known recovery from the freeze is a hard reset. This is not a temporary freeze but a complete system hang.

Comment 8 samoht0 2019-10-10 19:17:18 UTC
This unreproduced bot crash sounds related to me:

https://lore.kernel.org/linux-mm/20190910071804.2944-1-hdanton@sina.com/

Something for the maintainers to look into.

Comment 9 Mirek Svoboda 2019-10-11 07:40:33 UTC
kernel-5.4.0-0.rc2.git1.1.fc32.x86_64 is also affected.

Comment 10 Mirek Svoboda 2019-10-22 08:59:24 UTC
Everyone who uses a swapfile on an XFS filesystem seems to be affected by this hang.
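
To check whether a system swaps to a file rather than a partition, the contents of `/proc/swaps` can be parsed. The sketch below assumes the usual whitespace-separated layout with a header row; the function name is made up for illustration.

```python
def swap_files(proc_swaps_text):
    """Return (path, size_kib) for every swap area of type 'file'."""
    entries = []
    for line in proc_swaps_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 5 and fields[1] == "file":
            entries.append((fields[0], int(fields[2])))
    return entries
```

Calling `swap_files(open("/proc/swaps").read())` lists the swap files in use. The filesystem holding each file has to be looked up separately, for example with `findmnt -no FSTYPE -T /swapfile`.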

Comment 11 Justin M. Forbes 2020-03-03 16:16:36 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 31 kernel bugs.

Fedora 31 has now been rebased to 5.5.7-200.fc31.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 32, and are still experiencing this issue, please change the version to Fedora 32.

If you experience different issues, please open a new bug report for those.

Comment 12 Justin M. Forbes 2020-03-25 22:25:15 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 13 Mirek Svoboda 2020-04-07 05:34:41 UTC
The issue does not happen on FC31 with kernel 5.5.13.
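
For reference, the kernel versions tested in this bug can be put in order to show the affected range. The sketch below only encodes results reported in the comments above; the shortened version strings and the helper names are made up for illustration, and kernels between the listed ones were not tested.

```python
import re

# Results reported in this bug: True means the hang was observed.
REPORTED = {
    "5.2.17": False,
    "5.3.2": True,
    "5.3.5": True,
    "5.4.0-0.rc1": True,
    "5.4.0-0.rc2": True,
    "5.5.13": False,
}


def version_key(release):
    """Sort key for strings like '5.3.2' or '5.4.0-0.rc1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:-0\.rc(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release}")
    major, minor, patch, rc = match.groups()
    # A release candidate sorts before the final release of the same version.
    rc_rank = int(rc) if rc is not None else float("inf")
    return (int(major), int(minor), int(patch), rc_rank)


def affected_range(results):
    """Return the oldest and newest releases where the hang was observed."""
    bad = sorted((r for r, hung in results.items() if hung), key=version_key)
    return (bad[0], bad[-1]) if bad else None
```

With the data above, `affected_range(REPORTED)` gives `("5.3.2", "5.4.0-0.rc2")`, with 5.2.17 before and 5.5.13 after reported as working.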

