Bug 1759879

Summary: System hang up when memory swapping (kswapd deadlock)
Product: [Fedora] Fedora
Reporter: Mirek Svoboda <goodmirek>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: NEW
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 31
CC: airlied, bskeggs, goodmirek, haydn.reysenbach, hdegoede, ichavero, itamar, jarodwilson, jeremy, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, masami256, mchehab, mihai, mjg59, pasik, patdung100, redhat, samoht0-bugzilla, steved
Hardware: x86_64   
OS: Linux   
Type: Bug
Attachments:
Description                                   Flags
kernel 5.3.5 log                              none
kernel log 5.4.0-0.rc1.git1.1.fc32.x86_64     none

Description Mirek Svoboda 2019-10-09 10:12:24 UTC
1. Please describe the problem:
I run Fedora 31 with the KDE Plasma desktop environment.
Swapping works fine with kernel 5.2.17, even with 4 GB of swap in use for several days. With 5.3.2, swapping causes a complete freeze, usually soon after swapping starts: the screen freezes, the mouse stops moving, and the TTYs are inaccessible. I did not try the SysRq keys.

System under test:
HP Elitebook 850 G4
CPU: Intel i5-7200U with embedded GPU
RAM: 4 GB unbuffered, memtest OK
Disk: Samsung PM961 SSD (256 GB), LVM+LUKS, root volume formatted with XFS
Swap: 20 GB swap file at /swapfile

2. What is the Version-Release number of the kernel:
kernel-5.3.2-300.fc30.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
Yes, it works with 5.2.17.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
- enable swap
- allocate enough memory that the system starts swapping, e.g. by opening many web browser tabs with memory-hungry pages
- after a while, usually in less than two hours, the system freezes
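The steps above rely on browser tabs to create memory pressure. A more controlled way to push a machine into swap, assuming Python 3 is available, is a small helper like the sketch below. It is not something the reporter used; the function `allocate` and the script name `memhog.py` are hypothetical. It allocates memory in chunks and writes one byte per page so that every page has to be backed by RAM or, once RAM runs out, by swap.

```python
import sys
import time

MIB = 1024 * 1024
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the x86_64 default


def allocate(total_mib: int, chunk_mib: int = 64) -> list:
    """Hypothetical helper: allocate total_mib MiB in chunks and touch every page.

    Writing one byte per page forces the kernel to back each page with
    RAM, or with swap once RAM is exhausted.
    """
    chunks = []
    remaining = total_mib
    while remaining > 0:
        step = min(chunk_mib, remaining)
        buf = bytearray(step * MIB)
        for offset in range(0, len(buf), PAGE_SIZE):
            buf[offset] = 1  # dirty the page so it cannot stay unbacked
        chunks.append(buf)
        remaining -= step
    return chunks


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 memhog.py <MiB>; keep the memory allocated until Ctrl+C
    held = allocate(int(sys.argv[1]))
    print(f"Holding {sum(len(c) for c in held) // MIB} MiB; press Ctrl+C to release")
    try:
        while True:
            time.sleep(60)
    except KeyboardInterrupt:
        pass
```

On the 4 GB machine described above, a size larger than physical RAM, for example `python3 memhog.py 6144`, should make the system start swapping; on an affected kernel that is the condition under which the freeze was reported.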

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
I have not tried it yet.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
Yes, kmod-VirtualBox, in both 5.2.17 and 5.3.2.
The issue happens in 5.3.2 even when VirtualBox is never used.
The issue does not happen in 5.2.17 even when VirtualBox is used and causes a lot of swapping.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.
There is no message in the kernel log at the time of the freeze.
Will try to reproduce with 5.3.5 and attach the log.

Comment 1 Mirek Svoboda 2019-10-09 11:03:51 UTC
The issue happens also on non-tainted kernel-5.3.5-300.fc31.x86_64.

Comment 2 Mirek Svoboda 2019-10-09 11:57:26 UTC
I/O Scheduler: BFQ

While the issue is happening, the disk LED indicates no disk activity, although the LED works otherwise.

Attaching kernel log dmesg-5.3.5.txt from an affected run of 5.3.5 kernel.

Comment 3 Mirek Svoboda 2019-10-09 11:58:19 UTC
Created attachment 1623777
kernel 5.3.5 log

Comment 4 Mirek Svoboda 2019-10-09 12:39:59 UTC
I tried the Rawhide kernel 5.4.0-0.rc1.git1.1.fc32.x86_64; it is also affected by this bug. I see the following warning in dmesg:

<truncated>
Oct 09 13:47:08 kernel: WARNING: possible circular locking dependency detected
Oct 09 13:47:08 kernel: 5.4.0-0.rc1.git1.1.fc32.x86_64 #1 Not tainted

<truncated>

*** DEADLOCK ***
Oct 09 13:47:08 kernel: 4 locks held by kswapd0/157:
Oct 09 13:47:08 kernel:  #0: ffffffff83781540 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
Oct 09 13:47:08 kernel:  #1: ffffffff837743d8 (shrinker_rwsem){++++}, at: shrink_slab+0x134/0x2b0
Oct 09 13:47:08 kernel:  #2: ffff8e2e0dd920e8 (&type->s_umount_key#56){++++}, at: trylock_super+0x16/0x50
Oct 09 13:47:08 kernel:  #3: ffff8e2e0dd57a58 (&pag->pag_ici_reclaim_lock){+.+.}, at: xfs_reclaim_inodes_ag+0x95/0x450 [xfs]

Full dmesg is attached.

Comment 5 Mirek Svoboda 2019-10-09 12:40:47 UTC
Created attachment 1623793
kernel log 5.4.0-0.rc1.git1.1.fc32.x86_64

Comment 6 Mirek Svoboda 2019-10-09 12:52:20 UTC
I have opened an upstream bug https://bugzilla.kernel.org/show_bug.cgi?id=205135

Comment 7 Mirek Svoboda 2019-10-10 08:57:32 UTC
The only known way to recover from the freeze is a hard reset. This is not a temporary freeze but a complete system hang.

Comment 8 samoht0 2019-10-10 19:17:18 UTC
This crash, reported by a bot and not yet reproduced, sounds related to me:

https://lore.kernel.org/linux-mm/20190910071804.2944-1-hdanton@sina.com/

Something for the maintainers to look into.

Comment 9 Mirek Svoboda 2019-10-11 07:40:33 UTC
kernel-5.4.0-0.rc2.git1.1.fc32.x86_64 is also affected.

Comment 10 Mirek Svoboda 2019-10-22 08:59:24 UTC
Everyone who uses a swap file on an XFS filesystem seems to be affected by this hang.
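To check whether a machine matches the configuration described above (a swap file, rather than a swap partition, that lives on XFS), one option is to compare /proc/swaps with /proc/mounts. The sketch below is a hypothetical helper, not part of the report or of any Fedora tool: `swap_files`, `fs_type_for` and `xfs_swap_files` are illustrative names. It picks the longest mount point that contains each swap file and does not decode octal escapes such as `\040` in paths, resolve symlinks, or account for bind mounts. A match only means the configuration is the one reported here; it does not prove the kernel is affected.

```python
import os
from typing import List, Optional


def swap_files(proc_swaps: str) -> List[str]:
    """Return the swap areas of type "file" from /proc/swaps-formatted text."""
    files = []
    for line in proc_swaps.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "file":
            files.append(fields[0])
    return files


def fs_type_for(path: str, proc_mounts: str) -> Optional[str]:
    """Return the filesystem type of the longest mount point containing path."""
    best_mount, best_type = "", None
    for line in proc_mounts.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same mount point win
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def xfs_swap_files(proc_swaps: str, proc_mounts: str) -> List[str]:
    """Return the swap files that sit on an XFS filesystem."""
    return [f for f in swap_files(proc_swaps) if fs_type_for(f, proc_mounts) == "xfs"]


if __name__ == "__main__" and os.path.exists("/proc/swaps"):
    with open("/proc/swaps") as swaps, open("/proc/mounts") as mounts:
        hits = xfs_swap_files(swaps.read(), mounts.read())
    print("Swap files on XFS:", ", ".join(hits) if hits else "none")
```

On the system under test described in the original report (root volume on XFS, swap file at /swapfile), this would be expected to list `/swapfile`.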