Bug 1740690 - zswap z3fold BUG: unable to handle page fault for address or GPF z3fold_zpool_malloc or z3fold_zpool_map
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 30
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-08-13 13:35 UTC by Markus Linnala
Modified: 2019-11-21 15:11 UTC
CC: 20 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-11-21 15:11:17 UTC
Type: Bug
Embargoed:


Attachments
dmesg-zswap-problem-5.2.0-0.rc1.git0.1.fc31.x86_64.txt (25.58 KB, text/plain)
2019-08-13 13:35 UTC, Markus Linnala
dmesg-zswap-z3fold_zpool_map-5.2.8-200.fc30.x86_64.txt (6.00 KB, text/plain)
2019-08-13 13:36 UTC, Markus Linnala
dmesg-zswap-z3fold_zpool_malloc-5.2.8-200.fc30.x86_64.txt (5.98 KB, text/plain)
2019-08-13 13:36 UTC, Markus Linnala
dmesg-zswap-writeback_entry-5.2.8-200.fc30.x86_64.txt (22.52 KB, text/plain)
2019-08-13 13:41 UTC, Markus Linnala
dmesg-zswap-5.2.8-200.fc30.x86_64.txt (59.60 KB, text/plain)
2019-08-13 13:55 UTC, Markus Linnala
dmesg-zswap-z3fold_zpool_free-5.1.20.txt (17.37 KB, text/plain)
2019-08-13 15:30 UTC, Markus Linnala


Links
Linux Kernel 204563 (last updated 2019-08-22 18:01:57 UTC)

Description Markus Linnala 2019-08-13 13:35:29 UTC
Created attachment 1603384 [details]
dmesg-zswap-problem-5.2.0-0.rc1.git0.1.fc31.x86_64.txt

1. Please describe the problem:

Configuring the kernel to boot with zswap.enabled=1 and zswap.zpool=z3fold causes the kernel to BUG.

This also makes data in swap inaccessible, and all processes using swap become stuck. Shutdown does not work.

[   78.257803] BUG: unable to handle page fault for address: ffffcb6203000028
[   78.258762] #PF: supervisor read access in kernel mode
[   78.260306] #PF: error_code(0x0000) - not-present page
[   78.261073] PGD 0 P4D 0 
[   78.261586] Oops: 0000 [#1] SMP PTI
[   78.262328] CPU: 0 PID: 152 Comm: kswapd0 Not tainted 5.2.8-200.fc30.x86_64 #1
[   78.263387] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[   78.266573] RIP: 0010:z3fold_zpool_map+0x50/0xf0
[   78.268259] Code: e8 48 01 ea 0f 82 b3 00 00 00 48 c7 c3 00 00 00 80 48 2b 1d 5a b9 ec 00 48 01 d3 48 c1 eb 0c 48 c1 e3 06 48 03 1d 38 b9 ec 00 <48> 8b 53 28 83 e2 01 74 05 5b 5d 41 5c c3 48 8d 7d 10 e8 69 d8 6d
[   78.271306] RSP: 0018:ffffa9108033b868 EFLAGS: 00010286
[   78.272469] RAX: 0000000000000000 RBX: ffffcb6203000000 RCX: 0000000000000000
[   78.273603] RDX: 0000000080000000 RSI: ffff953f6bac5e18 RDI: ffff953f7c8fc600
[   78.274486] RBP: 0000000000000000 R08: ffff953f7c8fc600 R09: 0000000000000001
[   78.275356] R10: ffff953f6bac5e18 R11: ffffc9b700aef020 R12: ffff953f6bac5e18
[   78.276192] R13: ffffa9108033b8a0 R14: ffff953f45910000 R15: ffff953f7c8fc600
[   78.277158] FS:  0000000000000000(0000) GS:ffff953f7e600000(0000) knlGS:0000000000000000
[   78.278172] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   78.279182] CR2: ffffcb6203000028 CR3: 000000003583a005 CR4: 0000000000160ef0
[   78.280195] Call Trace:
[   78.280660]  zswap_writeback_entry+0x50/0x3a0
[   78.282058]  z3fold_zpool_shrink+0x37d/0x480
[   78.283096]  zswap_frontswap_store+0x2dc/0x600
[   78.283852]  __frontswap_store+0xab/0x12a
[   78.284646]  swap_writepage+0x39/0x70
[   78.285103]  pageout.isra.0+0x13c/0x350
[   78.285698]  shrink_page_list+0xc14/0xdf0
[   78.286179]  shrink_inactive_list+0x1e5/0x3c0
[   78.286693]  shrink_node_memcg+0x202/0x760
[   78.287403]  shrink_node+0xdc/0x490
[   78.288608]  balance_pgdat+0x2d1/0x510
[   78.289089]  kswapd+0x210/0x3f0
[   78.289482]  ? finish_wait+0x80/0x80
[   78.290093]  kthread+0xfb/0x130
[   78.290526]  ? balance_pgdat+0x510/0x510
[   78.291066]  ? kthread_park+0x80/0x80
[   78.291783]  ret_from_fork+0x35/0x40
[   78.292480] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables vfat fat kvm_intel kvm irqbypass snd_hda_codec_generic crct10dif_pclmul ledtrig_audio crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec iTCO_wdt snd_hda_core iTCO_vendor_support snd_pcsp snd_hwdep snd_pcm snd_timer i2c_i801 snd lpc_ich soundcore virtio_net virtio_balloon joydev net_failover failover qxl serio_raw drm_kms_helper crc32c_intel ttm drm virtio_blk virtio_console qemu_fw_cfg
[   78.305499] CR2: ffffcb6203000028
[   78.305908] ---[ end trace 460d9373d78c8439 ]---
[   78.306536] RIP: 0010:z3fold_zpool_map+0x50/0xf0
[   78.307131] Code: e8 48 01 ea 0f 82 b3 00 00 00 48 c7 c3 00 00 00 80 48 2b 1d 5a b9 ec 00 48 01 d3 48 c1 eb 0c 48 c1 e3 06 48 03 1d 38 b9 ec 00 <48> 8b 53 28 83 e2 01 74 05 5b 5d 41 5c c3 48 8d 7d 10 e8 69 d8 6d
[   78.309585] RSP: 0018:ffffa9108033b868 EFLAGS: 00010286
[   78.310208] RAX: 0000000000000000 RBX: ffffcb6203000000 RCX: 0000000000000000
[   78.311049] RDX: 0000000080000000 RSI: ffff953f6bac5e18 RDI: ffff953f7c8fc600
[   78.311900] RBP: 0000000000000000 R08: ffff953f7c8fc600 R09: 0000000000000001
[   78.312777] R10: ffff953f6bac5e18 R11: ffffc9b700aef020 R12: ffff953f6bac5e18
[   78.313621] R13: ffffa9108033b8a0 R14: ffff953f45910000 R15: ffff953f7c8fc600
[   78.314420] FS:  0000000000000000(0000) GS:ffff953f7e600000(0000) knlGS:0000000000000000
[   78.315386] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   78.316069] CR2: ffffcb6203000028 CR3: 000000003583a005 CR4: 0000000000160ef0
[   78.316923] ------------[ cut here ]------------
[   78.317500] WARNING: CPU: 0 PID: 152 at kernel/exit.c:783 do_exit.cold+0x79/0x91
[   78.318391] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables vfat fat kvm_intel kvm irqbypass snd_hda_codec_generic crct10dif_pclmul ledtrig_audio crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec iTCO_wdt snd_hda_core iTCO_vendor_support snd_pcsp snd_hwdep snd_pcm snd_timer i2c_i801 snd lpc_ich soundcore virtio_net virtio_balloon joydev net_failover failover qxl serio_raw drm_kms_helper crc32c_intel ttm drm virtio_blk virtio_console qemu_fw_cfg
[   78.326746] CPU: 0 PID: 152 Comm: kswapd0 Tainted: G      D           5.2.8-200.fc30.x86_64 #1
[   78.327784] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[   78.328733] RIP: 0010:do_exit.cold+0x79/0x91
[   78.329255] Code: 42 10 48 89 44 24 18 48 8b 44 24 18 e8 51 f6 8d 00 e9 17 f0 ff ff 8b b3 c8 04 00 00 eb ad 48 c7 c7 d0 98 0c a5 e8 06 1b 06 00 <0f> 0b e9 a1 ef ff ff 48 c7 c7 e0 2d 0e a5 e8 3d a9 ff ff 90 90 90
[   78.331493] RSP: 0018:ffffa9108033bee0 EFLAGS: 00010046
[   78.332132] RAX: 0000000000000024 RBX: ffff953f7d26ddc0 RCX: 0000000000000006
[   78.333015] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff953f7e617900
[   78.333888] RBP: 0000000000000009 R08: ffffa9108033bd95 R09: 0000000000000328
[   78.334755] R10: ffffffffa5be9b80 R11: ffffa9108033bd95 R12: 0000000000000009
[   78.335616] R13: 0000000000000009 R14: ffff953f7d26ddc0 R15: 0000000000000046
[   78.336475] FS:  0000000000000000(0000) GS:ffff953f7e600000(0000) knlGS:0000000000000000
[   78.337468] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   78.338162] CR2: ffffcb6203000028 CR3: 000000003583a005 CR4: 0000000000160ef0
[   78.339017] Call Trace:
[   78.339326]  ? kthread+0xfb/0x130
[   78.339733]  rewind_stack_do_exit+0x17/0x20
[   78.340249] ---[ end trace 460d9373d78c843a ]---

2. What is the Version-Release number of the kernel:

5.2.8-200.fc30.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Yes.

5.1.20-300.fc30.x86_64 seems OK
5.2.0-0.rc0.git6.1.fc31.x86_64 seems OK
5.2.0-0.rc1.git0.1.fc31.x86_64 NOT ok
5.2.1-200.fc30.x86_64 NOT ok
5.2.8-200.fc30.x86_64 NOT ok

Also note: you need to install the kernels from Koji to test older versions.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Install virtual machine from: Fedora-Workstation-netinst-x86_64-30-1.2.iso

Use "defaults" and select Minimal Install.

Add kernel parameters to /etc/default/grub: zswap.enabled=1 zswap.zpool=z3fold
You can also add "console=ttyS0" to help with debugging.
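
One way to apply this (a sketch, assuming Fedora's grub2 tooling; the grub.cfg path differs on EFI systems):

$ sudo grubby --update-kernel=ALL --args="zswap.enabled=1 zswap.zpool=z3fold console=ttyS0"

or edit GRUB_CMDLINE_LINUX in /etc/default/grub by hand and regenerate the config:

$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg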

Power off the virtual machine.

Configure the virtual machine to have 4 vCPUs and 1 GiB of memory.

Start the virtual machine.

Note it has 4 CPUs, about 1 GiB of memory, and 2 GiB of swap configured. I don't know whether these conditions are relevant, but they seem to help bring the problem to light.

Note the zswap configuration:
$ egrep -r ^ /sys/module/zswap/parameters
/sys/module/zswap/parameters/same_filled_pages_enabled:Y
/sys/module/zswap/parameters/enabled:Y
/sys/module/zswap/parameters/max_pool_percent:20
/sys/module/zswap/parameters/compressor:lzo
/sys/module/zswap/parameters/zpool:z3fold
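
To confirm zswap is actually storing pages during the run, the debugfs counters can be inspected (a sketch; assumes debugfs is mounted at /sys/kernel/debug, and the counter names vary by kernel version):

$ sudo grep -r . /sys/kernel/debug/zswap/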

Run:

stress --vm $(nproc) --vm-bytes $(($(awk '/MemAvail/{print $2}' /proc/meminfo)*1024*3/2/$(nproc))) &(vmstat 1|cat -n)

Wait for the test to fail. It usually takes less than 5 minutes in my setup.
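
For readability, the same reproducer broken into steps (same arithmetic as the one-liner above; a sketch, the numbers depend on the machine):

$ avail_kb=$(awk '/MemAvail/{print $2}' /proc/meminfo)   # available memory in KiB
$ per_worker=$((avail_kb * 1024 * 3 / 2 / $(nproc)))     # 1.5x available bytes, split across workers
$ stress --vm "$(nproc)" --vm-bytes "$per_worker" &      # background the memory hogs
$ vmstat 1 | cat -n                                      # watch swap traffic with numbered lines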


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:


6. Are you running any modules not shipped directly with Fedora's kernel?:

No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Usually journald is swapped out during the stress run and journalctl does not work, and the journal does not contain the relevant data.

Comment 1 Markus Linnala 2019-08-13 13:36:23 UTC
Created attachment 1603385 [details]
dmesg-zswap-z3fold_zpool_map-5.2.8-200.fc30.x86_64.txt

z3fold_zpool_map variant with 5.2.8-200.fc30.x86_64

Comment 2 Markus Linnala 2019-08-13 13:36:51 UTC
Created attachment 1603386 [details]
dmesg-zswap-z3fold_zpool_malloc-5.2.8-200.fc30.x86_64.txt

z3fold_zpool_malloc variant with 5.2.8-200.fc30.x86_64

Comment 3 Markus Linnala 2019-08-13 13:41:54 UTC
Created attachment 1603388 [details]
dmesg-zswap-writeback_entry-5.2.8-200.fc30.x86_64.txt

zswap_writeback_entry variant using 5.2.8-200.fc30.x86_64

I got this after I ran: pkill stress

Comment 4 Laura Abbott 2019-08-13 13:44:01 UTC
This needs to be reported to the upstream maintainers; Fedora isn't using any custom patches here.

$ ./scripts/get_maintainer.pl -f mm/zswap.c 
Seth Jennings <sjenning> (maintainer:ZSWAP COMPRESSED SWAP CACHING)
Dan Streetman <ddstreet> (maintainer:ZSWAP COMPRESSED SWAP CACHING)
linux-mm (open list:ZSWAP COMPRESSED SWAP CACHING)
linux-kernel.org (open list)

Comment 5 Markus Linnala 2019-08-13 13:55:05 UTC
Created attachment 1603389 [details]
dmesg-zswap-5.2.8-200.fc30.x86_64.txt

Normal boot with 5.2.8-200.fc30.x86_64

Comment 6 Markus Linnala 2019-08-13 14:05:53 UTC
There is at least one recent patch about z3fold issues: https://lkml.org/lkml/2019/8/9/758

Comment 7 Markus Linnala 2019-08-13 15:30:28 UTC
Created attachment 1603431 [details]
dmesg-zswap-z3fold_zpool_free-5.1.20.txt

It seems I can get issues with 5.1.20 too, but the issue may be different.

I have hit the issue twice now. For one of the runs I lost the start of the logs because there was too much output.

The first time it took about 20 minutes to hit the issue, the second time about 8 minutes.

With 5.2.8, problems sometimes start within seconds of starting the stress job.

Comment 8 Markus Linnala 2019-08-13 17:46:46 UTC
5.0.20-200.fc29.x86_64 seemed to survive 1.5h until I stopped the test.

Comment 9 Markus Linnala 2019-08-13 19:17:10 UTC
Also 5.1.0-300.fc30.x86_64 survived 1.5h until I stopped the test.

Comment 10 Markus Linnala 2019-08-15 18:55:10 UTC
I did a git bisect run from v5.1 (good) to v5.3-rc4 (bad) and got this:

7c2b8baa61fe578af905342938ad12f8dbaeae79 is the first bad commit
commit 7c2b8baa61fe578af905342938ad12f8dbaeae79
Author: Vitaly Wool <...>
Date:   Mon May 13 17:22:49 2019 -0700

    mm/z3fold.c: add structure for buddy handles
    
    For z3fold to be able to move its pages per request of the memory
    subsystem, it should not use direct object addresses in handles.  Instead,
    it will create abstract handles (3 per page) which will contain pointers
    to z3fold objects.  Thus, it will be possible to change these pointers
    when z3fold page is moved.
    
    Link: http://lkml.kernel.org/r/20190417103826.484eaf18c1294d682769880f@gmail.com
    Signed-off-by: Vitaly Wool <...>
    Cc: Bartlomiej Zolnierkiewicz <...>
    Cc: Dan Streetman <...>
    Cc: Krzysztof Kozlowski <...>
    Cc: Oleksiy Avramchenko <...>
    Cc: Uladzislau Rezki <...>
    Signed-off-by: Andrew Morton <...>
    Signed-off-by: Linus Torvalds <...>

:040000 040000 1a27b311b3ad8556062e45fff84d46a57ba8a4b1 a79e463e14ab8ea271a89fb5f3069c3c84221478 M	mm
bisect run success

I used this as the test program:

stress --vm $(($(nproc)+2)) --vm-bytes $(($(awk '/MemAvail/{print $2}' /proc/meminfo)*1024/$(nproc))) --timeout 900
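
For reference, the bisect boils down to the following (a sketch; test.sh here is hypothetical -- it would build and boot each candidate kernel in the VM, run the stress command above, and exit non-zero when dmesg shows a z3fold oops within the 900s timeout):

$ git bisect start v5.3-rc4 v5.1    # bad revision first, then good
$ git bisect run ./test.sh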

Comment 11 Chris Murphy 2019-08-22 15:12:25 UTC
I'm not seeing a field for upstream bugs, so I'm leaving this here...

https://bugzilla.kernel.org/show_bug.cgi?id=204563

Comment 12 Markus Linnala 2019-11-21 15:11:17 UTC
This issue is fixed upstream, in kernel 5.3.11 or earlier. I'll close it.

Comment 13 Markus Linnala 2019-11-21 15:11:44 UTC
Fix resolution

