Bug 1829662
Summary: kernel BUG at fs/fscache/operation.c:70! FS-Cache: 4 == 5 is false - current state is FSCACHE_OP_ST_COMPLETE but should be FSCACHE_OP_CANCELLED in fscache_enqueue_operation

Product: Red Hat Enterprise Linux 7
Reporter: Jay Shin <jaeshin>
Component: kernel
Assignee: Dave Wysochanski <dwysocha>
Kernel sub component: NFS
QA Contact: JianHong Yin <jiyin>
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
CC: dhoward, dhowells, dwysocha, jiyin, mmilgram, nmurray, swhiteho, xzhou, yieli, yoyang, yozone
Version: 7.7
Keywords: Patch, Reproducer, ZStream
Target Milestone: rc
Hardware: x86_64
OS: Linux
Fixed In Version: kernel-3.10.0-1145.el7
Clones: 1839756, 1839757 (view as bug list)
Bug Blocks: 1822123, 1839756, 1839757
Type: Bug
Last Closed: 2020-09-29 21:13:47 UTC
Comment 4
Dave Wysochanski
2020-04-30 13:25:43 UTC
fwiw, in the vmcore the IOs are around 28 KB reads to files 4 MB in size. I'm going to closely mimic these sizes, especially the odder size that may cross page boundaries.

I just reproduced this!

[ 8921.107413] FS-Cache: 
[ 8921.122996] FS-Cache: Assertion failed
[ 8921.124677] FS-Cache: 4 == 5 is false
[ 8923.282147] ------------[ cut here ]------------
[ 8923.289081] kernel BUG at fs/fscache/operation.c:70!
[ 8923.291119] invalid opcode: 0000 [#1] SMP
[ 8923.292860] Modules linked in: nfsv3 nfs_acl nfs lockd grace cachefiles fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ext4 mbcache jbd2 iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev raid0 aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr virtio_balloon virtio_rng parport_pc parport i2c_piix4 auth_rpcgss sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_blk ata_piix crct10dif_pclmul
[ 8923.323869]  crct10dif_common libata virtio_pci virtio_ring crc32c_intel serio_raw virtio floppy dm_mirror dm_region_hash dm_log dm_mod
[ 8923.328644] CPU: 2 PID: 32001 Comm: dd Kdump: loaded Not tainted 3.10.0-1062.23.1.el7.x86_64 #1
[ 8923.332067] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 8923.334347] task: ffff931db5ee1070 ti: ffff931d29c40000 task.ti: ffff931d29c40000
[ 8923.337296] RIP: 0010:[<ffffffffc07194cf>]  [<ffffffffc07194cf>] fscache_enqueue_operation+0xff/0x170 [fscache]
[ 8923.341320] RSP: 0018:ffff931dbfc83ca8  EFLAGS: 00010046
[ 8923.343432] RAX: 0000000000000019 RBX: ffff931d2232f300 RCX: 0000000000000006
[ 8923.346237] RDX: 0000000000000000 RSI: 0000000000000046 RDI: ffff931dbfc93890
[ 8923.349048] RBP: ffff931dbfc83cb0 R08: 0000000000000000 R09: 0000000000000000
[ 8923.351855] R10: 00000000000059e1 R11: ffff931dbfc839a6 R12: ffff931d2232f300
[ 8923.354629] R13: ffff931d2ee02d20 R14: ffff931dbfc83d48 R15: ffff931d015a81e0
[ 8923.357433] FS:  00007f539cbfd740(0000) GS:ffff931dbfc80000(0000) knlGS:0000000000000000
[ 8923.360598] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8923.362870] CR2: 00000000006f2000 CR3: 0000000052ca2000 CR4: 00000000000406e0
[ 8923.382822] Call Trace:
[ 8923.383913]  <IRQ>
[ 8923.384732]  [<ffffffffc0733c12>] cachefiles_read_waiter+0x92/0xf0 [cachefiles]
[ 8923.387735]  [<ffffffff858d3942>] __wake_up_common+0x82/0x120
[ 8923.389994]  [<ffffffff858d3a63>] __wake_up_common_lock+0x83/0xc0
[ 8923.392394]  [<ffffffff858d3510>] ? task_rq_unlock+0x20/0x20
[ 8923.394634]  [<ffffffff858d3ab3>] __wake_up+0x13/0x20
[ 8923.396625]  [<ffffffff858c72a0>] __wake_up_bit+0x50/0x70
[ 8923.398768]  [<ffffffff859bdcdb>] unlock_page+0x2b/0x30
[ 8923.400845]  [<ffffffff859bdd09>] page_endio+0x29/0x90
[ 8923.460251]  [<ffffffff85a8fc81>] mpage_end_io+0x51/0x80

There was a similar discussion upstream, with a patch, but it seems the patch was never accepted. It's a new bug:
https://lore.kernel.org/patchwork/patch/889373/

diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 883bc7bb12c5..9d5d13e150fb 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -58,9 +58,9 @@ static int cachefiles_read_waiter(wait_queue_entry_t *wait, unsigned mode,
 	spin_lock(&object->work_lock);
 	list_add_tail(&monitor->op_link, &monitor->op->to_do);
+	fscache_enqueue_retrieval(monitor->op);
 	spin_unlock(&object->work_lock);
-	fscache_enqueue_retrieval(monitor->op);
 	return 0;
 }

ah, the below has been committed instead.

commit 934140ab028713a61de8bca58c05332416d037d1
Author: Kiran Kumar Modukuri <kiran.modukuri>
Date:   2018-07-25 15:04:25 +0100

    cachefiles: Fix refcounting bug in backing-file read monitoring

(In reply to Jay Shin from comment #10)
> ah below has been committed, instead.
> 
> commit 934140ab028713a61de8bca58c05332416d037d1
> Author: Kiran Kumar Modukuri <kiran.modukuri>
> Date: 2018-07-25 15:04:25 +0100
> 
> cachefiles: Fix refcounting bug in backing-file read monitoring

This patch is already in RHEL7, oddly, via https://bugzilla.redhat.com/show_bug.cgi?id=1356390#c52
O-Subject: [RHEL7 PATCH 2/5] cachefiles: Fix refcounting bug in backing-file read monitoring
Bugzilla: 1356390

The patch header sounds like this identical problem, though, so I'm not sure what's happening here. I need to look further to see whether this problem really was never fixed, whether there are multiple vectors to it, or something else.

Created attachment 1683654 [details]
Reproducer outline v1
Reproducer outline - needs refining, as I've only reproduced this once so far; I do not know whether md in the stack is significant, etc.
I wonder if this is an overly aggressive assertion or a real bug... This crash may provide a clue, but I'm not sure. When I run the test I get a lot of console spam due to low-memory situations, but those do not produce the crash. However, it looks like the crash happened due to an interrupt taken during alloc_pages of another read, so it may provide some clue for refining the reproducer.

      KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-1062.23.1.el7.x86_64/vmlinux
    DUMPFILE: /cores/retrace/tasks/289876432/crash/vmcore  [PARTIAL DUMP]
        CPUS: 8
        DATE: Thu Apr 30 19:51:43 2020
      UPTIME: 02:28:40
LOAD AVERAGE: 962.38, 667.31, 651.21
       TASKS: 2083
    NODENAME: rhel7u7-node1.dwysocha.net
     RELEASE: 3.10.0-1062.23.1.el7.x86_64
     VERSION: #1 SMP Mon Apr 27 07:15:53 EDT 2020
     MACHINE: x86_64  (1795 Mhz)
      MEMORY: 4 GB
       PANIC: "kernel BUG at fs/fscache/operation.c:70!"
         PID: 32001
     COMMAND: "dd"
        TASK: ffff931db5ee1070  [THREAD_INFO: ffff931d29c40000]
         CPU: 2
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 32001  TASK: ffff931db5ee1070  CPU: 2  COMMAND: "dd"
 #0 [ffff931dbfc83950] machine_kexec at ffffffff85865b34
 #1 [ffff931dbfc839b0] __crash_kexec at ffffffff859225d2
 #2 [ffff931dbfc83a80] crash_kexec at ffffffff859226c0
 #3 [ffff931dbfc83a98] oops_end at ffffffff85f87798
 #4 [ffff931dbfc83ac0] die at ffffffff85830a7b
 #5 [ffff931dbfc83af0] do_trap at ffffffff85f86ee0
 #6 [ffff931dbfc83b40] do_invalid_op at ffffffff8582d2a4
 #7 [ffff931dbfc83bf0] invalid_op at ffffffff85f9322e
    [exception RIP: fscache_enqueue_operation+255]
    RIP: ffffffffc07194cf  RSP: ffff931dbfc83ca8  RFLAGS: 00010046
    RAX: 0000000000000019  RBX: ffff931d2232f300  RCX: 0000000000000006
    RDX: 0000000000000000  RSI: 0000000000000046  RDI: ffff931dbfc93890
    RBP: ffff931dbfc83cb0   R8: 0000000000000000   R9: 0000000000000000
    R10: 00000000000059e1  R11: ffff931dbfc839a6  R12: ffff931d2232f300
    R13: ffff931d2ee02d20  R14: ffff931dbfc83d48  R15: ffff931d015a81e0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff931dbfc83ca0] fscache_enqueue_operation at ffffffffc07194cf [fscache]
 #9 [ffff931dbfc83cb8] cachefiles_read_waiter at ffffffffc0733c12 [cachefiles]
#10 [ffff931dbfc83ce0] __wake_up_common at ffffffff858d3942
#11 [ffff931dbfc83d38] __wake_up_common_lock at ffffffff858d3a63
#12 [ffff931dbfc83da8] __wake_up at ffffffff858d3ab3
#13 [ffff931dbfc83db8] __wake_up_bit at ffffffff858c72a0
#14 [ffff931dbfc83de8] unlock_page at ffffffff859bdcdb
#15 [ffff931dbfc83e00] page_endio at ffffffff859bdd09
#16 [ffff931dbfc83e20] mpage_end_io at ffffffff85a8fc81
#17 [ffff931dbfc83e48] bio_endio at ffffffff85a89de7
#18 [ffff931dbfc83e70] blk_update_request at ffffffff85b519d0
#19 [ffff931dbfc83eb0] blk_mq_end_request at ffffffff85b5bb7a
#20 [ffff931dbfc83ed0] virtblk_request_done at ffffffffc02bc3bf [virtio_blk]
#21 [ffff931dbfc83ee0] blk_done_softirq at ffffffff85b592f6
#22 [ffff931dbfc83f20] __do_softirq at ffffffff858a54b5
#23 [ffff931dbfc83f90] call_softirq at ffffffff85f9342c
#24 [ffff931dbfc83fa8] do_softirq at ffffffff8582f715
#25 [ffff931dbfc83fc8] irq_exit at ffffffff858a5835
#26 [ffff931dbfc83fe0] smp_call_function_single_interrupt at ffffffff85859479
#27 [ffff931dbfc83ff0] call_function_single_interrupt at ffffffff85f921aa
--- <IRQ stack> ---
#28 [ffff931d29c43638] call_function_single_interrupt at ffffffff85f921aa
    [exception RIP: get_page_from_freelist+679]
    RIP: ffffffff859c7ae7  RSP: ffff931d29c436e0  RFLAGS: 00000246
    RAX: 0000000000000022  RBX: ffff931d29c436d0  RCX: 000000000001fc30
    RDX: 000000000001fcff  RSI: 000000000000001f  RDI: 0000000000000246
    RBP: ffff931d29c437e8   R8: fffffffffffffff6   R9: 0000000000000000
    R10: ffff931dbfb83c60  R11: 000000000000327a  R12: ffffe9b6406f9940
    R13: 0000000000000001  R14: ffff931d22276d00  R15: ffffffff85b581ff
    ORIG_RAX: ffffffffffffff04  CS: 0010  SS: 0018
#29 [ffff931d29c437f0] __alloc_pages_nodemask at ffffffff859c8446
#30 [ffff931d29c43898] alloc_pages_current at ffffffff85a16d38
#31 [ffff931d29c438e0] __page_cache_alloc at ffffffff859bd957
#32 [ffff931d29c43918] cachefiles_read_or_alloc_pages at ffffffffc07349f2 [cachefiles]
#33 [ffff931d29c43a18] __fscache_read_or_alloc_pages at ffffffffc071bba6 [fscache]
#34 [ffff931d29c43a60] __nfs_readpages_from_fscache at ffffffffc077b294 [nfs]
#35 [ffff931d29c43ab0] nfs_readpages at ffffffffc0771583 [nfs]
#36 [ffff931d29c43ba8] __do_page_cache_readahead at ffffffff859cabef
#37 [ffff931d29c43c68] ondemand_readahead at ffffffff859cad9f
#38 [ffff931d29c43cb0] page_cache_sync_readahead at ffffffff859cb1c4
#39 [ffff931d29c43cf0] generic_file_aio_read at ffffffff859bf612
#40 [ffff931d29c43dc8] nfs_file_read at ffffffffc0764161 [nfs]
#41 [ffff931d29c43e00] do_sync_read at ffffffff85a49fa3
#42 [ffff931d29c43ed8] vfs_read at ffffffff85a4a9df
#43 [ffff931d29c43f08] sys_read at ffffffff85a4b84f
#44 [ffff931d29c43f50] system_call_fastpath at ffffffff85f8fede
    RIP: 00007f539c709990  RSP: 00007fffc5dea030  RFLAGS: 00010206
    RAX: 0000000000000000  RBX: 0000000000611640  RCX: ffffffff85f8fe21
    RDX: 0000000000007000  RSI: 00000000006eb000  RDI: 0000000000000000
    RBP: 0000000000007000   R8: 0000000000000003   R9: 0000000000009003
    R10: 00007fffc5de9aa0  R11: 0000000000000246  R12: 00000000006eb000
    R13: 0000000000000000  R14: 00007fffc5deb819  R15: 00007fffc5dea288
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b

I just got a second crash with only 18 minutes of uptime. It does not have the same initial backtrace; this time the crashing thread took the interrupt from the idle thread.

      KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-1062.23.1.el7.x86_64/vmlinux
    DUMPFILE: /cores/retrace/tasks/596434237/crash/vmcore  [PARTIAL DUMP]
        CPUS: 8
        DATE: Fri May  1 10:48:39 2020
      UPTIME: 00:18:02
LOAD AVERAGE: 388.53, 707.87, 547.47
       TASKS: 1659
    NODENAME: rhel7u7-node1.dwysocha.net
     RELEASE: 3.10.0-1062.23.1.el7.x86_64
     VERSION: #1 SMP Mon Apr 27 07:15:53 EDT 2020
     MACHINE: x86_64  (1795 Mhz)
      MEMORY: 4 GB
       PANIC: "kernel BUG at fs/fscache/operation.c:70!"
crash> log | grep "=="
[ 1083.310044] FS-Cache: 4 == 5 is false

crash> bt
PID: 0  TASK: ffff9ae279510000  CPU: 4  COMMAND: "swapper/4"
 #0 [ffff9ae27fd03950] machine_kexec at ffffffff85465b34
 #1 [ffff9ae27fd039b0] __crash_kexec at ffffffff855225d2
 #2 [ffff9ae27fd03a80] crash_kexec at ffffffff855226c0
 #3 [ffff9ae27fd03a98] oops_end at ffffffff85b87798
 #4 [ffff9ae27fd03ac0] die at ffffffff85430a7b
 #5 [ffff9ae27fd03af0] do_trap at ffffffff85b86ee0
 #6 [ffff9ae27fd03b40] do_invalid_op at ffffffff8542d2a4
 #7 [ffff9ae27fd03bf0] invalid_op at ffffffff85b9322e
    [exception RIP: fscache_enqueue_operation+255]
    RIP: ffffffffc04674cf  RSP: ffff9ae27fd03ca8  RFLAGS: 00010046
    RAX: 0000000000000019  RBX: ffff9ae1f80b2480  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000082  RDI: 0000000000000082
    RBP: ffff9ae27fd03cb0   R8: 000000008e9111e2   R9: ffffffff864e49ad
    R10: 000000000001ef38  R11: 0000000000100000  R12: ffff9ae1f80b2480
    R13: ffff9ae143ed0ba0  R14: ffff9ae27fd03d48  R15: ffff9ae1c56089c0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff9ae27fd03ca0] fscache_enqueue_operation at ffffffffc04674cf [fscache]
 #9 [ffff9ae27fd03cb8] cachefiles_read_waiter at ffffffffc0443c12 [cachefiles]
#10 [ffff9ae27fd03ce0] __wake_up_common at ffffffff854d3942
#11 [ffff9ae27fd03d38] __wake_up_common_lock at ffffffff854d3a63
#12 [ffff9ae27fd03da8] __wake_up at ffffffff854d3ab3
#13 [ffff9ae27fd03db8] __wake_up_bit at ffffffff854c72a0
#14 [ffff9ae27fd03de8] unlock_page at ffffffff855bdcdb
#15 [ffff9ae27fd03e00] page_endio at ffffffff855bdd09
#16 [ffff9ae27fd03e20] mpage_end_io at ffffffff8568fc81
#17 [ffff9ae27fd03e48] bio_endio at ffffffff85689de7
#18 [ffff9ae27fd03e70] blk_update_request at ffffffff857519d0
#19 [ffff9ae27fd03eb0] blk_mq_end_request at ffffffff8575bb7a
#20 [ffff9ae27fd03ed0] virtblk_request_done at ffffffffc010a3bf [virtio_blk]
#21 [ffff9ae27fd03ee0] blk_done_softirq at ffffffff857592f6
#22 [ffff9ae27fd03f20] __do_softirq at ffffffff854a54b5
#23 [ffff9ae27fd03f90] call_softirq at ffffffff85b9342c
#24 [ffff9ae27fd03fa8] do_softirq at ffffffff8542f715
#25 [ffff9ae27fd03fc8] irq_exit at ffffffff854a5835
#26 [ffff9ae27fd03fe0] smp_call_function_single_interrupt at ffffffff85459479
#27 [ffff9ae27fd03ff0] call_function_single_interrupt at ffffffff85b921aa
--- <IRQ stack> ---
#28 [ffff9ae27950fdf8] call_function_single_interrupt at ffffffff85b921aa
    [exception RIP: native_safe_halt+11]
    RIP: ffffffff85b842eb  RSP: ffff9ae27950fea8  RFLAGS: 00000246
    RAX: ffffffff85b840a0  RBX: ffffffff854d51ae  RCX: 0100000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000046
    RBP: ffff9ae27950fea8   R8: 0000000000000000   R9: 0000000000000001
    R10: 0000000000000000  R11: 000000000000000c  R12: ffff9ae27950dfd8
    R13: 0000000000000000  R14: ffff9ae279510698  R15: 0000000100000286
    ORIG_RAX: ffffffffffffff04  CS: 0010  SS: 0018
#29 [ffff9ae27950feb0] default_idle at ffffffff85b840be
#30 [ffff9ae27950fed0] arch_cpu_idle at ffffffff85437c80
#31 [ffff9ae27950fee0] cpu_startup_entry at ffffffff8550185a
#32 [ffff9ae27950ff28] start_secondary at ffffffff8545a0c7
#33 [ffff9ae27950ff50] start_cpu at ffffffff854000d5

I just reproduced this on upstream plus a few patches that address other upstream issues. This is going to be a longer road to get addressed: it is an upstream problem, and even reproducing it depends on other upstream patches that are still in the queue and not yet accepted.
[ 681.410052] 0 pages hwpoisoned
[ 707.035117] bash (13115): drop_caches: 3
[ 707.107184] FS-Cache: 
[ 707.115381] FS-Cache: Assertion failed
[ 707.117343] FS-Cache: 4 == 5 is false
[ 707.122937] ------------[ cut here ]------------
[ 707.146624] kernel BUG at fs/fscache/operation.c:69!
[ 707.157357] invalid opcode: 0000 [#1] SMP PTI
[ 707.159160] CPU: 5 PID: 11070 Comm: kworker/u16:13 Kdump: loaded Not tainted 5.7.0-rc3-bz1829662-1+ #4
[ 707.176470] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 707.189125] Workqueue: 0x0 (fscache_object)
[ 707.195660] RIP: 0010:fscache_enqueue_operation.cold+0x8c/0x12a [fscache]
[ 707.198639] Code: c7 28 86 4b c0 e8 c5 0e c9 f3 48 c7 c7 36 86 4b c0 e8 b9 0e c9 f3 8b 75 40 ba 05 00 00 00 48 c7 c7 b8 6c 4b c0 e8 a5 0e c9 f3 <0f> 0b 65 48 8b 34 25 c0 8b 01 00 48 c7 c7 05 87 4b c0 48 81 c6 78
[ 707.206758] RSP: 0018:ffffb11680188d80 EFLAGS: 00010046
[ 707.209509] RAX: 0000000000000019 RBX: ffff9ea1e564e0c0 RCX: 0000000000000007
[ 707.213054] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff9ea3b7d59c20
[ 707.216444] RBP: ffff9ea1dcd23840 R08: 00000000000002d6 R09: 0000000000000028
[ 707.219751] R10: ffffb11680188c30 R11: 0000000000000000 R12: ffff9ea35a2ccf00
[ 707.222812] R13: ffff9ea1e564eac0 R14: ffff9ea1e564e100 R15: ffff9ea1dcd238c8
[ 707.225707] FS:  0000000000000000(0000) GS:ffff9ea3b7d40000(0000) knlGS:0000000000000000
[ 707.229339] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 707.231734] CR2: 00000000000000b0 CR3: 000000020927e000 CR4: 00000000000406e0
[ 707.234711] Call Trace:
[ 707.235769]  <IRQ>
[ 707.236829]  cachefiles_read_waiter+0x13b/0x160 [cachefiles]
[ 707.239147]  __wake_up_common+0x80/0x180
[ 707.240962]  wake_up_page_bit+0xab/0x100
[ 707.242768]  __read_end_io+0xb6/0x120
[ 707.244350]  blk_update_request+0x17b/0x2d0
[ 707.246066]  blk_mq_end_request+0x1c/0x130
[ 707.267961]  blk_done_softirq+0xa1/0xd0
[ 707.269567]  __do_softirq+0xee/0x2ff
[ 707.271055]  irq_exit+0xe9/0xf0
[ 707.272355]  do_IRQ+0x4d/0xd0
[ 707.273595]  common_interrupt+0xf/0xf
[ 707.275057]  </IRQ>
[ 707.275936] RIP: 0010:finish_task_switch+0x7d/0x250
[ 707.277850] Code: 90 65 48 8b 1c 25 c0 8b 01 00 66 66 66 66 90 66 66 66 66 90 41 c7 45 38 00 00 00 00 4c 89 e7 e8 19 cf 02 00 66 90 fb 66 66 90 <66> 66 90 4d 85 f6 74 21 65 48 8b 04

Upon further review, I think Jay and Radomir are correct in suspecting that one of the patches that fixed the previous bug is related to this issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1683490#c11

Even though the commits fix obvious problems with inverted logic, I also started being suspicious of the below commit based on code paths and some testing data, even though so far I have not 100% put the pieces together:

b27ddd462453 fscache: Pass the correct cancelled indications to fscache_op_complete()

If we wanted, and if people are concerned, we could revert the above commit and use that in the z-streams and for the shorter term. However, this is an upstream bug as well, so we will need to track it down and fix it. Note that we only started seeing this new bug _after_ we pulled in the above patch, which was a fix for inverted logic in multiple places in the code. However, it was not the primary fix for the original bug, so in that sense it was a peripheral change.

What follows is a summary of my understanding so far. I have not yet obtained a trace showing definitively the full picture of what is happening, but I'm working on it.
This bug involves hitting an assertion: op->state is FSCACHE_OP_ST_COMPLETE (4) when it should be FSCACHE_OP_ST_CANCELLED (5).

kernel BUG at fs/fscache/operation.c:70!
FS-Cache: 4 == 5 is false

     50 /**
     51  * fscache_enqueue_operation - Enqueue an operation for processing
     52  * @op: The operation to enqueue
     53  *
     54  * Enqueue an operation for processing by the FS-Cache thread pool.
     55  *
     56  * This will get its own ref on the object.
     57  */
     58 void fscache_enqueue_operation(struct fscache_operation *op)
     59 {
     60         struct fscache_cookie *cookie = op->object->cookie;
     61 
     62         _enter("{OBJ%x OP%x,%u}",
     63                op->object->debug_id, op->debug_id, atomic_read(&op->usage));
     64 
     65         ASSERT(list_empty(&op->pend_link));
     66         ASSERT(op->processor != NULL);
     67         ASSERT(fscache_object_is_available(op->object));
     68         ASSERTCMP(atomic_read(&op->usage), >, 0);
     69         ASSERTIFCMP(op->state != FSCACHE_OP_ST_IN_PROGRESS,
---> 70                     op->state, ==, FSCACHE_OP_ST_CANCELLED);

I started by looking at which code actually sets op->state = FSCACHE_OP_ST_COMPLETE. As it turns out, this is done in only one place, and not surprisingly it is inside the async operation code of fscache (for more info, see Documentation/filesystems/caching/operations.txt), inside fscache_op_complete():

fs/fscache/operation.c
    439 /*
    440  * Record the completion or cancellation of an in-progress operation.
    441  */
    442 void fscache_op_complete(struct fscache_operation *op, bool cancelled)
    443 {
    444         struct fscache_object *object = op->object;
    445 
    446         _enter("OBJ%x", object->debug_id);
    447 
    448         ASSERTCMP(op->state, ==, FSCACHE_OP_ST_IN_PROGRESS);
    449         ASSERTCMP(object->n_in_progress, >, 0);
    450         ASSERTIFCMP(test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags),
    451                     object->n_exclusive, >, 0);
    452         ASSERTIFCMP(test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags),
    453                     object->n_in_progress, ==, 1);
    454 
    455         spin_lock(&object->lock);
    456 
    457         if (!cancelled) {
    458                 op->state = FSCACHE_OP_ST_COMPLETE;
    459         } else {
    460                 op->cancel(op);
    461                 op->state = FSCACHE_OP_ST_CANCELLED;
    462         }
    463 

I looked at that commit and said: ok, so we're changing that last argument because the logic was erroneously inverted. If we get into a situation where FSCACHE_OP_ST_COMPLETE was set by someone racing ahead, then that could give us this issue, whereas an erroneous FSCACHE_OP_ST_CANCELLED would be benign (the latter would pass the ASSERTIFCMP hit in this bug) - and FSCACHE_OP_ST_COMPLETE is only set when we call fscache_op_complete(..., false).

So I looked at the commit, noticed there were 3 instances where the last argument changes from true to false, and built 3 test kernels, each with one of the 3 hunks reverted. With the first hunk reverted, the problem still reproduced easily. Reverting the second hunk rendered fscache nonfunctional. But the third hunk changed fscache_retrieval_complete(), and after I reverted it the test ran ok again.

Why might this make sense? Well, I'm pretty sure what is actually happening is some sort of race between cachefiles_read_backing_file() and the "page monitor callback", cachefiles_read_waiter(). We are somehow getting into the following situation:

cachefiles_read_backing_file()
  -> fscache_retrieval_complete()
    -> fscache_op_complete(..., false)

and then, at some later time, cachefiles_read_waiter() gets called (via the backtrace in this bug):
  -> fscache_enqueue_operation()

The last call blows up because op->state is not what the assertion allows - the op has already completed.
So if this is what is happening, either the monitor callback needs to be cancelled, or we need to recognize this case inside the monitor and not call fscache_enqueue_operation(). But I need to get a trace of what is happening, unless I or someone else can spot something obvious. I also note that, unfortunately, there are a fairly large number of callers of fscache_enqueue_operation(), so reverting hunk #3 is a fairly significant logic change.

Undo hunk #3 in commit b27ddd46245311850f850024df54d0537506f3c - test #3:

diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 3b03e29e2f1a..b19fa8592fc2 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -185,7 +185,7 @@ static inline void fscache_retrieval_complete(struct fscache_retrieval *op,
 {
 	atomic_sub(n_pages, &op->n_pages);
 	if (atomic_read(&op->n_pages) <= 0)
-		fscache_op_complete(&op->op, true);
+		fscache_op_complete(&op->op, false);
 }

Jay - thanks for the willingness to help. At this point it is a matter of tracking down the code path(s) in question and then crafting an acceptable patch.

I was having problems with kdump and ftrace, so I decided to forgo ftrace, be more laser-focused on this specific problem and the code paths, and write some tracing code specific to this bug. While ftrace is very useful for the general picture, and there are some good fscache trace points upstream, to my knowledge ftrace cannot trace code paths into a per-structure buffer, which is what is needed here. With ftrace it is very easy to blow out the whole buffer with irrelevant information, and/or change the timing enough to affect reproducibility - both of which were happening to me.

After adding some custom tracing code to record specific events per fscache_operation, I have a clearer picture of this bug. It is a race between cachefiles_read_copier() and cachefiles_read_waiter(): read_copier calls fscache_retrieval_complete() and ends up inside fscache_op_complete() (the same code that was changed in hunk #3, per comment #27), and right after read_copier runs, cachefiles_read_waiter() is called via the end_io() wake_up on the page queue. So I now have better clarity on what is going on in this bug, though a few of the specifics (and thus the solution) are still unclear.
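Distilled, the interleaving looks like this (an illustrative sketch of the race as I understand it, not output from any tool; the raw per-op trace that demonstrates it follows below):

  cachefiles_read_copier()                  cachefiles_read_waiter()
  (FS-Cache thread pool worker)             (softirq, backing end_io wakeup)
  ---------------------------------         ---------------------------------
                                            fscache_get_retrieval(op);
                                            spin_lock(&object->work_lock);
                                            list_add_tail(&monitor->op_link,
                                                          &op->to_do);
                                            spin_unlock(&object->work_lock);
  spin_lock_irq(&object->work_lock);
  list_del(&monitor->op_link);
  spin_unlock_irq(&object->work_lock);
  fscache_end_io(op, ...);
  fscache_retrieval_complete(op, 1);
    /* op->n_pages drops to 0, so: */
    fscache_op_complete(&op->op, false);
      op->state = FSCACHE_OP_ST_COMPLETE;   /* state = 4 */
                                            fscache_enqueue_retrieval(op);
                                              fscache_enqueue_operation();
                                                /* ASSERTIFCMP sees state 4,
                                                 * expects 5:
                                                 * "4 == 5 is false" -> BUG */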
[ 79.653142] 4294745512 pid=11130 comm=kworker/u16:11 file:line=fs/fscache/operation.c:661 func=fscache_op_work_func op->state=3 event=fscache_put_operation
[ 79.653143] 4294745512 pid=13074 comm=pmgetopt file:line=fs/cachefiles/rdwr.c:65 func=cachefiles_read_waiter op->state=3 event=fscache_enqueue_retrieval->fscache_enqueue_operation
[ 79.653144] 4294745512 pid=13074 comm=pmgetopt file:line=include/linux/fscache-cache.h:252 func=fscache_put_retrieval op->state=3 event=fscache_put_operation
[ 79.653145] 4294745512 pid=554 comm=kworker/u16:4 file:line=fs/cachefiles/rdwr.c:204 func=cachefiles_read_copier op->state=3 event=fscache_retrieval_complete
[ 79.653146] 4294745512 pid=13074 comm=pmgetopt file:line=include/linux/fscache-cache.h:252 func=fscache_put_retrieval op->state=3 event=fscache_put_operation
[ 79.653147] 4294745512 pid=13074 comm=pmgetopt file:line=fs/cachefiles/rdwr.c:65 func=cachefiles_read_waiter op->state=3 event=fscache_enqueue_retrieval->fscache_enqueue_operation
[ 79.653147] 4294745512 pid=554 comm=kworker/u16:4 file:line=include/linux/fscache-cache.h:252 func=fscache_put_retrieval op->state=3 event=fscache_put_operation
[ 79.653148] 4294745512 pid=13074 comm=pmgetopt file:line=fs/fscache/operation.c:110 func=fscache_enqueue_operation_real op->state=3 event=fscache_put_operation
[ 79.653149] 4294745512 pid=554 comm=kworker/u16:4 file:line=fs/cachefiles/rdwr.c:204 func=cachefiles_read_copier op->state=3 event=fscache_retrieval_complete
[ 79.653150] 4294745512 pid=554 comm=kworker/u16:4 file:line=include/linux/fscache-cache.h:252 func=fscache_put_retrieval op->state=3 event=fscache_put_operation
[ 79.653151] 4294745512 pid=554 comm=kworker/u16:4 file:line=fs/fscache/operation.c:661 func=fscache_op_work_func op->state=3 event=fscache_put_operation
[ 79.653152] 4294745512 pid=59 comm=kworker/u16:1 file:line=fs/cachefiles/rdwr.c:204 func=cachefiles_read_copier op->state=3 event=fscache_retrieval_complete
[ 79.653152] 4294745512 pid=59 comm=kworker/u16:1 file:line=fs/cachefiles/rdwr.c:204 func=cachefiles_read_copier op->state=3 event=fscache_retrieval_complete->fscache_op_complete
[ 79.653155] 4294745512 pid=59 comm=kworker/u16:1 file:line=fs/cachefiles/rdwr.c:204 func=cachefiles_read_copier op->state=3 event=fscache_op_complete cancelled=false
[ 79.662397] 4294745512 pid=59 comm=kworker/u16:1 file:line=include/linux/fscache-cache.h:252 func=fscache_put_retrieval op->state=4 event=fscache_put_operation
[ 79.662398] 4294745512 pid=59 comm=kworker/u16:1 file:line=fs/fscache/operation.c:661 func=fscache_op_work_func op->state=4 event=fscache_put_operation
[ 79.662399] 4294745512 pid=13074 comm=pmgetopt file:line=fs/cachefiles/rdwr.c:65 func=cachefiles_read_waiter op->state=4 event=fscache_enqueue_retrieval->fscache_enqueue_operation
[ 79.662410] FS-Cache: 
[ 79.662411] FS-Cache: Assertion failed
[ 79.662412] FS-Cache: 4 == 5 is false
[ 79.662754] ------------[ cut here ]------------
[ 79.677409] kernel BUG at fs/fscache/operation.c:100!
[ 79.677498] invalid opcode: 0000 [#1] SMP PTI
[ 79.690824] CPU: 7 PID: 13074 Comm: pmgetopt Kdump: loaded Not tainted 5.7.0-rc3-1836833f67ab-dbg+ #13
[ 79.690825] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 79.690841] RIP: 0010:fscache_enqueue_operation_real.cold+0xa3/0x141 [fscache]
[ 79.690845] Code: c7 64 1a 7e c0 e8 6e 80 96 e7 48 c7 c7 72 1a 7e c0 e8 62 80 96 e7 8b 75 40 ba 05 00 00 00 48 c7 c7 60 ff 7d c0 e8 4e 80 96 e7 <0f> 0b 65 48 8b 34 25 c0 8b 01 00 48 c7 c
[ 79.690846] RSP: 0018:ffffb740001e0da0 EFLAGS: 00010046
[ 79.690848] RAX: 0000000000000019 RBX: ffff893b102c2640 RCX: 0000000000000007
[ 79.690849] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff893c37dd9c20
[ 79.690850] RBP: ffff893b1e640000 R08: 000000128c3ff314 R09: ffffffffa9efc024
[ 79.690852] R10: 0000000000000368 R11: 000000000002c288 R12: ffff893b48e3bf80
[ 79.690853] R13: ffff893b102c30b8 R14: ffff893a1203e580 R15: ffff893b1e650090
[ 79.690855] FS:  0000000000000000(0000) GS:ffff893c37dc0000(0000) knlGS:0000000000000000
[ 79.690856] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 79.690857] CR2: 00007fc5964c72f0 CR3: 00000000348d0000 CR4: 00000000000406e0
[ 79.690862] Call Trace:
[ 79.707917]  <IRQ>
[ 79.707956]  cachefiles_read_waiter+0x1a7/0x230 [cachefiles]
[ 79.707983]  __wake_up_common+0x80/0x180
[ 79.717234]  wake_up_page_bit+0xab/0x100
[ 79.725948]  __read_end_io+0xb6/0x120
[ 79.737417]  blk_update_request+0x17b/0x2d0
[ 79.744330]  blk_mq_end_request+0x1c/0x130
[ 79.751155]  blk_done_softirq+0xa1/0xd0
[ 79.762223]  __do_softirq+0xee/0x2ff
[ 79.762256]  irq_exit+0xe9/0xf0

Ok, so my next trace proves that this bug is another incantation of the situation Jay pointed out in comment #9, the one that started the email thread:

https://lore.kernel.org/patchwork/patch/889373/

diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 883bc7bb12c5..9d5d13e150fb 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -58,9 +58,9 @@ static int cachefiles_read_waiter(wait_queue_entry_t *wait, unsigned mode,
 	spin_lock(&object->work_lock);
 	list_add_tail(&monitor->op_link, &monitor->op->to_do);
+	fscache_enqueue_retrieval(monitor->op);
 	spin_unlock(&object->work_lock);
-	fscache_enqueue_retrieval(monitor->op);
 	return 0;
 }

Basically, the patch that was committed in comment #10 does indeed fix a use-after-free bug. However, there still remains a race: the read_waiter can add to to_do and then release the spinlock, and in that window the read_copier can run all the way through to fscache_retrieval_complete() even before the read_waiter calls fscache_enqueue_retrieval(). The discussion died out, and it was missed that the original poster's patch in comment #9 was needed to fix this other race.

Since this is a fscache_retrieval operation, op->flags has FSCACHE_OP_MYTHREAD set, and if you look at fscache_enqueue_operation() for that case (note that fscache_enqueue_retrieval() is a one-liner that calls fscache_enqueue_operation()), the logic is purely a few ASSERT checks, stats accounting, and ftrace, so it has no functional value. In theory, another approach would be to rework the ASSERT being hit so it tolerates the race in this case and accepts FSCACHE_OP_ST_COMPLETE (i.e. assume our thread already picked the op up). In essence there looks to be a small missing piece in the API here, since the "enqueue" really happened in the caller, not here - but that's probably rambling on further than we need to go.
 58 void fscache_enqueue_operation(struct fscache_operation *op)
 59 {
 60         struct fscache_cookie *cookie = op->object->cookie;
 61 
 62         _enter("{OBJ%x OP%x,%u}",
 63                op->object->debug_id, op->debug_id, atomic_read(&op->usage));
 64 
 65         ASSERT(list_empty(&op->pend_link));
 66         ASSERT(op->processor != NULL);
 67         ASSERT(fscache_object_is_available(op->object));
 68         ASSERTCMP(atomic_read(&op->usage), >, 0);
 69         ASSERTIFCMP(op->state != FSCACHE_OP_ST_IN_PROGRESS,
 70                     op->state, ==, FSCACHE_OP_ST_CANCELLED);
 71 
 72         fscache_stat(&fscache_n_op_enqueue);
 73         switch (op->flags & FSCACHE_OP_TYPE) {
 74         case FSCACHE_OP_ASYNC:
 75                 trace_fscache_op(cookie, op, fscache_op_enqueue_async);
 76                 _debug("queue async");
 77                 atomic_inc(&op->usage);
 78                 if (!queue_work(fscache_op_wq, &op->work))
 79                         fscache_put_operation(op);
 80                 break;
 81         case FSCACHE_OP_MYTHREAD:
 82                 trace_fscache_op(cookie, op, fscache_op_enqueue_mythread);
 83                 _debug("queue for caller's attention");
 84                 break;
 85         default:
 86                 pr_err("Unexpected op type %lx", op->flags);
 87                 BUG();
 88                 break;
 89         }
 90 }
 91 EXPORT_SYMBOL(fscache_enqueue_operation);

I am going to prep the original patch that moves the call to fscache_enqueue_retrieval() inside the spinlock, with credit to the author, and probably add a sentence or two of explanation, because that is probably the most reasonable approach given where we are.

[ 94.996481] 4294760822 pid=41 comm=ksoftirqd/6 file:line=fs/cachefiles/rdwr.c:62 func=cachefiles_read_waiter op->state=3 event=cachefiles_read_waiter: adding to monitor to_do
[ 95.002117] 4294760822 pid=11120 comm=kworker/u16:8 file:line=include/linux/fscache-cache.h:252 func=fscache_put_retrieval op->state=3 event=fscache_put_operation
[ 95.008096] 4294760822 pid=41 comm=ksoftirqd/6 file:line=fs/cachefiles/rdwr.c:66 func=cachefiles_read_waiter op->state=3 event=fscache_enqueue_retrieval->fscache_enqueue_operation
[ 95.013673] 4294760822 pid=11120 comm=kworker/u16:8 file:line=fs/cachefiles/rdwr.c:201 func=cachefiles_read_copier op->state=3 event=cachefiles_read_copier: error=0
[ 95.019839] 4294760822 pid=11120 comm=kworker/u16:8 file:line=fs/cachefiles/rdwr.c:206 func=cachefiles_read_copier op->state=3 event=fscache_retrieval_complete
[ 95.025515] 4294760822 pid=11120 comm=kworker/u16:8 file:line=fs/cachefiles/rdwr.c:206 func=cachefiles_read_copier op->state=3 event=fscache_retrieval_complete->fscache_op_complete
[ 95.030996] 4294760822 pid=11120 comm=kworker/u16:8 file:line=fs/cachefiles/rdwr.c:206 func=cachefiles_read_copier op->state=3 event=fscache_op_complete cancelled=false
[ 95.037197] 4294760822 pid=11120 comm=kworker/u16:8 file:line=include/linux/fscache-cache.h:252 func=fscache_put_retrieval op->state=4 event=fscache_put_operation
[ 95.042968] 4294760822 pid=11120 comm=kworker/u16:8 file:line=fs/fscache/operation.c:661 func=fscache_op_work_func op->state=4 event=fscache_put_operation
[ 95.048545] 4294760822 pid=11120 comm=kworker/u16:8 file:line=fs/fscache/operation.c:661 func=fscache_op_work_func op->state=4 event=fscache_put_operation
[ 95.059218] FS-Cache: 
[ 95.060174] FS-Cache: Assertion failed
[ 95.061647] FS-Cache: 4 == 5 is false
[ 95.063659] ------------[ cut here ]------------
[ 95.065486] kernel BUG at fs/fscache/operation.c:100!
[ 95.067596] invalid opcode: 0000 [#1] SMP PTI
[ 95.069333] CPU: 6 PID: 41 Comm: ksoftirqd/6 Kdump: loaded Not tainted 5.7.0-rc3-1836833f67ab-dbg+ #13
[ 95.072973] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 95.075333] RIP: 0010:fscache_enqueue_operation_real.cold+0xa3/0x141 [fscache]
[ 95.078163] Code: c7 64 3a 4c c0 e8 6e 60 c8 c7 48 c7 c7 72 3a 4c c0 e8 62 60 c8 c7 8b 75 40 ba 05 00 00 00 48 c7 c7 60 1f 4c c0 e8 4e 60 c8 c7 <0f> 0b 65 48 8b 34 25 c0 8b 01 00 48 c7 c
[ 95.085371] RSP: 0018:ffffb80a4019fca0 EFLAGS: 00010046
[ 95.087419] RAX: 0000000000000019 RBX: ffffa061f6faa640 RCX: 0000000000000007
[ 95.090172] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffffa061f7d99c20
[ 95.092915] RBP: ffffa06032120000 R08: 00000000000003a8 R09: 0000000000000028
[ 95.095708] R10: ffffb80a4019fb50 R11: 0000000000000000 R12: ffffa06130590f00
[ 95.098486] R13: ffffa061f6fab0b8 R14: ffffa061f13bb820 R15: ffffa06032130090
[ 95.101254] FS:  0000000000000000(0000) GS:ffffa061f7d80000(0000) knlGS:0000000000000000
[ 95.104397] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 95.106631] CR2: 00007fb88756c320 CR3: 00000000ae2f6000 CR4: 00000000000406e0
[ 95.109419] Call Trace:
[ 95.110504]  cachefiles_read_waiter+0x216/0x290 [cachefiles]
[ 95.112818]  __wake_up_common+0x80/0x180
[ 95.114459]  wake_up_page_bit+0xab/0x100
[ 95.116054]  __read_end_io+0xb6/0x120
[ 95.117572]  blk_update_request+0x17b/0x2d0
[ 95.119299]  blk_mq_end_request+0x1c/0x130
[ 95.120967]  blk_done_softirq+0xa1/0xd0
[ 95.122536]  __do_softirq+0xee/0x2ff
[ 95.123986]  run_ksoftirqd+0x26/0x40
[ 95.125451]  smpboot_thread_fn+0xc5/0x160
[ 95.127059]  kthread+0xf9/0x130

In read_waiter:

 20 static int cachefiles_read_waiter(wait_queue_entry_t *wait, unsigned mode,
 21                                   int sync, void *_key)
 22 {
 ..
 49 
 50         /* move onto the action list and queue for FS-Cache thread pool */
 51         ASSERT(op);
 52 
 53         /* We need to temporarily bump the usage count as we don't own a ref
 54          * here otherwise cachefiles_read_copier() may free the op between the
 55          * monitor being enqueued on the op->to_do list and the op getting
 56          * enqueued on the work queue.
 57          */
 58         fscache_get_retrieval(op);
 59 
 60         object = container_of(op->op.object, struct cachefiles_object, fscache);
 61         spin_lock(&object->work_lock);
 62         fscache_op_event_record(&op->op, "cachefiles_read_waiter: adding to monitor to_do");
 63         list_add_tail(&monitor->op_link, &op->to_do);
 64         spin_unlock(&object->work_lock);
 65 
 66         fscache_enqueue_retrieval(op);
 67         fscache_put_retrieval(op);
 68         return 0;
 69 }

In read_copier:

149 /*
150  * copy data from backing pages to netfs pages to complete a read operation
151  * - driven by FS-Cache's thread pool
152  */
153 static void cachefiles_read_copier(struct fscache_operation *_op)
154 {
...
167         spin_lock_irq(&object->work_lock);
168 
169         while (!list_empty(&op->to_do)) {
170                 monitor = list_entry(op->to_do.next,
171                                      struct cachefiles_one_read, op_link);
172                 list_del(&monitor->op_link);
173 
174                 spin_unlock_irq(&object->work_lock);
...
178         recheck:
179                 if (test_bit(FSCACHE_COOKIE_INVALIDATING,
180                              &object->fscache.cookie->flags)) {
181                         error = -ESTALE;
182                 } else if (PageUptodate(monitor->back_page)) {
183                         copy_highpage(monitor->netfs_page, monitor->back_page);
184                         fscache_mark_page_cached(monitor->op,
185                                                  monitor->netfs_page);
186                         error = 0;
187                 } else if (!PageError(monitor->back_page)) {
188                         /* the page has probably been truncated */
189                         error = cachefiles_read_reissue(object, monitor);
190                         if (error == -EINPROGRESS)
191                                 goto next;
192                         goto recheck;
193                 } else {
194                         cachefiles_io_error_obj(
195                                 object,
196                                 "Readpage failed on backing file %lx",
197                                 (unsigned long) monitor->back_page->flags);
198                         error = -EIO;
199                 }
200 
201                 fscache_op_event_record(&op->op, error == 0 ?
                            "cachefiles_read_copier: error=0" : "cachefiles_read_copier: error=-ESTALE or -EIO");
202                 put_page(monitor->back_page);
203 
204                 fscache_end_io(op, monitor->netfs_page, error);
205                 put_page(monitor->netfs_page);
206                 fscache_retrieval_complete(op, 1);
207                 fscache_put_retrieval(op);
208                 kfree(monitor);

With the patch in comment #30 added to dhowells' fscache-fixes branch (5.7-rc3 plus a few recent small fixes), the reproducer ran for over 10 hours, whereas prior to the patch it would oops within a few minutes. Also, with the exception of the periodic page-allocation error below - which per dhowells is a separate, known issue, and one reason for the in-progress rewrite of fscache - I saw no other issue with the test, and I could kill the test, unmount the share, etc.

One final note: in the course of running these tests, I did see a periodic cachefilesd crash on startup, and I do have a corefile of it, but I've not yet filed a bug. The crash resolved on a simple restart of cachefilesd, so it was not deemed a critical issue. Once I sort this bug out, I'll investigate that further and file another bug as needed, if one doesn't already exist.

[38532.455608] warn_alloc: 2 callbacks suppressed
[38532.455624] dd: page allocation failure: order:0, mode:0x90c00(GFP_NOIO|__GFP_NORETRY|__GFP_NOMEMALLOC), nodemask=(null),cpuset=/,mems_allowed=0
[38532.511641] CPU: 3 PID: 462539 Comm: dd Kdump: loaded Not tainted 5.7.0-rc3-1836833f67ab-bz1829662-fix+ #14
[38532.516041] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[38532.527383] Call Trace:
[38532.534441]  dump_stack+0x66/0x90
[38532.542477]  warn_alloc.cold+0x7b/0xdf
[38532.550760]  ? unreserve_highatomic_pageblock+0x16e/0x190
[38532.562946]  __alloc_pages_slowpath.constprop.0+0xd9c/0xdd0
[38532.570996]  ? mem_cgroup_commit_charge+0xcb/0x1a0
[38532.573624]  ? __add_to_page_cache_locked+0x349/0x3f0
[38532.578266]  __alloc_pages_nodemask+0x2e8/0x330
[38532.580195]  cachefiles_read_or_alloc_pages+0x3d6/0x1020 [cachefiles]
[38532.582885]  ? __wake_up_bit+0x52/0x70
[38532.584527]  ? fscache_run_op.isra.0+0x55/0xb0 [fscache]
[38532.586686]  __fscache_read_or_alloc_pages+0x2a8/0x310 [fscache]
[38532.589320]  __nfs_readpages_from_fscache+0x60/0x160 [nfs]
[38532.591585]  nfs_readpages+0xb5/0x1a0 [nfs]
[38532.593718]  ? __switch_to_asm+0x40/0x70
[38532.595331]  read_pages+0x6b/0x1b0
[38532.596737]  __do_page_cache_readahead+0x1ba/0x1d0
[38532.598686]  ondemand_readahead+0x171/0x2e0
[38532.600355]  ? page_cache_async_readahead+0x6c/0xe0
[38532.604840]  generic_file_read_iter+0x7cb/0xe40
[38532.608240]  ? nfs3_proc_commit_setup+0x10/0x10 [nfsv3]
[38532.610356]  ? nfs_check_cache_invalid+0x33/0x90 [nfs]
[38532.612432]  nfs_file_read+0x6d/0xa0 [nfs]
[38532.614182]  new_sync_read+0x12a/0x1c0
[38532.615773]  vfs_read+0x9d/0x150
[38532.617134]  ksys_read+0x5f/0xe0
[38532.619116]  do_syscall_64+0x5b/0x1c0
[38532.620624]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[38532.622682] RIP: 0033:0x7ff8816ec3e2

Created attachment 1686164 [details]
patch submitted to linux-nfs
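For clarity, here is roughly what the tail of cachefiles_read_waiter() looks like with that patch applied - a sketch only, based on the diff and the listings earlier in this bug; the annotations are mine, not from the submitted patch:

	/* sketch: cachefiles_read_waiter() tail with the fix applied */
	fscache_get_retrieval(op);	/* temp ref (commit 934140ab) keeps
					 * the op alive across the enqueue */

	object = container_of(op->op.object, struct cachefiles_object, fscache);
	spin_lock(&object->work_lock);
	list_add_tail(&monitor->op_link, &op->to_do);
	/* Enqueue while still holding work_lock. cachefiles_read_copier()
	 * must take this same lock to pull the monitor off op->to_do, so it
	 * cannot reach fscache_retrieval_complete() - and flip op->state to
	 * FSCACHE_OP_ST_COMPLETE - until after we have enqueued here. */
	fscache_enqueue_retrieval(op);
	spin_unlock(&object->work_lock);

	fscache_put_retrieval(op);
	return 0;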
Created attachment 1689835 [details]
Debug patches on top of 5.7-rc3 used to track down this problem
These are a couple of debug patches I wrote to track down this problem. Specifically, due to the nature of the bug, I found ftrace created too much overhead and changed the timing, so it dramatically slowed the reproducer or made it fail entirely. So I shifted to a bit of custom tracing inside the structure in question. I could do this because it was not a 'use after free' problem but more of a 'structure in the wrong state' problem. Placing the tracing inside the structure allows 'playback' of the specific trace attached to the one structure that fails, right before the machine panics, rather than a deluge of trace information that is irrelevant to the problem.
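As a general illustration of the technique - a simplified, hypothetical sketch, not the attached patches - per-structure tracing amounts to embedding a small ring of event records in the traced structure and recording into it at interesting call sites; a macro captures file, line, and function so the output matches the per-op trace lines shown above:

/* Simplified sketch (hypothetical; not the attached patches).
 * Assumes a new 'trace' field added to struct fscache_operation.
 * Kernel headers: <linux/jiffies.h>, <linux/sched.h>, <linux/atomic.h>. */

#define FSCACHE_OP_NEVENTS 32			/* ring size: judgment call */

struct fscache_op_event {
	unsigned long	when;			/* jiffies at record time */
	pid_t		pid;			/* current->pid */
	const char	*file_line;		/* "file:line" string literal */
	const char	*func;			/* __func__ of the call site */
	const char	*msg;			/* free-form event description */
	unsigned int	state;			/* op->state at record time */
};

struct fscache_op_trace {			/* embedded in the operation */
	atomic_t		idx;
	struct fscache_op_event	ev[FSCACHE_OP_NEVENTS];
};

static void __fscache_op_event_record(struct fscache_operation *op,
				      const char *file_line, const char *func,
				      const char *msg)
{
	struct fscache_op_trace *t = &op->trace;	/* assumed new field */
	unsigned int i = ((unsigned int)atomic_inc_return(&t->idx) - 1) %
			 FSCACHE_OP_NEVENTS;

	t->ev[i].when      = jiffies;
	t->ev[i].pid       = current->pid;
	t->ev[i].file_line = file_line;
	t->ev[i].func      = func;
	t->ev[i].msg       = msg;
	t->ev[i].state     = op->state;
}

#define __STR2(x) #x
#define __STR(x) __STR2(x)
#define fscache_op_event_record(op, msg) \
	__fscache_op_event_record((op), __FILE__ ":" __STR(__LINE__), \
				  __func__, (msg))

/* The assertion-failure path can then printk() the ring just before BUG(),
 * producing the per-op playback seen in the trace excerpts above. */

Because the buffer lives in the one structure whose state went wrong, the playback contains only relevant events, and the recording overhead is low enough not to perturb the race being chased.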
Patch(es) committed on kernel-3.10.0-1145.el7

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4060