Red Hat Bugzilla – Bug 1383739
BUG: Dentry ffff880232eeacc0{i=800fe1,n=f290} still in use (1)
Last modified: 2017-08-01 22:29:35 EDT
I hit the following kernel BUG() when running an fsstress + parallel fsfreeze race test on 3.10.0-512.el7.x86_64:

BUG: Dentry ffff880232eeacc0{i=800fe1,n=f290} still in use (1) [unmount of xfs dm-7]
------------[ cut here ]------------
kernel BUG at fs/dcache.c:946!
...

The test procedure is to mkfs and mount a linear lvm volume and run the following in one shell:

<xfstests-dev>/ltp/fsstress -d /mnt -n 999999 -p 4

... and start the following loop in two others:

i=0
while [ true ]; do
        echo $i
        fsfreeze -f /mnt
        fsfreeze -u /mnt
        i=$((i + 1))
done

Note that the problem doesn't typically occur until a second freeze/unfreeze loop is started, at which point it occurs rather quickly in my tests. The problem occurs with XFS and ext4 (with a slightly different call stack), which suggests this may be a vfs issue. Further details to follow...
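For convenience, the procedure above can be driven from a single shell. The following is only a sketch under stated assumptions: the MNT, FSSTRESS and FSFREEZE variables and the freeze_loop/run_repro function names are hypothetical, the filesystem is assumed to be already mounted at $MNT, and fsstress is assumed to be built in an xfstests-dev checkout. Nothing runs until run_repro is called.

```shell
#!/bin/bash
# Sketch only. Assumptions (not from the original report): the variable and
# function names below, and the default path to the fsstress binary.
MNT=${MNT:-/mnt}
FSSTRESS=${FSSTRESS:-./ltp/fsstress}
FSFREEZE=${FSFREEZE:-fsfreeze}

# Freeze and thaw filesystem $1 in a loop, $2 times (forever if $2 is omitted).
# Freeze/thaw failures are ignored on purpose: the two loops race each other.
freeze_loop() {
    local mnt=$1 count=${2:-} i=0
    while [ -z "$count" ] || [ "$i" -lt "$count" ]; do
        echo "$i"
        "$FSFREEZE" -f "$mnt"
        "$FSFREEZE" -u "$mnt"
        i=$((i + 1))
    done
}

# Start fsstress plus two competing freeze/thaw loops in the background.
run_repro() {
    "$FSSTRESS" -d "$MNT" -n 999999 -p 4 &
    freeze_loop "$MNT" &
    freeze_loop "$MNT" &
    wait
}
```

Run as root with something like `MNT=/mnt FSSTRESS=<xfstests-dev>/ltp/fsstress run_repro`. Two freeze loops are started because, as noted above, the problem does not typically show up with only one.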
This is the full bug report for XFS:

BUG: Dentry ffff880232eeacc0{i=800fe1,n=f290} still in use (1) [unmount of xfs dm-7]
------------[ cut here ]------------
kernel BUG at fs/dcache.c:946!
invalid opcode: 0000 [#1] SMP
Modules linked in: xfs libcrc32c xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter kvm_amd kvm ipmi_ssif irqbypass sg ipmi_devintf dcdbas ipmi_si amd64_edac_mod ipmi_msghandler shpchp edac_mce_amd sp5100_tco edac_core pcspkr acpi_power_meter i2c_piix4 k10temp nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif crct10dif_generic crct10dif_common uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ixgbe drm mdio ptp megaraid_sas mpt2sas pps_core serio_raw dca i2c_core raid_class scsi_transport_sas bnx2 fjes dm_mirror dm_region_hash dm_log dm_mod
CPU: 2 PID: 14592 Comm: xfs_io Not tainted 3.10.0-512.el7.x86_64 #1
Hardware name: Dell Inc. PowerEdge R715/0G2DP3, BIOS 2.3.0 10/18/2011
task: ffff88082fb53ec0 ti: ffff88082ab30000 task.ti: ffff88082ab30000
RIP: 0010:[<ffffffff812159dc>] [<ffffffff812159dc>] shrink_dcache_for_umount_subtree+0x1ac/0x1c0
RSP: 0018:ffff88082ab33df0 EFLAGS: 00010246
RAX: 0000000000000054 RBX: ffff880232eeacc0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff880237a4f838 RDI: ffff880237a4f838
RBP: ffff88082ab33e08 R08: 0000000000000096 R09: 00000000000005a1
R10: 0000000000000000 R11: ffff88082ab33af6 R12: ffff880232f7e6c0
R13: ffffffffa06b9240 R14: 00007ffcee5d987c R15: 0000000000000000
FS: 00007f32b7e1c740(0000) GS:ffff880237a40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000009c7008 CR3: 00000000bdbdb000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff880836f5b320 ffff880836f5b000 0000000000000083 ffff88082ab33e20
 ffffffff812175cf ffff880836f5b000 ffff88082ab33e48 ffffffff81200631
 ffff880187aa9380 0000000000000083 0000000000000000 ffff88082ab33e68
Call Trace:
 [<ffffffff812175cf>] shrink_dcache_for_umount+0x2f/0x60
 [<ffffffff81200631>] generic_shutdown_super+0x21/0xf0
 [<ffffffff81200ac7>] kill_block_super+0x27/0x70
 [<ffffffff81200e09>] deactivate_locked_super+0x49/0x60
 [<ffffffff81200e90>] thaw_super+0x70/0xb0
 [<ffffffff81211df1>] do_vfs_ioctl+0x211/0x4b0
 [<ffffffff812aea3e>] ? file_has_perm+0xae/0xc0
 [<ffffffff81212131>] SyS_ioctl+0xa1/0xc0
 [<ffffffff81696489>] system_call_fastpath+0x16/0x1b
Code: 00 00 48 8b 40 28 4c 8b 08 48 8b 43 30 48 85 c0 74 1b 48 8b 50 40 48 89 34 24 48 c7 c7 a0 25 8e 81 48 89 de 31 c0 e8 fa 99 46 00 <0f> 0b 31 d2 eb e5 0f 0b 66 90 66 2e 0f 1f 84 00 00 00 00 00 66
RIP [<ffffffff812159dc>] shrink_dcache_for_umount_subtree+0x1ac/0x1c0
 RSP <ffff88082ab33df0>
This is the full bug report for ext4:

BUG: Dentry ffff8800bad60e40{i=0,n=dd90} still in use (1) [unmount of ext4 dm-6]
BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: [<ffffffffa02911d1>] find_group_orlov+0x301/0x450 [ext4]
PGD 637653067 PUD 632f73067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter kvm_amd kvm irqbypass sg ipmi_ssif ipmi_devintf dcdbas ipmi_si shpchp ipmi_msghandler pcspkr sp5100_tco amd64_edac_mod i2c_piix4 k10temp edac_mce_amd acpi_power_meter edac_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif crct10dif_generic uas crct10dif_common usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ixgbe ttm drm mdio ptp mpt2sas megaraid_sas pps_core serio_raw dca i2c_core raid_class scsi_transport_sas bnx2 fjes dm_mirror dm_region_hash dm_log dm_mod
CPU: 6 PID: 3092 Comm: fsstress Not tainted 3.10.0-512.el7.x86_64 #1
Hardware name: Dell Inc. PowerEdge R715/0G2DP3, BIOS 2.3.0 10/18/2011
task: ffff880833f78fb0 ti: ffff8806292cc000 task.ti: ffff8806292cc000
RIP: 0010:[<ffffffffa02911d1>] [<ffffffffa02911d1>] find_group_orlov+0x301/0x450 [ext4]
RSP: 0018:ffff8806292cfd08 EFLAGS: 00010246
RAX: 0000000000000722 RBX: ffff8808374d4800 RCX: ffff88022dcf4150
RDX: 0000000000000000 RSI: ffff8808374d0000 RDI: 0000000000002000
RBP: ffff8806292cfda0 R08: ffff8800bad60e60 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000004000 R12: ffff8808374d4800
R13: 0000000000000005 R14: 00000000002230e1 R15: 0000000000000050
FS: 00007f9c157a0740(0000) GS:ffff880237ac0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000030 CR3: 00000004c52ee000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff8808374d4800 ffff8808374d4800 ffff8806292cfe04 000000000006d693
 0001f8a40009db35 ffff88022dcf4150 ffff8806292cfd90 0000001000002000
 ffff8806292cfd68 ffffffff81219250 ffff88022e01d578 00000000a81a32d9
Call Trace:
 [<ffffffff81219250>] ? alloc_inode+0x30/0xa0
 [<ffffffffa0292810>] __ext4_new_inode+0x6e0/0x12d0 [ext4]
 [<ffffffffa02a44ac>] ext4_mkdir+0x1bc/0x410 [ext4]
 [<ffffffff8120a1f7>] vfs_mkdir+0xb7/0x160
 [<ffffffff812101ff>] SyS_mkdirat+0x6f/0xe0
 [<ffffffff81210289>] SyS_mkdir+0x19/0x20
 [<ffffffff81696489>] system_call_fastpath+0x16/0x1b
Code: 33 0c 25 28 00 00 00 0f 85 67 01 00 00 48 83 c4 70 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 0f 1f 44 00 00 49 8b 54 24 60 48 8b 4d 90 <48> 39 4a 30 74 10 48 8b 51 80 f7 c2 00 00 02 00 0f 84 c6 fd ff
RIP [<ffffffffa02911d1>] find_group_orlov+0x301/0x450 [ext4]
 RSP <ffff8806292cfd08>
CR2: 0000000000000030
This appears to be fixed upstream by commit 89f39af129 ("fs/super.c: fix race between freeze_super() and thaw_super()"). I'll post a backport after some wider testing. Note that this is reproducible on-demand via xfstests generic/390.
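When checking a kernel with or without the fix, it can help to scan the kernel log for the message reported in this bug rather than waiting for a crash. A minimal sketch, assuming the message format matches the two reports in this bug; the function name has_dentry_bug is hypothetical:

```shell
# Hypothetical helper: reads kernel log text on stdin and succeeds if it
# contains the "BUG: Dentry ... still in use" message seen in this bug.
# The n= field is matched loosely because its format is inferred only
# from the two reports here.
has_dentry_bug() {
    grep -Eq 'BUG: Dentry [0-9a-f]+\{i=[0-9a-f]+,n=[^}]*\} still in use'
}
```

For example, after a test run: `dmesg | has_dentry_bug && echo "dentry still in use BUG hit"`.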
*** Bug 1436407 has been marked as a duplicate of this bug. ***
*** Bug 1388434 has been marked as a duplicate of this bug. ***
*** Bug 1450283 has been marked as a duplicate of this bug. ***
IBM hit this bug in bug 1450283. Since we already have a fix (thanks Brian), let's review it quickly and get it into 7.4.

Thanks,
Zorro
Created attachment 1278371: dmesg log
Created attachment 1278372: xmon log
Created attachment 1278373: sosreport
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing
Patch(es) available on kernel-3.10.0-670.el7
------- Comment From hasriram@in.ibm.com 2017-06-08 06:08 EDT-------
Issue is resolved with the latest snap2 kernel.

# ./check tests/generic/390
[ 216.349894] XFS (loop0): Mounting V5 Filesystem
[ 216.380231] XFS (loop0): Ending clean mount
FSTYP -- xfs (non-debug)
PLATFORM -- Linux/ppc64le alp4 3.10.0-675.el7.ppc64le
MKFS_OPTIONS -- -f -bsize=4096 /dev/loop1
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/loop1 /mnt/scratch

[ 220.050102] XFS (loop1): Mounting V5 Filesystem
[ 220.080232] XFS (loop1): Ending clean mount
[ 220.589922] XFS (loop0): Unmounting Filesystem
[ 222.249995] XFS (loop0): Mounting V5 Filesystem
[ 222.289975] XFS (loop0): Ending clean mount
generic/390 373s ...[ 222.930060] run fstests generic/390 at 2017-06-08 05:58:00
[ 227.269879] XFS (loop1): Unmounting Filesystem
[ 229.550029] XFS (loop1): Mounting V5 Filesystem
[ 229.580118] XFS (loop1): Ending clean mount
 400s
[ 624.069922] XFS (loop1): Unmounting Filesystem
[ 625.950155] XFS (loop1): Mounting V5 Filesystem
[ 625.999883] XFS (loop1): Ending clean mount
Ran: generic/390
Passed all 1 tests

[ 626.579853] XFS (loop0): Unmounting Filesystem
[ 626.859853] XFS (loop1): Unmounting Filesystem

# uname -a
Linux alp4 3.10.0-675.el7.ppc64le #1 SMP Mon May 29 23:22:30 EDT 2017 ppc64le ppc64le ppc64le GNU/Linux

Harish
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:1842