Bug 1383739 - BUG: Dentry ffff880232eeacc0{i=800fe1,n=f290} still in use (1)
Summary: BUG: Dentry ffff880232eeacc0{i=800fe1,n=f290} still in use (1)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.3
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Brian Foster
QA Contact: Murphy Zhou
URL:
Whiteboard:
Duplicates: 1388434 1436407 1450283
Depends On:
Blocks: 1299988 1446211
 
Reported: 2016-10-11 16:07 UTC by Brian Foster
Modified: 2017-08-02 02:29 UTC
CC List: 6 users

Fixed In Version: kernel-3.10.0-670.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-02 02:29:35 UTC
Target Upstream Version:


Attachments
dmesg log (3.56 KB, text/plain)
2017-05-13 01:31 UTC, IBM Bug Proxy
xmon log (4.40 KB, text/plain)
2017-05-13 01:31 UTC, IBM Bug Proxy
sosreport (10.23 MB, text/plain)
2017-05-13 01:32 UTC, IBM Bug Proxy


Links
System: Red Hat Product Errata
ID: RHSA-2017:1842
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: kernel security, bug fix, and enhancement update
Last Updated: 2017-08-01 18:22:09 UTC

Description Brian Foster 2016-10-11 16:07:03 UTC
I hit the following kernel BUG() when running an fsstress + parallel fsfreeze race test on 3.10.0-512.el7.x86_64:

 BUG: Dentry ffff880232eeacc0{i=800fe1,n=f290} still in use (1) [unmount of xfs dm-7]
 ------------[ cut here ]------------
 kernel BUG at fs/dcache.c:946!
 ...

The test procedure is to mkfs and mount a linear LVM volume at /mnt and run the following in one shell:

<xfstests-dev>/ltp/fsstress -d /mnt -n 999999 -p 4

... and start the following loop in two other shells:

i=0
while true; do
        echo "$i"
        fsfreeze -f /mnt
        fsfreeze -u /mnt
        i=$((i + 1))
done

Note that the problem typically doesn't occur until the second freeze/unfreeze loop is started, at which point it reproduces fairly quickly in my tests. It occurs on both XFS and ext4 (with slightly different call stacks), which suggests a VFS issue rather than a filesystem-specific one.
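The two freeze/unfreeze shells can also be driven from a single script. This is a sketch, not the exact procedure used above: MNT, LOOPS, ITERS and FSFREEZE are hypothetical knobs, the loops are bounded instead of infinite, and fsstress is assumed to be running separately against the same mount point. FSFREEZE exists only so the control flow can be dry-run with a stub such as `true` without root.

```shell
#!/usr/bin/env bash
# Hypothetical knobs; override via the environment before sourcing.
MNT=${MNT:-/mnt}                 # mount point under test, as in the loop above
LOOPS=${LOOPS:-2}                # concurrent loops; the report needed a second one
ITERS=${ITERS:-100000}           # iterations per loop (the original loop never stops)
FSFREEZE=${FSFREEZE:-fsfreeze}   # command to run; e.g. FSFREEZE=true for a dry run

# One freeze/unfreeze loop. A freeze or thaw issued while another loop holds
# the freeze is expected to fail, so errors are silenced rather than fatal.
freeze_loop() {
    local id=$1 i
    for ((i = 0; i < ITERS; i++)); do
        "$FSFREEZE" -f "$MNT" 2>/dev/null
        "$FSFREEZE" -u "$MNT" 2>/dev/null
    done
    echo "freeze loop $id: $ITERS iterations done"
}

# Start LOOPS copies in the background and wait for all of them.
# Nothing runs when the file is sourced; call run_freeze_loops explicitly.
run_freeze_loops() {
    local n pids=()
    for ((n = 1; n <= LOOPS; n++)); do
        freeze_loop "$n" &
        pids+=("$!")
    done
    wait "${pids[@]}"
}
```

With fsstress already running, source the script as root and call `run_freeze_loops`; setting `LOOPS=1` should correspond to the single-loop case that rarely reproduced the problem.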

Further details to follow...

Comment 1 Brian Foster 2016-10-11 16:07:57 UTC
This is the full bug report for XFS:

BUG: Dentry ffff880232eeacc0{i=800fe1,n=f290} still in use (1) [unmount of xfs dm-7]
------------[ cut here ]------------
kernel BUG at fs/dcache.c:946!
invalid opcode: 0000 [#1] SMP
Modules linked in: xfs libcrc32c xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter kvm_amd kvm ipmi_ssif irqbypass sg ipmi_devintf dcdbas ipmi_si amd64_edac_mod ipmi_msghandler shpchp edac_mce_amd sp5100_tco edac_core pcspkr acpi_power_meter i2c_piix4 k10temp nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif
 crct10dif_generic crct10dif_common uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ixgbe drm mdio ptp megaraid_sas mpt2sas pps_core serio_raw dca i2c_core raid_class scsi_transport_sas bnx2 fjes dm_mirror dm_region_hash dm_log dm_mod
CPU: 2 PID: 14592 Comm: xfs_io Not tainted 3.10.0-512.el7.x86_64 #1
Hardware name: Dell Inc. PowerEdge R715/0G2DP3, BIOS 2.3.0 10/18/2011
task: ffff88082fb53ec0 ti: ffff88082ab30000 task.ti: ffff88082ab30000
RIP: 0010:[<ffffffff812159dc>]  [<ffffffff812159dc>] shrink_dcache_for_umount_subtree+0x1ac/0x1c0
RSP: 0018:ffff88082ab33df0  EFLAGS: 00010246
RAX: 0000000000000054 RBX: ffff880232eeacc0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff880237a4f838 RDI: ffff880237a4f838
RBP: ffff88082ab33e08 R08: 0000000000000096 R09: 00000000000005a1
R10: 0000000000000000 R11: ffff88082ab33af6 R12: ffff880232f7e6c0
R13: ffffffffa06b9240 R14: 00007ffcee5d987c R15: 0000000000000000
FS:  00007f32b7e1c740(0000) GS:ffff880237a40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000009c7008 CR3: 00000000bdbdb000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff880836f5b320 ffff880836f5b000 0000000000000083 ffff88082ab33e20
 ffffffff812175cf ffff880836f5b000 ffff88082ab33e48 ffffffff81200631
 ffff880187aa9380 0000000000000083 0000000000000000 ffff88082ab33e68
Call Trace:
 [<ffffffff812175cf>] shrink_dcache_for_umount+0x2f/0x60
 [<ffffffff81200631>] generic_shutdown_super+0x21/0xf0
 [<ffffffff81200ac7>] kill_block_super+0x27/0x70
 [<ffffffff81200e09>] deactivate_locked_super+0x49/0x60
 [<ffffffff81200e90>] thaw_super+0x70/0xb0
 [<ffffffff81211df1>] do_vfs_ioctl+0x211/0x4b0
 [<ffffffff812aea3e>] ? file_has_perm+0xae/0xc0
 [<ffffffff81212131>] SyS_ioctl+0xa1/0xc0
 [<ffffffff81696489>] system_call_fastpath+0x16/0x1b
Code: 00 00 48 8b 40 28 4c 8b 08 48 8b 43 30 48 85 c0 74 1b 48 8b 50 40 48 89 34 24 48 c7 c7 a0 25 8e 81 48 89 de 31 c0 e8 fa 99 46 00 <0f> 0b 31 d2 eb e5 0f 0b 66 90 66 2e 0f 1f 84 00 00 00 00 00 66
RIP  [<ffffffff812159dc>] shrink_dcache_for_umount_subtree+0x1ac/0x1c0
 RSP <ffff88082ab33df0>

Comment 2 Brian Foster 2016-10-11 16:08:34 UTC
This is the full bug report for ext4:

BUG: Dentry ffff8800bad60e40{i=0,n=dd90} still in use (1) [unmount of ext4 dm-6]
BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: [<ffffffffa02911d1>] find_group_orlov+0x301/0x450 [ext4]
PGD 637653067 PUD 632f73067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter kvm_amd kvm irqbypass sg ipmi_ssif ipmi_devintf dcdbas ipmi_si shpchp ipmi_msghandler pcspkr sp5100_tco amd64_edac_mod i2c_piix4 k10temp edac_mce_amd acpi_power_meter edac_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif crct10dif_generic
 uas crct10dif_common usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ixgbe ttm drm mdio ptp mpt2sas megaraid_sas pps_core serio_raw dca i2c_core raid_class scsi_transport_sas bnx2 fjes dm_mirror dm_region_hash dm_log dm_mod
CPU: 6 PID: 3092 Comm: fsstress Not tainted 3.10.0-512.el7.x86_64 #1
Hardware name: Dell Inc. PowerEdge R715/0G2DP3, BIOS 2.3.0 10/18/2011
task: ffff880833f78fb0 ti: ffff8806292cc000 task.ti: ffff8806292cc000
RIP: 0010:[<ffffffffa02911d1>]  [<ffffffffa02911d1>] find_group_orlov+0x301/0x450 [ext4]
RSP: 0018:ffff8806292cfd08  EFLAGS: 00010246
RAX: 0000000000000722 RBX: ffff8808374d4800 RCX: ffff88022dcf4150
RDX: 0000000000000000 RSI: ffff8808374d0000 RDI: 0000000000002000
RBP: ffff8806292cfda0 R08: ffff8800bad60e60 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000004000 R12: ffff8808374d4800
R13: 0000000000000005 R14: 00000000002230e1 R15: 0000000000000050
FS:  00007f9c157a0740(0000) GS:ffff880237ac0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000030 CR3: 00000004c52ee000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff8808374d4800 ffff8808374d4800 ffff8806292cfe04 000000000006d693
 0001f8a40009db35 ffff88022dcf4150 ffff8806292cfd90 0000001000002000
 ffff8806292cfd68 ffffffff81219250 ffff88022e01d578 00000000a81a32d9
Call Trace:
 [<ffffffff81219250>] ? alloc_inode+0x30/0xa0
 [<ffffffffa0292810>] __ext4_new_inode+0x6e0/0x12d0 [ext4]
 [<ffffffffa02a44ac>] ext4_mkdir+0x1bc/0x410 [ext4]
 [<ffffffff8120a1f7>] vfs_mkdir+0xb7/0x160
 [<ffffffff812101ff>] SyS_mkdirat+0x6f/0xe0
 [<ffffffff81210289>] SyS_mkdir+0x19/0x20
 [<ffffffff81696489>] system_call_fastpath+0x16/0x1b
Code: 33 0c 25 28 00 00 00 0f 85 67 01 00 00 48 83 c4 70 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 0f 1f 44 00 00 49 8b 54 24 60 48 8b 4d 90 <48> 39 4a 30 74 10 48 8b 51 80 f7 c2 00 00 02 00 0f 84 c6 fd ff
RIP  [<ffffffffa02911d1>] find_group_orlov+0x301/0x450 [ext4]
 RSP <ffff8806292cfd08>
CR2: 0000000000000030
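Both reports begin with the same dcache message, so it can help to pull its fields out of dmesg output for comparison. The field meanings below are an assumption based on our reading of the printk in 3.10-era fs/dcache.c (shrink_dcache_for_umount_subtree()): dentry address, inode number in hex (0 when no inode is attached), name, reference count, filesystem type and device. parse_dentry_bug is a hypothetical helper.

```shell
#!/usr/bin/env bash
# parse_dentry_bug (hypothetical helper): read kernel log lines on stdin and
# print the fields of any "BUG: Dentry ... still in use" message.
# Assumed layout:
#   Dentry <addr>{i=<ino in hex>,n=<name>} still in use (<count>) [unmount of <fstype> <dev>]
parse_dentry_bug() {
    local line
    local re='Dentry ([0-9a-f]+)[{]i=([0-9a-f]+),n=([^}]*)[}] still in use [(]([0-9]+)[)] [[]unmount of ([^ ]+) ([^]]+)[]]'
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            # Print the inode number in decimal (e.g. for use with find -inum).
            printf '%s ino=%d name=%s count=%s fs=%s dev=%s\n' \
                "${BASH_REMATCH[1]}" "$((16#${BASH_REMATCH[2]}))" \
                "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}" \
                "${BASH_REMATCH[5]}" "${BASH_REMATCH[6]}"
        fi
    done
}
```

For example, `dmesg | parse_dentry_bug`. Applied to the two reports above, it yields inode 8392673 (0x800fe1) for f290 on XFS and inode 0 for dd90 on ext4, i.e. the ext4 dentry had no inode attached when the unmount path found it.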

Comment 4 Brian Foster 2017-03-29 14:07:03 UTC
This appears to be fixed upstream by commit 89f39af129 ("fs/super.c: fix race between freeze_super() and thaw_super()"). I'll post a backport after some wider testing.
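Our reading of the upstream commit (an assumption; check the commit itself) is that it tightens the guard at the top of thaw_super() from "reject if frozen == SB_UNFROZEN" to "reject if frozen != SB_FREEZE_COMPLETE". freeze_super() steps through SB_FREEZE_WRITE, SB_FREEZE_PAGEFAULT and SB_FREEZE_FS before SB_FREEZE_COMPLETE and drops s_umount while waiting for writers, so the second fsfreeze loop can thaw a half-frozen superblock. That fits the XFS trace in comment 1, where thaw_super() calls deactivate_locked_super() and tears the superblock down while fsstress still holds a dentry. The toy model below (not kernel code) only compares the two guards; the state values mirror the enum in include/linux/fs.h as we recall it.

```shell
#!/usr/bin/env bash
# Toy model of the thaw_super() guard; NOT kernel code.
SB_UNFROZEN=0
SB_FREEZE_WRITE=1
SB_FREEZE_PAGEFAULT=2
SB_FREEZE_FS=3
SB_FREEZE_COMPLETE=4

# Old guard: only an unfrozen superblock is rejected, so a thaw arriving
# while freeze_super() is still mid-freeze is allowed to proceed.
thaw_allowed_old() { [ "$1" -ne "$SB_UNFROZEN" ]; }

# New guard (assumed from upstream 89f39af129): anything short of a completed
# freeze is rejected (thaw_super() returns -EINVAL on rejection).
thaw_allowed_new() { [ "$1" -eq "$SB_FREEZE_COMPLETE" ]; }

for state in SB_UNFROZEN SB_FREEZE_WRITE SB_FREEZE_PAGEFAULT SB_FREEZE_FS SB_FREEZE_COMPLETE; do
    old=refuse new=refuse
    thaw_allowed_old "${!state}" && old=thaw
    thaw_allowed_new "${!state}" && new=thaw
    printf '%-20s old=%-6s new=%s\n' "$state" "$old" "$new"
done
```

Running it shows the difference: under the old guard the three intermediate states are thawable, while under the new guard only SB_FREEZE_COMPLETE is.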

Note that this is reproducible on demand with xfstests generic/390.
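To run generic/390 from an xfstests checkout, the harness needs a test device and a scratch device. A minimal local.config sketch follows: the variable names are the standard xfstests ones, but the devices are assumptions loosely modeled on the loop-device setup in comment 19 (/dev/loop0 as the test device, /dev/loop1 as scratch mounted at /mnt/scratch), and /mnt/test is a hypothetical test directory.

```shell
# local.config (sketch; device and directory names are assumptions)
export FSTYP=xfs
export TEST_DEV=/dev/loop0
export TEST_DIR=/mnt/test          # hypothetical
export SCRATCH_DEV=/dev/loop1
export SCRATCH_MNT=/mnt/scratch
```

With that in place, comment 19 runs the test as `./check tests/generic/390`.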

Comment 5 Brian Foster 2017-03-29 16:49:28 UTC
*** Bug 1436407 has been marked as a duplicate of this bug. ***

Comment 6 Brian Foster 2017-03-29 16:49:57 UTC
*** Bug 1388434 has been marked as a duplicate of this bug. ***

Comment 8 Zorro Lang 2017-05-12 12:11:45 UTC
*** Bug 1450283 has been marked as a duplicate of this bug. ***

Comment 9 Zorro Lang 2017-05-12 12:19:06 UTC
IBM hit this bug and reported it as bug 1450283. Since we already have a fix (thanks, Brian), let's review it quickly and get this bug fixed in 7.4.

Thanks,
Zorro

Comment 10 IBM Bug Proxy 2017-05-13 01:31:41 UTC
Created attachment 1278371
dmesg log

Comment 11 IBM Bug Proxy 2017-05-13 01:31:43 UTC
Created attachment 1278372
xmon log

Comment 12 IBM Bug Proxy 2017-05-13 01:32:05 UTC
Created attachment 1278373
sosreport

Comment 13 Rafael Aquini 2017-05-19 23:22:31 UTC
Patch(es) committed to the kernel repository; an interim kernel build is undergoing testing.

Comment 15 Rafael Aquini 2017-05-22 13:54:18 UTC
Patch(es) available in kernel-3.10.0-670.el7.

Comment 19 IBM Bug Proxy 2017-06-08 10:10:37 UTC
------- Comment From hasriram@in.ibm.com 2017-06-08 06:08 EDT-------
The issue is resolved with the latest snap2 kernel.

# ./check tests/generic/390
[  216.349894] XFS (loop0): Mounting V5 Filesystem
[  216.380231] XFS (loop0): Ending clean mount
FSTYP         -- xfs (non-debug)
PLATFORM      -- Linux/ppc64le alp4 3.10.0-675.el7.ppc64le
MKFS_OPTIONS  -- -f -bsize=4096 /dev/loop1
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/loop1 /mnt/scratch

[  220.050102] XFS (loop1): Mounting V5 Filesystem
[  220.080232] XFS (loop1): Ending clean mount
[  220.589922] XFS (loop0): Unmounting Filesystem
[  222.249995] XFS (loop0): Mounting V5 Filesystem
[  222.289975] XFS (loop0): Ending clean mount
generic/390 373s ...[  222.930060] run fstests generic/390 at 2017-06-08 05:58:00
[  227.269879] XFS (loop1): Unmounting Filesystem
[  229.550029] XFS (loop1): Mounting V5 Filesystem
[  229.580118] XFS (loop1): Ending clean mount
400s
[  624.069922] XFS (loop1): Unmounting Filesystem
[  625.950155] XFS (loop1): Mounting V5 Filesystem
[  625.999883] XFS (loop1): Ending clean mount
Ran: generic/390
Passed all 1 tests

[  626.579853] XFS (loop0): Unmounting Filesystem
[  626.859853] XFS (loop1): Unmounting Filesystem

# uname -a
Linux alp4 3.10.0-675.el7.ppc64le #1 SMP Mon May 29 23:22:30 EDT 2017 ppc64le ppc64le ppc64le GNU/Linux

Harish

Comment 21 errata-xmlrpc 2017-08-02 02:29:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1842

