Bug 1383739
| Summary: | BUG: Dentry ffff880232eeacc0{i=800fe1,n=f290} still in use (1) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Brian Foster <bfoster> |
| Component: | kernel | Assignee: | Brian Foster <bfoster> |
| kernel sub component: | File Systems | QA Contact: | Murphy Zhou <xzhou> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | bugproxy, eguan, hannsj_uhl, ross.zwisler, xzhou, zlang |
| Version: | 7.3 | Keywords: | Patch |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-3.10.0-670.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-08-02 02:29:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1299988, 1446211 | | |
| Attachments: | | | |
This is the full bug report for XFS:
BUG: Dentry ffff880232eeacc0{i=800fe1,n=f290} still in use (1) [unmount of xfs dm-7]
------------[ cut here ]------------
kernel BUG at fs/dcache.c:946!
invalid opcode: 0000 [#1] SMP
Modules linked in: xfs libcrc32c xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter kvm_amd kvm ipmi_ssif irqbypass sg ipmi_devintf dcdbas ipmi_si amd64_edac_mod ipmi_msghandler shpchp edac_mce_amd sp5100_tco edac_core pcspkr acpi_power_meter i2c_piix4 k10temp nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif
crct10dif_generic crct10dif_common uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ixgbe drm mdio ptp megaraid_sas mpt2sas pps_core serio_raw dca i2c_core raid_class scsi_transport_sas bnx2 fjes dm_mirror dm_region_hash dm_log dm_mod
CPU: 2 PID: 14592 Comm: xfs_io Not tainted 3.10.0-512.el7.x86_64 #1
Hardware name: Dell Inc. PowerEdge R715/0G2DP3, BIOS 2.3.0 10/18/2011
task: ffff88082fb53ec0 ti: ffff88082ab30000 task.ti: ffff88082ab30000
RIP: 0010:[<ffffffff812159dc>] [<ffffffff812159dc>] shrink_dcache_for_umount_subtree+0x1ac/0x1c0
RSP: 0018:ffff88082ab33df0 EFLAGS: 00010246
RAX: 0000000000000054 RBX: ffff880232eeacc0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff880237a4f838 RDI: ffff880237a4f838
RBP: ffff88082ab33e08 R08: 0000000000000096 R09: 00000000000005a1
R10: 0000000000000000 R11: ffff88082ab33af6 R12: ffff880232f7e6c0
R13: ffffffffa06b9240 R14: 00007ffcee5d987c R15: 0000000000000000
FS: 00007f32b7e1c740(0000) GS:ffff880237a40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000009c7008 CR3: 00000000bdbdb000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff880836f5b320 ffff880836f5b000 0000000000000083 ffff88082ab33e20
ffffffff812175cf ffff880836f5b000 ffff88082ab33e48 ffffffff81200631
ffff880187aa9380 0000000000000083 0000000000000000 ffff88082ab33e68
Call Trace:
[<ffffffff812175cf>] shrink_dcache_for_umount+0x2f/0x60
[<ffffffff81200631>] generic_shutdown_super+0x21/0xf0
[<ffffffff81200ac7>] kill_block_super+0x27/0x70
[<ffffffff81200e09>] deactivate_locked_super+0x49/0x60
[<ffffffff81200e90>] thaw_super+0x70/0xb0
[<ffffffff81211df1>] do_vfs_ioctl+0x211/0x4b0
[<ffffffff812aea3e>] ? file_has_perm+0xae/0xc0
[<ffffffff81212131>] SyS_ioctl+0xa1/0xc0
[<ffffffff81696489>] system_call_fastpath+0x16/0x1b
Code: 00 00 48 8b 40 28 4c 8b 08 48 8b 43 30 48 85 c0 74 1b 48 8b 50 40 48 89 34 24 48 c7 c7 a0 25 8e 81 48 89 de 31 c0 e8 fa 99 46 00 <0f> 0b 31 d2 eb e5 0f 0b 66 90 66 2e 0f 1f 84 00 00 00 00 00 66
RIP [<ffffffff812159dc>] shrink_dcache_for_umount_subtree+0x1ac/0x1c0
RSP <ffff88082ab33df0>
This is the full bug report for ext4:
BUG: Dentry ffff8800bad60e40{i=0,n=dd90} still in use (1) [unmount of ext4 dm-6]
BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: [<ffffffffa02911d1>] find_group_orlov+0x301/0x450 [ext4]
PGD 637653067 PUD 632f73067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter kvm_amd kvm irqbypass sg ipmi_ssif ipmi_devintf dcdbas ipmi_si shpchp ipmi_msghandler pcspkr sp5100_tco amd64_edac_mod i2c_piix4 k10temp edac_mce_amd acpi_power_meter edac_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif crct10dif_generic
uas crct10dif_common usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ixgbe ttm drm mdio ptp mpt2sas megaraid_sas pps_core serio_raw dca i2c_core raid_class scsi_transport_sas bnx2 fjes dm_mirror dm_region_hash dm_log dm_mod
CPU: 6 PID: 3092 Comm: fsstress Not tainted 3.10.0-512.el7.x86_64 #1
Hardware name: Dell Inc. PowerEdge R715/0G2DP3, BIOS 2.3.0 10/18/2011
task: ffff880833f78fb0 ti: ffff8806292cc000 task.ti: ffff8806292cc000
RIP: 0010:[<ffffffffa02911d1>] [<ffffffffa02911d1>] find_group_orlov+0x301/0x450 [ext4]
RSP: 0018:ffff8806292cfd08 EFLAGS: 00010246
RAX: 0000000000000722 RBX: ffff8808374d4800 RCX: ffff88022dcf4150
RDX: 0000000000000000 RSI: ffff8808374d0000 RDI: 0000000000002000
RBP: ffff8806292cfda0 R08: ffff8800bad60e60 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000004000 R12: ffff8808374d4800
R13: 0000000000000005 R14: 00000000002230e1 R15: 0000000000000050
FS: 00007f9c157a0740(0000) GS:ffff880237ac0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000030 CR3: 00000004c52ee000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff8808374d4800 ffff8808374d4800 ffff8806292cfe04 000000000006d693
0001f8a40009db35 ffff88022dcf4150 ffff8806292cfd90 0000001000002000
ffff8806292cfd68 ffffffff81219250 ffff88022e01d578 00000000a81a32d9
Call Trace:
[<ffffffff81219250>] ? alloc_inode+0x30/0xa0
[<ffffffffa0292810>] __ext4_new_inode+0x6e0/0x12d0 [ext4]
[<ffffffffa02a44ac>] ext4_mkdir+0x1bc/0x410 [ext4]
[<ffffffff8120a1f7>] vfs_mkdir+0xb7/0x160
[<ffffffff812101ff>] SyS_mkdirat+0x6f/0xe0
[<ffffffff81210289>] SyS_mkdir+0x19/0x20
[<ffffffff81696489>] system_call_fastpath+0x16/0x1b
Code: 33 0c 25 28 00 00 00 0f 85 67 01 00 00 48 83 c4 70 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 0f 1f 44 00 00 49 8b 54 24 60 48 8b 4d 90 <48> 39 4a 30 74 10 48 8b 51 80 f7 c2 00 00 02 00 0f 84 c6 fd ff
RIP [<ffffffffa02911d1>] find_group_orlov+0x301/0x450 [ext4]
RSP <ffff8806292cfd08>
CR2: 0000000000000030
This appears to be fixed upstream by commit 89f39af129 ("fs/super.c: fix race between freeze_super() and thaw_super()"). I'll post a backport after some wider testing.
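As we read the upstream commit, the fix changes the early-return guard in thaw_super() from "frozen == SB_UNFROZEN" to "frozen != SB_FREEZE_COMPLETE". freeze_super() drops s_umount after reaching SB_FREEZE_WRITE, so with the old guard a second fsfreeze -u could thaw a superblock whose freeze was still in progress. The sketch below models only that guard; it is not kernel code. The level values mirror the SB_* freeze levels in include/linux/fs.h (an assumption about this kernel's layout), and the function names are hypothetical.

```shell
#!/bin/sh
# Simplified model (not kernel code) of the thaw_super() guard change.
# Assumption: freeze levels follow the enum values in include/linux/fs.h.
SB_UNFROZEN=0
SB_FREEZE_WRITE=1
SB_FREEZE_PAGEFAULT=2
SB_FREEZE_FS=3
SB_FREEZE_COMPLETE=4

# Pre-fix guard (hypothetical name): a thaw is refused only when the
# superblock is fully unfrozen, so a thaw that races with a freeze that
# has not yet reached SB_FREEZE_COMPLETE is let through.
thaw_allowed_old() { [ "$1" -ne "$SB_UNFROZEN" ]; }

# Post-fix guard (hypothetical name): a thaw proceeds only after the
# freeze has fully completed.
thaw_allowed_new() { [ "$1" -eq "$SB_FREEZE_COMPLETE" ]; }

# Print, for each freeze level, whether each guard lets a thaw proceed.
report() {
    for level in "$SB_UNFROZEN" "$SB_FREEZE_WRITE" "$SB_FREEZE_PAGEFAULT" \
                 "$SB_FREEZE_FS" "$SB_FREEZE_COMPLETE"; do
        old=no
        new=no
        if thaw_allowed_old "$level"; then old=yes; fi
        if thaw_allowed_new "$level"; then new=yes; fi
        echo "frozen=$level old=$old new=$new"
    done
}
```

Calling report shows the window: levels 1 through 3 (a freeze in progress) print old=yes new=no, while only level 4 is accepted by both guards.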
Note that this is reproducible on-demand via xfstests generic/390.
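To run generic/390 locally, xfstests reads its devices from a local.config file in the xfstests checkout. The fragment below is an example under stated assumptions: /dev/loop1 and /mnt/scratch match the scratch device in the verification run later in this report, /dev/loop0 as TEST_DEV is inferred from that same log, and /mnt/test is a placeholder. The scratch device is reformatted during the run, so point it at a disposable device.

```shell
# Hypothetical xfstests local.config for running generic/390 on XFS.
# Device and directory values are placeholders; adjust for your system.
export FSTYP=xfs
export TEST_DEV=/dev/loop0
export TEST_DIR=/mnt/test
export SCRATCH_DEV=/dev/loop1
export SCRATCH_MNT=/mnt/scratch
```

With that in place, running ./check tests/generic/390 as root from the xfstests directory executes the test, as in the verification comment later in this report.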
*** Bug 1436407 has been marked as a duplicate of this bug. ***
*** Bug 1388434 has been marked as a duplicate of this bug. ***
*** Bug 1450283 has been marked as a duplicate of this bug. ***
IBM reported this bug as bug 1450283. Since we already have a fix (thanks, Brian), let's review it quickly and fix this bug in 7.4.
Thanks,
Zorro
Created attachment 1278371
dmesg log
Created attachment 1278372
xmon log
Created attachment 1278373
sosreport
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing
Patch(es) available on kernel-3.10.0-670.el7
------- Comment From hasriram.com 2017-06-08 06:08 EDT-------
Issue is resolved with the latest snap2 kernel.
# ./check tests/generic/390
[ 216.349894] XFS (loop0): Mounting V5 Filesystem
[ 216.380231] XFS (loop0): Ending clean mount
FSTYP -- xfs (non-debug)
PLATFORM -- Linux/ppc64le alp4 3.10.0-675.el7.ppc64le
MKFS_OPTIONS -- -f -bsize=4096 /dev/loop1
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/loop1 /mnt/scratch
[ 220.050102] XFS (loop1): Mounting V5 Filesystem
[ 220.080232] XFS (loop1): Ending clean mount
[ 220.589922] XFS (loop0): Unmounting Filesystem
[ 222.249995] XFS (loop0): Mounting V5 Filesystem
[ 222.289975] XFS (loop0): Ending clean mount
generic/390 373s ...[ 222.930060] run fstests generic/390 at 2017-06-08 05:58:00
[ 227.269879] XFS (loop1): Unmounting Filesystem
[ 229.550029] XFS (loop1): Mounting V5 Filesystem
[ 229.580118] XFS (loop1): Ending clean mount
400s
[ 624.069922] XFS (loop1): Unmounting Filesystem
[ 625.950155] XFS (loop1): Mounting V5 Filesystem
[ 625.999883] XFS (loop1): Ending clean mount
Ran: generic/390
Passed all 1 tests
[ 626.579853] XFS (loop0): Unmounting Filesystem
[ 626.859853] XFS (loop1): Unmounting Filesystem
# uname -a
Linux alp4 3.10.0-675.el7.ppc64le #1 SMP Mon May 29 23:22:30 EDT 2017 ppc64le ppc64le ppc64le GNU/Linux
Harish
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2017:1842
I hit the following kernel BUG() when running an fsstress + parallel fsfreeze race test on 3.10.0-512.el7.x86_64:
BUG: Dentry ffff880232eeacc0{i=800fe1,n=f290} still in use (1) [unmount of xfs dm-7]
------------[ cut here ]------------
kernel BUG at fs/dcache.c:946!
...
The test procedure is to mkfs and mount a linear lvm volume and run the following in one shell:
    <xfstests-dev>/ltp/fsstress -d /mnt -n 999999 -p 4 ...
and start the following loop in two others:
    i=0
    while [ true ]; do
        echo $i
        fsfreeze -f /mnt
        fsfreeze -u /mnt
        i=$((i + 1))
    done
Note that the problem doesn't typically occur until a second freeze/unfreeze loop is started, at which point it occurs rather quickly in my tests. The problem occurs with XFS and ext4 (with a slightly different call stack), which suggests this may be a vfs issue.
Further details to follow...
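The procedure above can be wrapped in a small script. This is a sketch, not the reporter's script: the function names freeze_loop and run_freeze_race, the optional iteration limit, and the FSSTRESS variable (defaulting to ./ltp/fsstress, i.e. run from an xfstests-dev checkout) are all hypothetical. The extra fsstress options that the report elides with "..." are deliberately not filled in. It must run as root against a scratch mount, because fsfreeze really freezes the filesystem.

```shell
#!/bin/sh
# Sketch of the fsstress + parallel fsfreeze race described above.
# Hypothetical names: freeze_loop, run_freeze_race, FSSTRESS.

# Freeze and thaw MNT repeatedly, printing the iteration number as the
# original loop does. An optional second argument caps the iteration
# count; the default (0) loops forever, like "while [ true ]".
freeze_loop() {
    local mnt=$1
    local max=${2:-0}
    local i=0
    while [ "$max" -eq 0 ] || [ "$i" -lt "$max" ]; do
        echo "$i"
        fsfreeze -f "$mnt"
        fsfreeze -u "$mnt"
        i=$((i + 1))
    done
}

# Start fsstress plus two concurrent freeze loops in the background; the
# report notes the BUG usually needs the second loop. Only the fsstress
# options quoted in the report are passed. Blocks until all jobs exit.
run_freeze_race() {
    local mnt=$1
    local fsstress=${FSSTRESS:-./ltp/fsstress}
    "$fsstress" -d "$mnt" -n 999999 -p 4 &
    freeze_loop "$mnt" &
    freeze_loop "$mnt" &
    wait
}
```

For example, after sourcing the functions, run_freeze_race /mnt starts the race on /mnt. The freeze loops never exit on their own, so they have to be killed; if a loop is killed between fsfreeze -f and fsfreeze -u, thaw the mount by hand with fsfreeze -u /mnt.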