1383739 – BUG: Dentry ffff880232eeacc0{i=800fe1,n=f290} still in use (1)


Summary: BUG: Dentry ffff880232eeacc0{i=800fe1,n=f290} still in use (1)

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	7.3
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Brian Foster
QA Contact:	Murphy Zhou
Docs Contact:
URL:
Whiteboard:
Duplicates (3):	1388434 1436407 1450283
Depends On:
Blocks:	1299988 1446211

Reported:	2016-10-11 16:07 UTC by Brian Foster
Modified:	2017-08-02 02:29 UTC
CC List:	6 users
Fixed In Version:	kernel-3.10.0-670.el7
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-08-02 02:29:35 UTC
Target Upstream Version:
Embargoed:

Attachments
dmesg log (3.56 KB, text/plain) 2017-05-13 01:31 UTC, IBM Bug Proxy	no flags
xmon log (4.40 KB, text/plain) 2017-05-13 01:31 UTC, IBM Bug Proxy	no flags
sosreport (10.23 MB, text/plain) 2017-05-13 01:32 UTC, IBM Bug Proxy	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:1842	0	normal	SHIPPED_LIVE	Important: kernel security, bug fix, and enhancement update	2017-08-01 18:22:09 UTC

Description Brian Foster 2016-10-11 16:07:03 UTC

I hit the following kernel BUG() when running an fsstress + parallel fsfreeze race test on 3.10.0-512.el7.x86_64:

 BUG: Dentry ffff880232eeacc0{i=800fe1,n=f290} still in use (1) [unmount of xfs dm-7]
 ------------[ cut here ]------------
 kernel BUG at fs/dcache.c:946!
 ...

The test procedure is to mkfs and mount a linear LVM volume, then run the following in one shell:

<xfstests-dev>/ltp/fsstress -d /mnt -n 999999 -p 4

... and start the following loop in two others:

i=0
while [ true ]; do
        echo $i
        fsfreeze -f /mnt
        fsfreeze -u /mnt
        i=$((i + 1))
done

Note that the problem typically doesn't occur until a second freeze/unfreeze loop is started, at which point it reproduces rather quickly in my tests. The problem occurs with both XFS and ext4 (with a slightly different call stack on ext4), which suggests this may be a VFS issue rather than a filesystem-specific one.
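
For convenience, the freeze/unfreeze loop above can be wrapped in a small helper so that two or more instances are easy to start side by side. This is only a sketch of the same procedure, not the script actually used for the report: the function name freeze_loop, the bounded iteration count, and the FSFREEZE override variable are additions made for illustration.

```shell
#!/bin/bash
# Sketch of the freeze/unfreeze loop from the description.
# FSFREEZE and freeze_loop are hypothetical names introduced here;
# the original loop ran unbounded and called fsfreeze directly.

FSFREEZE=${FSFREEZE:-fsfreeze}

# Freeze and immediately thaw the filesystem mounted at $1, $2 times,
# printing the iteration counter as the original loop does.
freeze_loop() {
    local mnt=$1 iterations=$2 i=0
    while [ "$i" -lt "$iterations" ]; do
        echo "$i"
        "$FSFREEZE" -f "$mnt"
        "$FSFREEZE" -u "$mnt"
        i=$((i + 1))
    done
}

# Example (as root, on a scratch filesystem mounted at /mnt, with
# fsstress already running against it in another shell):
#   freeze_loop /mnt 100000 > /dev/null &
#   freeze_loop /mnt 100000 > /dev/null &
#   wait
```

Starting at least two instances matters, since a single loop does not normally trigger the problem.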

Further details to follow...

Comment 1 Brian Foster 2016-10-11 16:07:57 UTC

This is the full bug report for XFS:

BUG: Dentry ffff880232eeacc0{i=800fe1,n=f290} still in use (1) [unmount of xfs dm-7]
------------[ cut here ]------------
kernel BUG at fs/dcache.c:946!
invalid opcode: 0000 [#1] SMP
Modules linked in: xfs libcrc32c xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter kvm_amd kvm ipmi_ssif irqbypass sg ipmi_devintf dcdbas ipmi_si amd64_edac_mod ipmi_msghandler shpchp edac_mce_amd sp5100_tco edac_core pcspkr acpi_power_meter i2c_piix4 k10temp nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif
 crct10dif_generic crct10dif_common uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ixgbe drm mdio ptp megaraid_sas mpt2sas pps_core serio_raw dca i2c_core raid_class scsi_transport_sas bnx2 fjes dm_mirror dm_region_hash dm_log dm_mod
CPU: 2 PID: 14592 Comm: xfs_io Not tainted 3.10.0-512.el7.x86_64 #1
Hardware name: Dell Inc. PowerEdge R715/0G2DP3, BIOS 2.3.0 10/18/2011
task: ffff88082fb53ec0 ti: ffff88082ab30000 task.ti: ffff88082ab30000
RIP: 0010:[<ffffffff812159dc>]  [<ffffffff812159dc>] shrink_dcache_for_umount_subtree+0x1ac/0x1c0
RSP: 0018:ffff88082ab33df0  EFLAGS: 00010246
RAX: 0000000000000054 RBX: ffff880232eeacc0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff880237a4f838 RDI: ffff880237a4f838
RBP: ffff88082ab33e08 R08: 0000000000000096 R09: 00000000000005a1
R10: 0000000000000000 R11: ffff88082ab33af6 R12: ffff880232f7e6c0
R13: ffffffffa06b9240 R14: 00007ffcee5d987c R15: 0000000000000000
FS:  00007f32b7e1c740(0000) GS:ffff880237a40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000009c7008 CR3: 00000000bdbdb000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff880836f5b320 ffff880836f5b000 0000000000000083 ffff88082ab33e20
 ffffffff812175cf ffff880836f5b000 ffff88082ab33e48 ffffffff81200631
 ffff880187aa9380 0000000000000083 0000000000000000 ffff88082ab33e68
Call Trace:
 [<ffffffff812175cf>] shrink_dcache_for_umount+0x2f/0x60
 [<ffffffff81200631>] generic_shutdown_super+0x21/0xf0
 [<ffffffff81200ac7>] kill_block_super+0x27/0x70
 [<ffffffff81200e09>] deactivate_locked_super+0x49/0x60
 [<ffffffff81200e90>] thaw_super+0x70/0xb0
 [<ffffffff81211df1>] do_vfs_ioctl+0x211/0x4b0
 [<ffffffff812aea3e>] ? file_has_perm+0xae/0xc0
 [<ffffffff81212131>] SyS_ioctl+0xa1/0xc0
 [<ffffffff81696489>] system_call_fastpath+0x16/0x1b
Code: 00 00 48 8b 40 28 4c 8b 08 48 8b 43 30 48 85 c0 74 1b 48 8b 50 40 48 89 34 24 48 c7 c7 a0 25 8e 81 48 89 de 31 c0 e8 fa 99 46 00 <0f> 0b 31 d2 eb e5 0f 0b 66 90 66 2e 0f 1f 84 00 00 00 00 00 66
RIP  [<ffffffff812159dc>] shrink_dcache_for_umount_subtree+0x1ac/0x1c0
 RSP <ffff88082ab33df0>

Comment 2 Brian Foster 2016-10-11 16:08:34 UTC

This is the full bug report for ext4:

BUG: Dentry ffff8800bad60e40{i=0,n=dd90} still in use (1) [unmount of ext4 dm-6]
BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: [<ffffffffa02911d1>] find_group_orlov+0x301/0x450 [ext4]
PGD 637653067 PUD 632f73067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter kvm_amd kvm irqbypass sg ipmi_ssif ipmi_devintf dcdbas ipmi_si shpchp ipmi_msghandler pcspkr sp5100_tco amd64_edac_mod i2c_piix4 k10temp edac_mce_amd acpi_power_meter edac_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif crct10dif_generic
 uas crct10dif_common usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ixgbe ttm drm mdio ptp mpt2sas megaraid_sas pps_core serio_raw dca i2c_core raid_class scsi_transport_sas bnx2 fjes dm_mirror dm_region_hash dm_log dm_mod
CPU: 6 PID: 3092 Comm: fsstress Not tainted 3.10.0-512.el7.x86_64 #1
Hardware name: Dell Inc. PowerEdge R715/0G2DP3, BIOS 2.3.0 10/18/2011
task: ffff880833f78fb0 ti: ffff8806292cc000 task.ti: ffff8806292cc000
RIP: 0010:[<ffffffffa02911d1>]  [<ffffffffa02911d1>] find_group_orlov+0x301/0x450 [ext4]
RSP: 0018:ffff8806292cfd08  EFLAGS: 00010246
RAX: 0000000000000722 RBX: ffff8808374d4800 RCX: ffff88022dcf4150
RDX: 0000000000000000 RSI: ffff8808374d0000 RDI: 0000000000002000
RBP: ffff8806292cfda0 R08: ffff8800bad60e60 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000004000 R12: ffff8808374d4800
R13: 0000000000000005 R14: 00000000002230e1 R15: 0000000000000050
FS:  00007f9c157a0740(0000) GS:ffff880237ac0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000030 CR3: 00000004c52ee000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff8808374d4800 ffff8808374d4800 ffff8806292cfe04 000000000006d693
 0001f8a40009db35 ffff88022dcf4150 ffff8806292cfd90 0000001000002000
 ffff8806292cfd68 ffffffff81219250 ffff88022e01d578 00000000a81a32d9
Call Trace:
 [<ffffffff81219250>] ? alloc_inode+0x30/0xa0
 [<ffffffffa0292810>] __ext4_new_inode+0x6e0/0x12d0 [ext4]
 [<ffffffffa02a44ac>] ext4_mkdir+0x1bc/0x410 [ext4]
 [<ffffffff8120a1f7>] vfs_mkdir+0xb7/0x160
 [<ffffffff812101ff>] SyS_mkdirat+0x6f/0xe0
 [<ffffffff81210289>] SyS_mkdir+0x19/0x20
 [<ffffffff81696489>] system_call_fastpath+0x16/0x1b
Code: 33 0c 25 28 00 00 00 0f 85 67 01 00 00 48 83 c4 70 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 0f 1f 44 00 00 49 8b 54 24 60 48 8b 4d 90 <48> 39 4a 30 74 10 48 8b 51 80 f7 c2 00 00 02 00 0f 84 c6 fd ff
RIP  [<ffffffffa02911d1>] find_group_orlov+0x301/0x450 [ext4]
 RSP <ffff8806292cfd08>
CR2: 0000000000000030

Comment 4 Brian Foster 2017-03-29 14:07:03 UTC

This appears to be fixed upstream by commit 89f39af129 ("fs/super.c: fix race between freeze_super() and thaw_super()"). I'll post a backport after some wider testing.

Note that this is reproducible on-demand via xfstests generic/390.

Comment 5 Brian Foster 2017-03-29 16:49:28 UTC

*** Bug 1436407 has been marked as a duplicate of this bug. ***

Comment 6 Brian Foster 2017-03-29 16:49:57 UTC

*** Bug 1388434 has been marked as a duplicate of this bug. ***

Comment 8 Zorro Lang 2017-05-12 12:11:45 UTC

*** Bug 1450283 has been marked as a duplicate of this bug. ***

Comment 9 Zorro Lang 2017-05-12 12:19:06 UTC

IBM hit this bug and reported it as bug 1450283. Since we already have a fix (thanks Brian), let's get it reviewed quickly and fix this bug in 7.4.

Thanks,
Zorro

Comment 10 IBM Bug Proxy 2017-05-13 01:31:41 UTC

Created attachment 1278371
dmesg log

Comment 11 IBM Bug Proxy 2017-05-13 01:31:43 UTC

Created attachment 1278372
xmon log

Comment 12 IBM Bug Proxy 2017-05-13 01:32:05 UTC

Created attachment 1278373
sosreport

Comment 13 Rafael Aquini 2017-05-19 23:22:31 UTC

Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Comment 15 Rafael Aquini 2017-05-22 13:54:18 UTC

Patch(es) available on kernel-3.10.0-670.el7
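
To check whether a given system is at or beyond that build, a rough comparison of the kernel release string can help. The sketch below is an illustration only: kernel_has_fix is a hypothetical helper name, it assumes GNU sort with the -V (version sort) option, and it does not account for backports shipped in z-stream kernels with lower build numbers, so treat the result as a hint rather than proof.

```shell
# Return 0 if kernel release $1 sorts at or after the fixed build $2
# (default: 3.10.0-670.el7, the "Fixed In Version" of this bug).
# Hypothetical helper; relies on GNU sort -V ordering.
kernel_has_fix() {
    local running=$1 fixed=${2:-3.10.0-670.el7}
    # If the fixed build sorts first, the running kernel is the same or newer.
    [ "$(printf '%s\n%s\n' "$fixed" "$running" | sort -V | head -n 1)" = "$fixed" ]
}

# Example:
#   kernel_has_fix "$(uname -r)" && echo "at or above 3.10.0-670.el7"
```

With the release strings that appear in this report, 3.10.0-512.el7.x86_64 (the kernel the bug was found on) is reported as lacking the fix, and 3.10.0-675.el7.ppc64le as having it.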

Comment 19 IBM Bug Proxy 2017-06-08 10:10:37 UTC

------- Comment From hasriram.com 2017-06-08 06:08 EDT-------
Issue is resolved with the latest snap2 kernel.

# ./check tests/generic/390
[  216.349894] XFS (loop0): Mounting V5 Filesystem
[  216.380231] XFS (loop0): Ending clean mount
FSTYP         -- xfs (non-debug)
PLATFORM      -- Linux/ppc64le alp4 3.10.0-675.el7.ppc64le
MKFS_OPTIONS  -- -f -bsize=4096 /dev/loop1
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/loop1 /mnt/scratch

[  220.050102] XFS (loop1): Mounting V5 Filesystem
[  220.080232] XFS (loop1): Ending clean mount
[  220.589922] XFS (loop0): Unmounting Filesystem
[  222.249995] XFS (loop0): Mounting V5 Filesystem
[  222.289975] XFS (loop0): Ending clean mount
generic/390 373s ...[  222.930060] run fstests generic/390 at 2017-06-08 05:58:00
[  227.269879] XFS (loop1): Unmounting Filesystem
[  229.550029] XFS (loop1): Mounting V5 Filesystem
[  229.580118] XFS (loop1): Ending clean mount
400s
[  624.069922] XFS (loop1): Unmounting Filesystem
[  625.950155] XFS (loop1): Mounting V5 Filesystem
[  625.999883] XFS (loop1): Ending clean mount
Ran: generic/390
Passed all 1 tests

[  626.579853] XFS (loop0): Unmounting Filesystem
[  626.859853] XFS (loop1): Unmounting Filesystem

# uname -a
Linux alp4 3.10.0-675.el7.ppc64le #1 SMP Mon May 29 23:22:30 EDT 2017 ppc64le ppc64le ppc64le GNU/Linux

Harish

Comment 21 errata-xmlrpc 2017-08-02 02:29:35 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1842
