Bug 981741 - BUG on dentry still in use when unmounting fuse
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Niels de Vos
QA Contact: Zorro Lang
Keywords: Patch, Regression, TestCaseProvided, ZStream
Duplicates: 1031614
Depends On:
Blocks: 988312 988708
Reported: 2013-07-05 12:44 EDT by Niels de Vos
Modified: 2014-01-13 22:36 EST (History)
CC List: 12 users

See Also:
Fixed In Version: kernel-2.6.32-408.el6
Doc Type: Bug Fix
Doc Text:
A dentry leak occurred in the FUSE code when, after a negative lookup, a negative dentry was not dropped and its reference counter was not decremented. This triggered a BUG() macro when unmounting a FUSE subtree containing the dentry, resulting in a kernel panic. A series of patches related to this problem has been applied to the FUSE code. Negative dentries are now properly dropped, so the BUG() macro is no longer triggered.
Story Points: ---
Clone Of:
Clones: 988312
Environment:
Last Closed: 2013-11-21 14:24:57 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Proposed patch (838 bytes, patch)
2013-07-05 12:53 EDT, Niels de Vos
ndevos: review-
Disable readdirplus for testing (482 bytes, patch)
2013-07-15 03:27 EDT, Niels de Vos
no flags
fix the dentry leak (402 bytes, patch)
2013-07-15 05:41 EDT, Niels de Vos
no flags

Description Niels de Vos 2013-07-05 12:44:26 EDT
Description of problem:

While running the GlusterFS testsuite on RHEL-6.4 the kernel panics reliably:

<3>BUG: Dentry ffff8801eb4e9800{i=0,n=files9995} still in use (1) [unmount of fuse fuse]
<4>------------[ cut here ]------------
<2>kernel BUG at fs/dcache.c:670!
<4>invalid opcode: 0000 [#1] SMP 
<4>last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/infiniband/mlx4_0/node_guid
<4>CPU 3 
<4>Modules linked in: xfs exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 fuse r8169 mii sg serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support shpchp snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core xhci_hcd ext4 mbcache jbd2 sd_mod crc_t10dif ahci i915 drm_kms_helper drm i2c_algo_bit i2c_core video output dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 16277, comm: umount Not tainted 2.6.32-358.11.1.el6.x86_64 #1 System manufacturer System Product Name/P8Z77-V LX2
<4>RIP: 0010:[<ffffffff8119a9d8>]  [<ffffffff8119a9d8>] shrink_dcache_for_umount_subtree+0x2a8/0x2b0
<4>RSP: 0018:ffff8801f71dfdb8  EFLAGS: 00010292
<4>RAX: 000000000000005c RBX: ffff8801eb4e9800 RCX: 000000000000f21b
<4>RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
<4>RBP: ffff8801f71dfdf8 R08: 0000000000000000 R09: 0000000000000001
<4>R10: ffffffff81641bc0 R11: ffff880217537b8b R12: 0000000000000d0b
<4>R13: ffffffff81a83fc0 R14: ffff8801b3f58780 R15: ffff8801eb4e9860
<4>FS:  00007f66d09e4740(0000) GS:ffff88002c380000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>CR2: 00007f66d004b360 CR3: 00000001af6e6000 CR4: 00000000000407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process umount (pid: 16277, threadinfo ffff8801f71de000, task ffff8801f7364040)
<4>Stack:
<4> ffff880204cf5e70 ffff8801f7364040 0000000000000015 ffff880204cf5c00
<4><d> ffffffffa034c200 ffff8801f7102b38 ffff880204cf5c00 ffff88021c1ee380
<4><d> ffff8801f71dfe18 ffffffff8119aa16 0000000000000000 ffff880204cf5c00
<4>Call Trace:
<4> [<ffffffff8119aa16>] shrink_dcache_for_umount+0x36/0x60
<4> [<ffffffff8118336f>] generic_shutdown_super+0x1f/0xe0
<4> [<ffffffff81183496>] kill_anon_super+0x16/0x60
<4> [<ffffffffa03495d2>] fuse_kill_sb_anon+0x52/0x60 [fuse]
<4> [<ffffffff81183c37>] deactivate_super+0x57/0x80
<4> [<ffffffff811a1c2f>] mntput_no_expire+0xbf/0x110
<4> [<ffffffff811a269b>] sys_umount+0x7b/0x3a0
<4> [<ffffffff810dc847>] ? audit_syscall_entry+0x1d7/0x200
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: 50 30 4c 8b 0a 31 d2 48 85 f6 74 04 48 8b 56 40 48 05 70 02 00 00 48 89 de 48 c7 c7 80 39 7b 81 48 89 04 24 31 c0 e8 e8 2b 37 00 <0f> 0b eb fe 0f 0b eb fe 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 
<1>RIP  [<ffffffff8119a9d8>] shrink_dcache_for_umount_subtree+0x2a8/0x2b0
<4> RSP <ffff8801f71dfdb8>


Version-Release number of selected component (if applicable):
kernel-2.6.32-358.11.1.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Follow the instructions at http://www.gluster.org/community/documentation/index.php/Using_the_Gluster_Test_Framework
2. Run the tests with: prove -rf --timer $(dirname $0)/tests/bugs

Actual results:
Kernel panic (on RHEL), BUG+calltrace on Fedora.

Expected results:
No panic/BUG.

Additional info:
Comment 2 Niels de Vos 2013-07-05 12:53:58 EDT
Created attachment 769332
Proposed patch

jclift tested this patch successfully on RHEL-6.4.
Comment 4 Niels de Vos 2013-07-08 05:19:48 EDT
Comment on attachment 769332
Proposed patch

This patch does not correctly decrement the sb->s_active counter, which makes it impossible to unload the module after using it. Testing some variations now.
Comment 5 Niels de Vos 2013-07-15 03:27:06 EDT
Created attachment 773572
Disable readdirplus for testing

When I run the tests with a fuse module that does not support readdirplus, I cannot reproduce the crashes. This narrows down the search for the cause considerably.

I'll read through the code over the next few days, do some further testing and see if there is anything obvious.
Comment 6 Niels de Vos 2013-07-15 05:41:38 EDT
Created attachment 773675
fix the dentry leak

There is a dentry leak when d_lookup() returns a dentry that does not have a valid d_inode set. The attached patch fixes it for me.

Doing some further verification tests before posting upstream for review.
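
To illustrate the kind of leak described above, here is a minimal user-space sketch. It is not the kernel code and not the attached patch. The struct toy_dentry type and all toy_* and link_entry_* names are hypothetical, and the only kernel behaviour assumed is that d_lookup() returns a hit with an extra reference held and that dput() drops one reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the kernel's struct dentry (hypothetical, not the real layout). */
struct toy_dentry {
    int  d_count;    /* reference count */
    bool has_inode;  /* false models a negative dentry (d_inode == NULL) */
};

/* Modelled on d_lookup(): a cache hit is returned with one extra reference. */
static struct toy_dentry *toy_d_lookup(struct toy_dentry *cached)
{
    cached->d_count++;
    return cached;
}

/* Modelled on dput(): drop one reference. */
static void toy_dput(struct toy_dentry *d)
{
    assert(d->d_count > 0);
    d->d_count--;
}

/* Leaky variant: the reference is only dropped when the dentry has an inode. */
static void link_entry_leaky(struct toy_dentry *cached)
{
    struct toy_dentry *d = toy_d_lookup(cached);

    if (d->has_inode) {
        /* ... refresh the cached attributes here ... */
        toy_dput(d);
    }
    /* Negative dentry: we return while still holding the reference. */
}

/* Fixed variant: every successful lookup is paired with a put. */
static void link_entry_fixed(struct toy_dentry *cached)
{
    struct toy_dentry *d = toy_d_lookup(cached);

    if (d->has_inode) {
        /* ... refresh the cached attributes here ... */
    }
    toy_dput(d);
}

/* Models the unmount-time check: a dentry with references left is "still in use". */
static bool umount_would_bug(const struct toy_dentry *d)
{
    return d->d_count != 0;
}
```

Calling link_entry_leaky() on a negative toy dentry that starts with a count of 0 leaves the count at 1, which matches the "still in use (1)" in the BUG message from the description. link_entry_fixed() returns the count to 0.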
Comment 7 Niels de Vos 2013-07-15 09:26:02 EDT
This regression was introduced with the new READDIRPLUS support in fuse.

Hitting the BUG() (which results in a kernel panic on RHEL) seems to require
some stressing of the VFS and the fuse mount. The GlusterFS tests make
a reliable reproducer:
 - http://www.gluster.org/community/documentation/index.php/Using_the_Gluster_Test_Framework

After some stressing of the VFS and fuse mountpoints, bug-860663.t will hit
the BUG(). It does not happen when this test is run stand-alone.

Patch posted upstream for review:
- https://lkml.org/lkml/2013/7/15/203
Comment 8 RHEL Product and Program Management 2013-07-15 09:37:42 EDT
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.
Comment 14 Niels de Vos 2013-08-01 04:03:00 EDT
RHEL-6 test-packages and the upstream patch can be found here:
- http://people.redhat.com/ndevos/bz981741/
Comment 17 Rafael Aquini 2013-08-07 11:48:39 EDT
Patch(es) available on kernel-2.6.32-408.el6
Comment 22 errata-xmlrpc 2013-11-21 14:24:57 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-1645.html
Comment 23 Raghavendra Talur 2013-12-23 04:22:12 EST
*** Bug 1031614 has been marked as a duplicate of this bug. ***
