Red Hat Bugzilla – Bug 981741
BUG on dentry still in use when unmounting fuse
Last modified: 2014-01-13 22:36:12 EST
Description of problem:
While running the GlusterFS test suite on RHEL-6.4, the kernel panics reliably:

<3>BUG: Dentry ffff8801eb4e9800{i=0,n=files9995} still in use (1) [unmount of fuse fuse]
<4>------------[ cut here ]------------
<2>kernel BUG at fs/dcache.c:670!
<4>invalid opcode: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/infiniband/mlx4_0/node_guid
<4>CPU 3
<4>Modules linked in: xfs exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 fuse r8169 mii sg serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support shpchp snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core xhci_hcd ext4 mbcache jbd2 sd_mod crc_t10dif ahci i915 drm_kms_helper drm i2c_algo_bit i2c_core video output dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 16277, comm: umount Not tainted 2.6.32-358.11.1.el6.x86_64 #1 System manufacturer System Product Name/P8Z77-V LX2
<4>RIP: 0010:[<ffffffff8119a9d8>] [<ffffffff8119a9d8>] shrink_dcache_for_umount_subtree+0x2a8/0x2b0
<4>RSP: 0018:ffff8801f71dfdb8 EFLAGS: 00010292
<4>RAX: 000000000000005c RBX: ffff8801eb4e9800 RCX: 000000000000f21b
<4>RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
<4>RBP: ffff8801f71dfdf8 R08: 0000000000000000 R09: 0000000000000001
<4>R10: ffffffff81641bc0 R11: ffff880217537b8b R12: 0000000000000d0b
<4>R13: ffffffff81a83fc0 R14: ffff8801b3f58780 R15: ffff8801eb4e9860
<4>FS: 00007f66d09e4740(0000) GS:ffff88002c380000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>CR2: 00007f66d004b360 CR3: 00000001af6e6000 CR4: 00000000000407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process umount (pid: 16277, threadinfo ffff8801f71de000, task ffff8801f7364040)
<4>Stack:
<4> ffff880204cf5e70 ffff8801f7364040 0000000000000015 ffff880204cf5c00
<4><d> ffffffffa034c200 ffff8801f7102b38 ffff880204cf5c00 ffff88021c1ee380
<4><d> ffff8801f71dfe18 ffffffff8119aa16 0000000000000000 ffff880204cf5c00
<4>Call Trace:
<4> [<ffffffff8119aa16>] shrink_dcache_for_umount+0x36/0x60
<4> [<ffffffff8118336f>] generic_shutdown_super+0x1f/0xe0
<4> [<ffffffff81183496>] kill_anon_super+0x16/0x60
<4> [<ffffffffa03495d2>] fuse_kill_sb_anon+0x52/0x60 [fuse]
<4> [<ffffffff81183c37>] deactivate_super+0x57/0x80
<4> [<ffffffff811a1c2f>] mntput_no_expire+0xbf/0x110
<4> [<ffffffff811a269b>] sys_umount+0x7b/0x3a0
<4> [<ffffffff810dc847>] ? audit_syscall_entry+0x1d7/0x200
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: 50 30 4c 8b 0a 31 d2 48 85 f6 74 04 48 8b 56 40 48 05 70 02 00 00 48 89 de 48 c7 c7 80 39 7b 81 48 89 04 24 31 c0 e8 e8 2b 37 00 <0f> 0b eb fe 0f 0b eb fe 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00
<1>RIP [<ffffffff8119a9d8>] shrink_dcache_for_umount_subtree+0x2a8/0x2b0
<4> RSP <ffff8801f71dfdb8>

Version-Release number of selected component (if applicable):
kernel-2.6.32-358.11.1.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Follow the instructions on http://www.gluster.org/community/documentation/index.php/Using_the_Gluster_Test_Framework
2. Run the tests like this: prove -rf --timer $(dirname $0)/tests/bugs

Actual results:
Kernel panic on RHEL; BUG and call trace on Fedora.

Expected results:
No panic or BUG.

Additional info:
Created attachment 769332
Proposed patch

jclift tested this patch successfully on RHEL-6.4.
Comment on attachment 769332
Proposed patch

This patch does not correctly decrement the sb->s_active counter, which makes it impossible to unload the fuse module after it has been used. Testing some variations now.
Created attachment 773572
Disable readdirplus for testing

When I run the tests with a fuse module that does not support readdirplus, I cannot reproduce the crashes. This narrows down the search for the cause considerably. I'll read through the code over the next few days, do some further testing and see if there is anything obvious.
Created attachment 773675
fix the dentry leak

There is a dentry leak when d_lookup() returns a dentry that does not have a valid d_inode set. The attached patch fixes it for me. Doing some further verification tests before posting upstream for review.
This regression was introduced with the new READDIRPLUS support in fuse. Hitting the BUG() (which results in a kernel panic on RHEL) seems to require some stress on the VFS and the fuse mount. The GlusterFS tests make a reliable reproducer:
- http://www.gluster.org/community/documentation/index.php/Using_the_Gluster_Test_Framework

After some stressing of the VFS and fuse mountpoints, bug-860663.t will hit the BUG(). It does not happen when this test is run stand-alone.

Patch posted upstream for review:
- https://lkml.org/lkml/2013/7/15/203
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
RHEL-6 test packages and the upstream patch can be found here:
- http://people.redhat.com/ndevos/bz981741/
Patch(es) available on kernel-2.6.32-408.el6
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-1645.html
*** Bug 1031614 has been marked as a duplicate of this bug. ***