Bug 956034 - rm is blocked on the fuse mount while trying to remove directories and files from the FUSE and NFS mount simultaneously
Summary: rm is blocked on the fuse mount while trying to remove directories and files ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.1
Hardware: x86_64
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
: ---
Assignee: Pranith Kumar K
QA Contact: Rahul Hinduja
URL:
Whiteboard:
Depends On:
Blocks: 965987
 
Reported: 2013-04-24 08:21 UTC by Rahul Hinduja
Modified: 2013-09-23 22:35 UTC (History)

Fixed In Version: glusterfs-3.4.0.12rhs-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 965987 (view as bug list)
Environment:
Last Closed: 2013-09-23 22:35:21 UTC
Embargoed:



Description Rahul Hinduja 2013-04-24 08:21:07 UTC
Description of problem:
=======================

Created multiple files and directories on the volume from the FUSE and NFS mounts. After a while, tried to delete the files and directories simultaneously from both mounts by broadcasting the command "rm -rvf *".

The FUSE mount hung and could not be recovered; dmesg confirmed that rm on the FUSE mount was blocked.

dmesg shows the following:
=========================

INFO: task rm:12156 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rm            D 0000000000000001     0 12156  14341 0x00000084
 ffff88011bf93db8 0000000000000082 0000000000000000 ffffffff8104d1c9
 ffff88011bf93d48 0000000300000000 ffff88011bf93d58 ffff880037b8cc40
 ffff880118f37098 ffff88011bf93fd8 000000000000fb88 ffff880118f37098
Call Trace:
 [<ffffffff8104d1c9>] ? __wake_up_common+0x59/0x90
 [<ffffffffa00f3075>] fuse_request_send+0xe5/0x290 [fuse]
 [<ffffffff81090c00>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa00f5385>] fuse_rmdir+0x85/0x110 [fuse]
 [<ffffffff811841fd>] vfs_rmdir+0xbd/0xf0
 [<ffffffff81182e4a>] ? lookup_hash+0x3a/0x50
 [<ffffffff81186793>] do_rmdir+0x103/0x120
 [<ffffffff81176922>] ? vfs_write+0x132/0x1a0
 [<ffffffff810d4027>] ? audit_syscall_entry+0x1d7/0x200
 [<ffffffff811867dd>] sys_unlinkat+0x2d/0x40
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[root@darrel nfs-923538]# 
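
A minimal Python sketch for pulling the task name, PID, and call chain out of a hung-task report like the one above. The names parse_hung_task, HEADER_RE and FRAME_RE are illustrative and not part of any kernel or Gluster tool; the patterns assume only the frame format shown in this trace.

```python
import re

# Header line printed by the kernel's hung-task detector.
HEADER_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# One stack frame, e.g. " [<ffffffffa00f5385>] fuse_rmdir+0x85/0x110 [fuse]".
# A "?" before the symbol marks a frame the unwinder could not confirm.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_hung_task(text):
    """Return (comm, pid, secs, frames) for the first report in text, or None.

    Each frame is a tuple (symbol, module_or_None, is_reliable).
    """
    header = HEADER_RE.search(text)
    if header is None:
        return None
    frames = []
    for line in text[header.end():].splitlines():
        if HEADER_RE.search(line):
            break  # a second report starts here; stop at the first one
        m = FRAME_RE.search(line)
        if m:
            frames.append(
                (m.group("symbol"), m.group("module"), m.group("unreliable") is None)
            )
    return header.group("comm"), int(header.group("pid")), int(header.group("secs")), frames
```

Filtering the returned frames on is_reliable leaves the chain that matters here: the rmdir request entered the fuse module and is waiting in fuse_request_send for a reply from the glusterfs client process.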



Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.4.0.1rhs-1.el6rhs.x86_64



Steps to Reproduce:
1. Created a 1*2 volume.
2. Mounted the volume over both NFS and FUSE on the same client.
3. Ran self_heal_all_file_type_script1.sh from the FUSE mount and self_heal_all_file_type_script2.sh from the NFS mount. Both scripts create files and directories.
4. While the scripts were running, replaced one of the bricks using "replace-brick commit force".
5. Started a heal using "gluster volume heal vol-name full".
6. The heal succeeded, and arequal confirmed that the checksums match.
7. Tried to remove the files and directories from the FUSE and NFS mounts at the same time by broadcasting the command "rm -rvf *".
8. The FUSE mount hung. Could not get out of the hang, and any further access to the mount from a different session also hangs. dmesg shows the call trace above.
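
The removal in step 7 could be scripted roughly as below, so that a stuck mount is reported instead of blocking the test session. This is a sketch under assumptions: clear_dir and remove_concurrently are hypothetical helper names, the 120-second default simply mirrors the hung-task threshold in the dmesg output, and unlike "rm -rvf *" the loop also removes dotfiles.

```python
import shutil
import threading
import time
from pathlib import Path


def clear_dir(mount_point):
    """Delete every entry under mount_point, roughly what `rm -rf *` does."""
    for entry in Path(mount_point).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            # ignore_errors tolerates entries already removed via the other mount
            shutil.rmtree(entry, ignore_errors=True)
        else:
            try:
                entry.unlink()
            except FileNotFoundError:
                pass  # the other mount removed it first


def remove_concurrently(mount_points, timeout_secs=120.0):
    """Clear all mount points in parallel.

    Returns the mount points whose removal was still running when the
    shared timeout expired (an empty list means every removal finished).
    """
    threads = {
        mp: threading.Thread(target=clear_dir, args=(mp,), daemon=True)
        for mp in mount_points
    }
    for t in threads.values():
        t.start()
    deadline = time.monotonic() + timeout_secs
    for t in threads.values():
        t.join(max(0.0, deadline - time.monotonic()))
    return [mp for mp, t in threads.items() if t.is_alive()]
```

Calling remove_concurrently with the FUSE and NFS mount paths of the same volume would, in the failure described here, be expected to return the FUSE path. The helper only reports the stuck removal; it cannot unblock a thread that is waiting inside the kernel.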
  
Actual results:
===============


INFO: task rm:12156 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rm            D 0000000000000001     0 12156  14341 0x00000084
 ffff88011bf93db8 0000000000000082 0000000000000000 ffffffff8104d1c9
 ffff88011bf93d48 0000000300000000 ffff88011bf93d58 ffff880037b8cc40
 ffff880118f37098 ffff88011bf93fd8 000000000000fb88 ffff880118f37098
Call Trace:
 [<ffffffff8104d1c9>] ? __wake_up_common+0x59/0x90
 [<ffffffffa00f3075>] fuse_request_send+0xe5/0x290 [fuse]
 [<ffffffff81090c00>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa00f5385>] fuse_rmdir+0x85/0x110 [fuse]
 [<ffffffff811841fd>] vfs_rmdir+0xbd/0xf0
 [<ffffffff81182e4a>] ? lookup_hash+0x3a/0x50
 [<ffffffff81186793>] do_rmdir+0x103/0x120
 [<ffffffff81176922>] ? vfs_write+0x132/0x1a0
 [<ffffffff810d4027>] ? audit_syscall_entry+0x1d7/0x200
 [<ffffffff811867dd>] sys_unlinkat+0x2d/0x40
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
(The same hung-task report for rm:12156 was logged seven more times.)




Expected results:
=================

The FUSE mount and rm should not hang.

Comment 4 Amar Tumballi 2013-05-04 10:35:36 UTC
For testing purposes, try mounting the NFS mount with '-o soft', which allows the mount processes to get out of the hang without a reboot.
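
A small sketch of the mount command this suggestion implies, built as an argument list in Python. The helper name nfs_soft_mount_argv and the default timeo and retrans values are assumptions for illustration; vers=3 is included because the Gluster NFS server speaks NFSv3.

```python
def nfs_soft_mount_argv(server, volume, mount_point, timeo_ds=100, retrans=3):
    """Build a `mount` argv for a soft NFSv3 mount of a Gluster volume.

    With `soft`, the NFS client returns an error to the caller once
    `retrans` retransmissions have timed out, instead of retrying forever.
    `timeo` is expressed in tenths of a second.
    """
    opts = f"soft,vers=3,timeo={timeo_ds},retrans={retrans}"
    return ["mount", "-t", "nfs", "-o", opts, f"{server}:/{volume}", mount_point]
```

The resulting list can be passed to subprocess.run as root. Soft mounts trade the hang for I/O errors under load, so this suits test setups rather than production data.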

Comment 5 Amar Tumballi 2013-05-09 10:07:49 UTC
Also, is this fixed if we run 'gluster volume set <VOL> eager-lock off'?

Comment 8 Sachidananda Urs 2013-07-17 07:00:22 UTC
Amar, can you please update the patch as well?

Comment 9 Amar Tumballi 2013-07-25 04:35:43 UTC
The patch fixing this issue is at https://code.engineering.redhat.com/gerrit/7902

Comment 11 Scott Haines 2013-09-23 22:35:21 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

