Description of problem:
=======================
Created multiple files and directories on the volume from FUSE and NFS mounts. After a while, tried to delete the files and directories simultaneously from the FUSE and NFS mounts with the broadcast command "rm -rvf *". The FUSE mount hung and could not recover; dmesg confirmed that rm on the FUSE mount is blocked.

dmesg shows the following:
=========================
INFO: task rm:12156 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rm D 0000000000000001 0 12156 14341 0x00000084
 ffff88011bf93db8 0000000000000082 0000000000000000 ffffffff8104d1c9
 ffff88011bf93d48 0000000300000000 ffff88011bf93d58 ffff880037b8cc40
 ffff880118f37098 ffff88011bf93fd8 000000000000fb88 ffff880118f37098
Call Trace:
 [<ffffffff8104d1c9>] ? __wake_up_common+0x59/0x90
 [<ffffffffa00f3075>] fuse_request_send+0xe5/0x290 [fuse]
 [<ffffffff81090c00>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa00f5385>] fuse_rmdir+0x85/0x110 [fuse]
 [<ffffffff811841fd>] vfs_rmdir+0xbd/0xf0
 [<ffffffff81182e4a>] ? lookup_hash+0x3a/0x50
 [<ffffffff81186793>] do_rmdir+0x103/0x120
 [<ffffffff81176922>] ? vfs_write+0x132/0x1a0
 [<ffffffff810d4027>] ? audit_syscall_entry+0x1d7/0x200
 [<ffffffff811867dd>] sys_unlinkat+0x2d/0x40
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[root@darrel nfs-923538]#

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.4.0.1rhs-1.el6rhs.x86_64

Steps to Reproduce:
1. Created a 1x2 volume.
2. Mounted this volume using NFS and FUSE on the same client.
3. Ran the script self_heal_all_file_type_script1.sh from the FUSE mount and self_heal_all_file_type_script2.sh from the NFS mount. Both scripts create files and directories.
4. While the above scripts were running, replaced one of the bricks using "replace-brick commit force".
5. Started heal using "gluster volume heal vol-name full".
6. Heal was successful and arequal confirmed that the checksums match.
7. Tried to remove the files and directories at the same time from the FUSE and NFS mounts using the broadcast command "rm -rvf *".
8. The FUSE mount hung. The rm never returned, and any further access to the mount from a different session also hangs. dmesg shows the call trace above.

Actual results:
===============
rm on the FUSE mount blocks indefinitely. dmesg logged the same hung-task report 8 times, each copy identical to the one shown above:
INFO: task rm:12156 blocked for more than 120 seconds.
(same "rm D ..." line and fuse_request_send / fuse_rmdir / sys_unlinkat call trace as above)

Expected results:
=================
The FUSE mount / rm should not hang.
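Step 7 (deleting the same tree from both mounts at once) can be scripted so that a hang is detected instead of leaving the shell stuck. The following is a minimal Python sketch, not part of the original test scripts: `concurrent_rm` and `proc_state` are hypothetical helpers, the mount directories passed in are assumptions, and the state check assumes Linux's /proc/<pid>/stat. Any rm still running after the timeout is reported with its scheduler state, where `D` (uninterruptible sleep) corresponds to the "rm D" line in the dmesg report.

```python
import os
import subprocess
import time


def proc_state(pid):
    """Return the one-letter state of `pid` from /proc/<pid>/stat (Linux only).

    'D' means uninterruptible sleep, the state shown for the blocked rm in the
    dmesg report. Returns '?' if the state cannot be read.
    """
    try:
        with open(f"/proc/{pid}/stat") as f:
            # The command name is in parentheses and may contain spaces, so
            # split after the last ')' to reach the state field.
            return f.read().rsplit(")", 1)[1].split()[0]
    except (OSError, IndexError):
        return "?"


def concurrent_rm(mount_dirs, timeout_s=120.0, poll_s=0.2):
    """Run the equivalent of `rm -rvf *` in every directory at the same time.

    Returns (directory, pid, state) for each rm still running after
    `timeout_s` seconds (120 s mirrors the hung-task threshold in the report).
    Hung processes are not waited on: a task blocked in the kernel may not
    exit even after SIGKILL.
    """
    # Expand `*` for every directory first (non-hidden entries, like the shell
    # glob) so that all rm processes start as close together as possible.
    jobs = []
    for d in mount_dirs:
        entries = sorted(e for e in os.listdir(d) if not e.startswith("."))
        if entries:
            jobs.append((d, entries))

    procs = [
        (d, subprocess.Popen(["rm", "-rvf", "--", *entries], cwd=d,
                             stdout=subprocess.DEVNULL,
                             stderr=subprocess.DEVNULL))
        for d, entries in jobs
    ]

    # Poll until every rm has exited or the deadline passes.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline and any(p.poll() is None for _, p in procs):
        time.sleep(poll_s)

    return [(d, p.pid, proc_state(p.pid)) for d, p in procs if p.poll() is None]
```

For example, `concurrent_rm(["/mnt/fuse", "/mnt/nfs"])` (hypothetical mount points of the same volume) would return an empty list when both deletions finish, or an entry such as `("/mnt/fuse", <pid>, "D")` if the FUSE-side rm blocks as described in this report. Polling with a deadline, rather than calling `wait()`, keeps the checking script itself from hanging on a stuck process.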
For testing purposes, try mounting the NFS mount with '-o soft', which allows processes on the mount to get out of a hang without a reboot.
Also, would this be fixed by running 'gluster volume set <VOL> eager-lock off'?
Amar, can you please update the patch as well?
The patch fixing this issue is at https://code.engineering.redhat.com/gerrit/7902
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html