Description of problem:
=======================
If you are in the virtual snapshot world (.snaps) and USS is disabled, ls from inside it errors out as expected. But if you re-enable USS, which starts a new snapd process, and then try to ls from the same directory, it hangs. It should either re-establish the connection or gracefully error out.

USS is enabled and we are inside the virtual snap world from the client:
=========================================================================
[root@wingo vol2]# cd .snaps
[root@wingo .snaps]# ls
rs1  rs2
[root@wingo .snaps]# cd rs1
[root@wingo rs1]# ls
etc.1  etc.2
[root@wingo rs1]# cd etc.1
[root@wingo etc.1]#
[root@wingo etc.1]# pwd
/mnt/vol2/.snaps/rs1/etc.1
[root@wingo etc.1]#

Disable the USS and try to ls from inside; it errors out as expected:
=====================================================================
[root@inception ~]# gluster v set vol2 uss off
volume set: success

[root@wingo etc.1]# pwd
/mnt/vol2/.snaps/rs1/etc.1
[root@wingo etc.1]# ls
ls: cannot open directory .: No such file or directory
[root@wingo etc.1]# ls
ls: cannot open directory .: No such file or directory
[root@wingo etc.1]#

Re-enable the USS and try to ls again:
======================================
[root@inception ~]# gluster v set vol2 uss on
volume set: success
[root@inception ~]#

[root@wingo etc.1]# ls
^C
^C^C
^C
^C

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.6.0.34-1.el6rhs.x86_64

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Create a 4-node cluster
2. Create a 2x2 volume
3. Mount the volume on a client (FUSE and NFS)
4. From the FUSE mount, do cp -rf /etc etc.1
5. From the NFS mount, do cp -rf /etc etc.2
6. Create 2 snapshots, rs1 and rs2
7. Activate snapshots rs1 and rs2
8. Enable USS
9. cd into the virtual .snaps directory from FUSE and NFS
   From FUSE: cd .snaps/rs1/etc.1
   From NFS:  cd .snaps/rs2/etc.2
10. ls from both of the above directories; it should list entries
11. Disable USS
12. ls again; it should error out
13. Enable USS
14. ls again
(A command sketch of these steps is given after the Expected results section below.)

Actual results:
===============
ls hangs from both the FUSE and NFS mounts.

Expected results:
=================
It should re-establish the connection and list the entries, OR it should gracefully error out.
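For reference, here is a minimal shell sketch of the reproduction steps above. The volume name (vol2), snapshot names (rs1, rs2) and mount path (/mnt/vol2) come from this report; the server hostnames, brick paths and the /mnt/vol2-nfs mount point are placeholders for illustration, and exact options will differ on other setups.

# On a cluster node: create and start a 2x2 (distributed-replicate) volume.
# Hostnames and brick paths are placeholders.
gluster volume create vol2 replica 2 \
    server1:/bricks/b1 server2:/bricks/b1 server3:/bricks/b1 server4:/bricks/b1
gluster volume start vol2

# On the client: mount over FUSE and NFS (NFS options may vary by setup).
mount -t glusterfs server1:/vol2 /mnt/vol2
mount -t nfs -o vers=3 server1:/vol2 /mnt/vol2-nfs

# Populate data from both mounts.
cp -rf /etc /mnt/vol2/etc.1
cp -rf /etc /mnt/vol2-nfs/etc.2

# Create and activate the snapshots, then enable USS.
gluster snapshot create rs1 vol2
gluster snapshot create rs2 vol2
gluster snapshot activate rs1
gluster snapshot activate rs2
gluster v set vol2 uss on

# On the client, from inside the virtual snapshot directory,
# toggle USS on the server side and retry ls.
cd /mnt/vol2/.snaps/rs1/etc.1
ls                          # lists entries
gluster v set vol2 uss off  # run on a cluster node
ls                          # errors out, as expected
gluster v set vol2 uss on   # run on a cluster node
ls                          # hangs on glusterfs-3.6.0.34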
nfs: server inception.lab.eng.blr.redhat.com not responding, still trying
INFO: task ls:6757 blocked for more than 120 seconds.
      Not tainted 2.6.32-504.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ls            D 0000000000000000     0  6757   6339 0x00000084
 ffff880119e85a68 0000000000000082 0000000000000001 0000000000000246
 ffff880119e859f8 0000000000000206 ffff880119e859f8 ffffffff8109ece7
 ffff880119e85a08 ffff880119594d98 ffff8801196f65f8 ffff880119e85fd8
Call Trace:
 [<ffffffff8109ece7>] ? finish_wait+0x67/0x80
 [<ffffffff8109ee2e>] ? prepare_to_wait+0x4e/0x80
 [<ffffffffa00e433d>] __fuse_request_send+0xed/0x2b0 [fuse]
 [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa00e4512>] fuse_request_send+0x12/0x20 [fuse]
 [<ffffffffa00e7ccc>] fuse_do_getattr+0x10c/0x2c0 [fuse]
 [<ffffffffa00e7efd>] fuse_update_attributes+0x7d/0x90 [fuse]
 [<ffffffffa00e9038>] fuse_permission+0xb8/0x1f0 [fuse]
 [<ffffffff8119e1e3>] __link_path_walk+0xb3/0x1000
 [<ffffffff8119c5b5>] ? path_init+0x185/0x250
 [<ffffffff8119f3ea>] path_walk+0x6a/0xe0
 [<ffffffff8119f5fb>] filename_lookup+0x6b/0xc0
 [<ffffffff8122d466>] ? security_file_alloc+0x16/0x20
 [<ffffffff811a0ad4>] do_filp_open+0x104/0xd20
 [<ffffffff811a3752>] ? vfs_ioctl+0x22/0xa0
 [<ffffffff81298eea>] ? strncpy_from_user+0x4a/0x90
 [<ffffffff811adf82>] ? alloc_fd+0x92/0x160
 [<ffffffff8118ae07>] do_sys_open+0x67/0x130
 [<ffffffff8118af10>] sys_open+0x20/0x30
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[root@wingo ~]#
I am not able to re-create this problem in the latest downstream code
We were not able to re-create this problem with the below setup:
Installed glusterfs-3.6.0.35
Created a 4-node cluster
Created a 2x2 volume
Followed the instructions mentioned in the description
This issue is easily reproducible with build glusterfs-3.6.0.34. With build glusterfs-3.6.0.35, after re-enabling USS, ls errors out with "No such file or directory" and no hang is observed.
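For completeness, a rough way to re-check this on a given build; the mount path and volume name are taken from the description, and the package query is only a sketch of how to confirm which build is installed:

rpm -qa | grep glusterfs            # confirm the installed build on servers and client
# With the client shell still inside /mnt/vol2/.snaps/rs1/etc.1:
gluster v set vol2 uss off          # on a cluster node
ls                                  # errors out on both builds
gluster v set vol2 uss on           # on a cluster node
ls                                  # 3.6.0.34: hangs; 3.6.0.35: "No such file or directory", no hang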
Patch https://code.engineering.redhat.com/gerrit/#/c/37398/ has fixed this issue.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0038.html