spandura 2013-07-22 08:24:21 EDT

Description of problem:
========================
In a distribute-replicate volume, when directories with the same names are created and deleted continuously on fuse and nfs mount points, the mount points hang after a certain time. Refer to bug 922792.

Version-Release number of selected component (if applicable):
==============================================================
root@rhs-client11 [Jul-22-2013-16:00:29] >rpm -qa | grep glusterfs-server
glusterfs-server-3.4.0.12rhs.beta5-2.el6rhs.x86_64

root@rhs-client11 [Jul-22-2013-16:00:40] >gluster --version
glusterfs 3.4.0.12rhs.beta5 built on Jul 18 2013 07:00:39

How reproducible: test_bug_922792.sh
====================================
#!/bin/bash
# Report where the script lives, then create and delete the same
# directory tree in a tight loop. Run with no argument from every mount
# point, all mounts operate on the same "foo" directory, which is what
# triggers the race.
dir=$(dirname "$(readlink -f "$0")")
echo "Script in $dir"
while :
do
    mkdir -p "foo$1/bar/gee"
    mkdir -p "foo$1/bar/gne"
    mkdir -p "foo$1/lna/gme"
    rm -rf "foo$1"
done

Steps to Reproduce:
===================
1. Create a distribute-replicate volume (6 x 2): 4 storage nodes, 3 bricks on each storage node.
2. Start the volume.
3. Create 2 fuse and 2 nfs mounts on each of the RHEL 5.9 and RHEL 6.4 clients.
4. From all the mount points, execute "test_bug_922792.sh".

Actual results:
===============
After some time, the fuse and nfs mounts hang.

Expected results:
=================
The fuse and nfs mounts should not hang.

-------------------------------------------

Krutika Dhananjay 2014-07-28 04:53:50 EDT

A couple of observations:

Tried this on a 2x2 volume with 2 nfs and 2 fuse mounts; the bug is 100% reproducible.

The hang stems from one of the clients failing to unlock an inodelk it had held earlier, on which the hung client then blocks forever. This can be deduced from the presence of the following log messages:

[root@calvin glusterfs]# grep -T 'unlock' nfs.log mnt-glusterfs.log
nfs.log:[2014-07-20 04:34:47.247058] I [afr-lk-common.c:676:afr_unlock_inodelk_cbk] 0-dis-rep-replicate-0: /foo/lna/gme: unlock failed on subvolume dis-rep-client-0 with lock owner d8a0227c237f0000. Reason : Stale file handle
nfs.log:[2014-07-20 04:34:47.247390] I [afr-lk-common.c:676:afr_unlock_inodelk_cbk] 0-dis-rep-replicate-0: /foo/lna/gme: unlock failed on subvolume dis-rep-client-1 with lock owner d8a0227c237f0000. Reason : Stale file handle

The ESTALE error on unlock originates from the server-side resolver, which suggests that the inode had been unlinked and is no longer part of the inode table.

Furthermore, I added a GF_ASSERT in afr_unlock_inodelk_cbk() to deliberately crash the client with SIGABRT on unlock failure. The core shows that (local->loc).gfid and (local->loc).inode->gfid differ, when they should ideally be one and the same.

All of this suggests the problem is possibly due to existing races between DHT's lookup self-heal, rmdir and mkdir codepaths.
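To make the gfid mismatch concrete, below is a minimal, self-contained C sketch of the kind of consistency check described above. It is not the actual glusterfs code; the names gfid_t, fake_loc, fake_inode and check_unlock_gfid are illustrative stand-ins, and plain assert() stands in for GF_ASSERT(). The idea is that at unlock time the gfid recorded in the loc should match the gfid of the inode the loc points to; the check aborts the process when the race has made them diverge.

/* Standalone sketch (not glusterfs code) of the gfid consistency check
 * described in the comment above. All names are illustrative only. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef unsigned char gfid_t[16];          /* gfids are 16-byte UUIDs */

struct fake_inode { gfid_t gfid; };        /* stand-in for inode_t */
struct fake_loc   { gfid_t gfid; struct fake_inode *inode; };  /* stand-in for loc_t */

/* Abort (SIGABRT) if the gfid in the loc and the gfid of its inode diverged. */
static void check_unlock_gfid(const struct fake_loc *loc)
{
    assert(memcmp(loc->gfid, loc->inode->gfid, sizeof(gfid_t)) == 0);
}

int main(void)
{
    struct fake_inode inode = { .gfid = { 0xd8, 0xa0 } };   /* arbitrary gfid */
    struct fake_loc   loc   = { .inode = &inode };

    memcpy(loc.gfid, inode.gfid, sizeof(gfid_t));           /* consistent case */
    check_unlock_gfid(&loc);
    puts("gfids match: unlock would reach the intended inode");

    loc.gfid[0] ^= 0xff;       /* simulate the race: inode relinked under a new gfid */
    check_unlock_gfid(&loc);   /* aborts here, like the deliberate SIGABRT above */
    return 0;
}

Running this aborts on the second check_unlock_gfid() call, mirroring the core described in the comment.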
review.gluster.org/#/c/12233/
COMMIT: http://review.gluster.org/12233 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit 79c5a715f9bab49f48876ab4f4bc79d211c0d7f1
Author: Ravishankar N <ravishankar>
Date:   Wed Sep 16 16:35:19 2015 +0530

    protocol/client: give preference to loc->gfid over inode->gfid

    There are xlators which perform fops even before the inode gets linked.
    Because of this, loc.gfid is preferred at the time of inodelk/entrylk,
    but by the time the unlock happens, the inode could be linked with a
    different gfid than the one in loc.gfid (because of the way dht was
    giving preference). Due to this, the unlock goes to a different inode
    than the one we sent the inodelk on, which leads to a hang.

    Credits to Pranith for the fix.

    Change-Id: I7d162d44852ba876f35aa1bb83e4afdb184d85b9
    BUG: 1266834
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/12233
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
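As a rough illustration of the idea in the commit message, and only an illustration (this is not the actual protocol/client code; pick_lock_gfid, gfid_is_null and the surrounding names are hypothetical), preferring the gfid captured in the loc over the inode's current gfid ensures that the lock and the later unlock always name the same object, even if DHT has relinked the inode under a different gfid in between:

/* Sketch of the gfid-selection rule described in the commit above.
 * Not glusterfs code; all names are hypothetical. */
#include <stdio.h>
#include <string.h>

typedef unsigned char gfid_t[16];

static int gfid_is_null(const gfid_t g)
{
    static const gfid_t null_gfid;   /* all zeroes */
    return memcmp(g, null_gfid, sizeof(gfid_t)) == 0;
}

/* Pick the gfid to put on the wire: loc's gfid wins whenever it is set. */
static const unsigned char *pick_lock_gfid(const gfid_t loc_gfid,
                                           const gfid_t inode_gfid)
{
    return gfid_is_null(loc_gfid) ? inode_gfid : loc_gfid;
}

int main(void)
{
    gfid_t loc_gfid   = { 0xd8, 0xa0, 0x22, 0x7c };   /* gfid used at lock time */
    gfid_t inode_gfid = { 0xaa, 0xbb, 0xcc, 0xdd };   /* inode relinked to a new gfid */

    const unsigned char *chosen = pick_lock_gfid(loc_gfid, inode_gfid);
    printf("unlock uses the %s gfid\n",
           chosen == loc_gfid ? "original loc" : "relinked inode's");
    return 0;
}

With this preference, the unlock is addressed to the same gfid the inodelk was taken on, which is what removes the hang described in the earlier comments.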
*** Bug 1262750 has been marked as a duplicate of this bug. ***
This bug is getting closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user