Bug 1121920

Summary: AFR: fuse, NFS mounts hang when directories with the same names are created and deleted continuously
Product: [Community] GlusterFS
Reporter: Krutika Dhananjay <kdhananj>
Component: replicate
Assignee: Krutika Dhananjay <kdhananj>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: medium
Docs Contact:
Priority: medium
Version: mainline
CC: bugs, gluster-bugs, kdhananj, ndevos, nsathyan, pcuzner, pkarampu, rhs-bugs, sdharane, spandura, vagarwal, vbellur
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.7.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 986916
Environment:
Last Closed: 2015-05-14 17:26:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 951195, 986916, 1286582, 1338634, 1338668, 1338669
Bug Blocks: 1175551

Description Krutika Dhananjay 2014-07-22 07:11:36 UTC
+++ This bug was initially created as a clone of Bug #986916 +++

Description of problem:
=========================
In a distribute-replicate volume, when directories with the same names are created and deleted continuously from fuse and NFS mount points, the mount points hang after some time.

Refer to bug: 922792

Version-Release number of selected component (if applicable):
==============================================================
root@rhs-client11 [Jul-22-2013-16:00:29] >rpm -qa | grep glusterfs-server
glusterfs-server-3.4.0.12rhs.beta5-2.el6rhs.x86_64

root@rhs-client11 [Jul-22-2013-16:00:40] >gluster --version
glusterfs 3.4.0.12rhs.beta5 built on Jul 18 2013 07:00:39

How reproducible:

test_bug_922792.sh
===================
#!/bin/bash
# Reproducer from bug 922792: repeatedly create and remove the same "foo"
# directory tree so that concurrent runs from several mount points race on
# the same directory names.

# Directory the script itself lives in (informational only)
dir=$(dirname "$(readlink -f "$0")")
echo "Script in $dir"

while :
do
        # "$1" is an optional suffix for the top-level directory name
        mkdir -p foo$1/bar/gee
        mkdir -p foo$1/bar/gne
        mkdir -p foo$1/lna/gme
        rm -rf foo$1
done
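
Running the script without an argument (as in the steps below) means every mount point creates and deletes the very same "foo" tree, which is the concurrent create/delete pattern this bug describes. An illustrative way to launch it from a fuse and an NFS mount on one client (hypothetical invocation and script path, not from the original report):

cd /mnt/gm1   && /path/to/test_bug_922792.sh &
cd /mnt/nfsm1 && /path/to/test_bug_922792.sh &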

Steps to Reproduce:
===================
1. Create a distribute-replicate volume (6 x 2) across 4 storage nodes, with 3 bricks on each storage node.

2. Start the volume.

3. Create 2 fuse and 2 NFS mounts on each of the RHEL 5.9 and RHEL 6.4 clients.

4. From all the mount points, execute "test_bug_922792.sh" (see the command sketch after this list).
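
A sketch of the setup behind steps 1-3, assuming the brick layout and hostnames from the volume info further down; the exact mount options of the original run are not recorded here, and NFSv3 is assumed because the Gluster NFS server only speaks NFSv3. The reproducer is then launched from every mount point as illustrated after the script above.

# On one storage node: create and start the 6 x 2 distribute-replicate volume
gluster volume create dis_rep_vol2 replica 2 \
    rhs-client11:/rhs/brick1/b0  rhs-client12:/rhs/brick1/b1 \
    rhs-client11:/rhs/brick1/b2  rhs-client12:/rhs/brick1/b3 \
    rhs-client11:/rhs/brick1/b4  rhs-client12:/rhs/brick1/b5 \
    rhs-client13:/rhs/brick1/b6  rhs-client14:/rhs/brick1/b7 \
    rhs-client13:/rhs/brick1/b8  rhs-client14:/rhs/brick1/b9 \
    rhs-client13:/rhs/brick1/b10 rhs-client14:/rhs/brick1/b11
gluster volume start dis_rep_vol2

# On each client: two fuse mounts and two NFS mounts of the same volume
mount -t glusterfs rhs-client11:/dis_rep_vol2 /mnt/gm1
mount -t glusterfs rhs-client11:/dis_rep_vol2 /mnt/gm2
mount -t nfs -o vers=3 rhs-client11:/dis_rep_vol2 /mnt/nfsm1
mount -t nfs -o vers=3 rhs-client11:/dis_rep_vol2 /mnt/nfsm2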

Actual results:
===============
After some time, the fuse and NFS mounts hang.

Expected results:
================
The fuse and NFS mounts should not hang.

--- Additional comment from RHEL Product and Program Management on 2013-07-22 08:44:37 EDT ---

Since this issue was entered in bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from  on 2013-07-22 08:54:39 EDT ---

On one of the clients (wingo), the mkdir and rm -rf processes are in D (uninterruptible sleep) state:
====================================================
[root@wingo ~]# ps axl | awk '$10 ~ /D/'
0     0  3946     1  18   0  65468   720 reques D    ?          0:00 mkdir -p foo/bar/gee
0     0  4608     1  18   0  65468   720 reques D    ?          0:00 mkdir -p foo/lna/gme
0     0  4616     1  15   0  58944   680 -      D    ?          0:00 rm -rf foo

[root@wingo ~]# hostname
wingo.lab.eng.blr.redhat.com

[root@wingo ~]# mount | grep gluster
glusterfs#rhs-client11:/dis_rep_vol2 on /mnt/gm1 type fuse (rw,default_permissions,allow_other,max_read=131072)
glusterfs#rhs-client11:/dis_rep_vol2 on /mnt/gm2 type fuse (rw,default_permissions,allow_other,max_read=131072)

[root@wingo ~]# mount | grep nfs
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
rhs-client11:/dis_rep_vol2 on /mnt/nfsm1 type nfs (rw,addr=10.70.36.35)
rhs-client11:/dis_rep_vol2 on /mnt/nfsm2 type nfs (rw,addr=10.70.36.35)


Client 2
===============
root@darrel [Jul-22-2013-18:20:28] >hostname
darrel.lab.eng.blr.redhat.com

root@darrel [Jul-22-2013-18:20:32] >mount | grep gluster
rhs-client11:/dis_rep_vol2 on /mnt/gm1 type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
rhs-client11:/dis_rep_vol2 on /mnt/gm2 type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)

root@darrel [Jul-22-2013-18:20:38] >mount | grep nfs
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
rhs-client11:/dis_rep_vol2 on /mnt/nfsm1 type nfs (rw,addr=10.70.36.35)
rhs-client11:/dis_rep_vol2 on /mnt/nfsm2 type nfs (rw,addr=10.70.36.35)


Volume Information:
=================
root@rhs-client11 [Jul-22-2013-17:56:33] >gluster v status
Status of volume: dis_rep_vol2
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/brick1/b0			49152	Y	17093
Brick rhs-client12:/rhs/brick1/b1			49152	Y	31379
Brick rhs-client11:/rhs/brick1/b2			49153	Y	17105
Brick rhs-client12:/rhs/brick1/b3			49153	Y	31391
Brick rhs-client11:/rhs/brick1/b4			49154	Y	17117
Brick rhs-client12:/rhs/brick1/b5			49154	Y	31403
Brick rhs-client13:/rhs/brick1/b6			49152	Y	23157
Brick rhs-client14:/rhs/brick1/b7			49152	Y	22336
Brick rhs-client13:/rhs/brick1/b8			49153	Y	23169
Brick rhs-client14:/rhs/brick1/b9			49153	Y	22348
Brick rhs-client13:/rhs/brick1/b10			49154	Y	23181
Brick rhs-client14:/rhs/brick1/b11			49154	Y	22360
NFS Server on localhost					2049	Y	17132
Self-heal Daemon on localhost				N/A	Y	17137
NFS Server on rhs-client14				2049	Y	22375
Self-heal Daemon on rhs-client14			N/A	Y	22379
NFS Server on rhs-client12				2049	Y	31418
Self-heal Daemon on rhs-client12			N/A	Y	31422
NFS Server on rhs-client13				2049	Y	23195
Self-heal Daemon on rhs-client13			N/A	Y	23200
 
There are no active volume tasks
root@rhs-client11 [Jul-22-2013-17:56:45] >gluster v info
 
Volume Name: dis_rep_vol2
Type: Distributed-Replicate
Volume ID: dfc73c32-4b48-46b3-8430-16b6449889c9
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: rhs-client11:/rhs/brick1/b0
Brick2: rhs-client12:/rhs/brick1/b1
Brick3: rhs-client11:/rhs/brick1/b2
Brick4: rhs-client12:/rhs/brick1/b3
Brick5: rhs-client11:/rhs/brick1/b4
Brick6: rhs-client12:/rhs/brick1/b5
Brick7: rhs-client13:/rhs/brick1/b6
Brick8: rhs-client14:/rhs/brick1/b7
Brick9: rhs-client13:/rhs/brick1/b8
Brick10: rhs-client14:/rhs/brick1/b9
Brick11: rhs-client13:/rhs/brick1/b10
Brick12: rhs-client14:/rhs/brick1/b11
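
A triage sketch, not captured in the original report (assumes the default statedump location and lock-dump format): when the mounts hang like this, a brick statedump on a storage node shows which entrylk/inodelk requests are blocked on the bricks.

# Run on a storage node: dump the state of all brick processes of the volume
gluster volume statedump dis_rep_vol2

# Statedump files are written under /var/run/gluster/ by default; blocked
# entry/inode locks carry a BLOCKED marker in the lock-dump sections
grep -i blocked /var/run/gluster/*.dump.*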

Comment 1 Anand Avati 2014-07-22 08:51:24 UTC
REVIEW: http://review.gluster.org/8344 (cluster/afr: Improve inodelk/entrylk failure log messages) posted (#1) for review on master by Krutika Dhananjay (kdhananj)

Comment 2 Anand Avati 2014-07-22 08:56:52 UTC
REVIEW: http://review.gluster.org/8344 (cluster/afr: Improve inodelk/entrylk failure log messages) posted (#2) for review on master by Krutika Dhananjay (kdhananj)

Comment 3 Anand Avati 2014-07-30 11:36:34 UTC
COMMIT: http://review.gluster.org/8344 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 73fc66fb2dd79b39b6021a6309fb859363c2e968
Author: Krutika Dhananjay <kdhananj>
Date:   Tue Jul 22 13:45:41 2014 +0530

    cluster/afr: Improve inodelk/entrylk failure log messages
    
    Change-Id: Ie792875546b4f8e11ebf870a6e63aaf5cb221976
    BUG: 1121920
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: http://review.gluster.org/8344
    Reviewed-by: Ravishankar N <ravishankar>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Tested-by: Pranith Kumar Karampuri <pkarampu>

Comment 4 Niels de Vos 2015-05-14 17:26:25 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
