Bug 1345911

Summary: locks on file in dist-disperse not released leading to IO hangs
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Nag Pavan Chilakam <nchilaka>
Component: disperse Assignee: Sunil Kumar Acharya <sheggodu>
Status: CLOSED WORKSFORME QA Contact: Matt Zywusko <mzywusko>
Severity: medium Docs Contact:
Priority: medium    
Version: rhgs-3.1 CC: aspandey, hswong3i, pkarampu, rhs-bugs, ubansal
Target Milestone: --- Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-09-20 09:01:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1330997    

Description Nag Pavan Chilakam 2016-06-13 12:29:43 UTC
When we try to access data from an EC volume over a Ganesha mount using ls -lRt,
the response can hang because file locks are taken on 4 of the 6 bricks and are never released.

In this case I am just running ls -lRt on the client, without any other IO (single mount point).

Details of the IO etc. done previously on the volume can be found in these bugs:
1) 1330997 - [Disperse volume]: IO hang seen on mount with file ops
2) 1344675 - Stale file handle seen on the mount of dist-disperse volume when doing IOs with nfs-ganesha protocol
3) 1342426 - self heal deamon killed due to oom kills on a dist-disperse volume using nfs ganesha




[root@dhcp35-98 log]# rpm -qa|grep gluster
glusterfs-client-xlators-3.7.9-9.el7rhgs.x86_64
nfs-ganesha-gluster-2.3.1-8.el7rhgs.x86_64
glusterfs-server-3.7.9-9.el7rhgs.x86_64
glusterfs-api-3.7.9-9.el7rhgs.x86_64
glusterfs-debuginfo-3.7.9-9.el7rhgs.x86_64
glusterfs-libs-3.7.9-9.el7rhgs.x86_64
glusterfs-fuse-3.7.9-9.el7rhgs.x86_64
glusterfs-cli-3.7.9-9.el7rhgs.x86_64
glusterfs-3.7.9-9.el7rhgs.x86_64
glusterfs-ganesha-3.7.9-9.el7rhgs.x86_64
python-gluster-3.7.9-9.el7rhgs.noarch
[root@dhcp35-98 log]# rpm -qa|grep ganesha
nfs-ganesha-gluster-2.3.1-8.el7rhgs.x86_64
nfs-ganesha-2.3.1-8.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.3.1-8.el7rhgs.x86_64
glusterfs-ganesha-3.7.9-9.el7rhgs.x86_64
[root@dhcp35-98 log]# rpm -qa|grep pcs
pcs-0.9.143-15.el7.x86_64

Comment 3 Sunil Kumar Acharya 2017-09-20 09:01:39 UTC
We tried to recreate the issue with the following packages on the setup.

glusterfs-server-3.8.4-44.el7rhgs.x86_64
nfs-ganesha-2.4.4-17.el7rhgs.x86_64

We didn't see any hang, with or without IO on the mount point (mounted using
different VIPs).

Steps followed:

1. Created an EC volume (4+2).
2. Mounted the volume using NFS-Ganesha on 3 different clients.
3. Initiated a kernel archive untar (first client) and created 200 directories with 50 files
   each (second client) on the mount point. ls -lRt was run from the third client.
4. Tried ls -lRt from all three mount points.
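
The directory workload and the listing check from the steps above could be scripted roughly as follows. This is only a sketch: the MOUNT variable and the dirN/fileN naming are assumptions made for illustration, not the names used on the actual setup, and the kernel untar on the first client is not covered. MOUNT falls back to a scratch directory so the script can be tried without a Gluster volume.

```shell
#!/bin/sh
# MOUNT is a hypothetical variable: point it at the NFS-Ganesha mount.
# It defaults to a scratch directory so the sketch runs anywhere.
MOUNT="${MOUNT:-$(mktemp -d)}"

# Second-client workload: 200 directories with 50 empty files each.
d=1
while [ "$d" -le 200 ]; do
    mkdir -p "$MOUNT/dir$d"
    f=1
    while [ "$f" -le 50 ]; do
        : > "$MOUNT/dir$d/file$f"
        f=$((f + 1))
    done
    d=$((d + 1))
done

# Third-client check: recursive listing of the mount point.
# A listing that never returns would match the hang reported in this bug.
ls -lRt "$MOUNT" > /dev/null && echo "listing completed"
```

On a real setup the loop and the ls would run on different clients, each with its own value of MOUNT.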

Closing this issue as discussed during issue verification.