+++ This bug was initially created as a clone of Bug #1346719 +++

Description of problem:
Creation of files and ls hang while rm -rf runs in an infinite loop.

Version-Release number of selected component (if applicable):
[root@apandey gluster]# glusterfs --version
glusterfs 3.9dev built on Jun 15 2016 11:39:11
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation.

How reproducible:
1/1

Steps to Reproduce:
1. Create a disperse volume.
2. Mount this volume on three mount points: m1, m2, m3.
3. Create 10000 files on m1 using a for loop and dd. After some time, start rm -rf on m2 in an infinite loop, and start ls -lRT on m3. (A rough reproducer sketch follows this report.)

Actual results:
I/O hangs have been seen on m1 and m3.

Expected results:
There should not be any hang.

Additional info:

Volume Name: vol
Type: Disperse
Volume ID: c81743b4-ab0e-4d9b-931b-4d67f4d24a75
Status: Started
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: apandey:/brick/gluster/vol-1
Brick2: apandey:/brick/gluster/vol-2
Brick3: apandey:/brick/gluster/vol-3
Brick4: apandey:/brick/gluster/vol-4
Brick5: apandey:/brick/gluster/vol-5
Brick6: apandey:/brick/gluster/vol-6
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: off

Status of volume: vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick apandey:/brick/gluster/vol-1          49152     0          Y       13179
Brick apandey:/brick/gluster/vol-2          49153     0          Y       13198
Brick apandey:/brick/gluster/vol-3          49154     0          Y       13217
Brick apandey:/brick/gluster/vol-4          49155     0          Y       13236
Brick apandey:/brick/gluster/vol-5          49156     0          Y       13255
Brick apandey:/brick/gluster/vol-6          49157     0          Y       13274
NFS Server on localhost                     N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       13302

Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks

[root@apandey gluster]# mount
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
apandey:vol on /mnt/glu type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
apandey:vol on /mnt/gfs type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
apandey:vol on /mnt/vol type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root@apandey gluster]#
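A minimal shell sketch of the reproducer described in the steps above, assuming the three FUSE mount points from the mount output map to m1, m2 and m3; the file size, delay and paths are illustrative, not taken from the report:

#!/bin/bash
# Reproducer sketch (assumed mapping: m1=/mnt/glu, m2=/mnt/gfs, m3=/mnt/vol).
M1=/mnt/glu; M2=/mnt/gfs; M3=/mnt/vol

# m1: create 10000 files with dd (file size is an assumption)
for i in $(seq 1 10000); do
    dd if=/dev/zero of="$M1/file-$i" bs=64K count=1 2>/dev/null
done &

# m2: after some time, remove everything in an infinite loop
sleep 30
while true; do rm -rf "$M2"/*; done &

# m3: recursive long listing (the report runs "ls -lRT"); with the bug
# present, this and the file creation hang until eager-lock is disabled
ls -lR "$M3"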
--- Additional comment from Ashish Pandey on 2016-06-15 05:01:07 EDT ---

The statedump shows some blocked inodelks:

[conn.1.bound_xl./brick/gluster/vol-1.active.1]
gfid=00000000-0000-0000-0000-000000000001
nlookup=3
fd-count=3
ref=1
ia_type=2

[xlator.features.locks.vol-locks.inode]
path=/
mandatory=0
inodelk-count=3
lock-dump.domain.domain=dht.layout.heal
lock-dump.domain.domain=vol-disperse-0:self-heal
lock-dump.domain.domain=vol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3327, owner=dc710738fd7e0000, client=0x7f283c1a7b00, connection-id=apandey-15766-2016/06/15-07:59:38:894408-vol-client-0-0-0, blocked at 2016-06-15 08:02:13, granted at 2016-06-15 08:02:13
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 22451, owner=cc338ae8f07f0000, client=0x7f2834006660, connection-id=apandey-13531-2016/06/15-07:58:50:360055-vol-client-0-0-0, blocked at 2016-06-15 08:02:13
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 22530, owner=6cd51d48da7f0000, client=0x7f28342db820, connection-id=apandey-19856-2016/06/15-08:01:05:258794-vol-client-0-0-0, blocked at 2016-06-15 08:02:22

--- Additional comment from Ashish Pandey on 2016-06-15 05:08:42 EDT ---

Just observed that the option disperse.eager-lock comes to the rescue: setting disperse.eager-lock to off got the I/O and the ls -lR command going again.

gluster v set vol disperse.eager-lock off

--- Additional comment from Worker Ant on 2016-08-24 11:54:52 EDT ---

REVIEW: http://review.gluster.org/15309 (cluster/ec: Use locks for opendir) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Worker Ant on 2016-08-25 09:48:36 EDT ---

COMMIT: http://review.gluster.org/15309 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit f013335400d033a9677797377b90b968803135f4
Author: Pranith Kumar K <pkarampu>
Date:   Wed Aug 24 21:01:05 2016 +0530

    cluster/ec: Use locks for opendir

    Problem:
    In some cases we see that readdir keeps winding to the brick that
    doesn't have any blocked locks, i.e. the first brick. This leads the
    client to assume that there are no blocking locks on the inode, so it
    won't give away the lock. Other clients end up blocked on the lock as
    if the command hung.

    Fix:
    The proper way to fix this issue is to use the infra present in
    http://review.gluster.org/14736. This is a stop-gap fix where we start
    taking inodelks in opendir, which goes to all the bricks; this will
    detect if there is any contention.

    BUG: 1346719
    Change-Id: I91109107a26f6535b945ac476338e9f21dc31eb9
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/15309
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Ashish Pandey <aspandey>
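For reference, a statedump like the one in the comments above can be captured and searched for blocked locks, and the eager-lock workaround applied and verified, roughly as follows (the default statedump directory is assumed):

# Capture a statedump for the volume; by default the brick processes write
# dumps under /var/run/gluster (assumption: server.statedump-path is unset).
gluster volume statedump vol
grep -A3 'BLOCKED' /var/run/gluster/*.dump.*

# Workaround from the comment above: turn off eager-lock, then confirm it
# shows up under "Options Reconfigured" in the volume info output.
gluster volume set vol disperse.eager-lock off
gluster volume info vol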
--- Additional comment from Worker Ant ---

REVIEW: http://review.gluster.org/15406 (cluster/ec: Use locks for opendir) posted (#1) for review on release-3.7 by Ashish Pandey (aspandey)
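For context, the backport above amounts to cherry-picking the master commit (hash from the commit comment above) onto the release branch; a rough sketch, with the branch name assumed:

# Sketch of how such a backport is typically prepared.
git checkout release-3.7
git cherry-pick f013335400d033a9677797377b90b968803135f4
# The result is then pushed to review.gluster.org against the release branch.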
--- Additional comment from Worker Ant ---

COMMIT: http://review.gluster.org/15406 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu)
------
commit 36692a522ff99fe4c6127c359f4af1cc9aad8de8
Author: Pranith Kumar K <pkarampu>
Date:   Wed Aug 24 21:01:05 2016 +0530

    cluster/ec: Use locks for opendir

    Problem:
    In some cases we see that readdir keeps winding to the brick that
    doesn't have any blocked locks, i.e. the first brick. This leads the
    client to assume that there are no blocking locks on the inode, so it
    won't give away the lock. Other clients end up blocked on the lock as
    if the command hung.

    Fix:
    The proper way to fix this issue is to use the infra present in
    http://review.gluster.org/14736. This is a stop-gap fix where we start
    taking inodelks in opendir, which goes to all the bricks; this will
    detect if there is any contention.

    cherry picked from commit f013335400d033a9677797377b90b968803135f4:
    >BUG: 1346719
    >Change-Id: I91109107a26f6535b945ac476338e9f21dc31eb9
    >Signed-off-by: Pranith Kumar K <pkarampu>
    >Reviewed-on: http://review.gluster.org/15309
    >Smoke: Gluster Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >Reviewed-by: Ashish Pandey <aspandey>

    Change-Id: I91109107a26f6535b945ac476338e9f21dc31eb9
    BUG: 1373392
    Signed-off-by: Ashish Pandey <aspandey>
    Reviewed-on: http://review.gluster.org/15406
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
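Once a build carrying this fix is installed, the disperse.eager-lock workaround from the earlier comment can presumably be reverted; a minimal sketch, assuming "volume reset" restores the option to its default:

# After upgrading to a fixed build, restore the default eager-lock behaviour.
gluster volume reset vol disperse.eager-lock
gluster volume info vol   # eager-lock should no longer appear as reconfigured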
--- Additional comment ---

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.16, please open a new bug report.

glusterfs-3.7.16 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-October/051187.html
[2] https://www.gluster.org/pipermail/gluster-users/
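To confirm whether an installed client already carries the fix, checking the version against 3.7.16 is enough; a sketch (package names vary by distribution):

# Check the running GlusterFS version against the fixed release (3.7.16).
glusterfs --version | head -n1
rpm -q glusterfs glusterfs-server 2>/dev/null   # RPM-based distributions
dpkg -l 'glusterfs*' 2>/dev/null                # Debian-based distributions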