Description of problem:
Commands that read or write a file on a dispersed volume hang if the file was changed while the volume was degraded and the degradation has since been resolved.

Version-Release number of selected component (if applicable):
[root@n1 ~]# gluster --version
glusterfs 3.7.1 built on Jun 1 2015 17:53:10
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

How reproducible:
Consistently

Steps to Reproduce:
1. Write a file to a 6/2 disperse volume
2. Block access from the client to two of the bricks with iptables
3. Append data to the file
4. Flush iptables rules to reconnect the client to all bricks
5. Attempt file read/write operations on the file

Actual results:
Command hangs indefinitely

Expected results:
Command succeeds, and heal is triggered

Additional info:
[root@n1 ~]# mount | grep rhgs
/dev/mapper/rhgs_vg-rhgs_lv on /rhgs/bricks type xfs (rw,noatime,nouuid,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota)
n1:ec01 on /rhgs/client/ec01 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

[root@n1 ~]# gluster volume info ec01
Volume Name: ec01
Type: Disperse
Volume ID: f9f8d1d8-10d0-48cf-8292-a03860296b80
Status: Started
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: n1:/rhgs/bricks/ec01-1
Brick2: n2:/rhgs/bricks/ec01-1
Brick3: n3:/rhgs/bricks/ec01-1
Brick4: n4:/rhgs/bricks/ec01-1
Brick5: n1:/rhgs/bricks/ec01-2
Brick6: n2:/rhgs/bricks/ec01-2
Options Reconfigured:
performance.readdir-ahead: on

[root@n1 ~]# find / -type d 2>/dev/null | tee -a /rhgs/client/ec01/dirs.txt > /dev/null
[root@n1 ~]# iptables -F
[root@n1 ~]# iptables -A OUTPUT -d n3 -j DROP
[root@n1 ~]# iptables -A OUTPUT -d n4 -j DROP
[root@n1 ~]# echo "new data" >> /rhgs/client/ec01/dirs.txt
[root@n1 ~]# getfattr -d -m . -e hex /rhgs/bricks/ec01-1/dirs.txt
getfattr: Removing leading '/' from absolute path names
# file: rhgs/bricks/ec01-1/dirs.txt
trusted.bit-rot.version=0x0200000000000000557afa920002b7eb
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000020000000000000002
trusted.ec.size=0x000000000006c369
trusted.ec.version=0x00000000000000330000000000000035
trusted.gfid=0x1ab0e229ec8548f8bc08dcb7c3874408

[root@n1 ~]# iptables -F
[root@n1 ~]# file /rhgs/client/ec01/dirs.txt
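
For readers comparing bricks, the hex values in the getfattr output above can be turned into decimal with a small helper. This is only a sketch: the function names ec_pair and ec_u64 are hypothetical, and it assumes that trusted.ec.version and trusted.ec.dirty each hold two big-endian 64-bit counters (data first, then metadata) and that trusted.ec.size is a single 64-bit byte count. That matches my understanding of the disperse translator but should be checked against the source for the release in use.

```shell
#!/bin/sh
# Decode GlusterFS disperse (EC) xattr values as printed by `getfattr -e hex`.
# ASSUMPTION: 16-byte values are two big-endian 64-bit counters, data then
# metadata. Values above 2^63-1 are not handled by this sketch.

# Split a 16-byte value (trusted.ec.version, trusted.ec.dirty) into its halves.
ec_pair() {
    ec_hex=${1#0x}
    if [ "${#ec_hex}" -ne 32 ]; then
        echo "ec_pair: expected 32 hex digits, got ${#ec_hex}" >&2
        return 1
    fi
    ec_first=${ec_hex%????????????????}    # drop the last 16 digits
    ec_second=${ec_hex#????????????????}   # drop the first 16 digits
    printf 'data=%d metadata=%d\n' "0x$ec_first" "0x$ec_second"
}

# Decode an 8-byte value such as trusted.ec.size.
ec_u64() {
    printf '%d\n' "$1"
}

ec_pair 0x00000000000000330000000000000035   # trusted.ec.version: data=51 metadata=53
ec_pair 0x00000000000000020000000000000002   # trusted.ec.dirty:   data=2 metadata=2
ec_u64  0x000000000006c369                   # trusted.ec.size:    443241
```

Running the same decoding against the copies on the bricks that were blocked (n3 and n4) would show whether their version counters lag behind. A non-zero dirty value on this brick is, as far as I understand it, consistent with an update that did not complete on all bricks.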
If I kill the glusterfs and glusterfsd processes on n1, restart the glusterd service, and re-mount the client there, I can then access the file properly and can confirm the heal.
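
The workaround above could be scripted roughly as follows. This is a sketch, not the exact commands that were run: it assumes systemd manages glusterd, the lazy unmount before re-mounting and the final heal query are additions for illustration, and the helper names run and recover_client are hypothetical. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch of the workaround: restart the gluster processes on n1 and re-mount.
# Dry run by default; set DRY_RUN=0 to execute (as root, on n1).
# ASSUMPTIONS: systemd manages glusterd; `umount -l` and the heal query are
# illustrative additions, not steps taken from the report.
DRY_RUN=${DRY_RUN:-1}
VOLUME=ec01
SERVER=n1
MOUNTPOINT=/rhgs/client/ec01

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_client() {
    run pkill -x glusterfs            # FUSE client and other glusterfs daemons on this node
    run pkill -x glusterfsd           # brick processes on this node
    run umount -l "$MOUNTPOINT"       # clear the stale FUSE mount
    run systemctl restart glusterd    # glusterd respawns the local bricks
    run mount -t glusterfs "$SERVER:$VOLUME" "$MOUNTPOINT"
    run gluster volume heal "$VOLUME" info
}

recover_client
```

Note that killing glusterfsd takes the local bricks (Brick1 and Brick5 here) offline until glusterd restarts them, so this is disruptive to any other clients of the volume.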
Dustin,
Do you see it happen in the latest release? This seems very similar to the bug: 1227654

Pranith
(In reply to Pranith Kumar K from comment #2)
> Dustin,
> Do you see it happen in the latest release? This seems very similar to
> the bug: 1227654

I haven't had the lab time to test this yet. It may be a couple of weeks before I can spend any time on it.
I am unable to reproduce this on RHGS 3.1.1.