Bug 1231334

Summary: Gluster native client hangs on accessing dirty file in disperse volume
Product: [Community] GlusterFS
Component: disperse
Version: 3.7.0
Status: CLOSED WORKSFORME
Severity: medium
Priority: high
Reporter: Dustin Black <dblack>
Assignee: Pranith Kumar K <pkarampu>
CC: amukherj, bugs, dblack, gluster-bugs
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Last Closed: 2015-10-24 04:08:43 UTC

Description Dustin Black 2015-06-12 17:41:58 UTC
Description of problem:
Commands that read or write a file on a disperse volume hang after the file is modified while the volume is degraded and connectivity to the missing bricks is then restored.

Version-Release number of selected component (if applicable):
[root@n1 ~]# gluster --version
glusterfs 3.7.1 built on Jun  1 2015 17:53:10
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.


How reproducible:
Consistently

Steps to Reproduce:
1. Write a file to a 6-brick (4 + 2) disperse volume
2. Block access from the client to two of the bricks with iptables
3. Append data to the file
4. Flush iptables rules to reconnect client to all bricks
5. Attempt file read/write operations on the file
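
A condensed sketch of the reproduction, using the hostnames, mount point, and paths from the transcript in "Additional info" below (adjust for your own environment):

# 1. write a file through the FUSE mount
find / -type d 2>/dev/null > /rhgs/client/ec01/dirs.txt
# 2. block two of the bricks (n3 and n4 here) from the client
iptables -A OUTPUT -d n3 -j DROP
iptables -A OUTPUT -d n4 -j DROP
# 3. append to the file while the volume is degraded
echo "new data" >> /rhgs/client/ec01/dirs.txt
# 4. restore connectivity
iptables -F
# 5. any further access to the file now hangs
file /rhgs/client/ec01/dirs.txt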

Actual results:
Command hangs indefinitely


Expected results:
Command succeeds, and heal is triggered
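
For reference, whether a heal has been queued and completed can be checked with the standard heal commands (volume name ec01 as in this report):

gluster volume heal ec01 info    # list entries still pending heal
gluster volume heal ec01         # manually trigger an index heal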


Additional info:

[root@n1 ~]# mount | grep rhgs
/dev/mapper/rhgs_vg-rhgs_lv on /rhgs/bricks type xfs (rw,noatime,nouuid,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota)
n1:ec01 on /rhgs/client/ec01 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

[root@n1 ~]# gluster volume info ec01
 
Volume Name: ec01
Type: Disperse
Volume ID: f9f8d1d8-10d0-48cf-8292-a03860296b80
Status: Started
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: n1:/rhgs/bricks/ec01-1
Brick2: n2:/rhgs/bricks/ec01-1
Brick3: n3:/rhgs/bricks/ec01-1
Brick4: n4:/rhgs/bricks/ec01-1
Brick5: n1:/rhgs/bricks/ec01-2
Brick6: n2:/rhgs/bricks/ec01-2
Options Reconfigured:
performance.readdir-ahead: on

[root@n1 ~]# find / -type d 2>/dev/null | tee -a /rhgs/client/ec01/dirs.txt > /dev/null

[root@n1 ~]# iptables -F
[root@n1 ~]# iptables -A OUTPUT -d n3 -j DROP
[root@n1 ~]# iptables -A OUTPUT -d n4 -j DROP

[root@n1 ~]# echo "new data" >> /rhgs/client/ec01/dirs.txt

[root@n1 ~]# getfattr -d -m . -e hex /rhgs/bricks/ec01-1/dirs.txt 
getfattr: Removing leading '/' from absolute path names
# file: rhgs/bricks/ec01-1/dirs.txt
trusted.bit-rot.version=0x0200000000000000557afa920002b7eb
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000020000000000000002
trusted.ec.size=0x000000000006c369
trusted.ec.version=0x00000000000000330000000000000035
trusted.gfid=0x1ab0e229ec8548f8bc08dcb7c3874408
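
The non-zero trusted.ec.dirty value above suggests the file still carries pending updates from the writes made while n3 and n4 were unreachable; after a successful self-heal it should drop back to zero and trusted.ec.version should match on all bricks. A quick way to see which bricks are stale is to compare these xattrs across the nodes, e.g. (assuming passwordless ssh between the nodes, which is not shown in the original setup):

for h in n1 n2 n3 n4; do
    echo "== $h =="
    ssh $h getfattr -d -m trusted.ec -e hex /rhgs/bricks/ec01-1/dirs.txt
done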

[root@n1 ~]# iptables -F

[root@n1 ~]# file /rhgs/client/ec01/dirs.txt
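
When the hang occurs, a client-side statedump and the FUSE mount log are usually the most useful data to attach. A rough sketch, assuming default log and statedump locations (the pgrep pattern is only illustrative):

kill -USR1 $(pgrep -f 'glusterfs.*ec01')   # statedump of the hung client, written under /var/run/gluster by default
gluster volume statedump ec01              # brick-side statedumps
less /var/log/glusterfs/rhgs-client-ec01.log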

Comment 1 Dustin Black 2015-06-12 17:45:21 UTC
If I kill the glusterfs and glusterfsd processes on n1, restart the glusterd service, and re-mount the client there, I can then access the file properly and can confirm the heal.
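
In shell terms, that workaround is roughly the following on the affected client node (mount spec taken from the mount output in the description; a systemd-based install is assumed):

pkill glusterfs
pkill glusterfsd
systemctl restart glusterd
mount -t glusterfs n1:ec01 /rhgs/client/ec01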

Comment 2 Pranith Kumar K 2015-08-04 04:03:06 UTC
Dustin,
    Do you see it happen in the latest release? This seems very similar to the bug: 1227654

Pranith

Comment 3 Dustin Black 2015-08-18 11:10:11 UTC
(In reply to Pranith Kumar K from comment #2)
> Dustin,
>     Do you see it happen in the latest release? This seems very similar to
> the bug: 1227654

I haven't had the lab time to test this yet. It may be a couple of weeks before I can spend any time on it.

Comment 4 Dustin Black 2015-10-21 20:22:45 UTC
I am unable to reproduce this on RHGS 3.1.1