Bug 1231334 - Gluster native client hangs on accessing dirty file in disperse volume
Summary: Gluster native client hangs on accessing dirty file in disperse volume
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: GlusterFS
Classification: Community
Component: disperse
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-06-12 17:41 UTC by Dustin Black
Modified: 2015-10-24 04:08 UTC
CC List: 4 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-10-24 04:08:43 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Dustin Black 2015-06-12 17:41:58 UTC
Description of problem:
Commands that read or write a file on a dispersed volume hang if the file was modified while the volume was degraded, even after connectivity to all bricks is restored.

Version-Release number of selected component (if applicable):
[root@n1 ~]# gluster --version
glusterfs 3.7.1 built on Jun  1 2015 17:53:10
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.


How reproducible:
Consistently

Steps to Reproduce:
1. Write a file to a 6-brick (4 + 2) disperse volume
2. Block access from the client to two of the bricks with iptables
3. Append data to the file
4. Flush iptables rules to reconnect client to all bricks
5. Attempt file read/write operations on the file
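The steps above, condensed into a command sequence (a sketch based on the session captured under Additional info below; hostnames n1-n4, volume ec01, and the mount path are taken from that session):
[root@n1 ~]# find / -type d 2>/dev/null > /rhgs/client/ec01/dirs.txt    # 1. write a file
[root@n1 ~]# iptables -A OUTPUT -d n3 -j DROP                           # 2. block two bricks
[root@n1 ~]# iptables -A OUTPUT -d n4 -j DROP
[root@n1 ~]# echo "new data" >> /rhgs/client/ec01/dirs.txt              # 3. append while degraded
[root@n1 ~]# iptables -F                                                # 4. restore connectivity
[root@n1 ~]# file /rhgs/client/ec01/dirs.txt                            # 5. hangs here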

Actual results:
Command hangs indefinitely


Expected results:
Command succeeds, and heal is triggered
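Whether a heal is actually pending or triggered can be checked with the standard heal CLI (a verification sketch, not part of the original report):
[root@n1 ~]# gluster volume heal ec01 info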


Additional info:

[root@n1 ~]# mount | grep rhgs
/dev/mapper/rhgs_vg-rhgs_lv on /rhgs/bricks type xfs (rw,noatime,nouuid,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota)
n1:ec01 on /rhgs/client/ec01 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

[root@n1 ~]# gluster volume info ec01
 
Volume Name: ec01
Type: Disperse
Volume ID: f9f8d1d8-10d0-48cf-8292-a03860296b80
Status: Started
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: n1:/rhgs/bricks/ec01-1
Brick2: n2:/rhgs/bricks/ec01-1
Brick3: n3:/rhgs/bricks/ec01-1
Brick4: n4:/rhgs/bricks/ec01-1
Brick5: n1:/rhgs/bricks/ec01-2
Brick6: n2:/rhgs/bricks/ec01-2
Options Reconfigured:
performance.readdir-ahead: on
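For context, a 1 x (4 + 2) disperse layout with these bricks would typically be created along the following lines. This is a sketch, not taken from the original setup; gluster may prompt for confirmation or require force because n1 and n2 each host two bricks:
[root@n1 ~]# gluster volume create ec01 disperse 6 redundancy 2 \
        n1:/rhgs/bricks/ec01-1 n2:/rhgs/bricks/ec01-1 n3:/rhgs/bricks/ec01-1 \
        n4:/rhgs/bricks/ec01-1 n1:/rhgs/bricks/ec01-2 n2:/rhgs/bricks/ec01-2
[root@n1 ~]# gluster volume start ec01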

[root@n1 ~]# find / -type d 2>/dev/null | tee -a /rhgs/client/ec01/dirs.txt > /dev/null

[root@n1 ~]# iptables -F
[root@n1 ~]# iptables -A OUTPUT -d n3 -j DROP
[root@n1 ~]# iptables -A OUTPUT -d n4 -j DROP

[root@n1 ~]# echo "new data" >> /rhgs/client/ec01/dirs.txt

[root@n1 ~]# getfattr -d -m . -e hex /rhgs/bricks/ec01-1/dirs.txt 
getfattr: Removing leading '/' from absolute path names
# file: rhgs/bricks/ec01-1/dirs.txt
trusted.bit-rot.version=0x0200000000000000557afa920002b7eb
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000020000000000000002
trusted.ec.size=0x000000000006c369
trusted.ec.version=0x00000000000000330000000000000035
trusted.gfid=0x1ab0e229ec8548f8bc08dcb7c3874408
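The trusted.ec.version and trusted.ec.dirty xattrs each hold a pair of 64-bit counters (data/metadata), so the brick above shows version 0x33/0x35 with a dirty count of 2/2, which typically indicates updates recorded while some bricks were unreachable and a heal still pending. Comparing these xattrs across all six bricks shows which bricks missed the append (a diagnostic sketch, assuming passwordless ssh between the nodes; not part of the original session):
[root@n1 ~]# for h in n1 n2 n3 n4; do echo "== $h"; ssh $h 'getfattr -d -m trusted.ec -e hex /rhgs/bricks/ec01-*/dirs.txt' 2>/dev/null; done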

[root@n1 ~]# iptables -F

[root@n1 ~]# file /rhgs/client/ec01/dirs.txt
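When the command hangs like this, a statedump of the stuck client can show which file operations are blocked. A debugging sketch, not from the original report, assuming the default statedump directory /var/run/gluster:
[root@n1 ~]# kill -USR1 $(pgrep -f '/rhgs/client/ec01')    # glusterfs client process for this mount
[root@n1 ~]# ls -lt /var/run/gluster/ | head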

Comment 1 Dustin Black 2015-06-12 17:45:21 UTC
If I kill the glusterfs and glusterfsd processes on n1, restart the glusterd service, and re-mount the client there, I can then access the file properly and can confirm the heal.
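The recovery described in this comment corresponds roughly to the following commands; a sketch, not the commands actually run:
[root@n1 ~]# pkill glusterfs                      # matches both glusterfs (client) and glusterfsd (bricks)
[root@n1 ~]# umount -l /rhgs/client/ec01          # clean up the dead FUSE mount if needed
[root@n1 ~]# systemctl restart glusterd           # glusterd respawns the local brick processes
[root@n1 ~]# mount -t glusterfs n1:ec01 /rhgs/client/ec01
[root@n1 ~]# gluster volume heal ec01 info        # confirm the heal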

Comment 2 Pranith Kumar K 2015-08-04 04:03:06 UTC
Dustin,
    Do you see it happen in the latest release? This seems very similar to the bug: 1227654

Pranith

Comment 3 Dustin Black 2015-08-18 11:10:11 UTC
(In reply to Pranith Kumar K from comment #2)
> Dustin,
>     Do you see it happen in the latest release? This seems very similar to
> the bug: 1227654

I haven't had the lab time to test this yet. It may be a couple of weeks before I can spend any time on it.

Comment 4 Dustin Black 2015-10-21 20:22:45 UTC
I am unable to reproduce this on RHGS 3.1.1

