Bug 1020713
Summary: | dist-rep + quota: directory self-heal does not heal the xattr 'trusted.glusterfs.quota.limit-set' when a replica pair is brought down
---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage
Component: | distribute
Status: | CLOSED DEFERRED
Severity: | high
Priority: | medium
Version: | 2.1
Hardware: | x86_64
OS: | Linux
Keywords: | Reopened, ZStream
Reporter: | Rachana Patel <racpatel>
Assignee: | Nagaprasad Sathyanarayana <nsathyan>
QA Contact: | Matt Zywusko <mzywusko>
CC: | asriram, gluster-bugs, grajaiya, mhideo, mzywusko, nsathyan, pkarampu, rwheeler, saujain, smohan, spalai, storage-doc, vagarwal, vbellur, vmallika
Target Milestone: | ---
Target Release: | ---
Doc Type: | Known Issue
Clones: | 1286188, 1286191 (view as bug list)
Bug Blocks: | 1020127, 1286188, 1286191
Type: | Bug
Last Closed: | 2015-11-27 12:18:22 UTC

Doc Text:
In a distribute or distribute-replicate volume, if one or more bricks (or one or more replica sets, respectively) are down while a quota limit is set on a directory, quota is not enforced on those bricks or replica sets once they are back online. As a result, disk usage can exceed the quota limit.
Workaround: Set the quota limit again after the brick is back online.
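As a concrete illustration of the workaround above, here is a minimal sketch using only the gluster commands already shown in this report, assuming the volume name (dist-rept), directory (/down), and limit values (50MB, 50%) from the reproduction steps below; adjust them for other setups:

    # Sketch of the documented workaround: once the downed bricks are back
    # online, re-apply the limit so trusted.glusterfs.quota.limit-set is
    # written again on every brick, then verify it is reported.
    gluster volume start dist-rept force
    gluster volume quota dist-rept limit-usage /down 50MB 50%
    gluster volume quota dist-rept list /down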
Description
Rachana Patel
2013-10-18 08:15:37 UTC
After discussing the issue with Shishir, we came to the conclusion that a plain distribute setup never guaranteed data availability when one of the bricks is down, because DHT does not keep track of which subvolume is the 'source' and which subvolume is stale. The quota xattrs other than 'limit-set' are not 'healed' by DHT but are created by the quota/marker xlators. The reason for reopening this bug is that the root cause appears to be the same: DHT is not healing the xattr, so the same problem also appears on dist-rep volumes. Since the bug is now against a dist-rep volume, the sequence of actions stays the same with minor modifications (bring one replica pair down instead of one brick), so the steps are rewritten below.

Description of problem:
In a dist-rep volume, if one or more replica pairs are down while the user creates a directory, then when those bricks come back up a lookup should self-heal the directory on the previously down replica pair. Right now the directory itself is healed, but not all quota-related xattrs are. When the directory is created by self-heal on the previously down bricks (replica pair), it has the following quota xattrs:

    trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri
    trusted.glusterfs.quota.dirty
    trusted.glusterfs.quota.size

The trusted.glusterfs.quota.limit-set xattr is missing. As a result, after a rebalance the directory has a layout but no quota limit set, so it will violate the quota limit.

Bug 1002885 might depend on this bug (not certain, but it seems so): in bug 1002885 a new replica pair is added, and self-heal should create the directory and its xattrs. Since it does not, the xattr is still missing even after rebalance.

Version-Release number of selected component (if applicable):
Big Bend RHS ISO + glusterfs-*-3.4.0.35rhs-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a dist-rep volume (2x2), fuse mount it, and enable quota.

    [root@rhs-client22 rpm]# mount -t glusterfs 10.70.37.204:/dist-rept /mnt/dist-rept
    # gluster volume quota dist-rept enable

2. Bring one replica pair down by killing the brick processes.

    [root@4VM5 rpm]# gluster volume status dist-rept
    Status of volume: dist-rept
    Gluster process                           Port    Online  Pid
    ------------------------------------------------------------------------------
    Brick 10.70.37.204:/rhs/brick1/r1         49160   Y       1078
    Brick 10.70.37.142:/rhs/brick1/r1         49159   Y       872
    Brick 10.70.37.204:/rhs/brick1/r2         49162   Y       1089
    Brick 10.70.37.142:/rhs/brick1/r2         49160   Y       883
    NFS Server on localhost                   2049    Y       1101
    Self-heal Daemon on localhost             N/A     Y       1110
    Quota Daemon on localhost                 N/A     Y       1377
    NFS Server on 10.70.37.142                2049    Y       896
    Self-heal Daemon on 10.70.37.142          N/A     Y       904
    Quota Daemon on 10.70.37.142              N/A     Y       1188

    There are no active volume tasks

    server 1:
    [root@4VM5 rpm]# kill -9 1078
    server 2:
    [root@4VM6 rpm]# kill -9 872

   Then create a directory from the mount point:

    [root@rhs-client22 dist-rep]# cd /mnt/dist-rept
    [root@rhs-client22 dist-rept]# mkdir down

3. Set a quota limit on the newly created directory.

    [root@4VM6 rpm]# gluster volume quota dist-rept limit-usage /down 50MB 50%
    volume quota : success

4. Bring all bricks back up with `gluster volume start <volname> force`.

    [root@4VM5 rpm]# gluster volume start dist-rept force
    volume start: dist-rept: success

5. Perform a lookup on the newly created directory from the mount point, which should self-heal the directory on all bricks.
6. Also run the heal command for the volume:

    [root@4VM5 rpm]# gluster volume heal dist-rept
    Launching heal operation to perform index self heal on volume dist-rept has been successful
    Use heal info commands to check status

   And a full heal as well:

    [root@4VM5 rpm]# gluster volume heal dist-rept full
    Launching heal operation to perform full self heal on volume dist-rept has been successful
    Use heal info commands to check status

7. Check the xattrs of the directory on the previously down bricks (replica pair).

   Previously down bricks / down replica pair:

    [root@4VM6 rpm]# getfattr -d -m . -e hex /rhs/brick1/r1/down
    getfattr: Removing leading '/' from absolute path names
    # file: rhs/brick1/r1/down
    trusted.afr.dist-rept-client-0=0x000000000000000000000000
    trusted.afr.dist-rept-client-1=0x000000000000000000000000
    trusted.gfid=0x010f34d3b5104a25b516d050adc6d01b
    trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x0000000000000000
    trusted.glusterfs.quota.dirty=0x3000
    trusted.glusterfs.quota.size=0x0000000000000000

    [root@4VM5 rpm]# getfattr -d -m . -e hex /rhs/brick1/r1/down
    getfattr: Removing leading '/' from absolute path names
    # file: rhs/brick1/r1/down
    trusted.afr.dist-rept-client-0=0x000000000000000000000000
    trusted.afr.dist-rept-client-1=0x000000000000000000000000
    trusted.gfid=0x010f34d3b5104a25b516d050adc6d01b
    trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x0000000000000000
    trusted.glusterfs.quota.dirty=0x3000
    trusted.glusterfs.quota.size=0x0000000000000000

   Up bricks / up replica pair:

    [root@4VM6 rpm]# getfattr -d -m . -e hex /rhs/brick1/r2/down
    getfattr: Removing leading '/' from absolute path names
    # file: rhs/brick1/r2/down
    trusted.gfid=0x010f34d3b5104a25b516d050adc6d01b
    trusted.glusterfs.dht=0x000000010000000000000000ffffffff
    trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x0000000000000000
    trusted.glusterfs.quota.dirty=0x3000
    trusted.glusterfs.quota.limit-set=0x00000000032000000000000000000032
    trusted.glusterfs.quota.size=0x0000000000000000

    [root@4VM5 rpm]# getfattr -d -m . -e hex /rhs/brick1/r2/down
    getfattr: Removing leading '/' from absolute path names
    # file: rhs/brick1/r2/down
    trusted.gfid=0x010f34d3b5104a25b516d050adc6d01b
    trusted.glusterfs.dht=0x000000010000000000000000ffffffff
    trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x0000000000000000
    trusted.glusterfs.quota.dirty=0x3000
    trusted.glusterfs.quota.limit-set=0x00000000032000000000000000000032
    trusted.glusterfs.quota.size=0x0000000000000000

Actual results:
Not all quota-related xattrs are healed; trusted.glusterfs.quota.limit-set is missing on the previously down replica pair.

Expected results:
Self-heal should copy all xattrs, including trusted.glusterfs.quota.limit-set.

Pranith, can you please review the doc text for technical accuracy?

Moving the known issues to the Doc team, to be documented in the release notes for U1.

This is documented as a known issue in the Big Bend Update 1 Release Notes. Here is the link:
http://documentation-devel.engineering.redhat.com/docs/en-US/Red_Hat_Storage/2.1/html/2.1_Update_1_Release_Notes/chap-Documentation-2.1_Update_1_Release_Notes-Known_Issues.html
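To confirm whether the limit-set xattr landed on every copy of the directory after the heal (or after re-applying the limit per the workaround), a small loop over the brick directories can run the same getfattr check shown in step 7. This is only an illustrative sketch, run on each brick server; the brick paths are the ones from this report and would differ on other setups:

    #!/bin/sh
    # Illustrative check using the brick paths from this report.
    # Reports whether trusted.glusterfs.quota.limit-set exists on each
    # local copy of the /down directory.
    for brick in /rhs/brick1/r1/down /rhs/brick1/r2/down; do
        if getfattr -n trusted.glusterfs.quota.limit-set -e hex "$brick" >/dev/null 2>&1; then
            echo "$brick: limit-set present"
        else
            echo "$brick: limit-set MISSING"
        fi
    done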