Bug 1020713
Summary: | dist-rep + quota: directory self-heal does not heal the xattr 'trusted.glusterfs.quota.limit-set' when a replica pair is brought down
---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage
Component: | distribute
Status: | CLOSED DEFERRED
Severity: | high
Priority: | medium
Version: | 2.1
Hardware: | x86_64
OS: | Linux
Keywords: | Reopened, ZStream
Reporter: | Rachana Patel <racpatel>
Assignee: | Nagaprasad Sathyanarayana <nsathyan>
QA Contact: | Matt Zywusko <mzywusko>
CC: | asriram, gluster-bugs, grajaiya, mhideo, mzywusko, nsathyan, pkarampu, rwheeler, saujain, smohan, spalai, storage-doc, vagarwal, vbellur, vmallika
Target Milestone: | ---
Target Release: | ---
Doc Type: | Known Issue
Clones: | 1286188, 1286191 (view as bug list)
Bug Blocks: | 1020127, 1286188, 1286191
Type: | Bug
Last Closed: | 2015-11-27 12:18:22 UTC

Doc Text:
In a distribute or distribute-replicate volume, if one or more bricks (or one or more replica sets, respectively) are down while a quota limit is set on a directory, quota is not enforced on those bricks or replica sets once they are back online. As a result, disk usage can exceed the quota limit.
Workaround: Set the quota limit again after the brick is back online.
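As a concrete illustration of the workaround above, here is a minimal sketch using only the gluster commands already shown in this report, assuming the volume name (dist-rept), directory (/down), and limit values (50MB, 50%) from the reproduction steps below; adjust them for other setups:

    # Sketch of the documented workaround: once the downed bricks are back
    # online, re-apply the limit so trusted.glusterfs.quota.limit-set is
    # written again on every brick, then verify it is reported.
    gluster volume start dist-rept force
    gluster volume quota dist-rept limit-usage /down 50MB 50%
    gluster volume quota dist-rept list /down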
Description
Rachana Patel
2013-10-18 08:15:37 UTC
After discussing the issue with Shishir, we came to the conclusion that a plain distribute setup never guaranteed data availability when one of the bricks is down, because DHT does not keep track of which subvolume is the 'source' and which subvolume is stale. The quota xattrs other than 'limit-set' are not 'healed' by DHT but are created by the quota/marker xlators. The reason for reopening this bug is that the root cause appears to be the same: DHT is not healing the xattr, so the same problem also appears on dist-rep volumes. Since the bug is now against a dist-rep volume, the sequence of actions stays the same with minor modifications (bring one replica pair down instead of one brick), so the steps are rewritten below.

Description of problem:
In a dist-rep volume, if one or more replica pairs are down while the user creates a directory, then when those bricks come back up a lookup should self-heal the directory on the previously down replica pair. Right now the directory itself is healed, but not all quota-related xattrs are. When the directory is created by self-heal on the previously down bricks (replica pair), it has the following quota xattrs:

    trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri
    trusted.glusterfs.quota.dirty
    trusted.glusterfs.quota.size

The trusted.glusterfs.quota.limit-set xattr is missing. As a result, after a rebalance the directory has a layout but no quota limit set, so it will violate the quota limit.

Bug 1002885 might depend on this bug (not certain, but it seems so): in bug 1002885 a new replica pair is added, and self-heal should create the directory and its xattrs. Since it does not, the xattr is still missing even after rebalance.

Version-Release number of selected component (if applicable):
Big Bend RHS ISO + glusterfs-*-3.4.0.35rhs-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a dist-rep volume (2x2), fuse mount it, and enable quota.

    [root@rhs-client22 rpm]# mount -t glusterfs 10.70.37.204:/dist-rept /mnt/dist-rept
    # gluster volume quota dist-rept enable

2. Bring one replica pair down by killing the brick processes.

    [root@4VM5 rpm]# gluster volume status dist-rept
    Status of volume: dist-rept
    Gluster process                           Port    Online  Pid
    ------------------------------------------------------------------------------
    Brick 10.70.37.204:/rhs/brick1/r1         49160   Y       1078
    Brick 10.70.37.142:/rhs/brick1/r1         49159   Y       872
    Brick 10.70.37.204:/rhs/brick1/r2         49162   Y       1089
    Brick 10.70.37.142:/rhs/brick1/r2         49160   Y       883
    NFS Server on localhost                   2049    Y       1101
    Self-heal Daemon on localhost             N/A     Y       1110
    Quota Daemon on localhost                 N/A     Y       1377
    NFS Server on 10.70.37.142                2049    Y       896
    Self-heal Daemon on 10.70.37.142          N/A     Y       904
    Quota Daemon on 10.70.37.142              N/A     Y       1188

    There are no active volume tasks

    server 1:
    [root@4VM5 rpm]# kill -9 1078
    server 2:
    [root@4VM6 rpm]# kill -9 872

   Then create a directory from the mount point:

    [root@rhs-client22 dist-rep]# cd /mnt/dist-rept
    [root@rhs-client22 dist-rept]# mkdir down

3. Set a quota limit on the newly created directory.

    [root@4VM6 rpm]# gluster volume quota dist-rept limit-usage /down 50MB 50%
    volume quota : success

4. Bring all bricks back up with `gluster volume start <volname> force`.

    [root@4VM5 rpm]# gluster volume start dist-rept force
    volume start: dist-rept: success

5. Perform a lookup on the newly created directory from the mount point, which should self-heal the directory on all bricks.
6. Also run the heal command for the volume:

    [root@4VM5 rpm]# gluster volume heal dist-rept
    Launching heal operation to perform index self heal on volume dist-rept has been successful
    Use heal info commands to check status

   And a full heal as well:

    [root@4VM5 rpm]# gluster volume heal dist-rept full
    Launching heal operation to perform full self heal on volume dist-rept has been successful
    Use heal info commands to check status

7. Check the xattrs of the directory on the previously down bricks (replica pair).

   Previously down bricks / down replica pair:

    [root@4VM6 rpm]# getfattr -d -m . -e hex /rhs/brick1/r1/down
    getfattr: Removing leading '/' from absolute path names
    # file: rhs/brick1/r1/down
    trusted.afr.dist-rept-client-0=0x000000000000000000000000
    trusted.afr.dist-rept-client-1=0x000000000000000000000000
    trusted.gfid=0x010f34d3b5104a25b516d050adc6d01b
    trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x0000000000000000
    trusted.glusterfs.quota.dirty=0x3000
    trusted.glusterfs.quota.size=0x0000000000000000

    [root@4VM5 rpm]# getfattr -d -m . -e hex /rhs/brick1/r1/down
    getfattr: Removing leading '/' from absolute path names
    # file: rhs/brick1/r1/down
    trusted.afr.dist-rept-client-0=0x000000000000000000000000
    trusted.afr.dist-rept-client-1=0x000000000000000000000000
    trusted.gfid=0x010f34d3b5104a25b516d050adc6d01b
    trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x0000000000000000
    trusted.glusterfs.quota.dirty=0x3000
    trusted.glusterfs.quota.size=0x0000000000000000

   Up bricks / up replica pair:

    [root@4VM6 rpm]# getfattr -d -m . -e hex /rhs/brick1/r2/down
    getfattr: Removing leading '/' from absolute path names
    # file: rhs/brick1/r2/down
    trusted.gfid=0x010f34d3b5104a25b516d050adc6d01b
    trusted.glusterfs.dht=0x000000010000000000000000ffffffff
    trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x0000000000000000
    trusted.glusterfs.quota.dirty=0x3000
    trusted.glusterfs.quota.limit-set=0x00000000032000000000000000000032
    trusted.glusterfs.quota.size=0x0000000000000000

    [root@4VM5 rpm]# getfattr -d -m . -e hex /rhs/brick1/r2/down
    getfattr: Removing leading '/' from absolute path names
    # file: rhs/brick1/r2/down
    trusted.gfid=0x010f34d3b5104a25b516d050adc6d01b
    trusted.glusterfs.dht=0x000000010000000000000000ffffffff
    trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x0000000000000000
    trusted.glusterfs.quota.dirty=0x3000
    trusted.glusterfs.quota.limit-set=0x00000000032000000000000000000032
    trusted.glusterfs.quota.size=0x0000000000000000

Actual results:
Not all quota-related xattrs are healed; trusted.glusterfs.quota.limit-set is missing on the previously down replica pair.

Expected results:
Self-heal should copy all xattrs, including trusted.glusterfs.quota.limit-set.

Pranith, can you please review the doc text for technical accuracy?

Moving the known issues to the Doc team, to be documented in the release notes for U1.

This is documented as a known issue in the Big Bend Update 1 Release Notes. Here is the link:
http://documentation-devel.engineering.redhat.com/docs/en-US/Red_Hat_Storage/2.1/html/2.1_Update_1_Release_Notes/chap-Documentation-2.1_Update_1_Release_Notes-Known_Issues.html
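To confirm whether the limit-set xattr landed on every copy of the directory after the heal (or after re-applying the limit per the workaround), a small loop over the brick directories can run the same getfattr check shown in step 7. This is only an illustrative sketch, run on each brick server; the brick paths are the ones from this report and would differ on other setups:

    #!/bin/sh
    # Illustrative check using the brick paths from this report.
    # Reports whether trusted.glusterfs.quota.limit-set exists on each
    # local copy of the /down directory.
    for brick in /rhs/brick1/r1/down /rhs/brick1/r2/down; do
        if getfattr -n trusted.glusterfs.quota.limit-set -e hex "$brick" >/dev/null 2>&1; then
            echo "$brick: limit-set present"
        else
            echo "$brick: limit-set MISSING"
        fi
    done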