Bug 1315781 - AFR returns the node uuid of the same node for every file in the replica
Summary: AFR returns the node uuid of the same node for every file in the replica
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Karthik U S
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On: 1462693 1462790 1463250 1464078 1487647
Blocks: 1366817 1417147 1451561 1451573 1487042
 
Reported: 2016-03-08 15:19 UTC by Nithya Balachandran
Modified: 2017-09-21 04:53 UTC (History)
CC: 9 users

Fixed In Version: glusterfs-3.8.4-26
Doc Type: Bug Fix
Doc Text:
The rebalance process uses an extended attribute to determine which node migrates a file. In replicated and erasure-coded (dispersed) volumes, only the first node of a replica set was listed in this attribute, so only the first node of a replica set migrated files. Replicated and erasure-coded volumes now list all nodes in a replica set, ensuring that rebalance processes on all nodes migrate files as expected.
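For illustration only, a minimal sketch of the decision this attribute enables, assuming the attribute carries a list of node UUIDs; the hash-based selection is an assumed scheme for the sketch, not the documented DHT implementation:

```python
# Illustration only, not GlusterFS source: one way a rebalance worker could
# use a node-uuid list to decide whether it is responsible for migrating a
# file. The hash-based selection is an assumption made for this sketch.
import hashlib

def responsible_node(file_gfid, node_uuids):
    """Deterministically pick one node from the list for this file."""
    index = int(hashlib.sha1(file_gfid.encode()).hexdigest(), 16) % len(node_uuids)
    return node_uuids[index]

def should_migrate(local_uuid, file_gfid, node_uuids):
    # Before the fix the list effectively held a single entry (the first
    # node of the replica set), so only that node ever saw True here.
    # With the fix the list names every node of the replica set, so the
    # files are shared out across all of them.
    return responsible_node(file_gfid, node_uuids) == local_uuid

uuids = ["uuid-n1", "uuid-n2"]
for gfid in ("gfid-0001", "gfid-0002", "gfid-0003", "gfid-0004"):
    print(gfid, "->", responsible_node(gfid, uuids))
```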
Clone Of:
: 1366817 (view as bug list)
Environment:
Last Closed: 2017-09-21 04:25:52 UTC
Embargoed:


Attachments (Terms of Use)
crude test case and logs while validating (17.28 KB, text/plain)
2017-07-31 15:26 UTC, Nag Pavan Chilakam


Links
Red Hat Product Errata RHBA-2017:2774 (normal, SHIPPED_LIVE): glusterfs bug fix and enhancement update. Last updated: 2017-09-21 08:16:29 UTC

Description Nithya Balachandran 2016-03-08 15:19:21 UTC
Description of problem:

If the replica set is healthy, AFR always returns the UUID of the first node as the node-uuid for every file.

Impact : 
DHT uses the node-uuid to decide which node will migrate a file. With this behaviour, a single node ends up migrating all files, affecting performance.
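A minimal sketch (not GlusterFS source) of the difference between the faulty and the fixed node-uuid aggregation, assuming each replica child reports the UUID of the node hosting it:

```python
# Sketch only, assuming each replica child brick knows the UUID of the node
# that hosts it. Not the actual AFR implementation.

def node_uuid_buggy(children):
    """Faulty behaviour: with a healthy replica set, the node-uuid query is
    answered with the first child's node UUID only, so DHT always assigns
    the migration work for every file to that one node."""
    up = [c for c in children if c["up"]]
    return up[0]["node_uuid"] if up else None

def node_uuid_fixed(children):
    """Fixed behaviour: the query returns the node UUIDs of all children,
    so the rebalance processes on every replica node can share the work."""
    return [c["node_uuid"] for c in children]

replica = [
    {"brick": "n1:/bricks/b1", "node_uuid": "uuid-n1", "up": True},
    {"brick": "n2:/bricks/b2", "node_uuid": "uuid-n2", "up": True},
]
print(node_uuid_buggy(replica))  # uuid-n1 -> only n1 migrates, hurting performance
print(node_uuid_fixed(replica))  # ['uuid-n1', 'uuid-n2'] -> both nodes migrate
```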

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 8 Atin Mukherjee 2017-04-19 14:04:08 UTC
upstream patch : https://review.gluster.org/17084

Comment 9 Atin Mukherjee 2017-05-13 02:02:19 UTC
one more upstream patch in addition to 17084: https://review.gluster.org/#/c/17239/

Comment 15 Nag Pavan Chilakam 2017-06-19 10:01:39 UTC
ON_QA validation blocked due to BZ#1462693 - with AFR now making both nodes return a UUID for a file, geo-replication will consume more resources.

Comment 16 Nag Pavan Chilakam 2017-07-31 15:09:46 UTC
ON_QA VALIDATION:
TEST BUILD: 3.8.4-36


Terminology used in the test cases below:
- 1x2 volume with bricks b1 on node n1 and b2 on node n2
- add-brick to make the volume 2x2, with the new replica pair b3 on n1 and b4 on n2
TC#1) Both nodes in a replica set must now participate in rebalance; previously only one node migrated files (check rebalance status) ----> PASS. This also reduces overall rebalance time, since all nodes of the replica participate instead of only the first node.
TC#2) When a brick is down, the node hosting the brick must continue with rebalance --> PASSES in general (see the next case too), but it fails to rebalance the remaining files in the directory it was working on and moves on to the next directory. (Raised BZ#1476676 - Rebalance skips files when a brick goes down in spite of AFR passing both node IDs of the replica to rebalance.)
TC#3) When one of the source bricks is down, the other node of the replica must be able to rebalance all pending files. A node must also be able to rebalance files held on other nodes, i.e. n1 must be able to rebalance files even if they are on n2, as long as n1 and n2 participate in the same DHT subvolume range.
  Eg: On a 4-node setup with replicas on n1,n2 and n3,n4, if a rebalance is triggered and b1 goes down, n1 must still be able to rebalance files by reading them from n2 (it won't be able to rebalance files belonging to the n3/n4 bricks, as they are in a different subvolume) ---> PASS
TC#4) AFR must still pass both UUIDs to the DHT layer even if one of the source replicas is down. This can be verified as below (see the sketch after this list) --> PASS
 > on the 1x2 volume, mount the volume and create at least 3 directories (say dir{1..3}) with about 100,000 files in each
 > now add-brick to make it 2x2
 > now trigger a rebalance
 > while the rebalance is in progress: at the start, rebalance picks the directories that need rebalancing. Once it starts, the first directory (say dir1) is picked for rebalancing its contents; at that point bring down b1
 > rebalance from n1 may skip files in dir1 (the directory where rebalance was in progress), but it must proceed to dir2 and rebalance those files, because AFR is still sending both node UUIDs as b2 (the other replica) is still up. If it stopped sending them, n1 would stop rebalancing, which would be a problem. AFR does keep sending them, so this case works as expected: n1 goes ahead and rebalances dir2 and dir3.

TC#5) Check with an EC (dispersed) volume that all nodes participate in rebalance --> PASS. Yes, all nodes participate.


TC#6) Only the nodes hosting a replica set that is participating in rebalance must work on the rebalance --> PASS
 Had the 1x2 volume, added a new replica pair with b3 on n1 and b4 on n3 (a new node), then ran a rebalance. n3 does not participate. That makes sense, given that n3 is a destination and the AFR of the primary replica pair passes only the UUIDs of n1 and n2 to the DHT layer (because that AFR replica exists only on n1 and n2). The same applies to remove-brick: only the nodes hosting the bricks being removed participate in the rebalance.


TC#7) Check with an arbiter volume; all nodes must participate ====> PASS
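
A rough sketch of the brick-down behaviour exercised by TC#2 and TC#4, under the assumption stated in TC#4 above that the node-uuid reply still names both replica nodes as long as at least one brick of the pair is up; this is an illustration, not GlusterFS code:

```python
# Illustration of the TC#4 expectation, not GlusterFS code. Assumption: the
# node-uuid reply can be served from any up brick of the replica pair and
# still names both replica nodes.

REPLICA_NODE_UUIDS = ["uuid-n1", "uuid-n2"]   # nodes hosting b1 and b2

def node_uuid_reply(bricks_up):
    """Answer the node-uuid query for this replica; None means the whole
    replica pair is unreachable and rebalance cannot pick an owner."""
    if not any(bricks_up.values()):
        return None
    return REPLICA_NODE_UUIDS

print(node_uuid_reply({"b1": True,  "b2": True}))   # ['uuid-n1', 'uuid-n2']
print(node_uuid_reply({"b1": False, "b2": True}))   # ['uuid-n1', 'uuid-n2'] -> n1 carries on with dir2, dir3
print(node_uuid_reply({"b1": False, "b2": False}))  # None -> rebalance for this subvolume stalls
```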

Comment 17 Nag Pavan Chilakam 2017-07-31 15:11:22 UTC
Moving to VERIFIED, as most of the thematic test cases (those exercising the core functionality of the fix) are working and PASSED.

However, the following bugs were raised: BZ#1476676 and BZ#1476828.

Comment 18 Nag Pavan Chilakam 2017-07-31 15:24:58 UTC
Also raised BZ#1476852 - DHT layer must dynamically load-balance rebalance activity instead of hard-presetting entries for each node.

Comment 19 Nag Pavan Chilakam 2017-07-31 15:26:27 UTC
Created attachment 1307190 [details]
crude test case and logs while validating

Comment 21 Karthik U S 2017-08-16 06:32:45 UTC
Looks good to me.

Comment 23 errata-xmlrpc 2017-09-21 04:25:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774


