Bug 1315781 - AFR returns the node uuid of the same node for every file in the replica
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.3.0
Assigned To: Karthik U S
QA Contact: nchilaka
Depends On: 1464078 1462693 1462790 1463250 1487647
Blocks: 1366817 1417147 1451561 1451573 1487042
Reported: 2016-03-08 10:19 EST by Nithya Balachandran
Modified: 2017-09-21 00:53 EDT
CC: 9 users

See Also:
Fixed In Version: glusterfs-3.8.4-26
Doc Type: Bug Fix
Doc Text:
The rebalance process uses an extended attribute to determine which node migrates a file. In replicated and erasure-coded (dispersed) volumes, only the first node of a replica set was listed in this attribute, so only the first node of a replica set migrated files. Replicated and erasure-coded volumes now list all nodes in a replica set, ensuring that rebalance processes on all nodes migrate files as expected.
Story Points: ---
Clone Of:
Cloned to: 1366817
Environment:
Last Closed: 2017-09-21 00:25:52 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
crude testcase and logs while validating (17.28 KB, text/plain)
2017-07-31 11:26 EDT, nchilaka

Description Nithya Balachandran 2016-03-08 10:19:21 EST
Description of problem:

If the replica set is healthy, AFR always returns the UUID of the first node as the node-uuid for every file.

Impact:
DHT uses the node-uuid to decide which node will migrate a file. With this behaviour, a single node ends up migrating all files, affecting performance.
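For context, a rough way to observe this decision from a client mount is sketched below; the volume name and paths (testvol, /mnt/testvol, somefile) are hypothetical and the exact output format may vary by release:

# Illustrative only: DHT's rebalance crawl queries a virtual xattr per file,
# and a node migrates the file only if its own UUID is listed there.
getfattr -n trusted.glusterfs.node-uuid /mnt/testvol/somefile
# Before the fix, a healthy replica reported only the first node's UUID for
# every file; with the fix, the UUIDs of all nodes in the replica set are reported.

# Each node's own UUID, which the rebalance process compares against that xattr:
grep UUID /var/lib/glusterd/glusterd.info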

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 8 Atin Mukherjee 2017-04-19 10:04:08 EDT
upstream patch : https://review.gluster.org/17084
Comment 9 Atin Mukherjee 2017-05-12 22:02:19 EDT
one more upstream patch in addition to 17084: https://review.gluster.org/#/c/17239/
Comment 15 nchilaka 2017-06-19 06:01:39 EDT
ON_QA validation blocked due to BZ#1462693 - with AFR now returning the UUIDs of both nodes for a file, geo-replication ends up consuming more resources.
Comment 16 nchilaka 2017-07-31 11:09:46 EDT
ON_QA VALIDATION:
TEST BUILD: 3.8.4-36

The following terminology is used in the test cases below (an illustrative command-level setup sketch follows this list):
1x2 volume with replicas b1 on n1 and b2 on n2
add-brick to make the volume 2x2, with the new replica pair b3 on n1 and b4 on n2
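For reference, a rough command-level sketch of this setup using the standard gluster CLI; the volume name, brick paths, and mount point (testvol, /bricks/bN, /mnt/testvol) are illustrative assumptions, not taken from the actual test runs:

# Create the 1x2 replicated volume: b1 on n1, b2 on n2 (names are illustrative).
gluster volume create testvol replica 2 n1:/bricks/b1 n2:/bricks/b2
gluster volume start testvol

# Mount the volume and create some directories/files to be rebalanced later.
mount -t glusterfs n1:/testvol /mnt/testvol
mkdir /mnt/testvol/dir{1..3}

# Expand to 2x2 by adding the new replica pair: b3 on n1, b4 on n2.
gluster volume add-brick testvol n1:/bricks/b3 n2:/bricks/b4

# Trigger and monitor rebalance; with the fix, both n1 and n2 should show
# migrated files in the status output (see TC#1 below).
gluster volume rebalance testvol start
gluster volume rebalance testvol status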
TC#1) Both nodes in a replica set must now participate in rebalance; previously only one node used to migrate files (check rebalance status) ----> PASS. This also reduces overall rebalance time, since all nodes of the replica now participate instead of only the first node.
TC#2) When a brick is down, the node hosting that brick must continue with rebalance --> PASSES in general (see the next case too), but it fails to rebalance the remaining files in the directory it was working on and moves on to the next directory (raised BZ#1476676 - Rebalance skips files when a brick goes down in spite of AFR passing both node IDs of the replica to rebalance).
TC#3) When one of the source bricks is down, the other node of the source replica must be able to rebalance all pending files. Nodes must also be able to rebalance files held by other nodes; that is, n1 must be able to rebalance files even if they are on n2, as long as n1 and n2 participate in the same DHT subvolume range.
  E.g.: with a 4-node setup and replicas on n1,n2 and n3,n4, if a rebalance is triggered and b1 goes down, n1 must still be able to rebalance files by fetching them from n2 (it won't be able to rebalance the n3/n4 bricks, as they belong to a different subvolume) ---> PASS
TC#4) AFR must still pass both UUIDs to the DHT layer even if one of the source replicas is down. This can be verified as below (a command-level sketch follows the test cases) --> PASS
 >on the 1x2 volume, mount it and create at least 3 directories (say dir{1..3}) with say 1 lakh (100,000) files in each
 >now add-brick to make it 2x2
 >now trigger rebalance
 >while rebalance is in progress: at the start, rebalance picks the directories that need rebalancing. Once it starts, the first directory (say dir1) is picked for rebalancing its contents; at this point, bring down b1
 >rebalance from n1 may skip files in dir1 (the directory where rebalance was in progress), but it must proceed to dir2 and rebalance those files, since AFR still sends both node UUIDs because b2 (the other replica) is up. If AFR did not send them, n1 would stop rebalancing, which would be a problem. AFR does send them, so this case works as expected: n1 goes ahead and rebalances dir2 and dir3

TC#5) Check with EC (dispersed) volumes whether all nodes participate in rebalance --> PASS, all nodes participate


TC#6) Only nodes hosting a replica set that participates in rebalance must work on the rebalance --> PASS
 Had a 1x2 volume, added a new replica pair with b3 on n1 and b4 on n3 (a new node), and ran a rebalance. n3 does not participate. That makes sense, given that n3 is a destination and the AFR of the primary replica pair passes the UUIDs of only n1 and n2 to the DHT layer (because that AFR replica exists only on n1 and n2). The same holds for remove-brick: only nodes hosting the bricks being removed participate in the rebalance.


TC#7) Check an arbiter volume; all nodes must participate ====> PASS
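As referenced in TC#4, a rough sketch of how the "AFR still reports both UUIDs with one source brick down" check can be done; brick and volume names follow the terminology above and the file path is hypothetical:

# Find the brick process for b1 on n1 and bring it down (simulating a brick failure).
gluster volume status testvol
kill <pid-of-the-n1:/bricks/b1-brick-process>   # placeholder; use the PID printed above

# From the client mount, query the virtual node-uuid xattr of a file on that replica;
# since b2 is still up, it should keep listing the UUIDs of both n1 and n2, so n1
# continues rebalancing dir2 and dir3 instead of stopping.
getfattr -n trusted.glusterfs.node-uuid /mnt/testvol/dir2/somefile

# Rebalance status should show both n1 and n2 continuing to migrate files.
gluster volume rebalance testvol status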
Comment 17 nchilaka 2017-07-31 11:11:22 EDT
Moving to VERIFIED, as most thematic test cases (those covering the core functionality of the fix) are working and PASSED.

However, raised the bugs below: BZ#1476676 and BZ#1476828.
Comment 18 nchilaka 2017-07-31 11:24:58 EDT
Also raised BZ#1476852 - DHT layer must dynamically load balance rebalance activity instead of hard presetting entries for each node.
Comment 19 nchilaka 2017-07-31 11:26 EDT
Created attachment 1307190 [details]
crude testcase and logs while validating
Comment 21 Karthik U S 2017-08-16 02:32:45 EDT
Looks good to me.
Comment 23 errata-xmlrpc 2017-09-21 00:25:52 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774
