Bug 1315781
| Summary: | AFR returns the node uuid of the same node for every file in the replica | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nithya Balachandran <nbalacha> |
| Component: | replicate | Assignee: | Karthik U S <ksubrahm> |
| Status: | CLOSED ERRATA | QA Contact: | Nag Pavan Chilakam <nchilaka> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.1 | CC: | amukherj, aspandey, asrivast, ksubrahm, nchilaka, ravishankar, rcyriac, rhinduja, rhs-bugs |
| Target Milestone: | --- | | |
| Target Release: | RHGS 3.3.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.8.4-26 | Doc Type: | Bug Fix |
| Doc Text: | The rebalance process uses an extended attribute to determine which node migrates a file. In replicated and erasure-coded (dispersed) volumes, only the first node of a replica set was listed in this attribute, so only the first node of a replica set migrated files. Replicated and erasure-coded volumes now list all nodes in a replica set, ensuring that rebalance processes on all nodes migrate files as expected. (See the sketch after this table for how to inspect this attribute.) | | |
| Story Points: | --- | | |
| Clone Of: | | | |
| : | 1366817 (view as bug list) | Environment: | |
| Last Closed: | 2017-09-21 04:25:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1462693, 1462790, 1463250, 1464078, 1487647 | | |
| Bug Blocks: | 1366817, 1417147, 1451561, 1451573, 1487042 | | |
| Attachments: | | | |
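For reference, the extended attribute described in the Doc Text can be inspected from a client mount. A minimal sketch, assuming a volume mounted at /mnt/testvol and the trusted.glusterfs.node-uuid virtual xattr exposed through the client stack; the file path and the exact output format (a single UUID before the fix, the full list of replica-node UUIDs after it) are illustrative assumptions:

```sh
# Query the node-uuid virtual xattr that rebalance consults when
# deciding which node migrates a file (mount path is an assumption).
getfattr -n trusted.glusterfs.node-uuid /mnt/testvol/dir1/file1

# Illustrative output: before the fix, every file in a replica
# reported only the first node's UUID; after the fix, all nodes of
# the replica set should be listed, e.g.
#   trusted.glusterfs.node-uuid="<uuid-of-n1> <uuid-of-n2>"
```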
Description
Nithya Balachandran
2016-03-08 15:19:21 UTC
upstream patch: https://review.gluster.org/17084

one more upstream patch in addition to 17084: https://review.gluster.org/#/c/17239/

on_qa validation was blocked by BZ#1462693 - with AFR now making both nodes return the UUID for a file, geo-rep consumes more resources.

ON_QA VALIDATION:
TEST BUILD: 3.8.4-36

Terminology used regularly in the cases below:
- 1x2 volume with replicas b1 on n1 and b2 on n2
- add-brick to make the volume 2x2, with the new replica pair b3 on n1 and b4 on n2

TC#1) Both nodes in a replica set must now participate in rebalance; previously only one node migrated files (check rebalance status). ---> PASS. This also reduces overall rebalance time, as all nodes of a replica participate in rebalance instead of only the first node.

TC#2) When a brick is down, the node hosting the brick must continue with rebalance. --> PASSES in general (see the next case too), but it fails to rebalance the remaining files in the directory it was working on and moves on to the next directory. (Raised BZ#1476676 - Rebalance skips files when a brick goes down in spite of AFR passing both node ids of the replica to rebalance.)

TC#3) When one of the source bricks is down, the other node of the source replica must be able to rebalance all pending files. Nodes must be able to rebalance files from other nodes too: n1 must be able to rebalance files even if they are on n2, as long as n1 and n2 participate in the same DHT subvol range. E.g., on a 4-node setup with replicas on n1,n2 and n3,n4, if a rebalance is triggered and b1 goes down, n1 must still be able to rebalance files by reading them from n2 (it won't be able to rebalance the n3/n4 bricks, as they are in a different subvol). ---> PASS

TC#4) AFR must still pass both UUIDs to the DHT layer even if one of the source replicas is down. This was verified as below (see the sketch after this list). --> PASS
> on a 1x2 volume, mount the volume and create at least 3 directories (say dir{1..3}) with about 1 lakh (100,000) files in each
> now add-brick to make it 2x2
> now trigger rebalance
> while rebalance is in progress: at the start, rebalance picks the directories requiring rebalance. Once the first directory (say dir1) is being rebalanced, bring down b1
> rebalance from n1 may skip files in dir1 (the directory where rebalance was in progress); however, it must proceed to dir2 and rebalance those files, since AFR still sends both node UUIDs while b2 (the other replica) is up. If it did not send both, n1 would stop rebalancing, which would be a problem. AFR does send them, so this case works as expected: n1 goes ahead with rebalancing dir2 and dir3.

TC#5) Check with EC whether all nodes participate in rebalance. --> PASS; yes, all participate.

TC#6) Only nodes hosting replica sets that are participating in rebalance must work on rebalance. --> PASS
Had a 1x2 volume; added a new replica pair with b3 on n1 and b4 on n3 (a new node), then ran a rebalance. n3 does not participate. That makes sense, given that n3 is a destination and the AFR of the primary replica pair passes only the UUIDs of n1 and n2 to the DHT layer (that AFR replica exists only on n1 and n2). Same with remove-brick: only the nodes hosting the bricks being removed participate in rebalance.

TC#7) Check an arbiter volume; all nodes must participate. ====> PASS
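A minimal shell sketch of the TC#4 workflow above, using the terminology from this comment. Hostnames n1/n2, brick paths under /bricks, the volume name testvol, the mount point, and the reduced file count are all illustrative assumptions, not the exact commands used during validation:

```sh
# 1x2 replica volume: b1 on n1, b2 on n2 (names are assumptions)
gluster volume create testvol replica 2 n1:/bricks/b1 n2:/bricks/b2
gluster volume start testvol
mount -t glusterfs n1:/testvol /mnt/testvol

# At least 3 directories; the test used ~1 lakh files per directory,
# a smaller count is used here to keep the sketch quick
mkdir /mnt/testvol/dir{1..3}
for d in 1 2 3; do touch /mnt/testvol/dir$d/file{1..1000}; done

# Expand to 2x2 with the new replica pair b3 on n1, b4 on n2,
# then trigger rebalance
gluster volume add-brick testvol n1:/bricks/b3 n2:/bricks/b4
gluster volume rebalance testvol start

# While dir1 is being rebalanced, bring down b1 (kill its brick
# process on n1; the PID is visible in `gluster volume status testvol`),
# then confirm n1 still proceeds to dir2 and dir3
gluster volume rebalance testvol status
```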
Moving to VERIFIED, as most of the thematic test cases (those exercising the core functionality of the fix) are working and PASSED. However, the following bugs were raised:
- BZ#1476676 and BZ#1476828 (above)
- BZ#1476852 - DHT layer must dynamically load-balance rebalance activity instead of hard presetting entries for each node
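The per-node participation that the verdict rests on (TC#1, TC#6, TC#7) is read from the rebalance status output. A sketch, continuing the assumed testvol setup above; the expectation about which nodes show migration counts is taken from the test cases, not from captured output:

```sh
# Every node of a participating replica set should now report
# migration activity, not just the first node as before the fix
gluster volume rebalance testvol status

# For remove-brick, only the nodes hosting the removed bricks
# should show migration activity (TC#6)
gluster volume remove-brick testvol n1:/bricks/b3 n2:/bricks/b4 start
gluster volume remove-brick testvol n1:/bricks/b3 n2:/bricks/b4 status
```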
Created attachment 1307190 [details]: crude testcase and logs while validating
Looks good to me.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774