Description of problem:
=======================
If one of the nodes in the cluster is rebooted while a gluster volume remove-brick operation is running, remove-brick status displays two entries for one host.

Version-Release number of selected component (if applicable):
=============================================================
3.4.0.12rhs-1.el6rhs.x86_64

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Create a distribute volume with 3 bricks

   gluster v create testvol 10.70.34.105:/rhs/brick1/br1 10.70.34.86:/rhs/brick1/br2 10.70.34.85:/rhs/brick1/br3
   volume create: testvol: success: please start the volume to access data

2. Fill the mount point with some files

3. Perform the remove-brick operation

   gluster volume remove-brick testvol 10.70.34.85:/rhs/brick1/br3 start
   volume remove-brick start: success
   ID: 749edc8f-9da5-4d64-b844-a230cb4d820b

4. Check the status

   gluster volume remove-brick testvol 10.70.34.85:/rhs/brick1/br3 status
   Node          Rebalanced-files  size     scanned  failures  status       run-time in secs
   localhost     0                 0Bytes   0        0         not started  0.00
   10.70.34.86   0                 0Bytes   0        0         not started  0.00
   10.70.34.85   17                170.0MB  180      0         in progress  4.00

5. Reboot one of the nodes [10.70.34.86]

6. Check remove-brick status [localhost is displayed twice]

   gluster volume remove-brick testvol 10.70.34.85:/rhs/brick1/br3 status
   Node          Rebalanced-files  size     scanned  failures  status       run-time in secs
   localhost     0                 0Bytes   0        0         not started  0.00
   localhost     0                 0Bytes   0        0         not started  0.00
   10.70.34.85   88                880.0MB  250      0         completed    16.00

Actual results:
===============
Remove-brick status displays localhost twice.

Expected results:
=================
Status should show one entry per host.

Additional info:
================
gluster peer status
Number of Peers: 2

Hostname: 10.70.34.86
Uuid: e33a6ffa-969d-4b84-8e40-1274aab4be80
State: Peer in Cluster (Connected)

Hostname: 10.70.34.85
Uuid: 800e7dbd-2f0d-4d43-af18-16a13142466f
State: Peer in Cluster (Connected)
sosreports : http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/979376/
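The duplicate-entry symptom is visible directly in the plain-text status table: the same value appears twice in the Node column. As a minimal sketch (the helper name and the embedded sample table are illustrative, not part of gluster), the check can be expressed as:

```python
# Hypothetical helper, not part of gluster: parse the first column of a
# plain-text remove-brick status table and report any node listed more
# than once, reproducing the duplicate-"localhost" symptom from step 6.
from collections import Counter

# Sample output as observed on an affected build (3.4.0.12rhs-1).
STATUS_OUTPUT = """\
Node         Rebalanced-files  size     scanned  failures  status       run-time in secs
localhost    0                 0Bytes   0        0         not started  0.00
localhost    0                 0Bytes   0        0         not started  0.00
10.70.34.85  88                880.0MB  250      0         completed    16.00
"""

def duplicate_nodes(status_text):
    """Return node names that appear more than once in the status table."""
    rows = status_text.strip().splitlines()[1:]  # skip the header row
    counts = Counter(row.split()[0] for row in rows if row.strip())
    return [node for node, n in counts.items() if n > 1]

print(duplicate_nodes(STATUS_OUTPUT))  # ['localhost'] on an affected build
```

On a fixed build the same check returns an empty list, since each host appears at most once.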
Kaushal, can you check whether this is still an issue? I don't remember seeing it in any of the recent builds. If it is no longer seen, can you move the bug to ON_QA?
Changes made for bug 1019846 fix this issue. Moving to ON_QA.
Can you please verify the doc text for technical accuracy?
The doc text looks fine.
Version: glusterfs 3.4.0.55rhs
==============================
Previously, when one of the nodes was rebooted while a remove-brick operation was in progress, checking the remove-brick status showed localhost twice. Now only the node from which the brick was removed is shown.

In my opinion, when a remove-brick operation is started, the status should show the nodes FROM which the data is moving and TO which node, i.e. both the SOURCE and DESTINATION nodes should be shown.

Steps:
======
Created a distribute volume with 3 bricks and started it.
Mounted the volume and created some files.
Removed a brick:

gluster volume remove-brick dist1 10.70.37.111:/rhs/brick1/e1 start
volume remove-brick start: success
ID: 1e6763f0-4f68-41b1-8bda-786befc80a8a

[root@boo ~]# gluster volume remove-brick dist1 10.70.37.111:/rhs/brick1/e1 status
Node          Rebalanced-files  size     scanned  failures  skipped  status       run time in secs
------------  ----------------  -------  -------  --------  -------  -----------  ----------------
10.70.37.111  16                160.0MB  17       0         0        in progress  4.00

Could you please clarify this?
When bricks are removed, the data on them is rebalanced onto the remaining bricks, so there is no single destination. This is unlike replace-brick, where there are an explicit source and destination. In both processes the destinations are passive while the source is active: the destinations need do nothing beyond having a running brick, and all the work is done by the source, which is therefore the one collecting the statistics for the procedure. From the destinations' point of view, they are simply serving requests from another client. Since only the source holds information specific to the process (rebalance/remove-brick/replace-brick), the status command only reports information from the source.
As per comment 3 in https://bugzilla.redhat.com/show_bug.cgi?id=1030932, remove-brick changes the layout of existing directories and migrates data from the non-decommissioned bricks as well, so in this case shouldn't we be showing all the nodes present in the status?
A rebalance process should only be concerned with migrating data from those bricks of the volume that are present on the peer on which the rebalance process is running. In the case of remove-brick, rebalance processes are launched only on the peers that contain the bricks being removed, so they should only be migrating data off those bricks. However, if those peers also contain other bricks belonging to the volume, it appears that the rebalance processes will rebalance the data on those bricks as well (this is incorrect IMO, and is what bug 1030932 implies). Even then, the processes are launched only on the peers containing the bricks being removed, so the output of the status command is still correct.
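The rule above can be sketched as a small model (the function name and brick strings are illustrative, not gluster source code): given the volume's bricks and the bricks named in the remove-brick command, the peers that launch a rebalance process, and hence appear in the status output, are exactly the hosts of the removed bricks.

```python
# Minimal model of the behaviour described above: for a remove-brick,
# rebalance processes run only on the peers hosting the removed bricks,
# so only those peers appear in the status output.

def rebalance_peers(volume_bricks, removed_bricks):
    """Peers that would run a rebalance process for this remove-brick.

    Bricks are 'host:/path' strings, as in the gluster CLI.
    """
    removed = set(removed_bricks)
    return sorted({brick.split(":", 1)[0]
                   for brick in volume_bricks if brick in removed})

bricks = ["10.70.34.105:/rhs/brick1/br1",
          "10.70.34.86:/rhs/brick1/br2",
          "10.70.34.85:/rhs/brick1/br3"]
print(rebalance_peers(bricks, ["10.70.34.85:/rhs/brick1/br3"]))
# ['10.70.34.85'] -- only the peer hosting the removed brick
```

This matches the verified output in the later comment, where only 10.70.37.111 (the host of the removed brick) is listed in the status table.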
Version: glusterfs 3.4.0.55rhs
==============================
Previously, when one of the nodes was rebooted while a remove-brick operation was in progress, checking the remove-brick status showed localhost twice. Now only the node from which the brick was removed is shown.

Marking the bug 'Verified'.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-0208.html