Bug 1296796

Summary: [DHT]: Rebalance info for remove brick operation is not showing after glusterd restart
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Byreddy <bsrirama>
Component: distribute
Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED ERRATA
QA Contact: Prasad Desala <tdesala>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rhgs-3.1
CC: amukherj, bsrirama, kramdoss, mzywusko, nbalacha, rcyriac, rgowdapp, rhinduja, sasundar, smohan, tdesala
Target Milestone: ---
Target Release: RHGS 3.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: dht-remove-brick, dht-3.2.0-proposed
Fixed In Version: glusterfs-3.8.4-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1351021 (view as bug list)
Environment:
Last Closed: 2017-03-23 05:26:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1351021, 1351522, 1352771    

Description Byreddy 2016-01-08 05:04:34 UTC
Description of problem:
=======================
Had a two-node cluster (node-1 and node-2) with a Distributed volume (1*2), mounted it as FUSE and started IO. While IO was in progress, started a remove-brick operation and restarted glusterd on the node hosting the brick being removed.
After the glusterd restart, the rebalance info ("Rebalanced-files", "size", "scanned", etc.) is not displayed; all the fields show zeros.



Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.7.5-14


How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have a two-node cluster (node-1 and node-2)
2. Create a Distributed volume using bricks from both nodes (1*2)
3. Mount the volume as FUSE and start IO
4. While IO is in progress, start the remove-brick of node-2's brick.
5. Check the remove-brick status // it will show the rebalance info
6. Stop and start glusterd on node-2
7. Check the remove-brick status again on both nodes // it won't show the rebalance info.
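The steps above translate roughly into the following CLI sequence. This is a sketch only: the volume name `distvol`, the brick paths, and the dd workload are placeholders, not taken from the bug report.

```shell
# 1-2. From node-1: create and start a plain Distribute volume
#      with one brick on each node (placeholder paths).
gluster volume create distvol node-1:/bricks/b1 node-2:/bricks/b2
gluster volume start distvol

# 3. From a client: FUSE-mount the volume and generate IO.
mount -t glusterfs node-1:/distvol /mnt/distvol
dd if=/dev/zero of=/mnt/distvol/testfile bs=1M count=1024 &

# 4-5. While IO runs, start removing node-2's brick and check status;
#      the rebalance counters should be populated at this point.
gluster volume remove-brick distvol node-2:/bricks/b2 start
gluster volume remove-brick distvol node-2:/bricks/b2 status

# 6-7. On node-2: restart glusterd, then re-check status on both nodes;
#      with the bug present, all counters show zero after the restart.
systemctl restart glusterd
gluster volume remove-brick distvol node-2:/bricks/b2 status
```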

Actual results:
===============
No rebalance info is displayed after the glusterd restart.


Expected results:
=================
It should show the rebalance info even after a glusterd restart.



Additional info:

Comment 6 Bhaskarakiran 2016-01-12 11:00:42 UTC
To add to this, detach-tier status doesn't show any stats either, even though the log shows the progress of files being migrated:

[root@transformers ~]# gluster v detach-tier dpvol status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
         tettnang.lab.eng.blr.redhat.com                0        0Bytes             0             0             0          in progress               0.00
[root@transformers ~]#
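The all-zero counters in a status line like the one above can be spotted mechanically. A small sketch, using awk over the sample line copied from the output above (in practice you would pipe the live `gluster v detach-tier <vol> status` output instead):

```shell
# Sample status line taken from the transcript above; fields 2-6 are
# Rebalanced-files, size, scanned, failures, skipped.
status_line='tettnang.lab.eng.blr.redhat.com 0 0Bytes 0 0 0 in progress 0.00'

# Flag a node whose counters are all zero despite migration in progress.
echo "$status_line" | awk '{
    if ($2 == 0 && $3 == "0Bytes" && $4 == 0 && $5 == 0 && $6 == 0)
        print $1 ": no rebalance stats reported"
}'
```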

Comment 9 Atin Mukherjee 2016-01-30 03:25:54 UTC
RCA:

Whether a remove-brick operation is in progress is tracked by a per-volume flag, 'decommission_is_in_progress'. This flag is not persisted, however, so on a glusterd restart the information is lost and the validations that block a remove-brick commit while rebalance is in progress are skipped. I agree with QE that this is a potential data-loss situation and should be considered a *blocker*.

I've posted a fix in upstream http://review.gluster.org/#/c/13323/

Comment 10 Atin Mukherjee 2016-01-30 04:08:00 UTC
Oops, I made a mistake here; I was supposed to put this analysis in another bug. Please ignore comment 9. Moving status back to 'New'.

Comment 14 Atin Mukherjee 2016-09-17 15:06:04 UTC
Upstream mainline : http://review.gluster.org/14827
Upstream 3.8 : http://review.gluster.org/14856

And the fix is available in rhgs-3.2.0 as part of rebase to GlusterFS 3.8.4.

Comment 17 Prasad Desala 2016-10-18 05:23:42 UTC
Verified this BZ against glusterfs version: 3.8.4-2.el7rhgs.x86_64.

Here are the steps that were performed:
1) Created a Distributed-Replicate volume and started it.
2) FUSE-mounted the volume and started IO.
3) While IO was in progress, started removing a few bricks.
4) Checked the remove-brick status; it showed the rebalance info.
5) Stopped and started glusterd on the nodes whose bricks were being removed.
6) Checked the remove-brick status again on all the nodes; the rebalance info was displayed.

Hence, moving this BZ to Verified.

Comment 19 errata-xmlrpc 2017-03-23 05:26:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html