Description of problem:
=======================
Had a two node cluster (node-1 and node-2) with a Distributed volume (1*2), mounted it as FUSE and started IO. While IO was in progress, started a remove-brick operation and restarted glusterd on the node hosting the brick being removed. After the glusterd restart the rebalance info ("Rebalanced-files", "size", "scanned", etc.) is no longer reported correctly; all of these are shown as zeros.

Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.7.5-14

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have a two node cluster (node-1 and node-2).
2. Create a Distributed volume using bricks from both nodes (1*2).
3. Mount the volume as FUSE and start IO.
4. While IO is in progress, start the remove-brick of the node-2 brick (see the command sketch under Additional info below).
5. Check the remove-brick status            // it shows the rebalance info
6. Stop and start glusterd on node-2.
7. Check the remove-brick status again on both nodes   // it no longer shows the rebalance info

Actual results:
===============
No rebalance info is displayed after the glusterd restart.

Expected results:
=================
The rebalance info should be shown even after a glusterd restart.

Additional info:
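For reference, a minimal shell sketch of the steps above, under illustrative assumptions: the volume name "distvol", the brick paths under /bricks/brick1, the mount point /mnt/distvol and the IO workload are made up for the example; only the gluster, mount and service commands themselves are standard CLI.

# create and start a plain distribute volume with one brick per node (1*2)
gluster volume create distvol node-1:/bricks/brick1/b1 node-2:/bricks/brick1/b2
gluster volume start distvol

# FUSE mount and start some IO in the background
mount -t glusterfs node-1:/distvol /mnt/distvol
cp -r /usr/share/doc /mnt/distvol/ &

# while IO is running, start removing the node-2 brick and check the status
gluster volume remove-brick distvol node-2:/bricks/brick1/b2 start
gluster volume remove-brick distvol node-2:/bricks/brick1/b2 status    # rebalance counters are shown

# restart glusterd on node-2, then re-check the status on both nodes
systemctl restart glusterd    # on node-2 (or 'service glusterd restart' on older releases)
gluster volume remove-brick distvol node-2:/bricks/brick1/b2 status    # counters now show as zeros (the bug)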
To add to this, detach-tier status doesn't show any stats either, though the log shows the progress of files being migrated:

[root@transformers ~]# gluster v detach-tier dpvol status
Node                              Rebalanced-files    size      scanned   failures   skipped   status        run time in secs
---------                         ----------------    -----     -------   --------   -------   ------------  ----------------
tettnang.lab.eng.blr.redhat.com   0                   0Bytes    0         0          0         in progress   0.00
[root@transformers ~]#
RCA: Whether a remove-brick operation is in progress is tracked by a flag, 'decommission_is_in_progress', on the volume. This flag is not persisted, so on a glusterd restart that information is lost and the validations that block a remove-brick commit while rebalance is still in progress get skipped.

I agree with QE that this is a potential data loss situation and should be considered a *blocker*. I've posted a fix upstream: http://review.gluster.org/#/c/13323/
Oops, I made a mistake here; that analysis was meant for another bug. Please ignore #comment 9. Moving the status back to 'New'.
Upstream mainline : http://review.gluster.org/14827
Upstream 3.8      : http://review.gluster.org/14856

The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.
Verified this BZ against glusterfs version 3.8.4-2.el7rhgs.x86_64. Here are the steps that were performed:

1) Created a Distributed-Replicate volume and started it.
2) FUSE-mounted the volume and started IO.
3) While IO was in progress, started removing a few bricks.
4) Checked the remove-brick status; it shows the rebalance info.
5) Stopped and started glusterd on the nodes whose bricks were being removed.
6) Checked the remove-brick status again on all the nodes; the rebalance info is still displayed (see the status-check sketch below).

Hence, moving this BZ to Verified.
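For completeness, a minimal sketch of the status check used in steps 4-6 above, reusing the same illustrative names as the sketch in the description (volume "distvol", node-2 brick /bricks/brick1/b2); the actual volume and brick names from the verification setup are not recorded here.

# before the restart: rebalance counters are populated
gluster volume remove-brick distvol node-2:/bricks/brick1/b2 status

# restart glusterd on the node(s) whose bricks are being removed
systemctl restart glusterd

# after the restart: on the fixed build the Rebalanced-files / size / scanned
# columns remain populated instead of resetting to zeros
gluster volume remove-brick distvol node-2:/bricks/brick1/b2 status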
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html