Bug 1296796 - [DHT]: Rebalance info for remove brick operation is not showing after glusterd restart
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Nithya Balachandran
QA Contact: Prasad Desala
URL:
Whiteboard: dht-remove-brick, dht-3.2.0-proposed
Depends On:
Blocks: 1351021 1351522 1352771
 
Reported: 2016-01-08 05:04 UTC by Byreddy
Modified: 2017-03-23 05:26 UTC
CC: 11 users

Fixed In Version: glusterfs-3.8.4-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1351021
Environment:
Last Closed: 2017-03-23 05:26:22 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 0 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description Byreddy 2016-01-08 05:04:34 UTC
Description of problem:
=======================
Had a two-node cluster (node-1 and node-2) with a Distributed volume (1x2), mounted it as FUSE and started IO. While IO was in progress, started a remove-brick operation and restarted glusterd on the node hosting the brick being removed.
After the glusterd restart no rebalance info is displayed; fields such as "Rebalanced-files", "size", and "scanned" all show zeros.



Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.7.5-14


How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have a two-node cluster (node-1 and node-2).
2. Create a Distributed volume using bricks from both nodes (1x2).
3. Mount the volume as FUSE and start IO.
4. While IO is in progress, start removing the node-2 brick.
5. Check the remove-brick status // it shows the rebalance info.
6. Stop and start glusterd on node-2.
7. Check the remove-brick status again on both nodes // it no longer shows the rebalance info.
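The steps above can be sketched with the gluster CLI as follows. This is a hypothetical walkthrough, not taken from the report: the volume name (distvol), brick paths, and mount point are placeholders.

```shell
# 1-2. Create and start a plain distribute volume across both nodes
#      (placeholder volume name and brick paths).
gluster volume create distvol node-1:/bricks/brick1 node-2:/bricks/brick2
gluster volume start distvol

# 3. On a client, mount the volume over FUSE and start IO.
mount -t glusterfs node-1:/distvol /mnt/distvol

# 4. While IO is running, start removing the node-2 brick.
gluster volume remove-brick distvol node-2:/bricks/brick2 start

# 5. Status shows the Rebalanced-files / size / scanned counters.
gluster volume remove-brick distvol node-2:/bricks/brick2 status

# 6. Restart glusterd on node-2 (the node hosting the brick being removed).
systemctl restart glusterd    # run on node-2

# 7. Status now shows all-zero counters on both nodes (the reported bug).
gluster volume remove-brick distvol node-2:/bricks/brick2 status
```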

Actual results:
===============
No rebalance info is displayed after the glusterd restart.


Expected results:
=================
It should show the rebalance info even after a glusterd restart.



Additional info:

Comment 6 Bhaskarakiran 2016-01-12 11:00:42 UTC
To add to this, detach-tier status doesn't show any stats, though the log shows the progress of files being migrated:

[root@transformers ~]# gluster v detach-tier dpvol status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
         tettnang.lab.eng.blr.redhat.com                0        0Bytes             0             0             0          in progress               0.00
[root@transformers ~]#

Comment 9 Atin Mukherjee 2016-01-30 03:25:54 UTC
RCA:

A remove-brick operation in progress is tracked by a per-volume flag, 'decommission_is_in_progress'. This flag is not persisted, so on a glusterd restart the information is lost and the validations that block a remove-brick commit while rebalance is in progress are skipped. I agree with QE that this is a potential data-loss situation and should be considered a *blocker*.

I've posted a fix in upstream http://review.gluster.org/#/c/13323/

Comment 10 Atin Mukherjee 2016-01-30 04:08:00 UTC
Oops, I made a mistake here; this analysis was meant for another bug. Please ignore comment 9. Moving the status back to 'New'.

Comment 14 Atin Mukherjee 2016-09-17 15:06:04 UTC
Upstream mainline : http://review.gluster.org/14827
Upstream 3.8 : http://review.gluster.org/14856

And the fix is available in rhgs-3.2.0 as part of rebase to GlusterFS 3.8.4.

Comment 17 Prasad Desala 2016-10-18 05:23:42 UTC
Verified this BZ against glusterfs version: 3.8.4-2.el7rhgs.x86_64.

Here are the steps that were performed:
1) Created a Distributed-Replicate volume and started it.
2) FUSE-mounted the volume and started IO.
3) While IO was in progress, started removing a few bricks.
4) Checked the remove-brick status; it showed the rebalance info.
5) Stopped and started glusterd on the nodes whose bricks were being removed.
6) Checked the remove-brick status again on all the nodes; the rebalance info was displayed.

Hence, moving this BZ to Verified.

Comment 19 errata-xmlrpc 2017-03-23 05:26:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

