Bug 1296796 - [DHT]: Rebalance info for remove brick operation is not showing after glusterd restart
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.2.0
Assigned To: Nithya Balachandran
QA Contact: Prasad Desala
Whiteboard: dht-remove-brick, dht-3.2.0-proposed
Depends On:
Blocks: 1351021 1351522 1352771
Reported: 2016-01-08 00:04 EST by Byreddy
Modified: 2017-03-23 01:26 EDT
CC List: 11 users

See Also:
Fixed In Version: glusterfs-3.8.4-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1351021
Environment:
Last Closed: 2017-03-23 01:26:22 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Byreddy 2016-01-08 00:04:34 EST
Description of problem:
=======================
Had a two-node cluster (node-1 and node-2) with a Distributed volume (1*2), mounted it over FUSE and started IO. While IO was in progress, started a remove-brick operation and restarted glusterd on the node hosting the brick being removed.
After the glusterd restart, no rebalance info is displayed: the "Rebalanced-files", "size" and "scanned" columns all show zeros.



Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.7.5-14


How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have a two-node cluster (node-1 and node-2).
2. Create a Distributed volume with one brick on each node (1*2).
3. Mount the volume over FUSE and start IO.
4. While IO is in progress, start remove-brick for the node-2 brick.
5. Check the remove-brick status // it shows the rebalance info
6. Stop and start glusterd on node-2.
7. Check the remove-brick status again on both nodes // it no longer shows the rebalance info
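The symptom in these steps is that cumulative counters fall back to zero across a glusterd restart. The Python sketch below compares two snapshots of the status counters, one taken at step 5 and one at step 7, and reports any counter that went backwards. The snapshot format and the name `regressed_counters` are hypothetical. Collecting the counters from the `remove-brick status` output is not shown. The check also assumes the counters are cumulative for the duration of a single remove-brick run.

```python
from typing import Dict, List, Tuple

# Counter columns shown by the remove-brick / rebalance status table.
FIELDS = ("Rebalanced-files", "size", "scanned")

# Hypothetical snapshot format: node name -> {column name -> counter value}.
# "size" is assumed to be already converted to bytes.
Snapshot = Dict[str, Dict[str, int]]


def regressed_counters(before: Snapshot, after: Snapshot) -> List[Tuple[str, str, int, int]]:
    """Return (node, field, old, new) for every counter that went backwards.

    A node or field missing from `after` is treated as 0, which matches how
    the bug presents (all columns showing zeros after the glusterd restart).
    """
    problems = []
    for node, old_counters in before.items():
        new_counters = after.get(node, {})
        for field in FIELDS:
            old = old_counters.get(field, 0)
            new = new_counters.get(field, 0)
            if new < old:
                problems.append((node, field, old, new))
    return problems


if __name__ == "__main__":
    # Illustrative values only; they are not taken from this bug report.
    step5 = {"node-2": {"Rebalanced-files": 120, "size": 52428800, "scanned": 340}}
    step7 = {"node-2": {"Rebalanced-files": 0, "size": 0, "scanned": 0}}
    for node, field, old, new in regressed_counters(step5, step7):
        print(f"{node}: {field} dropped from {old} to {new} after restart")
```

With the fix in place, the same comparison should return an empty list, because the counters keep growing (or stay unchanged) across the restart.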

Actual results:
===============
No rebalance info is displayed after the glusterd restart; all the counters show zero.


Expected results:
=================
It should show the rebalance info even after a glusterd restart.



Additional info:
Comment 6 Bhaskarakiran 2016-01-12 06:00:42 EST
To add to this, detach-tier status doesn't show any stats, even though the log shows the progress of files being migrated.

[root@transformers ~]# gluster v detach-tier dpvol status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
         tettnang.lab.eng.blr.redhat.com                0        0Bytes             0             0             0          in progress               0.00
[root@transformers ~]#
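The output above can be checked mechanically. This Python sketch parses the per-node rows of that status table. It flags a node that reports `in progress` while its Rebalanced-files, size and scanned counters are all zero, which is the pattern in this comment. The names `StatusRow`, `parse_status`, `size_is_zero` and `looks_stalled` are hypothetical, and the parser assumes the column order printed here. All-zero counters can also be legitimate in the first moments of a run, so treat a hit as a hint rather than proof.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class StatusRow:
    node: str
    rebalanced_files: int
    size: str             # kept as printed, e.g. "0Bytes"
    scanned: int
    failures: int
    skipped: int
    status: str           # may contain a space, e.g. "in progress"
    run_time_secs: float


def parse_status(output: str) -> List[StatusRow]:
    """Parse the per-node rows of a rebalance-style status table."""
    rows = []
    for line in output.splitlines():
        tokens = line.split()
        # Data rows have a numeric second column; this skips the header,
        # the dashed separator, shell prompts and blank lines.
        if len(tokens) < 8 or not tokens[1].isdigit():
            continue
        rows.append(StatusRow(
            node=tokens[0],
            rebalanced_files=int(tokens[1]),
            size=tokens[2],
            scanned=int(tokens[3]),
            failures=int(tokens[4]),
            skipped=int(tokens[5]),
            status=" ".join(tokens[6:-1]),
            run_time_secs=float(tokens[-1]),
        ))
    return rows


def size_is_zero(size: str) -> bool:
    """True if the leading number of a size such as "0Bytes" is zero."""
    match = re.match(r"(\d+(?:\.\d+)?)", size)
    return match is not None and float(match.group(1)) == 0.0


def looks_stalled(row: StatusRow) -> bool:
    """True if a node reports 'in progress' while every progress counter is zero."""
    return (row.status == "in progress"
            and row.rebalanced_files == 0
            and row.scanned == 0
            and size_is_zero(row.size))


# The output pasted in this comment.
SAMPLE = """\
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
         tettnang.lab.eng.blr.redhat.com                0        0Bytes             0             0             0          in progress               0.00
"""

if __name__ == "__main__":
    for row in parse_status(SAMPLE):
        if looks_stalled(row):
            print(f"{row.node}: in progress but all counters are zero")
```

Splitting on whitespace and rejoining the middle tokens keeps a multi-word status such as "in progress" intact without depending on exact column widths.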
Comment 9 Atin Mukherjee 2016-01-29 22:25:54 EST
RCA:

Whether a remove-brick operation is in progress is tracked by a flag, 'decommission_is_in_progress', in the volume. This flag is not persisted, so on a glusterd restart the information is lost and every validation that blocks a remove-brick commit while rebalance is in progress is skipped. I agree with QE that this is a potential data loss situation and should be considered a *blocker*.

I've posted a fix in upstream http://review.gluster.org/#/c/13323/
Comment 10 Atin Mukherjee 2016-01-29 23:08:00 EST
Oops, I made a mistake here: this analysis was meant for another bug. Please ignore comment #9. Moving the status back to 'New'.
Comment 14 Atin Mukherjee 2016-09-17 11:06:04 EDT
Upstream mainline : http://review.gluster.org/14827
Upstream 3.8 : http://review.gluster.org/14856

The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.
Comment 17 Prasad Desala 2016-10-18 01:23:42 EDT
Verified this BZ against glusterfs version: 3.8.4-2.el7rhgs.x86_64.

Here are the steps that were performed,
1) Created a distributed-replicate volume and started it.
2) FUSE-mounted the volume and started IO.
3) While IO was in progress, started removing a few bricks.
4) Checked the remove-brick status; it showed the rebalance info.
5) Stopped and started glusterd on the nodes whose bricks were being removed.
6) Checked the remove-brick status again on all the nodes; the rebalance info was still displayed.

Hence, moving this BZ to Verified.
Comment 19 errata-xmlrpc 2017-03-23 01:26:22 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
