Bug 1154316

Summary: DHT-Rebalance: Rebalance running on a node never receives the stop command if glusterd is restarted on that node while rebalance is in progress
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: shylesh <shmohan>
Component: distribute
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED DEFERRED
QA Contact: shylesh <shmohan>
Severity: medium
Priority: unspecified
Version: 2.1
CC: spalai, surs
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Clones: 1286100
Last Closed: 2015-11-27 10:48:23 UTC
Type: Bug
Bug Blocks: 1286100

Description shylesh 2014-10-18 17:38:23 UTC
Description of problem:
If glusterd is restarted on a node while rebalance is in progress, a later attempt to stop the rebalance from that node does not stop the rebalance process running there.

Version-Release number of selected component (if applicable):
3.4.0.69rhs-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a distributed-replicate volume.
2. Create some data on it, then add bricks (add-brick).
3. Start rebalance.
4. While rebalance is in progress, restart glusterd on one of the nodes.
5. Once glusterd has restarted, stop the rebalance from that node.
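
The steps above can be scripted roughly as follows. This is only a sketch under assumptions, not the reporter's exact procedure: the volume name `distrep` and the `service glusterd restart` command are taken from the transcript in this report, while the host names (`node1`, `node2`), the brick paths, the `run` helper and the `DRY_RUN` switch are hypothetical. By default the function only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the reproduction steps. Host names and brick paths are
# placeholders (assumptions), not values from the report.
VOL=${VOL:-distrep}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print commands; 0 = actually run them

# Hypothetical helper: print the command, and execute it unless in dry-run mode.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

repro() {
    # 1. Create and start a distributed-replicate volume (2 x 2 bricks).
    run gluster volume create "$VOL" replica 2 \
        node1:/bricks/b1 node2:/bricks/b1 node1:/bricks/b2 node2:/bricks/b2
    run gluster volume start "$VOL"
    # 2. Populate the volume through a client mount (not shown), then add a replica pair.
    run gluster volume add-brick "$VOL" node1:/bricks/b3 node2:/bricks/b3
    # 3. Start rebalance.
    run gluster volume rebalance "$VOL" start
    # 4. Restart glusterd on this node while rebalance is still in progress.
    run service glusterd restart
    # 5. Try to stop the rebalance, then look at the status.
    run gluster volume rebalance "$VOL" stop
    run gluster volume rebalance "$VOL" status
}
```

Source the file and call `repro` to see the command sequence; set `DRY_RUN=0` before calling it to execute the commands on a test cluster. The restart in step 4 has to happen before rebalance finishes, so the volume needs enough data for the run to last more than a few seconds.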


Actual results:
The rebalance process on the node where glusterd was restarted does not stop. It never receives the stop command and eventually runs to completion. In the output below, localhost stays "in progress" after both stop commands.
 
[root@rhs-client4 mnt]# gluster v rebalance distrep start
volume rebalance: distrep: success: Starting rebalance on volume distrep has been successful.
ID: 5e0229e9-ae13-4da5-8701-21486de873cd
[root@rhs-client4 mnt]# gluster v rebalance distrep status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes          2342             0             0          in progress              10.00
     rhs-client39.lab.eng.blr.redhat.com                0        0Bytes          2468             0             0          in progress              10.00
      rhs-gp-srv2.lab.eng.blr.redhat.com              249         1.8MB          2092             0             0          in progress              10.00
volume rebalance: distrep: success:
[root@rhs-client4 mnt]# service glusterd restart
Stopping glusterd:                                         [  OK  ]
Starting glusterd:                                         [  OK  ]
[root@rhs-client4 mnt]# gluster v rebalance distrep status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             0             0          in progress               0.00
     rhs-client39.lab.eng.blr.redhat.com              163       954.1KB          5376             0             0          in progress              30.00
      rhs-gp-srv2.lab.eng.blr.redhat.com              733         4.5MB          4746             0             0          in progress              31.00
volume rebalance: distrep: success:
[root@rhs-client4 mnt]# gluster v rebalance distrep stop
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             0             0          in progress               0.00
     rhs-client39.lab.eng.blr.redhat.com              200       986.8KB          5608             0             0              stopped              34.00
      rhs-gp-srv2.lab.eng.blr.redhat.com              854        12.4MB          5203             0             0              stopped              34.00
volume rebalance: distrep: success: rebalance process may be in the middle of a file migration.
The process will be fully stopped once the migration of the file is complete.
Please check rebalance process for completion before doing any further brick related tasks on the volume.
[root@rhs-client4 mnt]# gluster v rebalance distrep status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             0             0          in progress               0.00
     rhs-client39.lab.eng.blr.redhat.com              201       986.9KB          5608             3             0              stopped              34.00
      rhs-gp-srv2.lab.eng.blr.redhat.com              854        12.4MB          5203             2             0              stopped              34.00
volume rebalance: distrep: success:
[root@rhs-client4 mnt]# gluster v rebalance distrep stop
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             0             0          in progress               0.00
     rhs-client39.lab.eng.blr.redhat.com              201       986.9KB          5608             3             0              stopped              34.00
      rhs-gp-srv2.lab.eng.blr.redhat.com              854        12.4MB          5203             2             0              stopped              34.00
volume rebalance: distrep: success: rebalance process may be in the middle of a file migration.
The process will be fully stopped once the migration of the file is complete.
Please check rebalance process for completion before doing any further brick related tasks on the volume.
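
To check for this condition without reading the table by eye, the status output can be filtered for nodes that still report "in progress". The sketch below assumes the column layout shown in the transcript above (node name first, the two-word status just before the run time); the function name `nodes_in_progress` is hypothetical.

```shell
# Read a `gluster v rebalance <vol> status` (or `stop`) table on stdin and
# print the name of every node whose status column is still "in progress".
# Assumes the layout shown in this report: node name in the first column,
# status immediately before the final run-time column.
nodes_in_progress() {
    awk 'NF >= 8 && $(NF-2) == "in" && $(NF-1) == "progress" { print $1 }'
}
```

For example, `gluster v rebalance distrep status | nodes_in_progress` run after a stop should print nothing on a healthy cluster. Fed the final status table from this report, it prints `localhost`.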

Comment 2 Susant Kumar Palai 2015-11-27 10:48:23 UTC
Cloning to 3.1. To be fixed in future release.