Previously, the status from an earlier remove-brick or Rebalance operation was not reset before a new remove-brick or Rebalance operation was started. As a result, remove-brick status displayed the output of the earlier Rebalance operation for nodes that did not participate in the ongoing remove-brick operation. With this update, the status of the remove-brick or Rebalance operation is reset to NOT-STARTED on all nodes in the cluster before a new remove-brick or Rebalance operation is started.
Description of problem:
For the nodes that do not participate in a remove-brick operation, remove-brick status shows the output of the previous rebalance operation.
Version-Release number of selected component (if applicable):
glusterfs-3.4.0.35.1u2rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.35.1u2rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.35.1u2rhs-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.35.1u2rhs-1.el6rhs.x86_64
glusterfs-libs-3.4.0.35.1u2rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.35.1u2rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.35.1u2rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.35.1u2rhs-1.el6rhs.x86_64
samba-glusterfs-3.6.9-160.3.el6rhs.x86_64
How reproducible:
Always
Steps to Reproduce:
1. Create a distribute volume with 2 bricks (a command sketch for these steps is given after the step 5 output below).
2. Start rebalance on the volume and then stop it.
3. The rebalance status now shows "stopped" for the nodes where rebalance was running. The following is the output:
[root@localhost ~]# gluster vol rebalance vol_dis_rep status
Node              Rebalanced-files         size      scanned     failures      skipped          status   run time in secs
---------              -----------  -----------  -----------  -----------  -----------    ------------     --------------
localhost                        2        2.0GB           61            0           20       completed              66.00
10.70.37.140                     0       0Bytes           60            0            0       completed               0.00
10.70.37.75                      0       0Bytes            0            0            0     not started               0.00
10.70.37.43                      0       0Bytes            0            0            0         stopped               0.00
volume rebalance: vol_dis_rep: success:
4. Now start a remove-brick operation.
5. Once started, check the remove-brick status. The following is what it displays:
[root@localhost ~]# gluster vol remove-brick vol_dis_rep 10.70.37.108:/rhs/brick3/b5 10.70.37.140:/rhs/brick3/b6 status
Node              Rebalanced-files         size      scanned     failures      skipped          status   run-time in secs
---------              -----------  -----------  -----------  -----------  -----------    ------------     --------------
localhost                        2        2.0GB           61            0            0       completed              66.00
10.70.37.140                     0       0Bytes           60            0            0       completed               0.00
10.70.37.75                      0       0Bytes            0            0            0     not started               0.00
10.70.37.43                      0       0Bytes            0            0            0         stopped               0.00
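For reference, a minimal command sketch for the reproduction steps above, assuming the volume name vol_dis_rep from the status output. The brick paths used for volume creation are assumptions (the report only says a 2-brick distribute volume, while the remove-brick step targets bricks under /rhs/brick3), so treat the layout below as illustrative only:

# Step 1: create and start the volume (brick paths here are assumed, not taken from the report).
gluster vol create vol_dis_rep 10.70.37.108:/rhs/brick1/b1 10.70.37.140:/rhs/brick1/b2 10.70.37.108:/rhs/brick3/b5 10.70.37.140:/rhs/brick3/b6
gluster vol start vol_dis_rep

# Step 2: start rebalance on the volume, then stop it.
gluster vol rebalance vol_dis_rep start
gluster vol rebalance vol_dis_rep stop

# Step 3: check rebalance status.
gluster vol rebalance vol_dis_rep status

# Step 4: start remove-brick on the bricks shown in the report.
gluster vol remove-brick vol_dis_rep 10.70.37.108:/rhs/brick3/b5 10.70.37.140:/rhs/brick3/b6 start

# Step 5: check remove-brick status.
gluster vol remove-brick vol_dis_rep 10.70.37.108:/rhs/brick3/b5 10.70.37.140:/rhs/brick3/b6 status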
Actual results:
For the nodes on which remove-brick is not started, it shows the output of the rebalance status.
Expected results:
For the nodes on which remove-brick is not started, the status should be shown as "not started", or the nodes that do not participate in the remove-brick operation should not be shown in the status output at all.
Additional info:
The remove-brick status output still includes the nodes where rebalance was stopped.
Attaching screenshots for the same.
From screenshot 8, it is clear that rebalance was stopped on localhost and 10.70.37.182.
From screenshot 9, it is clear that remove-brick was started on 10.70.37.177 and 10.70.37.109.
10.70.37.182 was not participating in remove-brick at all, but it was still shown in the remove-brick status, which is incorrect.
So moving this back to ASSIGNED.
Verified and works fine with glusterfs-server-3.4.0.50rhs-1.el6rhs.x86_64.
Now remove-brick status does not show the output of rebalance; it lists only the nodes that participate in the remove-brick operation.
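A minimal sketch of the verification flow on the fixed build, assuming the same volume and brick names as in the reproduction steps above (the exact output of the fixed build is not reproduced here):

# On glusterfs-server-3.4.0.50rhs-1.el6rhs.x86_64: run rebalance, stop it,
# then start remove-brick as in the reproduction steps.
gluster vol rebalance vol_dis_rep start
gluster vol rebalance vol_dis_rep stop
gluster vol remove-brick vol_dis_rep 10.70.37.108:/rhs/brick3/b5 10.70.37.140:/rhs/brick3/b6 start

# The status output should now list only the nodes that host the bricks being
# removed, instead of repeating the results of the earlier rebalance run.
gluster vol remove-brick vol_dis_rep 10.70.37.108:/rhs/brick3/b5 10.70.37.140:/rhs/brick3/b6 status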
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHEA-2014-0208.html