Bug 1036464 - Rebalance status does not give the correct output, and rebalance starts automatically, when glusterd is brought down and brought back up after a while.
Summary: Rebalance status does not give the correct output, and rebalance starts automatically, when glusterd is brought down and brought back up after a while.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Kaushal
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1015045 1023921
 
Reported: 2013-12-02 04:17 UTC by Kaushal
Modified: 2014-11-11 08:25 UTC
CC: 11 users

Fixed In Version: glusterfs-3.6.0beta1
Doc Type: Bug Fix
Doc Text:
Clone Of: 1023921
Environment:
Last Closed: 2014-11-11 08:25:01 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Kaushal 2013-12-02 04:17:49 UTC
+++ This bug was initially created as a clone of Bug #1023921 +++

Description of problem:
Rebalance status does not give the correct output when glusterd is brought down and brought back up after a while.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.35.1u2rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.35.1u2rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.35.1u2rhs-1.el6rhs.x86_64
glusterfs-libs-3.4.0.35.1u2rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.35.1u2rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.35.1u2rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.35.1u2rhs-1.el6rhs.x86_64
samba-glusterfs-3.6.9-160.3.el6rhs.x86_64


How reproducible:
Always

Steps to Reproduce:
1. Create a distribute volume with 2 bricks.
2. Stop glusterd on one of the nodes.
3. Start rebalance on the volume created.
4. Now check the rebalance status using the command "gluster vol rebalance <vol_name> status". The following is seen in the output:
[root@localhost ~]# gluster vol rebalance vol_dis status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             1             0               failed               0.00
                             10.70.37.43                0        0Bytes             0             1             0               failed               0.00
                             10.70.37.75                0        0Bytes             0             1             0               failed               0.00
volume rebalance: vol_dis: success: 

5. Now bring glusterd back up on the node where it was stopped.
6. Now check the rebalance status again. The following is seen in the output:

[root@localhost ~]# gluster vol rebalance vol_dis status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes            10             0             0            completed               0.00
                             10.70.37.43                0        0Bytes            10             0             0            completed               0.00
                             10.70.37.75                0        0Bytes            10             0             3            completed               0.00
                            10.70.37.108                0        0Bytes            10             0             2            completed               0.00
volume rebalance: vol_dis: success: 
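
For reference, the scenario above condenses to the following commands (a hedged sketch: the volume name matches the report, but the host names and brick paths are placeholders):

    # 1. Create and start a two-brick distribute volume
    gluster volume create vol_dis node1:/bricks/b1 node2:/bricks/b2
    gluster volume start vol_dis
    # 2. Stop glusterd on one of the nodes (run there)
    service glusterd stop
    # 3-4. Start rebalance and check its status; every node reports "failed"
    gluster volume rebalance vol_dis start
    gluster volume rebalance vol_dis status
    # 5-6. Bring glusterd back up on the stopped node and check again; the
    #      status shown is the stale one from before glusterd went down
    service glusterd start
    gluster volume rebalance vol_dis status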


Actual results:
The rebalance status shown is stale: it is from a run prior to the one started while glusterd was down.

Expected results:
It should always show the output of the most recent rebalance run.

Additional info:

--- Additional comment from RHEL Product and Program Management on 2013-10-28 16:24:45 IST ---

Since this issue was entered in bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.


--- Additional comment from RamaKasturi on 2013-10-29 18:10:37 IST ---

The above issue is not seen in glusterfs update1.

1) The following is the output when glusterd was brought down and rebalance was run:

[root@localhost ~]# gluster vol rebalance vol_dis status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             1             0               failed               0.00
                             10.70.34.85                0        0Bytes             0             1             0               failed               0.00
                             10.70.34.86                0        0Bytes             0             1             0               failed               0.00
volume rebalance: vol_dis: success: 

2) The following is the output seen when glusterd is brought back up and the status is checked using the command "gluster vol rebalance vol_dis status":

[root@localhost ~]# gluster vol rebalance vol_dis status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             1             0               not started               0.00
                             10.70.37.43                0        0Bytes             0             1             0               failed               0.00
                             10.70.37.75                0        0Bytes             0             1             0               failed               0.00

volume rebalance: vol_dis: success:

--- Additional comment from RamaKasturi on 2013-10-29 19:02:58 IST ---

The following is also seen with the steps below.

1. Create a distribute volume with 2 bricks.
2. Now add a brick to the volume.
3. Stop glusterd on one of the nodes and start rebalance.
4. The following output is seen:

[root@localhost ~]# gluster vol rebalance vol_dis status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             1             0               failed               0.00
                             10.70.37.43                0        0Bytes             0             1             0               failed               0.00
                             10.70.37.75                0        0Bytes             0             1             0               failed               0.00
volume rebalance: vol_dis: success: 

5. Now start glusterd on the node and check the status. The following output appears:

[root@localhost ~]# gluster vol rebalance vol_dis status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes            42             0            15          in progress              17.00
                             10.70.37.43                0        0Bytes            60             0             2            completed               0.00
                             10.70.37.75                0        0Bytes            60             0             0            completed               0.00
                            10.70.37.108                0        0Bytes             1             0             0          in progress              17.00
volume rebalance: vol_dis: success: 

Actual results:

After step 5, the rebalance process starts automatically, which it should not.
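
This variant differs from the first scenario only in its setup; a condensed sketch (same placeholder host names and brick paths as above):

    # 1-2. Create a two-brick distribute volume, then add a third brick
    gluster volume create vol_dis node1:/bricks/b1 node2:/bricks/b2
    gluster volume start vol_dis
    gluster volume add-brick vol_dis node3:/bricks/b3
    # 3-4. Stop glusterd on one node, then start rebalance; it fails as before
    service glusterd stop
    gluster volume rebalance vol_dis start
    # 5. Restart glusterd on that node and check the status; a fresh rebalance
    #    has been spawned automatically ("in progress" above), which is the bug
    service glusterd start
    gluster volume rebalance vol_dis status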

--- Additional comment from Dusmant on 2013-10-30 15:35:20 IST ---

Needed by RHSC

--- Additional comment from Kaushal on 2013-11-28 10:09:01 IST ---

Taking the bug under my name as I'm actively working on this right now. I should have done this earlier, but since I was the only one working on the RHSC dependencies at that time, I left it at that. My mistake.

Comment 1 Anand Avati 2013-12-02 04:29:46 UTC
REVIEW: http://review.gluster.org/6334 (glusterd: Improve rebalance handling during volume sync) posted (#2) for review on master by Kaushal M (kaushal)

Comment 2 Anand Avati 2013-12-02 06:00:35 UTC
REVIEW: http://review.gluster.org/6334 (glusterd: Improve rebalance handling during volume sync) posted (#3) for review on master by Kaushal M (kaushal)

Comment 3 Anand Avati 2013-12-03 09:39:28 UTC
REVIEW: http://review.gluster.org/6334 (glusterd: Improve rebalance handling during volume sync) posted (#4) for review on master by Kaushal M (kaushal)

Comment 4 Anand Avati 2013-12-04 10:14:30 UTC
REVIEW: http://review.gluster.org/6334 (glusterd: Improve rebalance handling during volume sync) posted (#5) for review on master by Kaushal M (kaushal)

Comment 5 Anand Avati 2013-12-05 03:56:07 UTC
REVIEW: http://review.gluster.org/6334 (glusterd: Improve rebalance handling during volume sync) posted (#6) for review on master by Kaushal M (kaushal)

Comment 6 Anand Avati 2013-12-11 06:07:09 UTC
REVIEW: http://review.gluster.org/6334 (glusterd: Improve rebalance handling during volume sync) posted (#7) for review on master by Kaushal M (kaushal)

Comment 7 Anand Avati 2013-12-11 07:30:17 UTC
COMMIT: http://review.gluster.org/6334 committed in master by Vijay Bellur (vbellur) 
------
commit cb44756616f2ef9a6480adf104efa108300b06c3
Author: Kaushal M <kaushal>
Date:   Fri Nov 22 11:27:14 2013 +0530

    glusterd: Improve rebalance handling during volume sync
    
    Glusterd will now correctly copy existing rebalance information when a
    volinfo is updated during volume sync. If the existing rebalance
    information was stale, then any existing rebalance process will be
    terminated. A new rebalance process will be started only if there is no
    existing rebalance process. The rebalance process will not be started if
    the existing rebalance session had completed, failed or been stopped.
    
    Change-Id: I68c5984267c188734da76770ba557662d4ea3ee0
    BUG: 1036464
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/6334
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
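
Based on the behaviour the commit describes, one hedged way to verify the fix on a test cluster is to repeat the reproduction steps and confirm that restarting glusterd neither shows stale status nor spawns a new rebalance (the commands are standard; the expectations paraphrase the commit message):

    # On the node where glusterd was stopped during the failed rebalance:
    service glusterd start
    # Expect the status to still show the last (failed) run, not a stale
    # earlier one
    gluster volume rebalance vol_dis status
    # Expect no rebalance process to have been started automatically
    # (assumes the rebalance daemon's command line contains "rebalance")
    ps aux | grep '[r]ebalance'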

Comment 8 Anand Avati 2013-12-23 08:59:38 UTC
REVIEW: http://review.gluster.org/6564 (glusterd: Improve rebalance handling during volume sync) posted (#1) for review on release-3.5 by Krishnan Parthasarathi (kparthas)

Comment 9 Anand Avati 2013-12-23 14:58:00 UTC
COMMIT: http://review.gluster.org/6564 committed in release-3.5 by Vijay Bellur (vbellur) 
------
commit b07107511c51ae518a1a952ff9c223673cd218a8
Author: Krishnan Parthasarathi <kparthas>
Date:   Mon Dec 23 14:07:50 2013 +0530

    glusterd: Improve rebalance handling during volume sync
    
            Backport of http://review.gluster.org/6334
    
    Glusterd will now correctly copy existing rebalance information when a
    volinfo is updated during volume sync. If the existing rebalance
    information was stale, then any existing rebalance process will be
    terminated. A new rebalance process will be started only if there is no
    existing rebalance process. The rebalance process will not be started if
    the existing rebalance session had completed, failed or been stopped.
    
    Change-Id: I68c5984267c188734da76770ba557662d4ea3ee0
    BUG: 1036464
    Signed-off-by: Kaushal M <kaushal>
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/6564
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 10 Niels de Vos 2014-09-22 12:33:08 UTC
A beta release for GlusterFS 3.6.0 has been released [1]. Please verify that this release resolves the bug for you. If the glusterfs-3.6.0beta1 release does not resolve this issue, leave a comment on this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure (possibly an "updates-testing" repository) for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

Comment 11 Niels de Vos 2014-11-11 08:25:01 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users

