Bug 964164

Summary: gluster volume status output shows rebalance as running, though the rebalance status output shows the tasks as completed
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Rejy M Cyriac <rcyriac>
Component: glusterd Assignee: Kaushal <kaushal>
Status: CLOSED ERRATA QA Contact: Rejy M Cyriac <rcyriac>
Severity: medium Docs Contact:
Priority: high    
Version: 2.1 CC: dpati, kaushal, nsathyan, rhs-bugs, ssamanta, vagarwal, vbellur
Target Milestone: ---   
Target Release: RHGS 3.0.0   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.4.0.35.1u2rhs Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-09-22 19:28:09 UTC Type: Bug

Description Rejy M Cyriac 2013-05-17 12:02:24 UTC
Description of problem:
After a new brick from a new node was added to a 2-node Distribute volume, rebalance was run and allowed to complete, as confirmed by the output of the rebalance status command. However, the gluster volume status output still shows rebalance as running.

Version-Release number of selected component (if applicable):

glusterfs-server-3.4.0.8rhs-1.el6rhs.x86_64
gluster-swift-container-1.4.8-4.el6.noarch
glusterfs-fuse-3.4.0.8rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.8rhs-1.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
vdsm-gluster-4.10.2-4.0.qa5.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-3.4.0.8rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.8rhs-1.el6rhs.x86_64
gluster-swift-object-1.4.8-4.el6.noarch

How reproducible:


Steps to Reproduce:
1. Create a 2-node Distribute volume, start using the volume, then add another brick and run rebalance.

Volume Name: RHEV-RHS_Dist
Type: Distribute
Volume ID: 4e1ba3b2-3d16-4c0d-b131-8aad989af138
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick4/RHEV-RHS_Dist
Brick2: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick4/RHEV-RHS_Dist
Brick3: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick4/RHEV-RHS_Dist
Options Reconfigured:
cluster.subvols-per-directory: 1
storage.owner-gid: 36
storage.owner-uid: 36
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

2. Check output of rebalance status command for rebalance completion.

#gluster volume rebalance RHEV-RHS_Dist status
                                    Node Rebalanced-files          size       scanned      failures         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   ------------   --------------
                               localhost                6        10.0GB            23             3      completed           171.00
     rhs-client37.lab.eng.blr.redhat.com                2        25.0GB            19             6      completed           311.00
      rhs-client4.lab.eng.blr.redhat.com                0        0Bytes            18             0      completed             2.00
     rhs-client15.lab.eng.blr.redhat.com                0        0Bytes            18             0      completed             2.00
volume rebalance: RHEV-RHS_Dist: success: 

3. Check the output of the gluster volume status command; rebalance is still reported as running.

[Fri May 17 16:02:30 root@rhs-client45:~ ] #gluster volume status
Status of volume: RHEV-RHS_Dist
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhs-client45.lab.eng.blr.redhat.com:/rhs/brick4/R
HEV-RHS_Dist						49163	Y	30387
Brick rhs-client37.lab.eng.blr.redhat.com:/rhs/brick4/R
HEV-RHS_Dist						49163	Y	30010
Brick rhs-client15.lab.eng.blr.redhat.com:/rhs/brick4/R
HEV-RHS_Dist						49164	Y	30715
NFS Server on localhost					2049	Y	31230
NFS Server on f167208d-13df-4532-897f-0887204a2e39	2049	Y	30725
NFS Server on 9abcd448-f230-411c-9565-8f75a782f56a	2049	Y	30651
NFS Server on 838d97b8-6881-43ba-8f67-b0d17fea74cf	2049	Y	30816
 
           Task                                      ID         Status
           ----                                      --         ------
      Rebalance    f6c0d972-4f79-439a-9570-974b6e7c69d8              3


The whole string of commands is given below.

------------------------------------------------------------------------

[Fri May 17 15:47:22 root@rhs-client45:~ ] #gluster volume info
 
Volume Name: RHEV-RHS_Dist
Type: Distribute
Volume ID: 4e1ba3b2-3d16-4c0d-b131-8aad989af138
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick4/RHEV-RHS_Dist
Brick2: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick4/RHEV-RHS_Dist
Options Reconfigured:
cluster.subvols-per-directory: 1
storage.owner-gid: 36
storage.owner-uid: 36
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

[Fri May 17 15:47:30 root@rhs-client45:~ ] #gluster volume status
Status of volume: RHEV-RHS_Dist
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhs-client45.lab.eng.blr.redhat.com:/rhs/brick4/R
HEV-RHS_Dist						49163	Y	30387
Brick rhs-client37.lab.eng.blr.redhat.com:/rhs/brick4/R
HEV-RHS_Dist						49163	Y	30010
NFS Server on localhost					2049	Y	30398
NFS Server on 9abcd448-f230-411c-9565-8f75a782f56a	2049	Y	29855
NFS Server on 838d97b8-6881-43ba-8f67-b0d17fea74cf	2049	Y	30021
NFS Server on f167208d-13df-4532-897f-0887204a2e39	2049	Y	29917
 
There are no active volume tasks

[Fri May 17 15:49:16 root@rhs-client45:~ ] #gluster volume add-brick RHEV-RHS_Dist  rhs-client15.lab.eng.blr.redhat.com:/rhs/brick4/RHEV-RHS_Dist
volume add-brick: success

[Fri May 17 15:49:44 root@rhs-client45:~ ] #gluster volume info
 
Volume Name: RHEV-RHS_Dist
Type: Distribute
Volume ID: 4e1ba3b2-3d16-4c0d-b131-8aad989af138
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick4/RHEV-RHS_Dist
Brick2: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick4/RHEV-RHS_Dist
Brick3: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick4/RHEV-RHS_Dist
Options Reconfigured:
cluster.subvols-per-directory: 1
storage.owner-gid: 36
storage.owner-uid: 36
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

[Fri May 17 15:49:52 root@rhs-client45:~ ] #gluster volume status
Status of volume: RHEV-RHS_Dist
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhs-client45.lab.eng.blr.redhat.com:/rhs/brick4/R
HEV-RHS_Dist						49163	Y	30387
Brick rhs-client37.lab.eng.blr.redhat.com:/rhs/brick4/R
HEV-RHS_Dist						49163	Y	30010
Brick rhs-client15.lab.eng.blr.redhat.com:/rhs/brick4/R
HEV-RHS_Dist						49164	Y	30715
NFS Server on localhost					2049	Y	31230
NFS Server on 9abcd448-f230-411c-9565-8f75a782f56a	2049	Y	30651
NFS Server on f167208d-13df-4532-897f-0887204a2e39	2049	Y	30725
NFS Server on 838d97b8-6881-43ba-8f67-b0d17fea74cf	2049	Y	30816
 
There are no active volume tasks

[Fri May 17 15:49:59 root@rhs-client45:~ ] #gluster volume rebalance RHEV-RHS_Dist start
volume rebalance: RHEV-RHS_Dist: success: Starting rebalance on volume RHEV-RHS_Dist has been successful.
ID: f6c0d972-4f79-439a-9570-974b6e7c69d8

[Fri May 17 15:50:52 root@rhs-client45:~ ] #gluster volume status
Status of volume: RHEV-RHS_Dist
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhs-client45.lab.eng.blr.redhat.com:/rhs/brick4/R
HEV-RHS_Dist						49163	Y	30387
Brick rhs-client37.lab.eng.blr.redhat.com:/rhs/brick4/R
HEV-RHS_Dist						49163	Y	30010
Brick rhs-client15.lab.eng.blr.redhat.com:/rhs/brick4/R
HEV-RHS_Dist						49164	Y	30715
NFS Server on localhost					2049	Y	31230
NFS Server on 9abcd448-f230-411c-9565-8f75a782f56a	2049	Y	30651
NFS Server on f167208d-13df-4532-897f-0887204a2e39	2049	Y	30725
NFS Server on 838d97b8-6881-43ba-8f67-b0d17fea74cf	2049	Y	30816
 
           Task                                      ID         Status
           ----                                      --         ------
      Rebalance    f6c0d972-4f79-439a-9570-974b6e7c69d8              1

[Fri May 17 15:51:05 root@rhs-client45:~ ] #gluster volume rebalance RHEV-RHS_Dist status
                                    Node Rebalanced-files          size       scanned      failures         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   ------------   --------------
                               localhost                5         3.0MB            11             0    in progress            20.00
     rhs-client37.lab.eng.blr.redhat.com                0        0Bytes            10             0    in progress            20.00
      rhs-client4.lab.eng.blr.redhat.com                0        0Bytes            18             0      completed             2.00
     rhs-client15.lab.eng.blr.redhat.com                0        0Bytes            18             0      completed             2.00
volume rebalance: RHEV-RHS_Dist: success: 
[Fri May 17 15:51:12 root@rhs-client45:~ ] #gluster volume rebalance RHEV-RHS_Dist status
                                    Node Rebalanced-files          size       scanned      failures         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   ------------   --------------
                               localhost                6        10.0GB            23             3      completed           171.00
     rhs-client37.lab.eng.blr.redhat.com                0        0Bytes            10             0    in progress           206.00
      rhs-client4.lab.eng.blr.redhat.com                0        0Bytes            18             0      completed             2.00
     rhs-client15.lab.eng.blr.redhat.com                0        0Bytes            18             0      completed             2.00
volume rebalance: RHEV-RHS_Dist: success: 
[Fri May 17 15:54:18 root@rhs-client45:~ ] #gluster volume rebalance RHEV-RHS_Dist status
                                    Node Rebalanced-files          size       scanned      failures         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   ------------   --------------
                               localhost                6        10.0GB            23             3      completed           171.00
     rhs-client37.lab.eng.blr.redhat.com                1        15.0GB            14             2    in progress           286.00
      rhs-client4.lab.eng.blr.redhat.com                0        0Bytes            18             0      completed             2.00
     rhs-client15.lab.eng.blr.redhat.com                0        0Bytes            18             0      completed             2.00
volume rebalance: RHEV-RHS_Dist: success: 
[Fri May 17 15:55:38 root@rhs-client45:~ ] #gluster volume rebalance RHEV-RHS_Dist status
                                    Node Rebalanced-files          size       scanned      failures         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   ------------   --------------
                               localhost                6        10.0GB            23             3      completed           171.00
     rhs-client37.lab.eng.blr.redhat.com                2        25.0GB            19             6      completed           311.00
      rhs-client4.lab.eng.blr.redhat.com                0        0Bytes            18             0      completed             2.00
     rhs-client15.lab.eng.blr.redhat.com                0        0Bytes            18             0      completed             2.00
volume rebalance: RHEV-RHS_Dist: success: 

[Fri May 17 16:02:30 root@rhs-client45:~ ] #gluster volume status
Status of volume: RHEV-RHS_Dist
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhs-client45.lab.eng.blr.redhat.com:/rhs/brick4/R
HEV-RHS_Dist						49163	Y	30387
Brick rhs-client37.lab.eng.blr.redhat.com:/rhs/brick4/R
HEV-RHS_Dist						49163	Y	30010
Brick rhs-client15.lab.eng.blr.redhat.com:/rhs/brick4/R
HEV-RHS_Dist						49164	Y	30715
NFS Server on localhost					2049	Y	31230
NFS Server on f167208d-13df-4532-897f-0887204a2e39	2049	Y	30725
NFS Server on 9abcd448-f230-411c-9565-8f75a782f56a	2049	Y	30651
NFS Server on 838d97b8-6881-43ba-8f67-b0d17fea74cf	2049	Y	30816
 
           Task                                      ID         Status
           ----                                      --         ------
      Rebalance    f6c0d972-4f79-439a-9570-974b6e7c69d8              3

[Fri May 17 16:11:09 root@rhs-client45:~ ] #gluster volume rebalance RHEV-RHS_Dist status
                                    Node Rebalanced-files          size       scanned      failures         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   ------------   --------------
                               localhost                6        10.0GB            23             3      completed           171.00
     rhs-client37.lab.eng.blr.redhat.com                2        25.0GB            19             6      completed           311.00
      rhs-client4.lab.eng.blr.redhat.com                0        0Bytes            18             0      completed             2.00
     rhs-client15.lab.eng.blr.redhat.com                0        0Bytes            18             0      completed             2.00
volume rebalance: RHEV-RHS_Dist: success: 

[Fri May 17 16:11:23 root@rhs-client45:~ ] #gluster volume status
Status of volume: RHEV-RHS_Dist
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhs-client45.lab.eng.blr.redhat.com:/rhs/brick4/R
HEV-RHS_Dist						49163	Y	30387
Brick rhs-client37.lab.eng.blr.redhat.com:/rhs/brick4/R
HEV-RHS_Dist						49163	Y	30010
Brick rhs-client15.lab.eng.blr.redhat.com:/rhs/brick4/R
HEV-RHS_Dist						49164	Y	30715
NFS Server on localhost					2049	Y	31230
NFS Server on 838d97b8-6881-43ba-8f67-b0d17fea74cf	2049	Y	30816
NFS Server on f167208d-13df-4532-897f-0887204a2e39	2049	Y	30725
NFS Server on 9abcd448-f230-411c-9565-8f75a782f56a	2049	Y	30651
 
           Task                                      ID         Status
           ----                                      --         ------
      Rebalance    f6c0d972-4f79-439a-9570-974b6e7c69d8              3

[Fri May 17 16:20:43 root@rhs-client45:~ ] #gluster volume rebalance RHEV-RHS_Dist status
                                    Node Rebalanced-files          size       scanned      failures         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   ------------   --------------
                               localhost                6        10.0GB            23             3      completed           171.00
     rhs-client37.lab.eng.blr.redhat.com                2        25.0GB            19             6      completed           311.00
      rhs-client4.lab.eng.blr.redhat.com                0        0Bytes            18             0      completed             2.00
     rhs-client15.lab.eng.blr.redhat.com                0        0Bytes            18             0      completed             2.00
volume rebalance: RHEV-RHS_Dist: success: 
[Fri May 17 16:20:46 root@rhs-client45:~ ] #

------------------------------------------------------------------------

  
Actual results:

The status of the rebalance task is reported inconsistently in the output of these two commands:

gluster volume status
gluster volume rebalance <VOLUME-NAME> status

Expected results:

All status commands must be consistent and accurate in reporting status of tasks.
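
As a rough illustration of the comparison being made in this report, the sketch below pulls the per-node status column out of the rebalance status output and the raw Status field out of the task table printed by gluster volume status, so the two can be viewed side by side. The function names and the embedded sample text are hypothetical (the samples are abbreviated from the outputs above), and only the "in progress" and "completed" states seen in this report are recognised. The Status value is returned as-is, without interpreting it.

```python
import re


def rebalance_node_statuses(rebalance_status_output):
    """Return the status column for each node row of 'rebalance ... status'."""
    statuses = []
    for line in rebalance_status_output.splitlines():
        # A node row ends with: <status> <run time in secs>
        match = re.search(r"\s(in progress|completed)\s+\d+(?:\.\d+)?\s*$", line)
        if match:
            statuses.append(match.group(1))
    return statuses


def volume_status_task_field(volume_status_output):
    """Return the raw Status field of the Rebalance task row, or None."""
    for line in volume_status_output.splitlines():
        # Task row layout in this report: Rebalance <36-char task ID> <status>
        match = re.match(r"\s*Rebalance\s+[0-9a-f-]{36}\s+(\S+)\s*$", line)
        if match:
            return match.group(1)
    return None


# Sample text abbreviated from the outputs quoted in this report.
REBALANCE_SAMPLE = """\
                               localhost                6        10.0GB            23             3      completed           171.00
     rhs-client37.lab.eng.blr.redhat.com                2        25.0GB            19             6      completed           311.00
"""

VOLUME_STATUS_SAMPLE = """\
           Task                                      ID         Status
           ----                                      --         ------
      Rebalance    f6c0d972-4f79-439a-9570-974b6e7c69d8              3
"""

print(rebalance_node_statuses(REBALANCE_SAMPLE))
print(volume_status_task_field(VOLUME_STATUS_SAMPLE))
```

With the samples above, the first call yields "completed" for both nodes, while the second yields the bare value "3", which is the mismatch in presentation that this report is about.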

Additional info:

Comment 2 Kaushal 2013-10-17 11:27:54 UTC
The rebalance status being shown in 'volume status' is completed. The confusion in this case was because the rebalance status was being shown as an index number instead of a string. '3' is completed, whereas '1' is started/running.

With the changes made for bug 955611, 'volume status' now shows proper strings instead of an opaque index. So instead of getting '3' as the rebalance status in 'volume status', you'd get 'completed'.

Moving this to MODIFIED as the changes introduced for 955611 shouldn't cause this confusion anymore.
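
To make the index-versus-string point concrete, here is a minimal sketch of the kind of lookup that turns a numeric status into a readable label. It is not the glusterd code: the names STATUS_LABELS and describe_status are hypothetical, and only the two values mentioned in this comment (1 and 3) are filled in. Any other value is reported as unknown rather than guessed.

```python
# Only the codes 1 and 3 are taken from this bug report; the names below
# are hypothetical and exist purely for illustration.
STATUS_LABELS = {
    1: "started/running",
    3: "completed",
}


def describe_status(code):
    """Translate a numeric task status into a readable label."""
    return STATUS_LABELS.get(code, f"unknown ({code})")


print(describe_status(3))
print(describe_status(1))
```

Read this way, the '3' in the original 'volume status' output corresponds to 'completed', which agrees with what 'rebalance status' reported.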

Comment 3 Kaushal 2013-10-22 10:32:57 UTC
The fix is available in glusterfs-3.4.0.35.1u2rhs.

Comment 4 Rejy M Cyriac 2013-12-02 08:54:14 UTC
Verified on glusterfs-server-3.4.0.44.1u2rhs-1.el6rhs.x86_64

The outputs are much clearer now. Examples of the current output are given below.

....................................................................
# gluster volume status
Status of volume: revol
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhs1:/srv/rhs/brick1/revol	49152	Y	6835
Brick rhs2:/srv/rhs/brick1/revol	49152	Y	6738
NFS Server on localhost			2049	Y	6847
NFS Server on rhs4			2049	Y	7583
NFS Server on rhs3			2049	Y	6686
NFS Server on rhs2			2049	Y	6750
 
Task Status of Volume revol
------------------------------------------------------------------------------
There are no active volume tasks

....

# gluster volume status
Status of volume: revol
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhs1:/srv/rhs/brick1/revol	49152	Y	6835
Brick rhs2:/srv/rhs/brick1/revol	49152	Y	6738
Brick rhs3:/srv/rhs/brick2/revol	49152	Y	6820
NFS Server on localhost			2049	Y	7045
NFS Server on rhs4			2049	Y	7715
NFS Server on rhs3			2049	Y	6832
NFS Server on rhs2			2049	Y	6907
 
Task Status of Volume revol
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : 62cff49e-cd7f-417e-a1ff-7bfcd245203b
Status               : in progress         

....

# gluster volume status
Status of volume: revol
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhs1:/srv/rhs/brick1/revol	49152	Y	6835
Brick rhs2:/srv/rhs/brick1/revol	49152	Y	6738
Brick rhs3:/srv/rhs/brick2/revol	49152	Y	6820
NFS Server on localhost			2049	Y	7045
NFS Server on rhs4			2049	Y	7715
NFS Server on rhs3			2049	Y	6832
NFS Server on rhs2			2049	Y	6907
 
Task Status of Volume revol
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : 62cff49e-cd7f-417e-a1ff-7bfcd245203b
Status               : completed
....................................................................
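
For anyone scripting against the new layout, the following is a small sketch that reads the "Task Status of Volume" section shown above into a dictionary. The function name parse_task_block is hypothetical, the sample text is copied from the output in this comment, and the parsing assumes exactly this "key : value" layout with a single task per volume. It is not an official parser.

```python
def parse_task_block(volume_status_output):
    """Collect 'key : value' pairs that follow the task status heading."""
    fields = {}
    in_tasks = False
    for line in volume_status_output.splitlines():
        if line.startswith("Task Status of Volume"):
            in_tasks = True
            continue
        # Inside the task section, fields look like "Status   : completed".
        if in_tasks and " : " in line:
            key, value = line.split(" : ", 1)
            fields[key.strip()] = value.strip()
    return fields


# Sample copied from the verified output in this comment.
SAMPLE = """\
Task Status of Volume revol
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 62cff49e-cd7f-417e-a1ff-7bfcd245203b
Status               : completed
"""

task = parse_task_block(SAMPLE)
print(task.get("Task"), task.get("Status"))
```

When the section contains only "There are no active volume tasks", the function returns an empty dictionary.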

Comment 6 errata-xmlrpc 2014-09-22 19:28:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html