Bug 956188
| Field | Value |
|---|---|
| Summary | DHT - rebalance - 'gluster volume rebalance <volname> status' shows 2 entries (2 rows) for one host |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Reporter | Rachana Patel <racpatel> |
| Component | distribute |
| Assignee | Nithya Balachandran <nbalacha> |
| Status | CLOSED EOL |
| QA Contact | Matt Zywusko <mzywusko> |
| Severity | medium |
| Priority | high |
| Version | 2.1 |
| CC | khoi.mai2008, nsathyan, pkarampu, rhs-bugs, rwheeler, sdharane, spalai, vagarwal, vbellur |
| Keywords | ZStream |
| Hardware | x86_64 |
| OS | Linux |
| Doc Type | Bug Fix |
| Cloned to | 963524, 1286068, 1286069, 1286071, 1286072 (view as bug list) |
| Last Closed | 2015-11-27 10:30:38 UTC |
| Type | Bug |
| Bug Blocks | 963524, 1286068, 1286069, 1286071, 1286072 |
Description
Rachana Patel
2013-04-24 12:26:19 UTC
Facing a similar issue, along with the rebalance status not showing the status of one node which is in the cluster. gluster rebalance status:

```
Node         Rebalanced-files  size    scanned  failures  status     run time in secs
localhost    0                 0Bytes  7        1         completed  0.00
localhost    0                 0Bytes  7        1         completed  0.00
localhost    0                 0Bytes  7        1         completed  0.00
10.70.34.86  0                 0Bytes  8        0         completed  0.00
```

The output shows 3 entries for one host and does not show the status of one node which is in the cluster. Also, there are 3 nodes in the cluster, but the rebalance status shows 4 rows; there should be only one entry per RHS node/peer.

```
[root@fillmore ~]# gluster p s
Number of Peers: 2

Hostname: 10.70.34.85
Uuid: 35a8481a-4a77-4149-a883-9db0b68e954f
State: Peer in Cluster (Connected)

Hostname: 10.70.34.86
Uuid: d834977d-9bfd-4940-8843-aedc9130bd12
State: Peer in Cluster (Connected)
```

The sos report for comment 3 can be found at: http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/956188/

The issue mentioned in comment 3 is consistently seen on all the nodes after restarting glusterd on any one node:

```
[root@fillmore tmp]# service glusterd restart
Starting glusterd:                                         [  OK  ]
gluster v rebalance vol14 status
Node         Rebalanced-files  size    scanned  failures  status       run time in secs
localhost    0                 0Bytes  0        0         not started  0.00
localhost    0                 0Bytes  0        0         not started  0.00
localhost    0                 0Bytes  0        0         not started  0.00
10.70.34.86  0                 0Bytes  0        0         not started  0.00
```

Another issue I faced while checking the rebalance status repeatedly was that only the local host was reported (once) in the output, and the other nodes in the cluster were missing:
```
Node       Rebalanced-files  size    scanned  failures  status       run time in secs
localhost  0                 0Bytes  0        0         not started  0.00
```

Gluster peer status information:

```
[root@fillmore tmp]# gluster peer status
Number of Peers: 2

Hostname: 10.70.34.85
Uuid: 35a8481a-4a77-4149-a883-9db0b68e954f
State: Peer in Cluster (Connected)

Hostname: 10.70.34.86
Uuid: d834977d-9bfd-4940-8843-aedc9130bd12
State: Peer in Cluster (Connected)
```

I am able to re-create the bug with the steps given by Rachana.

Verified with 3.4.0.12rhs-1.el6rhs.x86_64; able to reproduce, so moving back to 'Assigned'. Volume info:

```
gluster v i dis_rep
Volume Name: dis_rep
Type: Distributed-Replicate
Volume ID: e13e880d-916b-43a6-9b10-7a5c38ddc133
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.34.86:/rhs/brick1/d1
Brick2: 10.70.34.85:/rhs/brick1/d2
Brick3: 10.70.34.105:/rhs/brick1/d3
Brick4: 10.70.34.86:/rhs/brick1/d4
Brick5: 10.70.34.85:/rhs/brick1/d5
Brick6: 10.70.34.86:/rhs/brick1/d6
```

1) Started rebalance, and while rebalance was in progress, rebooted 10.70.35.85:

```
[root@jay ~]# gluster v rebalance dis_rep status
Node          Rebalanced-files  size     scanned  failures  status       run time in secs
---------     ----------------  -------  -------  --------  -----------  ----------------
localhost     22                220.0MB  60       0         in progress  111.00
localhost     22                220.0MB  60       0         in progress  111.00
10.70.34.105  22                220.0MB  194      0         in progress  111.00
```

Rachana, the test case mentioned in the bug description does not include any step to reboot the node. Is the issue happening even when the node is not rebooted?
Pranith

While I have experienced the same issue, what is the result when it completes? Are there missing files after the rebalance?
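The complaint throughout this bug is that a healthy cluster should show exactly one status row per peer. As an illustration only (the helper below is hypothetical, not part of the gluster CLI), a small parser can flag hosts that appear more than once in the tabular status output; this sketch assumes the column layout shown above:

```python
from collections import Counter

def duplicate_nodes(status_output: str) -> list[str]:
    """Return node names appearing more than once in
    'gluster volume rebalance <volname> status' output."""
    nodes = []
    for line in status_output.splitlines():
        fields = line.split()
        # skip the header and separator rows; data rows have >= 7 columns
        if len(fields) >= 7 and fields[0] != "Node" and not fields[0].startswith("-"):
            nodes.append(fields[0])
    return [node for node, count in Counter(nodes).items() if count > 1]

sample = """\
Node         Rebalanced-files  size    scanned  failures  status     run time in secs
localhost    0                 0Bytes  7        1         completed  0.00
localhost    0                 0Bytes  7        1         completed  0.00
localhost    0                 0Bytes  7        1         completed  0.00
10.70.34.86  0                 0Bytes  8        0         completed  0.00"""

print(duplicate_nodes(sample))  # ['localhost']
```

Against the output quoted in the description, this would report `localhost` as duplicated, matching the bug's observation of 3 entries for one host.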
I don't quite understand this output:

```
gluster> volume rebalance devstatic status
Node       Rebalanced-files  size    scanned  failures  status       run time in secs
---------  ----------------  ------  -------  --------  -----------  ----------------
localhost  0                 0Bytes  46989    2         in progress  314.00
localhost  0                 0Bytes  46989    2         in progress  314.00
localhost  0                 0Bytes  46989    2         in progress  314.00
localhost  0                 0Bytes  46989    2         in progress  314.00
omdx14f0   0                 0Bytes  46977    0         in progress  314.00
```

Then 373 seconds later the peers are listed:

```
gluster> volume rebalance devstatic status
Node       Rebalanced-files  size     scanned  failures  status       run time in secs
---------  ----------------  -------  -------  --------  -----------  ----------------
localhost  0                 0Bytes   58238    4         in progress  373.00
omhq1832   5                 118.0KB  58234    0         in progress  373.00
omdx1448   0                 0Bytes   58206    0         in progress  372.00
omdx14f0   0                 0Bytes   58172    0         in progress  373.00
```

Please help me understand what I'm supposed to make of this report.
Here is what is in my node01 /var/log/glusterfs/devstatic-rebalance.log:

```
[2013-09-18 20:21:50.479081] I [dht-rebalance.c:1690:gf_defrag_status_get] 0-glusterfs: Rebalance is completed. Time taken is 15788.00 secs
[2013-09-18 20:21:50.479104] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3688880, failures: 1127
```

grep failures /var/log/glusterfs/devstatic-rebalance.log:

```
[2013-09-18 19:21:30.803333] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215328, failures: 366
[2013-09-18 19:21:32.276603] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215422, failures: 366
[2013-09-18 19:21:33.283772] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215452, failures: 366
[2013-09-18 19:21:34.131694] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215483, failures: 366
[2013-09-18 19:21:35.043714] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215559, failures: 368
[2013-09-18 19:21:35.859756] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215613, failures: 369
[2013-09-18 19:21:36.595615] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215656, failures: 369
[2013-09-18 19:21:37.475340] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215712, failures: 369
[2013-09-18 19:21:38.243189] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215775, failures: 372
[2013-09-18 19:21:38.915318] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215999, failures: 372
[2013-09-18 19:23:45.650095] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3231985, failures: 376
[2013-09-18 19:51:19.016311] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3354410, failures: 643
[2013-09-18 19:55:00.390527] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3389551, failures: 779
[2013-09-18 19:58:40.006348] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3406103, failures: 779
[2013-09-18 19:58:40.822272] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3406121, failures: 779
[2013-09-18 19:58:41.445609] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3406140, failures: 779
[2013-09-18 19:58:42.037005] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3406216, failures: 779
[2013-09-18 19:58:42.597474] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3406256, failures: 779
[2013-09-18 19:58:43.077477] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3406280, failures: 779
[2013-09-18 20:09:00.865398] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3548106, failures: 969
[2013-09-18 20:21:50.479104] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3688880, failures: 1127
```

I'm not quite sure what "failures" means in rebalance.

Cloning this bug to 3.1. Will be fixed in a future release.
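Since the meaning of the "failures" counter is unclear, it can at least be tracked over time from the rebalance log. A minimal sketch, assuming only the gf_defrag_status_get log format quoted above (the `failure_series` helper is hypothetical, not a gluster tool):

```python
import re

# matches the gf_defrag_status_get progress lines quoted above
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} [\d:.]+)\] .*"
    r"lookups: (?P<lookups>\d+), failures: (?P<failures>\d+)"
)

def failure_series(log_text: str) -> list[tuple[str, int]]:
    """Extract (timestamp, failures) pairs from a rebalance log."""
    return [(m.group("ts"), int(m.group("failures")))
            for m in LINE_RE.finditer(log_text)]

sample = (
    "[2013-09-18 19:21:30.803333] I [dht-rebalance.c:1693:gf_defrag_status_get] "
    "0-glusterfs: Files migrated: 0, size: 0, lookups: 3215328, failures: 366\n"
    "[2013-09-18 20:21:50.479104] I [dht-rebalance.c:1693:gf_defrag_status_get] "
    "0-glusterfs: Files migrated: 0, size: 0, lookups: 3688880, failures: 1127\n"
)

for ts, failures in failure_series(sample):
    print(ts, failures)
```

Run over the full log above, this would show the failure count climbing from 366 to 1127 across the hour-long run, which is the trend the grep output already hints at.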