Description of problem:
DHT - rebalance: 'gluster volume rebalance <volname> status' shows 2 entries (2 rows) for one host.

Version-Release number of selected component (if applicable):
3.4.0.1rhs-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a distributed volume having 2 or more sub-volumes and start the volume.
2. Fuse-mount the volume from client-1 using "mount -t glusterfs server:/<volume> <client-1_mount_point>", e.g.:
   mount -t glusterfs XXX:/<volname> /mnt/XXX
3. From the mount point create some files and perform rename operations, or add a brick to the volume, or change the sub-vols-per-dir option.
4. Run the rebalance command for that volume.
5. Execute 'gluster volume rebalance <volname> status'.

Actual results:
[root@mia ~]# gluster volume rebalance v2 status
Node                         Rebalanced-files  size    scanned  failures  status     run time in secs
---------                    -----------       ------  -------  --------  ---------  ----------------
localhost                    0                 0Bytes  1230     0         completed  3.00
fred.lab.eng.blr.redhat.com  0                 0Bytes  1230     0         completed  3.00
fred.lab.eng.blr.redhat.com  0                 0Bytes  1230     0         completed  3.00
fan.lab.eng.blr.redhat.com   0                 0Bytes  1230     0         completed  3.00
volume rebalance: v2: success:

--> fred.lab.eng.blr.redhat.com has 2 entries in the output.

Expected results:
There should be only one entry per RHS node/peer.

Additional info:
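The duplication is easy to spot mechanically. A minimal illustrative sketch (not part of gluster; the sample rows are copied from the output above, header lines omitted) that parses the status table and reports any node appearing more than once:

```python
from collections import Counter

# Sample of the duplicated status output shown above (header rows omitted).
status_output = """\
localhost 0 0Bytes 1230 0 completed 3.00
fred.lab.eng.blr.redhat.com 0 0Bytes 1230 0 completed 3.00
fred.lab.eng.blr.redhat.com 0 0Bytes 1230 0 completed 3.00
fan.lab.eng.blr.redhat.com 0 0Bytes 1230 0 completed 3.00"""

def duplicate_nodes(output: str) -> dict:
    """Return {node: row_count} for nodes that appear more than once."""
    counts = Counter(line.split()[0] for line in output.splitlines() if line.strip())
    return {node: n for node, n in counts.items() if n > 1}

print(duplicate_nodes(status_output))  # → {'fred.lab.eng.blr.redhat.com': 2}
```

On a correct status table this would print an empty dict, since each peer should contribute exactly one row.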
Facing a similar issue, along with rebalance status not showing the status of one node which is in the cluster.

gluster rebalance status:
Node         Rebalanced-files  size    scanned  failures  status     run time in secs
localhost    0                 0Bytes  7        1         completed  0.00
localhost    0                 0Bytes  7        1         completed  0.00
localhost    0                 0Bytes  7        1         completed  0.00
10.70.34.86  0                 0Bytes  8        0         completed  0.00

The output shows 3 entries for one host and does not show the status of one node which is in the cluster. Also, there are 3 nodes in the cluster, but rebalance status shows 4 rows; there should be only one entry per RHS node/peer.

[root@fillmore ~]# gluster p s
Number of Peers: 2

Hostname: 10.70.34.85
Uuid: 35a8481a-4a77-4149-a883-9db0b68e954f
State: Peer in Cluster (Connected)

Hostname: 10.70.34.86
Uuid: d834977d-9bfd-4940-8843-aedc9130bd12
State: Peer in Cluster (Connected)
The sos report for comment 3 can be found at: http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/956188/
The issue mentioned in comment 3 is consistently seen on all the nodes after restarting glusterd on any one node:

[root@fillmore tmp]# service glusterd restart
Starting glusterd:                                         [  OK  ]

gluster v rebalance vol14 status
Node         Rebalanced-files  size    scanned  failures  status       run time in secs
localhost    0                 0Bytes  0        0         not started  0.00
localhost    0                 0Bytes  0        0         not started  0.00
localhost    0                 0Bytes  0        0         not started  0.00
10.70.34.86  0                 0Bytes  0        0         not started  0.00

Another issue I faced while checking rebalance status repeatedly was that only the local host was reported (once) in the output, and the other nodes in the cluster were missing:

Node       Rebalanced-files  size    scanned  failures  status       run time in secs
localhost  0                 0Bytes  0        0         not started  0.00

Gluster peer status information:
[root@fillmore tmp]# gluster peer status
Number of Peers: 2

Hostname: 10.70.34.85
Uuid: 35a8481a-4a77-4149-a883-9db0b68e954f
State: Peer in Cluster (Connected)

Hostname: 10.70.34.86
Uuid: d834977d-9bfd-4940-8843-aedc9130bd12
State: Peer in Cluster (Connected)
I am able to re-create the bug with the steps given by Rachana. 1/1
Verified with 3.4.0.12rhs-1.el6rhs.x86_64; able to reproduce, so moving back to 'Assigned'.

Volume info:
gluster v i dis_rep

Volume Name: dis_rep
Type: Distributed-Replicate
Volume ID: e13e880d-916b-43a6-9b10-7a5c38ddc133
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.34.86:/rhs/brick1/d1
Brick2: 10.70.34.85:/rhs/brick1/d2
Brick3: 10.70.34.105:/rhs/brick1/d3
Brick4: 10.70.34.86:/rhs/brick1/d4
Brick5: 10.70.34.85:/rhs/brick1/d5
Brick6: 10.70.34.86:/rhs/brick1/d6

1) Started rebalance and, while rebalance was in progress, rebooted 10.70.34.85:

[root@jay ~]# gluster v rebalance dis_rep status
Node          Rebalanced-files  size     scanned  failures  status       run time in secs
---------     -----------       -------  -------  --------  -----------  ----------------
localhost     22                220.0MB  60       0         in progress  111.00
localhost     22                220.0MB  60       0         in progress  111.00
10.70.34.105  22                220.0MB  194      0         in progress  111.00
Rachana,

The test case mentioned in the bug description does not include any step to reboot the node. Is the issue happening even when the node is not rebooted?

Pranith
While I have experienced the same issue, what is the result when it completes? Are there missing files after the rebalance? I don't quite understand this output:

gluster> volume rebalance devstatic status
Node       Rebalanced-files  size    scanned  failures  status       run time in secs
---------  -----------       ------  -------  --------  -----------  ----------------
localhost  0                 0Bytes  46989    2         in progress  314.00
localhost  0                 0Bytes  46989    2         in progress  314.00
localhost  0                 0Bytes  46989    2         in progress  314.00
localhost  0                 0Bytes  46989    2         in progress  314.00
omdx14f0   0                 0Bytes  46977    0         in progress  314.00

Then, at 373 seconds, the peers are listed:

gluster> volume rebalance devstatic status
Node       Rebalanced-files  size     scanned  failures  status       run time in secs
---------  -----------       -------  -------  --------  -----------  ----------------
localhost  0                 0Bytes   58238    4         in progress  373.00
omhq1832   5                 118.0KB  58234    0         in progress  373.00
omdx1448   0                 0Bytes   58206    0         in progress  372.00
omdx14f0   0                 0Bytes   58172    0         in progress  373.00

Please help me understand what I'm supposed to make of this report.
Here is what is in my node01 /var/log/glusterfs/devstatic-rebalance.log:

[2013-09-18 20:21:50.479081] I [dht-rebalance.c:1690:gf_defrag_status_get] 0-glusterfs: Rebalance is completed. Time taken is 15788.00 secs
[2013-09-18 20:21:50.479104] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3688880, failures: 1127

grep failures /var/log/glusterfs/devstatic-rebalance.log
[2013-09-18 19:21:30.803333] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215328, failures: 366
[2013-09-18 19:21:32.276603] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215422, failures: 366
[2013-09-18 19:21:33.283772] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215452, failures: 366
[2013-09-18 19:21:34.131694] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215483, failures: 366
[2013-09-18 19:21:35.043714] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215559, failures: 368
[2013-09-18 19:21:35.859756] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215613, failures: 369
[2013-09-18 19:21:36.595615] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215656, failures: 369
[2013-09-18 19:21:37.475340] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215712, failures: 369
[2013-09-18 19:21:38.243189] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215775, failures: 372
[2013-09-18 19:21:38.915318] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215999, failures: 372
[2013-09-18 19:23:45.650095] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3231985, failures: 376
[2013-09-18 19:51:19.016311] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3354410, failures: 643
[2013-09-18 19:55:00.390527] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3389551, failures: 779
[2013-09-18 19:58:40.006348] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3406103, failures: 779
[2013-09-18 19:58:40.822272] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3406121, failures: 779
[2013-09-18 19:58:41.445609] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3406140, failures: 779
[2013-09-18 19:58:42.037005] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3406216, failures: 779
[2013-09-18 19:58:42.597474] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3406256, failures: 779
[2013-09-18 19:58:43.077477] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3406280, failures: 779
[2013-09-18 20:09:00.865398] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3548106, failures: 969
[2013-09-18 20:21:50.479104] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3688880, failures: 1127

Not quite sure what "failures" means in rebalance.
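To track how the failure counter grows over the run, the (lookups, failures) pairs can be pulled out of those log lines with a regex. A rough illustrative sketch (not part of gluster; assumes only the "lookups: N, failures: N" format visible in the log above, with two sample lines copied in):

```python
import re

# Two log lines copied from the rebalance log output above.
log = """\
[2013-09-18 19:21:30.803333] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215328, failures: 366
[2013-09-18 20:21:50.479104] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3688880, failures: 1127"""

# Extract (lookups, failures) pairs in log order.
pattern = re.compile(r"lookups: (\d+), failures: (\d+)")
samples = [(int(l), int(f)) for l, f in pattern.findall(log)]

print(samples)  # → [(3215328, 366), (3688880, 1127)]
```

Run against the full log file, plotting or diffing consecutive pairs would show where in the run the failures spiked (e.g. the jump from 376 to 643 between 19:23 and 19:51 above).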
Cloning this bug to 3.1. It will be fixed in a future release.