Description of problem:

I launched a rebalance operation on my 25x2 distributed-replicate volume about two days ago. The output of "gluster volume rebalance bigdata status" has been bizarre, to say the least. Sometimes (intermittently; I have not found a way to trigger it) all but one line will read "localhost" with identical stats (see below). Other times, some of the hosts show up as IPs instead of hostnames. This only started happening after updating from 3.3.1 to 3.4.0.

[root@ml59 ~]# gluster volume rebalance bigdata status
Node              Rebalanced-files   size         scanned      failures     status        run time in secs
---------         -----------        -----------  -----------  -----------  ------------  --------------
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
localhost         55172              39.1GB       1573175      55162        in progress   133395.00
ml26              0                  0Bytes       4978892      0            in progress   133395.00

And a few minutes later:

[root@ml59 ~]# gluster volume rebalance bigdata status
Node              Rebalanced-files   size         scanned      failures     status        run time in secs
---------         -----------        -----------  -----------  -----------  ------------  --------------
localhost         55361              39.1GB       1573364      55162        in progress   133780.00
138.15.169.24     670                9.4MB        5000192      30813        in progress   133781.00
ml40              50688              24.6GB       2578643      90126        in progress   133780.00
ml44              0                  0Bytes       4964013      0            in progress   133780.00
ml31              0                  0Bytes       4964271      0            in progress   133780.00
ml41              0                  0Bytes       5000275      0            in progress   133780.00
ml47              39822              14.4GB       1436576      60227        in progress   133780.00
ml51              58416              12.1GB       1068126      4098         in progress   133780.00
ml54              0                  0Bytes       5000348      5            in progress   133780.00
ml26              0                  0Bytes       5000337      0            in progress   133780.00
ml55              55277              24.0GB       1694681      26855        in progress   133780.00
ml43              46195              13.3GB       1292287      20762        in progress   133780.00
ml52              0                  0Bytes       4963915      0            in progress   133780.00
ml25              3829               1.1GB        4966775      48727        in progress   133780.00
ml56              10383              1.5GB        4971886      80063        in progress   133780.00
ml30              55267              27.5GB       1716359      40853        in progress   133780.00
ml29              0                  0Bytes       4963601      0            in progress   133780.00
ml46              0                  0Bytes       4963686      0            in progress   133780.00
ml57              0                  0Bytes       5000260      0            in progress   133780.00
ml48              0                  0Bytes       5000316      0            in progress   133780.00
ml45              53871              10.5GB       1154447      32244        in progress   133780.00
volume rebalance: bigdata: success:

Other servers:

[root@ml59 ~]# ssh ml01 gluster volume rebalance bigdata status
Node              Rebalanced-files   size         scanned      failures     status        run time in secs
---------         -----------        -----------  -----------  -----------  ------------  --------------
localhost         670                9.4MB        5012249      30813        in progress   133872.00
ml57              0                  0Bytes       5012454      0            in progress   133871.00
ml59              55407              39.1GB       1573410      55162        in progress   133871.00
ml47              39852              14.4GB       1437131      60258        in progress   133871.00
ml56              10383              1.5GB        4974125      80063        in progress   133871.00
ml55              55323              24.0GB       1694727      26855        in progress   133871.00
ml26              0                  0Bytes       5012312      0            in progress   133871.00
ml30              55313              27.5GB       1716481      40853        in progress   133871.00
ml29              0                  0Bytes       4965849      0            in progress   133871.00
ml46              0                  0Bytes       4966025      0            in progress   133871.00
ml44              0                  0Bytes       4966358      0            in progress   133871.00
ml31              0                  0Bytes       4966510      0            in progress   133871.00
ml25              3829               1.1GB        4967588      48727        in progress   133871.00
ml43              46223              13.3GB       1292898      20783        in progress   133871.00
ml54              0                  0Bytes       5012460      5            in progress   133871.00
ml45              53888              10.5GB       1154645      32255        in progress   133871.00
ml40              50688              24.6GB       2583404      90126        in progress   133871.00
ml52              0                  0Bytes       4966154      0            in progress   133871.00
ml48              0                  0Bytes       5012427      0            in progress   133871.00
ml41              0                  0Bytes       5012333      0            in progress   133871.00
ml51              58431              12.1GB       1068276      4099         in progress   133871.00
volume rebalance: bigdata: success:

[root@ml59 ~]# ssh ml25 gluster volume rebalance bigdata status
Node              Rebalanced-files   size         scanned      failures     status        run time in secs
---------         -----------        -----------  -----------  -----------  ------------  --------------
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
localhost         3829               1.1GB        4968833      48727        in progress   133928.00
ml26              0                  0Bytes       5016912      0            in progress   133928.00

Yet a few more minutes later:

[root@ml59 ~]# ssh ml25 gluster volume rebalance bigdata status
Node              Rebalanced-files   size         scanned      failures     status        run time in secs
---------         -----------        -----------  -----------  -----------  ------------  --------------
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
localhost         3829               1.1GB        4970848      48727        in progress   134034.00
ml29              0                  0Bytes       4967704      0            in progress   134034.00
volume rebalance: bigdata: success:
Notice how the bottom server is now ml29 instead of ml26.

How reproducible:
Not sure how to reproduce.
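The duplication comes and goes between invocations, so something like the following loop should catch the bad outputs as they appear (a rough sketch, not a polished tool; it assumes passwordless ssh between the peers, with the host list taken from the volume layout below):

#!/bin/bash
# Rough sketch: poll "rebalance status" on every peer and flag any output
# that lists the same node name more than once (e.g. "localhost" x21).
# Assumes passwordless ssh; hosts taken from the volume layout in this report.
VOL=bigdata
HOSTS="ml01 ml25 ml26 ml29 ml30 ml31 ml40 ml41 ml43 ml44 ml45 ml46 ml47 ml48 ml51 ml52 ml54 ml55 ml56 ml57 ml59"

for h in $HOSTS; do
    out=$(ssh "$h" gluster volume rebalance "$VOL" status 2>/dev/null)
    # Skip the two header lines; data rows have at least 7 fields
    # because "in progress" splits into two words.
    dups=$(printf '%s\n' "$out" | awk 'NR > 2 && NF >= 7 { print $1 }' | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "=== duplicated node names in status from $h ==="
        printf '%s\n' "$out"
    fi
done

Run repeatedly, this flags the repeated-"localhost" outputs; the flagged dumps also show which hosts are being reported as IPs instead of hostnames.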
Additional info:

[root@ml59 ~]# gluster peer status
Number of Peers: 20

Hostname: 138.15.169.24
Uuid: 5c338e03-28ff-429b-b702-0a04e25565f8
State: Peer in Cluster (Connected)

Hostname: ml40
Uuid: ffcc06ae-100a-4fa2-888e-803a41ae946c
State: Peer in Cluster (Connected)

Hostname: ml44
Uuid: ebf08063-ccf6-4c37-bb18-b5b19b93b1c6
State: Peer in Cluster (Connected)

Hostname: ml31
Uuid: 699019f6-2f4a-45cb-bfa4-f209745f8a6d
State: Peer in Cluster (Connected)

Hostname: ml41
Uuid: b404851f-dfd5-4746-a3bd-81bb0d888009
State: Peer in Cluster (Connected)

Hostname: ml47
Uuid: e831092d-b196-46ec-947d-a5635e8fbd1e
State: Peer in Cluster (Connected)

Hostname: ml51
Uuid: 5491b6dc-0f96-43d9-95d9-a41018a8542c
State: Peer in Cluster (Connected)

Hostname: ml54
Uuid: c55580fa-2c9d-493d-b9d1-3bce016c8b29
State: Peer in Cluster (Connected)

Hostname: ml26
Uuid: d3d937da-45af-40c0-a219-b6ae3d1d1502
State: Peer in Cluster (Connected)

Hostname: ml55
Uuid: 366339ed-52e5-4722-a1b3-e3bb1c49ea4f
State: Peer in Cluster (Connected)

Hostname: ml43
Uuid: a9044e9a-39e1-4907-8921-43da870b7f31
State: Peer in Cluster (Connected)

Hostname: ml52
Uuid: 4de42f67-4cca-4d28-8600-9018172563ba
State: Peer in Cluster (Connected)

Hostname: ml25
Uuid: ee33e881-2e05-45bc-b550-5ab80f25c4f1
State: Peer in Cluster (Connected)

Hostname: ml56
Uuid: 04a8272c-c921-4f20-8c73-de3c87b36feb
State: Peer in Cluster (Connected)

Hostname: ml30
Uuid: e56b4c57-a058-4464-a1e6-c4676ebf00cc
State: Peer in Cluster (Connected)

Hostname: ml29
Uuid: 58aa8a16-5d2b-4c06-8f06-2fd0f7fc5a37
State: Peer in Cluster (Connected)

Hostname: ml46
Uuid: af74d39b-09d6-47ba-9c3b-72d993dca4ce
State: Peer in Cluster (Connected)

Hostname: ml57
Uuid: ef5becbb-6af7-429a-a62b-a09ecfa1c5f6
State: Peer in Cluster (Connected)

Hostname: ml48
Uuid: efd79145-bfd9-4eea-b7a7-50be18d9ffe0
State: Peer in Cluster (Connected)

Hostname: ml45
Uuid: 0eebbceb-8f62-4c55-8160-41348f90e191
State: Peer in Cluster (Connected)

# gluster volume info

Volume Name: bigdata
Type: Distributed-Replicate
Volume ID: 56498956-7b4b-4ee3-9d2b-4c8cfce26051
Status: Started
Number of Bricks: 25 x 2 = 50
Transport-type: tcp
Bricks:
Brick1: ml43:/mnt/donottouch/localb/brick
Brick2: ml44:/mnt/donottouch/localb/brick
Brick3: ml43:/mnt/donottouch/localc/brick
Brick4: ml44:/mnt/donottouch/localc/brick
Brick5: ml45:/mnt/donottouch/localb/brick
Brick6: ml46:/mnt/donottouch/localb/brick
Brick7: ml45:/mnt/donottouch/localc/brick
Brick8: ml46:/mnt/donottouch/localc/brick
Brick9: ml47:/mnt/donottouch/localb/brick
Brick10: ml48:/mnt/donottouch/localb/brick
Brick11: ml47:/mnt/donottouch/localc/brick
Brick12: ml48:/mnt/donottouch/localc/brick
Brick13: ml45:/mnt/donottouch/locald/brick
Brick14: ml46:/mnt/donottouch/locald/brick
Brick15: ml47:/mnt/donottouch/locald/brick
Brick16: ml48:/mnt/donottouch/locald/brick
Brick17: ml51:/mnt/donottouch/localb/brick
Brick18: ml52:/mnt/donottouch/localb/brick
Brick19: ml51:/mnt/donottouch/localc/brick
Brick20: ml52:/mnt/donottouch/localc/brick
Brick21: ml51:/mnt/donottouch/locald/brick
Brick22: ml52:/mnt/donottouch/locald/brick
Brick23: ml59:/mnt/donottouch/locald/brick
Brick24: ml54:/mnt/donottouch/locald/brick
Brick25: ml59:/mnt/donottouch/localc/brick
Brick26: ml54:/mnt/donottouch/localc/brick
Brick27: ml59:/mnt/donottouch/localb/brick
Brick28: ml54:/mnt/donottouch/localb/brick
Brick29: ml55:/mnt/donottouch/localb/brick
Brick30: ml29:/mnt/donottouch/localb/brick
Brick31: ml55:/mnt/donottouch/localc/brick
Brick32: ml29:/mnt/donottouch/localc/brick
Brick33: ml30:/mnt/donottouch/localc/brick
Brick34: ml31:/mnt/donottouch/localc/brick
Brick35: ml30:/mnt/donottouch/localb/brick
Brick36: ml31:/mnt/donottouch/localb/brick
Brick37: ml40:/mnt/donottouch/localb/brick
Brick38: ml41:/mnt/donottouch/localb/brick
Brick39: ml40:/mnt/donottouch/localc/brick
Brick40: ml41:/mnt/donottouch/localc/brick
Brick41: ml56:/mnt/donottouch/localb/brick
Brick42: ml57:/mnt/donottouch/localb/brick
Brick43: ml56:/mnt/donottouch/localc/brick
Brick44: ml57:/mnt/donottouch/localc/brick
Brick45: ml25:/mnt/donottouch/localb/brick
Brick46: ml26:/mnt/donottouch/localb/brick
Brick47: ml01:/mnt/donottouch/localb/brick
Brick48: ml25:/mnt/donottouch/localc/brick
Brick49: ml01:/mnt/donottouch/localc/brick
Brick50: ml26:/mnt/donottouch/localc/brick
Options Reconfigured:
performance.quick-read: on
nfs.disable: on
nfs.register-with-portmap: OFF

# gluster volume status
Status of volume: bigdata
Gluster process                                Port    Online  Pid
------------------------------------------------------------------------------
Brick ml43:/mnt/donottouch/localb/brick        49152   Y       1202
Brick ml44:/mnt/donottouch/localb/brick        49152   Y       12997
Brick ml43:/mnt/donottouch/localc/brick        49153   Y       1206
Brick ml44:/mnt/donottouch/localc/brick        49153   Y       13003
Brick ml45:/mnt/donottouch/localb/brick        49152   Y       18330
Brick ml46:/mnt/donottouch/localb/brick        49152   Y       5408
Brick ml45:/mnt/donottouch/localc/brick        49153   Y       18336
Brick ml46:/mnt/donottouch/localc/brick        49153   Y       5412
Brick ml47:/mnt/donottouch/localb/brick        49152   Y       4188
Brick ml48:/mnt/donottouch/localb/brick        49152   Y       19622
Brick ml47:/mnt/donottouch/localc/brick        49153   Y       4192
Brick ml48:/mnt/donottouch/localc/brick        49153   Y       19626
Brick ml45:/mnt/donottouch/locald/brick        49154   Y       18341
Brick ml46:/mnt/donottouch/locald/brick        49154   Y       5418
Brick ml47:/mnt/donottouch/locald/brick        49154   Y       4197
Brick ml48:/mnt/donottouch/locald/brick        49154   Y       19632
Brick ml51:/mnt/donottouch/localb/brick        49152   Y       14905
Brick ml52:/mnt/donottouch/localb/brick        49152   Y       17792
Brick ml51:/mnt/donottouch/localc/brick        49153   Y       14909
Brick ml52:/mnt/donottouch/localc/brick        49153   Y       17796
Brick ml51:/mnt/donottouch/locald/brick        49154   Y       14914
Brick ml52:/mnt/donottouch/locald/brick        49154   Y       17801
Brick ml59:/mnt/donottouch/locald/brick        49152   Y       9806
Brick ml54:/mnt/donottouch/locald/brick        49152   Y       31252
Brick ml59:/mnt/donottouch/localc/brick        49153   Y       9810
Brick ml54:/mnt/donottouch/localc/brick        49153   Y       31257
Brick ml59:/mnt/donottouch/localb/brick        49154   Y       9816
Brick ml54:/mnt/donottouch/localb/brick        49154   Y       31271
Brick ml55:/mnt/donottouch/localb/brick        49152   Y       8592
Brick ml29:/mnt/donottouch/localb/brick        49152   Y       26350
Brick ml55:/mnt/donottouch/localc/brick        49153   Y       8593
Brick ml29:/mnt/donottouch/localc/brick        49153   Y       26356
Brick ml30:/mnt/donottouch/localc/brick        49152   Y       29093
Brick ml31:/mnt/donottouch/localc/brick        49152   Y       26159
Brick ml30:/mnt/donottouch/localb/brick        49153   Y       29099
Brick ml31:/mnt/donottouch/localb/brick        49153   Y       26164
Brick ml40:/mnt/donottouch/localb/brick        49152   Y       11005
Brick ml41:/mnt/donottouch/localb/brick        49152   Y       20418
Brick ml40:/mnt/donottouch/localc/brick        49153   Y       11011
Brick ml41:/mnt/donottouch/localc/brick        49153   Y       20424
Brick ml56:/mnt/donottouch/localb/brick        49152   Y       1704
Brick ml57:/mnt/donottouch/localb/brick        49152   Y       1326
Brick ml56:/mnt/donottouch/localc/brick        49153   Y       1708
Brick ml57:/mnt/donottouch/localc/brick        49153   Y       1330
Brick ml25:/mnt/donottouch/localb/brick        49152   Y       6761
Brick ml26:/mnt/donottouch/localb/brick        49152   Y       590
Brick ml01:/mnt/donottouch/localb/brick        49152   Y       13431
Brick ml25:/mnt/donottouch/localc/brick        49153   Y       6765
Brick ml01:/mnt/donottouch/localc/brick        49153   Y       13435
Brick ml26:/mnt/donottouch/localc/brick        49153   Y       596
Self-heal Daemon on localhost                  N/A     Y       9824
Self-heal Daemon on ml40                       N/A     Y       11019
Self-heal Daemon on ml45                       N/A     Y       18350
Self-heal Daemon on ml41                       N/A     Y       20432
Self-heal Daemon on ml43                       N/A     Y       2128
Self-heal Daemon on ml52                       N/A     Y       17810
Self-heal Daemon on ml54                       N/A     Y       31267
Self-heal Daemon on ml44                       N/A     Y       13011
Self-heal Daemon on ml29                       N/A     Y       26364
Self-heal Daemon on ml57                       N/A     Y       1340
Self-heal Daemon on ml47                       N/A     Y       4206
Self-heal Daemon on ml30                       N/A     Y       29107
Self-heal Daemon on ml56                       N/A     Y       1716
Self-heal Daemon on ml51                       N/A     Y       14923
Self-heal Daemon on ml55                       N/A     Y       8604
Self-heal Daemon on ml48                       N/A     Y       19640
Self-heal Daemon on ml31                       N/A     Y       26172
Self-heal Daemon on 138.15.169.24              N/A     Y       13445
Self-heal Daemon on ml46                       N/A     Y       5426
Self-heal Daemon on ml26                       N/A     Y       604
Self-heal Daemon on ml25                       N/A     Y       6773

Task                 ID                                     Status
----                 --                                     ------
Rebalance            1f4a8910-17ed-41a3-b10e-06fe32e4b517   1

# cat /etc/system-release
Scientific Linux release 6.1 (Carbon)

# uname -a
Linux ml59 2.6.32-131.17.1.el6.x86_64 #1 SMP Wed Oct 5 17:19:54 CDT 2011 x86_64 x86_64 x86_64 GNU/Linux

# rpm -qa|grep gluster
glusterfs-server-3.4.0-1.el6.x86_64
glusterfs-fuse-3.4.0-1.el6.x86_64
glusterfs-debuginfo-3.4.0-1.el6.x86_64
glusterfs-3.4.0-1.el6.x86_64
GlusterFS 3.7.0 has been released (http://www.gluster.org/pipermail/gluster-users/2015-May/021901.html), and the Gluster project maintains the two most recent releases (N-2 support); at the moment these are 3.6 and 3.5. This bug was filed against the 3.4 release and will no longer be fixed in a 3.4 version. Please verify whether newer versions are affected by the reported problem. If they are, add a note to the bug and update the version if you can. If updating the version is not possible, leave a comment with the version you tested and set the "Need additional information" field below the comment box to "bugs". If there is no response by the end of the month, this bug will be closed automatically.
GlusterFS 3.4.x has reached end-of-life. If this bug still exists in a later release, please reopen it and change the version, or open a new bug.