Bug 985946 - volume rebalance status outputting nonsense
Summary: volume rebalance status outputting nonsense
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: cli
Version: 3.4.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-07-18 14:51 UTC by Pierre-Francois Laquerre
Modified: 2015-10-07 13:16 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-10-07 13:16:38 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Pierre-Francois Laquerre 2013-07-18 14:51:40 UTC
Description of problem:

I launched a rebalance operation on my 25x2 distributed-replicate volume about two days ago, and the output of "gluster volume rebalance bigdata status" has been bizarre ever since. Sometimes (it doesn't always happen, and I'm not sure what triggers it) every line but one reads "localhost" with identical stats (see below). Other times, some hosts show up as IP addresses instead of hostnames. This only started happening after updating from 3.3.1 to 3.4.0.

[root@ml59 ~]# gluster volume rebalance bigdata status
                                    Node Rebalanced-files          size       scanned      failures         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   ------------   --------------
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                               localhost            55172        39.1GB       1573175         55162    in progress        133395.00
                                    ml26                0        0Bytes       4978892             0    in progress        133395.00

And a few minutes later:

[root@ml59 ~]# gluster volume rebalance bigdata status
                                    Node Rebalanced-files          size       scanned      failures         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   ------------   --------------
                               localhost            55361        39.1GB       1573364         55162    in progress        133780.00
                           138.15.169.24              670         9.4MB       5000192         30813    in progress        133781.00
                                    ml40            50688        24.6GB       2578643         90126    in progress        133780.00
                                    ml44                0        0Bytes       4964013             0    in progress        133780.00
                                    ml31                0        0Bytes       4964271             0    in progress        133780.00
                                    ml41                0        0Bytes       5000275             0    in progress        133780.00
                                    ml47            39822        14.4GB       1436576         60227    in progress        133780.00
                                    ml51            58416        12.1GB       1068126          4098    in progress        133780.00
                                    ml54                0        0Bytes       5000348             5    in progress        133780.00
                                    ml26                0        0Bytes       5000337             0    in progress        133780.00
                                    ml55            55277        24.0GB       1694681         26855    in progress        133780.00
                                    ml43            46195        13.3GB       1292287         20762    in progress        133780.00
                                    ml52                0        0Bytes       4963915             0    in progress        133780.00
                                    ml25             3829         1.1GB       4966775         48727    in progress        133780.00
                                    ml56            10383         1.5GB       4971886         80063    in progress        133780.00
                                    ml30            55267        27.5GB       1716359         40853    in progress        133780.00
                                    ml29                0        0Bytes       4963601             0    in progress        133780.00
                                    ml46                0        0Bytes       4963686             0    in progress        133780.00
                                    ml57                0        0Bytes       5000260             0    in progress        133780.00
                                    ml48                0        0Bytes       5000316             0    in progress        133780.00
                                    ml45            53871        10.5GB       1154447         32244    in progress        133780.00
volume rebalance: bigdata: success: 

Other servers:

[root@ml59 ~]# ssh ml01 gluster volume rebalance bigdata status
                                    Node Rebalanced-files          size       scanned      failures         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   ------------   --------------
                               localhost              670         9.4MB       5012249         30813    in progress        133872.00
                                    ml57                0        0Bytes       5012454             0    in progress        133871.00
                                    ml59            55407        39.1GB       1573410         55162    in progress        133871.00
                                    ml47            39852        14.4GB       1437131         60258    in progress        133871.00
                                    ml56            10383         1.5GB       4974125         80063    in progress        133871.00
                                    ml55            55323        24.0GB       1694727         26855    in progress        133871.00
                                    ml26                0        0Bytes       5012312             0    in progress        133871.00
                                    ml30            55313        27.5GB       1716481         40853    in progress        133871.00
                                    ml29                0        0Bytes       4965849             0    in progress        133871.00
                                    ml46                0        0Bytes       4966025             0    in progress        133871.00
                                    ml44                0        0Bytes       4966358             0    in progress        133871.00
                                    ml31                0        0Bytes       4966510             0    in progress        133871.00
                                    ml25             3829         1.1GB       4967588         48727    in progress        133871.00
                                    ml43            46223        13.3GB       1292898         20783    in progress        133871.00
                                    ml54                0        0Bytes       5012460             5    in progress        133871.00
                                    ml45            53888        10.5GB       1154645         32255    in progress        133871.00
                                    ml40            50688        24.6GB       2583404         90126    in progress        133871.00
                                    ml52                0        0Bytes       4966154             0    in progress        133871.00
                                    ml48                0        0Bytes       5012427             0    in progress        133871.00
                                    ml41                0        0Bytes       5012333             0    in progress        133871.00
                                    ml51            58431        12.1GB       1068276          4099    in progress        133871.00
volume rebalance: bigdata: success: 

[root@ml59 ~]# ssh ml25 gluster volume rebalance bigdata status
                                    Node Rebalanced-files          size       scanned      failures         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   ------------   --------------
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                               localhost             3829         1.1GB       4968833         48727    in progress        133928.00
                                    ml26                0        0Bytes       5016912             0    in progress        133928.00

Yet a few more minutes later:
[root@ml59 ~]# ssh ml25 gluster volume rebalance bigdata status
                                    Node Rebalanced-files          size       scanned      failures         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   ------------   --------------
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                               localhost             3829         1.1GB       4970848         48727    in progress        134034.00
                                    ml29                0        0Bytes       4967704             0    in progress        134034.00
volume rebalance: bigdata: success: 

Notice how the bottom server is now ml29 instead of ml26.
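
In case it helps narrow this down, the XML form of the status command should include each node's UUID alongside its name, which would show whether the repeated "localhost" rows are really distinct peers being mislabelled. I have not verified that rebalance status supports --xml in this build, so treat this as a suggestion rather than confirmed output:

[root@ml59 ~]# gluster volume rebalance bigdata status --xml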

How reproducible:
Not sure how to reproduce.

Additional info:
[root@ml59 ~]# gluster peer status
Number of Peers: 20

Hostname: 138.15.169.24
Uuid: 5c338e03-28ff-429b-b702-0a04e25565f8
State: Peer in Cluster (Connected)

Hostname: ml40
Uuid: ffcc06ae-100a-4fa2-888e-803a41ae946c
State: Peer in Cluster (Connected)

Hostname: ml44
Uuid: ebf08063-ccf6-4c37-bb18-b5b19b93b1c6
State: Peer in Cluster (Connected)

Hostname: ml31
Uuid: 699019f6-2f4a-45cb-bfa4-f209745f8a6d
State: Peer in Cluster (Connected)

Hostname: ml41
Uuid: b404851f-dfd5-4746-a3bd-81bb0d888009
State: Peer in Cluster (Connected)

Hostname: ml47
Uuid: e831092d-b196-46ec-947d-a5635e8fbd1e
State: Peer in Cluster (Connected)

Hostname: ml51
Uuid: 5491b6dc-0f96-43d9-95d9-a41018a8542c
State: Peer in Cluster (Connected)

Hostname: ml54
Uuid: c55580fa-2c9d-493d-b9d1-3bce016c8b29
State: Peer in Cluster (Connected)

Hostname: ml26
Uuid: d3d937da-45af-40c0-a219-b6ae3d1d1502
State: Peer in Cluster (Connected)

Hostname: ml55
Uuid: 366339ed-52e5-4722-a1b3-e3bb1c49ea4f
State: Peer in Cluster (Connected)

Hostname: ml43
Uuid: a9044e9a-39e1-4907-8921-43da870b7f31
State: Peer in Cluster (Connected)

Hostname: ml52
Uuid: 4de42f67-4cca-4d28-8600-9018172563ba
State: Peer in Cluster (Connected)

Hostname: ml25
Uuid: ee33e881-2e05-45bc-b550-5ab80f25c4f1
State: Peer in Cluster (Connected)

Hostname: ml56
Uuid: 04a8272c-c921-4f20-8c73-de3c87b36feb
State: Peer in Cluster (Connected)

Hostname: ml30
Uuid: e56b4c57-a058-4464-a1e6-c4676ebf00cc
State: Peer in Cluster (Connected)

Hostname: ml29
Uuid: 58aa8a16-5d2b-4c06-8f06-2fd0f7fc5a37
State: Peer in Cluster (Connected)

Hostname: ml46
Uuid: af74d39b-09d6-47ba-9c3b-72d993dca4ce
State: Peer in Cluster (Connected)

Hostname: ml57
Uuid: ef5becbb-6af7-429a-a62b-a09ecfa1c5f6
State: Peer in Cluster (Connected)

Hostname: ml48
Uuid: efd79145-bfd9-4eea-b7a7-50be18d9ffe0
State: Peer in Cluster (Connected)

Hostname: ml45
Uuid: 0eebbceb-8f62-4c55-8160-41348f90e191
State: Peer in Cluster (Connected)
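
One peer above is tracked by its IP (138.15.169.24) rather than a hostname; going by the matching counters in the "ssh ml01" output earlier, that peer appears to be ml01, which would explain why one row of the rebalance status shows an IP instead of a hostname. As far as I understand, probing that peer again by hostname from any other node may update the stored hostname (a guess at a workaround, not something I have tried on this cluster):

[root@ml59 ~]# gluster peer probe ml01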



# gluster volume info
 
Volume Name: bigdata
Type: Distributed-Replicate
Volume ID: 56498956-7b4b-4ee3-9d2b-4c8cfce26051
Status: Started
Number of Bricks: 25 x 2 = 50
Transport-type: tcp
Bricks:
Brick1: ml43:/mnt/donottouch/localb/brick
Brick2: ml44:/mnt/donottouch/localb/brick
Brick3: ml43:/mnt/donottouch/localc/brick
Brick4: ml44:/mnt/donottouch/localc/brick
Brick5: ml45:/mnt/donottouch/localb/brick
Brick6: ml46:/mnt/donottouch/localb/brick
Brick7: ml45:/mnt/donottouch/localc/brick
Brick8: ml46:/mnt/donottouch/localc/brick
Brick9: ml47:/mnt/donottouch/localb/brick
Brick10: ml48:/mnt/donottouch/localb/brick
Brick11: ml47:/mnt/donottouch/localc/brick
Brick12: ml48:/mnt/donottouch/localc/brick
Brick13: ml45:/mnt/donottouch/locald/brick
Brick14: ml46:/mnt/donottouch/locald/brick
Brick15: ml47:/mnt/donottouch/locald/brick
Brick16: ml48:/mnt/donottouch/locald/brick
Brick17: ml51:/mnt/donottouch/localb/brick
Brick18: ml52:/mnt/donottouch/localb/brick
Brick19: ml51:/mnt/donottouch/localc/brick
Brick20: ml52:/mnt/donottouch/localc/brick
Brick21: ml51:/mnt/donottouch/locald/brick
Brick22: ml52:/mnt/donottouch/locald/brick
Brick23: ml59:/mnt/donottouch/locald/brick
Brick24: ml54:/mnt/donottouch/locald/brick
Brick25: ml59:/mnt/donottouch/localc/brick
Brick26: ml54:/mnt/donottouch/localc/brick
Brick27: ml59:/mnt/donottouch/localb/brick
Brick28: ml54:/mnt/donottouch/localb/brick
Brick29: ml55:/mnt/donottouch/localb/brick
Brick30: ml29:/mnt/donottouch/localb/brick
Brick31: ml55:/mnt/donottouch/localc/brick
Brick32: ml29:/mnt/donottouch/localc/brick
Brick33: ml30:/mnt/donottouch/localc/brick
Brick34: ml31:/mnt/donottouch/localc/brick
Brick35: ml30:/mnt/donottouch/localb/brick
Brick36: ml31:/mnt/donottouch/localb/brick
Brick37: ml40:/mnt/donottouch/localb/brick
Brick38: ml41:/mnt/donottouch/localb/brick
Brick39: ml40:/mnt/donottouch/localc/brick
Brick40: ml41:/mnt/donottouch/localc/brick
Brick41: ml56:/mnt/donottouch/localb/brick
Brick42: ml57:/mnt/donottouch/localb/brick
Brick43: ml56:/mnt/donottouch/localc/brick
Brick44: ml57:/mnt/donottouch/localc/brick
Brick45: ml25:/mnt/donottouch/localb/brick
Brick46: ml26:/mnt/donottouch/localb/brick
Brick47: ml01:/mnt/donottouch/localb/brick
Brick48: ml25:/mnt/donottouch/localc/brick
Brick49: ml01:/mnt/donottouch/localc/brick
Brick50: ml26:/mnt/donottouch/localc/brick
Options Reconfigured:
performance.quick-read: on
nfs.disable: on
nfs.register-with-portmap: OFF

# gluster volume status
Status of volume: bigdata
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick ml43:/mnt/donottouch/localb/brick                 49152   Y       1202
Brick ml44:/mnt/donottouch/localb/brick                 49152   Y       12997
Brick ml43:/mnt/donottouch/localc/brick                 49153   Y       1206
Brick ml44:/mnt/donottouch/localc/brick                 49153   Y       13003
Brick ml45:/mnt/donottouch/localb/brick                 49152   Y       18330
Brick ml46:/mnt/donottouch/localb/brick                 49152   Y       5408
Brick ml45:/mnt/donottouch/localc/brick                 49153   Y       18336
Brick ml46:/mnt/donottouch/localc/brick                 49153   Y       5412
Brick ml47:/mnt/donottouch/localb/brick                 49152   Y       4188
Brick ml48:/mnt/donottouch/localb/brick                 49152   Y       19622
Brick ml47:/mnt/donottouch/localc/brick                 49153   Y       4192
Brick ml48:/mnt/donottouch/localc/brick                 49153   Y       19626
Brick ml45:/mnt/donottouch/locald/brick                 49154   Y       18341
Brick ml46:/mnt/donottouch/locald/brick                 49154   Y       5418
Brick ml47:/mnt/donottouch/locald/brick                 49154   Y       4197
Brick ml48:/mnt/donottouch/locald/brick                 49154   Y       19632
Brick ml51:/mnt/donottouch/localb/brick                 49152   Y       14905
Brick ml52:/mnt/donottouch/localb/brick                 49152   Y       17792
Brick ml51:/mnt/donottouch/localc/brick                 49153   Y       14909
Brick ml52:/mnt/donottouch/localc/brick                 49153   Y       17796
Brick ml51:/mnt/donottouch/locald/brick                 49154   Y       14914
Brick ml52:/mnt/donottouch/locald/brick                 49154   Y       17801
Brick ml59:/mnt/donottouch/locald/brick                 49152   Y       9806
Brick ml54:/mnt/donottouch/locald/brick                 49152   Y       31252
Brick ml59:/mnt/donottouch/localc/brick                 49153   Y       9810
Brick ml54:/mnt/donottouch/localc/brick                 49153   Y       31257
Brick ml59:/mnt/donottouch/localb/brick                 49154   Y       9816
Brick ml54:/mnt/donottouch/localb/brick                 49154   Y       31271
Brick ml55:/mnt/donottouch/localb/brick                 49152   Y       8592
Brick ml29:/mnt/donottouch/localb/brick                 49152   Y       26350
Brick ml55:/mnt/donottouch/localc/brick                 49153   Y       8593
Brick ml29:/mnt/donottouch/localc/brick                 49153   Y       26356
Brick ml30:/mnt/donottouch/localc/brick                 49152   Y       29093
Brick ml31:/mnt/donottouch/localc/brick                 49152   Y       26159
Brick ml30:/mnt/donottouch/localb/brick                 49153   Y       29099
Brick ml31:/mnt/donottouch/localb/brick                 49153   Y       26164
Brick ml40:/mnt/donottouch/localb/brick                 49152   Y       11005
Brick ml41:/mnt/donottouch/localb/brick                 49152   Y       20418
Brick ml40:/mnt/donottouch/localc/brick                 49153   Y       11011
Brick ml41:/mnt/donottouch/localc/brick                 49153   Y       20424
Brick ml56:/mnt/donottouch/localb/brick                 49152   Y       1704
Brick ml57:/mnt/donottouch/localb/brick                 49152   Y       1326
Brick ml56:/mnt/donottouch/localc/brick                 49153   Y       1708
Brick ml57:/mnt/donottouch/localc/brick                 49153   Y       1330
Brick ml25:/mnt/donottouch/localb/brick                 49152   Y       6761
Brick ml26:/mnt/donottouch/localb/brick                 49152   Y       590
Brick ml01:/mnt/donottouch/localb/brick                 49152   Y       13431
Brick ml25:/mnt/donottouch/localc/brick                 49153   Y       6765
Brick ml01:/mnt/donottouch/localc/brick                 49153   Y       13435
Brick ml26:/mnt/donottouch/localc/brick                 49153   Y       596
Self-heal Daemon on localhost                           N/A     Y       9824
Self-heal Daemon on ml40                                N/A     Y       11019
Self-heal Daemon on ml45                                N/A     Y       18350
Self-heal Daemon on ml41                                N/A     Y       20432
Self-heal Daemon on ml43                                N/A     Y       2128
Self-heal Daemon on ml52                                N/A     Y       17810
Self-heal Daemon on ml54                                N/A     Y       31267
Self-heal Daemon on ml44                                N/A     Y       13011
Self-heal Daemon on ml29                                N/A     Y       26364
Self-heal Daemon on ml57                                N/A     Y       1340
Self-heal Daemon on ml47                                N/A     Y       4206
Self-heal Daemon on ml30                                N/A     Y       29107
Self-heal Daemon on ml56                                N/A     Y       1716
Self-heal Daemon on ml51                                N/A     Y       14923
Self-heal Daemon on ml55                                N/A     Y       8604
Self-heal Daemon on ml48                                N/A     Y       19640
Self-heal Daemon on ml31                                N/A     Y       26172
Self-heal Daemon on 138.15.169.24                       N/A     Y       13445
Self-heal Daemon on ml46                                N/A     Y       5426
Self-heal Daemon on ml26                                N/A     Y       604
Self-heal Daemon on ml25                                N/A     Y       6773
 
           Task                                      ID         Status
           ----                                      --         ------
      Rebalance    1f4a8910-17ed-41a3-b10e-06fe32e4b517              1

# cat /etc/system-release
Scientific Linux release 6.1 (Carbon)

# uname -a
Linux ml59 2.6.32-131.17.1.el6.x86_64 #1 SMP Wed Oct 5 17:19:54 CDT 2011 x86_64 x86_64 x86_64 GNU/Linux

# rpm -qa|grep gluster
glusterfs-server-3.4.0-1.el6.x86_64
glusterfs-fuse-3.4.0-1.el6.x86_64
glusterfs-debuginfo-3.4.0-1.el6.x86_64
glusterfs-3.4.0-1.el6.x86_64
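
If logs would help, the glusterd log on the node where the status command is run is where I would expect any errors from assembling this output to show up (default log path for a standard install, as far as I know):

[root@ml59 ~]# grep -i rebalance /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | tail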

Comment 1 Niels de Vos 2015-05-17 22:01:00 UTC
GlusterFS 3.7.0 has been released (http://www.gluster.org/pipermail/gluster-users/2015-May/021901.html), and the Gluster project maintains N-2 supported releases. The last two releases before 3.7 are still maintained; at the moment these are 3.6 and 3.5.

This bug has been filed against the 3.4 release and will not be fixed in a 3.4 version any more. Please verify whether newer versions are affected by the reported problem. If that is the case, update the bug with a note, and update the version if you can. If updating the version is not possible, leave a comment in this bug report with the version you tested, and set the "Need additional information the selected bugs from" field below the comment box to "bugs".

If there is no response by the end of the month, this bug will be closed automatically.

Comment 2 Kaleb KEITHLEY 2015-10-07 13:16:38 UTC
GlusterFS 3.4.x has reached end-of-life.

If this bug still exists in a later release, please reopen it and change the version, or open a new bug.

