Bug 1708531
Summary: | gluster rebalance status brain splits | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Qigang <wangqg1> |
Component: | distribute | Assignee: | Susant Kumar Palai <spalai> |
Status: | CLOSED DEFERRED | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | mainline | CC: | amukherj, rhs-bugs, sasundar, storage-qa-internal, vbellur |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-02-18 08:19:37 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description (Qigang, 2019-05-10 07:26:23 UTC)
(In reply to Qigang from comment #0)
> Version-Release number of selected component (if applicable):
> Glusterfs 3.12

Could you provide the complete details of the rpms used (`rpm -qa | grep gluster`), the platform used, and the gluster volume information?

---

```
[root@byg612sv160 ~]# rpm -qa | grep gluster
glusterfs-rdma-3.12.3-1.el7.x86_64
glusterfs-client-xlators-3.12.3-1.el7.x86_64
glusterfs-3.12.3-1.el7.x86_64
glusterfs-cli-3.12.3-1.el7.x86_64
glusterfs-libs-3.12.3-1.el7.x86_64
glusterfs-fuse-3.12.3-1.el7.x86_64
glusterfs-api-3.12.3-1.el7.x86_64
glusterfs-server-3.12.3-1.el7.x86_64

[root@byg612sv160 ~]# uname -a
Linux byg612sv160 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

[root@byg612sv160 ~]# gluster volume status
Status of volume: gv0
Gluster process                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick g173:/dfs/brick1/gv0      49152     0          Y       2683
Brick g174:/dfs/brick1/gv0      49152     0          Y       2617
Brick g61:/dfs/brick1/gv0       49152     0          Y       2367
Brick g62:/dfs/brick1/gv0       49152     0          Y       3236
Brick g121:/dfs/brick1/gv0      49152     0          Y       2064
Brick g122:/dfs/brick1/gv0      49152     0          Y       2075
Brick g201:/dfs/brick1/gv0      49152     0          Y       3034
Brick g202:/dfs/brick1/gv0      49152     0          Y       2399
Brick g203:/dfs/brick1/gv0      49152     0          Y       2892
Brick g206:/dfs/brick1/gv0      49152     0          Y       2485
Brick g150:/dfs/brick1/gv0      49152     0          Y       3276
Brick g151:/dfs/brick1/gv0      49152     0          Y       3062
Brick g152:/dfs/brick1/gv0      49152     0          Y       187895
Brick g153:/dfs/brick1/gv0      49152     0          Y       61796
Brick g154:/dfs/brick1/gv0      49152     0          Y       147263
Brick g155:/dfs/brick1/gv0      49152     0          Y       61524
Brick g156:/dfs/brick1/gv0      49152     0          Y       253395
Brick g157:/dfs/brick1/gv0      49152     0          Y       3222
Brick g160:/dfs/brick1/gv0      49152     0          Y       249217
Brick g161:/dfs/brick1/gv0      49152     0          Y       192749
Self-heal Daemon on localhost   N/A       N/A        Y       330489
Self-heal Daemon on g206        N/A       N/A        Y       3652
Self-heal Daemon on g203        N/A       N/A        Y       67033
Self-heal Daemon on g61         N/A       N/A        Y       441931
Self-heal Daemon on g152        N/A       N/A        Y       188517
Self-heal Daemon on g151        N/A       N/A        Y       72832
Self-heal Daemon on g154        N/A       N/A        Y       423672
Self-heal Daemon on g155        N/A       N/A        Y       375165
Self-heal Daemon on g122        N/A       N/A        Y       442967
Self-heal Daemon on g150        N/A       N/A        Y       329818
Self-heal Daemon on g153        N/A       N/A        Y       27126
Self-heal Daemon on g157        N/A       N/A        Y       102113
Self-heal Daemon on g156        N/A       N/A        Y       319339
Self-heal Daemon on g202        N/A       N/A        Y       81427
Self-heal Daemon on g62         N/A       N/A        Y       108351
Self-heal Daemon on g161        N/A       N/A        Y       218759
Self-heal Daemon on g121        N/A       N/A        Y       358359
Self-heal Daemon on g201        N/A       N/A        Y       32230
Self-heal Daemon on g173        N/A       N/A        Y       44555
Self-heal Daemon on g174        N/A       N/A        Y       41594

Task Status of Volume gv0
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 5e50a6d6-1e3b-4468-9b0b-9a9ec48dee3c
Status               : in progress
```

---

Is the rebalance process still running on the nodes? You can use `ps ax | grep rebalance` to check. Rebalance will try to finish migrating the files already in its queue before terminating, which may be why it has not stopped. Also, are you using the upstream release bits or the supported RHGS?

---

Yes, the rebalance process is still running, and it has been making very slow progress for almost a week. It looks like it is not migrating files; it is just doing fix-layout. We have over 110 TB of files (many of them small files) in our gluster storage.

---

The version numbers do not match the downstream RHBZ builds. Moving this to the Community release.

---

(In reply to Qigang from comment #6)
> Yes, the rebalance process is still running, and it has been making very
> slow progress for almost a week. It looks like it is not migrating files. It
> is just doing fix-layout. We have over 110TB files (and many of them are
> small files) in our gluster storage.

Do you have a lot of directories? If so, fixing the layout on those will take a lot of time but does not show up in the status.
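The two checks suggested above (is the rebalance process still alive, and is it still in the fix-layout phase) can be combined into a small per-node script. This is only a sketch: it assumes the volume name `gv0` and the standard `/var/log/glusterfs/` log location seen in this report, and matches the rebalance process by its command line as the `ps ax | grep rebalance` suggestion does.

```shell
#!/bin/sh
# Sketch: check rebalance state on one node. Assumes the volume is
# named gv0 and logs live under /var/log/glusterfs/, as in this report.
VOL=gv0
LOG=/var/log/glusterfs/${VOL}-rebalance.log

# Is a rebalance process still running for this volume? (pgrep -f
# matches against the full command line, like ps ax | grep would.)
if pgrep -f "rebalance/${VOL}" >/dev/null 2>&1; then
    STATE=running
else
    STATE=not-running
fi
echo "rebalance process for ${VOL}: ${STATE}"

# During the fix-layout phase the log fills with "fixing the layout of"
# lines instead of file-migration messages; show the most recent entry.
if [ -r "$LOG" ]; then
    tail -n 1 "$LOG"
fi
```

Running this on each node (for example via `ssh`) would distinguish a hung rebalance from one that is still slowly walking directories.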
The problem with the cli commands is probably caused by a mismatch in the glusterd node info files. Asking Atin to provide the steps to work around this. If you do not have lookup-optimize enabled on the volume, you can kill the rebalance processes and then perform the steps Atin will provide to clean up the node_state.info files.

---

Yes, we have a lot of directories. The rebalance log file /var/log/glusterfs/gv0-rebalance.log records each scanned folder and can therefore be read as a status report, but it is far too slow and there is no progress bar, so we have no idea how long it will take.

One entry from the rebalance log:

```
[2019-05-13 05:09:10.236068] I [MSGID: 109081] [dht-common.c:4379:dht_setxattr] 0-gv0-dht: fixing the layout of /yangdk2_data/data/meitu/meitu_img/train/gameplaying/954742707
```

The rebalance process is only observed on the two newly added pairs. Our lookup-optimize setting is off. Thank you very much.

---

Closing this bug as there is no activity. Please reopen if you have any new concerns.
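For reference, the scanned-folder information Qigang describes can be summarized from the rebalance log with a short script. This is a sketch: the log path and the "fixing the layout of" message text are taken from the log excerpt in this report, and since fix-layout reports no total, the result is a running count rather than a percentage.

```shell
#!/bin/sh
# Sketch: rough fix-layout progress from the rebalance log.
# Log path and message text are taken from the excerpt in this report.
LOG=${1:-/var/log/glusterfs/gv0-rebalance.log}

if [ -r "$LOG" ]; then
    # Each fixed directory produces one "fixing the layout of" line.
    COUNT=$(grep -c "fixing the layout of" "$LOG")
    echo "directories fixed so far: ${COUNT}"
    # Most recently fixed directory:
    grep "fixing the layout of" "$LOG" | tail -n 1
else
    echo "log not readable: ${LOG}"
    COUNT=0
fi
```

Sampling the count periodically (for example with `watch`) would at least show whether the crawl is still advancing, even without a known total.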