Bug 956188
| Field | Value |
|---|---|
| Summary | DHT - rebalance - 'gluster volume rebalance <volname> status' shows 2 entries (2 rows) for one host |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Reporter | Rachana Patel <racpatel> |
| Component | distribute |
| Assignee | Nithya Balachandran <nbalacha> |
| Status | CLOSED EOL |
| QA Contact | Matt Zywusko <mzywusko> |
| Severity | medium |
| Priority | high |
| Version | 2.1 |
| CC | khoi.mai2008, nsathyan, pkarampu, rhs-bugs, rwheeler, sdharane, spalai, vagarwal, vbellur |
| Keywords | ZStream |
| Hardware | x86_64 |
| OS | Linux |
| Doc Type | Bug Fix |
| Cloned to | 963524, 1286068, 1286069, 1286071, 1286072 (view as bug list) |
| Last Closed | 2015-11-27 10:30:38 UTC |
| Type | Bug |
| Bug Blocks | 963524, 1286068, 1286069, 1286071, 1286072 |
Description
Rachana Patel
2013-04-24 12:26:19 UTC
Facing a similar issue, along with the rebalance status not showing the status of one node which is in the cluster. gluster rebalance status:

```
Node         Rebalanced-files  size    scanned  failures  status     run time in secs
localhost    0                 0Bytes  7        1         completed  0.00
localhost    0                 0Bytes  7        1         completed  0.00
localhost    0                 0Bytes  7        1         completed  0.00
10.70.34.86  0                 0Bytes  8        0         completed  0.00
```

The output shows 3 entries for one host and does not show the status of one node which is in the cluster. Also, there are 3 nodes in the cluster, but the rebalance status shows 4 rows; there should be only one entry per RHS node/peer.

```
[root@fillmore ~]# gluster p s
Number of Peers: 2

Hostname: 10.70.34.85
Uuid: 35a8481a-4a77-4149-a883-9db0b68e954f
State: Peer in Cluster (Connected)

Hostname: 10.70.34.86
Uuid: d834977d-9bfd-4940-8843-aedc9130bd12
State: Peer in Cluster (Connected)
```

The sos report for comment 3 can be found at: http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/956188/

The issue mentioned in comment 3 is consistently seen on all the nodes after restarting glusterd on any one node:

```
[root@fillmore tmp]# service glusterd restart
Starting glusterd:                                         [  OK  ]
gluster v rebalance vol14 status
Node         Rebalanced-files  size    scanned  failures  status       run time in secs
localhost    0                 0Bytes  0        0         not started  0.00
localhost    0                 0Bytes  0        0         not started  0.00
localhost    0                 0Bytes  0        0         not started  0.00
10.70.34.86  0                 0Bytes  0        0         not started  0.00
```

Another issue I faced while checking the rebalance status repeatedly was that only the local host was reported (once) in the output, and the other nodes in the cluster were missing:
```
Node       Rebalanced-files  size    scanned  failures  status       run time in secs
localhost  0                 0Bytes  0        0         not started  0.00
```

Gluster peer status information:

```
[root@fillmore tmp]# gluster peer status
Number of Peers: 2

Hostname: 10.70.34.85
Uuid: 35a8481a-4a77-4149-a883-9db0b68e954f
State: Peer in Cluster (Connected)

Hostname: 10.70.34.86
Uuid: d834977d-9bfd-4940-8843-aedc9130bd12
State: Peer in Cluster (Connected)
```

I am able to re-create the bug with the steps given by Rachana.

Verified with 3.4.0.12rhs-1.el6rhs.x86_64; able to reproduce, so moving back to 'Assigned'. Volume info:

```
gluster v i dis_rep
Volume Name: dis_rep
Type: Distributed-Replicate
Volume ID: e13e880d-916b-43a6-9b10-7a5c38ddc133
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.34.86:/rhs/brick1/d1
Brick2: 10.70.34.85:/rhs/brick1/d2
Brick3: 10.70.34.105:/rhs/brick1/d3
Brick4: 10.70.34.86:/rhs/brick1/d4
Brick5: 10.70.34.85:/rhs/brick1/d5
Brick6: 10.70.34.86:/rhs/brick1/d6
```

1) Started rebalance, and while rebalance was in progress, rebooted 10.70.35.85:

```
[root@jay ~]# gluster v rebalance dis_rep status
Node          Rebalanced-files  size     scanned  failures  status       run time in secs
---------     ----------------  -------  -------  --------  -----------  ----------------
localhost     22                220.0MB  60       0         in progress  111.00
localhost     22                220.0MB  60       0         in progress  111.00
10.70.34.105  22                220.0MB  194      0         in progress  111.00
```

Rachana, the test case mentioned in the bug description does not include any step to reboot the node. Is the issue happening even when the node is not rebooted?
Pranith

While I have experienced the same issue, what is the result when it completes? Are there missing files after the rebalance?
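The complaint throughout this bug is that a healthy cluster should show exactly one status row per peer. As an illustration only (the helper below is hypothetical, not part of the gluster CLI), a small parser can flag hosts that appear more than once in the tabular status output; this sketch assumes the column layout shown above:

```python
from collections import Counter

def duplicate_nodes(status_output: str) -> list[str]:
    """Return node names appearing more than once in
    'gluster volume rebalance <volname> status' output."""
    nodes = []
    for line in status_output.splitlines():
        fields = line.split()
        # skip the header and separator rows; data rows have >= 7 columns
        if len(fields) >= 7 and fields[0] != "Node" and not fields[0].startswith("-"):
            nodes.append(fields[0])
    return [node for node, count in Counter(nodes).items() if count > 1]

sample = """\
Node         Rebalanced-files  size    scanned  failures  status     run time in secs
localhost    0                 0Bytes  7        1         completed  0.00
localhost    0                 0Bytes  7        1         completed  0.00
localhost    0                 0Bytes  7        1         completed  0.00
10.70.34.86  0                 0Bytes  8        0         completed  0.00"""

print(duplicate_nodes(sample))  # ['localhost']
```

Against the output quoted in the description, this would report `localhost` as duplicated, matching the bug's observation of 3 entries for one host.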
I don't quite understand this output:

```
gluster> volume rebalance devstatic status
Node       Rebalanced-files  size    scanned  failures  status       run time in secs
---------  ----------------  ------  -------  --------  -----------  ----------------
localhost  0                 0Bytes  46989    2         in progress  314.00
localhost  0                 0Bytes  46989    2         in progress  314.00
localhost  0                 0Bytes  46989    2         in progress  314.00
localhost  0                 0Bytes  46989    2         in progress  314.00
omdx14f0   0                 0Bytes  46977    0         in progress  314.00
```

Then 373 seconds later the peers are listed:

```
gluster> volume rebalance devstatic status
Node       Rebalanced-files  size     scanned  failures  status       run time in secs
---------  ----------------  -------  -------  --------  -----------  ----------------
localhost  0                 0Bytes   58238    4         in progress  373.00
omhq1832   5                 118.0KB  58234    0         in progress  373.00
omdx1448   0                 0Bytes   58206    0         in progress  372.00
omdx14f0   0                 0Bytes   58172    0         in progress  373.00
```

Please help me understand what I'm supposed to make of this report.
Here is what is in my node01 /var/log/glusterfs/devstatic-rebalance.log:

```
[2013-09-18 20:21:50.479081] I [dht-rebalance.c:1690:gf_defrag_status_get] 0-glusterfs: Rebalance is completed. Time taken is 15788.00 secs
[2013-09-18 20:21:50.479104] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3688880, failures: 1127
```

grep failures /var/log/glusterfs/devstatic-rebalance.log:

```
[2013-09-18 19:21:30.803333] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215328, failures: 366
[2013-09-18 19:21:32.276603] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215422, failures: 366
[2013-09-18 19:21:33.283772] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215452, failures: 366
[2013-09-18 19:21:34.131694] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215483, failures: 366
[2013-09-18 19:21:35.043714] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215559, failures: 368
[2013-09-18 19:21:35.859756] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215613, failures: 369
[2013-09-18 19:21:36.595615] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215656, failures: 369
[2013-09-18 19:21:37.475340] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215712, failures: 369
[2013-09-18 19:21:38.243189] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215775, failures: 372
[2013-09-18 19:21:38.915318] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3215999, failures: 372
[2013-09-18 19:23:45.650095] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3231985, failures: 376
[2013-09-18 19:51:19.016311] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3354410, failures: 643
[2013-09-18 19:55:00.390527] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3389551, failures: 779
[2013-09-18 19:58:40.006348] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3406103, failures: 779
[2013-09-18 19:58:40.822272] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3406121, failures: 779
[2013-09-18 19:58:41.445609] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3406140, failures: 779
[2013-09-18 19:58:42.037005] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3406216, failures: 779
[2013-09-18 19:58:42.597474] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3406256, failures: 779
[2013-09-18 19:58:43.077477] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3406280, failures: 779
[2013-09-18 20:09:00.865398] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3548106, failures: 969
[2013-09-18 20:21:50.479104] I [dht-rebalance.c:1693:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 3688880, failures: 1127
```

I'm not quite sure what "failures" means in rebalance.

Cloning this bug to 3.1. Will be fixed in a future release.
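Since the meaning of the "failures" counter is unclear, it can at least be tracked over time from the rebalance log. A minimal sketch, assuming only the gf_defrag_status_get log format quoted above (the `failure_series` helper is hypothetical, not a gluster tool):

```python
import re

# matches the gf_defrag_status_get progress lines quoted above
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} [\d:.]+)\] .*"
    r"lookups: (?P<lookups>\d+), failures: (?P<failures>\d+)"
)

def failure_series(log_text: str) -> list[tuple[str, int]]:
    """Extract (timestamp, failures) pairs from a rebalance log."""
    return [(m.group("ts"), int(m.group("failures")))
            for m in LINE_RE.finditer(log_text)]

sample = (
    "[2013-09-18 19:21:30.803333] I [dht-rebalance.c:1693:gf_defrag_status_get] "
    "0-glusterfs: Files migrated: 0, size: 0, lookups: 3215328, failures: 366\n"
    "[2013-09-18 20:21:50.479104] I [dht-rebalance.c:1693:gf_defrag_status_get] "
    "0-glusterfs: Files migrated: 0, size: 0, lookups: 3688880, failures: 1127\n"
)

for ts, failures in failure_series(sample):
    print(ts, failures)
```

Run over the full log above, this would show the failure count climbing from 366 to 1127 across the hour-long run, which is the trend the grep output already hints at.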