Bug 862618
| Summary: | Mismatch in failure counts between rebalance logs and status | | |
| --- | --- | --- | --- |
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | shylesh <shmohan> |
| Component: | distribute | Assignee: | Nithya Balachandran <nbalacha> |
| Status: | CLOSED WORKSFORME | QA Contact: | storage-qa-internal <storage-qa-internal> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | unspecified | CC: | grajaiya, nbalacha, rgowdapp, rhs-bugs, rwheeler, smohan, vbellur |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | triaged, dht-rebalance-usability | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-08-29 06:08:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | rebalance fail counts (attachment 620777); glusterd logs (attachment 621494) | | |
Not able to reproduce the issue. Please update the bug if you hit the issue again. Also attach the glusterd logs along with the CLI logs.

Created attachment 621494 [details]
glusterd logs

Not able to reproduce the issue, and the bug is about a rebalance statistic being incorrect. Reducing the severity and priority.
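(For anyone who hits this again: a minimal sketch of where the logs requested above usually live on a Red Hat Storage node. These are the standard default paths; your installation may differ.)

```sh
# Default log locations on an RHS/GlusterFS node (assumed paths):
less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log   # glusterd log
less /var/log/glusterfs/cli.log                          # gluster CLI log
less /var/log/glusterfs/rebal-rebalance.log              # per-volume rebalance log for volume "rebal"
```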
Created attachment 620777 [details]
rebalance fail counts

Description of problem:
There is a mismatch in failure counts between the status output and the logs of rebalance.

Version-Release number of selected component (if applicable):
[root@rhs-gp-srv4 glusterfs]# rpm -qa | grep gluster
glusterfs-fuse-3.3.0rhsvirt1-6.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-server-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-geo-replication-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch

How reproducible:

Steps to Reproduce:
1. Created a single-brick distribute volume.
2. Had some VM images on this volume.
3. Added a new brick and started rebalance.
4. While rebalance was running, restarted glusterd on one of the nodes.
5. On that node, the rebalance status command shows the failure count as 1.
(A consolidated CLI sketch of these steps follows the log excerpt below.)

Actual results:
The status output shows a failure count of 1, but the log reports the failure count as 0.

Additional info:

Volume Name: rebal
Type: Distribute
Volume ID: 0952e193-a12c-420a-b752-a77c54b3bf98
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: rhs-gp-srv4.lab.eng.blr.redhat.com:/rebal
Brick2: rhs-gp-srv11.lab.eng.blr.redhat.com:/rebal
Options Reconfigured:
cluster.eager-lock: enable
storage.linux-aio: off
performance.read-ahead: disable
performance.stat-prefetch: disable
performance.io-cache: disable
performance.quick-read: disable

[root@rhs-gp-srv4 glusterfs]# gluster v rebalance rebal status
Node                                 Rebalanced-files          size      scanned     failures       status
---------                                 -----------   -----------  -----------  -----------  -----------
localhost                                          11  128849050259           42            1    completed
rhs-gp-srv12.lab.eng.blr.redhat.com                 0             0           32            0    completed
rhs-gp-srv11.lab.eng.blr.redhat.com                 0             0           32            0    completed
rhs-gp-srv15.lab.eng.blr.redhat.com                 0             0           32            0    completed

Whereas the log on the peer that reports the failure says:
====================================
[2012-10-03 07:02:36.639436] I [dht-rebalance.c:1063:gf_defrag_migrate_data] 0-rebal-dht: migrate data called on /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/vms/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd
[2012-10-03 07:02:36.642296] I [dht-rebalance.c:647:dht_migrate_file] 0-rebal-dht: /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/vms/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd.ovf: attempting to move from rebal-client-0 to rebal-client-1
[2012-10-03 07:02:36.647204] I [dht-rebalance.c:856:dht_migrate_file] 0-rebal-dht: completed migration of /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/vms/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd.ovf from subvolume rebal-client-0 to rebal-client-1
[2012-10-03 07:02:36.652056] I [dht-common.c:2337:dht_setxattr] 0-rebal-dht: fixing the layout of /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/tasks
[2012-10-03 07:02:36.652578] I [dht-rebalance.c:1063:gf_defrag_migrate_data] 0-rebal-dht: migrate data called on /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/tasks
[2012-10-03 07:02:36.657795] I [dht-rebalance.c:1619:gf_defrag_status_get] 0-glusterfs: Rebalance is completed
[2012-10-03 07:02:36.657823] I [dht-rebalance.c:1622:gf_defrag_status_get] 0-glusterfs: Files migrated: 11, size: 128849050259, lookups: 42, failures: 0
[2012-10-03 07:02:36.658403] W [glusterfsd.c:906:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3910ae5ccd] (-->/lib64/libpthread.so.0() [0x39112077f1] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xdd) [0x405d2d]))) 0-: received signum (15), shutting down
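For reference, a minimal CLI sketch of the reproduction steps above, using the hostnames and brick paths from the volume info. This is illustrative only: the mount point and log path are assumptions, and exact command behavior may vary across GlusterFS releases.

```sh
# 1. Create and start a single-brick distribute volume
gluster volume create rebal rhs-gp-srv4.lab.eng.blr.redhat.com:/rebal
gluster volume start rebal

# 2. Populate the volume with VM images via a FUSE mount (mount point is illustrative)
mount -t glusterfs rhs-gp-srv4.lab.eng.blr.redhat.com:/rebal /mnt/rebal

# 3. Add a new brick and start rebalance
gluster volume add-brick rebal rhs-gp-srv11.lab.eng.blr.redhat.com:/rebal
gluster volume rebalance rebal start

# 4. While rebalance is running, restart glusterd on one node (el6 init script)
service glusterd restart

# 5. Compare the failures column in the status output against the
#    "failures:" figure logged by gf_defrag_status_get on the same node
gluster volume rebalance rebal status
grep 'failures' /var/log/glusterfs/rebal-rebalance.log | tail -n 1
```

The mismatch reported here is exactly the discrepancy between these two outputs: the status command shows 1 failure for the restarted node while the corresponding log line reports failures: 0.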