Bug 862618 - Mismatch in failure counts between rebalance logs and status
Summary: Mismatch in failure counts between rebalance logs and status
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Nithya Balachandran
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard: triaged, dht-rebalance-usability
Depends On:
Blocks:
 
Reported: 2012-10-03 11:53 UTC by shylesh
Modified: 2017-08-29 06:08 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-29 06:08:57 UTC
Embargoed:


Attachments (Terms of Use)
rebalance fail counts (7.16 KB, application/x-gzip), attached 2012-10-03 11:53 UTC by shylesh
glusterd logs (131.65 KB, text/x-log), attached 2012-10-04 09:36 UTC by shylesh

Description shylesh 2012-10-03 11:53:40 UTC
Created attachment 620777 [details]
rebalance fail counts

Description of problem:
There is a mismatch in the failure counts reported by the rebalance status output and the rebalance logs.

Version-Release number of selected component (if applicable):
[root@rhs-gp-srv4 glusterfs]# rpm -qa | grep gluster
glusterfs-fuse-3.3.0rhsvirt1-6.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-server-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-geo-replication-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch


How reproducible:


Steps to Reproduce:
1. Created a single-brick distribute volume.
2. Placed some VM images on this volume.
3. Added a new brick and started rebalance.
4. While rebalance was running, restarted glusterd on one of the nodes.
5. On that node, the rebalance status command shows a failure count of 1.
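The steps above can be sketched as a command sequence. This is a hypothetical reconstruction for reference, not the reporter's exact session; the hostnames and brick paths are taken from the volume info later in this report, and the VM-image copy step is elided.

```shell
# Reproduction sketch (assumed commands; hostnames/paths from "Additional info" below)
gluster volume create rebal rhs-gp-srv4.lab.eng.blr.redhat.com:/rebal
gluster volume start rebal

# ... place some VM images on the mounted volume ...

gluster volume add-brick rebal rhs-gp-srv11.lab.eng.blr.redhat.com:/rebal
gluster volume rebalance rebal start

# On one of the peers, while rebalance is still running:
service glusterd restart

# Then compare this output with the rebalance log on that peer:
gluster volume rebalance rebal status
```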

Actual results:

The status output reports a failure count of 1, but the log reports a failure count of 0.

Additional info:
Volume Name: rebal
Type: Distribute
Volume ID: 0952e193-a12c-420a-b752-a77c54b3bf98
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: rhs-gp-srv4.lab.eng.blr.redhat.com:/rebal
Brick2: rhs-gp-srv11.lab.eng.blr.redhat.com:/rebal
Options Reconfigured:
cluster.eager-lock: enable
storage.linux-aio: off
performance.read-ahead: disable
performance.stat-prefetch: disable
performance.io-cache: disable
performance.quick-read: disable



[root@rhs-gp-srv4 glusterfs]# gluster v rebalance rebal status
                                    Node Rebalanced-files          size       scanned      failures         status
                               ---------      -----------   -----------   -----------   -----------   ------------
                               localhost               11 128849050259           42            1      completed
     rhs-gp-srv12.lab.eng.blr.redhat.com                0            0           32            0      completed
     rhs-gp-srv11.lab.eng.blr.redhat.com                0            0           32            0      completed
     rhs-gp-srv15.lab.eng.blr.redhat.com                0            0           32            0      completed




whereas the log on the peer that reports the failure says:
====================================
[2012-10-03 07:02:36.639436] I [dht-rebalance.c:1063:gf_defrag_migrate_data] 0-rebal-dht: migrate data called on /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/vms/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd
[2012-10-03 07:02:36.642296] I [dht-rebalance.c:647:dht_migrate_file] 0-rebal-dht: /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/vms/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd.ovf: attempting to move from rebal-client-0 to rebal-client-1
[2012-10-03 07:02:36.647204] I [dht-rebalance.c:856:dht_migrate_file] 0-rebal-dht: completed migration of /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/vms/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd.ovf from subvolume rebal-client-0 to rebal-client-1
[2012-10-03 07:02:36.652056] I [dht-common.c:2337:dht_setxattr] 0-rebal-dht: fixing the layout of /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/tasks
[2012-10-03 07:02:36.652578] I [dht-rebalance.c:1063:gf_defrag_migrate_data] 0-rebal-dht: migrate data called on /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/tasks
[2012-10-03 07:02:36.657795] I [dht-rebalance.c:1619:gf_defrag_status_get] 0-glusterfs: Rebalance is completed
[2012-10-03 07:02:36.657823] I [dht-rebalance.c:1622:gf_defrag_status_get] 0-glusterfs: Files migrated: 11, size: 128849050259, lookups: 42, failures: 0
[2012-10-03 07:02:36.658403] W [glusterfsd.c:906:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3910ae5ccd] (-->/lib64/libpthread.so.0() [0x39112077f1] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xdd) [0x405d2d]))) 0-: received signum (15), shutting down

Comment 2 shishir gowda 2012-10-04 05:54:21 UTC
Not able to reproduce the issue.
Please update the bug if you hit the issue again.
Also, please attach the glusterd logs along with the CLI logs.

Comment 3 shylesh 2012-10-04 09:36:49 UTC
Created attachment 621494 [details]
glusterd logs

Comment 4 shishir gowda 2012-10-05 07:24:51 UTC
Not able to reproduce the issue; the bug concerns an incorrect statistic reported for rebalance. Reducing the severity and priority.

