Bug 862618

Summary: Mismatch in failure counts between rebalance logs and status
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: shylesh <shmohan>
Component: distribute
Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED WORKSFORME
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: low
Priority: low
Version: unspecified
CC: grajaiya, nbalacha, rgowdapp, rhs-bugs, rwheeler, smohan, vbellur
Target Milestone: ---
Target Release: ---
Keywords: ZStream
Hardware: x86_64
OS: Linux
Whiteboard: triaged, dht-rebalance-usability
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2017-08-29 06:08:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Attachments:
  rebalance fail counts (flags: none)
  glusterd logs (flags: none)

Description shylesh 2012-10-03 11:53:40 UTC
Created attachment 620777
rebalance fail counts

Description of problem:
There is a mismatch between the failure count reported by the rebalance status command and the count recorded in the rebalance log.

Version-Release number of selected component (if applicable):
[root@rhs-gp-srv4 glusterfs]# rpm -qa | grep gluster
glusterfs-fuse-3.3.0rhsvirt1-6.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-server-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-geo-replication-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch


How reproducible:


Steps to Reproduce:
1. Created a single-brick distribute volume.
2. Stored some VM images on the volume.
3. Added a new brick and started rebalance.
4. While rebalance was running, restarted glusterd on one of the nodes.
5. On that node, the rebalance status command shows a failure count of 1 (see the command sketch after this list).
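
A minimal sketch of the reproduction commands, assuming the host and brick names from this report (the mount point and image paths are illustrative):

# 1. Create and start a single-brick distribute volume
gluster volume create rebal rhs-gp-srv4.lab.eng.blr.redhat.com:/rebal
gluster volume start rebal

# 2. Place some VM images on a FUSE mount of the volume
mkdir -p /mnt/rebal
mount -t glusterfs rhs-gp-srv4.lab.eng.blr.redhat.com:/rebal /mnt/rebal
cp /var/lib/images/*.img /mnt/rebal/

# 3. Add a second brick and start rebalance
gluster volume add-brick rebal rhs-gp-srv11.lab.eng.blr.redhat.com:/rebal
gluster volume rebalance rebal start

# 4. While rebalance is running, restart glusterd on one of the nodes
service glusterd restart

# 5. Check the failure count reported on that node
gluster volume rebalance rebal status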

Actual results:

The status output reports a failure count of 1, but the rebalance log on the same node reports 0 failures.

Additional info:
Volume Name: rebal
Type: Distribute
Volume ID: 0952e193-a12c-420a-b752-a77c54b3bf98
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: rhs-gp-srv4.lab.eng.blr.redhat.com:/rebal
Brick2: rhs-gp-srv11.lab.eng.blr.redhat.com:/rebal
Options Reconfigured:
cluster.eager-lock: enable
storage.linux-aio: off
performance.read-ahead: disable
performance.stat-prefetch: disable
performance.io-cache: disable
performance.quick-read: disable
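
For reference, the configuration above is the volume-info output; it can be re-checked on any peer with:

gluster volume info rebal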



[root@rhs-gp-srv4 glusterfs]# gluster v rebalance rebal status
                                    Node Rebalanced-files          size       scanned      failures         status
                               ---------      -----------   -----------   -----------   -----------   ------------
                               localhost               11 128849050259           42            1      completed
     rhs-gp-srv12.lab.eng.blr.redhat.com                0            0           32            0      completed
     rhs-gp-srv11.lab.eng.blr.redhat.com                0            0           32            0      completed
     rhs-gp-srv15.lab.eng.blr.redhat.com                0            0           32            0      completed




whereas the log on the peer where status reports the failure says:
====================================
[2012-10-03 07:02:36.639436] I [dht-rebalance.c:1063:gf_defrag_migrate_data] 0-rebal-dht: migrate data called on /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/vms/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd
[2012-10-03 07:02:36.642296] I [dht-rebalance.c:647:dht_migrate_file] 0-rebal-dht: /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/vms/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd.ovf: attempting to move from rebal-client-0 to rebal-client-1
[2012-10-03 07:02:36.647204] I [dht-rebalance.c:856:dht_migrate_file] 0-rebal-dht: completed migration of /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/vms/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd.ovf from subvolume rebal-client-0 to rebal-client-1
[2012-10-03 07:02:36.652056] I [dht-common.c:2337:dht_setxattr] 0-rebal-dht: fixing the layout of /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/tasks
[2012-10-03 07:02:36.652578] I [dht-rebalance.c:1063:gf_defrag_migrate_data] 0-rebal-dht: migrate data called on /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/tasks
[2012-10-03 07:02:36.657795] I [dht-rebalance.c:1619:gf_defrag_status_get] 0-glusterfs: Rebalance is completed
[2012-10-03 07:02:36.657823] I [dht-rebalance.c:1622:gf_defrag_status_get] 0-glusterfs: Files migrated: 11, size: 128849050259, lookups: 42, failures: 0
[2012-10-03 07:02:36.658403] W [glusterfsd.c:906:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3910ae5ccd] (-->/lib64/libpthread.so.0() [0x39112077f1] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xdd) [0x405d2d]))) 0-: received signum (15), shutting down
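
A quick way to put the two counters side by side (a sketch; the log path assumes the usual /var/log/glusterfs/<volname>-rebalance.log naming, which may differ per installation):

# Failure count as recorded by the rebalance process in its log
grep 'failures:' /var/log/glusterfs/rebal-rebalance.log | tail -1

# Failure count as aggregated and shown by the CLI
gluster volume rebalance rebal status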

Comment 2 shishir gowda 2012-10-04 05:54:21 UTC
Not able to reproduce the issue.
Please update the bug if you hit the issue again.
Also, attach the glusterd logs along with the CLI logs.

Comment 3 shylesh 2012-10-04 09:36:49 UTC
Created attachment 621494
glusterd logs

Comment 4 shishir gowda 2012-10-05 07:24:51 UTC
Not able to reproduce the issue, and the bug concerns an incorrect statistic reported for rebalance. Reducing the severity and priority.