Bug 862618 - Mismatch in failure counts between rebalance logs and status
Status: CLOSED WORKSFORME
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assigned To: Nithya Balachandran
QA Contact: storage-qa-internal@redhat.com
triaged, dht-rebalance-usability
: ZStream
Depends On:
Blocks:
Reported: 2012-10-03 07:53 EDT by shylesh
Modified: 2017-08-29 02:08 EDT
CC List: 7 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-29 02:08:57 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
rebalance fail counts (7.16 KB, application/x-gzip)
2012-10-03 07:53 EDT, shylesh
glusterd logs (131.65 KB, text/x-log)
2012-10-04 05:36 EDT, shylesh

Description shylesh 2012-10-03 07:53:40 EDT
Created attachment 620777
rebalance fail counts

Description of problem:
The failure count reported by the rebalance status command does not match the failure count recorded in the rebalance log.

Version-Release number of selected component (if applicable):
[root@rhs-gp-srv4 glusterfs]# rpm -qa | grep gluster
glusterfs-fuse-3.3.0rhsvirt1-6.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-server-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-geo-replication-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch


How reproducible:


Steps to Reproduce:
1. Created a single-brick distribute volume.
2. Placed some VM images on the volume.
3. Added a new brick and started rebalance.
4. While rebalance was running, restarted glusterd on one of the nodes.
5. On that node, the rebalance status command shows a failure count of 1.

Actual results:

The rebalance status output reports a failure count of 1 for localhost, but the rebalance log on that node reports "failures: 0".

Additional info:
Volume Name: rebal
Type: Distribute
Volume ID: 0952e193-a12c-420a-b752-a77c54b3bf98
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: rhs-gp-srv4.lab.eng.blr.redhat.com:/rebal
Brick2: rhs-gp-srv11.lab.eng.blr.redhat.com:/rebal
Options Reconfigured:
cluster.eager-lock: enable
storage.linux-aio: off
performance.read-ahead: disable
performance.stat-prefetch: disable
performance.io-cache: disable
performance.quick-read: disable



[root@rhs-gp-srv4 glusterfs]# gluster v rebalance rebal status
                                    Node Rebalanced-files          size       scanned      failures         status
                               ---------      -----------   -----------   -----------   -----------   ------------
                               localhost               11 128849050259           42            1      completed
     rhs-gp-srv12.lab.eng.blr.redhat.com                0            0           32            0      completed
     rhs-gp-srv11.lab.eng.blr.redhat.com                0            0           32            0      completed
     rhs-gp-srv15.lab.eng.blr.redhat.com                0            0           32            0      completed




whereas the rebalance log on the node that reports the failure says:
====================================
[2012-10-03 07:02:36.639436] I [dht-rebalance.c:1063:gf_defrag_migrate_data] 0-rebal-dht: migrate data called on /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/vms/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd
[2012-10-03 07:02:36.642296] I [dht-rebalance.c:647:dht_migrate_file] 0-rebal-dht: /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/vms/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd.ovf: attempting to move from rebal-client-0 to rebal-client-1
[2012-10-03 07:02:36.647204] I [dht-rebalance.c:856:dht_migrate_file] 0-rebal-dht: completed migration of /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/vms/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd.ovf from subvolume rebal-client-0 to rebal-client-1
[2012-10-03 07:02:36.652056] I [dht-common.c:2337:dht_setxattr] 0-rebal-dht: fixing the layout of /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/tasks
[2012-10-03 07:02:36.652578] I [dht-rebalance.c:1063:gf_defrag_migrate_data] 0-rebal-dht: migrate data called on /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/tasks
[2012-10-03 07:02:36.657795] I [dht-rebalance.c:1619:gf_defrag_status_get] 0-glusterfs: Rebalance is completed
[2012-10-03 07:02:36.657823] I [dht-rebalance.c:1622:gf_defrag_status_get] 0-glusterfs: Files migrated: 11, size: 128849050259, lookups: 42, failures: 0
[2012-10-03 07:02:36.658403] W [glusterfsd.c:906:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3910ae5ccd] (-->/lib64/libpthread.so.0() [0x39112077f1] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xdd) [0x405d2d]))) 0-: received signum (15), shutting down
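To check the discrepancy mechanically on other runs, the counters from the status row and the log summary line can be compared side by side. This is a minimal Python sketch, not part of GlusterFS: it assumes the status column order and the gf_defrag_status_get summary format shown above, and it assumes the status "scanned" column corresponds to "lookups" in the log (both read 42 here). All helper names are hypothetical.

```python
import re
from typing import Dict, Tuple

# Summary format copied from the gf_defrag_status_get log line in this report.
LOG_SUMMARY = re.compile(
    r"Files migrated: (\d+), size: (\d+), lookups: (\d+), failures: (\d+)"
)


def counters_from_log(line: str) -> Dict[str, int]:
    """Extract rebalance counters from a gf_defrag_status_get summary line."""
    m = LOG_SUMMARY.search(line)
    if m is None:
        raise ValueError("not a rebalance summary line")
    files, size, lookups, failures = (int(g) for g in m.groups())
    # Assumption: the log's "lookups" is what the status table calls "scanned".
    return {"files": files, "size": size, "scanned": lookups, "failures": failures}


def counters_from_status_row(row: str) -> Dict[str, int]:
    """Extract counters from one row of 'gluster v rebalance <vol> status'.

    Assumes the column order shown in this report:
    Node, Rebalanced-files, size, scanned, failures, status.
    """
    fields = row.split()
    if len(fields) != 6:
        raise ValueError(f"unexpected status row: {row!r}")
    _node, files, size, scanned, failures, _status = fields
    return {"files": int(files), "size": int(size),
            "scanned": int(scanned), "failures": int(failures)}


def mismatches(log_line: str, status_row: str) -> Dict[str, Tuple[int, int]]:
    """Return {counter: (log_value, status_value)} for every counter that differs."""
    log = counters_from_log(log_line)
    status = counters_from_status_row(status_row)
    return {k: (log[k], status[k]) for k in log if log[k] != status[k]}


if __name__ == "__main__":
    log_line = ("[2012-10-03 07:02:36.657823] I [dht-rebalance.c:1622:gf_defrag_status_get] "
                "0-glusterfs: Files migrated: 11, size: 128849050259, lookups: 42, failures: 0")
    status_row = "localhost               11 128849050259           42            1      completed"
    print(mismatches(log_line, status_row))  # {'failures': (0, 1)}
```

With the values from this report, only the failure counter differs: the log records 0 while the status command reports 1.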
Comment 2 shishir gowda 2012-10-04 01:54:21 EDT
Not able to reproduce the issue.
Please update the bug if you hit the issue again, and attach the glusterd logs along with the CLI logs.
Comment 3 shylesh 2012-10-04 05:36:49 EDT
Created attachment 621494
glusterd logs
Comment 4 shishir gowda 2012-10-05 03:24:51 EDT
Not able to reproduce the issue, and the bug concerns incorrect rebalance statistics rather than failed migrations. Reducing the severity and priority.
