Created attachment 620777 [details]
rebalance fail counts

Description of problem:
There is a mismatch in failure counts between the rebalance status output and the rebalance log.

Version-Release number of selected component (if applicable):
[root@rhs-gp-srv4 glusterfs]# rpm -qa | grep gluster
glusterfs-fuse-3.3.0rhsvirt1-6.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-server-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-geo-replication-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch

How reproducible:

Steps to Reproduce:
1. Created a single-brick distribute volume
2. Placed some VM images on this volume
3. Added a new brick and started rebalance
4. While rebalance was running, restarted glusterd on one of the nodes
5. On that node, the rebalance status command shows a failure count of 1

Actual results:
The status output reports a failure count of 1, but the log reports a failure count of 0.

Additional info:

Volume Name: rebal
Type: Distribute
Volume ID: 0952e193-a12c-420a-b752-a77c54b3bf98
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: rhs-gp-srv4.lab.eng.blr.redhat.com:/rebal
Brick2: rhs-gp-srv11.lab.eng.blr.redhat.com:/rebal
Options Reconfigured:
cluster.eager-lock: enable
storage.linux-aio: off
performance.read-ahead: disable
performance.stat-prefetch: disable
performance.io-cache: disable
performance.quick-read: disable

[root@rhs-gp-srv4 glusterfs]# gluster v rebalance rebal status
                               Node  Rebalanced-files          size   scanned  failures      status
                          ---------       -----------   -----------  --------  --------  ----------
                          localhost                11  128849050259        42         1   completed
rhs-gp-srv12.lab.eng.blr.redhat.com                 0             0        32         0   completed
rhs-gp-srv11.lab.eng.blr.redhat.com                 0             0        32         0   completed
rhs-gp-srv15.lab.eng.blr.redhat.com                 0             0        32         0   completed

whereas the log on the peer where the failure is seen says:
====================================
[2012-10-03 07:02:36.639436] I [dht-rebalance.c:1063:gf_defrag_migrate_data] 0-rebal-dht: migrate data called on /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/vms/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd
[2012-10-03 07:02:36.642296] I [dht-rebalance.c:647:dht_migrate_file] 0-rebal-dht: /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/vms/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd.ovf: attempting to move from rebal-client-0 to rebal-client-1
[2012-10-03 07:02:36.647204] I [dht-rebalance.c:856:dht_migrate_file] 0-rebal-dht: completed migration of /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/vms/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd/59a94ec0-bdf2-4df5-ade0-0812e1ec6ecd.ovf from subvolume rebal-client-0 to rebal-client-1
[2012-10-03 07:02:36.652056] I [dht-common.c:2337:dht_setxattr] 0-rebal-dht: fixing the layout of /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/tasks
[2012-10-03 07:02:36.652578] I [dht-rebalance.c:1063:gf_defrag_migrate_data] 0-rebal-dht: migrate data called on /89d20fdd-e22f-4ee5-92a5-2e6540cbcae5/master/tasks
[2012-10-03 07:02:36.657795] I [dht-rebalance.c:1619:gf_defrag_status_get] 0-glusterfs: Rebalance is completed
[2012-10-03 07:02:36.657823] I [dht-rebalance.c:1622:gf_defrag_status_get] 0-glusterfs: Files migrated: 11, size: 128849050259, lookups: 42, failures: 0
[2012-10-03 07:02:36.658403] W [glusterfsd.c:906:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3910ae5ccd] (-->/lib64/libpthread.so.0() [0x39112077f1] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xdd) [0x405d2d]))) 0-: received signum (15), shutting down
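To make the discrepancy concrete, the two conflicting counts can be pulled directly from the quoted output. The short script below is purely illustrative (it is not part of GlusterFS, and the sample strings are copied from this report): it parses the localhost row of the status output and the gf_defrag_status_get log line, showing that the two sources disagree.

```python
import re

# Localhost row from `gluster v rebalance rebal status`
# (columns: Node, Rebalanced-files, size, scanned, failures, status)
status_line = "localhost 11 128849050259 42 1 completed"
status_failures = int(status_line.split()[4])

# Matching summary line from the rebalance log on the same node
log_line = ("[2012-10-03 07:02:36.657823] I "
            "[dht-rebalance.c:1622:gf_defrag_status_get] 0-glusterfs: "
            "Files migrated: 11, size: 128849050259, lookups: 42, failures: 0")
log_failures = int(re.search(r"failures: (\d+)", log_line).group(1))

# Status output reports 1 failure, the log reports 0 -- the mismatch in this bug
print(status_failures, log_failures)
```

All other counters (files migrated, size, lookups/scanned) agree between the two sources; only the failure count differs.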
Not able to reproduce the issue. Please update the bug if you hit the issue again, and attach the glusterd logs along with the cli logs.
Created attachment 621494 [details]
glusterd logs
Not able to reproduce the issue, and the bug concerns an incorrect statistic reported for rebalance. Reducing the severity and priority.