Bug 916226

Summary: Rebalance on replicate volume does not complete
Product: [Community] GlusterFS Reporter: Pranith Kumar K <pkarampu>
Component: replicate Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: mainline CC: gluster-bugs, yinyin2010
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 917325 (view as bug list) Environment:
Last Closed: 2013-07-24 18:03:26 UTC Type: Bug
Regression: --- Mount Type: ---
Bug Depends On:    
Bug Blocks: 917325, 952693    

Description Pranith Kumar K 2013-02-27 15:08:16 UTC
Description of problem:
With the gluster test framework, if we execute the following script, the rebalance never completes.
#!/bin/bash

. $(dirname $0)/../include.rc
. $(dirname $0)/../volume.rc

cleanup;

TEST glusterd
TEST pidof glusterd

TEST $CLI volume create $V0 replica 2 $H0:$B0/${V0}0 $H0:$B0/${V0}1 $H0:$B0/${V0}2 $H0:$B0/${V0}3
TEST $CLI volume set $V0 cluster.eager-lock on
TEST $CLI volume start $V0

## Mount FUSE
TEST glusterfs -s $H0 --volfile-id $V0 $M0;

TEST mkdir $M0/dir{1..10};
TEST touch $M0/dir{1..10}/files{1..10};

# add a new replica pair of bricks
TEST $CLI volume add-brick $V0 $H0:$B0/${V0}4 $H0:$B0/${V0}5

TEST $CLI volume rebalance $V0 start force
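
For reference, here is a minimal sketch of how such a script is typically run with the gluster test framework; the file name and location below are assumptions for illustration, not part of this report.

# Assuming the script above is saved as tests/bugs/bug-916226.t in a
# glusterfs source checkout:
cd /path/to/glusterfs
prove -vf tests/bugs/bug-916226.t
# or run the entire regression suite:
./run-tests.sh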



Comment 1 Vijay Bellur 2013-03-01 22:51:35 UTC
CHANGE: http://review.gluster.org/4588 (cluster/afr: Turn on eager-lock for fd DATA transactions) merged in master by Anand Avati (avati)

Comment 2 yin.yin 2013-03-04 08:17:09 UTC
The rebalance status check in bug-916226 is not correct:

EXPECT_WITHIN 60 "success:" rebalance_status_field $V0

The rebalance is still in progress, not finished, yet the command already prints "success:", so the check passes too early. Another method should be used to check the rebalance status.

[root@cc02 tests]# gluster volume rebalance patchy status
                                    Node Rebalanced-files          size       scanned      failures         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   ------------   --------------
                               localhost                0        0Bytes             1             0    in progress            56.00
volume rebalance: patchy: success:
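
A minimal sketch of an alternative check, for illustration only (the actual test fix is the patch referenced in the next comment); the helper name and polling loop are hypothetical, and it keys off the per-node status column instead of the trailing "success:" line:

#!/bin/bash
# Hypothetical helper: treat rebalance as done only when the per-node status
# column reads "completed", instead of matching the trailing
# "volume rebalance: <vol>: success:" line, which is printed even while the
# rebalance is still in progress.
rebalance_completed () {
    local vol=$1
    # Still running if any node reports "in progress".
    gluster volume rebalance "$vol" status | grep -q "in progress" && return 1
    # Done once a node reports "completed".
    gluster volume rebalance "$vol" status | grep -q "completed"
}

# Example: poll for up to 120 seconds before giving up.
for i in $(seq 1 120); do
    rebalance_completed patchy && break
    sleep 1
done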

Comment 3 Pranith Kumar K 2013-03-04 10:52:43 UTC
(In reply to comment #2)
> The rebalance status check in bug-916226 is not correct:
> 
> EXPECT_WITHIN 60 "success:" rebalance_status_field $V0
> 
> The rebalance is still in progress, not finished, yet the command already
> prints "success:", so the check passes too early. Another method should be
> used to check the rebalance status.
> 
> [root@cc02 tests]# gluster volume rebalance patchy status
>                                     Node Rebalanced-files          size       scanned      failures         status run time in secs
>                                ---------      -----------   -----------   -----------   -----------   ------------   --------------
>                                localhost                0        0Bytes             1             0    in progress            56.00
> volume rebalance: patchy: success:

Thanks a lot for pointing this out. The following patch should address this.
http://review.gluster.org/4614

Comment 4 Anand Avati 2013-04-29 15:17:56 UTC
REVIEW: http://review.gluster.org/4899 (cluster/afr: Turn on eager-lock for fd DATA transactions) posted (#1) for review on release-3.4 by Emmanuel Dreyfus (manu)

Comment 5 Anand Avati 2013-05-07 12:00:43 UTC
COMMIT: http://review.gluster.org/4899 committed in release-3.4 by Vijay Bellur (vbellur) 
------
commit eaa3cdcb80befe3fe7c6b181672bface9d4ff539
Author: Emmanuel Dreyfus <manu>
Date:   Mon Apr 29 17:15:56 2013 +0200

    cluster/afr: Turn on eager-lock for fd DATA transactions
    
    Problem:
    With the present implementation, eager-lock is issued for any fd
    fop, so the eager-lock also gets applied to METADATA transactions.
    However, the lk-owner is set to the local->fd address only for DATA
    transactions; for METADATA transactions it is frame->root. Because
    of this mismatch, the unlock of the eager-lock fails and rebalance
    hangs.
    
    Fix:
    Enable eager-lock for fd DATA transactions
    
    This is a backport of change If30df7486a0b2f5e4150d3259d1261f81473ce8a
    http://review.gluster.org/#/c/4588/
    
    BUG: 916226
    Change-Id: Id41ac17f467c37e7fd8863e0c19932d7b16344f8
    Signed-off-by: Emmanuel Dreyfus <manu>
    Reviewed-on: http://review.gluster.org/4899
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>