Bug 764116 (GLUSTER-2384)

Summary: volume rebalance is unsuccessful
Product: [Community] GlusterFS
Reporter: Saurabh <saurabh>
Component: replicate
Assignee: Amar Tumballi <amarts>
Status: CLOSED DUPLICATE
QA Contact:
Severity: medium
Docs Contact:
Priority: medium
Version: 3.1.2
CC: gluster-bugs, stsmith, vraman
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Attachments:
attachment of glusterd.vol.log file from the brick (flags: none)

Description Saurabh 2011-02-07 09:44:08 UTC
Created attachment 432

Comment 1 Saurabh 2011-02-07 09:45:06 UTC
Assigning to Amar, as discussed with him.

Comment 2 Saurabh 2011-02-07 12:41:05 UTC
Hello,
 

   I have a distributed-replicate volume and I added bricks to it, but the rebalance after the addition is unsuccessful.


  ---- it is a distribute-replicate volume-------


gluster> volume info

Volume Name: repdist
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: domU-12-31-39-02-75-92.compute-1.internal:/mnt/repdist
Brick2: domU-12-31-39-02-9A-DC.compute-1.internal:/mnt/repdist1
Brick3: domU-12-31-39-03-B4-C0.compute-1.internal:/mnt/repdist
Brick4: domU-12-31-39-03-6D-DE.compute-1.internal:/mnt/repdist1
Options Reconfigured:
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: DEBUG


----- added the bricks here------------
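
(The add-brick step itself is not pasted above; reconstructed from the Brick5/Brick6 entries in the listing below, it would have been roughly the following.)

gluster> volume add-brick repdist domU-12-31-39-02-75-92.compute-1.internal:/mnt/repdist1 domU-12-31-39-03-B4-C0.compute-1.internal:/mnt/repdist2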

gluster> volume info repdist

Volume Name: repdist
Type: Distributed-Replicate
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: domU-12-31-39-02-75-92.compute-1.internal:/mnt/repdist
Brick2: domU-12-31-39-02-9A-DC.compute-1.internal:/mnt/repdist1
Brick3: domU-12-31-39-03-B4-C0.compute-1.internal:/mnt/repdist
Brick4: domU-12-31-39-03-6D-DE.compute-1.internal:/mnt/repdist1
Brick5: domU-12-31-39-02-75-92.compute-1.internal:/mnt/repdist1
Brick6: domU-12-31-39-03-B4-C0.compute-1.internal:/mnt/repdist2
Options Reconfigured:
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: DEBUG
gluster> volume rebalance repdist start
starting rebalance on volume repdist has been unsuccessful
Rebalance already started on volume repdist
gluster> volume rebalance repdist status
rebalance not started
gluster>



gluster> volume rebalance repdist stop
stopped rebalance process of volume repdist 

------- rebalance fails -------------

gluster> volume remove-brick repdist domU-12-31-39-02-75-92.compute-1.internal:/mnt/repdist1 domU-12-31-39-03-B4-C0.compute-1.internal:/mnt/repdist2
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
Remove Brick successful
gluster> volume info repdist

Volume Name: repdist
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: domU-12-31-39-02-75-92.compute-1.internal:/mnt/repdist
Brick2: domU-12-31-39-02-9A-DC.compute-1.internal:/mnt/repdist1
Brick3: domU-12-31-39-03-B4-C0.compute-1.internal:/mnt/repdist
Brick4: domU-12-31-39-03-6D-DE.compute-1.internal:/mnt/repdist1
gluster> volume rebalance repdist start
starting rebalance on volume repdist has been unsuccessful
Rebalance already started on volume repdist
gluster> volume rebalance repdist status
rebalance not started
gluster>



  I have already tried stopping and restarting glusterd, but the issue still remains the same.
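
(For reference, the restart attempt on each server was roughly along these lines; the exact invocation is not recorded in this report.)

/etc/init.d/glusterd restart      # or: killall glusterd; glusterd
gluster peer status               # verify every peer shows Connected before retrying the rebalance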


   As per a recent discussion with Amar, he pointed out that this may be an issue similar to bug 1922; however, I am filing this bug for further debugging of the present scenario.

Comment 3 stsmith 2011-02-15 15:44:29 UTC
I encountered a similar issue:

root@iss01:~# gluster volume rebalance image-service start
starting rebalance on volume image-service has been unsuccessful

I didn't notice any relevant messages in the logs. After searching online, I guessed that the issue might be caused by the time not being in sync across my peers. I then ran ntpdate manually and installed ntp on each peer. After restarting glusterd on each peer I was able to rebalance; a rough sketch of the time-sync steps follows the transcript below:

root@iss01:~# gluster volume rebalance image-service start
starting rebalance on volume image-service has been unsuccessful
Rebalance already started on volume image-service
root@iss01:~# /etc/init.d/glusterd restart
 * Stopping glusterd service glusterd
   ...done.
 * Starting glusterd service glusterd
   ...done.
root@iss01:~# gluster volume rebalance image-service status
rebalance not started
root@iss01:~# gluster volume rebalance image-service start
starting rebalance on volume image-service has been successful
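
For reference, the time-sync steps on each peer were roughly the following (the NTP server name and package command are illustrative, not copied from my shell history):

root@iss01:~# ntpdate pool.ntp.org          # one-off clock sync across the peers
root@iss01:~# apt-get install ntp           # keep the clocks in sync from here on
root@iss01:~# /etc/init.d/glusterd restart  # restart glusterd once the clocks agree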

Thanks,
Stephen

Comment 4 Amar Tumballi 2011-02-23 09:31:46 UTC
With the rebalance enhancements, this issue will be solved.

*** This bug has been marked as a duplicate of bug 2258 ***