Bug 1031981 - DHT:Rebalance: Rebalance is taking too long to migrate (around 271GB data , running from 4+ days not completed)
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Nithya Balachandran
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1286175
 
Reported: 2013-11-19 10:03 UTC by shylesh
Modified: 2015-11-27 12:18 UTC
CC: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1286175
Environment:
Last Closed: 2015-11-27 12:18:25 UTC
Embargoed:


Attachments

Description shylesh 2013-11-19 10:03:44 UTC
Description of problem:
Had a DHT volume with 6 bricks and created around 271GB of data (from NFS and FUSE mounts), added one brick, and started rebalance without the force option.
After 4 days the rebalance is still running and its status is:
[root@7-VM1 ~]# gluster volume rebalance master1 status
                                    Node Rebalanced-files          size       scanned      failures       skipped         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------   ------------   --------------
                               localhost           158034         1.6GB       8942232             3         32504    in progress        364096.00
                            10.70.36.133                0        0Bytes       8894301             0             0    in progress        364096.00
                            10.70.36.132          1034130        12.6GB       8477720             9       1002432    in progress        364096.00
                            10.70.36.131          1266891        13.5GB       8120146             1       1158550    in progress        364095.00
volume rebalance: master1: success:
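
For reference, a minimal sketch of the expansion and rebalance steps described above; it assumes the newly added brick is the seventh brick (10.70.36.131:/rhs/brick1) listed in the volume info further below, which is an inference and not stated explicitly in the report:

# Sketch only: expand the volume and rebalance without the force option
gluster volume add-brick master1 10.70.36.131:/rhs/brick1
gluster volume rebalance master1 start
# Poll migration progress (produces the output shown above)
gluster volume rebalance master1 status

For scale, the status above shows roughly 27.7GB (1.6 + 12.6 + 13.5) migrated across the nodes in 364096 seconds, i.e. a little over 4 days.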


Version-Release number of selected component (if applicable):
[root@7-VM4 ~]# rpm -qa| grep glus
glusterfs-libs-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.44rhs-1.el6rhs.x86_64
gluster-swift-1.8.0-6.11.el6rhs.noarch
gluster-swift-container-1.8.0-6.11.el6rhs.noarch
glusterfs-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.44rhs-1.el6rhs.x86_64
gluster-swift-plugin-1.8.0-7.el6rhs.noarch
vdsm-gluster-4.13.0-17.gitdbbbacd.el6_4.noarch
glusterfs-debuginfo-3.4.0.40rhs-1.el6rhs.x86_64
glusterfs-api-devel-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.44rhs-1.el6rhs.x86_64
samba-glusterfs-3.6.9-160.3.el6rhs.x86_64
gluster-swift-proxy-1.8.0-6.11.el6rhs.noarch
gluster-swift-account-1.8.0-6.11.el6rhs.noarch
glusterfs-geo-replication-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.44rhs-1.el6rhs.x86_64
gluster-swift-object-1.8.0-6.11.el6rhs.noarch


How reproducible:
always

Steps to Reproduce:
1. Created a distributed volume of 6 bricks.
2. Using NFS and FUSE, created many directories and files with a directory depth of around 100 (a sketch of such a dataset follows the steps below).
3. Added a brick and started rebalance.
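
A hypothetical sketch of generating the kind of dataset described in step 2; the mount point, per-level file count, and file size are assumptions, not taken from the report:

# Assumption: the volume is FUSE-mounted at /mnt/master1
MNT=/mnt/master1
d="$MNT/deep"
for level in $(seq 1 100); do          # nest directories 100 levels deep
    d="$d/dir$level"
    mkdir -p "$d"
    for f in $(seq 1 10); do           # a few small files at every level
        dd if=/dev/urandom of="$d/file$f" bs=1M count=1 2>/dev/null
    done
done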

Actual results:


Expected results:


Additional info:

The volume has a deep directory structure, around 100 levels deep. Will try to come up with statistics for a directory depth of 10 and will update the same bug soon.
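
One possible way (an assumption, not taken from the report) to gather such depth statistics from the mount point is to count directories at each nesting depth:

# Path is an assumption; prints "<count> <depth>" pairs
find /mnt/master1 -type d | awk -F/ '{print NF-1}' | sort -n | uniq -c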

[root@7-VM1 ~]# gluster v info master1
 
Volume Name: master1
Type: Distribute
Volume ID: dde90f71-ca5e-47a3-a328-d3b53d11f70b
Status: Started
Number of Bricks: 7
Transport-type: tcp
Bricks:
Brick1: 10.70.36.130:/rhs/brick2
Brick2: 10.70.36.131:/rhs/brick2
Brick3: 10.70.36.132:/rhs/brick2
Brick4: 10.70.36.130:/rhs/brick3
Brick5: 10.70.36.131:/rhs/brick3
Brick6: 10.70.36.132:/rhs/brick3
Brick7: 10.70.36.131:/rhs/brick1
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on




usage from the mount point
---------------------------
[root@rhs-client22 master1]# df -h .
Filesystem            Size  Used Avail Use% Mounted on
10.70.36.131:/master1
                      350G  271G   79G  78% /mnt/master1



cluster info
------------
10.70.36.133
10.70.36.132
10.70.36.131
10.70.36.130


Mount point
------------
10.70.36.46:/mnt/master1




Attached the sosreports.

Comment 5 Susant Kumar Palai 2015-11-27 12:18:25 UTC
Cloning this to 3.1. To be fixed in a future release.

