Bug 1031981

Summary: DHT:Rebalance: Rebalance is taking too long to migrate data (around 271 GB; running for 4+ days, not completed)
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: distribute
Version: 2.1
Hardware: x86_64
OS: Linux
Severity: high
Priority: unspecified
Status: CLOSED DEFERRED
Reporter: shylesh <shmohan>
Assignee: Nithya Balachandran <nbalacha>
QA Contact: storage-qa-internal <storage-qa-internal>
CC: spalai, vbellur
Doc Type: Bug Fix
Type: Bug
Cloned As: 1286175
Bug Blocks: 1286175
Last Closed: 2015-11-27 12:18:25 UTC

Description shylesh 2013-11-19 10:03:44 UTC
Description of problem:
Had a DHT (distribute-only) volume with 6 bricks and created around 271 GB of data on it (from NFS and FUSE mounts), then added one brick and started rebalance without the force option.
After 4 days the rebalance is still running; the status is:
[root@7-VM1 ~]# gluster volume rebalance master1 status
                                    Node Rebalanced-files          size       scanned      failures       skipped         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------   ------------   --------------
                               localhost           158034         1.6GB       8942232             3         32504    in progress        364096.00
                            10.70.36.133                0        0Bytes       8894301             0             0    in progress        364096.00
                            10.70.36.132          1034130        12.6GB       8477720             9       1002432    in progress        364096.00
                            10.70.36.131          1266891        13.5GB       8120146             1       1158550    in progress        364095.00
volume rebalance: master1: success:
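
Based on the description, this is the standard add-brick plus rebalance sequence; a minimal sketch is shown below, assuming that Brick7 in the volume info further down (10.70.36.131:/rhs/brick1) is the newly added brick:

# Expand the existing distributed volume by one brick
gluster volume add-brick master1 10.70.36.131:/rhs/brick1

# Start rebalance without the force option, then poll its progress
gluster volume rebalance master1 start
gluster volume rebalance master1 status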


Version-Release number of selected component (if applicable):
[root@7-VM4 ~]# rpm -qa| grep glus
glusterfs-libs-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.44rhs-1.el6rhs.x86_64
gluster-swift-1.8.0-6.11.el6rhs.noarch
gluster-swift-container-1.8.0-6.11.el6rhs.noarch
glusterfs-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.44rhs-1.el6rhs.x86_64
gluster-swift-plugin-1.8.0-7.el6rhs.noarch
vdsm-gluster-4.13.0-17.gitdbbbacd.el6_4.noarch
glusterfs-debuginfo-3.4.0.40rhs-1.el6rhs.x86_64
glusterfs-api-devel-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.44rhs-1.el6rhs.x86_64
samba-glusterfs-3.6.9-160.3.el6rhs.x86_64
gluster-swift-proxy-1.8.0-6.11.el6rhs.noarch
gluster-swift-account-1.8.0-6.11.el6rhs.noarch
glusterfs-geo-replication-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.44rhs-1.el6rhs.x86_64
gluster-swift-object-1.8.0-6.11.el6rhs.noarch


How reproducible:
always

Steps to Reproduce:
1. Create a distributed volume with 6 bricks.
2. From NFS and FUSE mounts, create many directories and files with a deep directory structure (depth around 100).
3. Add a brick and start rebalance (a rough reproduction sketch follows these steps).
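
A rough reproduction sketch, assuming placeholder host names, brick paths, volume name, and file sizes (not the exact setup from this report):

# Create and start a 6-brick distributed volume (placeholder hosts/paths)
gluster volume create testvol \
    server1:/rhs/brick2 server2:/rhs/brick2 server3:/rhs/brick2 \
    server1:/rhs/brick3 server2:/rhs/brick3 server3:/rhs/brick3
gluster volume start testvol

# FUSE-mount the volume and build a deep directory tree (depth ~100) with files
mkdir -p /mnt/testvol
mount -t glusterfs server1:/testvol /mnt/testvol
cd /mnt/testvol
dir=.
for i in $(seq 1 100); do
    dir="$dir/level$i"
    mkdir -p "$dir"
    for j in $(seq 1 10); do
        dd if=/dev/zero of="$dir/file$j" bs=1M count=10 2>/dev/null
    done
done

# Add one brick and start rebalance without force, then watch the status
gluster volume add-brick testvol server2:/rhs/brick1
gluster volume rebalance testvol start
gluster volume rebalance testvol status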

Actual results:


Expected results:


Additional info:

The volume has a deep directory structure (depth around 100). Will try to gather statistics with a directory depth of 10 and update this bug soon.

[root@7-VM1 ~]# gluster v info master1
 
Volume Name: master1
Type: Distribute
Volume ID: dde90f71-ca5e-47a3-a328-d3b53d11f70b
Status: Started
Number of Bricks: 7
Transport-type: tcp
Bricks:
Brick1: 10.70.36.130:/rhs/brick2
Brick2: 10.70.36.131:/rhs/brick2
Brick3: 10.70.36.132:/rhs/brick2
Brick4: 10.70.36.130:/rhs/brick3
Brick5: 10.70.36.131:/rhs/brick3
Brick6: 10.70.36.132:/rhs/brick3
Brick7: 10.70.36.131:/rhs/brick1
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on





usage from the mount point
---------------------------
[root@rhs-client22 master1]# df -h .
Filesystem            Size  Used Avail Use% Mounted on
10.70.36.131:/master1
                      350G  271G   79G  78% /mnt/master1



cluster info
------------
10.70.36.133
10.70.36.132
10.70.36.131
10.70.36.130


Mount point
------------
10.70.36.46:/mnt/master1




The sosreports are attached.

Comment 5 Susant Kumar Palai 2015-11-27 12:18:25 UTC
Cloning this to 3.1. To be fixed in a future release.