Bug 1031981

Summary: DHT:Rebalance: Rebalance is taking too long to migrate data (around 271 GB; running for 4+ days, not completed)
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: distribute
Version: 2.1
Hardware: x86_64
OS: Linux
Severity: high
Priority: unspecified
Status: CLOSED DEFERRED
Reporter: shylesh <shmohan>
Assignee: Nithya Balachandran <nbalacha>
QA Contact: storage-qa-internal <storage-qa-internal>
CC: spalai, vbellur
Doc Type: Bug Fix
Type: Bug
Cloned As: 1286175
Bug Blocks: 1286175
Last Closed: 2015-11-27 12:18:25 UTC

Description shylesh 2013-11-19 10:03:44 UTC
Description of problem:
Had a DHT (distribute-only) volume with 6 bricks and created around 271 GB of data on it (from NFS and FUSE mounts), then added one brick and started rebalance without the force option.
After 4 days the rebalance is still running; the status is:
[root@7-VM1 ~]# gluster volume rebalance master1 status
                                    Node Rebalanced-files          size       scanned      failures       skipped         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------   ------------   --------------
                               localhost           158034         1.6GB       8942232             3         32504    in progress        364096.00
                            10.70.36.133                0        0Bytes       8894301             0             0    in progress        364096.00
                            10.70.36.132          1034130        12.6GB       8477720             9       1002432    in progress        364096.00
                            10.70.36.131          1266891        13.5GB       8120146             1       1158550    in progress        364095.00
volume rebalance: master1: success:
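
Based on the description, this is the standard add-brick plus rebalance sequence; a minimal sketch is shown below, assuming that Brick7 in the volume info further down (10.70.36.131:/rhs/brick1) is the newly added brick:

# Expand the existing distributed volume by one brick
gluster volume add-brick master1 10.70.36.131:/rhs/brick1

# Start rebalance without the force option, then poll its progress
gluster volume rebalance master1 start
gluster volume rebalance master1 status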


Version-Release number of selected component (if applicable):
[root@7-VM4 ~]# rpm -qa| grep glus
glusterfs-libs-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.44rhs-1.el6rhs.x86_64
gluster-swift-1.8.0-6.11.el6rhs.noarch
gluster-swift-container-1.8.0-6.11.el6rhs.noarch
glusterfs-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.44rhs-1.el6rhs.x86_64
gluster-swift-plugin-1.8.0-7.el6rhs.noarch
vdsm-gluster-4.13.0-17.gitdbbbacd.el6_4.noarch
glusterfs-debuginfo-3.4.0.40rhs-1.el6rhs.x86_64
glusterfs-api-devel-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.44rhs-1.el6rhs.x86_64
samba-glusterfs-3.6.9-160.3.el6rhs.x86_64
gluster-swift-proxy-1.8.0-6.11.el6rhs.noarch
gluster-swift-account-1.8.0-6.11.el6rhs.noarch
glusterfs-geo-replication-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.44rhs-1.el6rhs.x86_64
gluster-swift-object-1.8.0-6.11.el6rhs.noarch


How reproducible:
always

Steps to Reproduce:
1. Create a distributed volume with 6 bricks.
2. From NFS and FUSE mounts, create many directories and files with a deep directory structure (depth around 100).
3. Add a brick and start rebalance (a rough reproduction sketch follows these steps).
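
A rough reproduction sketch, assuming placeholder host names, brick paths, volume name, and file sizes (not the exact setup from this report):

# Create and start a 6-brick distributed volume (placeholder hosts/paths)
gluster volume create testvol \
    server1:/rhs/brick2 server2:/rhs/brick2 server3:/rhs/brick2 \
    server1:/rhs/brick3 server2:/rhs/brick3 server3:/rhs/brick3
gluster volume start testvol

# FUSE-mount the volume and build a deep directory tree (depth ~100) with files
mkdir -p /mnt/testvol
mount -t glusterfs server1:/testvol /mnt/testvol
cd /mnt/testvol
dir=.
for i in $(seq 1 100); do
    dir="$dir/level$i"
    mkdir -p "$dir"
    for j in $(seq 1 10); do
        dd if=/dev/zero of="$dir/file$j" bs=1M count=10 2>/dev/null
    done
done

# Add one brick and start rebalance without force, then watch the status
gluster volume add-brick testvol server2:/rhs/brick1
gluster volume rebalance testvol start
gluster volume rebalance testvol status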

Actual results:


Expected results:


Additional info:

The volume has a deep directory structure (depth around 100). Will try to gather statistics with a directory depth of 10 and update this bug soon.

[root@7-VM1 ~]# gluster v info master1
 
Volume Name: master1
Type: Distribute
Volume ID: dde90f71-ca5e-47a3-a328-d3b53d11f70b
Status: Started
Number of Bricks: 7
Transport-type: tcp
Bricks:
Brick1: 10.70.36.130:/rhs/brick2
Brick2: 10.70.36.131:/rhs/brick2
Brick3: 10.70.36.132:/rhs/brick2
Brick4: 10.70.36.130:/rhs/brick3
Brick5: 10.70.36.131:/rhs/brick3
Brick6: 10.70.36.132:/rhs/brick3
Brick7: 10.70.36.131:/rhs/brick1
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on





usage from the mount point
---------------------------
[root@rhs-client22 master1]# df -h .
Filesystem            Size  Used Avail Use% Mounted on
10.70.36.131:/master1
                      350G  271G   79G  78% /mnt/master1



cluster info
------------
10.70.36.133
10.70.36.132
10.70.36.131
10.70.36.130


Mount point
------------
10.70.36.46:/mnt/master1




The sosreports are attached.

Comment 5 Susant Kumar Palai 2015-11-27 12:18:25 UTC
Cloning this to 3.1. To be fixed in a future release.