Bug 1027106 - rebalance: even after 48 hours the rebalance status says rebalanced files is "0" and rebalance is still in progress
Summary: rebalance: even after 48 hours the rebalance status says rebalanced files is ...
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Nithya Balachandran
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1286199 1286206
 
Reported: 2013-11-06 07:35 UTC by Saurabh
Modified: 2016-01-19 06:15 UTC (History)
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1286199 1286206 (view as bug list)
Environment:
Last Closed: 2015-11-27 12:25:53 UTC
Target Upstream Version:



Description Saurabh 2013-11-06 07:35:48 UTC
Description of problem:
A rebalance operation has been running on a volume for more than 48 hours, and
the status still reports "0" files rebalanced.
The rebalance was triggered after adding new disks to the nodes in the cluster, because the existing disks on all the nodes had become full.
On the newly added disks, I created bricks and added them to the volume using the gluster add-brick command.

Note that the volume has quota enabled, with quota limits set on different subdirectories.
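
For reference, a minimal sketch of the quota setup on such a volume (the volume name is taken from this report; the subdirectory path and limit are illustrative):

# gluster volume quota dist-rep enable
# gluster volume quota dist-rep limit-usage /qa1 512GB
# gluster volume quota dist-rep list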

Version-Release number of selected component (if applicable):
glusterfs-3.4.038rhs-1

How reproducible:
on one cluster

Steps to Reproduce:
1. Fill up the volume so that the disks become full on each node of the cluster.
2. Add new disks to each node of the cluster.
3. Create bricks on the new disks and add them to the volume using the gluster volume add-brick command.
4. Start the rebalance operation.
5. Check the status of the rebalance operation at intervals (see the command sketch after this list).
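
A minimal sketch of the commands for steps 3-5 (the volume name and brick paths are taken from the volume info below; the exact bricks added may differ):

# gluster volume add-brick dist-rep 10.70.35.188:/rhs/brick2/d1r1 10.70.35.108:/rhs/brick2/d1r2
# gluster volume rebalance dist-rep start
# gluster volume rebalance dist-rep status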

Actual results:
[root@quota5 ~]# gluster volume rebalance dist-rep status
                                    Node Rebalanced-files          size       scanned      failures       skipped         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------   ------------   --------------
                               localhost                0        0Bytes             0             0             0    in progress        167416.00
                            10.70.35.191                0        0Bytes             0             0             0    in progress        167415.00
                            10.70.35.108                0        0Bytes             0             0             0    in progress        167416.00
                            10.70.35.144                0        0Bytes             0             0             0    in progress        167415.00
volume rebalance: dist-rep: success: 

The disk that became full; similar df output can be seen on all nodes:
[root@quota5 ~]# df -h /rhs/brick1
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/RHS_vgvdb-RHS_lv1
                      1.5T  1.5T   16K 100% /rhs/brick1


The newly added disk; similar output on all nodes:
[root@quota5 ~]# df -h /rhs/brick2
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/RHS_vgvdc-RHS_lv1
                      1.6T  403M  1.6T   1% /rhs/brick2


A snippet of the rebalance logs from one of the nodes:

[2013-11-06 07:35:19.143255] I [afr-lk-common.c:1088:afr_lock_blocking] 0-dist-rep-replicate-6: unable to lock on even one child
[2013-11-06 07:35:19.143264] I [afr-transaction.c:1169:afr_post_blocking_entrylk_cbk] 0-dist-rep-replicate-6: Blocking entrylks failed.
[2013-11-06 07:35:19.143273] W [dht-selfheal.c:419:dht_selfheal_dir_mkdir_cbk] 0-dist-rep-dht: selfhealing directory /qa1/linux_untar1382802662/linux-2.6.31.1/arch/mips/include/asm/sgi failed: No such file or directory
[2013-11-06 07:35:19.535273] I [dht-common.c:2650:dht_setxattr] 0-dist-rep-dht: fixing the layout of /qa1/linux_untar1382802662/linux-2.6.31.1/arch/mips/include/asm/sgi


Expected results:
Rebalance should make progress; "0" files rebalanced after 48 hours is unexpected.


Additional info:

Quota-related stats:

[root@quota5 ~]# gluster volume quota dist-rep list
                  Path                   Hard-limit Soft-limit   Used  Available
--------------------------------------------------------------------------------
/                                          2.9TB       80%       2.8TB  59.8GB
/qa1                                     512.0GB       80%     512.0GB  26.5MB
/qa2                                     512.0GB       80%     512.0GB  0Bytes
/qa3                                     100.0GB       80%     100.0GB  0Bytes
/qa4                                     100.0GB       80%     100.0GB  0Bytes
/qa1/dir1                                500.0GB       80%     412.0GB  88.0GB
/qa2/dir1                                500.0GB       80%     412.0GB  88.0GB
/qa5                                     500.0GB       80%     500.0GB  0Bytes
/qa6                                     500.0GB       80%     500.0GB  0Bytes
/qa7                                     500.0GB       80%     298.3GB 201.7GB
/qa8                                     800.0GB       80%     401.6GB 398.4GB


[root@quota5 ~]# gluster volume info
 
Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 6df89010-7892-47d4-b897-ba863b31bba3
Status: Started
Number of Bricks: 9 x 2 = 18
Transport-type: tcp
Bricks:
Brick1: 10.70.35.188:/rhs/brick1/d1r1
Brick2: 10.70.35.108:/rhs/brick1/d1r2
Brick3: 10.70.35.191:/rhs/brick1/d2r1
Brick4: 10.70.35.144:/rhs/brick1/d2r2
Brick5: 10.70.35.188:/rhs/brick1/d3r1
Brick6: 10.70.35.108:/rhs/brick1/d3r2
Brick7: 10.70.35.191:/rhs/brick1/d4r1
Brick8: 10.70.35.144:/rhs/brick1/d4r2
Brick9: 10.70.35.188:/rhs/brick1/d5r1
Brick10: 10.70.35.108:/rhs/brick1/d5r2
Brick11: 10.70.35.191:/rhs/brick1/d6r1
Brick12: 10.70.35.144:/rhs/brick1/d6r2
Brick13: 10.70.35.188:/rhs/brick1/d1r1-add
Brick14: 10.70.35.108:/rhs/brick1/d1r2-add
Brick15: 10.70.35.188:/rhs/brick2/d1r1
Brick16: 10.70.35.108:/rhs/brick2/d1r2
Brick17: 10.70.35.191:/rhs/brick2/d2r1
Brick18: 10.70.35.144:/rhs/brick2/d2r2
Options Reconfigured:
features.quota: on
features.quota-deem-statfs: on

Comment 3 Saurabh 2013-11-06 08:12:43 UTC
An after-effect of this BZ is that
I tried to create a directory; it got created, but "ls" does not report it.
Also, a retry to create the same directory fails, saying the directory already exists.

See the logs here:

[root@rhslong03 ~]# cd /mnt/nfs-test
[root@rhslong03 nfs-test]# ls
qa1  qa2  qa3  qa4  qa5  qa6  qa7  qa8
[root@rhslong03 nfs-test]# pwd
/mnt/nfs-test
[root@rhslong03 nfs-test]# mkdir dir1
[root@rhslong03 nfs-test]# ls
qa1  qa2  qa3  qa4  qa5  qa6  qa7  qa8
[root@rhslong03 nfs-test]# mkdir dir1
mkdir: cannot create directory `dir1': File exists
[root@rhslong03 nfs-test]# ls
qa1  qa2  qa3  qa4  qa5  qa6  qa7  qa8
[root@rhslong03 nfs-test]# ls dir1
ls: cannot open directory dir1: No such file or directory

