| Summary: | rebalance: even after 48 hours the rebalance status says rebalanced files is "0" and rebalance is still in progress | | |
|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Saurabh <saujain> |
| Component: | distribute | Assignee: | Nithya Balachandran <nbalacha> |
| Status: | CLOSED DEFERRED | QA Contact: | storage-qa-internal <storage-qa-internal> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 2.1 | CC: | mzywusko, spalai, vbellur |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1286199 1286206 (view as bug list) | Environment: | |
| Last Closed: | 2015-11-27 12:25:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | | | |
| Bug Blocks: | 1286199, 1286206 | | |
A side effect seen alongside this BZ: I tried to create a directory; the mkdir succeeded, but "ls" does not report it, and retrying the same mkdir fails because the directory already exists. See the logs here:

```
[root@rhslong03 ~]# cd /mnt/nfs-test
[root@rhslong03 nfs-test]# ls
qa1  qa2  qa3  qa4  qa5  qa6  qa7  qa8
[root@rhslong03 nfs-test]# pwd
/mnt/nfs-test
[root@rhslong03 nfs-test]# mkdir dir1
[root@rhslong03 nfs-test]# ls
qa1  qa2  qa3  qa4  qa5  qa6  qa7  qa8
[root@rhslong03 nfs-test]# mkdir dir1
mkdir: cannot create directory `dir1': File exists
[root@rhslong03 nfs-test]# ls
qa1  qa2  qa3  qa4  qa5  qa6  qa7  qa8
[root@rhslong03 nfs-test]# ls dir1
ls: cannot open directory dir1: No such file or directory
```
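The mkdir-then-invisible behaviour above can be checked with a small script. This is a minimal sketch, not part of the original report: the `MNT` variable and the `dir1` name are placeholders, and `MNT` defaults to a fresh temporary directory so the script is safe to run anywhere; point `MNT` at the affected NFS mount to test it there.

```shell
# Minimal visibility check (a sketch; MNT and dir1 are placeholders).
# MNT defaults to a fresh temporary directory so this is safe to run
# anywhere; set MNT to the affected NFS mount to test it there.
MNT="${MNT:-$(mktemp -d)}"
DIR="$MNT/dir1"

mkdir "$DIR" 2>/dev/null || echo "mkdir failed: $DIR may already exist"

if ls "$MNT" | grep -qx "dir1"; then
    STATUS=OK        # readdir shows the new directory, as expected
else
    STATUS=BUG       # mkdir succeeded but readdir does not list it
fi
echo "directory visibility: $STATUS"
```

On a healthy filesystem this prints `directory visibility: OK`; on the mount affected by this bug it would print `BUG`, matching the session captured above.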
Description of problem:
A rebalance operation has been running on a volume for more than 48 hours, and the status still reports "0" files rebalanced. The rebalance was triggered after adding new disks to the nodes in the cluster, because the existing disks on all nodes were found to be full. On the new disks I created bricks using the gluster add-brick command. Note that the volume has quota enabled, with quota limits set on different subdirectories.

Version-Release number of selected component (if applicable):
glusterfs-3.4.038rhs-1

How reproducible:
On one cluster.

Steps to Reproduce:
1. Fill up the volume so that the disks become full on each node of the cluster.
2. Add new disks to each node of the cluster.
3. Create bricks on the new disks using the gluster volume add-brick command.
4. Start the rebalance operation.
5. Check the status of the rebalance operation at intervals.

Actual results:

```
[root@quota5 ~]# gluster volume rebalance dist-rep status
Node          Rebalanced-files   size     scanned   failures   skipped   status        run time in secs
------------  ----------------   ------   -------   --------   -------   -----------   ----------------
localhost     0                  0Bytes   0         0          0         in progress   167416.00
10.70.35.191  0                  0Bytes   0         0          0         in progress   167415.00
10.70.35.108  0                  0Bytes   0         0          0         in progress   167416.00
10.70.35.144  0                  0Bytes   0         0          0         in progress   167415.00
volume rebalance: dist-rep: success:
```

The disks that became full show similar df output on all nodes.
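A stalled rebalance like the one above can be spotted automatically by filtering the status table for nodes that are still "in progress" but report zero rebalanced files after a long run time. This is a hedged sketch: the field positions are assumed from the output captured in this bug, and `check_stalled` is a hypothetical helper name.

```shell
# Sketch: read "gluster volume rebalance <vol> status" rows on stdin and
# print nodes still "in progress" with 0 rebalanced files after at least
# $1 seconds (default 3600). Field positions assumed from this report:
# <node> <rebalanced> <size> <scanned> <failures> <skipped> in progress <runtime>
check_stalled() {
    awk -v min="${1:-3600}" \
        '$(NF-2) == "in" && $(NF-1) == "progress" && $2 == "0" && $NF + 0 >= min { print $1 }'
}

# Example with a row from the status output above:
printf 'localhost 0 0Bytes 0 0 0 in progress 167416.00\n' | check_stalled
# prints "localhost"
```

Piping the full status table from this report through `check_stalled` would flag all four nodes, which is exactly the symptom being reported.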
```
[root@quota5 ~]# df -h /rhs/brick1
Filesystem                     Size  Used  Avail  Use%  Mounted on
/dev/mapper/RHS_vgvdb-RHS_lv1  1.5T  1.5T  16K    100%  /rhs/brick1
```

The newly added disk looks similar on all nodes:

```
[root@quota5 ~]# df -h /rhs/brick2
Filesystem                     Size  Used  Avail  Use%  Mounted on
/dev/mapper/RHS_vgvdc-RHS_lv1  1.6T  403M  1.6T   1%    /rhs/brick2
```

A snippet of the rebalance logs from one of the nodes:

```
[2013-11-06 07:35:19.143255] I [afr-lk-common.c:1088:afr_lock_blocking] 0-dist-rep-replicate-6: unable to lock on even one child
[2013-11-06 07:35:19.143264] I [afr-transaction.c:1169:afr_post_blocking_entrylk_cbk] 0-dist-rep-replicate-6: Blocking entrylks failed.
[2013-11-06 07:35:19.143273] W [dht-selfheal.c:419:dht_selfheal_dir_mkdir_cbk] 0-dist-rep-dht: selfhealing directory /qa1/linux_untar1382802662/linux-2.6.31.1/arch/mips/include/asm/sgi failed: No such file or directory
[2013-11-06 07:35:19.535273] I [dht-common.c:2650:dht_setxattr] 0-dist-rep-dht: fixing the layout of /qa1/linux_untar1382802662/linux-2.6.31.1/arch/mips/include/asm/sgi
```

Expected results:
Rebalance should progress; "0" files rebalanced is unexpected.
Additional info:

Quota-related stats:

```
[root@quota5 ~]# gluster volume quota dist-rep list
Path        Hard-limit   Soft-limit   Used      Available
--------------------------------------------------------------------------------
/           2.9TB        80%          2.8TB     59.8GB
/qa1        512.0GB      80%          512.0GB   26.5MB
/qa2        512.0GB      80%          512.0GB   0Bytes
/qa3        100.0GB      80%          100.0GB   0Bytes
/qa4        100.0GB      80%          100.0GB   0Bytes
/qa1/dir1   500.0GB      80%          412.0GB   88.0GB
/qa2/dir1   500.0GB      80%          412.0GB   88.0GB
/qa5        500.0GB      80%          500.0GB   0Bytes
/qa6        500.0GB      80%          500.0GB   0Bytes
/qa7        500.0GB      80%          298.3GB   201.7GB
/qa8        800.0GB      80%          401.6GB   398.4GB
```

```
[root@quota5 ~]# gluster volume info
Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 6df89010-7892-47d4-b897-ba863b31bba3
Status: Started
Number of Bricks: 9 x 2 = 18
Transport-type: tcp
Bricks:
Brick1: 10.70.35.188:/rhs/brick1/d1r1
Brick2: 10.70.35.108:/rhs/brick1/d1r2
Brick3: 10.70.35.191:/rhs/brick1/d2r1
Brick4: 10.70.35.144:/rhs/brick1/d2r2
Brick5: 10.70.35.188:/rhs/brick1/d3r1
Brick6: 10.70.35.108:/rhs/brick1/d3r2
Brick7: 10.70.35.191:/rhs/brick1/d4r1
Brick8: 10.70.35.144:/rhs/brick1/d4r2
Brick9: 10.70.35.188:/rhs/brick1/d5r1
Brick10: 10.70.35.108:/rhs/brick1/d5r2
Brick11: 10.70.35.191:/rhs/brick1/d6r1
Brick12: 10.70.35.144:/rhs/brick1/d6r2
Brick13: 10.70.35.188:/rhs/brick1/d1r1-add
Brick14: 10.70.35.108:/rhs/brick1/d1r2-add
Brick15: 10.70.35.188:/rhs/brick2/d1r1
Brick16: 10.70.35.108:/rhs/brick2/d1r2
Brick17: 10.70.35.191:/rhs/brick2/d2r1
Brick18: 10.70.35.144:/rhs/brick2/d2r2
Options Reconfigured:
features.quota: on
features.quota-deem-statfs: on
```
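The quota list shows that several paths have no space left under their hard limits, which matches the "disks full" starting state of this bug. A quick way to enumerate those paths from the table is a small filter. This is a sketch under assumptions: the column layout (path, hard-limit, soft-limit, used, available) is taken from the quota list output above, and `full_quota_paths` is a hypothetical helper name.

```shell
# Sketch: read "gluster volume quota <vol> list" rows on stdin and print
# paths whose Available column is exactly 0Bytes (hard limit reached).
# Assumed columns: <path> <hard-limit> <soft-limit> <used> <available>
full_quota_paths() {
    awk '$1 ~ /^\// && $5 == "0Bytes" { print $1 }'
}

# Example with two rows from the quota list above:
printf '/qa2 512.0GB 80%% 512.0GB 0Bytes\n/qa7 500.0GB 80%% 298.3GB 201.7GB\n' | full_quota_paths
# prints "/qa2"
```

Run against the full quota list in this report, this flags /qa2, /qa3, /qa4, /qa5, and /qa6 as being exactly at their hard limits.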