Description of problem:
When rebalance runs on a distributed-replicate volume and a file migration fails because of a space issue, the file is shown in the "failed" list rather than "skipped".

Version-Release number of selected component (if applicable):
Nightly BVT

How reproducible:
Always in BVT runs

Steps to Reproduce:
1. Use a 4x2 distributed-replicate volume on which automated sanity was running.
2. The test creates symlinks on the mount point.
3. Add a brick pair and invoke rebalance:
   gluster v rebalance <vol> start

Actual results:
:: [ PASS ] :: Running 'rhts-sync-block -s rebal_run.70 rhsauto019.lab.eng.blr.redhat.com rhsauto008.lab.eng.blr.redhat.com rhsauto021.lab.eng.blr.redhat.com rhsauto022.lab.eng.blr.redhat.com' (Expected 0, got 0)
:: [ 22:41:42 ] :: rebal_get_status - Check status of rebalance and looks for errors.
:: [ 22:41:42 ] :: Machine in recipe is MASTERNODE rhsauto019.lab.eng.blr.redhat.com
:: [ 22:41:42 ] :: Logging initial status of rebalance:

                             Node Rebalanced-files      size  scanned  failures  skipped       status  run time in secs
                        ---------      -----------  --------  -------  --------  -------  -----------  ----------------
                        localhost               32  717Bytes      233        33        0  in progress              1.00
rhsauto008.lab.eng.blr.redhat.com               30  674Bytes      240        27        0  in progress              1.00
rhsauto022.lab.eng.blr.redhat.com                0    0Bytes      325         2        0    completed              0.00
volume rebalance: rebalvol: success:

from rebalance logs
-------------------
[2013-12-03 03:16:22.993932] I [dht-rebalance.c:672:dht_migrate_file] 0-hosdu-dht: /90: attempting to move from hosdu-replicate-0 to hosdu-replicate-2
[2013-12-03 03:16:23.008892] W [dht-rebalance.c:374:__dht_check_free_space] 0-hosdu-dht: data movement attempted from node (hosdu-replicate-0) with higher disk space to a node (hosdu-replicate-2) with lesser disk space (/90)
[2013-12-03 03:16:23.025123] I [dht-rebalance.c:672:dht_migrate_file] 0-hosdu-dht: /11: attempting to move from hosdu-replicate-1 to hosdu-replicate-3
[2013-12-03 03:16:23.038697] W [dht-rebalance.c:374:__dht_check_free_space] 0-hosdu-dht: data movement attempted from node (hosdu-replicate-1) with higher disk space to a node (hosdu-replicate-3) with lesser disk space (/11)
[2013-12-03 03:16:23.047018] I [dht-rebalance.c:672:dht_migrate_file] 0-hosdu-dht: /20: attempting to move from hosdu-replicate-1 to hosdu-replicate-3
[2013-12-03 03:16:23.060844] W [dht-rebalance.c:374:__dht_check_free_space] 0-hosdu-dht: data movement attempted from node (hosdu-replicate-1) with higher disk space to a node (hosdu-replicate-3) with lesser disk space (/20)
[2013-12-03 03:16:23.071069] I [dht-rebalance.c:672:dht_migrate_file] 0-hosdu-dht: /35: attempting to move from hosdu-replicate-1 to hosdu-replicate-3
[2013-12-03 03:16:23.086840] W [dht-rebalance.c:374:__dht_check_free_space] 0-hosdu-dht: data movement attempted from node (hosdu-replicate-1) with higher disk space to a node (hosdu-replicate-3) with lesser disk space (/35)
[2013-12-03 03:16:23.094162] I [dht-rebalance.c:672:dht_migrate_file] 0-hosdu-dht: /46: attempting to move from hosdu-replicate-1 to hosdu-replicate-3
[2013-12-03 03:16:23.113757] W [dht-rebalance.c:374:__dht_check_free_space] 0-hosdu-dht: data movement attempted from node (hosdu-replicate-1) with higher disk space to a node (hosdu-replicate-3) with lesser disk space (/46)
[2013-12-03 03:16:23.125096] I [dht-rebalance.c:672:dht_migrate_file] 0-hosdu-dht: /49: attempting to move from hosdu-replicate-1 to hosdu-replicate-3
[2013-12-03 03:16:23.141371] W [dht-rebalance.c:374:__dht_check_free_space] 0-hosdu-dht: data movement attempted from node (hosdu-replicate-1) with higher disk space to a node (hosdu-replicate-3) with lesse

The warning messages above are supposed to be counted as "skipped"; instead they are counted as "failed".
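To compare the "failures" count against the number of files that actually hit the free-space check, the warning lines in the log excerpt above can be extracted with a small script. This is only a sketch: the regular expression is inferred from the sample lines in this report (including the trailing "(/path)" part), and the helper name files_blocked_by_space is hypothetical, not part of GlusterFS.

```python
import re

# Pattern inferred from the __dht_check_free_space warnings quoted in this
# report; the trailing "(/path)" capture is an assumption based on those lines.
SPACE_WARNING = re.compile(
    r"__dht_check_free_space\].*?with lesser disk space \((?P<path>[^)]+)\)"
)


def files_blocked_by_space(log_text):
    """Return the paths of files whose migration hit the free-space check."""
    return [m.group("path") for m in SPACE_WARNING.finditer(log_text)]


# Two lines copied from the rebalance log in this report.
sample = (
    "[2013-12-03 03:16:23.008892] W [dht-rebalance.c:374:__dht_check_free_space] "
    "0-hosdu-dht: data movement attempted from node (hosdu-replicate-0) with higher "
    "disk space to a node (hosdu-replicate-2) with lesser disk space (/90)\n"
    "[2013-12-03 03:16:23.025123] I [dht-rebalance.c:672:dht_migrate_file] "
    "0-hosdu-dht: /11: attempting to move from hosdu-replicate-1 to hosdu-replicate-3\n"
)

print(files_blocked_by_space(sample))
```

If the number of paths returned for a full rebalance log is close to the "failures" column while "skipped" stays at 0, the space-constrained files are being reported in the wrong column.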
This issue appeared when the git branch used for BVT changed from origin/rhs-2.1-u1 to origin/rhs-2.1 in the downstream code.
Created attachment 832061
Rebalance logs

Rebalance log during the test
I see the following in the logs, when I re-created it.

[2013-12-15 17:37:50.710296] D [dht-rebalance.c:1290:gf_defrag_migrate_data] 0-r2-dht: migrate-data skipped for /96 due to space constraints
[2013-12-15 17:37:50.729342] D [dht-rebalance.c:1290:gf_defrag_migrate_data] 0-r2-dht: migrate-data skipped for /97 due to space constraints
[2013-12-15 17:37:50.749216] D [dht-rebalance.c:1290:gf_defrag_migrate_data] 0-r2-dht: migrate-data skipped for /98 due to space constraints
[2013-12-15 17:37:50.768598] D [dht-rebalance.c:1290:gf_defrag_migrate_data] 0-r2-dht: migrate-data skipped for /99 due to space constraints

root@pranithk-vm1 - ~
17:46:19 :) ⚡ grep "space constraints" /usr/local/var/log/glusterfs/r2-rebalance.log | wc -l
51

With the fix:

root@pranithk-vm1 - /mnt/r2
17:54:04 :) ⚡ gluster volume rebalance r2 status
        Node Rebalanced-files     size  scanned  failures  skipped     status  run time in secs
   ---------      -----------  -------  -------  --------  -------  ---------  ----------------
   localhost               36  67Bytes      236         0       51  completed              2.00
10.70.42.237                0   0Bytes      200         0        0  completed              2.00
10.70.43.148                0   0Bytes      200         0        0  completed              2.00
volume rebalance: r2: success:
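The check done by hand above (51 "space constraints" log lines against a skipped count of 51) could be automated by parsing the status output. The following is a minimal sketch that assumes the eight-column layout printed in this report; other GlusterFS versions may print different columns, and the function name parse_rebalance_status is hypothetical.

```python
def parse_rebalance_status(output):
    """Parse 'gluster volume rebalance <vol> status' text into per-node dicts.

    Assumes the column layout shown in this bug report:
    node, rebalanced files, size, scanned, failures, skipped, status, run time.
    """
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # Skip the header, the dashed separator, and the trailing summary line.
        if len(parts) < 8 or not parts[1].isdigit():
            continue
        rows.append({
            "node": parts[0],
            "rebalanced_files": int(parts[1]),
            "size": parts[2],
            "scanned": int(parts[3]),
            "failures": int(parts[4]),
            "skipped": int(parts[5]),
            # The status may be more than one word, e.g. "in progress".
            "status": " ".join(parts[6:-1]),
            "run_time_secs": float(parts[-1]),
        })
    return rows


# Status output copied from the run with the fix applied.
sample = """\
Node Rebalanced-files size scanned failures skipped status run time in secs
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 36 67Bytes 236 0 51 completed 2.00
10.70.42.237 0 0Bytes 200 0 0 completed 2.00
10.70.43.148 0 0Bytes 200 0 0 completed 2.00
volume rebalance: r2: success:
"""

rows = parse_rebalance_status(sample)
total_failures = sum(r["failures"] for r in rows)
total_skipped = sum(r["skipped"] for r in rows)
print(total_failures, total_skipped)
```

A test harness could then assert that the total skipped count matches the number of "space constraints" lines in the rebalance log and that the failures total is zero.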
Verified on 3.4.0.52rhs-1.el6rhs.x86_64. The skipped count is now shown correctly when migrations fail due to space constraints.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-0208.html