| Summary: | BVT: Rebalance skipped files are counted as failures in the status | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | shylesh <shmohan> | ||||
| Component: | glusterfs | Assignee: | Pranith Kumar K <pkarampu> | ||||
| Status: | CLOSED ERRATA | QA Contact: | shylesh <shmohan> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 2.1 | CC: | jturner, lmohanty, vagarwal, vbellur | ||||
| Target Milestone: | --- | Keywords: | Regression, ZStream | ||||
| Target Release: | RHGS 2.1.2 | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | glusterfs-3.4.0.50rhs | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2014-02-25 08:07:16 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
|
||||||
This issue appeared when the git branch used for BVT changed from origin/rhs-2.1-u1 to origin/rhs-2.1 in the downstream code.
Created attachment 832061 [details]
Rebalance logs
Rebalance log during the test
I saw the following in the logs when I re-created the issue.
[2013-12-15 17:37:50.710296] D [dht-rebalance.c:1290:gf_defrag_migrate_data] 0-r2-dht: migrate-data skipped for /96 due to space constraints
[2013-12-15 17:37:50.729342] D [dht-rebalance.c:1290:gf_defrag_migrate_data] 0-r2-dht: migrate-data skipped for /97 due to space constraints
[2013-12-15 17:37:50.749216] D [dht-rebalance.c:1290:gf_defrag_migrate_data] 0-r2-dht: migrate-data skipped for /98 due to space constraints
[2013-12-15 17:37:50.768598] D [dht-rebalance.c:1290:gf_defrag_migrate_data] 0-r2-dht: migrate-data skipped for /99 due to space constraints
root@pranithk-vm1 - ~
17:46:19 :) ⚡ grep "space constraints" /usr/local/var/log/glusterfs/r2-rebalance.log | wc -l
51
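The same count can be taken in a script when the check needs to run as part of an automated test. The following is a minimal sketch, assuming the debug message keeps the wording shown in the log lines above; the function name `skipped_paths` and the sample constant are illustrative, not part of glusterfs.

```python
import re

# Matches the debug message emitted when a migration is skipped for lack of space.
# The wording is taken from the log lines quoted above; it may differ in other builds.
SKIP_RE = re.compile(r"migrate-data skipped for (.+?) due to space constraints")


def skipped_paths(log_text):
    """Return the paths of all files whose migration was skipped for space reasons."""
    return [match.group(1) for match in SKIP_RE.finditer(log_text)]


SAMPLE_LOG = """\
[2013-12-15 17:37:50.710296] D [dht-rebalance.c:1290:gf_defrag_migrate_data] 0-r2-dht: migrate-data skipped for /96 due to space constraints
[2013-12-15 17:37:50.729342] D [dht-rebalance.c:1290:gf_defrag_migrate_data] 0-r2-dht: migrate-data skipped for /97 due to space constraints
[2013-12-15 17:37:50.749216] D [dht-rebalance.c:1290:gf_defrag_migrate_data] 0-r2-dht: migrate-data skipped for /98 due to space constraints
[2013-12-15 17:37:50.768598] D [dht-rebalance.c:1290:gf_defrag_migrate_data] 0-r2-dht: migrate-data skipped for /99 due to space constraints
"""

paths = skipped_paths(SAMPLE_LOG)
print(len(paths), paths)
```

On a real system the text would be read from the rebalance log file instead of the sample string, and the resulting count compared with the "skipped" column of the status output.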
With the fix:
root@pranithk-vm1 - /mnt/r2
17:54:04 :) ⚡ gluster volume rebalance r2 status
Node          Rebalanced-files  size     scanned  failures  skipped  status     run time in secs
------------  ----------------  -------  -------  --------  -------  ---------  ----------------
localhost     36                67Bytes  236      0         51       completed  2.00
10.70.42.237  0                 0Bytes   200      0         0        completed  2.00
10.70.43.148  0                 0Bytes   200      0         0        completed  2.00
volume rebalance: r2: success:
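A test harness can check the "failures" and "skipped" columns directly rather than reading the table by eye. The sketch below assumes the column order shown in the status output above and that the size value contains no spaces (as in "67Bytes"); `NodeStatus` and `parse_rebalance_status` are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class NodeStatus(NamedTuple):
    node: str
    rebalanced_files: int
    size: str
    scanned: int
    failures: int
    skipped: int
    status: str
    run_time_secs: float


def parse_rebalance_status(output: str) -> List[NodeStatus]:
    """Parse the per-node rows of 'gluster volume rebalance <vol> status' output."""
    rows = []
    for line in output.splitlines():
        tokens = line.split()
        # Skip the header, the dashed rule, and the trailing "volume rebalance:" line.
        if len(tokens) < 8 or not tokens[1].isdigit():
            continue
        rows.append(
            NodeStatus(
                node=tokens[0],
                rebalanced_files=int(tokens[1]),
                size=tokens[2],
                scanned=int(tokens[3]),
                failures=int(tokens[4]),
                skipped=int(tokens[5]),
                # The status may be more than one word, e.g. "in progress".
                status=" ".join(tokens[6:-1]),
                run_time_secs=float(tokens[-1]),
            )
        )
    return rows


SAMPLE = """\
Node Rebalanced-files size scanned failures skipped status run time in secs
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 36 67Bytes 236 0 51 completed 2.00
10.70.42.237 0 0Bytes 200 0 0 completed 2.00
10.70.43.148 0 0Bytes 200 0 0 completed 2.00
volume rebalance: r2: success:
"""

rows = parse_rebalance_status(SAMPLE)
total_failures = sum(r.failures for r in rows)
total_skipped = sum(r.skipped for r in rows)
print(f"failures={total_failures} skipped={total_skipped}")
```

With the fixed build, the totals for this run are 0 failures and 51 skipped, which matches the grep count of "space constraints" messages in the rebalance log.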
Verified on 3.4.0.52rhs-1.el6rhs.x86_64. The skipped count is now shown correctly when file migrations are skipped due to space constraints.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHEA-2014-0208.html |
Description of problem:
When rebalance is run on a distributed-replicate volume, any file migration that fails due to a space issue is shown in the "failures" count rather than in "skipped".

Version-Release number of selected component (if applicable):
Nightly BVT

How reproducible:
Always in BVT runs

Steps to Reproduce:
1. 4x2 distributed-replicate volume on which automated sanity was running
2. It creates symlinks on the mount point
3. Add a brick pair and invoke rebalance:
gluster v rebalance <vol> start

Actual results:
:: [ PASS ] :: Running 'rhts-sync-block -s rebal_run.70 rhsauto019.lab.eng.blr.redhat.com rhsauto008.lab.eng.blr.redhat.com rhsauto021.lab.eng.blr.redhat.com rhsauto022.lab.eng.blr.redhat.com' (Expected 0, got 0)
:: [ 22:41:42 ] :: rebal_get_status - Check status of rebalance and looks for errors.
:: [ 22:41:42 ] :: Machine in recipe is MASTERNODE rhsauto019.lab.eng.blr.redhat.com
:: [ 22:41:42 ] :: Logging initial status of rebalance:
Node                               Rebalanced-files  size      scanned  failures  skipped  status       run time in secs
---------------------------------  ----------------  --------  -------  --------  -------  -----------  ----------------
localhost                          32                717Bytes  233      33        0        in progress  1.00
rhsauto008.lab.eng.blr.redhat.com  30                674Bytes  240      27        0        in progress  1.00
rhsauto022.lab.eng.blr.redhat.com  0                 0Bytes    325      2         0        completed    0.00
volume rebalance: rebalvol: success:

from rebalance logs
-------------------
[2013-12-03 03:16:22.993932] I [dht-rebalance.c:672:dht_migrate_file] 0-hosdu-dht: /90: attempting to move from hosdu-replicate-0 to hosdu-replicate-2
[2013-12-03 03:16:23.008892] W [dht-rebalance.c:374:__dht_check_free_space] 0-hosdu-dht: data movement attempted from node (hosdu-replicate-0) with higher disk space to a node (hosdu-replicate-2) with lesser disk space (/90)
[2013-12-03 03:16:23.025123] I [dht-rebalance.c:672:dht_migrate_file] 0-hosdu-dht: /11: attempting to move from hosdu-replicate-1 to hosdu-replicate-3
[2013-12-03 03:16:23.038697] W [dht-rebalance.c:374:__dht_check_free_space] 0-hosdu-dht: data movement attempted from node (hosdu-replicate-1) with higher disk space to a node (hosdu-replicate-3) with lesser disk space (/11)
[2013-12-03 03:16:23.047018] I [dht-rebalance.c:672:dht_migrate_file] 0-hosdu-dht: /20: attempting to move from hosdu-replicate-1 to hosdu-replicate-3
[2013-12-03 03:16:23.060844] W [dht-rebalance.c:374:__dht_check_free_space] 0-hosdu-dht: data movement attempted from node (hosdu-replicate-1) with higher disk space to a node (hosdu-replicate-3) with lesser disk space (/20)
[2013-12-03 03:16:23.071069] I [dht-rebalance.c:672:dht_migrate_file] 0-hosdu-dht: /35: attempting to move from hosdu-replicate-1 to hosdu-replicate-3
[2013-12-03 03:16:23.086840] W [dht-rebalance.c:374:__dht_check_free_space] 0-hosdu-dht: data movement attempted from node (hosdu-replicate-1) with higher disk space to a node (hosdu-replicate-3) with lesser disk space (/35)
[2013-12-03 03:16:23.094162] I [dht-rebalance.c:672:dht_migrate_file] 0-hosdu-dht: /46: attempting to move from hosdu-replicate-1 to hosdu-replicate-3
[2013-12-03 03:16:23.113757] W [dht-rebalance.c:374:__dht_check_free_space] 0-hosdu-dht: data movement attempted from node (hosdu-replicate-1) with higher disk space to a node (hosdu-replicate-3) with lesser disk space (/46)
[2013-12-03 03:16:23.125096] I [dht-rebalance.c:672:dht_migrate_file] 0-hosdu-dht: /49: attempting to move from hosdu-replicate-1 to hosdu-replicate-3
[2013-12-03 03:16:23.141371] W [dht-rebalance.c:374:__dht_check_free_space] 0-hosdu-dht: data movement attempted from node (hosdu-replicate-1) with higher disk space to a node (hosdu-replicate-3) with lesse

The warning messages above should be counted as "skipped"; instead they are counted as "failed".