Bug 1054703
| Summary: | DHT: Wrong condition checking for ENOSPC in gf_defrag_migrate_data in dht-rebalance.c | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Susant Kumar Palai <spalai> |
| Component: | distribute | Assignee: | Susant Kumar Palai <spalai> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | pre-release | CC: | bugs, gluster-bugs, rgowdapp, rwheeler |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | glusterfs-3.6.0beta1 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2014-11-11 08:27:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
REVIEW: http://review.gluster.org/6727 (DHT: ENOSPC error leads to failure of rebalance) posted (#1) for review on master by susant palai (spalai) I put the following log messages in "rebalance_task_completion" function
if (op_ret == -1) {
/* Failure of migration process, mostly due to write process.
as we can't preserve the exact errno, lets say there was
no space to migrate-data
*/
gf_log (this->name, GF_LOG_WARNING,"SPACE_rebalance_task_completion(GOING FOR ENOSPC) op_ret=%d : %s",op_ret, local->loc.path);
op_errno = ENOSPC;
}
if (op_ret == 1) {
/* migration didn't happen, but is not a failure, let the user
understand that he doesn't have permission to migrate the
file.
*/
gf_log (this->name, GF_LOG_WARNING,"SPACE_rebalance_task_completion(GOING FOR EPERM) op_ret=%d : %s",op_ret, local->loc.path);
op_ret = -1;
op_errno = EPERM;
}
Created a scneario where one brick has no space and rebalance will try to migrate the data to this brick.
Got the following in the log and we are getting EPERM here:
[2014-02-19 04:36:06.843330] W [dht-rebalance.c:1017:rebalance_task_completion] 0-test1-dht: SPACE_rebalance_task_completion(GOING FOR EPERM) op_ret=1 : /file
Some points to note:
---------------------------------
1. So rebalance_task_completion is setting the op_errno to EPERM instead of ENOSPC
2. In gf_defrag_migrate_data errno is always zero.
[2014-02-19 03:09:12.441541] E [dht-rebalance.c:1335:gf_defrag_migrate_data] 0-test1-dht: TRACKER_Value of op_errno= 0 and errno=0 and ret=-1
The error check magically works for ENOSPC because our comparision is (err == ENOSPC ) and err=op_errno=0 always.
REVIEW: http://review.gluster.org/6727 (DHT: Wrong error handling in gf_defrag_migrate_data) posted (#2) for review on master by susant palai (spalai) REVIEW: http://review.gluster.org/6727 (DHT: Wrong error handling in gf_defrag_migrate_data) posted (#3) for review on master by susant palai (spalai) COMMIT: http://review.gluster.org/6727 committed in master by Vijay Bellur (vbellur) ------ commit deaaaec82a0aae1edfeb0688e20498fafa896c83 Author: Susant Palai <spalai> Date: Wed Feb 19 05:47:07 2014 +0000 DHT: Wrong error handling in gf_defrag_migrate_data Problem: Because of the condition (err = op_errno), err was set to zero always and ENOSPC error will be logged always. "dht_check_free_ space" was returning 1 and it was mapped to EPERM in "rebalance_task _completion". Solution: Changed the return value in dht_check_free_space to -1 as in rebalance_task_completion op_ret value -1 is mapped to ENOSPC. And fixed the wrong error condition after syncop_setxattr in gf_defrag_migrate_data. Change-Id: I474ea1bef3b1e34c89814ed0091dd02bd5b63737 BUG: 1054703 Signed-off-by: Susant Palai <spalai> Reviewed-on: http://review.gluster.org/6727 Tested-by: Gluster Build System <jenkins.com> Reviewed-by: Raghavendra G <rgowdapp> Reviewed-by: Vijay Bellur <vbellur> A beta release for GlusterFS 3.6.0 has been released. Please verify if the release solves this bug report for you. In case the glusterfs-3.6.0beta1 release does not have a resolution for this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update (possibly an "updates-testing" repository) infrastructure for your distribution. [1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html [2] http://supercolony.gluster.org/pipermail/gluster-users/ This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report. glusterfs-3.6.1 has been announced [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution. [1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html [2] http://supercolony.gluster.org/mailman/listinfo/gluster-users |
Description of problem: While reading the code found the following piece of code in file dht-rebalance.c in function gf_defrag_migrate_data (line no: 1303) ret = syncop_setxattr (this, &entry_loc, migrate_data, 0); if (ret) { err = op_errno; /* errno is overloaded. See * rebalance_task_completion () */ if (err != ENOSPC) { gf_log (this->name, GF_LOG_DEBUG, "migrate-data skipped for %s" " due to space constraints", entry_loc.path); defrag->skipped +=1; } else{ gf_log (this->name, GF_LOG_ERROR, "migrate-data failed for %s", entry_loc.path); defrag->total_failures +=1; } } The condition for the 2nd if statment should be (err == ENOSPC) instead of (err != ENOSPC). Otherwise we will get failure message for file migration for the correct case.