Description of problem: While reading the code, I found the following piece of code in dht-rebalance.c, in gf_defrag_migrate_data() (line 1303):

    ret = syncop_setxattr (this, &entry_loc, migrate_data, 0);
    if (ret) {
            err = op_errno;
            /* errno is overloaded. See
             * rebalance_task_completion () */
            if (err != ENOSPC) {
                    gf_log (this->name, GF_LOG_DEBUG,
                            "migrate-data skipped for %s"
                            " due to space constraints",
                            entry_loc.path);
                    defrag->skipped += 1;
            } else {
                    gf_log (this->name, GF_LOG_ERROR,
                            "migrate-data failed for %s",
                            entry_loc.path);
                    defrag->total_failures += 1;
            }
    }

The condition of the second if statement should be (err == ENOSPC) instead of (err != ENOSPC). Otherwise we will get a failure message for file migration even in the correct case.
REVIEW: http://review.gluster.org/6727 (DHT: ENOSPC error leads to failure of rebalance) posted (#1) for review on master by susant palai (spalai)
I put the following log messages in the rebalance_task_completion() function:

    if (op_ret == -1) {
            /* Failure of migration process, mostly due to
             * write process. As we can't preserve the exact
             * errno, let's say there was no space to
             * migrate-data */
            gf_log (this->name, GF_LOG_WARNING,
                    "SPACE_rebalance_task_completion(GOING FOR ENOSPC) op_ret=%d : %s",
                    op_ret, local->loc.path);
            op_errno = ENOSPC;
    }

    if (op_ret == 1) {
            /* Migration didn't happen, but it is not a failure.
             * Let the user understand that he doesn't have
             * permission to migrate the file. */
            gf_log (this->name, GF_LOG_WARNING,
                    "SPACE_rebalance_task_completion(GOING FOR EPERM) op_ret=%d : %s",
                    op_ret, local->loc.path);
            op_ret = -1;
            op_errno = EPERM;
    }

Then I created a scenario where one brick has no space and rebalance tries to migrate data to that brick. The log shows we are getting EPERM here:

    [2014-02-19 04:36:06.843330] W [dht-rebalance.c:1017:rebalance_task_completion] 0-test1-dht: SPACE_rebalance_task_completion(GOING FOR EPERM) op_ret=1 : /file

Some points to note:
---------------------------------
1. rebalance_task_completion is setting op_errno to EPERM instead of ENOSPC.
2. In gf_defrag_migrate_data, errno is always zero:

    [2014-02-19 03:09:12.441541] E [dht-rebalance.c:1335:gf_defrag_migrate_data] 0-test1-dht: TRACKER_Value of op_errno= 0 and errno=0 and ret=-1

The error check magically works for ENOSPC because our comparison is (err == ENOSPC) and err = op_errno = 0 always.
REVIEW: http://review.gluster.org/6727 (DHT: Wrong error handling in gf_defrag_migrate_data) posted (#2) for review on master by susant palai (spalai)
REVIEW: http://review.gluster.org/6727 (DHT: Wrong error handling in gf_defrag_migrate_data) posted (#3) for review on master by susant palai (spalai)
COMMIT: http://review.gluster.org/6727 committed in master by Vijay Bellur (vbellur)

------
commit deaaaec82a0aae1edfeb0688e20498fafa896c83
Author: Susant Palai <spalai>
Date:   Wed Feb 19 05:47:07 2014 +0000

    DHT: Wrong error handling in gf_defrag_migrate_data

    Problem: Because of the condition (err = op_errno), err was set to
    zero always and ENOSPC error will be logged always.
    "dht_check_free_space" was returning 1 and it was mapped to EPERM
    in "rebalance_task_completion".

    Solution: Changed the return value in dht_check_free_space to -1,
    as in rebalance_task_completion the op_ret value -1 is mapped to
    ENOSPC. And fixed the wrong error condition after syncop_setxattr
    in gf_defrag_migrate_data.

    Change-Id: I474ea1bef3b1e34c89814ed0091dd02bd5b63737
    BUG: 1054703
    Signed-off-by: Susant Palai <spalai>
    Reviewed-on: http://review.gluster.org/6727
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>
    Reviewed-by: Vijay Bellur <vbellur>
A beta release for GlusterFS 3.6.0 has been announced [1]. Please verify whether the release solves this bug report for you. In case the glusterfs-3.6.0beta1 release does not have a resolution for this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update (possibly an "updates-testing" repository) infrastructure for your distribution. [1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html [2] http://supercolony.gluster.org/pipermail/gluster-users/
This bug is being closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report. glusterfs-3.6.1 has been announced [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution. [1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html [2] http://supercolony.gluster.org/mailman/listinfo/gluster-users