Bug 1054703 - DHT: Wrong condition checking for ENOSPC in gf_defrag_migrate_data in dht-rebalance.c
Summary: DHT: Wrong condition checking for ENOSPC in gf_defrag_migrate_data in dht-r...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: pre-release
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Susant Kumar Palai
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-01-17 10:11 UTC by Susant Kumar Palai
Modified: 2014-11-11 08:27 UTC (History)
4 users (show)

Fixed In Version: glusterfs-3.6.0beta1
Clone Of:
Environment:
Last Closed: 2014-11-11 08:27:10 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)

Description Susant Kumar Palai 2014-01-17 10:11:54 UTC
Description of problem:

While reading the code found the following piece of code in file dht-rebalance.c in function gf_defrag_migrate_data (line no: 1303)


ret = syncop_setxattr (this, &entry_loc, migrate_data, 0);

if (ret) {
        err = op_errno;
        /* errno is overloaded. See
        * rebalance_task_completion () */
        if (err != ENOSPC) {
        gf_log (this->name, GF_LOG_DEBUG,
                "migrate-data skipped for %s"
                " due to space constraints",
                entry_loc.path);

        defrag->skipped +=1;
        } else{
        gf_log (this->name, GF_LOG_ERROR,
                "migrate-data failed for %s",
                entry_loc.path);

        defrag->total_failures +=1;
        }                                                                                                                                                                                                          
}
          


The condition for the 2nd if statment should be (err == ENOSPC) instead of (err != ENOSPC). Otherwise we will get failure message for file migration for the correct case.

Comment 1 Anand Avati 2014-01-17 10:28:16 UTC
REVIEW: http://review.gluster.org/6727 (DHT: ENOSPC error leads to failure of rebalance) posted (#1) for review on master by susant palai (spalai)

Comment 2 Susant Kumar Palai 2014-02-19 05:07:51 UTC
I put the following log messages in "rebalance_task_completion" function

if (op_ret == -1) {
                /* Failure of migration process, mostly due to write process.
                   as we can't preserve the exact errno, lets say there was
                   no space to migrate-data
                */
             gf_log (this->name, GF_LOG_WARNING,"SPACE_rebalance_task_completion(GOING FOR ENOSPC) op_ret=%d : %s",op_ret, local->loc.path);
             op_errno = ENOSPC;
        }

        if (op_ret == 1) {
                /* migration didn't happen, but is not a failure, let the user
                   understand that he doesn't have permission to migrate the
                   file.
                */
            gf_log (this->name, GF_LOG_WARNING,"SPACE_rebalance_task_completion(GOING FOR EPERM) op_ret=%d : %s",op_ret, local->loc.path);
            op_ret = -1;
            op_errno = EPERM;
        }


Created a scneario where one brick has no space and rebalance will try to migrate the data to this brick.

Got the following in the log and we are getting EPERM here: 
[2014-02-19 04:36:06.843330] W [dht-rebalance.c:1017:rebalance_task_completion] 0-test1-dht: SPACE_rebalance_task_completion(GOING FOR EPERM) op_ret=1 : /file

Some points to note:
---------------------------------
1. So rebalance_task_completion is setting the op_errno to EPERM instead of ENOSPC

2. In gf_defrag_migrate_data errno is always zero. 

[2014-02-19 03:09:12.441541] E [dht-rebalance.c:1335:gf_defrag_migrate_data] 0-test1-dht: TRACKER_Value of op_errno= 0 and errno=0 and ret=-1

The error check magically works for ENOSPC because our comparision is (err == ENOSPC ) and err=op_errno=0 always.

Comment 3 Anand Avati 2014-02-19 06:34:07 UTC
REVIEW: http://review.gluster.org/6727 (DHT: Wrong error handling in gf_defrag_migrate_data) posted (#2) for review on master by susant palai (spalai)

Comment 4 Anand Avati 2014-02-22 03:38:41 UTC
REVIEW: http://review.gluster.org/6727 (DHT: Wrong error handling in gf_defrag_migrate_data) posted (#3) for review on master by susant palai (spalai)

Comment 5 Anand Avati 2014-05-28 17:39:30 UTC
COMMIT: http://review.gluster.org/6727 committed in master by Vijay Bellur (vbellur) 
------
commit deaaaec82a0aae1edfeb0688e20498fafa896c83
Author: Susant Palai <spalai>
Date:   Wed Feb 19 05:47:07 2014 +0000

    DHT: Wrong error handling in gf_defrag_migrate_data
    
    Problem:
             Because of the condition (err = op_errno), err was set to
    zero always and ENOSPC error will be logged always. "dht_check_free_
    space" was returning 1 and it was mapped to EPERM in "rebalance_task
    _completion".
    
    Solution: Changed the return value in dht_check_free_space to -1
    as in rebalance_task_completion op_ret value -1 is mapped to ENOSPC.
    
    And fixed the wrong error condition after syncop_setxattr in
    gf_defrag_migrate_data.
    
    Change-Id: I474ea1bef3b1e34c89814ed0091dd02bd5b63737
    BUG: 1054703
    Signed-off-by: Susant Palai <spalai>
    Reviewed-on: http://review.gluster.org/6727
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 6 Niels de Vos 2014-09-22 12:35:11 UTC
A beta release for GlusterFS 3.6.0 has been released. Please verify if the release solves this bug report for you. In case the glusterfs-3.6.0beta1 release does not have a resolution for this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update (possibly an "updates-testing" repository) infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

Comment 7 Niels de Vos 2014-11-11 08:27:10 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users


Note You need to log in before you can comment on or make changes to this bug.