Bug 1040356

Summary: BVT: " failed to get trusted.distribute.linkinfo key" errors are seen in rebalance logs
Product: [Community] GlusterFS Reporter: Pranith Kumar K <pkarampu>
Component: distributeAssignee: Pranith Kumar K <pkarampu>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: mainlineCC: gluster-bugs, jturner, lmohanty, shaines, shmohan, vagarwal, vbellur
Target Milestone: ---Keywords: Regression, ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.5.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1037515 Environment:
Last Closed: 2014-04-17 11:52:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1037515    
Bug Blocks:    

Comment 1 Anand Avati 2013-12-11 09:58:59 UTC
REVIEW: http://review.gluster.org/6481 (cluster/dht: Make sure gf_defrag_migrate_data is not optimized) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 2 Anand Avati 2013-12-11 10:00:36 UTC
REVIEW: http://review.gluster.org/6481 (cluster/dht: Make sure gf_defrag_migrate_data is not optimized) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 3 Anand Avati 2013-12-12 02:23:36 UTC
REVIEW: http://review.gluster.org/6481 (cluster/dht: Make sure gf_defrag_migrate_data is not optimized) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 4 Anand Avati 2013-12-12 06:23:08 UTC
COMMIT: http://review.gluster.org/6481 committed in master by Anand Avati (avati) 
------
commit 493008a299cd1197df0caee72eacd12c1a54606b
Author: Pranith Kumar K <pkarampu>
Date:   Wed Dec 11 15:19:25 2013 +0530

    cluster/dht: Make sure gf_defrag_migrate_data is not optimized
    
    Problem:
    Whenever there syncop_xxx() is used inside a synctask and gcc
    optimizes it when compiled with -O2 there is a problem where
    'errno' would not work as expected.
    
    Fix:
    Until http://review.gluster.com/6475 is reviewed and merged
    we are making sure the function is not going to be optimized.
    
    Change-Id: I504c18c8a7789f0c776a56f0aa60db3618b21601
    BUG: 1040356
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/6481
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
    Reviewed-by: Anand Avati <avati>

Comment 5 Anand Avati 2013-12-23 12:01:06 UTC
REVIEW: http://review.gluster.org/6475 (syncop: Change return value of syncop) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 6 Anand Avati 2013-12-23 12:40:57 UTC
REVIEW: http://review.gluster.org/6475 (syncop: Change return value of syncop) posted (#4) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 7 Anand Avati 2014-01-16 11:50:27 UTC
REVIEW: http://review.gluster.org/6475 (syncop: Change return value of syncop) posted (#5) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 8 Anand Avati 2014-01-20 07:05:24 UTC
COMMIT: http://review.gluster.org/6475 committed in master by Anand Avati (avati) 
------
commit 8d55c25f158921b508bff0e7f25158991913f922
Author: Pranith Kumar K <pkarampu>
Date:   Wed Dec 11 09:33:53 2013 +0530

    syncop: Change return value of syncop
    
    Problem:
    We found a day-1 bug when syncop_xxx() infra is used inside a synctask with
    compilation optimization (CFLAGS -O2).
    
    Detailed explanation of the Root cause:
    We found the bug in 'gf_defrag_migrate_data' in rebalance operation:
    
    Lets look at interesting parts of the function:
    
    int
    gf_defrag_migrate_data (xlator_t *this, gf_defrag_info_t *defrag, loc_t *loc,
                            dict_t *migrate_data)
    {
    .....
    code section - [ Loop ]
            while ((ret = syncop_readdirp (this, fd, 131072, offset, NULL,
                                           &entries)) != 0) {
    .....
    code section - [ ERRNO-1 ] (errno of readdirp is stored in readdir_operrno by a
    thread)
                    /* Need to keep track of ENOENT errno, that means, there is no
                       need to send more readdirp() */
                    readdir_operrno = errno;
    .....
    code section - [ SYNCOP-1 ] (syncop_getxattr is called by a thread)
                            ret = syncop_getxattr (this, &entry_loc, &dict,
                                                   GF_XATTR_LINKINFO_KEY);
    code section - [ ERRNO-2]   (checking for failures of syncop_getxattr(). This
    may not always be executed in same thread which executed [SYNCOP-1])
                            if (ret < 0) {
                                    if (errno != ENODATA) {
                                            loglevel = GF_LOG_ERROR;
                                            defrag->total_failures += 1;
    .....
    }
    
    the function above could be executed by thread(t1) till [SYNCOP-1] and code
    from [ERRNO-2] can be executed by a different thread(t2) because of the way
    syncop-infra schedules the tasks.
    
    when the code is compiled with -O2 optimization this is the assembly code that
    is generated:
     [ERRNO-1]
    1165                        readdir_operrno = errno; <<---- errno gets expanded
    as *(__errno_location())
       0x00007fd149d48b60 <+496>:        callq  0x7fd149d410c0 <address@hidden>
       0x00007fd149d48b72 <+514>:        mov    %rax,0x50(%rsp) <<------ Address
    returned by __errno_location() is stored in a special location in stack for
    later use.
       0x00007fd149d48b77 <+519>:        mov    (%rax),%eax
       0x00007fd149d48b79 <+521>:        mov    %eax,0x78(%rsp)
    ....
     [ERRNO-2]
    1281                                        if (errno != ENODATA) {
       0x00007fd149d492ae <+2366>:        mov    0x50(%rsp),%rax <<-----  Because
    it already stored the address returned by __errno_location(), it just
    dereferences the address to get the errno value. BUT THIS CODE NEED NOT BE
    EXECUTED BY SAME THREAD!!!
       0x00007fd149d492b3 <+2371>:        mov    $0x9,%ebp
       0x00007fd149d492b8 <+2376>:        mov    (%rax),%edi
       0x00007fd149d492ba <+2378>:        cmp    $0x3d,%edi
    
    The problem is that __errno_location() value of t1 and t2 are different. So
    [ERRNO-2] ends up reading errno of t1 instead of errno of t2 even though t2 is
    executing [ERRNO-2] code section.
    
    When code is compiled without any optimization for [ERRNO-2]:
    1281                                        if (errno != ENODATA) {
       0x00007fd58e7a326f <+2237>:        callq  0x7fd58e797300
    <address@hidden><<--- As it is calling __errno_location() again it gets the
    location from t2 so it works as intended.
       0x00007fd58e7a3274 <+2242>:        mov    (%rax),%eax
       0x00007fd58e7a3276 <+2244>:        cmp    $0x3d,%eax
       0x00007fd58e7a3279 <+2247>:        je     0x7fd58e7a32a1
    <gf_defrag_migrate_data+2287>
    
    Fix:
    Make syncop_xxx() return (-errno) value as the return value in
    case of errors and all the functions which make syncop_xxx() will need to use
    (-ret) to figure out the reason for failure in case of syncop_xxx() failures.
    
    Change-Id: I314d20dabe55d3e62ff66f3b4adb1cac2eaebb57
    BUG: 1040356
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/6475
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Anand Avati <avati>

Comment 9 Anand Avati 2014-01-21 07:32:16 UTC
REVIEW: http://review.gluster.org/6745 (cluster/dht: Make sure gf_defrag_migrate_data is not optimized) posted (#1) for review on release-3.5 by Lalatendu Mohanty (lmohanty)

Comment 10 Anand Avati 2014-01-21 11:06:45 UTC
COMMIT: http://review.gluster.org/6745 committed in release-3.5 by Vijay Bellur (vbellur) 
------
commit 9b13d17f27d65c3bff68b6c6a69cf0f02c972e73
Author: Pranith Kumar K <pkarampu>
Date:   Wed Dec 11 15:19:25 2013 +0530

    cluster/dht: Make sure gf_defrag_migrate_data is not optimized
    
    Problem:
    Whenever there syncop_xxx() is used inside a synctask and gcc
    optimizes it when compiled with -O2 there is a problem where
    'errno' would not work as expected.
    
    Fix:
    Until http://review.gluster.com/6475 is reviewed and merged
    we are making sure the function is not going to be optimized.
    
    Change-Id: I504c18c8a7789f0c776a56f0aa60db3618b21601
    BUG: 1040356
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/6481
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
    Reviewed-by: Anand Avati <avati>
    Reviewed-on: http://review.gluster.org/6745
    Reviewed-by: Vijay Bellur <vbellur>

Comment 11 Niels de Vos 2014-04-17 11:52:16 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailinglist [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user