Bug 1360152 - IO error seen with rolling or non-disruptive upgrade of a distribute-disperse (EC) volume from 3.7.5 to 3.7.9
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: disperse
Version: 3.7.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Ashish Pandey
QA Contact:
URL:
Whiteboard:
Depends On: 1347251 1347686 1360174
Blocks:
 
Reported: 2016-07-26 06:29 UTC by Ashish Pandey
Modified: 2017-02-15 13:57 UTC
CC: 5 users

Fixed In Version: glusterfs-3.7.14
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1347686
Environment:
Last Closed: 2016-08-02 07:25:06 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Comment 1 Ashish Pandey 2016-07-26 06:50:57 UTC
In glusterfs 3.7.5, features/locks did not return the lock count in xdata that ec requested.

To solve a hang issue, the code was modified so that if xdata carries a request for the inodelk count, features/locks returns that count through xdata.

As of glusterfs 3.7.9, ec therefore receives the inodelk count in xdata from features/locks.

This issue arises when we do a rolling update from 3.7.5 to 3.7.9. For a 4+2 volume running 3.7.5, the problem can be seen if we update 2 nodes and, after heal completion, kill 2 of the older nodes. After the update and the killing of bricks, the 2 updated nodes return the inodelk count while the 2 older nodes do not.

During the dictionary match in ec_dict_compare, this leads to a mismatch of answers, and the file operation on the mount point fails with an IO error.
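
To make the failure concrete, here is a minimal sketch in plain C. The dict type below is a hypothetical, simplified stand-in for GlusterFS's dict_t, and answers_match() only mimics the strict matching done in ec_dict_compare; the key string follows GlusterFS's "glusterfs.inodelk-count" (GLUSTERFS_INODELK_COUNT).

/* Hypothetical, simplified stand-in for dict_t: just enough structure
 * to show why a strict key-by-key comparison rejects answers from a
 * mixed-version brick set. */
#include <stdio.h>
#include <string.h>

#define MAX_PAIRS 8

typedef struct {
    const char *key;
    long        value;
} pair_t;

typedef struct {
    pair_t pairs[MAX_PAIRS];
    int    count;
} answer_dict_t;

/* Strict match in the style of the pre-fix ec_dict_compare: both
 * answers must carry exactly the same keys with the same values. */
static int answers_match(const answer_dict_t *a, const answer_dict_t *b)
{
    if (a->count != b->count)
        return 0;
    for (int i = 0; i < a->count; i++) {
        int same = 0;
        for (int j = 0; j < b->count; j++) {
            if (strcmp(a->pairs[i].key, b->pairs[j].key) == 0) {
                same = (a->pairs[i].value == b->pairs[j].value);
                break;
            }
        }
        if (!same)
            return 0;
    }
    return 1;
}

int main(void)
{
    /* Updated 3.7.9 brick: features/locks filled in the count. */
    answer_dict_t updated = {
        .pairs = { { "glusterfs.inodelk-count", 1 } },
        .count = 1,
    };
    /* Old 3.7.5 brick: the key is simply absent. */
    answer_dict_t old = { .count = 0 };

    if (!answers_match(&updated, &old))
        printf("answers differ -> fop fails on the mount with EIO\n");
    return 0;
}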

Comment 2 Vijay Bellur 2016-07-26 07:21:14 UTC
REVIEW: http://review.gluster.org/15012 (cluster/ec: Handle absence of keys in some callback dict) posted (#1) for review on release-3.7 by Ashish Pandey (aspandey)

Comment 3 Vijay Bellur 2016-07-27 07:00:07 UTC
COMMIT: http://review.gluster.org/15012 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu) 
------
commit 1e3a8f47cd88c39c41519d143b001d45387eb4b8
Author: Ashish Pandey <aspandey>
Date:   Fri Jun 17 17:52:56 2016 +0530

    cluster/ec: Handle absence of keys in some callback dict
    
    Problem: This issue arises when we do a rolling update
    from 3.7.5 to 3.7.9.
    For 4+2 volume running 3.7.5, if we update 2 nodes
    and after heal completion kill 2 older nodes, this
    problem can be seen. After update and killing of
    bricks, 2 nodes will return inodelk count key in dict
    while other 2 nodes will not have inodelk count in dict.
    This is also true for get-link-count.
    During dictionary match, ec_dict_compare, this will
    lead to mismatch of answers and the file operation
    on mount point will fail with IO error.
    
    Solution:
    Don't match inode, entry and link count keys while
    comparing two dictionaries. However, while combining the
    data in ec_dict_combine, go through all the dictionaries
    and select the maximum values received in different dicts
    for these keys.
    
    master-
    http://review.gluster.org/#/c/14761/
    
    Change-Id: I33546e3619fe8f909286ee48fb0df2009cd3d22f
    BUG: 1360152
    Signed-off-by: Ashish Pandey <aspandey>
    Reviewed-on: http://review.gluster.org/14761
    Reviewed-by: Xavier Hernandez <xhernandez>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    CentOS-regression: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Signed-off-by: Ashish Pandey <aspandey>
    Reviewed-on: http://review.gluster.org/15012
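
Sketched against the same simplified model as the example in comment 1 (illustrative only, not the actual patch; see http://review.gluster.org/14761 for the real change), the fix amounts to excluding the count-type keys from the match and taking the maximum when combining. The key strings below are assumptions following the commit's description of the inode, entry and link count keys; the real code uses macros.

/* Sketch of the fix: count-type keys no longer participate in answer
 * matching, and combining keeps the maximum value seen across bricks. */
#include <stdio.h>
#include <string.h>

/* Keys excluded from matching, per the commit message above. */
static const char *count_keys[] = {
    "glusterfs.inodelk-count",
    "glusterfs.entrylk-count",
    "get-link-count",
    NULL,
};

static int is_count_key(const char *key)
{
    for (int i = 0; count_keys[i] != NULL; i++)
        if (strcmp(key, count_keys[i]) == 0)
            return 1;
    return 0;
}

/* Comparison side: a count key missing on an old brick no longer
 * produces a mismatch, because such keys are skipped entirely. */
static int key_must_match(const char *key)
{
    return !is_count_key(key);
}

/* Combination side: keep the maximum count received in the
 * different answer dicts. */
static long combine_count(long current, long incoming)
{
    return incoming > current ? incoming : current;
}

int main(void)
{
    printf("match on inodelk count? %s\n",
           key_must_match("glusterfs.inodelk-count") ? "yes" : "no");
    printf("combined count: %ld\n", combine_count(0, 1));
    return 0;
}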

Comment 4 Kaushal 2016-08-02 07:25:06 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.7.14, please open a new bug report.

glusterfs-3.7.14 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-August/050319.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

