Bug 1360152

Summary: IO error seen with rolling or non-disruptive upgrade of a distribute-disperse (EC) volume from 3.7.5 to 3.7.9
Product: [Community] GlusterFS
Reporter: Ashish Pandey <aspandey>
Component: disperse
Assignee: Ashish Pandey <aspandey>
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: unspecified
Version: 3.7.13
CC: aspandey, bugs, byarlaga, nchilaka, pkarampu
Keywords: ZStream
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.7.14
Clone Of: 1347686
Last Closed: 2016-08-02 07:25:06 UTC
Type: Bug
Bug Depends On: 1347251, 1347686, 1360174

Comment 1 Ashish Pandey 2016-07-26 06:50:57 UTC
In glusterfs 3.7.5, features/locks did not return the lock count in xdata that ec requested.

To solve a hang issue, we modified the code so that whenever xdata carries a request for the inodelk count, features/locks returns that count in the xdata of its reply.

As a result, in glusterfs 3.7.9 ec receives the inodelk count in xdata from features/locks.

This issue arises when we do a rolling update from 3.7.5 to 3.7.9.
On a 4+2 volume running 3.7.5, the problem can be seen if we update 2 nodes and, after heal completion, kill 2 of the older nodes.
After the update and the killing of bricks, the 2 updated nodes return the inodelk count while the 2 remaining older nodes do not include it.

During the dictionary match in ec_dict_compare, this leads to a mismatch of answers. With only 4 bricks up, a 4+2 volume needs all 4 answers to agree, so the file operation on the mount point fails with an IO error.
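
To illustrate the mismatch described above, here is a minimal Python sketch. It is a model only, not the ec code: plain dicts stand in for the xdata dictionaries, the key name "inodelk-count" is a hypothetical placeholder, and the rule that 4 answers must agree is an assumption based on the 4+2 layout.

```python
from collections import Counter

# Assumption: a 4+2 disperse volume needs 4 consistent answers (the data
# fragment count) for an operation to succeed.
REQUIRED_MATCHING = 4

# Hypothetical key name, used for illustration only.
INODELK_COUNT = "inodelk-count"


def largest_matching_group(answers):
    """Group answers by exact dict equality and return the largest group size."""
    groups = Counter(frozenset(answer.items()) for answer in answers)
    return max(groups.values())


# Two upgraded bricks include the count; two older bricks omit the key.
upgraded = {INODELK_COUNT: 1}
older = {}
answers = [upgraded, upgraded, older, older]

matching = largest_matching_group(answers)
io_error = matching < REQUIRED_MATCHING
print(matching, io_error)
```

With a strict comparison the four answers split into two groups of two, so no group reaches the required size and the operation is treated as failed.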

Comment 2 Vijay Bellur 2016-07-26 07:21:14 UTC
REVIEW: http://review.gluster.org/15012 (cluster/ec: Handle absence of keys in some callback dict) posted (#1) for review on release-3.7 by Ashish Pandey (aspandey)

Comment 3 Vijay Bellur 2016-07-27 07:00:07 UTC
COMMIT: http://review.gluster.org/15012 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu) 
------
commit 1e3a8f47cd88c39c41519d143b001d45387eb4b8
Author: Ashish Pandey <aspandey>
Date:   Fri Jun 17 17:52:56 2016 +0530

    cluster/ec: Handle absence of keys in some callback dict
    
    Problem: This issue arises when we do a rolling update
    from 3.7.5 to 3.7.9.
    For 4+2 volume running 3.7.5, if we update 2 nodes
    and after heal completion  kill 2 older nodes, this
    problem can be seen. After update and killing of
    bricks, 2 nodes will return inodelk count key in dict
    while other 2 nodes will not have inodelk count in dict.
    This is also true for get-link-count.
    During dictionary match , ec_dict_compare, this will
    lead to mismatch of answers and the file operation
    on mount point will fail with IO error.
    
    Solution:
    Don't match inode, entry and link count keys while
    comparing two dictionaries. However, while combining the
    data in ec_dict_combine, go through all the dictionaries
    and select the maximum values received in different dicts
    for these keys.
    
    master-
    http://review.gluster.org/#/c/14761/
    
    Change-Id: I33546e3619fe8f909286ee48fb0df2009cd3d22f
    BUG: 1360152
    Signed-off-by: Ashish Pandey <aspandey>
    Reviewed-on: http://review.gluster.org/14761
    Reviewed-by: Xavier Hernandez <xhernandez>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    CentOS-regression: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Signed-off-by: Ashish Pandey <aspandey>
    Reviewed-on: http://review.gluster.org/15012
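
The approach in the commit message (ignore the count keys when comparing, take the maximum when combining) can be sketched roughly as follows. This is an illustrative Python model under stated assumptions, not the patch itself: the key names in COUNT_KEYS and the helper names are hypothetical, and plain dicts stand in for the callback dictionaries.

```python
# Hypothetical key names standing in for the inode, entry and link count keys.
COUNT_KEYS = ("inodelk-count", "entrylk-count", "link-count")


def without_counts(xdata):
    """Return a copy of xdata with the count keys removed."""
    return {k: v for k, v in xdata.items() if k not in COUNT_KEYS}


def dicts_match(a, b):
    """Compare two dicts while ignoring the count keys."""
    return without_counts(a) == without_counts(b)


def combine(dicts):
    """Merge matching dicts, keeping the maximum value seen for each count key."""
    combined = without_counts(dicts[0])
    for key in COUNT_KEYS:
        values = [d[key] for d in dicts if key in d]
        if values:  # the key may be absent from every answer
            combined[key] = max(values)
    return combined


upgraded_a = {"other-key": "x", "inodelk-count": 1}
upgraded_b = {"other-key": "x", "inodelk-count": 2}
older = {"other-key": "x"}

answers = [upgraded_a, upgraded_b, older, older]
all_match = all(dicts_match(answers[0], d) for d in answers[1:])
combined = combine(answers)
print(all_match, combined)
```

In this model the answers from upgraded and older bricks compare as equal, and the combined dictionary still carries the largest count reported by any brick.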

Comment 4 Kaushal 2016-08-02 07:25:06 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.14, please open a new bug report.

glusterfs-3.7.14 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-August/050319.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user