Bug 1334164 - Worker dies with [Errno 5] Input/output error upon creation of entries at slave
Summary: Worker dies with [Errno 5] Input/output error upon creation of entries at slave
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On: 1325760
Blocks: 1336284 1336285 1348060
 
Reported: 2016-05-09 04:55 UTC by Raghavendra G
Modified: 2017-03-27 18:22 UTC
CC: 11 users

Fixed In Version: glusterfs-3.9.0
Clone Of: 1325760
Cloned to: 1336284
Environment:
Last Closed: 2017-03-27 18:22:53 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Comment 1 Raghavendra G 2016-05-09 05:01:44 UTC
Pasting below the discussion we had over email:

> Aravinda, Kotresh, and I had a discussion on this.
>
> RCA: non-linked inodes are being used in the dht-selfheal codepath. The issue
> got uncovered in the geo-rep setup because of http://review.gluster.org/12539,
> as we now heal in the discover codepath too.
>
> Proposed solution: link the inode in dht-selfheal and use the linked inode
> for all selfheal functionality. One question I ran into with this solution
> is that lookup is not yet complete for the xlators between dht and the
> xlator managing the itable (fuse-bridge here). So a non-lookup fop might
> use an inode that is still invalid for these xlators. However, there are
> two cases to consider when dht tries to link an inode object - inode1 -
> before it attempts healing.
>
> 1. A valid inode is already linked with the gfid in the itable by the time
> dht does inode_link. In this case dht gets that inode (instead of the one
> it got during lookup); since somebody else has already completed the
> lookup, the problem described above won't arise.
> 2. There is no valid inode linked with the gfid in the itable. In this case
> a racing lookup (with a different inode object - say inode2) might
> complete, and after _that_ there can be non-lookup fops on inode1, on which
> lookup is still in progress. So inode1 is still invalid for the xlators
> between dht and fuse-bridge. We can solve this problem easily if dht always
> unwinds the lookup call with the "linked_inode" instead of the inode object
> passed down as part of the request. The racing lookup here, which got
> inode2 as an argument in the request path, will then unwind with inode1.
> This means dht should do an inode_find for the associated gfid if healing
> is not required, and unwind the lookup with the result of inode_find if it
> finds a valid inode object. Note that this algorithm works only if all
> translators use the inode passed in lookup_cbk, instead of the one stored
> in their locals. This can be enforced at least for the xlators between dht
> and fuse-bridge.

This is the "ugly" aspect of this design - introducing restrictions on how other xlators should behave. So, I am suggesting a variant of the "clean" solution proposed below.

>
> To summarize the algorithm (a sketch follows this quoted mail):
> 1. If dht decides a directory needs self-heal,
>     a. it links the inode, and
>     b. it uses the return value of inode_link for healing.
> 2. When no healing is required, dht does an inode_find (gfid, itable) and
> unwinds with the result (instead of the inode passed to it as an argument).
>
> I have yet to think through whether this algorithm has any caveats with NFS.
> This bug was hit on a fuse mount.
>
> There is one "clean" solution: separate lookup into two fops - one does a
> plain lookup and the other does healing if required. But that's a huge
> change and is going to take time.
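
A minimal sketch of the summarized algorithm, assuming the standard libglusterfs inode API (inode_link, inode_find, inode_ref); the helper name dht_pick_inode_for_unwind and the needs_heal flag are illustrative, not the actual dht code:

/* Illustrative sketch only - not the actual dht implementation. */
#include "xlator.h"
#include "inode.h"
#include "iatt.h"

/* Decide which inode dht should heal on and unwind lookup with. 'loc->inode'
 * is the (possibly still unlinked) inode that came down with the lookup. */
static inode_t *
dht_pick_inode_for_unwind (loc_t *loc, struct iatt *stbuf,
                           gf_boolean_t needs_heal)
{
        inode_t *linked = NULL;

        if (needs_heal) {
                /* 1a/1b: link the inode and heal on whatever inode_link
                 * returns - it may be a different, already-linked inode
                 * for the same gfid. */
                linked = inode_link (loc->inode, loc->parent, loc->name,
                                     stbuf);
                if (linked)
                        return linked;
        } else {
                /* 2: no heal needed - prefer an inode already linked against
                 * this gfid, so xlators above never see an unresolved inode. */
                linked = inode_find (loc->inode->table, stbuf->ia_gfid);
                if (linked)
                        return linked;
        }

        /* Fall back to the inode passed in the request. */
        return inode_ref (loc->inode);
}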

A variant of this "clean" solution can be implemented without requiring too much code change.

1. Introduce methods in the itable that can be used by xlators to guide inode/dentry management. In this case, let's have a method - say itable_set_need_lookup. These methods _should_ be implemented by the xlators doing resolution/itable management (like fuse-bridge, gfapi, protocol/server, nfs, etc.). Note that the xlator doing itable management and the xlator associated with the itable (table->xl) need not be the same. So this helps other xlators, like dht, have a say in how inodes are managed.

2. dht, when it decides that a heal has to be done:
    a. links the inode into the itable.
    b. instructs the resolver code that a "resolution" (essentially a lookup in this case, as we have not split lookup yet) has to be done before resuming any non-lookup fops on this inode.

3. The xlator managing inodes (fuse-bridge in this specific case) sends a lookup on the inode if need-lookup is set. This is very similar to fuse_readdirp_cbk setting need-lookup and the handling around it.

There are some implementation details - like dht treating revalidates differently from fresh lookups - but these can be worked out while coding.
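
A rough sketch of this variant; itable_set_need_lookup/itable_needs_lookup are the hypothetical itable methods named in point 1 above (they do not exist in libglusterfs), and the helper names are illustrative:

/* Illustrative sketch of the proposed variant - not an implementation. */
#include "xlator.h"
#include "inode.h"
#include "iatt.h"

/* Hypothetical itable hooks from point 1 above. */
void itable_set_need_lookup (inode_t *inode, xlator_t *this);
gf_boolean_t itable_needs_lookup (inode_t *inode, xlator_t *this);

/* dht side (points 2a and 2b): link the inode used for healing and ask the
 * resolver to redo a lookup before resuming non-lookup fops on it. */
static inode_t *
dht_link_and_mark_for_heal (xlator_t *this, loc_t *loc, struct iatt *stbuf)
{
        inode_t *linked = NULL;

        linked = inode_link (loc->inode, loc->parent, loc->name, stbuf);
        if (linked)
                itable_set_need_lookup (linked, this);

        return linked;  /* heal fops (inodelk, setattr, ...) use this inode */
}

/* Interface-layer side (point 3, fuse-bridge here): before resuming a
 * non-lookup fop, check the flag and send a fresh lookup first, the same
 * way the need-lookup handling around fuse_readdirp_cbk works. */
static gf_boolean_t
resolver_must_lookup_first (inode_t *inode, xlator_t *this)
{
        return itable_needs_lookup (inode, this);
}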

Comment 2 Vijay Bellur 2016-05-11 12:24:00 UTC
REVIEW: http://review.gluster.org/14295 (cluster/distribute: use a linked inode in directory heal codepath) posted (#1) for review on master by Raghavendra G (rgowdapp)

Comment 3 Vijay Bellur 2016-05-11 13:04:35 UTC
REVIEW: http://review.gluster.org/14295 (cluster/distribute: use a linked inode in directory heal codepath) posted (#2) for review on master by Raghavendra G (rgowdapp)

Comment 4 Vijay Bellur 2016-05-16 05:54:40 UTC
REVIEW: http://review.gluster.org/14319 (cluster/distribute: heal layout in discover codepath too) posted (#3) for review on master by Raghavendra G (rgowdapp)

Comment 5 Vijay Bellur 2016-05-16 16:10:23 UTC
COMMIT: http://review.gluster.org/14295 committed in master by Raghavendra G (rgowdapp) 
------
commit 06f92634d9ad8aa5c56d786e5248016c283e5c5b
Author: Raghavendra G <rgowdapp>
Date:   Wed May 11 17:49:10 2016 +0530

    cluster/distribute: use a linked inode in directory heal codepath
    
    This is needed for following reasons:
    * healing is done in lookup and mkdir codepath where inode is not
      linked _yet_ as normally linking is done in interface layers
      (fuse-bridge, gfapi, nfsv3 etc).
    
    * healing consists of non-lookup fops like inodelk, setattr, setxattr
      etc. All non-lookup fops expect a linked inode.
    
    Change-Id: I1bd8157abbae58431b7f6f6fffee0abfe5225342
    BUG: 1334164
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on: http://review.gluster.org/14295
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Smoke: Gluster Build System <jenkins.com>
    Reviewed-by: Susant Palai <spalai>
    Reviewed-by: mohammed rafi  kc <rkavunga>
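
For context, a paraphrase of the idea behind this change (see review 14295 for the real diff), assuming only the standard inode_link/inode_unref API; the helper name is illustrative:

/* Paraphrase of the approach, not the committed code: lookup/mkdir
 * callbacks run before the interface layer links the inode, so the heal
 * path links it itself and switches to the linked object before winding
 * inodelk/setattr/setxattr. */
#include "xlator.h"
#include "inode.h"
#include "iatt.h"

static void
dht_selfheal_use_linked_inode (loc_t *loc, struct iatt *stbuf)
{
        inode_t *linked_inode = NULL;

        linked_inode = inode_link (loc->inode, loc->parent, loc->name, stbuf);
        if (linked_inode) {
                inode_unref (loc->inode);
                loc->inode = linked_inode;
        }

        /* ... proceed with the directory heal on loc->inode ... */
}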

Comment 6 Vijay Bellur 2016-05-16 18:34:24 UTC
REVIEW: http://review.gluster.org/14364 (cluster/distribute: heal layout in discover codepath too) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 7 Vijay Bellur 2016-05-16 18:34:57 UTC
REVIEW: http://review.gluster.org/14365 (cluster/distribute: heal layout in discover codepath too) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 8 Vijay Bellur 2016-05-16 18:35:51 UTC
REVIEW: http://review.gluster.org/14366 (cluster/distribute: heal layout in discover codepath too) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 9 Vijay Bellur 2016-05-17 03:13:11 UTC
COMMIT: http://review.gluster.org/14365 committed in master by Raghavendra G (rgowdapp) 
------
commit a74f8cf4e7edc2ce9f045317a18dacddf25adb8a
Author: Raghavendra G <rgowdapp>
Date:   Fri May 13 11:40:57 2016 +0530

    cluster/distribute: heal layout in discover codepath too
    
    BUG: 1334164
    Change-Id: I4259d88f2b6e4f9d4ad689bc4e438f1db9cfd177
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on: http://review.gluster.org/14365
    Tested-by: Pranith Kumar Karampuri <pkarampu>
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>

Comment 10 Shyamsundar 2017-03-27 18:22:53 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.9.0, please open a new bug report.

glusterfs-3.9.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2016-November/029281.html
[2] https://www.gluster.org/pipermail/gluster-users/

