Bug 1258197

Summary: gNFSd: NFS mount fails with "Remote I/O error"
Product: [Community] GlusterFS Reporter: Niels de Vos <ndevos>
Component: nfs Assignee: Niels de Vos <ndevos>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.7.3 CC: bugs, gluster-bugs
Target Milestone: --- Keywords: Patch, Triaged
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: glusterfs-3.7.6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1258196 Environment:
Last Closed: 2015-11-17 05:57:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1258196    
Bug Blocks: 1275914    

Description Niels de Vos 2015-08-30 07:52:08 UTC
+++ This bug was initially created as a clone of Bug #1258196 +++

Description of problem:
gNFSd returns "Remote I/O error" for mounts of directories that have been changed out-of-band (OOB) with respect to the target gNFSd (say, from a FUSE mount).  Internally, this is because ESTALE (op_errno == 116) is returned to mnt3_resolve_subdir_cbk, which causes the code path to unwind with an error.  Per the AFR2 code comments, the correct behavior is for gNFSd to purge the inode from its inode table and do a fresh lookup on the inode.

The question that follows is: why does mnt3_resolve_subdir_cbk get ESTALE?  The LOOKUP request is sent to the bricks by GFID rather than as a full path lookup. This optimization kicks in because the path is found in the gNFSd inode table, which yields the cached GFID.  If the directory has since been deleted and re-created, the cached GFID no longer refers to a live object on the bricks.  The optimization isn't incorrect behavior, but it is the root cause of the ESTALE.
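
To illustrate the failure mode described above, here is a minimal, self-contained sketch in C.  It is not GlusterFS code: the tables and every function name (oob_mkdir, oob_rmdir, resolve_naive, resolve_with_refresh) are hypothetical stand-ins, with two small arrays playing the role of the bricks and the gNFSd inode table.  The only assumption carried over from the report is that a lookup by a GFID that no longer exists on the bricks returns ESTALE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 8

/* One path -> gfid mapping. Hypothetical; not a GlusterFS structure. */
struct entry {
    char     path[64];
    unsigned gfid;
    int      used;
};

static struct entry brick[MAX_ENTRIES]; /* authoritative state on the bricks */
static struct entry cache[MAX_ENTRIES]; /* stand-in for the gNFSd inode table */
static unsigned next_gfid = 1;

static struct entry *find(struct entry *t, const char *path)
{
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (t[i].used && strcmp(t[i].path, path) == 0)
            return &t[i];
    return NULL;
}

static void put(struct entry *t, const char *path, unsigned gfid)
{
    struct entry *e = find(t, path);
    for (int i = 0; e == NULL && i < MAX_ENTRIES; i++)
        if (!t[i].used)
            e = &t[i];
    assert(e != NULL); /* a full table is out of scope for this sketch */
    strncpy(e->path, path, sizeof(e->path) - 1);
    e->path[sizeof(e->path) - 1] = '\0';
    e->gfid = gfid;
    e->used = 1;
}

static void drop(struct entry *t, const char *path)
{
    struct entry *e = find(t, path);
    if (e != NULL)
        e->used = 0;
}

/* Out-of-band changes, e.g. made through a FUSE mount. */
void oob_mkdir(const char *path) { put(brick, path, next_gfid++); }
void oob_rmdir(const char *path) { drop(brick, path); }

/* Brick-side lookup by gfid: -ESTALE if no live object has that gfid. */
static int brick_lookup_gfid(unsigned gfid)
{
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (brick[i].used && brick[i].gfid == gfid)
            return 0;
    return -ESTALE;
}

/* Brick-side lookup by full path: -ENOENT if the path does not exist. */
static int brick_lookup_path(const char *path, unsigned *gfid)
{
    struct entry *e = find(brick, path);
    if (e == NULL)
        return -ENOENT;
    *gfid = e->gfid;
    return 0;
}

/* Reported behavior: a cache hit becomes a gfid lookup, and ESTALE
 * is passed straight up as a failure. */
int resolve_naive(const char *path)
{
    struct entry *c = find(cache, path);
    unsigned gfid;
    int ret;

    if (c != NULL)
        return brick_lookup_gfid(c->gfid);

    ret = brick_lookup_path(path, &gfid);
    if (ret == 0)
        put(cache, path, gfid);
    return ret;
}

/* Suggested behavior: on ESTALE, purge the cached entry and retry,
 * which forces a fresh full-path lookup. */
int resolve_with_refresh(const char *path)
{
    int ret = resolve_naive(path);
    if (ret == -ESTALE) {
        drop(cache, path);
        ret = resolve_naive(path);
    }
    return ret;
}
```

Calling oob_mkdir("/vol/dir"), resolve_naive("/vol/dir"), oob_rmdir("/vol/dir") and oob_mkdir("/vol/dir") in that order leaves a stale GFID in the cache.  After that, resolve_naive("/vol/dir") returns -ESTALE, while resolve_with_refresh("/vol/dir") returns 0 and leaves the cache pointing at the new GFID.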

Version-Release number of selected component (if applicable):
v3.6.x (verified), probably 3.7.x but unverified.

How reproducible:
100%, see prove test.


Steps to Reproduce:
See prove test.

Actual results:
Mount returns with "Remote I/O error"

Expected results:
The mount should succeed.


Additional info:
See attached prove test and patch which resolves the bug.

--- Additional comment from  on 2015-08-28 22:57:54 CEST ---

The patch is based on FB GlusterFS v3.6.3, so it might not line up exactly, but patching mnt3_resolve_subdir_cbk in mount3.c per this patch should do the trick.

--- Additional comment from  on 2015-08-29 00:37:58 CEST ---

Comment 1 Anand Avati 2015-08-30 07:54:36 UTC
REVIEW: http://review.gluster.org/12047 (nfs: Fixes "Remote I/O error" mount failures) posted (#1) for review on release-3.7 by Niels de Vos (ndevos)

Comment 2 Anand Avati 2015-08-30 18:53:58 UTC
REVIEW: http://review.gluster.org/12047 (nfs: Fixes "Remote I/O error" mount failures) posted (#2) for review on release-3.7 by Niels de Vos (ndevos)

Comment 3 Anand Avati 2015-08-31 06:34:48 UTC
REVIEW: http://review.gluster.org/12047 (nfs: Fixes "Remote I/O error" mount failures) posted (#3) for review on release-3.7 by Vijay Bellur (vbellur)

Comment 4 Anand Avati 2015-08-31 14:45:28 UTC
REVIEW: http://review.gluster.org/12047 (nfs: Fixes "Remote I/O error" mount failures) posted (#4) for review on release-3.7 by Vijay Bellur (vbellur)

Comment 5 Anand Avati 2015-08-31 19:02:57 UTC
REVIEW: http://review.gluster.org/12047 (nfs: Fixes "Remote I/O error" mount failures) posted (#5) for review on release-3.7 by Niels de Vos (ndevos)

Comment 6 Vijay Bellur 2015-10-30 16:21:26 UTC
COMMIT: http://review.gluster.org/12047 committed in release-3.7 by Kaleb KEITHLEY (kkeithle) 
------
commit 7d4f7bbe4b3f38a47fa467cd8ec6b0408aced0b6
Author: Richard Wareing <rwareing>
Date:   Thu Aug 27 21:06:37 2015 -0700

    nfs: Fixes "Remote I/O error" mount failures
    
    - Fixes issue where NFS mounts fail with "Remote I/O error" after the
      target directory has been deleted and re-created after the gNFSd has
      already cached the inode of the first generation of the target
      directory.
    - The solution is to follow the guidance of the AFR2 comments and
      refresh the inode by deleting it from cache and looking it up
      again.
    
    BUG: 1258197
    Change-Id: I9c7d8bd460ee9e5ea0b5b47d23886b1afcdcd563
    Reported-by: Richard Wareing <rwareing>
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: http://review.gluster.org/12047
    Tested-by: Gluster Build System <jenkins.com>

Comment 7 Raghavendra Talur 2015-11-17 05:57:59 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.6, please open a new bug report.

glusterfs-3.7.6 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/gluster-users/2015-November/024359.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user