Bug 1258069

Summary: gNFSd: NFS mount fails with "Remote I/O error"
Product: [Community] GlusterFS
Reporter: rwareing
Component: nfs
Assignee: Niels de Vos <ndevos>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: unspecified
Version: 3.6.5
CC: bugs, gluster-bugs, ndevos, rabhat
Keywords: Patch, Triaged
Hardware: All
OS: All
Fixed In Version: glusterfs-3.6.6
Doc Type: Bug Fix
Last Closed: 2015-09-30 12:15:13 UTC
Type: Bug
Bug Depends On: 1258196    
Bug Blocks: 1260420    
Attachments:
Description | Flags
Repro & patch for bug 1258069. | none
New patch - refactored + correct error handling | none

Description rwareing 2015-08-28 20:50:26 UTC
Description of problem:
gNFSd returns "Remote I/O error" for mounts of directories that have been changed out-of-band (OOB), i.e. not through the target gNFSd (for example, from a FUSE mount). Internally, this happens because ESTALE (op_errno == 116) is returned to mnt3_resolve_subdir_cbk, which causes the code path to unwind with an error. Per the AFR2 code comments, the correct behavior is for gNFSd to purge the inode from its inode table and perform a fresh lookup.
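As a rough illustration of the purge-and-retry behavior described above, here is a minimal C sketch. It is a toy model under stated assumptions, not the actual patch: the structs, `lookup_by_gfid`, `lookup_by_path` and `resolve_subdir` are hypothetical names, and a GFID is reduced to an `unsigned`. When the GFID-based lookup fails with ESTALE, the cached entry is purged and one fresh path-based lookup is attempted; any other error is returned unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy model; none of these names are GlusterFS APIs. */

/* One entry in the NFS server's inode table: path -> cached GFID. */
struct cached_inode {
    char path[64];
    unsigned gfid;          /* stand-in for a real 16-byte GFID */
    bool valid;
};

/* What the bricks currently hold for the directory. */
struct brick_state {
    char path[64];
    unsigned gfid;
};

/* LOOKUP by GFID: ESTALE if the bricks no longer know this GFID. */
static int lookup_by_gfid(const struct brick_state *b, unsigned gfid,
                          int *op_errno)
{
    *op_errno = (b->gfid == gfid) ? 0 : ESTALE;
    return *op_errno ? -1 : 0;
}

/* LOOKUP by full path: yields the GFID the bricks currently have. */
static int lookup_by_path(const struct brick_state *b, const char *path,
                          unsigned *gfid, int *op_errno)
{
    if (strcmp(b->path, path) != 0) {
        *op_errno = ENOENT;
        return -1;
    }
    *gfid = b->gfid;
    *op_errno = 0;
    return 0;
}

/* Resolve a mount subdirectory. On ESTALE, purge the cached inode and
 * retry once with a fresh path lookup instead of failing the mount.
 * Returns 0 on success or a negative errno. */
static int resolve_subdir(struct cached_inode *c, const struct brick_state *b,
                          const char *path)
{
    int op_errno = 0;
    unsigned gfid = 0;

    /* path must not alias c->path: it is copied into c->path below. */
    assert(c != NULL && b != NULL && path != NULL && path != c->path);

    if (c->valid && strcmp(c->path, path) == 0) {
        /* Inode-table hit: the LOOKUP is sent by GFID. */
        if (lookup_by_gfid(b, c->gfid, &op_errno) == 0)
            return 0;
        if (op_errno != ESTALE)
            return -op_errno;
        /* Stale GFID: purge it from the table and fall through. */
        c->valid = false;
    }

    /* Fresh lookup by full path. */
    if (lookup_by_path(b, path, &gfid, &op_errno) != 0)
        return -op_errno;

    /* Re-populate the table with the GFID the bricks returned. */
    snprintf(c->path, sizeof c->path, "%s", path);
    c->gfid = gfid;
    c->valid = true;
    return 0;
}
```

Retrying exactly once is a deliberate choice in this sketch: if the path-based lookup also fails (for example with ENOENT because the directory really is gone), that error is reported instead of looping.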

One might then ask why mnt3_resolve_subdir_cbk gets ESTALE at all. The LOOKUP request is sent to the bricks by GFID rather than as a full-path lookup; this optimization is taken because the path is found in the gNFSd inode table, which supplies a cached GFID. The optimization itself isn't incorrect, but it is the root cause of the ESTALE: after the out-of-band change, the cached GFID may no longer be valid on the bricks.
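The sequence that produces the ESTALE can be modeled in a few lines of C. This is a hypothetical toy model, not GlusterFS code: `struct brick_ns` and the `ns_*` helpers are invented for illustration, a GFID is reduced to an `unsigned`, and the out-of-band change is assumed to be a delete and recreate of the directory, which gives it a new GFID.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical toy model of the bricks' namespace (path -> GFID).
 * Not GlusterFS code. */
#define MAX_ENTRIES 8

struct brick_ns {
    const char *path[MAX_ENTRIES];
    unsigned gfid[MAX_ENTRIES];
    int n;
    unsigned next_gfid;     /* every mkdir gets a brand-new GFID */
};

/* Create a directory; returns its new GFID. */
static unsigned ns_mkdir(struct brick_ns *ns, const char *path)
{
    assert(ns->n < MAX_ENTRIES);
    ns->path[ns->n] = path;
    ns->gfid[ns->n] = ns->next_gfid++;
    return ns->gfid[ns->n++];
}

/* Remove a directory (the last entry is moved into its slot). */
static void ns_rmdir(struct brick_ns *ns, const char *path)
{
    for (int i = 0; i < ns->n; i++) {
        if (strcmp(ns->path[i], path) == 0) {
            ns->n--;
            ns->path[i] = ns->path[ns->n];
            ns->gfid[i] = ns->gfid[ns->n];
            return;
        }
    }
}

/* LOOKUP by GFID, as sent when the path hits the inode table. */
static int ns_lookup_gfid(const struct brick_ns *ns, unsigned gfid)
{
    for (int i = 0; i < ns->n; i++)
        if (ns->gfid[i] == gfid)
            return 0;
    return -ESTALE;
}

/* LOOKUP by full path, as sent on an inode-table miss. */
static int ns_lookup_path(const struct brick_ns *ns, const char *path,
                          unsigned *gfid)
{
    for (int i = 0; i < ns->n; i++) {
        if (strcmp(ns->path[i], path) == 0) {
            *gfid = ns->gfid[i];
            return 0;
        }
    }
    return -ENOENT;
}
```

For example, with `next_gfid` starting at 1: `ns_mkdir(&ns, "/dir")` returns GFID 1 (which gNFSd would cache), an out-of-band `ns_rmdir` plus `ns_mkdir` gives `/dir` GFID 2, and `ns_lookup_gfid(&ns, 1)` then returns `-ESTALE` while `ns_lookup_path` still finds `/dir` under GFID 2.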

Version-Release number of selected component (if applicable):
v3.6.x (verified); probably also affects 3.7.x (unverified).

How reproducible:
100%; see the attached prove test.


Steps to Reproduce:
See the attached prove test.

Actual results:
The mount fails with "Remote I/O error".

Expected results:
The mount should succeed.


Additional info:
See the attached prove test and the patch that resolves the bug.

Comment 2 rwareing 2015-08-28 20:57:54 UTC
Created attachment 1068138
Repro & patch for bug 1258069.

The patch is based on FB's GlusterFS v3.6.3 tree, so it might not apply cleanly, but patching mnt3_resolve_subdir_cbk in mount3.c as this patch does should do the trick.

Comment 3 rwareing 2015-08-28 22:37:58 UTC
Created attachment 1068177
New patch - refactored + correct error handling

Comment 4 Niels de Vos 2015-08-30 07:55:37 UTC
Thanks for the patch! I've posted it for review: http://review.gluster.org/12045

Comment 5 Anand Avati 2015-08-30 18:59:34 UTC
REVIEW: http://review.gluster.org/12045 (nfs: Fixes "Remote I/O error" mount failures) posted (#2) for review on release-3.6 by Niels de Vos (ndevos)

Comment 6 Raghavendra Bhat 2015-09-30 12:15:13 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.6, please open a new bug report.

glusterfs-3.6.6 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/gluster-devel/2015-September/046821.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user