Bug 1019095

Summary: Inconsistent errno returned by glusterfs client when bricks are not online
Product: [Community] GlusterFS Reporter: Kaushal <kaushal>
Component: distribute    Assignee: Kaushal <kaushal>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: mainline    CC: gluster-bugs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.3 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-04-17 13:14:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1060259    

Description Kaushal 2013-10-15 07:20:36 UTC
Description of problem:

Two glusterfs clients returned inconsistent errnos when the bricks of the volume were down. Consider two gluster mounts: mount 1 was done while the bricks were online, and mount 2 was done after the bricks were killed (using the 'glusterfs' command directly instead of the mount script).

For any request, mount 1 returns ENOTCONN, whereas mount 2 returns ENOENT.

This happens because, on the second mount, FUSE first sends a lookup on '/' for any request, as that lookup has not been done yet. The client xlator returns ENOTCONN, but dht_lookup_dir_cbk unconditionally changed this to ENOENT while aggregating the replies from its subvolumes. As a result, FUSE returned ENOENT even though the errno should have been ENOTCONN.
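
The following is a minimal, self-contained sketch (not the actual dht-common.c code) of the errno aggregation the fix describes: the lookup callback records the subvolume's errno in local->op_errno and the final unwind uses that value instead of unconditionally reporting ENOENT. The struct and function names are illustrative only.

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    struct local_sketch {
        int op_ret;
        int op_errno;   /* errno recorded from the failing subvolume */
    };

    /* called once per subvolume reply, analogous to dht_lookup_dir_cbk */
    static void
    dir_cbk_sketch(struct local_sketch *local, int op_ret, int op_errno)
    {
        if (op_ret == -1) {
            local->op_ret = -1;
            /* the fix: keep the subvolume's errno instead of forcing ENOENT */
            local->op_errno = op_errno;
        }
    }

    int
    main(void)
    {
        struct local_sketch local = {0, 0};

        /* all bricks are down, so the client xlator replies ENOTCONN */
        dir_cbk_sketch(&local, -1, ENOTCONN);

        /* the final unwind now reports ENOTCONN instead of ENOENT */
        printf("unwound errno: %s\n", strerror(local.op_errno));
        return 0;
    }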

How reproducible:
Always

Steps to Reproduce:
1. Create and start a volume.
2. Perform mount 1.
3. Kill the bricks.
4. Perform mount 2 (this mount must be done using the 'glusterfs' command directly).
5. Do the same request on both mounts.

Actual results:
Errno returned on mount 2 is ENOENT instead of ENOTCONN

Expected results:
The errno returned is ENOTCONN on both mounts.
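
For step 5, one way to issue the same request on both mounts and compare the returned errnos is a small stat() check like the sketch below; the mount-point paths are placeholders and should be replaced with the actual mount points of mount 1 and mount 2.

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>

    /* stat the same path on a mount and print the errno it returns */
    static void
    check(const char *path)
    {
        struct stat st;

        if (stat(path, &st) == -1)
            printf("%s: errno=%d (%s)\n", path, errno, strerror(errno));
        else
            printf("%s: request succeeded\n", path);
    }

    int
    main(void)
    {
        check("/mnt/gluster-mount1");  /* expected: ENOTCONN */
        check("/mnt/gluster-mount2");  /* expected: ENOTCONN, but ENOENT
                                          before the fix */
        return 0;
    }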

Additional info:
Found this while investigating a quota listing problem.

Comment 1 Anand Avati 2013-10-15 07:26:58 UTC
REVIEW: http://review.gluster.org/6072 (dht: dht_lookup_dir_cbk should set op_errno as local->op_errno) posted (#2) for review on master by Kaushal M (kaushal)

Comment 2 Anand Avati 2013-10-15 07:38:14 UTC
COMMIT: http://review.gluster.org/6072 committed in master by Anand Avati (avati) 
------
commit 1cf925670768383044588fa162d65be8545224ce
Author: Kaushal M <kaushal>
Date:   Fri Oct 11 12:46:06 2013 +0530

    dht: dht_lookup_dir_cbk should set op_errno as local->op_errno
    
    Two glusterfs clients return inconsistent errnos when the bricks of the volume
    were down. Consider two gluster mounts. Mount 1 was done when the bricks were
    online. Mount 2 was done after the bricks were killed, (using the 'glusterfs'
    command instead of the mount script).
    
    For any request, mount 1 will return ENOTCONN, where as mount 2 will return
    ENOENT.
    
    This happens because for the 2nd mount, a fuse would send a lookup on '/' for
    any request, as it hadn't been done yet. The client xlator returns ENOTCONN,
    but the dht_lookup_dir_cbk changed this to ENOENT unconditionally when
    aggregating. So, fuse returned ENOENT, even though the errno should have been
    ENOTCONN.
    
    Change-Id: I4b7a6d84ce5153045a807fccc01485afe0377117
    BUG: 1019095
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/6072
    Reviewed-by: Anand Avati <avati>
    Tested-by: Anand Avati <avati>

Comment 3 Anand Avati 2013-12-10 09:35:22 UTC
REVIEW: http://review.gluster.org/6471 (dht: dht_lookup_dir_cbk should set op_errno as local->op_errno) posted (#1) for review on release-3.4 by Shishir Gowda (gowda.shishir)

Comment 4 Anand Avati 2014-03-11 16:32:11 UTC
COMMIT: http://review.gluster.org/6471 committed in release-3.4 by Anand Avati (avati) 
------
commit 010a9a7867c7135dfedf52e5d2b34122a9cb1984
Author: shishir gowda <gowda.shishir>
Date:   Tue Dec 10 15:02:49 2013 +0530

    dht: dht_lookup_dir_cbk should set op_errno as local->op_errno
    
    Two glusterfs clients return inconsistent errnos when the bricks of the volume
    were down. Consider two gluster mounts. Mount 1 was done when the bricks were
    online. Mount 2 was done after the bricks were killed, (using the 'glusterfs'
    command instead of the mount script).
    
    For any request, mount 1 will return ENOTCONN, where as mount 2 will return
    ENOENT.
    
    This happens because for the 2nd mount, a fuse would send a lookup on '/' for
    any request, as it hadn't been done yet. The client xlator returns ENOTCONN,
    but the dht_lookup_dir_cbk changed this to ENOENT unconditionally when
    aggregating. So, fuse returned ENOENT, even though the errno should have been
    ENOTCONN.
    
    backporting  http://review.gluster.org/6072
    
    BUG: 1019095
    Change-Id: Iaa40dffefddfcaf1ab7736f5423d7f9d2ece1363
    Original-author: Kaushal M <kaushal>
    Signed-off-by: shishir gowda <gowda.shishir>
    Reviewed-on: http://review.gluster.org/6471
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Harshavardhana <harsha>
    Reviewed-by: Anand Avati <avati>

Comment 5 Niels de Vos 2014-04-17 13:14:54 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.4.3, please reopen this bug report.

glusterfs-3.4.3 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should already be available or will become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

The fix for this bug is likely to be included in all future GlusterFS releases, i.e. releases > 3.4.3. Likewise, the recent glusterfs-3.5.0 release [3] is likely to include the fix. You can verify this by reading the comments in this bug report and checking for comments mentioning "committed in release-3.5".

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/5978
[2] http://news.gmane.org/gmane.comp.file-systems.gluster.user
[3] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137