Bug 1019095 - Inconsistent errno returned by glusterfs client when bricks are not online
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Kaushal
Blocks: 1060259
Reported: 2013-10-15 03:20 EDT by Kaushal
Modified: 2014-04-17 09:14 EDT (History)
Fixed In Version: glusterfs-3.4.3
Doc Type: Bug Fix
Last Closed: 2014-04-17 09:14:54 EDT
Type: Bug
Description Kaushal 2013-10-15 03:20:36 EDT
Description of problem:

Two glusterfs clients returned inconsistent errnos when the bricks of the volume were down. Consider two gluster mounts: mount 1 was made while the bricks were online, and mount 2 was made after the bricks were killed (using the 'glusterfs' command directly instead of the mount script).

For any request, mount 1 returns ENOTCONN, whereas mount 2 returns ENOENT.

This happens because on mount 2, fuse first sends a lookup on '/' for any request, as the root has not been looked up yet. The client xlator returns ENOTCONN, but dht_lookup_dir_cbk unconditionally changed this to ENOENT when aggregating. So fuse returned ENOENT, even though the errno should have been ENOTCONN.
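
To illustrate the difference, here is a minimal, self-contained sketch of the aggregation behaviour described above. It is not the GlusterFS source: the struct and function names (reply, lookup_state, aggregate, final_errno_buggy, final_errno_fixed, check_errno_model) are hypothetical stand-ins, and the aggregation rule is a simplified assumption. It only models the idea that the errno recorded while aggregating (local->op_errno in the real code) should be propagated instead of being overwritten with ENOENT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the reply from one subvolume. */
struct reply {
    int op_ret;   /* 0 on success, -1 on failure */
    int op_errno; /* errno reported by the subvolume on failure */
};

/* Hypothetical stand-in for the per-call state that aggregates replies. */
struct lookup_state {
    int op_ret;
    int op_errno;
};

/* Aggregate replies: any success wins; otherwise keep the last failure's errno. */
void aggregate(struct lookup_state *st, const struct reply *replies, size_t n)
{
    st->op_ret = -1;
    st->op_errno = 0;
    for (size_t i = 0; i < n; i++) {
        if (replies[i].op_ret == 0)
            st->op_ret = 0;
        else
            st->op_errno = replies[i].op_errno;
    }
}

/* Models the reported behaviour: a failure is always reported as ENOENT. */
int final_errno_buggy(const struct lookup_state *st)
{
    return st->op_ret == 0 ? 0 : ENOENT;
}

/* Models the intended behaviour: a failure reports the aggregated errno. */
int final_errno_fixed(const struct lookup_state *st)
{
    return st->op_ret == 0 ? 0 : st->op_errno;
}

/* Self-check: with every subvolume disconnected, only the fixed variant
 * preserves ENOTCONN. */
void check_errno_model(void)
{
    const struct reply down[2] = { { -1, ENOTCONN }, { -1, ENOTCONN } };
    struct lookup_state st;

    aggregate(&st, down, 2);
    assert(final_errno_buggy(&st) == ENOENT);
    assert(final_errno_fixed(&st) == ENOTCONN);
}
```

Calling check_errno_model() from a test harness exercises the case from this report, where all bricks are down and every reply carries ENOTCONN.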

How reproducible:
Always

Steps to Reproduce:
1. Create and start a volume.
2. Perform mount 1.
3. Kill the bricks.
4. Perform mount 2 (the mount must be done using the 'glusterfs' command directly, not the mount script).
5. Issue the same request on both mounts.

Actual results:
Errno returned on mount 2 is ENOENT instead of ENOTCONN

Expected results:
Errno returned is ENOTCONN in both places.
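
One way to compare the errnos from step 5 is a small probe like the sketch below. It assumes stat() as the request issued on each mount, which is an arbitrary choice since the report says any request shows the problem. The function names (probe_errno, same_errno, report_errno, assert_consistent) are hypothetical helpers, not part of GlusterFS, and the mount paths passed to them are up to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Return 0 if stat(path) succeeds, otherwise the errno it set. */
int probe_errno(const char *path)
{
    struct stat sb;

    errno = 0;
    if (stat(path, &sb) == 0)
        return 0;
    return errno;
}

/* Return 1 if both paths yield the same result (same errno, or both succeed). */
int same_errno(const char *a, const char *b)
{
    return probe_errno(a) == probe_errno(b);
}

/* Print the errno observed for a path, for manual comparison. */
void report_errno(const char *path)
{
    int err = probe_errno(path);

    printf("%s: errno=%d (%s)\n", path, err, err ? strerror(err) : "ok");
}

/* Abort if the two mounts disagree, which is the symptom of this bug. */
void assert_consistent(const char *mount1, const char *mount2)
{
    assert(same_errno(mount1, mount2));
}
```

For example, calling report_errno() on each mount point after the bricks are killed would be expected to print ENOTCONN for both once the fix is in place, and ENOENT for mount 2 without it.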

Additional info:
Found this while investigating a quota listing problem.
Comment 1 Anand Avati 2013-10-15 03:26:58 EDT
REVIEW: http://review.gluster.org/6072 (dht: dht_lookup_dir_cbk should set op_errno as local->op_errno) posted (#2) for review on master by Kaushal M (kaushal@redhat.com)
Comment 2 Anand Avati 2013-10-15 03:38:14 EDT
COMMIT: http://review.gluster.org/6072 committed in master by Anand Avati (avati@redhat.com) 
------
commit 1cf925670768383044588fa162d65be8545224ce
Author: Kaushal M <kaushal@redhat.com>
Date:   Fri Oct 11 12:46:06 2013 +0530

    dht: dht_lookup_dir_cbk should set op_errno as local->op_errno
    
    Two glusterfs clients return inconsistent errnos when the bricks of the volume
    were down. Consider two gluster mounts. Mount 1 was done when the bricks were
    online. Mount 2 was done after the bricks were killed, (using the 'glusterfs'
    command instead of the mount script).
    
    For any request, mount 1 will return ENOTCONN, where as mount 2 will return
    ENOENT.
    
    This happens because for the 2nd mount, a fuse would send a lookup on '/' for
    any request, as it hadn't been done yet. The client xlator returns ENOTCONN,
    but the dht_lookup_dir_cbk changed this to ENOENT unconditionally when
    aggregating. So, fuse returned ENOENT, even though the errno should have been
    ENOTCONN.
    
    Change-Id: I4b7a6d84ce5153045a807fccc01485afe0377117
    BUG: 1019095
    Signed-off-by: Kaushal M <kaushal@redhat.com>
    Reviewed-on: http://review.gluster.org/6072
    Reviewed-by: Anand Avati <avati@redhat.com>
    Tested-by: Anand Avati <avati@redhat.com>
Comment 3 Anand Avati 2013-12-10 04:35:22 EST
REVIEW: http://review.gluster.org/6471 (dht: dht_lookup_dir_cbk should set op_errno as local->op_errno) posted (#1) for review on release-3.4 by Shishir Gowda (gowda.shishir@gmail.com)
Comment 4 Anand Avati 2014-03-11 12:32:11 EDT
COMMIT: http://review.gluster.org/6471 committed in release-3.4 by Anand Avati (avati@redhat.com) 
------
commit 010a9a7867c7135dfedf52e5d2b34122a9cb1984
Author: shishir gowda <gowda.shishir@gmail.com>
Date:   Tue Dec 10 15:02:49 2013 +0530

    dht: dht_lookup_dir_cbk should set op_errno as local->op_errno
    
    Two glusterfs clients return inconsistent errnos when the bricks of the volume
    were down. Consider two gluster mounts. Mount 1 was done when the bricks were
    online. Mount 2 was done after the bricks were killed, (using the 'glusterfs'
    command instead of the mount script).
    
    For any request, mount 1 will return ENOTCONN, where as mount 2 will return
    ENOENT.
    
    This happens because for the 2nd mount, a fuse would send a lookup on '/' for
    any request, as it hadn't been done yet. The client xlator returns ENOTCONN,
    but the dht_lookup_dir_cbk changed this to ENOENT unconditionally when
    aggregating. So, fuse returned ENOENT, even though the errno should have been
    ENOTCONN.
    
    backporting  http://review.gluster.org/6072
    
    BUG: 1019095
    Change-Id: Iaa40dffefddfcaf1ab7736f5423d7f9d2ece1363
    Original-author: Kaushal M <kaushal@redhat.com>
    Signed-off-by: shishir gowda <gowda.shishir@gmail.com>
    Reviewed-on: http://review.gluster.org/6471
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Harshavardhana <harsha@harshavardhana.net>
    Reviewed-by: Anand Avati <avati@redhat.com>
Comment 5 Niels de Vos 2014-04-17 09:14:54 EDT
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.4.3, please reopen this bug report.

glusterfs-3.4.3 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should already be available or will become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

The fix for this bug is likely to be included in all future GlusterFS releases, i.e. releases later than 3.4.3. Likewise, the recent glusterfs-3.5.0 release [3] is likely to have the fix. You can verify this by reading the comments in this bug report and checking for comments mentioning "committed in release-3.5".

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/5978
[2] http://news.gmane.org/gmane.comp.file-systems.gluster.user
[3] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
