1019095 – Inconsistent errno returned by glusterfs client when bricks are not online

Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	distribute
Version:	mainline
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Kaushal
Blocks:	1060259

Reported:	2013-10-15 07:20 UTC by Kaushal
Modified:	2014-04-17 13:14 UTC
Fixed In Version:	glusterfs-3.4.3
Last Closed:	2014-04-17 13:14:54 UTC

Description Kaushal 2013-10-15 07:20:36 UTC

Description of problem:

Two glusterfs clients returned inconsistent errnos when the bricks of the volume were down. Consider two gluster mounts: mount 1 was done while the bricks were online, and mount 2 was done after the bricks were killed (using the 'glusterfs' command instead of the mount script).

For any request, mount 1 returns ENOTCONN, whereas mount 2 returns ENOENT.

This happens because, on the 2nd mount, fuse sends a lookup on '/' before any other request, as the root had not been looked up yet. The client xlator returns ENOTCONN, but dht_lookup_dir_cbk unconditionally changed this to ENOENT when aggregating the replies. As a result, fuse returned ENOENT even though the errno should have been ENOTCONN.
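
The errno rewrite described above can be illustrated with a small, self-contained C model. This is only a sketch and not GlusterFS source: struct lookup_state, record_reply_before, record_reply_after and lookup_root_all_down are hypothetical names, and the "after" variant is an assumption about the effect of the fix (the subvolume's errno is kept instead of being replaced).

```c
#include <errno.h>

/* Hypothetical, simplified stand-in for the state DHT keeps while it
 * gathers lookup replies from its subvolumes. Not GlusterFS source. */
struct lookup_state {
    int op_ret;   /* -1 until at least one subvolume answers successfully */
    int op_errno; /* errno reported upward if op_ret is still -1 */
};

/* Behaviour described in the report: a failed reply is always recorded
 * as ENOENT, so the errno reported by the subvolume is lost. */
static void record_reply_before(struct lookup_state *st, int op_ret, int op_errno)
{
    (void)op_errno; /* ignored here; this models the bug */
    if (op_ret == -1)
        st->op_errno = ENOENT;
    else
        st->op_ret = 0;
}

/* Assumed behaviour after the fix: keep the errno the subvolume reported. */
static void record_reply_after(struct lookup_state *st, int op_ret, int op_errno)
{
    if (op_ret == -1)
        st->op_errno = op_errno;
    else
        st->op_ret = 0;
}

/* Feed `bricks` identical ENOTCONN failures through one of the two
 * callbacks and return the errno that would be handed back to fuse. */
static int lookup_root_all_down(int bricks, int use_fix)
{
    struct lookup_state st = { -1, 0 };
    for (int i = 0; i < bricks; i++) {
        if (use_fix)
            record_reply_after(&st, -1, ENOTCONN);
        else
            record_reply_before(&st, -1, ENOTCONN);
    }
    return st.op_ret == 0 ? 0 : st.op_errno;
}
```

In this model, lookup_root_all_down(2, 0) yields ENOENT and lookup_root_all_down(2, 1) yields ENOTCONN, which mirrors the difference between the two mounts.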

How reproducible:
Always

Steps to Reproduce:
1. Create and start a volume.
2. Perform mount 1.
3. Kill the bricks.
4. Perform mount 2 (this mount must be done using the 'glusterfs' command directly).
5. Issue the same request on both mounts.
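
Step 5 could be checked with a small C helper along the following lines. It is a sketch under assumptions: stat() is used as the request (the report says any request shows the difference), and probe_errno and mounts_agree are hypothetical names. The two mount point paths are supplied by the caller.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Return 0 if stat() on path succeeds, otherwise the errno it set. */
static int probe_errno(const char *path)
{
    struct stat sb;
    return stat(path, &sb) == 0 ? 0 : errno;
}

/* Probe both mount points, print what each one returned, and return 1
 * if the two results match, 0 otherwise. */
static int mounts_agree(const char *mount1, const char *mount2)
{
    int e1 = probe_errno(mount1);
    int e2 = probe_errno(mount2);

    printf("%s: %d (%s)\n", mount1, e1, e1 ? strerror(e1) : "ok");
    printf("%s: %d (%s)\n", mount2, e2, e2 ? strerror(e2) : "ok");
    return e1 == e2;
}
```

Calling mounts_agree() from a main() with the two mount points as arguments would be expected to report a mismatch on an affected build and ENOTCONN for both on a fixed one.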

Actual results:
Errno returned on mount 2 is ENOENT instead of ENOTCONN

Expected results:
Errno returned is ENOTCONN on both mounts.

Additional info:
Found this while investigating a quota listing problem.

Comment 1 Anand Avati 2013-10-15 07:26:58 UTC

REVIEW: http://review.gluster.org/6072 (dht: dht_lookup_dir_cbk should set op_errno as local->op_errno) posted (#2) for review on master by Kaushal M (kaushal)

Comment 2 Anand Avati 2013-10-15 07:38:14 UTC

COMMIT: http://review.gluster.org/6072 committed in master by Anand Avati (avati) 
------
commit 1cf925670768383044588fa162d65be8545224ce
Author: Kaushal M <kaushal>
Date:   Fri Oct 11 12:46:06 2013 +0530

    dht: dht_lookup_dir_cbk should set op_errno as local->op_errno
    
    Two glusterfs clients return inconsistent errnos when the bricks of the volume
    were down. Consider two gluster mounts. Mount 1 was done when the bricks were
    online. Mount 2 was done after the bricks were killed, (using the 'glusterfs'
    command instead of the mount script).
    
    For any request, mount 1 will return ENOTCONN, where as mount 2 will return
    ENOENT.
    
    This happens because for the 2nd mount, a fuse would send a lookup on '/' for
    any request, as it hadn't been done yet. The client xlator returns ENOTCONN,
    but the dht_lookup_dir_cbk changed this to ENOENT unconditionally when
    aggregating. So, fuse returned ENOENT, even though the errno should have been
    ENOTCONN.
    
    Change-Id: I4b7a6d84ce5153045a807fccc01485afe0377117
    BUG: 1019095
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/6072
    Reviewed-by: Anand Avati <avati>
    Tested-by: Anand Avati <avati>

Comment 3 Anand Avati 2013-12-10 09:35:22 UTC

REVIEW: http://review.gluster.org/6471 (dht: dht_lookup_dir_cbk should set op_errno as local->op_errno) posted (#1) for review on release-3.4 by Shishir Gowda (gowda.shishir)

Comment 4 Anand Avati 2014-03-11 16:32:11 UTC

COMMIT: http://review.gluster.org/6471 committed in release-3.4 by Anand Avati (avati) 
------
commit 010a9a7867c7135dfedf52e5d2b34122a9cb1984
Author: shishir gowda <gowda.shishir>
Date:   Tue Dec 10 15:02:49 2013 +0530

    dht: dht_lookup_dir_cbk should set op_errno as local->op_errno
    
    Two glusterfs clients return inconsistent errnos when the bricks of the volume
    were down. Consider two gluster mounts. Mount 1 was done when the bricks were
    online. Mount 2 was done after the bricks were killed, (using the 'glusterfs'
    command instead of the mount script).
    
    For any request, mount 1 will return ENOTCONN, where as mount 2 will return
    ENOENT.
    
    This happens because for the 2nd mount, a fuse would send a lookup on '/' for
    any request, as it hadn't been done yet. The client xlator returns ENOTCONN,
    but the dht_lookup_dir_cbk changed this to ENOENT unconditionally when
    aggregating. So, fuse returned ENOENT, even though the errno should have been
    ENOTCONN.
    
    backporting  http://review.gluster.org/6072
    
    BUG: 1019095
    Change-Id: Iaa40dffefddfcaf1ab7736f5423d7f9d2ece1363
    Original-author: Kaushal M <kaushal>
    Signed-off-by: shishir gowda <gowda.shishir>
    Reviewed-on: http://review.gluster.org/6471
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Harshavardhana <harsha>
    Reviewed-by: Anand Avati <avati>

Comment 5 Niels de Vos 2014-04-17 13:14:54 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.4.3, please reopen this bug report.

glusterfs-3.4.3 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should already be available or will become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

The fix for this bug is likely to be included in all future GlusterFS releases, i.e. releases later than 3.4.3. Along the same lines, the recent release glusterfs-3.5.0 [3] is likely to have the fix. You can verify this by reading the comments in this bug report and checking for comments mentioning "committed in release-3.5".

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/5978
[2] http://news.gmane.org/gmane.comp.file-systems.gluster.user
[3] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
