Bug 1246987

Summary: Deceiving log messages like "Failing STAT on gfid : split-brain observed. [Input/output error]" reported
Product: [Community] GlusterFS Reporter: Krutika Dhananjay <kdhananj>
Component: replicateAssignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.7.3CC: bugs, gluster-bugs, ndevos, pkarampu, saujain, storage-qa-internal
Target Milestone: ---Keywords: Triaged
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.7.3 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1246052 Environment:
Last Closed: 2015-07-30 09:47:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1240657, 1246052    
Bug Blocks:    

Description Krutika Dhananjay 2015-07-27 06:10:09 UTC
+++ This bug was initially created as a clone of Bug #1246052 +++

+++ This bug was initially created as a clone of Bug #1240657 +++

Description of problem:
I try to delete a directory and I the error messages in ganesha-gfapi.log, like these ones,

[2015-07-07 18:04:34.786903] W [MSGID: 114031] [client-rpc-fops.c:531:client3_3_stat_cbk] 0-vol3-client-8: remote operation failed [No such file or directory]
[2015-07-07 18:04:34.787612] E [MSGID: 108008] [afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-3: Failing STAT on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4: split-brain observed. [Input/output error]
[2015-07-07 18:04:34.787954] E [MSGID: 108008] [afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-1: Failing STAT on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4: split-brain observed. [Input/output error]
[2015-07-07 18:04:34.788090] E [MSGID: 108008] [afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-5: Failing STAT on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4: split-brain observed. [Input/output error]
[2015-07-07 18:04:34.788191] E [MSGID: 108008] [afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-0: Failing STAT on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4: split-brain observed. [Input/output error]
[2015-07-07 18:04:34.788240] E [MSGID: 108008] [afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-2: Failing STAT on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4: split-brain observed. [Input/output error]
[2015-07-07 18:04:34.788478] E [MSGID: 108008] [afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-4: Failing STAT on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4: split-brain observed. [Input/output error]


Though the directory deletion is successful, test was done on vers=4

Version-Release number of selected component (if applicable):
nfs-ganesha-2.2.0-4.el6rhs.x86_64
glusterfs-3.7.1-7.el6rhs.x86_64

How reproducible:
always

Actual results:
as described above

Expected results:
The above logs may be confusing while debugging the issue, hence we should try to avoid these kind of confusing logs.

Additional info:

--- Additional comment from Saurabh on 2015-07-07 08:49:18 EDT ---



--- Additional comment from Soumya Koduri on 2015-07-08 06:48:54 EDT ---

Could you please provide the steps which led to this issue. Normal directory removal operations work for us.

Also please CC the nfs team so that we do not miss out the bugs if needed. Thanks!

--- Additional comment from Saurabh on 2015-07-08 07:03:37 EDT ---

rm -rf /mount-point/dir-name
or rmdir /mount-point/dir-name

--- Additional comment from Soumya Koduri on 2015-07-08 07:05:40 EDT ---

Please provide the tests you have been running before you hit the issue and if its consistently reproducible and also the volume setup details (if in case any other features are on or any bricks unavailable?)

--- Additional comment from Saurabh on 2015-07-08 07:20:52 EDT ---

It is pretty staright forward hence I just wrote the description.

1. create a volume of type 6x2, start it
2. mount the volume with vers=4, post configuring nfs-ganesha
3. mkdir /mount-point/<dirname>
4. rmdir /mount-point/<dirname>

--- Additional comment from Soumya Koduri on 2015-07-08 07:55:14 EDT ---

Thanks Saurabh. Have changed the bug summary to reflect that.

--- Additional comment from Niels de Vos on 2015-07-20 08:45:10 EDT ---

These messages are related to AFR, changing the component.

When a directory (or file) over NFS gets removed, a stat() on the filehandle gets done afterwards. This is needed for updating the inode-cache that could still be valid for hardlinks.

It is not clear to me what a stat() on a GFID could return EIO instead of ENOENT.

--- Additional comment from Anand Avati on 2015-07-24 04:39:20 EDT ---

REVIEW: http://review.gluster.org/11756 (cluster/afr: Fix incorrect logging in read transactions) posted (#1) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Anand Avati on 2015-07-27 02:05:53 EDT ---

COMMIT: http://review.gluster.org/11756 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 7bec717f8850135368609fccf1b1c697af60c546
Author: Krutika Dhananjay <kdhananj>
Date:   Thu Jul 23 18:08:34 2015 +0530

    cluster/afr: Fix incorrect logging in read transactions
    
    afr_read_txn_refresh_done() at its entry point can fail for
    reasons like ENOENT/ESTALE but seldom due to EIO, which is something
    _AFR_ would internally generate and not receive in response from
    a child translator. AFR is reporting "split-brain" for _any_
    kind of failure in read txn, of the following kind:
    
    [2015-07-07 18:04:34.787612] E [MSGID: 108008]
    [afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-3:
    Failing STAT on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4:
    split-brain observed. [Input/output error]
    
    This patch fixes such misleading errors.
    
    To-Do:
    Avoid logging EIO if/when split-brain choice is set.
    Will do that as part of a separate commit.
    
    Change-Id: Ib513c75168f7026118ad5b3f0b35e9dd498cfe1e
    BUG: 1246052
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: http://review.gluster.org/11756
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Reviewed-by: Anuradha Talur <atalur>
    Reviewed-by: Ravishankar N <ravishankar>
    Tested-by: Gluster Build System <jenkins.com>

Comment 1 Anand Avati 2015-07-27 06:20:49 UTC
REVIEW: http://review.gluster.org/11764 (cluster/afr: Fix incorrect logging in read transactions) posted (#1) for review on release-3.7 by Krutika Dhananjay (kdhananj)

Comment 2 Kaushal 2015-07-30 09:47:24 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.3, please open a new bug report.

glusterfs-3.7.3 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12078
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user