1240657 – Deceiving log messages like "Failing STAT on gfid : split-brain observed. [Input/output error]" reported


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	replicate
Sub Component:
Version:	rhgs-3.1
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	RHGS 3.1.1
Assignee:	Krutika Dhananjay
QA Contact:	Shruti Sampat
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1216951 1246052 1246987 1251815

Reported:	2015-07-07 12:47 UTC by Saurabh
Modified:	2016-09-17 12:15 UTC
CC List:	17 users
Fixed In Version:	glusterfs-3.7.1-12
Doc Type:	Bug Fix
Doc Text:	Previously, AFR was logging messages about files and directories going into split-brain even in case of failures that were unrelated to split-brain. As a consequence, for each stat on a file and directory that fails, AFR would wrongly report that it is in split-brain. With this fix, AFR logs messages about split-brain only in case of a true split-brain.
Clone Of:
Clones:	1246052
Environment:
Last Closed:	2015-10-05 07:18:43 UTC
Embargoed:
Dependent Products:

Attachments
nfs11 ganesha-gfapi.log (604.38 KB, text/plain) 2015-07-07 12:49 UTC, Saurabh

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2015:1845	0	normal	SHIPPED_LIVE	Moderate: Red Hat Gluster Storage 3.1 update	2015-10-05 11:06:22 UTC

Description Saurabh 2015-07-07 12:47:53 UTC

Description of problem:
When I delete a directory, I see error messages like the following in ganesha-gfapi.log:

[2015-07-07 18:04:34.786903] W [MSGID: 114031] [client-rpc-fops.c:531:client3_3_stat_cbk] 0-vol3-client-8: remote operation failed [No such file or directory]
[2015-07-07 18:04:34.787612] E [MSGID: 108008] [afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-3: Failing STAT on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4: split-brain observed. [Input/output error]
[2015-07-07 18:04:34.787954] E [MSGID: 108008] [afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-1: Failing STAT on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4: split-brain observed. [Input/output error]
[2015-07-07 18:04:34.788090] E [MSGID: 108008] [afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-5: Failing STAT on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4: split-brain observed. [Input/output error]
[2015-07-07 18:04:34.788191] E [MSGID: 108008] [afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-0: Failing STAT on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4: split-brain observed. [Input/output error]
[2015-07-07 18:04:34.788240] E [MSGID: 108008] [afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-2: Failing STAT on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4: split-brain observed. [Input/output error]
[2015-07-07 18:04:34.788478] E [MSGID: 108008] [afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-4: Failing STAT on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4: split-brain observed. [Input/output error]


The directory deletion itself succeeds. The test was done on an NFS mount with vers=4.
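To see at a glance how many replicate subvolumes logged the message for a given gfid, the log can be summarized with a short script. This is only a sketch: the regular expressions are modelled on the log lines quoted above and may need adjusting for other glusterfs versions, and the function name `summarize_split_brain` is made up for illustration.

```python
import re
from collections import defaultdict

# Line layout assumed from the excerpt above:
# [timestamp] LEVEL [MSGID: n] [file:line:function] xlator-name: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) \[MSGID: (?P<msgid>\d+)\] "
    r"\[(?P<src>[^\]]+)\] (?P<xlator>\S+): (?P<msg>.*)$"
)
GFID_RE = re.compile(
    r"Failing (?P<fop>\w+) on gfid (?P<gfid>[0-9a-f-]{36}): split-brain observed"
)


def summarize_split_brain(lines):
    """Map each gfid to the sorted subvolumes that logged MSGID 108008 for it."""
    hits = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("msgid") != "108008":
            continue  # not the AFR split-brain message
        g = GFID_RE.search(m.group("msg"))
        if g:
            hits[g.group("gfid")].add(m.group("xlator"))
    return {gfid: sorted(xlators) for gfid, xlators in hits.items()}
```

For a real log, pass an open file object, for example `summarize_split_brain(open("ganesha-gfapi.log"))`. The result lists, per gfid, every replicate subvolume that reported the failure.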

Version-Release number of selected component (if applicable):
nfs-ganesha-2.2.0-4.el6rhs.x86_64
glusterfs-3.7.1-7.el6rhs.x86_64

How reproducible:
always

Actual results:
as described above

Expected results:
These messages are confusing while debugging an issue, so such misleading logs should be avoided.

Additional info:

Comment 2 Saurabh 2015-07-07 12:49:18 UTC

Created attachment 1049276
nfs11 ganesha-gfapi.log

Comment 4 Saurabh 2015-07-08 11:03:37 UTC

rm -rf /mount-point/dir-name
or rmdir /mount-point/dir-name

Comment 5 Soumya Koduri 2015-07-08 11:05:40 UTC

Please provide the tests you ran before hitting the issue, whether it is consistently reproducible, and the volume setup details (for example, whether any other features are enabled or any bricks are unavailable).

Comment 6 Saurabh 2015-07-08 11:20:52 UTC

It is pretty straightforward, hence I just wrote the description.

1. Create a volume of type 6x2 and start it.
2. After configuring nfs-ganesha, mount the volume with vers=4.
3. mkdir /mount-point/<dirname>
4. rmdir /mount-point/<dirname>

Comment 7 Soumya Koduri 2015-07-08 11:55:14 UTC

Thanks Saurabh. Have changed the bug summary to reflect that.

Comment 9 Niels de Vos 2015-07-20 12:45:10 UTC

These messages are related to AFR, changing the component.

When a directory (or file) over NFS gets removed, a stat() on the filehandle gets done afterwards. This is needed for updating the inode-cache that could still be valid for hardlinks.

It is not clear to me why a stat() on a GFID could return ENXIO instead of ENOENT.
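The bracketed text at the end of each log line in the description is an errno message string. A small sketch like the following prints the number and message for a few errno names so that the strings in the log can be matched to a name. The helper `errno_table` is hypothetical, and the exact message strings depend on the platform's C library (on Linux with glibc, "Input/output error" is the text for EIO).

```python
import errno
import os


def errno_table(names=("ENOENT", "EIO", "ENXIO")):
    """Return {name: (number, message)} for the given errno names on this platform."""
    return {n: (getattr(errno, n), os.strerror(getattr(errno, n))) for n in names}


# Print one row per errno name: name, numeric value, message text.
for name, (number, message) in errno_table().items():
    print(f"{name:<7} {number:>3}  {message}")
```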

Comment 10 monti lawrence 2015-07-22 19:48:28 UTC

Doc text is edited. Please sign off to be included in Known Issues.

Comment 11 Krutika Dhananjay 2015-07-27 09:09:15 UTC

https://code.engineering.redhat.com/gerrit/#/c/53813/

Comment 12 Krutika Dhananjay 2015-07-28 06:24:46 UTC

Here are simpler steps to recreate the issue (they do not require setting up NFS-Ganesha):

1. Create a 1x2 replicate volume and start it.
2. Disable md-cache on the volume. (#gluster volume set <VOL> performance.stat-prefetch off)
3. Create two FUSE mounts with {entry,attribute} timeout set to 0, and readdirp disabled.

#glusterfs --volfile-server=kritika --volfile-id=rep --attribute-timeout=0 --entry-timeout=0 --use-readdirp=no /mnt/glusterfs/0

and ..
#glusterfs --volfile-server=kritika --volfile-id=rep --attribute-timeout=0 --entry-timeout=0 --use-readdirp=no /mnt/glusterfs/1

(This prevents STAT() from being served from the cache, which in turn forces calls to afr_stat().)

4. From the first mount point, create a directory.
        #mkdir /mnt/glusterfs/0/dir
5. From the second mount point, remove this directory.
        #rmdir /mnt/glusterfs/1/dir
6. Issue stat on this directory's path from the first mount point.
        #stat /mnt/glusterfs/0/dir

7. stat will fail with ENOENT (expected).
   Check the log file of the first mount process and you'll find a message indicating split-brain of "dir" (not expected).
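The final check (stat fails with ENOENT, and the mount log should contain no split-brain message) can be scripted roughly as below. This is a sketch under assumptions: the function names are made up, the script only looks for the "split-brain observed" text seen in the description, and the log file path of the first mount must be supplied by the caller because it depends on the setup.

```python
import errno
import os
import sys


def stat_errno(path):
    """Return None if stat() succeeds, otherwise the errno name of the failure."""
    try:
        os.stat(path)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


def split_brain_lines(log_path):
    """Return every line of the log that contains the split-brain message text."""
    with open(log_path, errors="replace") as fh:
        return [ln.rstrip("\n") for ln in fh if "split-brain observed" in ln]


# Usage: script.py <path-to-stat> <mount-log-file>
if __name__ == "__main__" and len(sys.argv) == 3:
    path, log_path = sys.argv[1], sys.argv[2]
    print("stat result:", stat_errno(path) or "success")
    for line in split_brain_lines(log_path):
        print("split-brain message:", line)
```

Run it with `/mnt/glusterfs/0/dir` as the first argument after the directory has been removed from the second mount. On an affected build, the stat result is ENOENT and split-brain lines are still printed.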

Comment 15 Krutika Dhananjay 2015-08-13 07:23:29 UTC

Patch merged.

Comment 16 Shruti Sampat 2015-09-09 07:06:08 UTC

Verified as fixed in glusterfs-3.7.1-14.el7rhgs.x86_64.

Unable to reproduce the issue in 3.1.0 (without nfs-ganesha) using the test case described in comment #12.

Tried on a volume exported via nfs-ganesha with the steps described in comment #6. Found to be fixed.

Comment 18 errata-xmlrpc 2015-10-05 07:18:43 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html
