Bug 1542475

Summary: Random failures in tests/bugs/nfs/bug-974972.t
Product: [Community] GlusterFS
Reporter: Karthik U S <ksubrahm>
Component: nfs
Assignee: Karthik U S <ksubrahm>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.12
CC: bugs
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.12.6
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-03-05 07:14:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Karthik U S 2018-02-06 12:21:57 UTC
Description of problem:
Test case bug-974972.t is randomly failing with the below logs.

[2018-02-05 05:54:53.488934]:++++++++++ G_LOG:./bugs/nfs/bug-974972.t: TEST: 30 1 afr_child_up_status_in_nfs patchy 0 ++++++++++
[2018-02-05 05:54:54.541079]:++++++++++ G_LOG:./bugs/nfs/bug-974972.t: TEST: 31 1 afr_child_up_status_in_nfs patchy 1 ++++++++++
[2018-02-05 05:54:55.607606]:++++++++++ G_LOG:./bugs/nfs/bug-974972.t: TEST: 34 ls /mnt/nfs/0/1 ++++++++++
[2018-02-05 05:54:55.612783] W [MSGID: 108027] [afr-common.c:2737:afr_discover_done] 0-patchy-replicate-0: no read subvols for <gfid:ba721bb9-8a3c-44d6-8f26-4e591cafb246>
[2018-02-05 05:54:55.620246]:++++++++++ G_LOG:./bugs/nfs/bug-974972.t: TEST: 35 ! cat /mnt/nfs/0/1 ++++++++++
[2018-02-05 05:54:55.623576] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-patchy-replicate-0: Failing STAT on gfid ba721bb9-8a3c-44d6-8f26-4e591cafb246: split-brain observed. [Input/output error]
[2018-02-05 05:54:55.624789] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-patchy-replicate-0: Failing GETXATTR on gfid ba721bb9-8a3c-44d6-8f26-4e591cafb246: split-brain observed. [Input/output error]
[2018-02-05 05:54:55.627529] W [MSGID: 108027] [afr-common.c:2737:afr_discover_done] 0-patchy-replicate-0: no read subvols for <gfid:ba721bb9-8a3c-44d6-8f26-4e591cafb246>
[2018-02-05 05:54:55.629329] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-patchy-replicate-0: Failing STAT on gfid ba721bb9-8a3c-44d6-8f26-4e591cafb246: split-brain observed. [Input/output error]
[2018-02-05 05:54:55.629382] W [MSGID: 112199] [nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: <gfid:ba721bb9-8a3c-44d6-8f26-4e591cafb246> => (XID: 445d2a50, GETATTR: NFS: 5(I/O error), POSIX: 5(Input/output error))
[2018-02-05 05:54:55.635471] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-patchy-replicate-0: performing metadata selfheal on ba721bb9-8a3c-44d6-8f26-4e591cafb246
[2018-02-05 05:54:55.637240] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-patchy-replicate-0: Failing ACCESS on gfid ba721bb9-8a3c-44d6-8f26-4e591cafb246: split-brain observed. [Input/output error]
[2018-02-05 05:54:55.638991] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-patchy-replicate-0: Failing GETXATTR on gfid ba721bb9-8a3c-44d6-8f26-4e591cafb246: split-brain observed. [Input/output error]
[2018-02-05 05:54:55.648613] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-patchy-replicate-0: Completed metadata selfheal on ba721bb9-8a3c-44d6-8f26-4e591cafb246. sources=[0]  sinks=1 
[2018-02-05 05:54:55.665468] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-patchy-replicate-0: Completed data selfheal on ba721bb9-8a3c-44d6-8f26-4e591cafb246. sources=[1]  sinks=0 
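
The pattern in the log above (a gfid reported as "split-brain observed" and then a "Completed ... selfheal" entry for the same gfid) can be picked out mechanically. The following is only a sketch for triaging such logs, not part of the test case; the function name healed_split_brain_gfids is hypothetical, and it assumes the log line layout shown above.

```shell
#!/bin/sh
# Hypothetical triage helper (not part of bug-974972.t).
# Reads GlusterFS log text on stdin and prints every gfid that is reported
# as "split-brain observed" and also has a "Completed ... selfheal" entry.
healed_split_brain_gfids() {
    awk '
        /split-brain observed/ {
            # The gfid is the field right after the literal word "gfid".
            for (i = 1; i < NF; i++)
                if ($i == "gfid") {
                    g = $(i + 1)
                    sub(/:$/, "", g)      # drop the trailing colon
                    split_brain[g] = 1
                }
        }
        /Completed (data|metadata|entry) selfheal on/ {
            # The gfid is the field right after "selfheal on".
            for (i = 1; i + 2 <= NF; i++)
                if ($i == "selfheal" && $(i + 1) == "on") {
                    g = $(i + 2)
                    sub(/\.$/, "", g)     # drop the trailing period
                    healed[g] = 1
                }
        }
        END {
            for (g in healed)
                if (g in split_brain)
                    print g
        }
    '
}
```

Feeding it the gNFS log from a failed run (for example `healed_split_brain_gfids < nfs.log`, where nfs.log stands for whichever log file holds the lines above) should print ba721bb9-8a3c-44d6-8f26-4e591cafb246 for this failure.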

Version-Release number of selected component (if applicable):


How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
Read on a split-brain file is allowed.

Expected results:
Read on a split-brain file should fail.

Additional info:

Comment 1 Worker Ant 2018-02-06 12:34:43 UTC
REVIEW: https://review.gluster.org/19510 (nfs: Adding check to make sure write is wound on the file) posted (#1) for review on master by Karthik U S

Comment 2 Worker Ant 2018-02-08 09:26:55 UTC
REVIEW: https://review.gluster.org/19526 (nfs: Fixing the failure in bug-974972.t test case) posted (#1) for review on release-3.12 by Karthik U S

Comment 3 Worker Ant 2018-02-12 10:13:57 UTC
COMMIT: https://review.gluster.org/19526 committed in release-3.12 by "jiffin tony Thottan" <jthottan> with a commit message- nfs: Fixing the failure in bug-974972.t test case

Problem:
The gNFS server is restarted before the pending marker is set,
because eager lock is on. As a result, a file that should be in
split-brain gets healed.

Fix:
Switch off the eager lock.
This is fixed in the later versions by the patch:
https://review.gluster.org/#/c/13075/

Change-Id: If59e5ede2de1cbfcdeac01cca38dc0d046f1993c
BUG: 1542475
Signed-off-by: karthik-us <ksubrahm>
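
For reference, switching eager lock off on a volume from the command line can be sketched as below. This is an illustration only and not the content of the patch: the helper name disable_eager_lock and the GLUSTER_CLI override are hypothetical, "patchy" is the volume name seen in the logs above, and cluster.eager-lock is the AFR volume option assumed to be the one the commit message refers to.

```shell
#!/bin/sh
# Hypothetical helper: turn off AFR eager locking on one volume.
# GLUSTER_CLI is an assumed override hook; set it to "echo" for a dry run
# that only prints the arguments instead of calling the gluster CLI.
disable_eager_lock() {
    volume=$1
    ${GLUSTER_CLI:-gluster} volume set "$volume" cluster.eager-lock off
}
```

Calling `disable_eager_lock patchy` on a node of the trusted pool runs `gluster volume set patchy cluster.eager-lock off`; with GLUSTER_CLI set to echo it only prints the arguments.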

Comment 4 Jiffin 2018-03-05 07:14:45 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.6, please open a new bug report.

glusterfs-3.12.6 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2018-February/033552.html
[2] https://www.gluster.org/pipermail/gluster-users/