Bug 1412915 - Spurious split-brain error messages are seen in rebalance logs
Summary: Spurious split-brain error messages are seen in rebalance logs
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1411617 1411625
Blocks: 1412914
Reported: 2017-01-13 06:09 UTC by Krutika Dhananjay
Modified: 2017-02-20 12:33 UTC (History)

Fixed In Version: glusterfs-3.8.9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1411625
Environment:
Last Closed: 2017-02-20 12:33:40 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Krutika Dhananjay 2017-01-13 06:09:42 UTC
+++ This bug was initially created as a clone of Bug #1411625 +++

+++ This bug was initially created as a clone of Bug #1411617 +++

Description of problem:
=======================
On an nfs-ganesha setup, while rm -rf and a remove-brick operation are both in progress, spurious "split-brain observed" error messages appear in the rebalance logs.

Rebalance logs error snippet:
=============================
[2017-01-09 06:50:36.232738] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-6: Failing GETXATTR on gfid 5ab6a290-3127-4662-86e7-c52d32949c67: split-brain observed. [Input/output error]
[2017-01-09 06:50:36.244473] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-6: Failing STAT on gfid 5ab6a290-3127-4662-86e7-c52d32949c67: split-brain observed. [Input/output error]
[2017-01-09 06:50:38.930970] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-8: Failing GETXATTR on gfid 000feb2a-2a8f-40f1-ae9e-926f0d0ae323: split-brain observed. [Input/output error]
[2017-01-09 06:50:38.944043] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-8: Failing STAT on gfid 000feb2a-2a8f-40f1-ae9e-926f0d0ae323: split-brain observed. [Input/output error]
[2017-01-09 06:50:43.595767] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-8: Failing GETXATTR on gfid a6f9d15e-969b-4630-867d-d7a402f242b2: split-brain observed. [Input/output error]
[2017-01-09 06:50:43.611669] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-8: Failing STAT on gfid a6f9d15e-969b-4630-867d-d7a402f242b2: split-brain observed. [Input/output error]
[2017-01-09 06:50:46.798033] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-6: Failing GETXATTR on gfid b0a4fef7-bd4c-472f-9027-eb6aef268e29: split-brain observed. [Input/output error]
[2017-01-09 06:50:46.810447] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-6: Failing STAT on gfid b0a4fef7-bd4c-472f-9027-eb6aef268e29: split-brain observed. [Input/output error]
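
When triaging logs like the snippet above, it can help to group the MSGID 108008 entries by subvolume and gfid to see which file operations failed for each file. The following is a minimal sketch, not part of GlusterFS; the regular expression is written only against the line format shown in this report, and the names `LINE_RE` and `summarize` are hypothetical.

```python
import re
from collections import defaultdict

# Pattern for the AFR "split-brain observed" entries (MSGID 108008),
# based solely on the log format quoted in this bug report.
LINE_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] E \[MSGID: 108008\] "
    r"\[(?P<src>[^\]]+)\] (?P<subvol>\S+): "
    r"Failing (?P<fop>\w+) on gfid (?P<gfid>[0-9a-f-]{36}): "
    r"split-brain observed"
)


def summarize(lines):
    """Map (subvolume, gfid) to the list of failed fops, in log order."""
    hits = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            hits[(match["subvol"], match["gfid"])].append(match["fop"])
    return dict(hits)


# Two entries copied from the rebalance log snippet in this report.
sample = [
    "[2017-01-09 06:50:36.232738] E [MSGID: 108008] "
    "[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-6: "
    "Failing GETXATTR on gfid 5ab6a290-3127-4662-86e7-c52d32949c67: "
    "split-brain observed. [Input/output error]",
    "[2017-01-09 06:50:36.244473] E [MSGID: 108008] "
    "[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-6: "
    "Failing STAT on gfid 5ab6a290-3127-4662-86e7-c52d32949c67: "
    "split-brain observed. [Input/output error]",
]

for (subvol, gfid), fops in summarize(sample).items():
    print(subvol, gfid, fops)
```

The gfids collected this way could then be compared against the output of `gluster volume heal <volname> info split-brain`; entries that are logged here but not listed there are candidates for the spurious case described in this bug.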


Version-Release number of selected component (if applicable):
3.8.4-10.el7rhgs.x86_64

Steps to Reproduce:
===================
1) Create a ganesha cluster and a distributed-replicate volume.
2) Enable nfs-ganesha on the volume with mdcache settings.
3) Mount the volume.
4) Create files and folders.
5) From the mount point, run rm -rf * and, while it is still running, start removing bricks.

Split-brain error messages then appear in the rebalance logs.

Actual results:
===============
During rebalance, spurious split-brain error messages are seen in rebalance logs.

Expected results:
=================
There should not be any split-brain error messages, because no split-brain has actually occurred.

--- Additional comment from Worker Ant on 2017-01-10 22:22:31 EST ---

REVIEW: http://review.gluster.org/16362 (cluster/afr: Do not log of split-brain when there isn't one) posted (#2) for review on master by Krutika Dhananjay (kdhananj@redhat.com)

--- Additional comment from Worker Ant on 2017-01-10 23:09:32 EST ---

REVIEW: http://review.gluster.org/16362 (cluster/afr: Do not log of split-brain when there isn't one) posted (#3) for review on master by Krutika Dhananjay (kdhananj@redhat.com)

--- Additional comment from Worker Ant on 2017-01-12 01:42:06 EST ---

COMMIT: http://review.gluster.org/16362 committed in master by Pranith Kumar Karampuri (pkarampu@redhat.com) 
------
commit 5b24934668adb89e1dcd3888ac19555056508f06
Author: Krutika Dhananjay <kdhananj@redhat.com>
Date:   Tue Jan 10 13:26:02 2017 +0530

    cluster/afr: Do not log of split-brain when there isn't one
    
    * Even on errors like ENOENT, AFR logs split-brain after
      read-txn refresh, introduced by commit a07ddd8f.
      This can be a cause of much panic and confusion and needs to be fixed.
    
    * Also fixed this issue in write-txns.
    
    * Fixed afr read txns to log about split-brain only after knowing that
      there is no split-brain choice configured.
    
    * Removed code duplication
    
    * Fixed incorrect passing of error code in afr_write_txn_refresh_done()
      (the function was passing -0 as errno to gf_msg().
    
    Change-Id: I354f454ce5bf0e5f00bc27916eb597367cb7d927
    BUG: 1411625
    Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
    Reviewed-on: http://review.gluster.org/16362
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Ravishankar N <ravishankar@redhat.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
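
The logging rule described in the commit message above can be sketched roughly as follows. This is an illustration in Python, not the actual AFR C code: the function name and parameters are hypothetical, and it assumes that EIO is the errno that signals split-brain, as the "[Input/output error]" suffix in the logs suggests.

```python
import errno


def should_log_split_brain(op_errno, spb_choice):
    """Decide whether a failed refresh should be logged as split-brain.

    op_errno   -- positive errno from the refresh (hypothetical parameter)
    spb_choice -- index of a configured split-brain choice, or -1 if none
                  (hypothetical parameter)
    """
    if op_errno != errno.EIO:
        # Errors such as ENOENT (for example, a file removed by rm -rf
        # while rebalance is running) are not split-brain.
        return False
    if spb_choice >= 0:
        # A split-brain choice is configured, so do not report split-brain.
        return False
    return True


print(should_log_split_brain(errno.ENOENT, -1))
print(should_log_split_brain(errno.EIO, -1))
print(should_log_split_brain(errno.EIO, 0))
```

Under these assumptions, only the middle call reports split-brain; the ENOENT case corresponds to the spurious messages reported in this bug.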

Comment 1 Worker Ant 2017-01-13 06:53:41 UTC
REVIEW: http://review.gluster.org/16393 (cluster/afr: Do not log of split-brain when there isn't one) posted (#1) for review on release-3.8 by Krutika Dhananjay (kdhananj@redhat.com)

Comment 2 Worker Ant 2017-01-17 06:22:13 UTC
COMMIT: http://review.gluster.org/16393 committed in release-3.8 by Pranith Kumar Karampuri (pkarampu@redhat.com) 
------
commit 29f15ec5a43bb7df58457df8cad28df994b8b1f5
Author: Krutika Dhananjay <kdhananj@redhat.com>
Date:   Tue Jan 10 13:26:02 2017 +0530

    cluster/afr: Do not log of split-brain when there isn't one
    
    	Backport of: http://review.gluster.org/16362
    
    * Even on errors like ENOENT, AFR logs split-brain after
      read-txn refresh, introduced by commit a07ddd8f.
      This can be a cause of much panic and confusion and needs to be fixed.
    
    * Also fixed this issue in write-txns.
    
    * Fixed afr read txns to log about split-brain only after knowing that
      there is no split-brain choice configured.
    
    * Removed code duplication
    
    * Fixed incorrect passing of error code in afr_write_txn_refresh_done()
      (the function was passing -0 as errno to gf_msg().
    
    Change-Id: I21ac7f6e31840fe3da2f9eecccc495056ab46ece
    BUG: 1412915
    Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
    Reviewed-on: http://review.gluster.org/16393
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Ravishankar N <ravishankar@redhat.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>

Comment 3 Niels de Vos 2017-02-20 12:33:40 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.9, please open a new bug report.

glusterfs-3.8.9 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2017-February/000066.html
[2] https://www.gluster.org/pipermail/gluster-users/

