Bug 1411625 - Spurious split-brain error messages are seen in rebalance logs
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On: 1411617
Blocks: 1412914 1412915
 
Reported: 2017-01-10 07:03 UTC by Krutika Dhananjay
Modified: 2017-03-06 17:42 UTC

Fixed In Version: glusterfs-3.10.0
Doc Text:
Clone Of: 1411617
Clones: 1412914 1412915
Environment:
Last Closed: 2017-03-06 17:42:42 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Krutika Dhananjay 2017-01-10 07:03:11 UTC
+++ This bug was initially created as a clone of Bug #1411617 +++

Description of problem:
=======================
On an nfs-ganesha setup, while rm -rf and a remove-brick operation are in progress, spurious "split-brain observed" error messages appear in the rebalance logs.

Rebalance logs error snippet:
=============================
[2017-01-09 06:50:36.232738] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-6: Failing GETXATTR on gfid 5ab6a290-3127-4662-86e7-c52d32949c67: split-brain observed. [Input/output error]
[2017-01-09 06:50:36.244473] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-6: Failing STAT on gfid 5ab6a290-3127-4662-86e7-c52d32949c67: split-brain observed. [Input/output error]
[2017-01-09 06:50:38.930970] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-8: Failing GETXATTR on gfid 000feb2a-2a8f-40f1-ae9e-926f0d0ae323: split-brain observed. [Input/output error]
[2017-01-09 06:50:38.944043] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-8: Failing STAT on gfid 000feb2a-2a8f-40f1-ae9e-926f0d0ae323: split-brain observed. [Input/output error]
[2017-01-09 06:50:43.595767] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-8: Failing GETXATTR on gfid a6f9d15e-969b-4630-867d-d7a402f242b2: split-brain observed. [Input/output error]
[2017-01-09 06:50:43.611669] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-8: Failing STAT on gfid a6f9d15e-969b-4630-867d-d7a402f242b2: split-brain observed. [Input/output error]
[2017-01-09 06:50:46.798033] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-6: Failing GETXATTR on gfid b0a4fef7-bd4c-472f-9027-eb6aef268e29: split-brain observed. [Input/output error]
[2017-01-09 06:50:46.810447] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-6: Failing STAT on gfid b0a4fef7-bd4c-472f-9027-eb6aef268e29: split-brain observed. [Input/output error]


Version-Release number of selected component (if applicable):
3.8.4-10.el7rhgs.x86_64

Steps to Reproduce:
===================
1) Create ganesha cluster and create a distributed-replicate volume.
2) Enable nfs-ganesha on the volume with mdcache settings.
3) Mount the volume.
4) Create files and folders.
5) From the mount point, issue rm -rf * and start removing bricks.

Split-brain error messages then appear in the rebalance logs.

Actual results:
===============
During rebalance, spurious split-brain error messages are seen in rebalance logs.

Expected results:
=================
There should be no split-brain error messages, since no split-brain has actually occurred.

Comment 1 Worker Ant 2017-01-11 03:22:31 UTC
REVIEW: http://review.gluster.org/16362 (cluster/afr: Do not log of split-brain when there isn't one) posted (#2) for review on master by Krutika Dhananjay (kdhananj)

Comment 2 Worker Ant 2017-01-11 04:09:32 UTC
REVIEW: http://review.gluster.org/16362 (cluster/afr: Do not log of split-brain when there isn't one) posted (#3) for review on master by Krutika Dhananjay (kdhananj)

Comment 3 Worker Ant 2017-01-12 06:42:06 UTC
COMMIT: http://review.gluster.org/16362 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 5b24934668adb89e1dcd3888ac19555056508f06
Author: Krutika Dhananjay <kdhananj>
Date:   Tue Jan 10 13:26:02 2017 +0530

    cluster/afr: Do not log of split-brain when there isn't one
    
    * Even on errors like ENOENT, AFR logs split-brain after
      read-txn refresh, introduced by commit a07ddd8f.
      This can be a cause of much panic and confusion and needs to be fixed.
    
    * Also fixed this issue in write-txns.
    
    * Fixed afr read txns to log about split-brain only after knowing that
      there is no split-brain choice configured.
    
    * Removed code duplication
    
    * Fixed incorrect passing of error code in afr_write_txn_refresh_done()
      (the function was passing -0 as errno to gf_msg()).
    
    Change-Id: I354f454ce5bf0e5f00bc27916eb597367cb7d927
    BUG: 1411625
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: http://review.gluster.org/16362
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Ravishankar N <ravishankar>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
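
The decision described in the commit message above can be illustrated with a minimal sketch. This is not the AFR code from the patch: struct refresh_result, classify_refresh() and errno_to_log() are hypothetical names invented for illustration, and the struct is an assumed simplification of the state AFR has after a refresh. The sketch only shows the ordering of the checks: a refresh error such as ENOENT is reported as itself, a configured split-brain choice suppresses the message, and the errno handed to the logger is a positive value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified view of what is known once a refresh completes.
 * These fields are assumptions for illustration, not AFR's real structures. */
struct refresh_result {
    int  err;                /* positive errno if the refresh failed, else 0 */
    int  readable_copies;    /* replicas that are safe to read from */
    bool choice_configured;  /* a split-brain choice is set for this file */
};

enum read_outcome { READ_OK, READ_FAIL_OTHER, READ_FAIL_SPLIT_BRAIN };

/* Decide whether a failed read is a genuine split-brain or something else. */
enum read_outcome classify_refresh(const struct refresh_result *r)
{
    assert(r->err >= 0);            /* errno values are kept positive here */

    if (r->err != 0)
        return READ_FAIL_OTHER;     /* e.g. ENOENT: file removed by rm -rf */
    if (r->readable_copies > 0)
        return READ_OK;             /* a good copy exists, nothing to log */
    if (r->choice_configured)
        return READ_OK;             /* served from the chosen copy instead */
    return READ_FAIL_SPLIT_BRAIN;   /* only now is "split-brain" accurate */
}

/* The errno to hand to the logger: positive, and EIO only for split-brain. */
int errno_to_log(const struct refresh_result *r)
{
    switch (classify_refresh(r)) {
    case READ_FAIL_SPLIT_BRAIN:
        return EIO;
    case READ_FAIL_OTHER:
        return r->err;
    default:
        return 0;
    }
}
```

Under these assumptions, a GETXATTR or STAT on a gfid that rm -rf has already unlinked would be classified as READ_FAIL_OTHER with ENOENT, so no "split-brain observed" line would be emitted for it.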

Comment 4 Shyamsundar 2017-03-06 17:42:42 UTC
This bug is being closed because a release that should address the reported issue is now available. If the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/

