Bug 1294452 - splitbrain error messages are seen in client logs when files are renamed and one of the replica bricks is taken down
Summary: splitbrain error messages are seen in client logs when files are renamed and ...
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Ravishankar N
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard: dht-rca-unknown, dht-rename-file, dht...
Duplicates: 1282378
Depends On:
Blocks:
Reported: 2015-12-28 09:19 UTC by krishnaram Karthick
Modified: 2018-02-14 09:40 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-14 09:40:20 UTC
Embargoed:



Description krishnaram Karthick 2015-12-28 09:19:42 UTC
Description of problem:

On a tiered volume with a 2x2 distributed-replicate cold tier and a 2x3 distributed-replicate hot tier, running I/O such as new file creation and file renames, the following error messages were observed when one brick from each of the cold and hot tiers was taken down:

[2015-12-28 05:16:45.845145] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-bv-1291560-replicate-3: Failing SETATTR on gfid 0c53a08c-1e36-4484-a5ec-e4e430717d49: split-brain observed. [Input/output error]

 - There was no disruption to ongoing I/O or file renames.
 - The md5sum of the file from the mount point and from all bricks that are up returns the same value, which indicates there is no data corruption or split-brain.
 - New writes to the files for which split-brain errors are reported succeed.
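The md5sum comparison above can be scripted so that every copy of a file is checked in one pass. This is a minimal sketch, assuming the file is readable both through the FUSE mount and directly on each brick that is up; the helper names `md5_of` and `digests_match` and the paths passed to them are hypothetical. Note that matching digests only compare file data, not metadata or the AFR changelog extended attributes, so they do not by themselves rule out a metadata split-brain (the failing fop here is SETATTR).

```python
import hashlib
import sys


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digests_match(paths):
    """Hypothetical helper: hash every copy and report whether all agree.

    Returns (all_equal, {path: digest}); an empty path list counts as no match.
    """
    digests = {str(p): md5_of(p) for p in paths}
    return len(set(digests.values())) == 1, digests


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python md5_check.py <mount>/f <brick1>/f <brick2>/f
    ok, digests = digests_match(sys.argv[1:])
    for path, digest in digests.items():
        print(f"{digest}  {path}")
    print("all copies match" if ok else "MISMATCH between copies")
```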

sosreports will be attached shortly

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-13.el7rhgs.x86_64

How reproducible:
Occasionally

Steps to Reproduce:

1. Create a tiered volume with a 2x2 distributed-replicate cold tier and a 2x3 distributed-replicate hot tier. Start the volume and FUSE-mount it.
2. Create files, and after a while rename a few of them.
3. While the renames are in progress, kill one brick process on the hot tier and one on the cold tier.
4. Observe the client logs.
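Steps 2 and 3 can be driven by a small workload script so that the renames are still running when the bricks are killed. This is a minimal sketch, not the reporter's actual workload: the file names, file size, rename suffix and the helpers `create_files` and `rename_files` are assumptions. The brick processes in step 3 still have to be killed by hand, for example by taking their PIDs from 'gluster volume status'.

```python
import os
import sys


def create_files(directory, count, size=4096):
    """Create `count` files named file-<n> in `directory`, each `size` bytes."""
    os.makedirs(directory, exist_ok=True)
    payload = b"x" * size
    names = []
    for n in range(count):
        name = os.path.join(directory, f"file-{n}")
        with open(name, "wb") as f:
            f.write(payload)
        names.append(name)
    return names


def rename_files(names, suffix=".renamed"):
    """Rename each file by appending `suffix`; return the new paths."""
    renamed = []
    for old in names:
        new = old + suffix
        os.rename(old, new)
        renamed.append(new)
    return renamed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python workload.py <fuse-mount>/dir 1000
    target = sys.argv[1]
    count = int(sys.argv[2]) if len(sys.argv) > 2 else 1000
    files = create_files(target, count)
    # Kill one hot-tier and one cold-tier brick by hand while this runs.
    rename_files(files)
```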

Actual results:

Error messages containing 'split-brain observed' are seen in the client logs.

Expected results:
Need to identify whether there is actually a split-brain (from my observation, there is no stale data). If this is a false error, no such error messages should be seen.

Additional info:

Volume info output for the volume under test:

[root@dhcp43-19 fb]# gluster vol info bv-1291560
 
Volume Name: bv-1291560
Type: Tier
Volume ID: 52752a44-fdcc-4704-a76a-f2f2f64c1d2f
Status: Started
Number of Bricks: 10
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 3 = 6
Brick1: 10.70.37.121:/rhs/brick13/leg2
Brick2: 10.70.37.140:/rhs/brick13/leg2
Brick3: 10.70.37.140:/rhs/brick12/leg2
Brick4: 10.70.37.77:/rhs/brick12/leg2
Brick5: 10.70.37.132:/rhs/brick12/leg2
Brick6: 10.70.37.121:/rhs/brick12/leg2
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick7: 10.70.37.121:/rhs/brick11/leg1
Brick8: 10.70.37.132:/rhs/brick11/leg1
Brick9: 10.70.37.77:/rhs/brick11/leg1
Brick10: 10.70.37.140:/rhs/brick11/leg1
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on


Log snippet from the client logs:

[root@dhcp42-214 dd]# grep 'split-brain' /var/log/glusterfs/mnt.log
[2015-12-28 05:13:57.980648] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-bv-1291560-replicate-2: Failing SETATTR on gfid fefb3c7a-595b-412e-ba7c-09a7699ec755: split-brain observed. [Input/output error]
[2015-12-28 05:14:47.983932] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-bv-1291560-replicate-2: Failing SETATTR on gfid 95549e94-a9b3-4c76-bc9a-c3c579caa1ef: split-brain observed. [Input/output error]
[2015-12-28 05:15:36.824196] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-bv-1291560-replicate-3: Failing SETATTR on gfid d2fe9bff-8949-4fa6-8c11-23520b1498a8: split-brain observed. [Input/output error]
[2015-12-28 05:16:45.845145] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-bv-1291560-replicate-3: Failing SETATTR on gfid 0c53a08c-1e36-4484-a5ec-e4e430717d49: split-brain observed. [Input/output error]
[2015-12-28 05:16:56.382201] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-bv-1291560-replicate-3: Failing SETATTR on gfid 93c3f0e8-17c5-4918-8c06-8ad3d3b370af: split-brain observed. [Input/output error]
[2015-12-28 05:17:00.601881] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-bv-1291560-replicate-2: Failing SETATTR on gfid 8b6c2c7b-79ec-420e-9bea-596ef2223c09: split-brain observed. [Input/output error]
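To correlate these messages with 'heal info' output, it helps to list which replicate subvolumes and gfids they name. The following sketch is a hypothetical helper, not part of glusterfs: its regular expression is written only against the MSGID 108008 lines quoted in this report and may not match the message format of other glusterfs versions. It also flags the all-zero (null) gfid separately.

```python
import re
import sys
from collections import Counter

# Matches the AFR "split-brain observed" message (MSGID 108008) quoted here.
SPLIT_BRAIN_RE = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:.]+)\] E \[MSGID: 108008\] \[[^\]]+\] "
    r"(?P<subvol>[^\s:]+): Failing (?P<fop>\w+) on gfid "
    r"(?P<gfid>[0-9a-f-]{36}): split-brain observed\."
)
NULL_GFID = "00000000-0000-0000-0000-000000000000"


def parse(lines):
    """Yield (timestamp, subvolume, fop, gfid) for each matching log line."""
    for line in lines:
        m = SPLIT_BRAIN_RE.match(line)
        if m:
            yield m["ts"], m["subvol"], m["fop"], m["gfid"]


def summarize(lines):
    """Count messages per subvolume and collect the distinct gfids."""
    per_subvol = Counter()
    gfids = set()
    for _ts, subvol, _fop, gfid in parse(lines):
        per_subvol[subvol] += 1
        gfids.add(gfid)
    return per_subvol, gfids


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python sb_log_summary.py /var/log/glusterfs/mnt.log
    with open(sys.argv[1]) as log:
        counts, gfids = summarize(log)
    for subvol, n in sorted(counts.items()):
        print(f"{subvol}: {n}")
    for gfid in sorted(gfids):
        print(gfid, "(null gfid)" if gfid == NULL_GFID else "")
```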

Comment 4 Pranith Kumar K 2016-01-12 15:41:08 UTC
*** Bug 1282378 has been marked as a duplicate of this bug. ***

Comment 5 Rahul Hinduja 2016-02-03 07:43:20 UTC
I see split-brain error messages in the geo-replication slave mount logs, but 'gluster volume heal slave info split-brain' does not list any entries. This is similar to what is described in BZ 1282378, which was marked as a duplicate of this bug, hence updating here.

# Scenario: Create 10k files, create 10k hard links, then remove everything with rm -rf *. No bricks were down.
# Hit occasionally.

Build: glusterfs-3.7.5-18.el7rhgs.x86_64


[2016-02-02 19:18:52.497158] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-slave-replicate-0: Failing SETATTR on gfid 00000000-0000-0000-0000-000000000000: split-brain observed. [Input/output error]
[2016-02-02 19:18:52.498969] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-slave-replicate-3: Failing SETATTR on gfid 00000000-0000-0000-0000-000000000000: split-brain observed. [Input/output error]
^C
[root@dhcp37-52 geo-replication-slaves]# gluster volume heal slave info split-brain
Brick 10.70.37.52:/rhs/brick1/brick0
Number of entries in split-brain: 0

Brick 10.70.37.102:/rhs/brick1/brick1
Number of entries in split-brain: 0

Brick 10.70.37.56:/rhs/brick1/brick2
Number of entries in split-brain: 0

Brick 10.70.37.220:/rhs/brick1/brick3
Number of entries in split-brain: 0

Brick 10.70.37.182:/rhs/brick1/brick4
Number of entries in split-brain: 0

Brick 10.70.37.42:/rhs/brick1/brick5
Number of entries in split-brain: 0

Brick 10.70.37.52:/rhs/brick2/brick6
Number of entries in split-brain: 0

Brick 10.70.37.102:/rhs/brick2/brick7
Number of entries in split-brain: 0

Brick 10.70.37.56:/rhs/brick2/brick8
Number of entries in split-brain: 0

Brick 10.70.37.220:/rhs/brick2/brick9
Number of entries in split-brain: 0

Brick 10.70.37.182:/rhs/brick2/brick10
Number of entries in split-brain: 0

Brick 10.70.37.42:/rhs/brick2/brick11
Number of entries in split-brain: 0

[root@dhcp37-52 geo-replication-slaves]# 



[2016-02-03 01:58:58.317795] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-slave-replicate-2: Failing SETATTR on gfid d9ace461-1d80-4c22-aae9-8dbcdd6d715a: split-brain observed. [Input/output error]
[2016-02-03 01:58:58.317972] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-slave-replicate-5: Failing SETATTR on gfid d9ace461-1d80-4c22-aae9-8dbcdd6d715a: split-brain observed. [Input/output error]
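The 'heal info split-brain' output in this comment can be reduced to a per-brick count so it can be checked automatically against the client-side messages. A minimal sketch, assuming the output format shown here (a 'Brick <host>:<path>' line followed by a 'Number of entries in split-brain: N' line); the function name `split_brain_counts` is hypothetical, and any other lines that a given glusterfs release may print are simply ignored.

```python
import re
import sys

# Patterns for the two line types seen in the 'heal info split-brain' output.
BRICK_RE = re.compile(r"^Brick (?P<brick>\S+)")
COUNT_RE = re.compile(r"^Number of entries in split-brain: (?P<n>\d+)")


def split_brain_counts(lines):
    """Map each brick to the split-brain entry count reported for it."""
    counts = {}
    brick = None
    for line in lines:
        line = line.strip()
        m = BRICK_RE.match(line)
        if m:
            brick = m["brick"]
            continue
        m = COUNT_RE.match(line)
        if m and brick is not None:
            counts[brick] = int(m["n"])
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage:
    #   gluster volume heal slave info split-brain > heal.txt
    #   python sb_counts.py heal.txt
    with open(sys.argv[1]) as f:
        counts = split_brain_counts(f)
    print("total entries in split-brain:", sum(counts.values()))
    for brick, n in counts.items():
        if n:
            print(f"{brick}: {n}")
```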

Comment 6 Raghavendra G 2016-07-04 09:17:58 UTC
Could this be a duplicate of bz 1325760?

Comment 7 Nithya Balachandran 2017-12-22 08:10:57 UTC
Moving this to the AFR team to comment.

