Description of problem:

On a tiered volume (2x2 dis-rep cold tier, 2x3 dis-rep hot tier), with IO operations such as new file creation and file renames, when one brick each from the cold and hot tier was taken down, the following error messages were observed:

[2015-12-28 05:16:45.845145] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-bv-1291560-replicate-3: Failing SETATTR on gfid 0c53a08c-1e36-4484-a5ec-e4e430717d49: split-brain observed. [Input/output error]

- There was no disruption to ongoing IOs or file renames.
- md5sum of the file from the mountpoint and from all bricks (that are up) returns the same value, which means there is no data corruption or split-brain.
- New writes to the files that report split-brain errors are successful.

sosreports will be attached shortly.

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-13.el7rhgs.x86_64

How reproducible:
Occasionally

Steps to Reproduce:
1. Create a tiered volume with a 2x2 dis-rep cold tier and a 2x3 dis-rep hot tier. Start and fuse-mount the volume.
2. Create files and, after a while, rename a few of them.
3. While the renames are in progress, kill one brick process on the hot tier and one on the cold tier.
4. Observe the client logs.

Actual results:
Error messages with 'split brain observed' are seen in the client logs.

Expected results:
Need to identify whether there is actually a split-brain (from my observation there is no stale data). If this is a false error, no such error messages should be seen.

Additional info:
vol info output for the volume under test:
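The steps above could be scripted roughly as follows. This is a sketch, not the reporter's exact commands: the hostnames, brick paths, volume name, mount point, and file counts are placeholders, and the brick PIDs must be looked up manually. The attach-tier syntax is the glusterfs 3.7 CLI form.

```shell
# Cold tier: 2x2 distributed-replicate volume (placeholder hosts/paths)
gluster volume create testvol replica 2 \
    host1:/rhs/brick11/leg1 host2:/rhs/brick11/leg1 \
    host3:/rhs/brick11/leg1 host4:/rhs/brick11/leg1
gluster volume start testvol

# Hot tier: attach a 2x3 distributed-replicate tier
gluster volume attach-tier testvol replica 3 \
    host1:/rhs/brick12/leg2 host2:/rhs/brick12/leg2 host3:/rhs/brick12/leg2 \
    host1:/rhs/brick13/leg2 host2:/rhs/brick13/leg2 host4:/rhs/brick13/leg2

# Fuse-mount and generate a create + rename workload
mount -t glusterfs host1:/testvol /mnt/testvol
cd /mnt/testvol
for i in $(seq 1 1000); do dd if=/dev/urandom of=f$i bs=64k count=4; done
for i in $(seq 1 100); do mv f$i renamed-f$i; done &

# While renames run, kill one brick process on each tier
# (find the PIDs with: gluster volume status testvol)
kill -9 <hot-tier-brick-pid> <cold-tier-brick-pid>

# Check the client log for the reported error
grep 'split-brain' /var/log/glusterfs/mnt-testvol.log
```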
[root@dhcp43-19 fb]# gluster vol info bv-1291560

Volume Name: bv-1291560
Type: Tier
Volume ID: 52752a44-fdcc-4704-a76a-f2f2f64c1d2f
Status: Started
Number of Bricks: 10
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 3 = 6
Brick1: 10.70.37.121:/rhs/brick13/leg2
Brick2: 10.70.37.140:/rhs/brick13/leg2
Brick3: 10.70.37.140:/rhs/brick12/leg2
Brick4: 10.70.37.77:/rhs/brick12/leg2
Brick5: 10.70.37.132:/rhs/brick12/leg2
Brick6: 10.70.37.121:/rhs/brick12/leg2
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick7: 10.70.37.121:/rhs/brick11/leg1
Brick8: 10.70.37.132:/rhs/brick11/leg1
Brick9: 10.70.37.77:/rhs/brick11/leg1
Brick10: 10.70.37.140:/rhs/brick11/leg1
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on

log snippet from client logs:

[root@dhcp42-214 dd]# grep 'split-brain' /var/log/glusterfs/mnt.log
[2015-12-28 05:13:57.980648] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-bv-1291560-replicate-2: Failing SETATTR on gfid fefb3c7a-595b-412e-ba7c-09a7699ec755: split-brain observed. [Input/output error]
[2015-12-28 05:14:47.983932] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-bv-1291560-replicate-2: Failing SETATTR on gfid 95549e94-a9b3-4c76-bc9a-c3c579caa1ef: split-brain observed. [Input/output error]
[2015-12-28 05:15:36.824196] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-bv-1291560-replicate-3: Failing SETATTR on gfid d2fe9bff-8949-4fa6-8c11-23520b1498a8: split-brain observed. [Input/output error]
[2015-12-28 05:16:45.845145] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-bv-1291560-replicate-3: Failing SETATTR on gfid 0c53a08c-1e36-4484-a5ec-e4e430717d49: split-brain observed. [Input/output error]
[2015-12-28 05:16:56.382201] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-bv-1291560-replicate-3: Failing SETATTR on gfid 93c3f0e8-17c5-4918-8c06-8ad3d3b370af: split-brain observed. [Input/output error]
[2015-12-28 05:17:00.601881] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-bv-1291560-replicate-2: Failing SETATTR on gfid 8b6c2c7b-79ec-420e-9bea-596ef2223c09: split-brain observed. [Input/output error]
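As a quick triage aid (not part of the original report), the gfids being flagged can be pulled out of a client log with a short script like the one below. The sample log text is taken from the snippet above; the function name is hypothetical.

```python
import re

# Matches the afr_transaction error lines shown in the client log above
SPLIT_BRAIN_RE = re.compile(
    r"Failing (\w+) on gfid ([0-9a-f-]{36}): split-brain observed"
)

def flagged_gfids(log_text):
    """Return the set of gfids that AFR reported as split-brain."""
    return {m.group(2) for m in SPLIT_BRAIN_RE.finditer(log_text)}

sample = (
    "[2015-12-28 05:13:57.980648] E [MSGID: 108008] "
    "[afr-transaction.c:1981:afr_transaction] 0-bv-1291560-replicate-2: "
    "Failing SETATTR on gfid fefb3c7a-595b-412e-ba7c-09a7699ec755: "
    "split-brain observed. [Input/output error]\n"
    "[2015-12-28 05:15:36.824196] E [MSGID: 108008] "
    "[afr-transaction.c:1981:afr_transaction] 0-bv-1291560-replicate-3: "
    "Failing SETATTR on gfid d2fe9bff-8949-4fa6-8c11-23520b1498a8: "
    "split-brain observed. [Input/output error]\n"
)
print(sorted(flagged_gfids(sample)))
```

The resulting gfids can then be resolved under each brick's .glusterfs directory to compare checksums across replicas, which is how the "md5sum returns the same value" observation above was made.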
*** Bug 1282378 has been marked as a duplicate of this bug. ***
I see split-brain error messages in the geo-replication slave mount logs, but 'heal info split-brain' does not list any entries. This is similar to what is mentioned in BZ 1282378, which is marked as a duplicate of this bug, hence updating here.

Scenario: create 10k files, create 10k hardlinks, and do a remove with rm -rf *. No bricks were down. Able to hit this occasionally.

Build: glusterfs-3.7.5-18.el7rhgs.x86_64

[2016-02-02 19:18:52.497158] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-slave-replicate-0: Failing SETATTR on gfid 00000000-0000-0000-0000-000000000000: split-brain observed. [Input/output error]
[2016-02-02 19:18:52.498969] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-slave-replicate-3: Failing SETATTR on gfid 00000000-0000-0000-0000-000000000000: split-brain observed. [Input/output error]
^C
[root@dhcp37-52 geo-replication-slaves]# gluster volume heal slave info split-brain
Brick 10.70.37.52:/rhs/brick1/brick0
Number of entries in split-brain: 0
Brick 10.70.37.102:/rhs/brick1/brick1
Number of entries in split-brain: 0
Brick 10.70.37.56:/rhs/brick1/brick2
Number of entries in split-brain: 0
Brick 10.70.37.220:/rhs/brick1/brick3
Number of entries in split-brain: 0
Brick 10.70.37.182:/rhs/brick1/brick4
Number of entries in split-brain: 0
Brick 10.70.37.42:/rhs/brick1/brick5
Number of entries in split-brain: 0
Brick 10.70.37.52:/rhs/brick2/brick6
Number of entries in split-brain: 0
Brick 10.70.37.102:/rhs/brick2/brick7
Number of entries in split-brain: 0
Brick 10.70.37.56:/rhs/brick2/brick8
Number of entries in split-brain: 0
Brick 10.70.37.220:/rhs/brick2/brick9
Number of entries in split-brain: 0
Brick 10.70.37.182:/rhs/brick2/brick10
Number of entries in split-brain: 0
Brick 10.70.37.42:/rhs/brick2/brick11
Number of entries in split-brain: 0
[root@dhcp37-52 geo-replication-slaves]#

[2016-02-03 01:58:58.317795] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-slave-replicate-2: Failing SETATTR on gfid d9ace461-1d80-4c22-aae9-8dbcdd6d715a: split-brain observed. [Input/output error]
[2016-02-03 01:58:58.317972] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-slave-replicate-5: Failing SETATTR on gfid d9ace461-1d80-4c22-aae9-8dbcdd6d715a: split-brain observed. [Input/output error]
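The contradiction above (errors in the log, zero entries in heal info) can be checked mechanically. A minimal sketch, assuming the standard heal-info output format shown above; the function name and the truncated sample are illustrative only:

```python
import re

# Per-brick counter lines from `gluster volume heal <vol> info split-brain`
ENTRY_RE = re.compile(r"Number of entries in split-brain: (\d+)")

def total_split_brain_entries(heal_info_output):
    """Sum the per-brick entry counts reported by heal info split-brain."""
    return sum(int(n) for n in ENTRY_RE.findall(heal_info_output))

heal_info = """\
Brick 10.70.37.52:/rhs/brick1/brick0
Number of entries in split-brain: 0
Brick 10.70.37.102:/rhs/brick1/brick1
Number of entries in split-brain: 0
Brick 10.70.37.56:/rhs/brick1/brick2
Number of entries in split-brain: 0
"""
print(total_split_brain_entries(heal_info))  # 0, matching the output above
```

A nonzero total here together with the mount-log errors would indicate a genuine split-brain; zero, as in this comment, points to the errors being spurious.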
Can it be a duplicate of bz 1325760?
Moving this to the AFR team to comment.