Bug 1127362 - [REBALANCE]: Network disconnects during rebalance causes split brains
Summary: [REBALANCE]: Network disconnects during rebalance causes split brains
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Pranith Kumar K
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-08-06 18:27 UTC by shylesh
Modified: 2016-09-17 12:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-03 17:12:12 UTC
Embargoed:



Description shylesh 2014-08-06 18:27:53 UTC
Description of problem:
If the network disconnects while a rebalance is in progress, split-brain occurs.

Version-Release number of selected component (if applicable):
3.4.0.59rhs-1.1.toyota.hotfix.el6rhs.x86_64

How reproducible:
Tried once

Steps to Reproduce:
1. Created a 40-brick distributed-replicate volume.
2. Untarred a kernel source tree on the mount point and calculated the arequal checksum.
3. Added one pair of bricks with add-brick.
4. Started rebalance.
5. While migration was in progress, stopped and started the network service on some of the nodes.
6. After the network came back, glusterd was dead on those nodes.
7. Started glusterd.
8. In the meantime, rebalance status on these nodes showed "failed".
9. Once rebalance completed, started rebalance again so that migration also ran from the remaining nodes (which were down during the first run).
10. Calculated the arequal checksum again and checked the mount logs.
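
For anyone re-running this, steps 3 to 9 can be written down as an ordered command list. This is only a sketch: the function name and the brick arguments are hypothetical, the exact commands used in the original run were not recorded in this report, and the `service` lines assume an el6-style init system. Nothing is executed; the function only builds the strings.

```python
from typing import List


def repro_commands(volume: str, new_pair: List[str]) -> List[str]:
    """Return the commands for steps 3-9 in order (hypothetical helper, runs nothing)."""
    if len(new_pair) != 2:
        raise ValueError("a replica-2 volume is expanded one brick pair at a time")
    return [
        f"gluster volume add-brick {volume} {' '.join(new_pair)}",  # step 3
        f"gluster volume rebalance {volume} start",                 # step 4
        "service network stop",                                     # step 5, on the chosen nodes
        "service network start",
        "service glusterd start",                                   # step 7, where glusterd died
        f"gluster volume rebalance {volume} status",                # step 8
        f"gluster volume rebalance {volume} start",                 # step 9
    ]


if __name__ == "__main__":
    # Placeholder brick names, not the ones used in this report.
    for cmd in repro_commands("shylesh", ["node-a:/brick/new0", "node-b:/brick/new1"]):
        print(cmd)
```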

Actual results:
mount logs
=========
[2014-08-06 17:58:37.483774] E [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 2-shylesh-replicate-18:  metadata self heal  failed,   on /
[2014-08-06 17:58:37.609166] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 2-shylesh-replicate-2: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 1 0 ] ]
[2014-08-06 17:58:37.609391] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 2-shylesh-replicate-6: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 1 0 ] ]
[2014-08-06 17:58:37.609536] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 2-shylesh-replicate-10: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 2 0 ] ]
[2014-08-06 17:58:37.609669] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 2-shylesh-replicate-8: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 1 0 ] ]
[2014-08-06 17:58:37.609781] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 2-shylesh-replicate-12: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 1 0 ] ]
[2014-08-06 17:58:37.610013] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 2-shylesh-replicate-3: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 1 0 ] ]
[2014-08-06 17:58:37.610274] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 2-shylesh-replicate-7: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 1 0 ] ]
[2014-08-06 17:58:37.610757] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 2-shylesh-replicate-4: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 1 0 ] ]
[2014-08-06 17:58:37.610905] E [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 2-shylesh-replicate-6:  metadata self heal  failed,   on /
[2014-08-06 17:58:37.611015] E [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 2-shylesh-replicate-10:  metadata self heal  failed,   on /
[2014-08-06 17:58:37.611147] E [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 2-shylesh-replicate-8:  metadata self heal  failed,   on /
[2014-08-06 17:58:37.611216] E [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 2-shylesh-replicate-12:  metadata self heal  failed,   on /
[2014-08-06 17:58:37.611324] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 2-shylesh-replicate-16: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 1 0 ] ]
[2014-08-06 17:58:37.611440] E [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 2-shylesh-replicate-3:  metadata self heal  failed,   on /
[2014-08-06 17:58:37.611562] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 2-shylesh-replicate-18: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 2 0 ] ]
[2014-08-06 17:58:37.611660] E [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 2-shylesh-replicate-7:  metadata self heal  failed,   on /
[2014-08-06 17:58:37.611830] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 2-shylesh-replicate-5: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 1 0 ] ]
[2014-08-06 17:58:37.612035] E [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 2-shylesh-replicate-2:  metadata self heal  failed,   on /
[2014-08-06 17:58:37.612171] E [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 2-shylesh-replicate-16:  metadata self heal  failed,   on /
[2014-08-06 17:58:37.612279] E [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 2-shylesh-replicate-18:  metadata self heal  failed,   on /
[2014-08-06 17:58:37.612394] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 2-shylesh-replicate-13: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 2 0 ] ]
[2014-08-06 17:58:37.612536] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 2-shylesh-replicate-14: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 1 0 ] ]
[2014-08-06 17:58:37.612666] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 2-shylesh-replicate-15: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 2 0 ] ]
[2014-08-06 17:58:37.612774] E [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 2-shylesh-replicate-4:  metadata self heal  failed,   on /
[2014-08-06 17:58:37.612871] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 2-shylesh-replicate-19: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 1 0 ] ]
[2014-08-06 17:58:37.613019] E [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 2-shylesh-replicate-5:  metadata self heal  failed,   on /
[2014-08-06 17:58:37.613127] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 2-shylesh-replicate-20: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 1 0 ] ]
[2014-08-06 17:58:37.613253] E [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 2-shylesh-replicate-13:  metadata self heal  failed,   on /
[2014-08-06 17:58:37.613343] E [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 2-shylesh-replicate-14:  metadata self heal  failed,   on /
[2014-08-06 17:58:37.613438] E [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 2-shylesh-replicate-15:  metadata self heal  failed,   on /
[2014-08-06 17:58:37.613544] E [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 2-shylesh-replicate-19:  metadata self heal  failed,   on /
[2014-08-06 17:58:37.613743] E [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 2-shylesh-replicate-20:  metadata self heal  failed,   on /
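
The split-brain lines above can be pulled apart mechanically. The sketch below extracts the replicate subvolume, the path and the pending matrix from one log line. The reading of the matrix (row i holds the pending counts brick i records against each brick j, so two non-zero mirrored entries mean each brick blames the other) is my understanding of AFR and should be treated as an assumption; the helper names are made up for illustration.

```python
import re
from typing import List, NamedTuple, Optional

LINE_RE = re.compile(
    r"\d+-(?P<subvol>\S+-replicate-\d+):\s+Unable to self-heal contents of "
    r"'(?P<path>[^']*)' \(possible split-brain\)\..*"
    r"Pending matrix:\s*\[(?P<matrix>.*)\]\s*$"
)


class SplitBrainEntry(NamedTuple):
    subvol: str
    path: str
    matrix: List[List[int]]


def parse_split_brain_line(line: str) -> Optional[SplitBrainEntry]:
    """Parse one afr_sh_print_split_brain_log line; return None for other lines."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    rows = re.findall(r"\[([^\[\]]*)\]", m.group("matrix"))
    matrix = [[int(x) for x in row.split()] for row in rows]
    return SplitBrainEntry(m.group("subvol"), m.group("path"), matrix)


def is_mutual_blame(matrix: List[List[int]]) -> bool:
    """True if some pair of bricks each records pending operations against the other."""
    n = len(matrix)
    return any(
        matrix[i][j] > 0 and matrix[j][i] > 0
        for i in range(n)
        for j in range(i + 1, n)
    )
```

Applied to the lines above, every entry reports '/' with non-zero off-diagonal counts in both directions.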




arequal-checksum mismatch
==========================
BEFORE REBALANCE
----------------
[root@localhost ~]# ./arequal-checksum /shylesh/

Entry counts
Regular files   : 30493
Directories     : 1879
Symbolic links  : 0
Other           : 0
Total           : 32372

Metadata checksums
Regular files   : 478cd9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : f371a4f37e79dc78780c5a6f2284407c
Directories     : 3034929077a144b
Symbolic links  : 0
Other           : 0
Total           : 887eb7b55b87884f



AFTER REBALANCE
---------------
[root@localhost ~]# ./arequal-checksum /shylesh/

Entry counts
Regular files   : 30494
Directories     : 1879
Symbolic links  : 0
Other           : 0
Total           : 32373

Metadata checksums
Regular files   : ee85
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : bbe20a53de1782ef31cb2def2c646fe
Directories     : 36b674264157828
Symbolic links  : 0
Other           : 0
Total           : fbc9f539ab3246f8



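In fact, it shows an increase in the number of regular files (30493 before, 30494 after), and the regular-file metadata and data checksums no longer match.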
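
Comparing the two arequal-checksum runs by eye is error-prone, so here is a small sketch that parses both outputs and lists the fields that differ. It assumes only the layout shown above (a section title on its own line, then `name : value` rows); the function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (section title, field name)


def parse_arequal(text: str) -> Dict[Key, str]:
    """Map (section, field) to the value printed by arequal-checksum."""
    result: Dict[Key, str] = {}
    section: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if ":" in line:
            name, value = (part.strip() for part in line.split(":", 1))
            if section is not None:
                result[(section, name)] = value
        else:
            section = line  # a line without a colon starts a new section
    return result


def diff_arequal(before: str, after: str) -> Dict[Key, Tuple[str, Optional[str]]]:
    """Return the fields whose values differ between the two runs."""
    b, a = parse_arequal(before), parse_arequal(after)
    return {key: (b[key], a.get(key)) for key in b if b[key] != a.get(key)}
```

Feeding it the BEFORE and AFTER outputs from this report flags the regular-file count, the regular-file metadata checksum and the checksums, while the directory count and directory metadata checksum stay the same.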

Uploading the rebalance and mount logs

Comment 2 shylesh 2014-08-06 19:05:35 UTC
Found the duplicate entry:

[root@localhost shylesh]# find linux-2.6.32.63 | grep linux-2.6.32.63/arch/arm/mach-u300/clock.h |xargs ls -l
-rw-rw-r-- 1 root root 1320 Jun 18 18:26 linux-2.6.32.63/arch/arm/mach-u300/clock.h
-rw-rw-r-- 1 root root 1320 Jun 18 18:26 linux-2.6.32.63/arch/arm/mach-u300/clock.h

This file appears twice in the listing from the mount point.
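
To check a whole tree for such duplicates rather than a single file, the listing can be tallied. A minimal sketch, assuming the input is a list of paths such as the output of `find` on the mount point; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def duplicate_paths(paths: Iterable[str]) -> Dict[str, int]:
    """Return every path that is listed more than once, with its count."""
    counts = Counter(p.rstrip("\n") for p in paths)
    return {path: n for path, n in counts.items() if n > 1}
```

On a healthy volume the result should be empty, since a directory listing is expected to return each name once.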

[root@localhost shylesh]# getfattr -n trusted.glusterfs.pathinfo linux-2.6.32.63/arch/arm/mach-u300/clock.h
# file: linux-2.6.32.63/arch/arm/mach-u300/clock.h
trusted.glusterfs.pathinfo="(<DISTRIBUTE:shylesh-dht> (<REPLICATE:shylesh-replicate-2> <POSIX(/brick/shylesh5):localhost.localdomain:/brick/shylesh5/linux-2.6.32.63/arch/arm/mach-u300/clock.h> <POSIX(/brick/shylesh4):localhost.localdomain:/brick/shylesh4/linux-2.6.32.63/arch/arm/mach-u300/clock.h>))"
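
The pathinfo value is easier to compare against the brick listings once it is split into fields. The following sketch parses a value of the shape shown above; the regular expressions are written against this one sample and are an assumption about the general format, and the function name is hypothetical.

```python
import re
from typing import Dict, List

REPLICATE_RE = re.compile(r"<REPLICATE:(?P<subvol>[^>]+)>")
BRICK_RE = re.compile(r"<POSIX\((?P<brick>[^)]*)\):(?P<host>[^:>]+):(?P<path>[^>]+)>")


def parse_pathinfo(value: str) -> Dict[str, List]:
    """Split a trusted.glusterfs.pathinfo value into replicate subvolumes and bricks."""
    return {
        "replicate": REPLICATE_RE.findall(value),
        "bricks": [m.groupdict() for m in BRICK_RE.finditer(value)],
    }
```

For the value above this yields a single replicate subvolume, shylesh-replicate-2, with bricks /brick/shylesh5 and /brick/shylesh4.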

Comment 4 shylesh 2014-08-06 19:22:20 UTC
I can see the file present on more than one subvolume (more than one replica pair in this case):

192.168.12.17
-rw-rw-r-- 2 root root 1320 Jun 18 18:26 /brick/shylesh4/linux-2.6.32.63/arch/arm/mach-u300/clock.h
192.168.12.18
-rw-rw-r-- 2 root root 1320 Jun 18 18:26 /brick/shylesh5/linux-2.6.32.63/arch/arm/mach-u300/clock.h
--
-rw-r--r--  1 root root 9635840 Jul 31 09:14 rpm.tar
192.168.12.73
-rw-rw-r-- 2 root root 1320 Jun 18 18:26 /brick/shylesh40/linux-2.6.32.63/arch/arm/mach-u300/clock.h
192.168.12.74
-rw-rw-r-- 2 root root 1320 Jun 18 18:26 /brick/shylesh41/linux-2.6.32.63/arch/arm/mach-u300/clock.h


Volume Name: shylesh
Type: Distributed-Replicate
Volume ID: 96c5814f-8da0-4fcd-ad32-07135e3aa527
Status: Started
Number of Bricks: 22 x 2 = 44
Transport-type: tcp
Bricks:
Brick1: 192.168.12.13:/brick/shylesh0
Brick2: 192.168.12.14:/brick/shylesh1
Brick3: 192.168.12.15:/brick/shylesh2
Brick4: 192.168.12.16:/brick/shylesh3
Brick5: 192.168.12.17:/brick/shylesh4
Brick6: 192.168.12.18:/brick/shylesh5
Brick7: 192.168.12.19:/brick/shylesh6
Brick8: 192.168.12.22:/brick/shylesh7
Brick9: 192.168.12.23:/brick/shylesh8
Brick10: 192.168.12.24:/brick/shylesh9
Brick11: 192.168.12.25:/brick/shylesh10
Brick12: 192.168.12.26:/brick/shylesh11
Brick13: 192.168.12.27:/brick/shylesh12
Brick14: 192.168.12.28:/brick/shylesh13
Brick15: 192.168.12.29:/brick/shylesh14
Brick16: 192.168.12.32:/brick/shylesh15
Brick17: 192.168.12.33:/brick/shylesh16
Brick18: 192.168.12.34:/brick/shylesh17
Brick19: 192.168.12.35:/brick/shylesh18
Brick20: 192.168.12.36:/brick/shylesh19
Brick21: 192.168.12.37:/brick/shylesh20
Brick22: 192.168.12.38:/brick/shylesh21
Brick23: 192.168.12.39:/brick/shylesh22
Brick24: 192.168.12.42:/brick/shylesh23
Brick25: 192.168.12.43:/brick/shylesh24
Brick26: 192.168.12.44:/brick/shylesh25
Brick27: 192.168.12.45:/brick/shylesh26
Brick28: 192.168.12.46:/brick/shylesh27
Brick29: 192.168.12.47:/brick/shylesh28
Brick30: 192.168.12.48:/brick/shylesh29
Brick31: 192.168.12.49:/brick/shylesh30
Brick32: 192.168.12.62:/brick/shylesh31
Brick33: 192.168.12.63:/brick/shylesh32
Brick34: 192.168.12.64:/brick/shylesh33
Brick35: 192.168.12.65:/brick/shylesh34
Brick36: 192.168.12.66:/brick/shylesh35
Brick37: 192.168.12.67:/brick/shylesh36
Brick38: 192.168.12.68:/brick/shylesh37
Brick39: 192.168.12.69:/brick/shylesh38
Brick40: 192.168.12.72:/brick/shylesh39
Brick41: 192.168.12.73:/brick/shylesh40
Brick42: 192.168.12.74:/brick/shylesh41
Brick43: 192.168.12.75:/brick/shylesh42
Brick44: 192.168.12.76:/brick/shylesh43
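
To relate the brick list to the subvolume names in the logs, the sketch below assumes that consecutive bricks form a replica pair and that translator indexes are zero-based. That assumption matches this report (Brick5 and Brick6 hold /brick/shylesh4 and /brick/shylesh5, which pathinfo places under shylesh-replicate-2, and the xattrs on those bricks are named shylesh-client-4 and shylesh-client-5), but the helpers themselves are hypothetical.

```python
def replica_subvol(volume: str, brick_number: int, replica: int = 2) -> str:
    """Replicate subvolume for a 1-based brick number from `gluster volume info`."""
    if brick_number < 1:
        raise ValueError("brick numbers start at 1")
    return f"{volume}-replicate-{(brick_number - 1) // replica}"


def client_xlator(volume: str, brick_number: int) -> str:
    """Client translator name (as used in trusted.afr.* xattrs) for a brick."""
    if brick_number < 1:
        raise ValueError("brick numbers start at 1")
    return f"{volume}-client-{brick_number - 1}"
```

Under this mapping, Brick41 and Brick42 (192.168.12.73 and 192.168.12.74) form shylesh-replicate-20, the second pair holding clock.h.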

Comment 5 shylesh 2014-08-06 19:32:48 UTC
parent xattrs
===============
[root@gqas003 ~]# ssh root@192.168.12.17 'getfattr -d -m . -e hex /brick/shylesh4/linux-2.6.32.63/arch/arm/mach-u300'
getfattr: Removing leading '/' from absolute path names
# file: brick/shylesh4/linux-2.6.32.63/arch/arm/mach-u300
trusted.afr.shylesh-client-4=0x000000000000000000000000
trusted.afr.shylesh-client-5=0x000000000000000000000000
trusted.gfid=0x7fe698bdfe44486e841c49030d637733
trusted.glusterfs.dht=0x00000001000000001745d17422e8ba2d

 
[root@gqas003 ~]# ssh root@192.168.12.18 'getfattr -d -m . -e hex /brick/shylesh5/linux-2.6.32.63/arch/arm/mach-u300'
getfattr: Removing leading '/' from absolute path names
# file: brick/shylesh5/linux-2.6.32.63/arch/arm/mach-u300
trusted.afr.shylesh-client-4=0x000000000000000000000000
trusted.afr.shylesh-client-5=0x000000000000000000000000
trusted.gfid=0x7fe698bdfe44486e841c49030d637733
trusted.glusterfs.dht=0x00000001000000001745d17422e8ba2d



[root@gqas003 ~]# ssh root@192.168.12.73 'getfattr -d -m . -e hex /brick/shylesh40/linux-2.6.32.63/arch/arm/mach-u300'
# file: brick/shylesh40/linux-2.6.32.63/arch/arm/mach-u300
trusted.gfid=0x7fe698bdfe44486e841c49030d637733
trusted.glusterfs.dht=0x000000010000000022e8ba2e2e8ba2e7

getfattr: Removing leading '/' from absolute path names
[root@gqas003 ~]# ssh root@192.168.12.74 'getfattr -d -m . -e hex /brick/shylesh41/linux-2.6.32.63/arch/arm/mach-u300'
# file: brick/shylesh41/linux-2.6.32.63/arch/arm/mach-u300
trusted.gfid=0x7fe698bdfe44486e841c49030d637733
trusted.glusterfs.dht=0x000000010000000022e8ba2e2e8ba2e7

getfattr: Removing leading '/' from absolute path names
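
The trusted.glusterfs.dht values on the parent directory can be decoded to see which hash range each replica pair owns. This sketch treats the value as four big-endian 32-bit words and reads the last two as the start and end of the range; that reading is my understanding of the DHT on-disk layout and is an assumption, and the first two words are returned without interpretation.

```python
import struct
from typing import Dict


def decode_dht_layout(hex_value: str) -> Dict[str, int]:
    """Decode a 16-byte trusted.glusterfs.dht value printed by `getfattr -e hex`."""
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    word0, word1, start, stop = struct.unpack(">4I", raw)
    return {"word0": word0, "word1": word1, "start": start, "stop": stop}


if __name__ == "__main__":
    for value in (
        "0x00000001000000001745d17422e8ba2d",  # shylesh4 / shylesh5
        "0x000000010000000022e8ba2e2e8ba2e7",  # shylesh40 / shylesh41
    ):
        layout = decode_dht_layout(value)
        print(f"{layout['start']:#010x} - {layout['stop']:#010x}")
```

Read this way, the two pairs hold adjacent, non-overlapping ranges, each 1/22 of the 32-bit hash space, which fits the 22 x 2 volume.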





xattrs from the file
=======================
[root@gqas003 ~]# ssh root@192.168.12.74 'getfattr -d -m . -e hex /brick/shylesh41/linux-2.6.32.63/arch/arm/mach-u300/clock.h'
getfattr: Removing leading '/' from absolute path names
# file: brick/shylesh41/linux-2.6.32.63/arch/arm/mach-u300/clock.h
trusted.afr.shylesh-client-40=0x000000000000000000000000
trusted.afr.shylesh-client-41=0x000000000000000000000000
trusted.gfid=0x4b4afc0e2d524c24a5cfdf583ca5ee0b

 
[root@gqas003 ~]# ssh root@192.168.12.73 'getfattr -d -m . -e hex /brick/shylesh40/linux-2.6.32.63/arch/arm/mach-u300/clock.h'
# file: brick/shylesh40/linux-2.6.32.63/arch/arm/mach-u300/clock.h
trusted.afr.shylesh-client-40=0x000000000000000000000000
trusted.afr.shylesh-client-41=0x000000000000000000000000
trusted.gfid=0x4b4afc0e2d524c24a5cfdf583ca5ee0b



[root@gqas003 ~]# ssh root@192.168.12.18 'getfattr -d -m . -e hex /brick/shylesh5/linux-2.6.32.63/arch/arm/mach-u300/clock.h'
getfattr: Removing leading '/' from absolute path names
# file: brick/shylesh5/linux-2.6.32.63/arch/arm/mach-u300/clock.h
trusted.afr.shylesh-client-4=0x000000000000000000000000
trusted.afr.shylesh-client-5=0x000000000000000000000000
trusted.gfid=0x4b4afc0e2d524c24a5cfdf583ca5ee0b


[root@gqas003 ~]# ssh root@192.168.12.17 'getfattr -d -m . -e hex /brick/shylesh4/linux-2.6.32.63/arch/arm/mach-u300/clock.h'
getfattr: Removing leading '/' from absolute path names
# file: brick/shylesh4/linux-2.6.32.63/arch/arm/mach-u300/clock.h
trusted.afr.shylesh-client-4=0x000000000000000000000000
trusted.afr.shylesh-client-5=0x000000000000000000000000
trusted.gfid=0x4b4afc0e2d524c24a5cfdf583ca5ee0b
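
The file xattrs above can be decoded the same way. The sketch below assumes each trusted.afr.* value is three big-endian 32-bit counters for pending data, metadata and entry operations (my understanding of the AFR changelog format, so treat it as an assumption) and formats the gfid as a UUID for comparison across bricks. Function names are hypothetical.

```python
import struct
import uuid
from typing import Dict


def _hex_to_bytes(hex_value: str) -> bytes:
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    return bytes.fromhex(digits)


def decode_afr_changelog(hex_value: str) -> Dict[str, int]:
    """Decode a 12-byte trusted.afr.<client> value into its three counters."""
    raw = _hex_to_bytes(hex_value)
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    data, metadata, entry = struct.unpack(">3I", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def format_gfid(hex_value: str) -> str:
    """Render a trusted.gfid value in the usual UUID form."""
    return str(uuid.UUID(bytes=_hex_to_bytes(hex_value)))
```

For clock.h, all four copies decode to zero pending counters and the same gfid, so the xattrs alone do not distinguish the copy on replicate-2 from the one on replicate-20.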

Comment 6 Pranith Kumar K 2014-08-07 06:16:38 UTC
Shylesh,
     Do you still have the setup? I see that the following entries have gone into split-brain. Are they all in metadata split-brain? Could you also update the bug with the nodes you took down: which nodes' network interfaces were taken down, in what order, and in what order they were brought back up?

    714 1-shylesh-replicate-10: '/'
    587 1-shylesh-replicate-10: '/linux-2.6.32.63'
     19 1-shylesh-replicate-10: '/linux-2.6.32.63/Documentation'
      3 1-shylesh-replicate-10: '/linux-2.6.32.63/Documentation/ABI'
      2 1-shylesh-replicate-10: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      2 1-shylesh-replicate-10: '/linux-2.6.32.63/Documentation/ABI/removed'
      3 1-shylesh-replicate-10: '/linux-2.6.32.63/Documentation/ABI/stable'
      2 1-shylesh-replicate-10: '/linux-2.6.32.63/Documentation/ABI/testing'
      4 1-shylesh-replicate-10: '/linux-2.6.32.63/Documentation/DocBook'
      2 1-shylesh-replicate-10: '/linux-2.6.32.63/Documentation/DocBook/dvb'
      2 1-shylesh-replicate-10: '/linux-2.6.32.63/Documentation/DocBook/v4l'
    714 1-shylesh-replicate-12: '/'
    587 1-shylesh-replicate-12: '/linux-2.6.32.63'
     18 1-shylesh-replicate-12: '/linux-2.6.32.63/Documentation'
      2 1-shylesh-replicate-12: '/linux-2.6.32.63/Documentation/ABI'
    714 1-shylesh-replicate-13: '/'
    587 1-shylesh-replicate-13: '/linux-2.6.32.63'
     18 1-shylesh-replicate-13: '/linux-2.6.32.63/Documentation'
      2 1-shylesh-replicate-13: '/linux-2.6.32.63/Documentation/ABI'
      2 1-shylesh-replicate-13: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      2 1-shylesh-replicate-13: '/linux-2.6.32.63/Documentation/ABI/removed'
      2 1-shylesh-replicate-13: '/linux-2.6.32.63/Documentation/ABI/stable'
      2 1-shylesh-replicate-13: '/linux-2.6.32.63/Documentation/ABI/testing'
    714 1-shylesh-replicate-14: '/'
    587 1-shylesh-replicate-14: '/linux-2.6.32.63'
    714 1-shylesh-replicate-15: '/'
    587 1-shylesh-replicate-15: '/linux-2.6.32.63'
     18 1-shylesh-replicate-15: '/linux-2.6.32.63/Documentation'
      2 1-shylesh-replicate-15: '/linux-2.6.32.63/Documentation/ABI'
      2 1-shylesh-replicate-15: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      2 1-shylesh-replicate-15: '/linux-2.6.32.63/Documentation/ABI/removed'
      2 1-shylesh-replicate-15: '/linux-2.6.32.63/Documentation/ABI/stable'
      2 1-shylesh-replicate-15: '/linux-2.6.32.63/Documentation/ABI/testing'
      4 1-shylesh-replicate-15: '/linux-2.6.32.63/Documentation/DocBook'
    714 1-shylesh-replicate-16: '/'
    587 1-shylesh-replicate-16: '/linux-2.6.32.63'
     18 1-shylesh-replicate-16: '/linux-2.6.32.63/Documentation'
      2 1-shylesh-replicate-16: '/linux-2.6.32.63/Documentation/ABI'
      2 1-shylesh-replicate-16: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      2 1-shylesh-replicate-16: '/linux-2.6.32.63/Documentation/ABI/removed'
      2 1-shylesh-replicate-16: '/linux-2.6.32.63/Documentation/ABI/stable'
      2 1-shylesh-replicate-16: '/linux-2.6.32.63/Documentation/ABI/testing'
      4 1-shylesh-replicate-16: '/linux-2.6.32.63/Documentation/DocBook'
      2 1-shylesh-replicate-16: '/linux-2.6.32.63/Documentation/DocBook/dvb'
      2 1-shylesh-replicate-16: '/linux-2.6.32.63/Documentation/DocBook/v4l'
    714 1-shylesh-replicate-18: '/'
    587 1-shylesh-replicate-18: '/linux-2.6.32.63'
     18 1-shylesh-replicate-18: '/linux-2.6.32.63/Documentation'
      2 1-shylesh-replicate-18: '/linux-2.6.32.63/Documentation/ABI'
      2 1-shylesh-replicate-18: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      2 1-shylesh-replicate-18: '/linux-2.6.32.63/Documentation/ABI/removed'
      2 1-shylesh-replicate-18: '/linux-2.6.32.63/Documentation/ABI/stable'
      3 1-shylesh-replicate-18: '/linux-2.6.32.63/Documentation/ABI/testing'
    714 1-shylesh-replicate-19: '/'
    587 1-shylesh-replicate-19: '/linux-2.6.32.63'
     18 1-shylesh-replicate-19: '/linux-2.6.32.63/Documentation'
    714 1-shylesh-replicate-2: '/'
    714 1-shylesh-replicate-20: '/'
    589 1-shylesh-replicate-20: '/linux-2.6.32.63'
     18 1-shylesh-replicate-20: '/linux-2.6.32.63/Documentation'
      2 1-shylesh-replicate-20: '/linux-2.6.32.63/Documentation/ABI'
      2 1-shylesh-replicate-20: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      2 1-shylesh-replicate-20: '/linux-2.6.32.63/Documentation/ABI/removed'
    587 1-shylesh-replicate-2: '/linux-2.6.32.63'
     18 1-shylesh-replicate-2: '/linux-2.6.32.63/Documentation'
      2 1-shylesh-replicate-2: '/linux-2.6.32.63/Documentation/ABI'
      2 1-shylesh-replicate-2: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      2 1-shylesh-replicate-2: '/linux-2.6.32.63/Documentation/ABI/removed'
      2 1-shylesh-replicate-2: '/linux-2.6.32.63/Documentation/ABI/stable'
      2 1-shylesh-replicate-2: '/linux-2.6.32.63/Documentation/ABI/testing'
      4 1-shylesh-replicate-2: '/linux-2.6.32.63/Documentation/DocBook'
    714 1-shylesh-replicate-3: '/'
    587 1-shylesh-replicate-3: '/linux-2.6.32.63'
     18 1-shylesh-replicate-3: '/linux-2.6.32.63/Documentation'
      2 1-shylesh-replicate-3: '/linux-2.6.32.63/Documentation/ABI'
      2 1-shylesh-replicate-3: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      2 1-shylesh-replicate-3: '/linux-2.6.32.63/Documentation/ABI/removed'
      2 1-shylesh-replicate-3: '/linux-2.6.32.63/Documentation/ABI/stable'
      2 1-shylesh-replicate-3: '/linux-2.6.32.63/Documentation/ABI/testing'
    715 1-shylesh-replicate-4: '/'
    587 1-shylesh-replicate-4: '/linux-2.6.32.63'
     18 1-shylesh-replicate-4: '/linux-2.6.32.63/Documentation'
      2 1-shylesh-replicate-4: '/linux-2.6.32.63/Documentation/ABI'
      2 1-shylesh-replicate-4: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      3 1-shylesh-replicate-4: '/linux-2.6.32.63/Documentation/ABI/removed'
      2 1-shylesh-replicate-4: '/linux-2.6.32.63/Documentation/ABI/stable'
      2 1-shylesh-replicate-4: '/linux-2.6.32.63/Documentation/ABI/testing'
      4 1-shylesh-replicate-4: '/linux-2.6.32.63/Documentation/DocBook'
    714 1-shylesh-replicate-5: '/'
    587 1-shylesh-replicate-5: '/linux-2.6.32.63'
     18 1-shylesh-replicate-5: '/linux-2.6.32.63/Documentation'
      2 1-shylesh-replicate-5: '/linux-2.6.32.63/Documentation/ABI'
      2 1-shylesh-replicate-5: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      2 1-shylesh-replicate-5: '/linux-2.6.32.63/Documentation/ABI/removed'
      2 1-shylesh-replicate-5: '/linux-2.6.32.63/Documentation/ABI/stable'
      2 1-shylesh-replicate-5: '/linux-2.6.32.63/Documentation/ABI/testing'
    714 1-shylesh-replicate-6: '/'
    587 1-shylesh-replicate-6: '/linux-2.6.32.63'
     18 1-shylesh-replicate-6: '/linux-2.6.32.63/Documentation'
      2 1-shylesh-replicate-6: '/linux-2.6.32.63/Documentation/ABI'
      2 1-shylesh-replicate-6: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      2 1-shylesh-replicate-6: '/linux-2.6.32.63/Documentation/ABI/removed'
      2 1-shylesh-replicate-6: '/linux-2.6.32.63/Documentation/ABI/stable'
      2 1-shylesh-replicate-6: '/linux-2.6.32.63/Documentation/ABI/testing'
      4 1-shylesh-replicate-6: '/linux-2.6.32.63/Documentation/DocBook'
      2 1-shylesh-replicate-6: '/linux-2.6.32.63/Documentation/DocBook/dvb'
    714 1-shylesh-replicate-7: '/'
    587 1-shylesh-replicate-7: '/linux-2.6.32.63'
     18 1-shylesh-replicate-7: '/linux-2.6.32.63/Documentation'
      2 1-shylesh-replicate-7: '/linux-2.6.32.63/Documentation/ABI'
      2 1-shylesh-replicate-7: '/linux-2.6.32.63/Documentation/ABI/obsolete'
    714 1-shylesh-replicate-8: '/'
    587 1-shylesh-replicate-8: '/linux-2.6.32.63'
     18 1-shylesh-replicate-8: '/linux-2.6.32.63/Documentation'
      2 1-shylesh-replicate-8: '/linux-2.6.32.63/Documentation/ABI'
      3 1-shylesh-replicate-8: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      2 1-shylesh-replicate-8: '/linux-2.6.32.63/Documentation/ABI/removed'
      2 1-shylesh-replicate-8: '/linux-2.6.32.63/Documentation/ABI/stable'
      2 1-shylesh-replicate-8: '/linux-2.6.32.63/Documentation/ABI/testing'
      4 1-shylesh-replicate-8: '/linux-2.6.32.63/Documentation/DocBook'
    925 2-shylesh-replicate-10: '/'
    677 2-shylesh-replicate-10: '/linux-2.6.32.63'
     20 2-shylesh-replicate-10: '/linux-2.6.32.63/Documentation'
      4 2-shylesh-replicate-10: '/linux-2.6.32.63/Documentation/ABI'
      3 2-shylesh-replicate-10: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      3 2-shylesh-replicate-10: '/linux-2.6.32.63/Documentation/ABI/removed'
      3 2-shylesh-replicate-10: '/linux-2.6.32.63/Documentation/ABI/stable'
      3 2-shylesh-replicate-10: '/linux-2.6.32.63/Documentation/ABI/testing'
      7 2-shylesh-replicate-10: '/linux-2.6.32.63/Documentation/DocBook'
      3 2-shylesh-replicate-10: '/linux-2.6.32.63/Documentation/DocBook/dvb'
      3 2-shylesh-replicate-10: '/linux-2.6.32.63/Documentation/DocBook/v4l'
    926 2-shylesh-replicate-12: '/'
    677 2-shylesh-replicate-12: '/linux-2.6.32.63'
     20 2-shylesh-replicate-12: '/linux-2.6.32.63/Documentation'
      4 2-shylesh-replicate-12: '/linux-2.6.32.63/Documentation/ABI'
    926 2-shylesh-replicate-13: '/'
    677 2-shylesh-replicate-13: '/linux-2.6.32.63'
     20 2-shylesh-replicate-13: '/linux-2.6.32.63/Documentation'
      4 2-shylesh-replicate-13: '/linux-2.6.32.63/Documentation/ABI'
      3 2-shylesh-replicate-13: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      3 2-shylesh-replicate-13: '/linux-2.6.32.63/Documentation/ABI/removed'
      3 2-shylesh-replicate-13: '/linux-2.6.32.63/Documentation/ABI/stable'
      3 2-shylesh-replicate-13: '/linux-2.6.32.63/Documentation/ABI/testing'
    926 2-shylesh-replicate-14: '/'
    677 2-shylesh-replicate-14: '/linux-2.6.32.63'
    925 2-shylesh-replicate-15: '/'
    677 2-shylesh-replicate-15: '/linux-2.6.32.63'
     20 2-shylesh-replicate-15: '/linux-2.6.32.63/Documentation'
      4 2-shylesh-replicate-15: '/linux-2.6.32.63/Documentation/ABI'
      3 2-shylesh-replicate-15: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      3 2-shylesh-replicate-15: '/linux-2.6.32.63/Documentation/ABI/removed'
      3 2-shylesh-replicate-15: '/linux-2.6.32.63/Documentation/ABI/stable'
      4 2-shylesh-replicate-15: '/linux-2.6.32.63/Documentation/ABI/testing'
      6 2-shylesh-replicate-15: '/linux-2.6.32.63/Documentation/DocBook'
    926 2-shylesh-replicate-16: '/'
    677 2-shylesh-replicate-16: '/linux-2.6.32.63'
     20 2-shylesh-replicate-16: '/linux-2.6.32.63/Documentation'
      4 2-shylesh-replicate-16: '/linux-2.6.32.63/Documentation/ABI'
      3 2-shylesh-replicate-16: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      3 2-shylesh-replicate-16: '/linux-2.6.32.63/Documentation/ABI/removed'
      3 2-shylesh-replicate-16: '/linux-2.6.32.63/Documentation/ABI/stable'
      3 2-shylesh-replicate-16: '/linux-2.6.32.63/Documentation/ABI/testing'
      6 2-shylesh-replicate-16: '/linux-2.6.32.63/Documentation/DocBook'
      3 2-shylesh-replicate-16: '/linux-2.6.32.63/Documentation/DocBook/dvb'
      3 2-shylesh-replicate-16: '/linux-2.6.32.63/Documentation/DocBook/v4l'
    926 2-shylesh-replicate-18: '/'
    679 2-shylesh-replicate-18: '/linux-2.6.32.63'
     20 2-shylesh-replicate-18: '/linux-2.6.32.63/Documentation'
      4 2-shylesh-replicate-18: '/linux-2.6.32.63/Documentation/ABI'
      3 2-shylesh-replicate-18: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      3 2-shylesh-replicate-18: '/linux-2.6.32.63/Documentation/ABI/removed'
      3 2-shylesh-replicate-18: '/linux-2.6.32.63/Documentation/ABI/stable'
      3 2-shylesh-replicate-18: '/linux-2.6.32.63/Documentation/ABI/testing'
    926 2-shylesh-replicate-19: '/'
    677 2-shylesh-replicate-19: '/linux-2.6.32.63'
     20 2-shylesh-replicate-19: '/linux-2.6.32.63/Documentation'
      4 2-shylesh-replicate-1: '/linux-2.6.32.63/arch/arm/mach-s3c6410/cpu.c'
    923 2-shylesh-replicate-2: '/'
    925 2-shylesh-replicate-20: '/'
    677 2-shylesh-replicate-20: '/linux-2.6.32.63'
     20 2-shylesh-replicate-20: '/linux-2.6.32.63/Documentation'
      4 2-shylesh-replicate-20: '/linux-2.6.32.63/Documentation/ABI'
      3 2-shylesh-replicate-20: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      3 2-shylesh-replicate-20: '/linux-2.6.32.63/Documentation/ABI/removed'
    677 2-shylesh-replicate-2: '/linux-2.6.32.63'
      4 2-shylesh-replicate-2: '/linux-2.6.32.63/arch/arm/mach-u300/Makefile'
      4 2-shylesh-replicate-2: '/linux-2.6.32.63/arch/arm/mach-u300/padmux.c'
      4 2-shylesh-replicate-2: '/linux-2.6.32.63/arch/arm/mach-u300/spi.h'
      4 2-shylesh-replicate-2: '/linux-2.6.32.63/arch/arm/plat-mxc/include/mach/iomux-mx27.h'
      4 2-shylesh-replicate-2: '/linux-2.6.32.63/arch/avr32/boards/atstk1000/setup.c'
     20 2-shylesh-replicate-2: '/linux-2.6.32.63/Documentation'
      4 2-shylesh-replicate-2: '/linux-2.6.32.63/Documentation/ABI'
      3 2-shylesh-replicate-2: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      3 2-shylesh-replicate-2: '/linux-2.6.32.63/Documentation/ABI/removed'
      3 2-shylesh-replicate-2: '/linux-2.6.32.63/Documentation/ABI/stable'
      3 2-shylesh-replicate-2: '/linux-2.6.32.63/Documentation/ABI/testing'
      6 2-shylesh-replicate-2: '/linux-2.6.32.63/Documentation/DocBook'
    926 2-shylesh-replicate-3: '/'
    677 2-shylesh-replicate-3: '/linux-2.6.32.63'
     20 2-shylesh-replicate-3: '/linux-2.6.32.63/Documentation'
      4 2-shylesh-replicate-3: '/linux-2.6.32.63/Documentation/ABI'
      3 2-shylesh-replicate-3: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      3 2-shylesh-replicate-3: '/linux-2.6.32.63/Documentation/ABI/removed'
      3 2-shylesh-replicate-3: '/linux-2.6.32.63/Documentation/ABI/stable'
      3 2-shylesh-replicate-3: '/linux-2.6.32.63/Documentation/ABI/testing'
    926 2-shylesh-replicate-4: '/'
    677 2-shylesh-replicate-4: '/linux-2.6.32.63'
     20 2-shylesh-replicate-4: '/linux-2.6.32.63/Documentation'
      4 2-shylesh-replicate-4: '/linux-2.6.32.63/Documentation/ABI'
      4 2-shylesh-replicate-4: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      3 2-shylesh-replicate-4: '/linux-2.6.32.63/Documentation/ABI/removed'
      3 2-shylesh-replicate-4: '/linux-2.6.32.63/Documentation/ABI/stable'
      3 2-shylesh-replicate-4: '/linux-2.6.32.63/Documentation/ABI/testing'
      6 2-shylesh-replicate-4: '/linux-2.6.32.63/Documentation/DocBook'
    926 2-shylesh-replicate-5: '/'
    677 2-shylesh-replicate-5: '/linux-2.6.32.63'
     20 2-shylesh-replicate-5: '/linux-2.6.32.63/Documentation'
      4 2-shylesh-replicate-5: '/linux-2.6.32.63/Documentation/ABI'
      3 2-shylesh-replicate-5: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      3 2-shylesh-replicate-5: '/linux-2.6.32.63/Documentation/ABI/removed'
      3 2-shylesh-replicate-5: '/linux-2.6.32.63/Documentation/ABI/stable'
      3 2-shylesh-replicate-5: '/linux-2.6.32.63/Documentation/ABI/testing'
    926 2-shylesh-replicate-6: '/'
    677 2-shylesh-replicate-6: '/linux-2.6.32.63'
     20 2-shylesh-replicate-6: '/linux-2.6.32.63/Documentation'
      4 2-shylesh-replicate-6: '/linux-2.6.32.63/Documentation/ABI'
      3 2-shylesh-replicate-6: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      3 2-shylesh-replicate-6: '/linux-2.6.32.63/Documentation/ABI/removed'
      3 2-shylesh-replicate-6: '/linux-2.6.32.63/Documentation/ABI/stable'
      3 2-shylesh-replicate-6: '/linux-2.6.32.63/Documentation/ABI/testing'
      6 2-shylesh-replicate-6: '/linux-2.6.32.63/Documentation/DocBook'
      3 2-shylesh-replicate-6: '/linux-2.6.32.63/Documentation/DocBook/dvb'
    926 2-shylesh-replicate-7: '/'
    677 2-shylesh-replicate-7: '/linux-2.6.32.63'
     20 2-shylesh-replicate-7: '/linux-2.6.32.63/Documentation'
      4 2-shylesh-replicate-7: '/linux-2.6.32.63/Documentation/ABI'
      3 2-shylesh-replicate-7: '/linux-2.6.32.63/Documentation/ABI/obsolete'
    926 2-shylesh-replicate-8: '/'
    677 2-shylesh-replicate-8: '/linux-2.6.32.63'
     20 2-shylesh-replicate-8: '/linux-2.6.32.63/Documentation'
      4 2-shylesh-replicate-8: '/linux-2.6.32.63/Documentation/ABI'
      3 2-shylesh-replicate-8: '/linux-2.6.32.63/Documentation/ABI/obsolete'
      3 2-shylesh-replicate-8: '/linux-2.6.32.63/Documentation/ABI/removed'
      4 2-shylesh-replicate-8: '/linux-2.6.32.63/Documentation/ABI/stable'
      3 2-shylesh-replicate-8: '/linux-2.6.32.63/Documentation/ABI/testing'
      6 2-shylesh-replicate-8: '/linux-2.6.32.63/Documentation/DocBook'
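
A tally of this shape (count, then graph-prefixed subvolume and path) can be rebuilt from the mount log. How the list above was actually produced is not stated in this report; the sketch below is one way to get the same layout, counting only the "Unable to self-heal contents of" lines. The function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

ENTRY_RE = re.compile(
    r"(\d+-\S+-replicate-\d+):\s+Unable to self-heal contents of ('[^']*')"
)


def tally_split_brain(lines: Iterable[str]) -> Counter:
    """Count split-brain messages per '<graph>-<subvolume>: <path>' key."""
    counts: Counter = Counter()
    for line in lines:
        m = ENTRY_RE.search(line)
        if m:
            counts[f"{m.group(1)}: {m.group(2)}"] += 1
    return counts


def format_tally(counts: Counter) -> str:
    """Render the tally with right-aligned counts, sorted by key."""
    return "\n".join(f"{n:7d} {key}" for key, n in sorted(counts.items()))
```

Keeping the leading graph number in the key separates messages logged before and after a graph switch, which is why the same subvolume shows up above with both a `1-` and a `2-` prefix.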

Comment 7 shylesh 2014-08-07 06:54:49 UTC
192.168.12.14, 192.168.12.16 and 192.168.12.18 are the nodes on which the network was brought down. I don't remember the exact order in which the nodes were brought back.
The setup is only partially available because somebody accidentally screwed up my hypervisor.


split-brain logs
===================
[2014-08-06 17:58:06.854872] E [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 2-shylesh-replicate-16:  metadata self heal  failed,   on /
[2014-08-06 17:58:06.854972] E [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 2-shylesh-replicate-15:  metadata self heal  failed,   on /
[2014-08-06 17:58:06.855053] E [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 2-shylesh-replicate-18:  metadata self heal  failed,   on /
[2014-08-06 17:58:06.855148] E [afr-self-heal-common.c:2906:afr_


[2014-08-06 17:58:05.805057] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 2-shylesh-replicate-6: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 1 0 ] ]
[2014-08-06 17:58:05.805202] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 2-shylesh-replicate-7: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 1 0 ] ]
[2014-08-06 17:58:05.805416] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 2-shylesh-replicate-3: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 1 0 ] ]

Comment 8 Pranith Kumar K 2014-08-07 10:20:44 UTC
Shylesh,
     Is there at least one entry from the list I gave for which you can tell what kind of split-brain it is?

Pranith

Comment 9 Susant Kumar Palai 2015-09-28 12:25:08 UTC
Changing the component to AFR as there is "split brain" in the bug description.

Comment 10 Susant Kumar Palai 2015-11-27 11:34:34 UTC
Component is gluster-afr, so removing zteam from devel whiteboard.

Comment 11 Vivek Agarwal 2015-12-03 17:12:12 UTC
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release against which you asked us to review it is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.

