Split brain occurred when one of the servers was brought down and then brought back up.

Situation: AFR had completed the xattrop on the changelog extended attribute of the file corresponding to one server, but the xattrop on the changelog corresponding to the other server was still pending. That server was brought down before the pending xattrop could reach it, so it still recorded the operation as pending on the other subvolume. Further changes to the file on the mount point then happened only on the server that remained up, which updated its xattrs on the file to indicate that operations were pending on the down subvolume. When the down server was brought back up, each replica accused the other, leading to a split-brain situation:

[2011-08-22 15:13:23.132624] I [afr-inode-write.c:340:afr_trigger_open_fd_self_heal] 0-mirror-replicate-0: data missing-entry gfid self-heal triggered. path: /passwd, reason: Replicate up down flush, data lock is held
[2011-08-22 15:13:23.154583] I [afr-common.c:1225:afr_launch_self_heal] 0-mirror-replicate-0: background data missing-entry gfid self-heal triggered. path: /passwd
[2011-08-22 15:13:23.461168] I [afr-self-heal-common.c:1210:afr_sh_missing_entries_lookup_done] 0-mirror-replicate-0: No sources for dir of /passwd, in missing entry self-heal, continuing with the rest of the self-heals
[2011-08-22 15:13:23.499398] E [afr-self-heal-data.c:683:afr_sh_data_fix] 0-mirror-replicate-0: Unable to self-heal contents of '/passwd' (possible split-brain). Please delete the file from all but the preferred subvolume.
[2011-08-22 15:13:23.499541] E [afr-self-heal-common.c:2019:afr_self_heal_completion_cbk] 0-mirror-replicate-0: background data missing-entry gfid self-heal failed on /passwd
[2011-08-22 15:13:24.160815] W [fuse-bridge.c:184:fuse_entry_cbk] 0-glusterfs-fuse: 5324247: LOOKUP() /passwd => -1 (Input/output error)
[2011-08-22 15:13:24.161618] W [fuse-bridge.c:184:fuse_entry_cbk] 0-glusterfs-fuse: 5324252: LOOKUP() /passwd => -1 (Input/output error)
[2011-08-22 15:15:39.128152] W [fuse-bridge.c:184:fuse_entry_cbk] 0-glusterfs-fuse: 5338631: LOOKUP() /passwd => -1 (Input/output error)
[2011-08-22 15:15:39.136005] W [fuse-bridge.c:184:fuse_entry_cbk] 0-glusterfs-fuse: 5338633: LOOKUP() /passwd => -1 (Input/output error)
[2011-08-22 15:15:39.215522] W [fuse-bridge.c:184:fuse_entry_cbk] 0-glusterfs-fuse: 5338634: LOOKUP() /passwd => -1 (Input/output error)
[2011-08-22 15:15:40.264977] W [fuse-bridge.c:184:fuse_entry_cbk] 0-glusterfs-fuse: 5338643: LOOKUP() /passwd => -1 (Input/output error)
[2011-08-22 15:15:40.313234] W [fuse-bridge.c:184:fuse_entry_cbk] 0-glusterfs-fuse: 5338661: LOOKUP() /passwd => -1 (Input/output error)
[2011-08-22 15:15:40.646996] W [fuse-bridge.c:184:fuse_entry_cbk] 0-glusterfs-fuse: 5338711: LOOKUP() /passwd => -1 (Input/output error)
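The sequence above can be sketched with a toy model (illustrative only; the names and data structures here are invented, not GlusterFS internals). Each replica brick keeps a pending-operation counter "accusing" the other brick; AFR's pre-op raises the counters and a successful post-op lowers them again:

```python
# Toy model of AFR pending-changelog accounting (hypothetical names).
accuses = {"A": {"B": 0}, "B": {"A": 0}}   # brick -> {other brick: pending ops}

def pre_op():
    # Before a write, each brick records "an op may be pending on the other".
    accuses["A"]["B"] += 1
    accuses["B"]["A"] += 1

def post_op(reached):
    # After the write, the decrement lands only on bricks the post-op reached.
    for brick in reached:
        other = "B" if brick == "A" else "A"
        accuses[brick][other] -= 1

def is_split_brain():
    # Each replica accuses the other, so self-heal has no source.
    return accuses["A"]["B"] > 0 and accuses["B"]["A"] > 0

pre_op()
post_op(reached=["A"])        # brick B went down before its post-op landed:
                              # B still accuses A
accuses["A"]["B"] += 1        # further writes reach only A: A now accuses B

print(is_split_brain())       # both accuse each other -> possible split-brain
```

This is the state the self-heal logs report: neither copy can be chosen as the source, so lookups fail with EIO until the administrator removes the file from all but the preferred subvolume.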
For this bug to hit, the brick process needs to die exactly after the changelog for itself has been decremented to zero and before the pending changelog for the other subvolume has been decremented to zero. This is a corner case that is very difficult to hit, so it is not a blocker.
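The narrowness of that window can be expressed directly (a sketch with invented names, not GlusterFS code): on the dying brick, the file carries one changelog counter for itself and one for the other subvolume, and the crash must land exactly between the two post-op decrements.

```python
# Hypothetical predicate for the corner-case window described above.
def hits_bug_window(own_changelog, pending_on_other):
    # The brick must die after its own changelog reached zero, but while
    # it still records operations as pending on the other subvolume.
    return own_changelog == 0 and pending_on_other > 0

assert not hits_bug_window(1, 1)   # crash before any decrement: not this window
assert hits_bug_window(0, 1)       # crash between the two decrements: the bug
assert not hits_bug_window(0, 0)   # post-op fully completed: nothing pending
```

Only the middle state triggers the bug, which is why it takes an exactly timed crash to reproduce.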
Agree and removing blocker flag as discussed.
CHANGE: http://review.gluster.com/3149 (cluster/afr: increment change log with correct byte order) merged in master by Vijay Bellur (vijay)
Not a blocker. Removing blocker flag.
CHANGE: http://review.gluster.com/3226 (cluster/afr: Enforce order in pre/post op) merged in master by Anand Avati (avati)
Checked with glusterfs-3.3.0qa43. Brought down the brick many times while tests were running on the mount point, and the bug did not occur.