Bug 787653 - Split brain upon running sanity
Summary: Split brain upon running sanity
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 817967
Reported: 2012-02-06 12:30 UTC by shylesh
Modified: 2015-12-01 16:45 UTC

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:20:20 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: 3.3.0qa41
Embargoed:


Attachments
split brain (281.83 KB, application/x-gzip)
2012-02-06 12:30 UTC, shylesh

Description shylesh 2012-02-06 12:30:23 UTC
Created attachment 559625
split brain

Description of problem:
The mount logs show split-brain messages while running the sanity tests on a stripe-replicate FUSE mount.

Version-Release number of selected component (if applicable):
Mainline

How reproducible:


Steps to Reproduce:
1. Created a 2x2 stripe-replicate volume.
2. Ran the sanity tests.

  
Actual results:
After some time, the mount point reported "transport endpoint not connected" and the tests failed.

Expected results:


Additional info:
Attached the logs
===================

[2012-02-06 04:49:42.811130] I [afr-self-heal-common.c:2022:afr_self_heal_completion_cbk] 0-stripe-rep-replicate-0: background  entry self-heal completed on /run27521/p8/d1
[2012-02-06 04:49:42.813440] W [client3_1-fops.c:554:client3_1_rmdir_cbk] 0-stripe-rep-client-2: remote operation failed: Directory not empty
...skipping...
[2012-02-06 04:49:50.908865] I [afr-self-heal-common.c:908:afr_sh_missing_entries_done] 0-stripe-rep-replicate-1: split brain found, aborting selfheal of /run27521/pc/d2
[2012-02-06 04:49:50.908884] E [afr-self-heal-common.c:2019:afr_self_heal_completion_cbk] 0-stripe-rep-replicate-1: background  gfid self-heal failed on /run27521/pc/d2
[2012-02-06 04:49:50.909437] W [client3_1-fops.c:2287:client3_1_lookup_cbk] 0-stripe-rep-client-0: remote operation failed: Invalid argument. Path: /run27521/pc/d2/c4
[2012-02-06 04:49:50.909456] E [afr-self-heal-common.c:998:afr_sh_common_lookup_resp_handler] 0-stripe-rep-replicate-0: path /run27521/pc/d2/c4 on subvolume stripe-rep-client-0 => -1 (Invalid argument)
[2012-02-06 04:49:50.909476] W [client3_1-fops.c:2287:client3_1_lookup_cbk] 0-stripe-rep-client-1: remote operation failed: Invalid argument. Path: /run27521/pc/d2/c4
[2012-02-06 04:49:50.909487] E [afr-self-heal-common.c:998:afr_sh_common_lookup_resp_handler] 0-stripe-rep-replicate-0: path /run27521/pc/d2/c4 on subvolume stripe-rep-client-1 => -1 (Invalid argument)
[2012-02-06 04:49:50.909496] E [afr-self-heal-common.c:1275:afr_sh_common_lookup_cbk] 0-stripe-rep-replicate-0: Failed to lookup /run27521/pc/d2/c4, reason Invalid argument
[2012-02-06 04:49:50.910145] W [client3_1-fops.c:2287:client3_1_lookup_cbk] 0-stripe-rep-client-1: remote operation failed: Invalid argument. Path: /run27521/pc/d2/c4
[2012-02-06 04:49:50.910164] E [afr-self-heal-common.c:998:afr_sh_common_lookup_resp_handler] 0-stripe-rep-replicate-0: path /run27521/pc/d2/c4 on subvolume stripe-rep-client-1 => -1 (Invalid argument)
[2012-02-06 04:49:50.910185] W [client3_1-fops.c:2287:client3_1_lookup_cbk] 0-stripe-rep-client-0: remote operation failed: Invalid argument. Path: /run27521/pc/d2/c4
[2012-02-06 04:49:50.910196] E [afr-self-heal-common.c:998:afr_sh_common_lookup_resp_handler] 0-stripe-rep-replicate-0: path /run27521/pc/d2/c4 on subvolume stripe-rep-client-0 => -1 (Invalid argument)
[2012-02-06 04:49:50.910205] E [afr-self-heal-common.c:1275:afr_sh_common_lookup_cbk] 0-stripe-rep-replicate-0: Failed to lookup /run27521/pc/d2/c4, reason Invalid argument
[2012-02-06 04:49:50.910714] E [afr-self-heal-common.c:2019:afr_self_heal_completion_cbk] 0-stripe-rep-replicate-0: background  entry self-heal failed on /run27521/pc/d2
[2012-02-06 04:49:50.910756] W [fuse-bridge.c:271:fuse_entry_cbk] 0-glusterfs-fuse: 143839: LOOKUP() /run27521/pc/d2 => -1 (No data available)
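
For triaging logs like the ones above, the following is a minimal sketch of a parser for this line layout (timestamp, one-letter level, source location, translator name, message). The regular expression is inferred only from the lines pasted in this report, not from any GlusterFS specification, and the names LOG_LINE, parse_line and count_by_level are hypothetical helpers introduced for illustration.

```python
import re
from collections import Counter

# Layout inferred from the log lines in this report (an assumption):
# [timestamp] LEVEL [file:line:function] translator: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<xlator>\S+):\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_by_level(lines):
    """Count parsed lines per level letter (I, W, E); unparsed lines are skipped."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[record["level"]] += 1
    return counts


# Two lines copied from the report, plus the pager residue that should be ignored.
sample = [
    "[2012-02-06 04:49:50.908865] I [afr-self-heal-common.c:908:afr_sh_missing_entries_done] "
    "0-stripe-rep-replicate-1: split brain found, aborting selfheal of /run27521/pc/d2",
    "[2012-02-06 04:49:50.908884] E [afr-self-heal-common.c:2019:afr_self_heal_completion_cbk] "
    "0-stripe-rep-replicate-1: background  gfid self-heal failed on /run27521/pc/d2",
    "...skipping...",
]

if __name__ == "__main__":
    for line in sample:
        record = parse_line(line)
        if record and "split brain" in record["msg"]:
            # Show which translator and function emitted the split-brain message.
            print(record["xlator"], record["func"], record["msg"])
    print(dict(count_by_level(sample)))
```

Grouping by the func field makes it easier to see that the "split brain found" message comes from a single call site (afr_sh_missing_entries_done), which is the log statement discussed in the comments below.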

Comment 1 Pranith Kumar K 2012-03-01 12:27:37 UTC
It is not a split-brain. The log needs to be fixed.

Comment 2 Anand Avati 2012-03-29 17:02:00 UTC
CHANGE: http://review.gluster.com/3039 (cluster/afr: Fix the split-brain log) merged in master by Vijay Bellur (vijay)

Comment 3 Anand Avati 2012-03-29 17:23:32 UTC
CHANGE: http://review.gluster.com/3041 (cluster/afr: Fix split-brain log) merged in master by Vijay Bellur (vijay)

Comment 4 shylesh 2012-05-18 10:30:14 UTC
The split-brain log messages are no longer seen on 3.3.0qa41.

