Bug 836101

Summary: Recurring unhealable split-brain
Product: [Community] GlusterFS
Component: replicate
Version: 3.3.0
Hardware: x86_64
OS: Linux
Status: CLOSED INSUFFICIENT_DATA
Severity: urgent
Priority: unspecified
Reporter: Johannes Martin <jmartin>
Assignee: Pranith Kumar K <pkarampu>
CC: gluster-bugs, jdarcy, vbellur
Keywords: Reopened
Type: Bug
Doc Type: Bug Fix
Last Closed: 2013-02-22 11:31:20 UTC

Description Johannes Martin 2012-06-28 06:35:21 UTC
Description of problem:
I have various character device files stored on a GlusterFS volume. For some reason, after upgrading from 3.1.2 to 3.3.0, one of these files got into a split-brain condition. Deleting one of the replicas does not resolve the split-brain.

Version-Release number of selected component (if applicable):
glusterfs 3.3.0 built on Jun 24 2012 22:48:03
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>


How reproducible:
Not sure.

Steps to Reproduce:
1. on client: ls /server/gluster/vz/var-lib-vz/private/6003/dev/ttyp2
--> ls: cannot access /server/gluster/vz/var-lib-vz/private/6003/dev/ttyp2: Input/output error
2. on server: rm /media/gluster/brick0/vz/var-lib-vz/private/6003/dev/ttyp2
3. on client: ls /server/gluster/vz/var-lib-vz/private/6003/dev/ttyp2
crw-rw-rw- 1 root tty 3, 2 Dec 10  2008 /server/gluster/vz/var-lib-vz/private/6003/dev/ttyp2
4. on client: ls /server/gluster/vz/var-lib-vz/private/6003/dev/ttyp2
ls: cannot access /server/gluster/vz/var-lib-vz/private/6003/dev/ttyp2: Input/output error
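
The flip-flop between steps 3 and 4 can be caught by repeating the lookup in a loop (a minimal sketch; the client path is taken from the steps above):

    # on client: repeat the lookup to catch the intermittent I/O error
    for i in 1 2 3 4 5; do
        ls -l /server/gluster/vz/var-lib-vz/private/6003/dev/ttyp2
        sleep 1
    done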
  
Actual results:
I/O error because of the split-brain condition.

Expected results:
Split-brain healed.

Additional info:
Excerpt from client log file:
[2012-06-28 08:31:31.854210] E [afr-self-heal-common.c:1087:afr_sh_common_lookup_resp_handler] 0-vz-replicate-0: path /var-lib-vz/private/6003/dev/ttyp2 on subvolume vz-client-1 => -1 (No such file or directory)
[2012-06-28 08:31:31.856216] E [afr-self-heal-metadata.c:481:afr_sh_metadata_fix] 0-vz-replicate-0: Unable to self-heal permissions/ownership of '/var-lib-vz/private/6003/dev/ttyp2' (possible split-brain). Please fix the file on all backend volumes
[2012-06-28 08:31:31.856528] E [afr-self-heal-common.c:2156:afr_self_heal_completion_cbk] 0-vz-replicate-0: background  meta-data data entry missing-entry gfid self-heal failed on /var-lib-vz/private/6003/dev/ttyp2
[2012-06-28 08:31:33.859259] W [afr-self-heal-data.c:831:afr_lookup_select_read_child_by_txn_type] 0-vz-replicate-0: /var-lib-vz/private/6003/dev/ttyp2: Possible split-brain


Running
getfattr -d -m trusted.gfid -e hex /media/gluster/brick0/vz/var-lib-vz/private/6003/dev/ttyp2
on either server yields no result (so the solution from https://bugzilla.redhat.com/show_bug.cgi?id=825559 cannot be applied here).
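
For completeness, the AFR changelog xattrs can be checked directly on each brick (a sketch: the brick0 path is from this report, the brick1 path is an assumed sibling, and the trusted.afr.vz-client-* names are inferred from the vz-client-0/vz-client-1 subvolumes in the log excerpt):

    # on each server, dump the AFR pending-changelog xattrs for the brick copy
    getfattr -d -m trusted.afr -e hex /media/gluster/brick0/vz/var-lib-vz/private/6003/dev/ttyp2
    getfattr -d -m trusted.afr -e hex /media/gluster/brick1/vz/var-lib-vz/private/6003/dev/ttyp2

In a genuine split-brain, each copy typically carries non-zero trusted.afr.vz-client-* counters blaming the other brick; here even trusted.gfid is absent, which is itself unusual for a file tracked on a 3.3 brick.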

Comment 1 Pranith Kumar K 2012-07-02 10:32:01 UTC

*** This bug has been marked as a duplicate of bug 832305 ***

Comment 2 Johannes Martin 2012-09-06 09:26:22 UTC
(In reply to comment #1)
> 
> *** This bug has been marked as a duplicate of bug 832305 ***

I don't think this bug is really a duplicate of bug 832305.

I applied the patch from 832305, deleted the inaccessible files on one brick, and checked that they were accessible again. They were indeed recreated on the brick where I had deleted them, and the files were accessible through the glusterfs mount.

A couple of hours later, the rsync process that syncs another (non-GlusterFS) mount to the GlusterFS mount reported errors again, and the files were again inaccessible.
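
For reference, the manual resolution usually suggested for AFR split-brain on 3.3 looks roughly like the following (a sketch only: paths are taken from this report, <gfid> is a placeholder since getfattr returned no trusted.gfid here, and the stale copy is assumed to live on brick0):

    # on the server holding the stale copy: remove the file itself...
    rm /media/gluster/brick0/vz/var-lib-vz/private/6003/dev/ttyp2
    # ...and its gfid hardlink under the brick's .glusterfs tree
    # (<aa>/<bb> are the first two bytes of the gfid in hex)
    rm /media/gluster/brick0/vz/.glusterfs/<aa>/<bb>/<gfid>
    # on the client: a fresh lookup triggers self-heal from the good copy
    stat /server/gluster/vz/var-lib-vz/private/6003/dev/ttyp2

On 3.3 the .glusterfs hardlink has to be removed along with the file; otherwise self-heal can resurrect the deleted copy from the gfid link, which may be why plain deletion did not stick here.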

Comment 3 Pranith Kumar K 2012-09-06 09:41:11 UTC
Could you please provide a test case to re-create the issue on our setup?

Comment 4 Pranith Kumar K 2012-09-23 03:26:35 UTC
Johannes,
    Any updates on the test-case to re-create the problem?

Thanks in advance for your help.
Pranith

Comment 5 Johannes Martin 2012-09-28 05:14:47 UTC
Sorry for taking so long to get back to you. I'm currently in the process of upgrading the OS on the server (from Debian Lenny to Squeeze) and will then recreate the gluster shares from scratch and try to reproduce the problem.

Comment 6 Vijay Bellur 2012-12-11 05:30:53 UTC
Any luck with re-creating the problem?

Comment 7 Johannes Martin 2012-12-17 08:56:55 UTC
Sorry, I haven't had any time to work on this again. Maybe early next year.

Comment 8 Pranith Kumar K 2013-02-22 11:31:20 UTC
Please feel free to re-open the bug with the data requested.

Comment 9 Johannes Martin 2013-02-25 11:02:52 UTC
Sorry again for the slow response. 

I recreated the shares about three weeks ago and have been running the rsync that originally led to the split-brain daily, so far without any problems. So I assume the problem is solved now.

Maybe there was some problem with the migration from pre-3.3.0 GlusterFS to 3.3.0 that led to the permanent split-brain.

Comment 10 Pranith Kumar K 2013-02-25 11:07:42 UTC
Johannes,
    Thanks for the response. We shall keep the bug closed for now.

Pranith.