Red Hat Bugzilla – Bug 836101
Recurring unhealable split-brain
Last modified: 2013-02-25 06:07:42 EST
Description of problem:
I have various character device files stored on a glusterfs volume. For some reason, after upgrading from 3.1.2 to 3.3.0, the files got into a split-brain condition. Deleting one of the replicas does not resolve the split-brain condition.
Version-Release number of selected component (if applicable):
glusterfs 3.3.0 built on Jun 24 2012 22:48:03
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
Steps to Reproduce:
1. on client: ls /server/gluster/vz/var-lib-vz/private/6003/dev/ttyp2
--> ls: cannot access /server/gluster/vz/var-lib-vz/private/6003/dev/ttyp2: Input/output error
2. on server: rm /media/gluster/brick0/vz/var-lib-vz/private/6003/dev/ttyp2
3. on client: ls /server/gluster/vz/var-lib-vz/private/6003/dev/ttyp2
--> crw-rw-rw- 1 root tty 3, 2 Dec 10 2008 /server/gluster/vz/var-lib-vz/private/6003/dev/ttyp2
4. on client: ls /server/gluster/vz/var-lib-vz/private/6003/dev/ttyp2
--> ls: cannot access /server/gluster/vz/var-lib-vz/private/6003/dev/ttyp2: Input/output error
Actual results:
I/O error because of split-brain condition.
Expected results:
Split brain healed.
Excerpt from client log file:
[2012-06-28 08:31:31.854210] E [afr-self-heal-common.c:1087:afr_sh_common_lookup_resp_handler] 0-vz-replicate-0: path /var-lib-vz/private/6003/dev/ttyp2 on subvolume vz-client-1 => -1 (No such file or directory)
[2012-06-28 08:31:31.856216] E [afr-self-heal-metadata.c:481:afr_sh_metadata_fix] 0-vz-replicate-0: Unable to self-heal permissions/ownership of '/var-lib-vz/private/6003/dev/ttyp2' (possible split-brain). Please fix the file on all backend volumes
[2012-06-28 08:31:31.856528] E [afr-self-heal-common.c:2156:afr_self_heal_completion_cbk] 0-vz-replicate-0: background meta-data data entry missing-entry gfid self-heal failed on /var-lib-vz/private/6003/dev/ttyp2
[2012-06-28 08:31:33.859259] W [afr-self-heal-data.c:831:afr_lookup_select_read_child_by_txn_type] 0-vz-replicate-0: /var-lib-vz/private/6003/dev/ttyp2: Possible split-brain
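To see which paths the client log flags, the excerpt above can be scanned mechanically. The following is only a rough sketch: the regular expressions are guesses based on the four log lines quoted in this report, not an official description of the glusterfs log format, and the names `LOG_LINE`, `PATH` and `split_brain_paths` are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt in this report:
# [timestamp] LEVEL [source.c:line:function] translator: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^:\]]+):(?P<lineno>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<xlator>[^\s:]+):\s+(?P<msg>.*)$"
)
# First absolute path in the message; stops at whitespace, quote or colon,
# so paths containing those characters would be truncated.
PATH = re.compile(r"(?<!\w)(/[^\s':]+)")


def split_brain_paths(lines):
    """Count log messages mentioning split-brain, keyed by file path."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m or "split-brain" not in m.group("msg").lower():
            continue
        p = PATH.search(m.group("msg"))
        if p:
            counts[p.group(1)] += 1
    return counts


SAMPLE = [
    "[2012-06-28 08:31:31.854210] E [afr-self-heal-common.c:1087:afr_sh_common_lookup_resp_handler] 0-vz-replicate-0: path /var-lib-vz/private/6003/dev/ttyp2 on subvolume vz-client-1 => -1 (No such file or directory)",
    "[2012-06-28 08:31:31.856216] E [afr-self-heal-metadata.c:481:afr_sh_metadata_fix] 0-vz-replicate-0: Unable to self-heal permissions/ownership of '/var-lib-vz/private/6003/dev/ttyp2' (possible split-brain). Please fix the file on all backend volumes",
    "[2012-06-28 08:31:33.859259] W [afr-self-heal-data.c:831:afr_lookup_select_read_child_by_txn_type] 0-vz-replicate-0: /var-lib-vz/private/6003/dev/ttyp2: Possible split-brain",
]

if __name__ == "__main__":
    for path, n in split_brain_paths(SAMPLE).items():
        print(n, path)
```

Run against the full client log instead of `SAMPLE`, this would give a list of affected files to check on each brick.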
getfattr -d -m trusted.gfid -e hex /media/gluster/brick0/vz/var-lib-vz/private/6003/dev/ttyp2
on either server yields no result (so the solution from https://bugzilla.redhat.com/show_bug.cgi?id=825559 cannot be applied here).
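For comparing the two bricks, the text printed by `getfattr -d -e hex` can be parsed and checked for `trusted.gfid`. This is a minimal sketch that assumes the usual `# file: ...` / `name=0x...` output layout of `getfattr`; the helper names and the gfid value in the sample are hypothetical and not taken from this setup.

```python
import uuid


def parse_getfattr(output):
    """Parse `getfattr -d -e hex` text into {file: {attribute: hex_value}}."""
    result = {}
    current = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("# file: "):
            current = line[len("# file: "):]
            result[current] = {}
        elif "=" in line and current is not None:
            name, _, value = line.partition("=")
            result[current][name] = value
    return result


def gfid_of(attrs):
    """Return trusted.gfid as a UUID string, or None if the xattr is absent."""
    value = attrs.get("trusted.gfid")
    if value is None:
        return None
    return str(uuid.UUID(hex=value[2:]))  # drop the leading "0x"


# Hypothetical output for a healthy file; the gfid value is invented.
SAMPLE_OUTPUT = """\
# file: media/gluster/brick0/vz/var-lib-vz/private/6003/dev/ttyp2
trusted.gfid=0x0123456789abcdef0123456789abcdef
"""

if __name__ == "__main__":
    for path, attrs in parse_getfattr(SAMPLE_OUTPUT).items():
        print(path, gfid_of(attrs))
    # Empty output, as reported in this bug, yields no entries at all.
    print(parse_getfattr(""))
```

With the empty output described here, the parser returns no entries for the file on either brick, which matches the observation that there is no gfid to compare or reset.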
*** This bug has been marked as a duplicate of bug 832305 ***
(In reply to comment #1)
> *** This bug has been marked as a duplicate of bug 832305 ***
I don't think this bug is really a duplicate of bug 832305.
I applied the patch from 832305, deleted the inaccessible files on one brick, and checked that they were accessible again. They were indeed recreated on the brick where I had deleted them, and the files were accessible through the glusterfs mount.
A couple of hours later, the rsync process that syncs another, non-glusterfs mount to the glusterfs mount reported errors again, and the files were again inaccessible.
Could you please provide a test case to re-create the issue on our setup?
Any updates on the test-case to re-create the problem?
Thanks in advance for your help.
Sorry for taking so long to get back to you. I'm currently in the process of upgrading the OS on the server (from Debian Lenny to Squeeze) and will then recreate the gluster shares from scratch and try to reproduce the problem.
Any luck with re-creating the problem?
Sorry, I haven't had any time to work on this again. Maybe early next year.
Please feel free to re-open the bug with the data requested.
Sorry again for the slow response.
I recreated the shares about three weeks ago and I've been running the rsync that originally led to the split brain daily so far without any problems. So I assume the problem is solved now.
Maybe there was some problem with the migration from pre-3.3.0 glusterfs to 3.3.0 that led to the permanent split brain.
Thanks for the response. We shall keep the bug closed for now.