Bug 1119774

Summary: background meta-data data missing-entry self-heal failed
Product: [Community] GlusterFS
Reporter: Hitesh <hitzimpossible>
Component: replicate
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED EOL
Severity: medium
Priority: unspecified
Version: 3.4.2
CC: bugs, gluster-bugs, pkarampu
Hardware: Unspecified
OS: Linux
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-10-07 13:50:53 UTC

Description Hitesh 2014-07-15 13:23:10 UTC
Description of problem:
Self-heal failed for a particular file when the mate (the other replica) was coming up for the first time.

Version-Release number of selected component (if applicable):
Gluster 3.4.2

How reproducible:
Intermittent

Steps to Reproduce:
1. Created replicated volumes across two bricks.

Actual results:
Syncing got stuck with the errors below.
[2014-07-08 09:14:07.892573] W [afr-common.c:1505:afr_conflicting_iattrs] 0-_tftpboot-replicate-0: /cnp/atcaf125/0-0-6-0/boarddata/etc/fstab: gfid differs on subvolume 0 
[2014-07-08 09:14:07.892983] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-_tftpboot-replicate-0: background  meta-data data missing-entry self-heal failed on /cnp/atcaf125/0-0-6-0/boarddata/etc/fstab 

Also, the checksum of the file that reports the error differs between the two bricks.
root@unit1:/.krfs/_tftpboot/brick> md5sum cnp/atcaf125/0-0-6-0/boarddata/etc/fstab 
bc2c08accdc816547b35e86dae145d59  cnp/atcaf125/0-0-6-0/boarddata/etc/fstab 
root@unit1:/.krfs/_tftpboot/brick> 

root@unit0:/.krfs/_tftpboot/brick> md5sum cnp/atcaf125/0-0-6-0/boarddata/etc/fstab 
32c3c7925e7046d4565d61fa2cd15313  cnp/atcaf125/0-0-6-0/boarddata/etc/fstab 
root@unit0:/.krfs/_tftpboot/brick> 
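
The manual md5sum comparison above could be scripted roughly as follows. This is only a sketch: it assumes both brick directories are readable from a single host (for example through a copy or a remote mount), which is not how the two units in this report are laid out, and the function names `md5_of`, `compare_copies` and `copies_match` are made up for illustration.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(brick_roots, relative_path):
    """Map each brick root to the MD5 of its copy, or None if the copy is missing."""
    checksums = {}
    for root in brick_roots:
        candidate = Path(root) / relative_path
        checksums[str(root)] = md5_of(candidate) if candidate.is_file() else None
    return checksums


def copies_match(checksums):
    """True only if every brick has the file and all digests are identical."""
    values = list(checksums.values())
    return None not in values and len(set(values)) == 1
```

A differing digest only shows that the contents diverged. It does not say which copy is the good one, and it does not distinguish a data split-brain from a gfid mismatch.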


Expected results:
Should sync without any error. 

Additional info:
Not sure whether this is the same problem that was fixed in bug 830665.

Comment 1 Pranith Kumar K 2014-07-16 05:16:00 UTC
This looks like a gfid mismatch, i.e. an entry split-brain: the same name exists with different gfids. Could you please follow the instructions at https://github.com/gluster/glusterfs/blob/master/doc/split-brain.md to fix the issue? What you have in your setup is an entry split-brain, which is the final topic in that document. It would be nice to figure out how the system ended up in this situation. Could you please let us know what led to it in your setup?
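
For anyone checking this by hand: the gfid of a file is stored on each brick in the `trusted.gfid` extended attribute as a 16-byte value. A minimal sketch of reading and comparing it is shown here. It assumes Linux, root privileges (the `trusted.` namespace is not readable otherwise), and that it is run against the brick paths rather than the mounted volume. The helper names are hypothetical.

```python
import os
import uuid

GFID_XATTR = "trusted.gfid"  # xattr in which GlusterFS keeps the gfid on a brick


def format_gfid(raw):
    """Render the raw 16-byte gfid in the usual UUID text form."""
    if len(raw) != 16:
        raise ValueError("expected a 16-byte gfid, got %d bytes" % len(raw))
    return str(uuid.UUID(bytes=raw))


def read_gfid(path_on_brick):
    """Read the gfid of a file directly from a brick (Linux only, needs root)."""
    return format_gfid(os.getxattr(path_on_brick, GFID_XATTR))


def gfid_mismatch(gfids_by_brick):
    """True if the bricks that have the entry disagree about its gfid.

    gfids_by_brick maps a brick label to a gfid string, or to None when
    the entry is absent on that brick.
    """
    present = {gfid for gfid in gfids_by_brick.values() if gfid is not None}
    return len(present) > 1
```

`read_gfid` would be run once per unit, on that unit's own brick, and the results compared with `gfid_mismatch`.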

Comment 2 Hitesh 2014-07-16 10:09:17 UTC
The lab has been recovered now, but I had checked it for a split-brain condition. There wasn't any split-brain entry.

Comment 3 Pranith Kumar K 2014-07-16 10:29:16 UTC
There are different kinds of split-brain.
Here is the log line from the bug description you gave, which suggests that an entry split-brain did happen:
[2014-07-08 09:14:07.892573] W [afr-common.c:1505:afr_conflicting_iattrs] 0-_tftpboot-replicate-0: /cnp/atcaf125/0-0-6-0/boarddata/etc/fstab: gfid differs on subvolume 0
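
To find every path affected by this warning, the log could be scanned with something like the sketch below. The regular expression is modelled only on the single line quoted in this report, so treat it as an assumption about the log format rather than a definition of it. `find_gfid_conflicts` is a hypothetical helper name.

```python
import re

# Pattern inferred from the one warning quoted in this bug; other releases
# may word or lay out the message differently.
GFID_DIFFERS = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<source>[^\]]+)\]\s+(?P<xlator>\S+):\s+"
    r"(?P<path>.+): gfid differs on subvolume (?P<subvolume>\d+)\s*$"
)


def find_gfid_conflicts(lines):
    """Return (path, subvolume index) for every 'gfid differs' warning."""
    conflicts = []
    for line in lines:
        match = GFID_DIFFERS.match(line.strip())
        if match:
            conflicts.append((match.group("path"), int(match.group("subvolume"))))
    return conflicts
```

Each path returned is a candidate for the entry split-brain procedure, after confirming the gfids on the bricks.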

Comment 4 Niels de Vos 2015-05-17 21:58:21 UTC
GlusterFS 3.7.0 has been released (http://www.gluster.org/pipermail/gluster-users/2015-May/021901.html), and the Gluster project maintains N-2 supported releases. The last two releases before 3.7 are still maintained, at the moment these are 3.6 and 3.5.

This bug has been filed against the 3.4 release, and will not get fixed in a 3.4 version any more. Please verify whether newer versions are affected by the reported problem. If that is the case, update the bug with a note, and update the version if you can. In case updating the version is not possible, leave a comment in this bug report with the version you tested, and set the "Need additional information the selected bugs from" below the comment box to "bugs@gluster.org".

If there is no response by the end of the month, this bug will get automatically closed.

Comment 5 Kaleb KEITHLEY 2015-10-07 13:50:53 UTC
GlusterFS 3.4.x has reached end-of-life.

If this bug still exists in a later release, please reopen this and change the version, or open a new bug.