Description of problem: If metadata split-brain is detected, data/entry self heal does not take place.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce (a CLI transcript of these steps is sketched below):
1. Create a replicate volume.
2. Set the self-heal daemon off.
3. Kill one brick.
4. Perform an action at the mount point to change metadata.
5. Kill the other brick and bring back the first brick.
6. Again perform an action at the mount point to change the metadata.
7. Bring back the last-killed brick.
8. The files are now in metadata split-brain. Kill any brick and change the data at the mount point, then bring back the killed brick.
9. An ls at the mount point will not successfully heal the data.

Actual results: Data does not get healed.

Expected results: Data should get healed even though there is a metadata split-brain.

Additional info:
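For reference, a minimal transcript of the reproduction above, assuming a replica-2 volume; the volume name, hostnames, brick paths, and file name are hypothetical placeholders, and brick PIDs come from 'gluster volume status':

    # 1. Create and start a replica-2 volume, then mount it
    gluster volume create testvol replica 2 host1:/bricks/b1 host2:/bricks/b2
    gluster volume start testvol
    mount -t glusterfs host1:/testvol /mnt

    # 2. Turn the self-heal daemon off
    gluster volume set testvol cluster.self-heal-daemon off

    # 3-7. Create metadata split-brain: change metadata while each
    # brick in turn is down
    touch /mnt/file
    kill -9 <pid-of-brick-b1>
    chmod 600 /mnt/file                # metadata change seen only by b2
    kill -9 <pid-of-brick-b2>
    gluster volume start testvol force # restarts all down bricks
    kill -9 <pid-of-brick-b2>          # keep b2 down (force restarted it too)
    chmod 400 /mnt/file                # metadata change seen only by b1
    gluster volume start testvol force # b2 back: file is in metadata split-brain

    # 8. With the file in metadata split-brain, make the data differ
    kill -9 <pid-of-brick-b1>
    echo "new data" >> /mnt/file       # data change seen only by b2
    gluster volume start testvol force # b1 back

    # 9. Lookup should trigger data self heal, but it does not happen
    #    because of the metadata split-brain
    ls -l /mnt/file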
REVIEW: http://review.gluster.org/5253 (cluster/afr: Allow data/entry self heal for metadata split-brain) posted (#2) for review on master by venkatesh somyajulu (vsomyaju)
REVIEW: http://review.gluster.org/5253 (cluster/afr: Allow data/entry self heal for metadata split-brain) posted (#3) for review on master by venkatesh somyajulu (vsomyaju)
REVIEW: http://review.gluster.org/5253 (cluster/afr: Allow data/entry self heal for metadata split-brain) posted (#4) for review on master by venkatesh somyajulu (vsomyaju)
REVIEW: http://review.gluster.org/5253 (cluster/afr: Allow data/entry self heal for metadata split-brain) posted (#5) for review on master by venkatesh somyajulu (vsomyaju)
COMMIT: http://review.gluster.org/5253 committed in master by Vijay Bellur (vbellur)
------
commit ef8092fab7b6fa5a16cc0e22b75945758519d5a6
Author: Venkatesh Somyajulu <vsomyaju>
Date:   Fri Jun 28 19:11:47 2013 +0530

cluster/afr: Allow data/entry self heal for metadata split-brain

Problem:
Currently, whenever there is a metadata split-brain, the variable sh->op_failed is set to 1 to denote that self heal failed. But if we then proceed to data self heal, its code path also relies on sh->op_failed: it checks the variable and eventually fails to perform the data self heal. So a mechanism is needed to allow data self heal even when metadata is in split-brain.

Fix:
Some data structure revamp was done in the http://review.gluster.com/#/c/5106/ fix, and this patch is based on that fix. We can now store which particular self heal failed, i.e. GFID_OR_MISSING_ENTRY_SELF_HEAL, METADATA, DATA, or ENTRY, and we can do two types of self heal failure checks:
1. Individual type check: we can check which among all four (metadata, data, gfid-or-missing-entry, entry self heal) failed.
2. In afr_self_heal_completion_cbk, we check whether any specific self heal failed and, if so, treat the complete self heal as a failure, so that the corresponding circular buffer of the event history is populated accordingly.

Change-Id: Icb91e513bcc752386fc8a78812405cfabe5cac2d
BUG: 977797
Signed-off-by: Venkatesh Somyajulu <vsomyaju>
Reviewed-on: http://review.gluster.org/5253
Reviewed-by: Pranith Kumar Karampuri <pkarampu>
Tested-by: Gluster Build System <jenkins.com>
Reviewed-by: Vijay Bellur <vbellur>
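To illustrate the idea, here is a minimal C sketch, not the actual AFR code: the enum values mirror the self-heal types named in the commit message, while the struct and function names are hypothetical.

    /* Sketch: track self-heal failure per type instead of with a
     * single op_failed flag. */
    #include <stdio.h>

    typedef enum {
            SH_GFID_OR_MISSING_ENTRY,
            SH_METADATA,
            SH_DATA,
            SH_ENTRY,
            SH_TYPE_MAX,
    } sh_type_t;

    typedef struct {
            int failed[SH_TYPE_MAX]; /* per-type failure, replaces op_failed */
    } sh_status_t;

    /* Individual type check: data self heal no longer aborts just
     * because metadata self heal failed (e.g. metadata split-brain). */
    static int
    sh_can_start_data_self_heal (sh_status_t *sh)
    {
            return !sh->failed[SH_DATA];
    }

    /* Completion check: report the overall self heal as failed if any
     * individual type failed, so the event history stays accurate. */
    static int
    sh_overall_failed (sh_status_t *sh)
    {
            for (int i = 0; i < SH_TYPE_MAX; i++)
                    if (sh->failed[i])
                            return 1;
            return 0;
    }

    int
    main (void)
    {
            sh_status_t sh = {0};

            sh.failed[SH_METADATA] = 1; /* metadata split-brain detected */

            printf ("data self heal allowed: %d\n",
                    sh_can_start_data_self_heal (&sh)); /* prints 1 */
            printf ("overall self heal failed: %d\n",
                    sh_overall_failed (&sh));           /* prints 1 */
            return 0;
    }

Splitting the single flag into per-type state lets the data self heal path ignore a metadata failure, while the completion callback can still report the heal as failed overall.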
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user