Bug 1524325

Summary: wrong healing source after upgrade
Product: [Community] GlusterFS
Reporter: Dmitry Melekhov <dm>
Component: bitrot
Assignee: bugs <bugs>
Status: CLOSED EOL
QA Contact:
Severity: high
Docs Contact: bugs <bugs>
Priority: unspecified
Version: 3.10
CC: amukherj, bugs
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-20 18:24:51 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs none

Description Dmitry Melekhov 2017-12-11 09:25:50 UTC
Created attachment 1365830
logs

Description of problem:

We run a 2-node cluster with a replicated volume. Yes, this is not a recommended setup, but...
The nodes are named father and son.
The VMs and gluster both run on these nodes.

We moved all VMs to one node (father).
We upgraded gluster from 3.10.7 to 3.10.8 on one node (son) and rebooted it.
After this we saw that healing for one of the VM images was running from son to father:

[root@son ~]# gluster volume heal pool info
Brick father:/wall/pool/brick
/shador.img 
/balamak.img 
/devaron.img 
/talita.img 
Status: Connected
Number of entries: 4

Brick son:/wall/pool/brick
/endor.img 
Status: Connected
Number of entries: 1
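
For reference, the output above can be tallied per brick with a short script. This is only a sketch: the function name `parse_heal_info` and the `SAMPLE` constant are made up for illustration, the sample text is copied from this report, and only entries that appear as paths (lines starting with `/`) are counted. It assumes, as this report reads the output, that an entry listed under a brick is pending heal with that brick as the source.

```python
from typing import Dict, List, Optional


def parse_heal_info(text: str) -> Dict[str, List[str]]:
    """Map each brick to the path entries listed under it.

    Hypothetical helper: it only understands the plain-text layout
    shown in this report ("Brick <host>:<path>" followed by entries).
    """
    bricks: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            # A new brick section starts here.
            current = line[len("Brick "):]
            bricks[current] = []
        elif current is not None and line.startswith("/"):
            # Path entries only; status and count lines are skipped.
            bricks[current].append(line)
    return bricks


# Sample text copied from the heal info output in this report.
SAMPLE = """\
Brick father:/wall/pool/brick
/shador.img
/balamak.img
/devaron.img
/talita.img
Status: Connected
Number of entries: 4

Brick son:/wall/pool/brick
/endor.img
Status: Connected
Number of entries: 1
"""

if __name__ == "__main__":
    for brick, entries in parse_heal_info(SAMPLE).items():
        print(f"{brick}: {len(entries)} pending: {', '.join(entries)}")
```

Under that assumption, any entry listed under the brick of the node that was just rebooted (son here) is the suspicious case, because that node is the one that missed writes while it was down.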


And the image ended up broken.

Bitrot detection was enabled on this volume, and it looks like it was the only process that accessed local data on son during boot (please see the logs).

We disabled bitrot detection for now.

Version-Release number of selected component (if applicable):

CentOS 7.4, gluster 3.10.7 and 3.10.8.


How reproducible:

We don't know how to reproduce it.

Steps to Reproduce:
1. Install a 2-node gluster cluster with a replicated volume
2. Put VMs on it
3. Upgrade
4. Reboot

Maybe a reboot alone is enough; we don't know.

Actual results:
Some VM images (one in our case) are broken because they were healed from old data.


Expected results:

Healthy data on the cluster.


Thank you!

Comment 1 Dmitry Melekhov 2017-12-11 12:03:26 UTC
By the way, we upgraded and rebooted the second node (father) with bitrot detection turned off,
and everything is fine.

Comment 2 Dmitry Melekhov 2017-12-13 04:35:25 UTC
Also, I don't think this is a replicate issue.
I guess this is caused by bitrot: if there was no I/O in the VM, then it may have changed metadata on the wrong server.

Comment 3 Shyamsundar 2018-06-20 18:24:51 UTC
This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained.

As a result this bug is being closed.

If the bug persists on a maintained version of gluster or against the mainline gluster repository, request that it be reopened and the Version field be marked appropriately.