Bug 1560918

Summary: gluster volume heal <vol> full doesn't cleanup deleted files
Product: [Community] GlusterFS Reporter: Dimitri Ars <dimitri.ars>
Component: selfheal Assignee: Vishal Pandey <vpandey>
Status: CLOSED DEFERRED QA Contact:
Severity: high Docs Contact:
Priority: medium    
Version: mainline CC: atumball, bugs, ksubrahm, pkarampu, ravishankar, risjain, rkavunga
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-07-15 06:43:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Dimitri Ars 2018-03-27 09:17:06 UTC
Description of problem:

We restore 1 peer from a VM Snapshot. This peer is a member of 3-replica volumes. We would like to sync the brick data from the 2 proper bricks in the replica.

When running "gluster volume heal <vol> full" we expect the healthy bricks to replicate to the restored brick. It does so only partly: all changed and new files are updated, but files/directories that were deleted (or renamed) after the restore point (but existed in the restore!) are still there.


Version-Release number of selected component (if applicable):
3.12.6

How reproducible:
Create a 3-node Gluster cluster
Create a replica-3 volume and put some data on it
Create a VM snapshot (i.e. not a Gluster snapshot!) of the nodes
Change, add, rename, and remove some data
Restore the VM snapshot on 1 node
Run "gluster volume heal <vol> full".
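A minimal sketch of the reproduction on the gluster CLI. The hostnames (node1..node3), volume name (testvol), brick paths, and mount point are illustrative assumptions, and the VM snapshot/restore steps depend on your hypervisor:

```shell
# From node1: form the trusted pool (hostnames are hypothetical)
gluster peer probe node2
gluster peer probe node3

# Create and start a replica-3 volume; brick paths are illustrative
gluster volume create testvol replica 3 \
    node1:/bricks/testvol node2:/bricks/testvol node3:/bricks/testvol
gluster volume start testvol

# Mount the volume and write some initial data
mount -t glusterfs node1:/testvol /mnt/testvol
echo one > /mnt/testvol/file1
echo two > /mnt/testvol/file2
mkdir /mnt/testvol/dir1

# --- take a VM snapshot of node3 here (hypervisor-specific) ---

# Change, add, rename, and remove data while all nodes are up
echo changed > /mnt/testvol/file1           # change
echo three  > /mnt/testvol/file3            # add
mv /mnt/testvol/dir1 /mnt/testvol/dir1-new  # rename
rm /mnt/testvol/file2                       # remove

# --- restore node3 from the VM snapshot here ---

# Trigger a full heal and inspect the restored brick directly
gluster volume heal testvol full
ls -lR /bricks/testvol   # on node3: file2 and dir1 are still present
```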

Actual results:
Check the data on the brick level of the restored node. You will still see removed files/dirs and for renamed files/dirs you will have both the "old" and the new name.

Expected results:
Brick data is consistent on all bricks.

Additional info:
This is part of a restore scenario where 1 gluster node is determined to be corrupt (in whatever area; it doesn't have to be Gluster itself) and is restored using a VM snapshot. The VM snapshot restore is fast and keeps all IPs and UUIDs (which heketi in Kubernetes and its database point to!), so removing replicas and bricks, rebuilding, and modifying the heketi database doesn't feel like a good scenario.
If there are better ways to do this we'd be interested as well, but I'm wondering how people feel about the usage of the heal command and the expected result.

Comment 1 Shyamsundar 2018-10-23 14:54:39 UTC
Release 3.12 has been EOLd and this bug was still found to be in the NEW state, hence moving the version to mainline, to triage the same and take appropriate actions.

Comment 2 Amar Tumballi 2019-07-15 05:19:28 UTC
This got missed in our triaging earlier. Dimitri, is the issue still happening with the latest releases? We did fix some of the issues with glusterfs heal etc. in the last 2 years (since the 3.12 branching).

Comment 3 Dimitri Ars 2019-07-15 06:30:23 UTC
Not sure if this is still happening. As we're moving away from Gluster, this is no longer of interest to us. I would say close this for now; someone else can report it if it's still there.

Comment 4 Amar Tumballi 2019-07-15 06:43:07 UTC
Thanks for the update, Dimitri. Sad to see you pick other solutions. We will keep working on Gluster's stability so we have a project that keeps its users happy :-)

For now, I will close this as DEFERRED, so we can revisit and continue on it when we get to it.