Description of problem:
When the file and its gfid link are deleted from the backend of one of the replica pairs, they become inaccessible from the mount point and 'gluster volume heal <volname> full' fails to trigger the heal.

Version-Release number of selected component (if applicable):

How reproducible:
Always on a 2-node setup (seems to work fine if the volume is created on one node only).

Steps to Reproduce:
1. Create a 1x2 replica volume and fuse-mount it.
2. mkdir -p /fuse_mnt/dir1/dir2
3. rm -f /brick2/.glusterfs/<gfid-link of dir1>
4. rmdir /brick2/dir1/dir2
5. gluster v heal <volname> full

After some time:
6. [root@ravi4 fuse_mnt]# ls -lR
.:
total 0
drwxr-xr-x 2 root root 6 Jun 23 2014 dir1

./dir1:
total 0

Actual results:
dir2 is missing from the mount point and from brick2.

Expected results:
dir2 must be accessible since it is present in brick1, and heal full must recreate it in brick2.
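As an aside, the <gfid-link of dir1> path used in step 3 can be derived on the brick backend from dir1's trusted.gfid xattr. A minimal sketch, assuming the brick root is /brick2 and running as root (names are placeholders, adjust to your setup):

BRICK=/brick2
# Read dir1's gfid from the brick backend
GFID=$(getfattr -n trusted.gfid -e hex "$BRICK/dir1" 2>/dev/null \
       | awk -F'0x' '/trusted.gfid/ {print $2}')
# Insert dashes to get the canonical uuid form used under .glusterfs/
UUID=$(echo "$GFID" | sed 's/\(........\)\(....\)\(....\)\(....\)\(............\)/\1-\2-\3-\4-\5/')
# The gfid link lives at <brick>/.glusterfs/<first two chars>/<next two chars>/<uuid>
echo "$BRICK/.glusterfs/${UUID:0:2}/${UUID:2:2}/$UUID"    # verify the path before deleting
rm -f "$BRICK/.glusterfs/${UUID:0:2}/${UUID:2:2}/$UUID"   # step 3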
The repro steps mentioned in the bug description are not accurate, as brick2 was not killed/restarted. The correct steps would be:
1. Create a 1x2 replica volume on 2 nodes and fuse-mount it.
2. mkdir -p /fuse_mnt/dir1/dir2
3. Kill one brick process (say brick2).
4. rm -rf /brick2/*
5. rm -rf /brick2/.glusterfs
6. Restart brick2.
7. gluster v heal <volname> full

During multiple trial runs, it was observed that the full heal does not happen on the node having the highest UUID (i.e. if brick2's node has the highest UUID, then data is not healed to it).
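To make the brick kill/restart in steps 3 and 6 concrete, here is a minimal sketch of the sequence, assuming a volume named testvol with the /brick2 brick on the local node (both names are placeholders for this illustration):

gluster volume status testvol             # note the PID listed for the /brick2 brick
kill <brick2-pid>                         # step 3: kill the brick2 process
rm -rf /brick2/* /brick2/.glusterfs       # steps 4-5: wipe the brick backend
gluster volume start testvol force        # step 6: "start force" restarts the killed brick
gluster volume heal testvol full          # step 7: trigger the full heal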
Note: Full heals are needed in the replace-brick and replica-count-increase scenarios. For replace-brick, http://review.gluster.org/#/c/10076/ and http://review.gluster.org/#/c/10448/ should fix the issue. The patches mark the replaced brick as needing heal via AFR changelog xattrs, which should cause the self-heal daemon to trigger heals automatically without the need to run the `heal <volname> full` command. A similar fix would be needed for the replica-count-increase case (the add-brick command) as well.
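For reference, the pending-heal marking mentioned above can be inspected on the surviving brick with getfattr. This is only an illustration, assuming a volume named testvol whose replaced brick corresponds to client-1; the actual xattr names and values depend on the volume:

getfattr -d -m trusted.afr -e hex /brick1/
# Illustrative output on the good brick (values are examples, not from this bug):
# trusted.afr.testvol-client-1=0x000000000000000000000001
# A non-zero pending count against client-1 tells the self-heal daemon that the
# replaced brick needs heal, so it is healed automatically without running
# 'gluster volume heal <volname> full'.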
Note: Auto healing of files during add-brick is being addressed in http://review.gluster.org/#/q/topic:bug-1276203
A lot of time has passed with no activity on this bug. We have either fixed it already or it is most likely no longer critical. Please re-open the bug if the issue is still burning for you, or if you want to take it to closure with fixes.