When a self-heal starts, the client that initiated it keeps running the heal until it completes. If that client stops (the client is unmounted, the shd is restarted, etc.), the heal starts over from the beginning. For files that take many days to heal, this behavior is undesirable.

Since the bricks already hold the lock data that shows how far along the heal is, there should be a way to track progress (metadata? brick memory?) so that another client can continue it. If this were integrated with throttling, it could even become a pooled work queue picked up by whichever shd has free tokens, letting the whole cluster advance the heal and, potentially, take over a background heal from a fuse client.

This should be controllable from the CLI: the admin could stop the self-heal from continuing on a specific client and have it continue from another shd, optionally naming the specific shd as part of the instruction. That would be useful when a fuse client begins a background self-heal across the slower client network but the admin wants the heal run by an shd over a faster backend connection.
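To make the idea concrete, here is a minimal sketch of what persisting heal progress on the brick could look like, assuming the progress is recorded as an extended attribute next to the file being healed. The xattr name (trusted.glusterfs.heal-checkpoint), the heal_ckpt layout, and the helper functions are all hypothetical illustrations, not existing GlusterFS keys or APIs; a real implementation would live inside the shd/AFR code paths.

/* Hypothetical sketch: persist self-heal progress on the brick so any shd
 * (or client) can resume instead of restarting from offset 0.
 * The xattr name and record layout are illustrative, not real GlusterFS keys. */
#include <stdio.h>
#include <stdint.h>
#include <sys/xattr.h>

#define HEAL_CKPT_XATTR "trusted.glusterfs.heal-checkpoint" /* assumed name */

struct heal_ckpt {
    uint64_t offset;     /* byte offset healed so far */
    uint64_t source_gen; /* generation of the heal source, to detect staleness */
};

/* Save progress after each healed chunk. */
static int heal_ckpt_save(const char *brick_path, const struct heal_ckpt *c)
{
    return setxattr(brick_path, HEAL_CKPT_XATTR, c, sizeof(*c), 0);
}

/* Load progress; returns 0 and fills *c, or -1 if no checkpoint exists. */
static int heal_ckpt_load(const char *brick_path, struct heal_ckpt *c)
{
    ssize_t n = getxattr(brick_path, HEAL_CKPT_XATTR, c, sizeof(*c));
    return (n == (ssize_t)sizeof(*c)) ? 0 : -1;
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <path-on-brick>\n", argv[0]);
        return 1;
    }
    struct heal_ckpt c = {0};
    if (heal_ckpt_load(argv[1], &c) == 0)
        printf("resuming heal at offset %llu\n", (unsigned long long)c.offset);
    else
        printf("no checkpoint found, starting heal from offset 0\n");

    /* ... heal one chunk, then record progress ... */
    c.offset += 128ULL * 1024 * 1024; /* example: 128 MiB healed */
    if (heal_ckpt_save(argv[1], &c) != 0)
        perror("setxattr");
    return 0;
}

With something like this on disk, whichever shd picks the file out of a shared queue could read the checkpoint, verify the source generation is still valid, and continue from the recorded offset rather than from zero.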
I think this is possible with the granular entry/data self-heal feature that is coming up. Will keep you updated :-).
This bug is being closed because GlusterFS-3.7 has reached its end-of-life. Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.
Migrated to github: https://github.com/gluster/glusterfs/issues/599 Please follow the github issue for further updates on this bug.