On a fresh setup, If I perform the below mentioned steps: 1.Create a Distributed-Disperse and Disperse volume disabled shd. 2.Started the volume on both master and slave. 3.Created and started a geo-rep setup. 4.Mounted the volume and started untar from client. 5.Stopped geo-rep session after sometime. 5.Checked heal info on both volumes.(There was nothing to heal.) 6.After sometime enabled shd on both volumes. 7.Checked heal info on both volumes.(There was nothing to heal.) I am not seeing any issues when shd is disabled at the start and then enabled.the trigger point here seems to be the geo-rep session stop when shd is enabled.
Here is the root cause of the issue - when features.read-only is enabled, ro_fxattrop will check for the following condition - if (is_readonly_or_worm_enabled(frame, this) && !allzero) STACK_UNWIND_STRICT(fxattrop, frame, -1, EROFS, NULL, xdata); In this is_readonly_or_worm_enabled(frame, this) will return "false" for shd if frame->root->pid < 0, which we set for the frame used in healing as "-6". However, in this case this frame->root->pid is coming up with value as "0". That's why this condition is failing (0 < 0) and the function returning "true" and making this as read-only for shd process also. Why is it happening? when shd triggers heal for the file, it is finding that there is nothing to heal so it is calling "ec_data_undo_pending" to remove dirty flag for data part which in turn calling syncop_fxattrop->SYNCOP We do not pass frame to this SYNCOP and it gets the frame from the task - \ task = synctask_get(); \ stb->task = task; \ if (task) \ frame = task->opframe; However, while creating task we provided frame as NULL. ec_launch_heal(ec_t *ec, ec_fop_data_t *fop) { int ret = 0; ret = synctask_new(ec->xl->ctx->env, ec_synctask_heal_wrap, ec_heal_done, NULL, fop); ---------------------------- So synctask_create will create a task with new frame but it will not set frame->root->pid as -6 and it will be "0" only. This is what we are checking in "is_readonly_or_worm_enabled" and getting read-only as TRUE and the heal (fxattrop) is failing with "read-only file system" error. When we don't enable feature.read-only, this xlator will not be loaded and this condition will not be checked and hence fxattrop sent by "ec_data_undo_pending" will succeed and it will remove the dirty [data part] flag.
https://review.gluster.org/#/c/glusterfs/+/23129/
REVIEW: https://review.gluster.org/23129 (cluster/ec: Create heal task with heal process id) posted (#1) for review on master by Ashish Pandey
REVIEW: https://review.gluster.org/23129 (cluster/ec: Create heal task with heal process id) merged (#2) on master by Amar Tumballi