Bug 1734252
| Summary: | Heal not completing after geo-rep session is stopped on EC volumes. | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Ashish Pandey <aspandey> |
| Component: | disperse | Assignee: | Ashish Pandey <aspandey> |
| Status: | CLOSED NEXTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | mainline | CC: | bugs, kiyer, nchilaka, rhs-bugs, sankarshan, storage-qa-internal |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 1733531 | Environment: | |
| Last Closed: | 2019-07-30 12:58:23 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1733531 | ||
| Bug Blocks: | |||
Comment 1
Ashish Pandey
2019-07-30 05:02:11 UTC
Here is the root cause of the issue:
When features.read-only is enabled, ro_fxattrop checks the following condition:
if (is_readonly_or_worm_enabled(frame, this) && !allzero)
STACK_UNWIND_STRICT(fxattrop, frame, -1, EROFS, NULL, xdata);
Here, is_readonly_or_worm_enabled(frame, this) returns "false" for shd if frame->root->pid < 0; the frame used for healing is supposed to carry a pid of "-6".
However, in this case frame->root->pid has the value "0". The check (0 < 0) therefore fails and the function returns "true",
which makes the volume read-only for the shd process as well.
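The check can be modeled in isolation. The following is a minimal sketch, not the GlusterFS source: the names model_frame, model_is_readonly, and model_fxattrop are hypothetical. The only behavior taken from the description above is that a negative pid bypasses the read-only check and that a non-allzero fxattrop is otherwise rejected with EROFS.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for frame->root; only the pid matters here. */
struct model_frame {
    int pid;
};

/* Models the described behavior: internal clients (negative pid) are
 * never treated as read-only; everyone else is, when the option is on. */
static bool model_is_readonly(const struct model_frame *frame, bool read_only_on)
{
    if (frame->pid < 0)
        return false;
    return read_only_on;
}

/* Returns 0 if the fxattrop would be passed down, -EROFS if rejected. */
static int model_fxattrop(const struct model_frame *frame, bool read_only_on,
                          bool allzero)
{
    if (model_is_readonly(frame, read_only_on) && !allzero)
        return -EROFS;
    return 0;
}
```

With read-only enabled, a frame with pid -6 passes this model, while a frame with pid 0 is rejected unless the xattr change is all zeros.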
Why is it happening?
When shd triggers heal for the file, it finds that there is nothing to heal, so it calls "ec_data_undo_pending" to remove the dirty flag for the data part,
which in turn calls syncop_fxattrop->SYNCOP.
We do not pass a frame to this SYNCOP, so it gets the frame from the task:
\
task = synctask_get(); \
stb->task = task; \
if (task) \
frame = task->opframe;
However, when creating the task we passed the frame as NULL:
ec_launch_heal(ec_t *ec, ec_fop_data_t *fop)
{
int ret = 0;
ret = synctask_new(ec->xl->ctx->env, ec_synctask_heal_wrap, ec_heal_done,
NULL, fop);
----------------------------
So synctask_create creates the task with a new frame, but it does not set frame->root->pid to -6, so the pid stays "0".
This is the value checked in "is_readonly_or_worm_enabled", which therefore reports read-only as TRUE, and the heal (fxattrop) fails with a "read-only file system" error.
When features.read-only is not enabled, this xlator is not loaded and the condition is never checked, so the fxattrop sent by "ec_data_undo_pending"
succeeds and removes the dirty [data part] flag.
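To make the chain above concrete, here is a small self-contained model of the task/frame relationship. It is only a sketch under stated assumptions: model_task_new, model_task_fxattrop, and MODEL_HEAL_PID are hypothetical names, not GlusterFS APIs. It shows that a task created without a frame ends up with pid 0 and is rejected, while a task created from a frame whose pid is -6 passes, which is the direction the patch title ("Create heal task with heal process id") suggests.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_HEAL_PID (-6) /* pid the description says heal frames carry */

struct model_frame { int pid; };
struct model_task { struct model_frame opframe; };

/* Models task creation: with no caller frame, the task gets a fresh
 * frame whose pid stays 0; otherwise it copies the caller's frame. */
static struct model_task model_task_new(const struct model_frame *frame)
{
    struct model_task task = { .opframe = { .pid = 0 } };
    if (frame != NULL)
        task.opframe = *frame;
    return task;
}

/* Models an fxattrop issued from inside the task with read-only enabled
 * and a non-zero xattr change: only negative pids get through. */
static int model_task_fxattrop(const struct model_task *task)
{
    if (task->opframe.pid >= 0)
        return -EROFS;
    return 0;
}
```

In this model, model_task_new(NULL) reproduces the failing case, and passing a frame with pid set to MODEL_HEAL_PID reproduces the expected heal behavior.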
REVIEW: https://review.gluster.org/23129 (cluster/ec: Create heal task with heal process id) posted (#1) for review on master by Ashish Pandey
REVIEW: https://review.gluster.org/23129 (cluster/ec: Create heal task with heal process id) merged (#2) on master by Amar Tumballi