| Summary: | big stale lock is seen when a file which needs data self-heal is opened with O_TRUNC | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Pranith Kumar K <pkarampu> |
| Component: | replicate | Assignee: | Pranith Kumar K <pkarampu> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | mainline | CC: | gluster-bugs |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
CHANGE: http://review.gluster.com/339 (The steps in normal data self heal:) merged in master by Anand Avati (avati)
The steps in a normal data self-heal:

1) The self-heal frame takes a big lock, then gets the xattrs/stat to decide the source and sink information.
2) Loop frames are spawned; they perform the self-heal by taking small locks on the file. Each time, a new lock is taken and the old lock is released.
3) Before the final small lock is released, the self-heal frame takes a big lock and the small lock is unlocked. The pending xattrs are then erased, the big unlock happens, and that is the end of the data self-heal.

The problem appears when a file needs data self-heal and the fop that triggers the self-heal is an open with O_TRUNC. Fuse sends an open followed by an explicit truncate for this. The open triggers the self-heal, but by the time it tries to spawn the loops the file size has been truncated to 0, so no loops are formed. These are the steps:

1) The self-heal frame takes a big lock, then gets the xattrs/stat to decide the source and sink information.
2) Loop frames are not spawned. The big lock is not released.
3) One more big lock is taken by the same self-heal frame, and the erasing of the pending xattrs etc. happens. The frame now does two big unlocks, but after the first unlock the information on which the locks were performed is forgotten, so the next unlock becomes a no-op.

So a stale big lock is left on that file, preventing further writes.

As a fix, if the loops are not spawned, use the previous big lock to perform the rest of the operations needed to complete the data self-heal. There is no need for one more big lock.
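The lock accounting described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the replicate translator's actual code: the names `HealedFile`, `SelfHealFrame`, `data_self_heal`, `BLOCK`, and the `reuse_first_lock` flag are all hypothetical, the big lock is reduced to a counter on the file, and the frame's lock information to a single boolean that the first unlock clears.

```python
BLOCK = 4096  # hypothetical size of the window healed under one small lock


class HealedFile:
    """Toy stand-in for the file being healed (hypothetical)."""

    def __init__(self, size):
        self.size = size
        self.big_locks = 0  # big (full-file) locks currently held


class SelfHealFrame:
    """Toy stand-in for the self-heal frame (hypothetical)."""

    def __init__(self, target):
        self.target = target
        self.holds_big_lock = False  # the frame's only record of its lock

    def big_lock(self):
        self.target.big_locks += 1
        self.holds_big_lock = True

    def big_unlock(self):
        if not self.holds_big_lock:
            return  # lock information already forgotten: unlock is a no-op
        self.target.big_locks -= 1
        self.holds_big_lock = False


def data_self_heal(target, reuse_first_lock):
    """Run the modelled steps; return how many big locks are left behind."""
    frame = SelfHealFrame(target)
    frame.big_lock()  # step 1: big lock, then decide source/sink
    unlocks_due = 1

    offsets = range(0, target.size, BLOCK)
    if len(offsets) > 0:
        # Step 2: loop frames run; the first small lock replaces the big lock.
        frame.big_unlock()
        for _ in offsets:
            pass  # one window healed under its own small lock
        # Step 3: big lock retaken before the final small lock is released.
        frame.big_lock()
    elif not reuse_first_lock:
        # Behaviour before the fix: no loops ran, yet a second big lock is taken.
        frame.big_lock()
        unlocks_due = 2

    # Erasing of the pending xattrs would happen here.
    for _ in range(unlocks_due):
        frame.big_unlock()
    return target.big_locks
```

In this model a file with non-zero size ends with no big lock held, a file truncated to 0 with `reuse_first_lock=False` ends with one stale big lock, and the same file with `reuse_first_lock=True` (the approach the fix describes) ends with none.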