Bug 765238 (GLUSTER-3506)

Summary: big stale lock is seen when a file which needs data self-heal is opened with O_TRUNC
Product: [Community] GlusterFS Reporter: Pranith Kumar K <pkarampu>
Component: replicate Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: mainline CC: gluster-bugs
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix

Description Pranith Kumar K 2011-09-03 09:13:40 UTC
The steps in a normal data self-heal:
    1) The self-heal frame takes the big lock and gets the xattrs/stat to
    decide the source and sink information.
    2) Loop frames are spawned; they perform the self-heal by taking small
    locks on the file. Each time, a new lock is taken and the old lock is
    released.
    3) Before the final small lock is released, the self-heal frame takes a
    big lock and then unlocks the small lock. The pending xattrs are erased,
    the big unlock happens, and that is the end of the data self-heal.
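
The normal sequence above can be sketched as a toy model. This is only an illustration, not the replicate translator's code: the function name, the event tuples and the `small@<offset>` lock names are made up here. It also assumes that the first big lock is released once the first small lock is held, which this report implies but does not spell out.

```python
def normal_data_self_heal(file_size, block_size):
    """Toy model of the normal flow: return (events, locks still held)."""
    if file_size <= 0 or block_size <= 0:
        raise ValueError("this sketch covers only the non-empty-file case")

    events = []   # ordered trace of what the self-heal does
    held = []     # locks currently held on the file

    def lock(name):
        held.append(name)
        events.append(("lock", name))

    def unlock(name):
        held.remove(name)
        events.append(("unlock", name))

    # Step 1: big lock, then read xattrs/stat to pick source and sinks.
    lock("big")
    events.append(("inspect", "xattrs/stat"))

    # Step 2: loop frames heal block by block. A new lock is always taken
    # before the old one is released (assumption: the first "old" lock is
    # the big lock from step 1).
    previous = "big"
    for offset in range(0, file_size, block_size):
        current = f"small@{offset}"
        lock(current)
        unlock(previous)
        events.append(("heal", current))
        previous = current

    # Step 3: big lock first, then drop the final small lock, erase the
    # pending xattrs and release the big lock.
    lock("big")
    unlock(previous)
    events.append(("erase", "pending xattrs"))
    unlock("big")
    return events, held
```

In this model every lock is matched by exactly one unlock, so nothing is left held when the self-heal ends.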
    
    Now consider a file that needs data self-heal where the fop that
    triggers the self-heal is an open with O_TRUNC. For this, FUSE sends an
    open followed by an explicit truncate. The open triggers the self-heal,
    but by the time it tries to spawn the loops the file has been truncated
    to size 0, so no loops are formed.
    These are the steps:
    1) The self-heal frame takes the big lock and gets the xattrs/stat to
    decide the source and sink information.
    2) Loop frames are not spawned, so the big lock is not released.
    3) The same self-heal frame takes one more big lock and erases the
    pending xattrs etc. It then does two big unlocks, but after the first
    unlock the information about which locks were taken is forgotten, so
    the next unlock becomes a no-op. This leaves a stale big lock on the
    file, preventing further writes.
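
The failing sequence can be reproduced with a small toy model. The classes `LockServer` and `SelfHealFrame` and the single `lock_info` slot are hypothetical names chosen for illustration; they only mimic the behaviour described above, where the frame forgets its lock information after the first unlock.

```python
class LockServer:
    """Hypothetical stand-in for the brick: counts big locks on one file."""

    def __init__(self):
        self.big_locks = 0

    def lock(self):
        self.big_locks += 1

    def unlock(self):
        self.big_locks -= 1


class SelfHealFrame:
    """Hypothetical frame that remembers lock information in a single slot."""

    def __init__(self, server):
        self.server = server
        self.lock_info = None

    def big_lock(self):
        self.server.lock()
        self.lock_info = "big"   # a second big lock reuses the same slot

    def big_unlock(self):
        if self.lock_info is None:
            return False         # nothing remembered: the unlock is a no-op
        self.server.unlock()
        self.lock_info = None    # lock information is forgotten here
        return True


def truncated_file_self_heal():
    """Replay steps 1-3 for a file already truncated to size 0."""
    server = LockServer()
    frame = SelfHealFrame(server)
    frame.big_lock()             # step 1: big lock, decide source/sink
    # step 2: file size is 0, no loops spawned, big lock not released
    frame.big_lock()             # step 3: one more big lock
    first = frame.big_unlock()   # releases one big lock
    second = frame.big_unlock()  # no-op
    return first, second, server.big_locks
```

Calling `truncated_file_self_heal()` in this model leaves one big lock counted on the server, which corresponds to the stale lock that blocks later writes.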
    
    As a fix, if the loops are not spawned, use the previous big lock to
    perform the rest of the operations needed to complete the data
    self-heal. There is no need to take one more big lock.
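
A minimal sketch of the idea behind the fix, again as a toy model rather than the actual patch. The function `big_locks_left` and its `reuse_big_lock` flag are invented for this example, and the sketch assumes that in the normal path the first big lock is released once the loops hold small locks.

```python
def big_locks_left(file_size, block_size=4, reuse_big_lock=True):
    """Toy model: return how many big locks remain on the file afterwards."""
    server_big_locks = 0          # big locks the server holds for the file
    frame_has_lock_info = False   # the frame remembers a single big lock
    outstanding = 0               # big locks taken and not yet released

    # Step 1: take the big lock and decide source/sink.
    server_big_locks += 1
    outstanding += 1
    frame_has_lock_info = True

    loops = len(range(0, file_size, block_size))
    if loops > 0:
        # Normal path (assumption): the big lock is released once the
        # loop frames hold small locks; the loops then heal the file.
        server_big_locks -= 1
        outstanding -= 1
        frame_has_lock_info = False
        take_final_big_lock = True
    else:
        # No loops: with the fix, keep using the big lock from step 1.
        take_final_big_lock = not reuse_big_lock

    if take_final_big_lock:
        server_big_locks += 1
        outstanding += 1
        frame_has_lock_info = True

    # Erase pending xattrs (not modelled), then unlock once per big lock
    # taken. An unlock without remembered lock information is a no-op.
    for _ in range(outstanding):
        if not frame_has_lock_info:
            continue
        server_big_locks -= 1
        frame_has_lock_info = False

    return server_big_locks
```

With `reuse_big_lock=False` and `file_size=0` the model ends with one big lock left over, while the default `reuse_big_lock=True` ends with none, as does the normal non-empty case.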

Comment 1 Anand Avati 2011-09-06 06:24:44 UTC
CHANGE: http://review.gluster.com/339 (The steps in normal data self heal:) merged in master by Anand Avati (avati)