Bug 765238 (GLUSTER-3506) - big stale lock is seen when a file which needs data self-heal is opened with O_TRUNC
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-3506
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-09-03 09:13 UTC by Pranith Kumar K
Modified: 2011-09-06 15:36 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Pranith Kumar K 2011-09-03 09:13:40 UTC
The steps in a normal data self-heal:
    1) The self-heal frame takes a big lock, then gets the xattrs/stat to
    decide the source and sink.
    2) Loop frames are spawned; they perform the self-heal by taking small
    locks on the file. Each time, a new small lock is taken and the old one
    is released.
    3) Before the final small lock is released, the self-heal frame takes a
    big lock again, and only then is the small lock unlocked. The pending
    xattrs are erased, the big lock is released, and that ends the data
    self-heal.
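
The hand-over-hand locking in the normal case can be sketched as a toy model. This is not GlusterFS code: `LockTable`, the lock names, and the chunk size are hypothetical, and releasing the first big lock once the first small lock is held is an assumption inferred from the description, not something stated outright.

```python
class LockTable:
    """Toy stand-in for the locks held on one file (hypothetical, not the GlusterFS API)."""

    def __init__(self):
        self.held = []

    def lock(self, name):
        self.held.append(name)

    def unlock(self, name):
        self.held.remove(name)


def normal_data_self_heal(file_size, chunk):
    """Walk the three steps for a non-empty file and record which locks are held."""
    locks = LockTable()
    trace = []

    # Step 1: big lock, then decide source/sink from xattrs/stat (not modelled).
    locks.lock("big")
    previous = "big"

    # Step 2: each loop takes a new small lock, then releases the old lock.
    # Assumption: the first iteration is what releases the big lock.
    for offset in range(0, file_size, chunk):
        current = f"small@{offset}"
        locks.lock(current)
        locks.unlock(previous)
        previous = current
        trace.append(list(locks.held))

    # Step 3: big lock first, then drop the final small lock.
    locks.lock("big")
    locks.unlock(previous)
    # Pending xattrs would be erased here.
    locks.unlock("big")
    return locks.held, trace
```

At every point in this model at least one lock is held on the file, and nothing is left held at the end.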
    
    The bug appears when a file needs data self-heal and the fop that
    triggers the self-heal is an open with O_TRUNC. FUSE sends an open
    followed by an explicit truncate for this. The open triggers the
    self-heal, but by the time it tries to spawn the loops the file has been
    truncated to size 0, so no loops are formed.
    The steps are then:
    1) The self-heal frame takes a big lock, then gets the xattrs/stat to
    decide the source and sink.
    2) No loop frames are spawned, so the big lock is not released.
    3) The same self-heal frame takes one more big lock and erases the
    pending xattrs etc. It then does two big unlocks, but after the first
    unlock the information about the locks that were taken is forgotten, so
    the second unlock becomes a no-op. This leaves a stale big lock on the
    file, which prevents further writes.
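
A minimal sketch of how the second unlock can turn into a no-op. The classes below are hypothetical and only model the bookkeeping described above: the frame remembers a single lock record, while the file counts every big lock taken on it.

```python
class File:
    """Hypothetical file that counts the big locks currently held on it."""

    def __init__(self):
        self.big_locks = 0


class SelfHealFrame:
    """Hypothetical frame that remembers only one lock record at a time."""

    def __init__(self, file):
        self.file = file
        self.lock_info = None

    def big_lock(self):
        self.file.big_locks += 1
        self.lock_info = "big"  # overwrites any earlier record

    def big_unlock(self):
        if self.lock_info is None:
            return False  # nothing remembered, so this unlock is a no-op
        self.file.big_locks -= 1
        self.lock_info = None  # the lock information is forgotten here
        return True


def heal_truncated_file():
    """Replay the O_TRUNC sequence and report what is left on the file."""
    f = File()
    frame = SelfHealFrame(f)
    frame.big_lock()             # step 1
    # Step 2: file size is 0, no loops run, the big lock stays held.
    frame.big_lock()             # step 3: one more big lock
    first = frame.big_unlock()   # releases one lock and clears the record
    second = frame.big_unlock()  # no-op
    return f.big_locks, first, second
```

In this model `heal_truncated_file()` ends with one big lock still counted on the file, which corresponds to the stale lock that blocks later writes.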
    
    As a fix, when no loops are spawned, reuse the big lock that is already
    held to perform the remaining operations of the data self-heal. There is
    no need to take one more big lock.
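
The idea of the fix can be sketched as follows. This is an illustration under assumptions, not the merged patch: the function name, the lock names, and the default chunk size are all hypothetical.

```python
def data_self_heal_fixed(file_size, chunk=4):
    """Toy model of the fixed flow; returns (big locks held at erase time, locks left)."""
    held = ["big"]   # step 1: big lock taken by the self-heal frame
    previous = "big"

    # Step 2: loops run only if the file still has data to heal.
    for offset in range(0, file_size, chunk):
        current = f"small@{offset}"
        held.append(current)
        held.remove(previous)
        previous = current

    if previous != "big":
        # Loops ran: take the big lock before dropping the final small lock.
        held.append("big")
        held.remove(previous)
    # Otherwise no loops were spawned and the step 1 big lock is still held,
    # so it is reused instead of taking a second one.

    big_locks_at_erase = held.count("big")
    # Pending xattrs would be erased here.
    held.remove("big")  # exactly one big unlock is needed in both cases
    return big_locks_at_erase, held
```

Both `data_self_heal_fixed(0)` (the truncated file) and `data_self_heal_fixed(12)` hold exactly one big lock while the pending xattrs are erased and leave no lock behind.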

Comment 1 Anand Avati 2011-09-06 06:24:44 UTC
CHANGE: http://review.gluster.com/339 (The steps in normal data self heal:) merged in master by Anand Avati (avati)

