Bug 765329 (GLUSTER-3597) - VM hangs while self-heal is in progress
Summary: VM hangs while self-heal is in progress
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-3597
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-09-20 15:58 UTC by Pranith Kumar K
Modified: 2011-09-21 13:50 UTC
1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Pranith Kumar K 2011-09-20 15:58:38 UTC
Afr transaction performs the lock, pre-op, op, post-op and unlock steps,
in that order. The child_up[] array is overloaded to also record the
children on which the first two steps (lock and pre-op) succeeded. This
works fine for a transaction, but the locking/unlocking code is also
reused by data self-heal, where each loop_frame performs the lock,
rchecksum, read-from-source, write-to-sinks and unlock steps.
    
The rchecksum fop assumes it has to be wound to one source plus all
sinks, and sets call_count to that number. But if the lock step fails
on any of the sinks, it sets child_up for that child to 0, so
call_count no longer matches the number of callbacks (cbks) that come
back, and the frame hangs waiting for cbks that never arrive. When this
happens, the loop_frame never reaches the unlock step, so its locks are
never released and operations on that file hang.

Comment 1 Anand Avati 2011-09-21 08:25:17 UTC
CHANGE: http://review.gluster.com/474 (Afr transaction performs lock, pre-op, op, post-op and unlock steps in that) merged in master by Vijay Bellur (vijay)

