Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 765329 (GLUSTER-3597)

Summary:	VM hangs while self-heal is in progress
Product:	[Community] GlusterFS	Reporter:	Pranith Kumar K <pkarampu>
Component:	replicate	Assignee:	Pranith Kumar K <pkarampu>
Status:	CLOSED CURRENTRELEASE	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	mainline	CC:	gluster-bugs
Target Milestone:	---
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:		Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Pranith Kumar K 2011-09-20 15:58:38 UTC

An AFR transaction performs the lock, pre-op, op, post-op and unlock steps,
    in that order. The child_up[] array is overloaded to also record on
    which children the first two steps succeeded. This works fine for
    transactions, but the locking/unlocking part of the code is reused by
    data self-heal, where each loop_frame performs the lock, rchecksum,
    read-from-source, write-to-sinks and unlock steps.
    
    The rchecksum fop assumes that it needs to run on one source plus all
    sinks, and sets call_count to that number. But if the lock step fails
    on any of the sinks, child_up[] for that child is set to 0. The result
    is a call_count mismatch: fewer callbacks arrive than call_count
    expects, so the frame hangs waiting for callbacks that will never
    come. When this happens the loop_frame never reaches the unlock step,
    which leads to hangs on that file.
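
The mismatch described above can be illustrated with a small toy model. This is a sketch in Python, not GlusterFS code: the function `rchecksum_round`, its parameters and the `recount` flag are hypothetical names introduced here. It assumes that the fop reaches only children still marked up in child_up[] and that each such child produces exactly one callback. The `recount=True` branch shows one possible way to keep the count consistent; it is not necessarily what the merged change does.

```python
# Toy model of the call_count mismatch; all names are illustrative only.

def rchecksum_round(source, sinks, child_up, recount):
    """Return how many callbacks are still awaited after one rchecksum round.

    source   -- index of the source child
    sinks    -- indices of the sink children
    child_up -- list of 0/1 flags; 0 for a child whose lock step failed
    recount  -- False: call_count = one source + all sinks (the assumption
                described in the report); True: count only children that
                are still up (a hypothetical alternative)
    """
    targets = [source] + list(sinks)
    if recount:
        call_count = sum(1 for child in targets if child_up[child])
    else:
        call_count = len(targets)

    # Assumption: only children still marked up receive the fop, and each
    # of them answers with exactly one callback.
    for child in targets:
        if child_up[child]:
            call_count -= 1  # one callback arrived

    # 0 means every expected callback arrived and the frame can unlock.
    return call_count


child_up = [1, 1, 1]
child_up[2] = 0  # the lock step failed on sink 2

pending_as_reported = rchecksum_round(0, [1, 2], child_up, recount=False)
pending_recounted = rchecksum_round(0, [1, 2], child_up, recount=True)
print(pending_as_reported, pending_recounted)
```

In this model a non-zero result stands for the hang: the frame keeps waiting for a callback that cannot arrive, so the unlock step is never reached.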

Comment 1 Anand Avati 2011-09-21 08:25:17 UTC

CHANGE: http://review.gluster.com/474 (Afr transaction performs lock, pre-op, op, post-op and unlock steps in that) merged in master by Vijay Bellur (vijay)