Summary:          Volume heal for block hosting volume is pending for over 4 hours
Product:          Red Hat Gluster Storage
Reporter:         Rachael <rgeorge>
Component:        replicate
Assignee:         Ravishankar N <ravishankar>
Status:           ASSIGNED
QA Contact:       nchilaka <nchilaka>
Version:          ocs-3.11
CC:               jrivera, knarra, kramdoss, ksubrahm, madam, pasik, pkarampu, pprakash, puebele, ravishankar, rcyriac, rgeorge, rhs-bugs, rtalur, sarumuga, sheggodu, storage-qa-internal
Fixed In Version:
Doc Type:         If docs needed, set a value
Doc Text:
Story Points:     ---
oVirt Team:       ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:  ---
Target Upstream Version:
Bug Depends On:
Comment 16 Yaniv Kaul 2019-11-25 10:10:11 UTC
What's the next step here?
Comment 17 Ravishankar N 2019-11-25 10:39:57 UTC
(In reply to Yaniv Kaul from comment #16)
> What's the next step here?

(Copying from https://bugzilla.redhat.com/show_bug.cgi?id=1721355#c9)

<snip>
We need to add better eager-lock debugging infrastructure to AFR. Since https://review.gluster.org/19503, AFR maintains multiple queues of FOPs for performance gains, and it is currently difficult to gain insight into when eager locks are acquired and released while a lot of I/O is being pumped.
</snip>

We also need to write some Python gdb helper scripts (my Stack Overflow question mentioned in comment #13 has since received an answer) to isolate the threads of interest in a process with several hundred threads. I'm currently working on the lockless heal-info bug; I can pick this up once that is complete.
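For reference, a minimal sketch of the kind of gdb Python helper mentioned above, assuming we only want to list the threads whose backtrace contains a given function-name substring. The command name "filter-threads" and the example substring are hypothetical, not an existing Gluster or gdb command:

# filter_threads.py - hypothetical gdb helper, not part of Gluster.
# Load inside gdb with:  source filter_threads.py
# Then run, for example: filter-threads afr_
# It prints only the threads whose current backtrace contains the given
# substring in a frame's function name, to narrow a multi-hundred-thread
# glusterfs process down to the threads of interest.
import gdb

class FilterThreads(gdb.Command):
    """filter-threads SUBSTRING: list threads whose backtrace mentions SUBSTRING."""

    def __init__(self):
        super(FilterThreads, self).__init__("filter-threads", gdb.COMMAND_USER)

    def invoke(self, arg, from_tty):
        needle = arg.strip()
        if not needle:
            print("usage: filter-threads SUBSTRING")
            return
        for inferior in gdb.inferiors():
            for thread in inferior.threads():
                thread.switch()            # make this the selected thread
                frame = gdb.newest_frame()
                while frame is not None:
                    name = frame.name() or ""
                    if needle in name:
                        print("thread %s matches in %s" % (str(thread.ptid), name))
                        break
                    frame = frame.older()

FilterThreads()                            # register the command with gdb

With something along these lines, a core or live glusterfs process from a setup like the one in this bug could be narrowed down to the threads sitting in AFR lock/unlock paths before inspecting eager-lock state.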