Summary:          Volume heal for block hosting volume is pending for over 4 hours
Product:          Red Hat Gluster Storage
Reporter:         Rachael <rgeorge>
Component:        replicate
Assignee:         Ravishankar N <ravishankar>
Status:           ASSIGNED
QA Contact:       nchilaka <nchilaka>
Version:          ocs-3.11
CC:               jrivera, knarra, kramdoss, ksubrahm, madam, pasik, pkarampu, pprakash, puebele, ravishankar, rcyriac, rgeorge, rhs-bugs, rtalur, sarumuga, sheggodu, storage-qa-internal
Fixed In Version:
Doc Type:         If docs needed, set a value
Doc Text:
Story Points:     ---
oVirt Team:       ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:  ---
Target Upstream Version:
Bug Depends On:
Comment 16 Yaniv Kaul 2019-11-25 10:10:11 UTC
What's the next step here?
Comment 17 Ravishankar N 2019-11-25 10:39:57 UTC
(In reply to Yaniv Kaul from comment #16)
> What's the next step here?

(Copying from https://bugzilla.redhat.com/show_bug.cgi?id=1721355#c9)

<snip>
We need to add better eager-lock debugging infrastructure to AFR. Since https://review.gluster.org/19503, AFR maintains multiple queues of FOPs for performance gains, and it is currently difficult to gain insight into when eager locks are acquired and released while a lot of I/O is being pumped.
</snip>

We also need to write some Python gdb helper scripts (my Stack Overflow question mentioned in comment #13 has since received an answer) to isolate the threads of interest in a process with several hundred threads. I'm currently working on the lockless heal-info bug; I can pick this up once that is complete.
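For reference, a minimal sketch of the kind of gdb Python helper mentioned above, assuming we only want to list the threads whose backtrace contains a given function-name substring. The command name "filter-threads" and the example substring are hypothetical, not an existing Gluster or gdb command:

# filter_threads.py - hypothetical gdb helper, not part of Gluster.
# Load inside gdb with:  source filter_threads.py
# Then run, for example: filter-threads afr_
# It prints only the threads whose current backtrace contains the given
# substring in a frame's function name, to narrow a multi-hundred-thread
# glusterfs process down to the threads of interest.
import gdb

class FilterThreads(gdb.Command):
    """filter-threads SUBSTRING: list threads whose backtrace mentions SUBSTRING."""

    def __init__(self):
        super(FilterThreads, self).__init__("filter-threads", gdb.COMMAND_USER)

    def invoke(self, arg, from_tty):
        needle = arg.strip()
        if not needle:
            print("usage: filter-threads SUBSTRING")
            return
        for inferior in gdb.inferiors():
            for thread in inferior.threads():
                thread.switch()            # make this the selected thread
                frame = gdb.newest_frame()
                while frame is not None:
                    name = frame.name() or ""
                    if needle in name:
                        print("thread %s matches in %s" % (str(thread.ptid), name))
                        break
                    frame = frame.older()

FilterThreads()                            # register the command with gdb

With something along these lines, a core or live glusterfs process from a setup like the one in this bug could be narrowed down to the threads sitting in AFR lock/unlock paths before inspecting eager-lock state.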