Bug 1721355

Summary: Heal Info is hung when I/O is in progress on a gluster block volume
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Ravishankar N <ravishankar>
Component: replicate
Assignee: Ravishankar N <ravishankar>
Status: CLOSED ERRATA
QA Contact: Sayalee <saraut>
Severity: high
Docs Contact:
Priority: high
Version: rhgs-3.5
CC: aloganat, arukumar, atoborek, bkunal, dwalveka, jmulligan, nchilaka, pasik, pprakash, puebele, rhs-bugs, rkothiya, sabose, sheggodu, storage-qa-internal, vdas
Target Milestone: ---
Target Release: RHGS 3.5.z Batch Update 3
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-6.0-38
Doc Type: Known Issue
Doc Text:
Previously, the 'gluster volume heal $volname info' command took blocking locks on files to determine and report whether they needed healing, so it would hang if a client writing to the same file had already acquired the lock. With this update, the command displays the list of files needing heal without taking any locks.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-12-17 04:50:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1696815, 1703695, 1787998, 1812114    

Description Ravishankar N 2019-06-18 04:56:44 UTC
Description of problem:
I was observing this issue while working on BZ 1707259. 
When there are no pending self-heals but a lot of I/O is happening on a replicate volume with the gluster-block profile enabled, heal-info hangs. The moment the I/O stops, the command completes successfully. I'm guessing it has something to do with eager locking, but I need to RCA it.
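
For reference, the command that hangs and the eager-lock tunable mentioned above (presumably cluster.eager-lock for a replicate volume; the volume name is a placeholder):

  # The command that hangs while client I/O is in flight:
  gluster volume heal <volname> info

  # Inspect the AFR eager-lock setting on the volume:
  gluster volume get <volname> cluster.eager-lock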

Version-Release number of selected component (if applicable):
rhgs-3.5.0

How reproducible:
Always on my dev VMs.

Steps to Reproduce:
- Create a 1x3 replicate volume on a 3-node setup.
- Apply the gluster-block profile to the volume: gluster v set $volname group gluster-block
- Mount a fuse client on another node and run parallel 'dd's:
for i in {1..20}; do dd if=/dev/urandom of=FILE_$i bs=1024 count=102400 & done
- After 10-20 seconds, while the I/O is still going on, run the heal-info command (a consolidated sketch of these steps follows below) - it will hang.
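
A consolidated sketch of the above, with hostnames, brick paths, and the mount point as placeholders:

  # On one of the three nodes: create, tune, and start a 1x3 replicate volume
  gluster volume create testvol replica 3 node1:/bricks/b1 node2:/bricks/b1 node3:/bricks/b1
  gluster volume set testvol group gluster-block
  gluster volume start testvol

  # On a separate client node: mount over FUSE and generate parallel writes
  mount -t glusterfs node1:/testvol /mnt/testvol
  cd /mnt/testvol
  for i in {1..20}; do dd if=/dev/urandom of=FILE_$i bs=1024 count=102400 & done

  # Back on any server node, while the dd's are still running:
  gluster volume heal testvol info    # hangs until the I/O stops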


Actual results:
The heal-info command hangs.

Expected results:
The heal-info command should complete without hanging.
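
A non-hanging run is expected to print each brick's entry list promptly, along the lines of (brick path illustrative):

  # gluster volume heal testvol info
  Brick node1:/bricks/b1
  Status: Connected
  Number of entries: 0
  ...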

Comment 13 Patric Uebele 2019-07-19 09:49:50 UTC
OK to do it in a later batch update (BU).

Comment 39 Pranith Kumar K 2019-12-03 08:33:27 UTC
*** Bug 1483977 has been marked as a duplicate of this bug. ***

Comment 40 Pranith Kumar K 2019-12-03 08:34:01 UTC
*** Bug 1643559 has been marked as a duplicate of this bug. ***

Comment 41 Pranith Kumar K 2019-12-03 08:34:48 UTC
*** Bug 1643081 has been marked as a duplicate of this bug. ***

Comment 42 Pranith Kumar K 2019-12-03 08:34:49 UTC
*** Bug 1763596 has been marked as a duplicate of this bug. ***

Comment 44 Karthik U S 2020-03-30 09:25:46 UTC
*** Bug 1812114 has been marked as a duplicate of this bug. ***

Comment 57 Arthy Loganathan 2020-11-18 11:28:31 UTC
As per comment 51, verified the AFR in-service upgrade scenarios of gluster, and it is working as expected.

Comment 64 errata-xmlrpc 2020-12-17 04:50:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5603