Bug 1638026

Summary: "gluster vol heal <vol name> info" is hung on Distributed-Replicated ( Arbiter )
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Sunil Kumar Acharya <sheggodu>
Component: arbiter    Assignee: Ravishankar N <ravishankar>
Status: CLOSED ERRATA QA Contact: Vijay Avuthu <vavuthu>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.4    CC: apaladug, atumball, ksandha, nchilaka, pkarampu, pprakash, ravishankar, rcyriac, rhs-bugs, sanandpa, sankarshan, srmukher, storage-qa-internal, vavuthu
Target Milestone: ---    Keywords: Automation, ZStream
Target Release: RHGS 3.4.z Async Update   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.12.2-18.2 Doc Type: Bug Fix
Doc Text:
Previously, a flaw in the self-heal code caused an inode lock to be taken twice on a file that needed healing but released only once. This left a stale lock behind on the brick, causing subsequent operations that needed the lock (such as heals or writes from the client) to hang. With this update, inode locks are released correctly without leaving any stale locks behind on the brick, so further heals and client writes no longer hang. (One way such stale locks could be checked for on a brick is sketched below, after the bug metadata.)
Story Points: ---
Clone Of: 1636902 Environment:
Last Closed: 2018-10-23 09:11:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1636902, 1637802, 1637953, 1637989, 1638159    
Bug Blocks:    
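
A minimal sketch of how the stale brick-side inode lock described in the Doc Text could be looked for: a brick statedump lists the locks currently held on the brick. The volume name below is taken from the verification comment; the statedump directory (typically /var/run/gluster) and the exact lock-dump format may differ between builds, so treat this as an assumption rather than a verified procedure.

# Ask every brick process of the volume to dump its state
gluster volume statedump testvol_distributed-replicated

# Inode-lock entries appear in the locks translator section of each dump file;
# a lock that stays granted while no heal or client I/O is in flight on that
# file points to a stale lock of the kind fixed by this update.
grep -A 2 inodelk /var/run/gluster/*.dump.*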

Comment 5 Vijay Avuthu 2018-10-17 12:23:28 UTC
Update:
========

Build Used: glusterfs-3.12.2-18.2.el7rhgs.x86_64

> Ran the automation case "test_entry_self_heal_heal_command" on a 2*(2+1) volume (layout sketched below) and did not see any hang in "gluster volume heal <volname> info".

> heal info is able to list the files without hanging.
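
For context, here is a hedged sketch of how a 2*(2+1) distributed-replicate volume with arbiter bricks might be created; hostnames and brick paths are copied from the heal info output below, while the exact node-to-brick order used in the test is an assumption.

# Six bricks with replica 3 and one arbiter per replica set => 2 x (2+1)
gluster volume create testvol_distributed-replicated replica 3 arbiter 1 \
    rhsauto049.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick0 \
    rhsauto029.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick1 \
    rhsauto034.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick2 \
    rhsauto039.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick3 \
    rhsauto040.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick4 \
    rhsauto041.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick5

gluster volume start testvol_distributed-replicated
# The third brick of each replica set (brick2 and brick5 here) acts as the arbiter.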

[root@rhsauto049 ~]# gluster vol heal testvol_distributed-replicated info
Brick rhsauto049.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick0
Status: Connected
Number of entries: 0

Brick rhsauto029.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick1
Status: Connected
Number of entries: 0

Brick rhsauto034.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick2
Status: Connected
Number of entries: 0

Brick rhsauto039.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick3
/files/user1_a/dir1_a/dir0_a/testfile2_a.txt 
/files/user1_a/dir1_a/dir0_a 
/files/user1_a/dir1_a/dir0_a/testfile10_a.txt 
/files/user1_a/dir1_a 
/files/user1_a/dir1_a/dir1_a/testfile2_a.txt 
/files/user1_a/dir1_a/dir1_a 
/files/user1_a/dir1_a/dir1_a/testfile10_a.txt 
/files/user1_a/dir1_a/testfile2_a.txt 
/files/user1_a/dir1_a/testfile10_a.txt 
Status: Connected
Number of entries: 9

Brick rhsauto040.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick4
/files/user1_a/dir1_a/dir0_a/testfile2_a.txt 
/files/user1_a/dir1_a/dir0_a 
/files/user1_a/dir1_a/dir0_a/testfile10_a.txt 
/files/user1_a/dir1_a 
/files/user1_a/dir1_a/dir1_a/testfile2_a.txt 
/files/user1_a/dir1_a/dir1_a 
/files/user1_a/dir1_a/dir1_a/testfile10_a.txt 
/files/user1_a/dir1_a/testfile2_a.txt 
/files/user1_a/dir1_a/testfile10_a.txt 
Status: Connected
Number of entries: 9

Brick rhsauto041.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick5
<gfid:f8395fc2-fd6f-4408-a7d3-d039273bd7f2>/user1_a/dir1_a 
Status: Connected
Number of entries: 1

> Since this bug tracks the hang, I am changing the status to VERIFIED and will raise a new bug for the pending-heal issue.

Comment 13 errata-xmlrpc 2018-10-23 09:11:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2970