Bug 1636902

Summary:	"gluster vol heal <vol name> info" is hung on Distributed-Replicated ( Arbiter )
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Vijay Avuthu <vavuthu>
Component:	arbiter	Assignee:	Ravishankar N <ravishankar>
Status:	CLOSED ERRATA	QA Contact:	Vijay Avuthu <vavuthu>
Severity:	high	Docs Contact:
Priority:	high
Version:	rhgs-3.4	CC:	anepatel, apaladug, atumball, bkunal, bmekala, dwojslaw, nbalacha, nchilaka, pkarampu, rcyriac, rhs-bugs, sanandpa, sankarshan, sheggodu, storage-qa-internal, vavuthu
Target Milestone:	---	Keywords:	Automation, ZStream
Target Release:	RHGS 3.4.z Batch Update 1
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	glusterfs-3.12.2-23	Doc Type:	Bug Fix
Doc Text:	Previously, a flaw in the self-heal code caused an inode lock to be taken twice on the file that needed heal but released only once. As a result, a stale lock was left behind on the brick, and further operations that needed the lock (such as a heal or a write from the client) would hang. With this update, the inode locks are released correctly without leaving any stale locks on the brick. This prevents further heals or client writes from hanging.	Story Points:	---
Clone Of:
Clones:	1638026		Environment:
Last Closed:	2018-10-31 08:46:58 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1637802, 1637953, 1637989, 1638159
Bug Blocks:	1638026
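
The Doc Text field above describes a lock that is taken twice and released once. A toy model of that imbalance follows. It is not the GlusterFS locks translator or the AFR self-heal code: ToyInodeLock, the owner names "shd" and "client", and the counting scheme are all hypothetical, chosen only to show why a single missing unlock blocks every later caller.

```python
class ToyInodeLock:
    """Hypothetical counted lock; NOT the GlusterFS locks translator."""

    def __init__(self):
        self.owner = None
        self.count = 0

    def try_lock(self, owner):
        # Grant the lock if it is free or already held by the same owner.
        if self.owner in (None, owner):
            self.owner = owner
            self.count += 1
            return True
        return False

    def unlock(self, owner):
        if self.owner != owner or self.count == 0:
            raise RuntimeError("unlock without a matching lock")
        self.count -= 1
        if self.count == 0:
            self.owner = None


def buggy_heal(lock):
    # Two acquisitions, one release: one reference is left behind (stale lock).
    lock.try_lock("shd")
    lock.try_lock("shd")
    lock.unlock("shd")


def fixed_heal(lock):
    # Every acquisition is paired with exactly one release.
    lock.try_lock("shd")
    lock.unlock("shd")
```

In this model, a client that calls try_lock after buggy_heal is refused indefinitely, which corresponds to the hang described in the Doc Text; after fixed_heal the same call succeeds.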

Description Vijay Avuthu 2018-10-08 08:39:33 UTC

Description of problem:

While running automation runs, found that healing does not complete on a Distributed-Replicated (Arbiter) volume.

Version-Release number of selected component (if applicable):

glusterfs-3.12.2-18.1.el7rhgs.x86_64


How reproducible: Always


Steps to Reproduce:

1) create distributed-replicated volume ( Arbiter:2 x (2 + 1) ) and mount the volume
2) Disable client side heals
3) write IO using below script

#python /usr/share/glustolibs/io/scripts/file_dir_ops.py create_deep_dirs_with_files --dir-length 2 --dir-depth 2 --max-num-of-dirs 2 --num-of-files 20 /mnt/testvol_distributed-replicated_glusterfs/files

4) Disable self-heal-daemon
5) bring bricks offline from each set ( brick2 and brick3 )
6) create files from mount point

#python /usr/share/glustolibs/io/scripts/file_dir_ops.py create_files -f 20 /mnt/testvol_distributed-replicated_glusterfs/files

7) bring bricks online 
8) Enable self-heal-daemon
9) issue volume heal
10) wait for heal to complete

11) Disable self-heal-daemon
12) bring bricks offline from each set ( brick0 and brick5 )
13) Modify data

#python /usr/share/glustolibs/io/scripts/file_dir_ops.py mv /mnt/testvol_distributed-replicated_glusterfs/files

14) bring bricks online
15) Enable self-heal-daemon
16) Issue volume heal
17) Wait for heal to complete
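
The heal-related steps above (2, 4, 8, 9 and their repeats) map onto gluster CLI calls roughly as sketched below. This only builds the argument lists and does not run anything. The option names are the ones shown under "Options Reconfigured" in the volume info; the helper function names are hypothetical, and how the automation actually brings bricks offline and online is not stated in this report, so that part is left out.

```python
VOLNAME = "testvol_distributed-replicated"

# Client-side heal options, as listed under "Options Reconfigured".
CLIENT_HEAL_OPTIONS = (
    "cluster.data-self-heal",
    "cluster.metadata-self-heal",
    "cluster.entry-self-heal",
)


def disable_client_side_heals(volname):
    """Step 2: one `gluster volume set` call per client-side heal option."""
    return [
        ["gluster", "volume", "set", volname, option, "off"]
        for option in CLIENT_HEAL_OPTIONS
    ]


def set_self_heal_daemon(volname, enabled):
    """Steps 4, 8, 11, 15: toggle the self-heal daemon."""
    value = "on" if enabled else "off"
    return ["gluster", "volume", "set", volname, "cluster.self-heal-daemon", value]


def trigger_heal(volname):
    """Steps 9 and 16: issue volume heal."""
    return ["gluster", "volume", "heal", volname]


def heal_info(volname):
    """Command whose output is checked while waiting for heal to complete."""
    return ["gluster", "volume", "heal", volname, "info"]


if __name__ == "__main__":
    commands = disable_client_side_heals(VOLNAME)
    commands.append(set_self_heal_daemon(VOLNAME, False))
    commands.append(set_self_heal_daemon(VOLNAME, True))
    commands.append(trigger_heal(VOLNAME))
    commands.append(heal_info(VOLNAME))
    for argv in commands:
        print(" ".join(argv))
```

Each list can be passed to subprocess.run on a server node; given that this bug is a hang, passing a timeout to that call would be prudent.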


Actual results:

After step 17, heal info still shows pending entries

[root@rhsauto039 ~]# gluster vol heal testvol_distributed-replicated info
Brick rhsauto039.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick0
Status: Connected
Number of entries: 0

Brick rhsauto045.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick1
Status: Connected
Number of entries: 0

Brick rhsauto025.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick2
Status: Connected
Number of entries: 0

Brick rhsauto047.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick3
/files/user2_a/dir0_a/dir0_a 
/files/user2_a/dir0_a 
Status: Connected
Number of entries: 2

Brick rhsauto040.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick4
/files/user2_a/dir0_a/dir0_a 
/files/user2_a/dir0_a 
Status: Connected
Number of entries: 2

Brick rhsauto026.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick5
<gfid:950fbe25-b5a1-4999-a718-f3424100189a>/user2_a/dir0_a/dir0_a 
<gfid:950fbe25-b5a1-4999-a718-f3424100189a>/user2_a/dir0_a 
Status: Connected
Number of entries: 2

[root@rhsauto039 ~]#
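
For automation, the per-brick counts in the output above can be extracted with a small parser. This is a minimal sketch that assumes the plain-text layout shown in this report ("Brick ..." followed later by "Number of entries: N"); parse_heal_info is a hypothetical helper, not part of glustolibs. The sample is an excerpt of the output above.

```python
import re


def parse_heal_info(text):
    """Return {brick: pending_entries} from heal info text laid out as above."""
    counts = {}
    brick = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
            continue
        match = re.match(r"Number of entries:\s*(\d+)$", line)
        if match and brick is not None:
            counts[brick] = int(match.group(1))
    return counts


# Excerpt of the output in this report (two of the six bricks).
SAMPLE = """\
Brick rhsauto025.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick2
Status: Connected
Number of entries: 0

Brick rhsauto047.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick3
/files/user2_a/dir0_a/dir0_a
/files/user2_a/dir0_a
Status: Connected
Number of entries: 2
"""

# Bricks that still have entries to heal.
pending = {b: n for b, n in parse_heal_info(SAMPLE).items() if n > 0}
```

Lines whose count is not a number are skipped rather than treated as zero, so a brick without a numeric count simply does not appear in the result.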


Expected results:

Healing should complete, i.e. heal info should report 0 entries on every brick.
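
"Wait for heal to complete" (steps 10 and 17) can be expressed as a bounded poll. A minimal sketch: wait_for_heal and its parameters are hypothetical, and get_pending stands for any callable that returns the total number of pending entries, for example one that runs heal info and sums the per-brick counts.

```python
import time


def wait_for_heal(get_pending, timeout=600.0, interval=5.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll get_pending() until it returns 0 or `timeout` seconds elapse.

    Returns True if heal completed, False if entries were still pending
    when the deadline passed.
    """
    deadline = clock() + timeout
    while True:
        if get_pending() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Because the failure mode in this bug is a hang, get_pending itself should enforce a timeout on the underlying command; the loop above only bounds the number of polls, not the duration of a single call.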

Additional info:

[root@rhsauto039 ~]# gluster vol info
 
Volume Name: testvol_distributed-replicated
Type: Distributed-Replicate
Volume ID: 521dc7f1-0b1f-46f8-b802-6894a1828b32
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: rhsauto039.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick0
Brick2: rhsauto045.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick1
Brick3: rhsauto025.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick2 (arbiter)
Brick4: rhsauto047.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick3
Brick5: rhsauto040.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick4
Brick6: rhsauto026.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed-replicated_brick5 (arbiter)
Options Reconfigured:
cluster.self-heal-daemon: on
cluster.data-self-heal: off
cluster.metadata-self-heal: off
cluster.entry-self-heal: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
[root@rhsauto039 ~]# 


[root@rhsauto039 ~]# gluster vol status
Status of volume: testvol_distributed-replicated
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhsauto039.lab.eng.blr.redhat.com:/br
icks/brick0/testvol_distributed-replicated_
brick0                                      49153     0          Y       18060
Brick rhsauto045.lab.eng.blr.redhat.com:/br
icks/brick0/testvol_distributed-replicated_
brick1                                      49152     0          Y       21012
Brick rhsauto025.lab.eng.blr.redhat.com:/br
icks/brick0/testvol_distributed-replicated_
brick2                                      49152     0          Y       21449
Brick rhsauto047.lab.eng.blr.redhat.com:/br
icks/brick0/testvol_distributed-replicated_
brick3                                      49152     0          Y       20558
Brick rhsauto040.lab.eng.blr.redhat.com:/br
icks/brick0/testvol_distributed-replicated_
brick4                                      49152     0          Y       20536
Brick rhsauto026.lab.eng.blr.redhat.com:/br
icks/brick0/testvol_distributed-replicated_
brick5                                      49153     0          Y       22118
Self-heal Daemon on localhost               N/A       N/A        Y       18083
Self-heal Daemon on rhsauto047.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       20889
Self-heal Daemon on rhsauto040.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       21114
Self-heal Daemon on rhsauto045.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       21593
Self-heal Daemon on rhsauto025.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       21733
Self-heal Daemon on rhsauto026.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       22141
 
Task Status of Volume testvol_distributed-replicated
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@rhsauto039 ~]# 


SOS reports, health report and state dumps: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/vavuthu/arbiter_heal_issue/

> The same scenario passes on a plain Arbiter volume.

Comment 9 Amar Tumballi 2018-10-10 10:15:40 UTC

Also note that this was hit by a user in the community. I would propose it as a 'Blocker', since this is a common activity in all scenarios: OCS, RHHI, and RHGS.

Comment 10 Ravishankar N 2018-10-10 11:26:01 UTC

Vijay told me that manual testing did not find any issues with the scratch build. While he runs the automated tests, I am moving the BZ to POST. The upstream patch is at https://review.gluster.org/#/c/21380/

Comment 16 Bipin Kunal 2018-10-16 06:42:07 UTC

*** Bug 1638947 has been marked as a duplicate of this bug. ***

Comment 17 Amar Tumballi 2018-10-16 11:17:35 UTC

Can this be prevented by using the 'sdfs' feature (serializing directory entry ops)?

Comment 19 Anees Patel 2018-10-22 07:10:37 UTC

Verified the fix on build 
glusterfs-libs-3.12.2-23.el7rhgs.x86_64.

The heal hang is no longer observed. The heal is still pending, but that is tracked in bug 1640148, hence setting this BZ to the verified state.

Comment 20 Pranith Kumar K 2018-10-22 13:37:20 UTC

(In reply to Amar Tumballi from comment #17)
> Can this be prevented by using 'sdfs' feature? (serializing directory entry
> ops) ?

Without serializing all entry ops irrespective of the parent-directory on which the fop comes, I don't think it is possible. But doing this will lead to very bad performance. So at the moment I will try to fix it in AFR/EC as the xlators are doing things that posix is not well equipped to do.

Comment 21 Pranith Kumar K 2018-10-22 14:16:48 UTC

Vijay,
     For all upgrade/healing tests, can we have an extra step after each upgrade completes, where we add a fresh mount and a way to create a new file in an existing directory and add data to existing files? This is the only way to ensure that this bug doesn't recur in the future.

Pranith
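
A minimal sketch of the extra step requested above, under stated assumptions: `directory` is an existing directory reached through a fresh mount (mounting is outside this sketch), and the function names, file name prefix and timeout value are hypothetical. The I/O runs in a daemon thread so that a stale lock shows up as a timeout instead of blocking the test run forever.

```python
import os
import threading
import uuid


def touch_and_append(directory):
    """Create one new file in `directory` and append to each existing file."""
    existing = [os.path.join(directory, name) for name in sorted(os.listdir(directory))]
    existing = [path for path in existing if os.path.isfile(path)]
    new_path = os.path.join(directory, "sanity_" + uuid.uuid4().hex)
    with open(new_path, "w") as handle:
        handle.write("new file\n")
    for path in existing:
        with open(path, "a") as handle:
            handle.write("appended\n")
    return new_path, existing


def run_with_timeout(directory, timeout=60.0):
    """Run touch_and_append, raising TimeoutError if it does not finish."""
    result = {}

    def worker():
        try:
            result["value"] = touch_and_append(directory)
        except Exception as exc:  # handed back to the caller below
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        raise TimeoutError(
            "I/O under %s did not finish in %s s; possible stale lock" % (directory, timeout)
        )
    if "error" in result:
        raise result["error"]
    return result["value"]
```

A TimeoutError here only indicates that I/O stalled; confirming a stale lock would still require brick statedumps such as those attached to this bug.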

Comment 22 Ravishankar N 2018-10-22 15:53:20 UTC

*** Bug 1635967 has been marked as a duplicate of this bug. ***

Comment 23 Vijay Avuthu 2018-10-23 06:36:46 UTC

(In reply to Pranith Kumar K from comment #21)
> Vijay,
>      For all upgrade/healing tests, can we have an extra step after each
> upgrade completes, where we add a fresh mount and a way to create a new file
> in existing directory and add data to existing files? This is the only way
> to ensure that this bug doesn't repeat in future.
> 
> Pranith

Sure, Pranith. We will include that in our upgrade testing.

Comment 24 Pranith Kumar K 2018-10-23 08:52:18 UTC

(In reply to Vijay Avuthu from comment #23)
> (In reply to Pranith Kumar K from comment #21)
> > Vijay,
> >      For all upgrade/healing tests, can we have an extra step after each
> > upgrade completes, where we add a fresh mount and a way to create a new file
> > in existing directory and add data to existing files? This is the only way
> > to ensure that this bug doesn't repeat in future.
> > 
> > Pranith
> 
> sure pranith. We include that in our upgrade testing.

Forgot to mention: the tests for EC volumes should be modified in a similar fashion as well.

Comment 25 Ravishankar N 2018-10-25 12:12:50 UTC

Changing doc text to be identical to BZ 1638026

Comment 27 errata-xmlrpc 2018-10-31 08:46:58 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3432

Comment 29 Red Hat Bugzilla 2023-09-15 01:27:43 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days