Bug 1659334

Summary:	FUSE mount seems to be hung and not accessible
Product:	[Community] GlusterFS	Reporter:	Prasad Desala <tdesala>
Component:	fuse	Assignee:	bugs <bugs>
Status:	CLOSED NEXTRELEASE	QA Contact:
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	mainline	CC:	atumball, bugs, guillaume.pavese, jahernan, nbalacha, pasik, tdesala
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	glusterfs-7.0	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	1662838 1698728		Environment:
Last Closed:	2019-07-05 07:36:13 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1662838, 1698728, 1702270, 1702271

Description Prasad Desala 2018-12-14 06:05:50 UTC

Description of problem:
=======================
When the glusto automation script [1] is run on the master branch (6dev-0.370.git8293d21.el7.x86_64), the test script hangs at the following glusto log entry:
2018-12-14 11:18:52,556 INFO (run) root.42.139 (cp): getfattr --absolute-names --only-values -n 'trusted.glusterfs.pathinfo' /mnt/testvol_distributed-replicated_glusterfs

The mount point is not accessible, so the test script gets stuck at that point. I also logged into the client machine to check whether the mount could be accessed manually, but the commands (df -h, cd <mountpoint>) hang as well.

[1] https://review.gluster.org/#/c/glusto-tests/+/21826/
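The symptom above (commands such as df -h never returning) can be checked without blocking the calling shell or test script. The following is a minimal sketch, not part of the glusto test; the function name mount_responds and the 5 second default are assumptions made for illustration. It runs a statvfs call, roughly the kind of query df makes, in a background thread and reports whether it returned in time.

```python
import os
import threading


def mount_responds(path, timeout=5.0):
    """Return True if a statvfs() call on `path` returns within `timeout` seconds.

    Hypothetical helper: True only means the call returned, not that the
    mount is healthy (an error such as ENOENT also counts as a response).
    """
    done = threading.Event()

    def probe():
        try:
            os.statvfs(path)
        except OSError:
            pass  # an error is still a response, not a hang
        done.set()

    # Daemon thread: if the call never returns, it does not keep the
    # interpreter alive at exit.
    threading.Thread(target=probe, daemon=True).start()
    return done.wait(timeout)
```

For example, mount_responds("/mnt/testvol_distributed-replicated_glusterfs", timeout=10) would be expected to return False while the mount is in the state described in this report.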

Version-Release number of selected component (if applicable):
6dev-0.370.git8293d21.el7.x86_64

How reproducible:
Always (when this glusto patch is run)

Steps to Reproduce:
===================
1) Run the glusto patch (https://review.gluster.org/#/c/glusto-tests/+/21826/) on a master build.

Actual results:
===============
FUSE mount is not accessible.

Expected results:
=================
The FUSE mount should be accessible without any hangs.

Comment 4 Yaniv Kaul 2019-04-17 07:44:53 UTC

What's the status of this BZ?

Comment 5 Nithya Balachandran 2019-04-17 12:31:29 UTC

(In reply to Yaniv Kaul from comment #4)
> What's the status of this BZ?

Susant (spalai) is looking into the issue downstream. The fix will be posted upstream.

Comment 6 Xavi Hernandez 2019-04-17 15:52:27 UTC

The hang is caused by a log message sent from inside a locked region in a memory allocation function. Logging itself makes use of dynamic memory, so it needs the same lock again, which causes a deadlock. Susant has already posted a patch [1] to avoid the deadlock.

However, the log message should never be triggered in the first place, because it indicates that something is not working correctly. I found another case where this also happens [2]. While debugging it, I found an issue in memory accounting management, which I fixed in another patch [3].

[1] https://review.gluster.org/c/glusterfs/+/22589
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1663375
[3] https://review.gluster.org/c/glusterfs/+/22554
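The deadlock pattern described in this comment can be illustrated with a small, self-contained sketch. This is a toy model in Python, not GlusterFS code: the Accounting class, its method names, and the lock timeout are all assumptions made for illustration. A real mutex would block forever; the timeout is only there so the demonstration terminates. The "fixed" variant shows one common way to avoid this kind of deadlock (record the condition under the lock and log after releasing it), which is not necessarily the approach taken in patch [1].

```python
import threading


class Accounting:
    """Toy stand-in for a memory accounting record (hypothetical)."""

    def __init__(self):
        self.lock = threading.Lock()  # non-reentrant, like a plain mutex
        self.num_allocs = 0
        self.messages = []

    def account_alloc(self, timeout=0.2):
        # Every allocation takes the accounting lock. The timeout exists
        # only so this demo fails fast instead of hanging.
        if not self.lock.acquire(timeout=timeout):
            raise RuntimeError("deadlock: accounting lock already held")
        try:
            self.num_allocs += 1
        finally:
            self.lock.release()

    def log(self, text):
        # Building a log message needs dynamic memory, so it re-enters
        # the allocator.
        self.account_alloc()
        self.messages.append(text)

    def account_free_buggy(self):
        with self.lock:
            if self.num_allocs == 0:
                # Logging while holding the lock: log() tries to take
                # the same lock again.
                self.log("invalid free")
            else:
                self.num_allocs -= 1

    def account_free_fixed(self):
        with self.lock:
            underflow = self.num_allocs == 0
            if not underflow:
                self.num_allocs -= 1
        if underflow:
            # The lock has been released, so logging can allocate safely.
            self.log("invalid free")
```

Calling account_free_buggy() on a fresh Accounting object raises RuntimeError in this model (the stand-in for the hang), while account_free_fixed() records the message and returns.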

Comment 7 Xavi Hernandez 2019-04-17 15:59:33 UTC

To be clear, the referenced bug is not related to this issue, but I found it when I started debugging the original problem.

Comment 8 Worker Ant 2019-04-17 16:03:22 UTC

REVIEW: https://review.gluster.org/22554 (core: handle memory accounting correctly) posted (#4) for review on master by Xavi Hernandez

Comment 9 Xavi Hernandez 2019-04-17 16:05:12 UTC

I've referenced this bug in my patch. I think Susant's patch should also be added.

Comment 10 Worker Ant 2019-04-22 03:54:57 UTC

REVIEW: https://review.gluster.org/22554 (core: handle memory accounting correctly) merged (#5) on master by Atin Mukherjee

Comment 11 Xavi Hernandez 2019-04-23 11:55:51 UTC

*** Bug 1702268 has been marked as a duplicate of this bug. ***

Comment 12 Amar Tumballi 2019-07-05 07:36:13 UTC

https://review.gluster.org/#/c/glusterfs/+/22600/ merged.