Bug 1659334

Summary: FUSE mount seems to be hung and not accessible
Product: [Community] GlusterFS Reporter: Prasad Desala <tdesala>
Component: fuse Assignee: bugs <bugs>
Status: CLOSED NEXTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: mainline CC: atumball, bugs, guillaume.pavese, jahernan, nbalacha, pasik, tdesala
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-7.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1662838 1698728 Environment:
Last Closed: 2019-07-05 07:36:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1662838, 1698728, 1702270, 1702271    

Description Prasad Desala 2018-12-14 06:05:50 UTC
Description of problem:
=======================
When the glusto automation script [1] is run on the master branch (6dev-0.370.git8293d21.el7.x86_64), the test script hangs at the following glusto log entry:
2018-12-14 11:18:52,556 INFO (run) root.42.139 (cp): getfattr --absolute-names --only-values -n 'trusted.glusterfs.pathinfo' /mnt/testvol_distributed-replicated_glusterfs

The mount point is not accessible, so the test script gets stuck at that point. I also logged into the client machine to check whether the mount could be accessed manually, but the commands (df -h, cd <mountpoint>) hang as well.

[1] https://review.gluster.org/#/c/glusto-tests/+/21826/
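
A hung mount like the one described above can be checked without blocking the calling shell or test script. The following is a minimal sketch, assuming a Linux/POSIX client; probe_mount() is a hypothetical helper and is not part of glusto or GlusterFS. It runs statvfs() in a child process and gives up after a timeout, so the caller returns even if the child stays blocked on the mount.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/statvfs.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: probe a mount point from a child process.
 * Returns  0 if statvfs() succeeded within timeout_ms,
 *          1 if statvfs() failed (for example, the path does not exist),
 *          2 if the child did not finish in time (mount likely hung),
 *         -1 if fork() or waitpid() failed.
 */
static int probe_mount(const char *path, int timeout_ms)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: this call may block indefinitely on a hung mount. */
        struct statvfs st;
        _exit(statvfs(path, &st) == 0 ? 0 : 1);
    }

    /* Parent: poll for the child every 10 ms until the timeout expires. */
    const struct timespec step = { 0, 10 * 1000 * 1000 };
    for (int waited = 0; waited <= timeout_ms; waited += 10) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
        if (r < 0)
            return -1;
        nanosleep(&step, NULL);
    }

    /* Timed out: ask the kernel to kill the child, but never block on it. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, WNOHANG);
    return 2;
}
```

For example, probe_mount("/mnt/testvol_distributed-replicated_glusterfs", 5000) would be expected to return 2 while the mount is hung. A child blocked inside the kernel may linger after the timeout; the sketch only guarantees that the caller is not blocked.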

Version-Release number of selected component (if applicable):
6dev-0.370.git8293d21.el7.x86_64

How reproducible:
always (when running this glusto patch)

Steps to Reproduce:
===================
1) Run the glusto patch (https://review.gluster.org/#/c/glusto-tests/+/21826/) on a master build.

Actual results:
===============
FUSE mount is not accessible.

Expected results:
================
FUSE mount should be accessible without any hangs.

Comment 4 Yaniv Kaul 2019-04-17 07:44:53 UTC
What's the status of this BZ?

Comment 5 Nithya Balachandran 2019-04-17 12:31:29 UTC
(In reply to Yaniv Kaul from comment #4)
> What's the status of this BZ?

Susant (spalai) is looking into the issue downstream. The fix will be posted upstream.

Comment 6 Xavi Hernandez 2019-04-17 15:52:27 UTC
The hang is caused by a log message sent from inside a locked region of a memory allocation function. The logging code also uses dynamic memory, so it re-enters the allocator while the lock is held, causing a deadlock. Susant has already posted a patch [1] to avoid the deadlock.

However, the log message should never be triggered in the first place, because it means that something is not working correctly. I found another case where this also happens [2]. While debugging it, I found an issue in memory accounting management, which I fixed in another patch [3].

[1] https://review.gluster.org/c/glusterfs/+/22589
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1663375
[3] https://review.gluster.org/c/glusterfs/+/22554
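
For illustration, the deadlock pattern described in this comment can be sketched in C as follows. This is a minimal sketch and not GlusterFS code: struct mem_acct, acct_init(), account_unsafe(), account_safe() and log_via_allocator() are hypothetical names. An error-checking mutex is assumed so that re-entry is reported as EDEADLK instead of hanging, which is what a normal mutex would do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical accounting record; not a GlusterFS structure. */
struct mem_acct {
    pthread_mutex_t lock;  /* error-checking: relocking returns EDEADLK */
    size_t in_use;         /* bytes currently accounted */
    size_t limit;          /* threshold that triggers the log message */
    int reentry_errors;    /* times the logger found the lock already held */
    int logged;            /* messages that were actually written */
};

static int acct_init(struct mem_acct *a, size_t limit)
{
    pthread_mutexattr_t attr;
    int ret;

    assert(a != NULL);
    ret = pthread_mutexattr_init(&attr);
    if (ret != 0)
        return ret;
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    ret = pthread_mutex_init(&a->lock, &attr);
    pthread_mutexattr_destroy(&attr);

    a->in_use = 0;
    a->limit = limit;
    a->reentry_errors = 0;
    a->logged = 0;
    return ret;
}

/* Stand-in for a logger whose own allocations need the accounting lock. */
static void log_via_allocator(struct mem_acct *a, const char *msg)
{
    int ret = pthread_mutex_lock(&a->lock);
    if (ret == EDEADLK) {
        /* With a normal mutex the thread would block here forever. */
        a->reentry_errors++;
        return;
    }
    if (ret != 0)
        return;
    a->logged++;
    fprintf(stderr, "%s\n", msg);
    pthread_mutex_unlock(&a->lock);
}

/* Problem pattern: the log call happens while the lock is still held. */
static void account_unsafe(struct mem_acct *a, size_t size)
{
    pthread_mutex_lock(&a->lock);
    a->in_use += size;
    if (a->in_use > a->limit)
        log_via_allocator(a, "accounting: limit exceeded");
    pthread_mutex_unlock(&a->lock);
}

/* One way to avoid it: record the condition under the lock, log after. */
static void account_safe(struct mem_acct *a, size_t size)
{
    bool over;

    pthread_mutex_lock(&a->lock);
    a->in_use += size;
    over = a->in_use > a->limit;
    pthread_mutex_unlock(&a->lock);

    if (over)
        log_via_allocator(a, "accounting: limit exceeded");
}
```

With a limit of 10, calling account_unsafe(&a, 20) records one re-entry error and writes nothing, while a following account_safe(&a, 1) writes the message normally. Deferring the log call is only one possible shape of a fix; the sketch does not claim to reflect what patch [1] actually does.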

Comment 7 Xavi Hernandez 2019-04-17 15:59:33 UTC
To be clear, the referenced bug is not related to this issue, but I found it when I started debugging the original problem.

Comment 8 Worker Ant 2019-04-17 16:03:22 UTC
REVIEW: https://review.gluster.org/22554 (core: handle memory accounting correctly) posted (#4) for review on master by Xavi Hernandez

Comment 9 Xavi Hernandez 2019-04-17 16:05:12 UTC
I've referenced this bug in my patch. I think Susant's patch should also be added.

Comment 10 Worker Ant 2019-04-22 03:54:57 UTC
REVIEW: https://review.gluster.org/22554 (core: handle memory accounting correctly) merged (#5) on master by Atin Mukherjee

Comment 11 Xavi Hernandez 2019-04-23 11:55:51 UTC
*** Bug 1702268 has been marked as a duplicate of this bug. ***

Comment 12 Amar Tumballi 2019-07-05 07:36:13 UTC
https://review.gluster.org/#/c/glusterfs/+/22600/ merged.