Bug 1368441

Summary: NetBSD machines hang on umount
Product: [Community] GlusterFS
Reporter: Nigel Babu <nigelb>
Component: project-infrastructure
Assignee: Nigel Babu <nigelb>
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Version: mainline
CC: bugs, gluster-infra, jdarcy, oleksandr
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2016-08-23 11:28:15 UTC
Type: Bug

Description Nigel Babu 2016-08-19 12:05:12 UTC
Occasionally, the NetBSD machines hang when trying to umount.

The only way to recover is a hard reboot or a `/sbin/reboot -n`. We need to figure out why we're hitting this. In the meantime, we need to detect when it happens, fail sooner, and make sure we're notified.

FAILURE on http://build.gluster.org/job/netbsd7-regression/47/consoleText
nbslave7g.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/42/consoleText
nbslave79.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/39/consoleText
nbslave7c.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/36/consoleText
nbslave7h.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/35/consoleText
nbslave7c.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/32/consoleText
nbslave74.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/30/consoleText
nbslave74.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/29/consoleText
nbslave7h.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/28/consoleText
nbslave71.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/27/consoleText
nbslave7j.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/26/consoleText
nbslave79.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/16/consoleText
nbslave7h.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/19/consoleText
nbslave74.cloud.gluster.org

Comment 1 Jeff Darcy 2016-08-19 12:32:42 UTC
In bug 1359879, forcibly killing a glusterfs (client) process seemed to get things unstuck without a reboot.  Is that not so for these hangs?

Comment 2 Nigel Babu 2016-08-19 12:52:48 UTC
I should try that next time.

Comment 3 Nigel Babu 2016-08-20 09:44:39 UTC
Interesting. We already do a `kill -9` in our test cleanup scripts. I did a spot check of all the NetBSD machines, and a bunch of them had hung umount processes. However, a `pkill gluster` fixed all but one. I'm going to add two things: a `ps ax | grep gluster` at the start of every job, and a `pkill gluster`. I want to see how often processes are left over and how often we end up killing them.
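
The check described above could be sketched roughly as follows. This is only an illustration, not the script that was actually deployed: the function names `list_leftover_gluster` and `pre_job_cleanup` are hypothetical, and running both the listing and the `pkill` together at job start is an assumption.

```shell
#!/bin/sh
# Hypothetical pre-job cleanup step; the function names are illustrative only.

# Read `ps ax`-style output on stdin and print the lines that mention gluster.
# The [g] bracket keeps grep from matching its own command line when this is
# fed from a live `ps ax`. `|| true` avoids a non-zero status when nothing
# matches.
list_leftover_gluster() {
    grep '[g]luster' || true
}

# Log any leftover gluster processes, then kill them.
# Assumption: logging and killing both happen at the start of the job.
pre_job_cleanup() {
    leftovers=$(ps ax | list_leftover_gluster)
    if [ -n "$leftovers" ]; then
        echo "Leftover gluster processes found:"
        echo "$leftovers"
        # pkill returns non-zero if nothing matched; do not fail the job on that.
        pkill gluster || true
    else
        echo "No leftover gluster processes."
    fi
}
```

Calling `pre_job_cleanup` at the top of a job script would leave a record in the console log of how often processes were left over, which is the data the comment above is after.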

Comment 4 Nigel Babu 2016-08-22 08:59:39 UTC
*** Bug 1366168 has been marked as a duplicate of this bug. ***

Comment 5 Nigel Babu 2016-08-23 11:28:15 UTC
The issue of new tests failing has now been fixed with the addition of the `pkill gluster`. I've filed bug 1369401 to track what's causing umount to hang in the first place.

Comment 6 Nigel Babu 2016-08-23 11:31:21 UTC
*** Bug 1359879 has been marked as a duplicate of this bug. ***