Bug 1368441 - NetBSD machines hang on umount
Summary: NetBSD machines hang on umount
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: project-infrastructure
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Nigel Babu
QA Contact:
URL:
Whiteboard:
Duplicates: 1359879 1366168
Depends On:
Blocks:
Reported: 2016-08-19 12:05 UTC by Nigel Babu
Modified: 2016-08-23 12:10 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-08-23 11:28:15 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description Nigel Babu 2016-08-19 12:05:12 UTC
Occasionally, the NetBSD machines hang when trying to umount.

The only way to recover is a hard reboot or a `/sbin/reboot -n`. We need to figure out why we're hitting this. In the meantime, we should work out when it happens, fail sooner, and make sure we're notified.
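One possible way to "fail sooner" is to stop waiting on a command after a deadline instead of letting the job block. The following is only a sketch: `run_with_deadline` is a hypothetical helper, not something from our scripts, and the exit code 124 is just a convention chosen here. Note that a process stuck inside the kernel may not react to `kill -9`, so the helper only guarantees that the caller stops waiting.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of the existing job scripts).
# Usage: run_with_deadline SECONDS COMMAND [ARGS...]
# Returns the command's exit status, or 124 if the deadline passed first.
run_with_deadline() {
    deadline=$1
    shift
    "$@" &
    pid=$!
    elapsed=0
    # Poll once per second while the command is still running.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$deadline" ]; then
            # Best effort only: a process hung in the kernel may ignore this.
            kill -9 "$pid" 2>/dev/null || true
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # The command finished on its own; report its exit status.
    wait "$pid"
}
```

A job could then call something like `run_with_deadline 60 umount /some/mountpoint` (the mount point is a placeholder) and treat a status of 124 as a reason to fail the job and send a notification.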

FAILURE on http://build.gluster.org/job/netbsd7-regression/47/consoleText
nbslave7g.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/42/consoleText
nbslave79.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/39/consoleText
nbslave7c.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/36/consoleText
nbslave7h.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/35/consoleText
nbslave7c.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/32/consoleText
nbslave74.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/30/consoleText
nbslave74.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/29/consoleText
nbslave7h.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/28/consoleText
nbslave71.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/27/consoleText
nbslave7j.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/26/consoleText
nbslave79.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/16/consoleText
nbslave7h.cloud.gluster.org

FAILURE on http://build.gluster.org/job/netbsd7-regression/19/consoleText
nbslave74.cloud.gluster.org

Comment 1 Jeff Darcy 2016-08-19 12:32:42 UTC
In bug 1359879, forcibly killing a glusterfs (client) process seemed to get things unstuck without a reboot.  Is that not so for these hangs?

Comment 2 Nigel Babu 2016-08-19 12:52:48 UTC
I should try that next time.

Comment 3 Nigel Babu 2016-08-20 09:44:39 UTC
Interesting. We already do a `kill -9` in our test cleanup scripts. I did a spot check of all the NetBSD machines, and a bunch of them had hung umount processes, but a `pkill gluster` fixed all but one. I'm going to add two things to the start of every job: a `ps ax | grep gluster` and a `pkill gluster`. I want to see how often processes are left over and how often we end up killing them.
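A rough sketch of what that pre-job step could look like is below. The function name `cleanup_leftovers` and the log message format are assumptions made for illustration; only the `ps ax | grep gluster` and `pkill gluster` commands come from the plan above. `pgrep` is used just to count the matches.

```shell
#!/bin/sh
# Hypothetical pre-job step: report leftover processes matching a name
# pattern, then kill them. The name and message format are illustrative.
cleanup_leftovers() {
    pattern=$1
    # pgrep prints one PID per line; count them (0 when nothing matches).
    leftover=$(pgrep "$pattern" | wc -l | tr -d ' ')
    echo "leftover processes matching '$pattern': $leftover"
    if [ "$leftover" -gt 0 ]; then
        # Record what was still running so it shows up in the console log.
        ps ax | grep "$pattern" | grep -v grep
        # pkill exits non-zero if the processes vanished in the meantime.
        pkill "$pattern" || true
    fi
}
```

Calling `cleanup_leftovers gluster` at the start of a job would put a count into the console output, which makes it possible to grep the logs later and see how often anything was left behind.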

Comment 4 Nigel Babu 2016-08-22 08:59:39 UTC
*** Bug 1366168 has been marked as a duplicate of this bug. ***

Comment 5 Nigel Babu 2016-08-23 11:28:15 UTC
The issue of new tests failing has now been fixed by adding the `pkill gluster`. I've filed bug 1369401 to track what's causing umount to hang in the first place.

Comment 6 Nigel Babu 2016-08-23 11:31:21 UTC
*** Bug 1359879 has been marked as a duplicate of this bug. ***

