Bug 678497

Summary: netfs.sh patch, when network is lost it takes too long to unmount the NFS filesystems
Product: Red Hat Enterprise Linux 6
Reporter: Raul Mahiques <rmahique>
Component: resource-agents
Assignee: Chris Feist <cfeist>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: low
Priority: low
Version: 6.3
CC: cluster-maint, cmarthal, djansa, edamato, lhh
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: resource-agents-3.9.2-6.el6
Doc Type: Bug Fix
Clone Of: 678494
Last Closed: 2011-12-06 12:02:39 UTC
Attachments:
Description    Flags
Fix            none
Fix, pass 2    none

Description Raul Mahiques 2011-02-18 09:22:00 UTC
+++ This bug was initially created as a clone of Bug #678494 +++

Description of problem:
With the current netfs.sh script, when the network connection to an NFS server is lost, the script takes longer than necessary to unmount the filesystem.
Running "umount -f" before "fuser" speeds up the process when no process is holding the mount point.


Version-Release number of selected component (if applicable):


How reproducible:
- Set up two or more NFS netfs resources in the cluster.
- Cut connectivity to the NFS share.


Steps to Reproduce:
1. Set up two or more NFS netfs resources in the cluster.
2. Cut connectivity to the NFS share.

Actual results:
Unmounting the filesystem takes longer than necessary.

Expected results:
The NFS filesystem is unmounted more quickly when no process is holding it.

Additional info:

Comment 2 Lon Hohberger 2011-02-18 17:52:17 UTC
This patch doesn't apply to RHEL6, where we already run umount -f. All we need to do is move the do_force_unmount() call ahead of 'fuser -kvm' in fs-lib.sh.
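
The ordering proposed in this comment can be sketched roughly as follows. This is an illustration only, not the actual fs-lib.sh code: the function name stop_sketch and the UMOUNT and FUSER override variables are hypothetical, introduced here so the ordering can be exercised without a real mount. Only 'umount -f' and 'fuser -kvm' come from this report.

```shell
#!/bin/sh
# Sketch only: stop_sketch, UMOUNT and FUSER are hypothetical names,
# not taken from fs-lib.sh.
UMOUNT=${UMOUNT:-umount}
FUSER=${FUSER:-fuser}

stop_sketch() {
	mp=$1

	# 1. Plain unmount; succeeds right away when nothing holds the mount.
	$UMOUNT "$mp" && return 0

	# 2. Forced unmount is tried next, ahead of fuser.
	$UMOUNT -f "$mp" && return 0

	# 3. Only if that also fails, kill processes using the mount point
	#    and retry the plain unmount.
	$FUSER -kvm "$mp"
	$UMOUNT "$mp"
}
```

With UMOUNT and FUSER left at their defaults, calling `stop_sketch /mnt/tmp` would run the real commands; pointing them at stub functions makes it possible to check the call order without touching any mount.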

Comment 6 Lon Hohberger 2011-08-03 23:26:19 UTC
Test setup:
 * mount on crackle from 192.168.122.201
 * make 192.168.122.201 unavailable (in this case, disabled the service)
 * start 'touch /mnt/tmp/b' in one terminal
 * 'time ./netfst stop' in second terminal

Source of netfst:

[root@crackle ~]# cat netfst
#!/bin/sh

export OCF_RESKEY_name="foo"
export OCF_RESKEY_host="192.168.122.201"
export OCF_RESKEY_export="/mnt/gfs2"
export OCF_RESKEY_mountpoint="/mnt/tmp"
export OCF_RESKEY_force_unmount="1"

/usr/share/cluster/netfs.sh $1

Pre-patch 'stop' of netfst:


[root@crackle ~]# mount | grep /mnt/tmp
192.168.122.201:/mnt/gfs2 on /mnt/tmp type nfs (rw,sync,soft,noac,addr=192.168.122.201)
[root@crackle ~]# time ./netfst stop
<info>   unmounting /mnt/tmp
[netfs.sh] unmounting /mnt/tmp
umount.nfs: /mnt/tmp: device is busy
umount.nfs: /mnt/tmp: device is busy
<debug>  umount failed: 16
[netfs.sh] umount failed: 16
<warning>Sending SIGTERM to processes on /mnt/tmp
[netfs.sh] Sending SIGTERM to processes on /mnt/tmp
Cannot stat /mnt/tmp: Input/output error
Cannot stat /mnt/tmp: Input/output error
Cannot stat /mnt/tmp: Input/output error
<info>   unmounting /mnt/tmp
[netfs.sh] unmounting /mnt/tmp

real    15m23.828s
user    0m0.162s
sys     0m0.495s
[root@crackle ~]# echo $?
0
[root@crackle ~]# mount | grep /mnt/tmp
[root@crackle ~]# 

Post-patch results (test build of resource-agents w/ patch):


[root@crackle ~]# mount | grep /mnt/tmp
192.168.122.201:/mnt/gfs2 on /mnt/tmp type nfs (rw,sync,soft,noac,addr=192.168.122.201)
[root@crackle ~]# rpm -Uvh resource-agents-3.9.2-3.el6.x86_64.rpm
Preparing...                ########################################### [100%]
   1:resource-agents        ########################################### [100%]
[root@crackle ~]# time ./netfst stop
<info>   unmounting /mnt/tmp
[netfs.sh] unmounting /mnt/tmp
umount.nfs: /mnt/tmp: device is busy
umount.nfs: /mnt/tmp: device is busy
<debug>  umount failed: 16
[netfs.sh] umount failed: 16
<warning>Calling 'umount -f /mnt/tmp'
[netfs.sh] Calling 'umount -f /mnt/tmp'
<info>   192.168.122.201:/mnt/gfs2 is not mounted
[netfs.sh] 192.168.122.201:/mnt/gfs2 is not mounted

real    3m23.697s
user    0m0.154s
sys     0m0.305s
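
For comparison, the two 'real' times above (15m23.828s pre-patch, 3m23.697s post-patch) can be converted to seconds and subtracted. A small sketch; to_seconds is a hypothetical helper written for this illustration, and the figures describe this single test run only.

```shell
#!/bin/sh
# to_seconds is a hypothetical helper: converts "15m23.828s" to seconds.
to_seconds() {
	echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

pre=$(to_seconds 15m23.828s)    # pre-patch 'real' time from this comment
post=$(to_seconds 3m23.697s)    # post-patch 'real' time from this comment

# Difference between the two runs, in seconds.
saved=$(awk -v a="$pre" -v b="$post" 'BEGIN { printf "%.3f\n", a - b }')
echo "pre=${pre}s post=${post}s saved=${saved}s"
# prints: pre=923.828s post=203.697s saved=720.131s
```

That is roughly 12 minutes less for the stop operation in this test.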

Comment 7 Lon Hohberger 2011-08-03 23:27:31 UTC
Created attachment 516591 [details]
Fix

Comment 8 Lon Hohberger 2011-08-03 23:29:18 UTC
Comment on attachment 516591 [details]
Fix

Patch was bad; it caused regressions in other agents (fs.sh).

Comment 9 Lon Hohberger 2011-08-08 19:35:59 UTC
Created attachment 517291 [details]
Fix, pass 2

Improved patch which does not have the fs.sh regression.

Comment 10 Lon Hohberger 2011-08-08 19:44:29 UTC
Updated test result:

[root@crackle ~]# touch /mnt/tmp/b &
[1] 9369
[root@crackle ~]# time ./netfst stop
<info>   unmounting /mnt/tmp
[netfs.sh] unmounting /mnt/tmp
umount.nfs: /mnt/tmp: device is busy
umount.nfs: /mnt/tmp: device is busy
<debug>  umount failed: 16
[netfs.sh] umount failed: 16
<warning>Calling 'umount -f /mnt/tmp'
[netfs.sh] Calling 'umount -f /mnt/tmp'
touch: cannot touch `/mnt/tmp/b': Input/output error
<info>   192.168.122.201:/mnt/gfs2 is not mounted
[netfs.sh] 192.168.122.201:/mnt/gfs2 is not mounted
[1]+  Exit 1                  touch /mnt/tmp/b

real    3m23.518s
user    0m0.157s
sys     0m0.269s
[root@crackle ~]# echo $?
0

Regression runs on fs.sh are still in progress, but I think we're good.

Comment 11 Lon Hohberger 2011-08-08 19:52:55 UTC
netfs and fs regression runs passed on 3.9.1-3.el6

Comment 12 Lon Hohberger 2011-08-08 20:04:04 UTC
Patch pushed to upstream master:

https://github.com/ClusterLabs/resource-agents/commit/9af820f580691195378cb5bfd58a0a0cdb03802a

And posted to cluster-devel for inclusion in RHEL6 branch:

https://www.redhat.com/archives/cluster-devel/2011-August/msg00030.html

Comment 13 Lon Hohberger 2011-08-08 20:05:21 UTC
(In reply to comment #11)
> netfs and fs regression runs passed on 3.9.1-3.el6

Correction: that should read 3.9.2-3.el6, not 3.9.1-3.el6. This build was a test build.

Comment 17 errata-xmlrpc 2011-12-06 12:02:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1580.html