Red Hat Bugzilla – Bug 678497
netfs.sh patch, when network is lost it takes too long to unmount the NFS filesystems
Last modified: 2011-12-06 07:02:39 EST
+++ This bug was initially created as a clone of Bug #678494 +++

Description of problem:
With the current netfs.sh script, when the network connection to an NFS server is lost, the script takes longer than necessary to unmount the filesystem. Calling "umount -f" before "fuser" speeds up the process if no process is holding the mountpoint.

Version-Release number of selected component (if applicable):

How reproducible:
- Set up 2 or more NFS netfs resources in the cluster.
- Cut connectivity to the NFS share.

Steps to Reproduce:
1. Set up 2 or more NFS netfs resources in the cluster.
2. Cut connectivity to the NFS share.

Actual results:
It takes longer than necessary to unmount the filesystem.

Expected results:
The NFS filesystem is unmounted more quickly when no process is holding it.

Additional info:
This patch doesn't apply to RHEL6; we already do umount -f; all we need to do is move do_force_unmount() to -before- 'fuser -kvm' in fs-lib.sh
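The reordering described above can be sketched as follows. This is a minimal illustration, not the actual fs-lib.sh code: the function name stop_mount and the UMOUNT/FUSER override variables are hypothetical and exist only so the sequence can be dry-run without touching a real mount.

```shell
#!/bin/sh
# Hypothetical sketch of the stop ordering; NOT the real fs-lib.sh.
# UMOUNT and FUSER are assumed override hooks for dry runs.
: "${UMOUNT:=umount}"
: "${FUSER:=fuser}"

stop_mount() {
    mp=$1

    # 1. Try a plain unmount first.
    $UMOUNT "$mp" && return 0

    # 2. Try the forced unmount BEFORE scanning for processes, so an
    #    unreachable NFS server does not stall the fuser scan.
    $UMOUNT -f "$mp" && return 0

    # 3. Only if that fails, kill processes using the mount and retry.
    $FUSER -kvm "$mp" || true
    $UMOUNT "$mp"
}
```

The point of the ordering is that the forced unmount is attempted before anything has to stat the dead mountpoint.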
Test setup:
* mount on crackle from 192.168.122.201
* make 192.168.122.201 unavailable (in this case, disabled the service)
* start 'touch /mnt/tmp/b' in one terminal
* 'time ./netfst stop' in second terminal

Source of netfst:

[root@crackle ~]# cat netfst
#!/bin/sh
export OCF_RESKEY_name="foo"
export OCF_RESKEY_host="192.168.122.201"
export OCF_RESKEY_export="/mnt/gfs2"
export OCF_RESKEY_mountpoint="/mnt/tmp"
export OCF_RESKEY_force_unmount="1"
/usr/share/cluster/netfs.sh $1

Pre-patch 'stop' of netfst:

[root@crackle ~]# mount | grep /mnt/tmp
192.168.122.201:/mnt/gfs2 on /mnt/tmp type nfs (rw,sync,soft,noac,addr=192.168.122.201)
[root@crackle ~]# time ./netfst stop
<info> unmounting /mnt/tmp
[netfs.sh] unmounting /mnt/tmp
umount.nfs: /mnt/tmp: device is busy
umount.nfs: /mnt/tmp: device is busy
<debug> umount failed: 16
[netfs.sh] umount failed: 16
<warning>Sending SIGTERM to processes on /mnt/tmp
[netfs.sh] Sending SIGTERM to processes on /mnt/tmp
Cannot stat /mnt/tmp: Input/output error
Cannot stat /mnt/tmp: Input/output error
Cannot stat /mnt/tmp: Input/output error
<info> unmounting /mnt/tmp
[netfs.sh] unmounting /mnt/tmp

real 15m23.828s
user 0m0.162s
sys 0m0.495s
[root@crackle ~]# echo $?
0
[root@crackle ~]# mount | grep /mnt/tmp
[root@crackle ~]#

Post-patch results (test build of resource-agents w/ patch):

[root@crackle ~]# mount | grep /mnt/tmp
192.168.122.201:/mnt/gfs2 on /mnt/tmp type nfs (rw,sync,soft,noac,addr=192.168.122.201)
[root@crackle ~]# rpm -Uvh resource-agents-3.9.2-3.el6.x86_64.rpm
Preparing... ########################################### [100%]
1:resource-agents ########################################### [100%]
[root@crackle ~]# time ./netfst stop
<info> unmounting /mnt/tmp
[netfs.sh] unmounting /mnt/tmp
umount.nfs: /mnt/tmp: device is busy
umount.nfs: /mnt/tmp: device is busy
<debug> umount failed: 16
[netfs.sh] umount failed: 16
<warning>Calling 'umount -f /mnt/tmp'
[netfs.sh] Calling 'umount -f /mnt/tmp'
<info> 192.168.122.201:/mnt/gfs2 is not mounted
[netfs.sh] 192.168.122.201:/mnt/gfs2 is not mounted

real 3m23.697s
user 0m0.154s
sys 0m0.305s
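To put the two wall-clock figures side by side, the 'real' values reported by time can be converted to seconds. The helper names to_seconds and time_saved below are hypothetical and only illustrate the arithmetic; they assume the NmS.SSSs format shown in the transcripts.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the 'real' timings above.

# Convert a value such as 15m23.828s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print the difference in seconds between two such values.
time_saved() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.3f\n", a - b }'
}
```

For example, 'time_saved 15m23.828s 3m23.697s' shows that the patched stop finished roughly 12 minutes sooner in this test.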
Created attachment 516591 [details] Fix
Comment on attachment 516591 [details]
Fix

The patch was bad; it caused regressions in other agents (fs.sh).
Created attachment 517291 [details]
Fix, pass 2

Improved patch which does not have the fs.sh regression.
Updated test result:

[root@crackle ~]# touch /mnt/tmp/b &
[1] 9369
[root@crackle ~]# time ./netfst stop
<info> unmounting /mnt/tmp
[netfs.sh] unmounting /mnt/tmp
umount.nfs: /mnt/tmp: device is busy
umount.nfs: /mnt/tmp: device is busy
<debug> umount failed: 16
[netfs.sh] umount failed: 16
<warning>Calling 'umount -f /mnt/tmp'
[netfs.sh] Calling 'umount -f /mnt/tmp'
touch: cannot touch `/mnt/tmp/b': Input/output error
<info> 192.168.122.201:/mnt/gfs2 is not mounted
[netfs.sh] 192.168.122.201:/mnt/gfs2 is not mounted
[1]+ Exit 1 touch /mnt/tmp/b

real 3m23.518s
user 0m0.157s
sys 0m0.269s
[root@crackle ~]# echo $?
0

Running regression runs on fs.sh, but I think we're good.
netfs and fs regression runs passed on 3.9.1-3.el6
Patch pushed to upstream master:
https://github.com/ClusterLabs/resource-agents/commit/9af820f580691195378cb5bfd58a0a0cdb03802a

And posted to cluster-devel for inclusion in RHEL6 branch:
https://www.redhat.com/archives/cluster-devel/2011-August/msg00030.html
(In reply to comment #11)
> netfs and fs regression runs passed on 3.9.1-3.el6

Correction: the version should read 3.9.2-3.el6, not 3.9.1-3.el6. This build was a test build.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2011-1580.html