+++ This bug was initially created as a clone of Bug #208656 +++

Description of problem:
While running a relocation test case (derringer), service relocation from link-14 to link-13 failed because the service could not be stopped on link-14.

Version-Release number of selected component (if applicable):
rgmanager-1.9.53-0

How reproducible:
20%

Steps to Reproduce:
1. Set up a GFS file system as an NFS cluster resource
2. Relocate the service between nodes
3. Wait
4. Go to 2

Actual results:
Here is the /var/log/messages output from link-14, where the service was being stopped. It looks like rgmanager is trying to umount the file system twice.

Sep 29 16:38:53 link-14 clurgmgrd[7302]: <notice> Stopping service nfs_service
Sep 29 16:38:53 link-14 clurgmgrd: [7302]: <info> Removing IPv4 address 10.15.89.200 from eth1
Sep 29 16:39:03 link-14 clurgmgrd: [7302]: <info> Removing export: *:/mnt/gfs1
Sep 29 16:39:03 link-14 clurgmgrd: [7302]: <warning> Dropping node-wide NFS locks
Sep 29 16:39:04 link-14 clurgmgrd: [7302]: <info> Sending reclaim notifications via link-14
Sep 29 16:39:04 link-14 rpc.statd[6447]: Version 1.0.6 Starting
Sep 29 16:39:04 link-14 rpc.statd[6447]: Flags: No-Daemon Notify-Only
Sep 29 16:39:04 link-14 rpc.statd[6447]: statd running as root. chown /tmp/statd-link-14.6400/sm to choose different user
Sep 29 16:39:07 link-14 rpc.statd[6447]: Caught signal 15, un-registering and exiting.
Sep 29 16:39:07 link-14 clurgmgrd: [7302]: <info> unmounting /dev/mapper/link_ia64-link_ia640 (/mnt/gfs1)
Sep 29 16:39:07 link-14 clurgmgrd: [7302]: <notice> Forcefully unmounting /mnt/gfs1
Sep 29 16:39:12 link-14 clurgmgrd: [7302]: <info> unmounting /dev/mapper/link_ia64-link_ia640 (/mnt/gfs1)
Sep 29 16:39:12 link-14 clurgmgrd: [7302]: <notice> Forcefully unmounting /mnt/gfs1
Sep 29 16:39:12 link-14 clurgmgrd: [7302]: <err> 'umount /dev/mapper/link_ia64-link_ia640' failed (/mnt/gfs1), error=0
Sep 29 16:39:12 link-14 clurgmgrd[7302]: <notice> stop on clusterfs "link_ia640" returned 2 (invalid argument(s))
Sep 29 16:39:12 link-14 clurgmgrd: [7302]: <info> Removing export: *:/mnt/ext3
Sep 29 16:39:12 link-14 clurgmgrd: [7302]: <info> unmounting /mnt/ext3
Sep 29 16:39:12 link-14 clurgmgrd[7302]: <crit> #12: RG nfs_service failed to stop; intervention required
Sep 29 16:39:12 link-14 clurgmgrd[7302]: <notice> Service nfs_service is failed
Sep 29 16:39:13 link-14 clurgmgrd[7302]: <alert> #2: Service nfs_service returned failure code. Last Owner: link-14
Sep 29 16:39:13 link-14 clurgmgrd[7302]: <alert> #4: Administrator intervention required.

Expected results:
The file system should umount and the service should relocate to link-13 as expected.

Additional info:

-- Additional comment from lhh on 2006-10-05 16:37 EST --

On a second pass, it looks like killing lockd isn't dropping all the locks ... It could be just a consistency issue between fs.sh and clusterfs.sh; I'll look into that first. If it isn't, chances are good that there's an open reference on the file system, preventing umount from succeeding.

-- Additional comment from lhh on 2006-10-05 16:40 EST --

We're not killing lockd during umount of the cluster file system.

-- Additional comment from lhh on 2006-10-05 16:41 EST --

Created an attachment (id=137860)
Adds killing of lockd to clusterfs teardown when nfslock=1
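Not part of the original report, but a note on the diagnosis above: when a forced umount fails with the file system apparently busy, a small helper like the following can show which userspace processes hold the mount point open. The function name `show_mount_users` is made up for illustration. Crucially, lockd holds its NFS locks inside the kernel, so they will never appear in this output, which is consistent with a umount failing when nothing visible has the file system open.

```shell
# Illustrative diagnostic sketch (not from the report or from
# clusterfs.sh): list userspace processes holding a mount point open.
show_mount_users() {
    mnt="$1"
    if ! mountpoint -q "$mnt" 2>/dev/null; then
        echo "$mnt is not currently mounted"
        return 0
    fi
    # Show processes with open references on the file system.
    # Kernel-held NFS locks (lockd) will NOT appear here, which is
    # why the fs can be "busy" with no visible users.
    fuser -vm "$mnt"
}
```

Usage would be, for the mount point in the log above: `show_mount_users /mnt/gfs1`.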
Devel ACK for 5.0.0.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering. This request is not yet committed for inclusion in a release.
clusterfs.sh from the RHEL4 branch has been copied over, which addresses the problem.
Moving all RHCS v5 bugs to RHEL 5 so we can remove the RHCS v5 product, which never existed.