From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8a5) Gecko/20041122

Description of problem:
Failing over a cluster service from one member to another with redhat-config-cluster fails. The cluster service includes a shared disk partition. Attached is a portion of the messages file from the member currently running the service; it shows the umount problem. The problem started after applying updates to the cluster members.

Version-Release number of selected component (if applicable):
clumanager-1.2.28-1

How reproducible:
Always

Steps to Reproduce:
1. Use redhat-config-cluster to move the service from one member to another.

Actual Results:
See attached log.

Expected Results:
The service should have started on the other cluster member.
Created attachment 122454 [details] portion of messages file
Please file a request with Red Hat Support as well: http://www.redhat.com/apps/support/

They will ask for a full sysreport, which will be an invaluable aid in diagnosing the problem. Chances are good that this is related to a recent kernel or nfs-utils update rather than a clumanager update, given that the file system mounting/unmounting code has not changed in quite some time.

This information will also be helpful:
- output of 'lsof -bn' when the problem occurs
- output of 'fuser -vm /home2' when the problem occurs

If you do not have a support contract, please add the following to this bugzilla:
- /etc/cluster.xml
- /var/lib/nfs/rmtab
- /proc/fs/nfs/exports
- output of 'rpm -qa | grep kernel'
- output of 'rpm -q nfs-utils'
- output of 'rpm -q clumanager'
- if possible, the last versions of clumanager and the kernel that you were running before the update

Thank you!
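For convenience, the items listed above could be gathered in one pass with something like the following. This is only a rough sketch: the function names (capture, grab, collect_diagnostics) and the output file names are made up for illustration and are not part of clumanager or any Red Hat tool. The lsof and fuser steps only tell us something if they run while the umount is failing.

```shell
#!/bin/sh
# Sketch only: collect the diagnostics requested in this comment.
# capture, grab and collect_diagnostics are hypothetical helper names.

# Run a command and save stdout+stderr; note it if the tool is not installed.
capture() {
    out="$1"; shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$out" 2>&1 || true
    else
        echo "$1: command not found" >"$out"
    fi
}

# Copy a file if it is readable; otherwise record that it was not.
grab() {
    if [ -r "$1" ]; then
        cat "$1" >"$2"
    else
        echo "$1: not readable" >"$2"
    fi
}

collect_diagnostics() {
    dir="$1"   # output directory (created if missing)
    mnt="$2"   # mount point of the shared partition, e.g. /home2
    mkdir -p "$dir" || return 1

    # Open files and processes holding the mount point.
    capture "$dir/lsof.txt"  lsof -bn
    capture "$dir/fuser.txt" fuser -vm "$mnt"

    # Package versions (equivalent to 'rpm -qa | grep kernel' etc.).
    capture "$dir/rpm-all.txt" rpm -qa
    grep kernel "$dir/rpm-all.txt" >"$dir/rpm-kernel.txt" || true
    capture "$dir/rpm-nfs-utils.txt"  rpm -q nfs-utils
    capture "$dir/rpm-clumanager.txt" rpm -q clumanager

    # Cluster and NFS state files.
    grab /etc/cluster.xml     "$dir/cluster.xml"
    grab /var/lib/nfs/rmtab   "$dir/rmtab"
    grab /proc/fs/nfs/exports "$dir/exports"
}
```

Example use on the member that currently owns the service: `collect_diagnostics /tmp/cluster-diag /home2`, then attach the files from /tmp/cluster-diag here.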
Created attachment 122462 [details] cluster.xml
Created attachment 122463 [details] rmtab
Created attachment 122464 [details] exports
output of 'rpm -qa | grep kernel' is:
kernel-2.4.21-27.0.2.EL
kernel-2.4.21-32.0.1.EL
kernel-smp-2.4.21-27.0.4.EL
kernel-source-2.4.21-32.0.1.EL
kernel-2.4.21-4.0.2.EL
kernel-smp-2.4.21-27.0.2.EL
kernel-smp-2.4.21-32.0.1.EL
kernel-pcmcia-cs-3.1.31-13
kernel-2.4.21-27.0.4.EL
kernel-utils-2.4-8.37.12

output of 'rpm -q nfs-utils' is:
nfs-utils-1.0.6-42EL

output of 'rpm -q clumanager' is:
clumanager-1.2.28-1

I don't have the last version of clumanager. The kernel was actually not updated at this time. The current kernel is (uname -a):
Linux email1.norco.com 2.4.21-32.0.1.ELsmp #1 SMP Tue May 17 17:52:23 EDT 2005 i686 i686 i386 GNU/Linux

I got the output of 'lsof -bn' and 'fuser -vm /home2' by doing a 'disable' of the email service through redhat-config-cluster. (Disabling the service seems to produce the same result as moving it to another cluster member.)
Created attachment 122465 [details] lsof -bn
'fuser -vm /home2' produces no output
Yesterday morning we started testing a cluster and ran into the same problem. NFS does not let go of the mounted device once a client has mounted it. If no clients mount it, then things "work". As Norco stated above, once nfs is killed or stopped, the device can be unmounted normally.
There is a known problem with the -42 release of nfs-utils which causes exportfs to not correctly remove exports; a fix should be coming out shortly. As an alternative, you can use an older release of the nfs-utils package to work around the problem.
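To tell quickly whether a member is running the affected package, a check along these lines may help. It is a sketch: nfs_utils_affected is a made-up helper name, and it matches only the -42EL release named in this comment, so it says nothing about any other release.

```shell
#!/bin/sh
# Sketch only: nfs_utils_affected is a hypothetical helper.
# Succeeds (exit 0) if the given 'rpm -q nfs-utils' string is the -42EL
# release reported in this bug; fails (exit 1) for anything else.
nfs_utils_affected() {
    case "$1" in
        nfs-utils-1.0.6-42EL*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Example: `nfs_utils_affected "$(rpm -q nfs-utils)" && echo "affected nfs-utils release installed"`.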
I went back to nfs-utils-1.0.6-33EL. That fixed the problem for now. Thanks.
Great -- closing as NOTABUG for now (it's not a clumanager bug). This should be fixed when nfs-utils-1.0.6-43EL goes out after testing.