Description of problem:
My nfs share I've setup on RHEL6 Cluster Suite does in fact work... however, it keeps "failing":

Jan 18 15:49:50 cdtg-rtp-sun-1 rgmanager[6559]: Recovering failed service service:CDTG-NFS-share
Jan 18 15:49:51 cdtg-rtp-sun-1 rgmanager[23610]: Adding export: xxx.18.188.0/25:/data/cluster-storage/ (fsid=40050,rw)
Jan 18 15:49:51 cdtg-rtp-sun-1 rgmanager[23683]: Adding IPv4 address xxx.18.188.202/25 to bond0
Jan 18 15:49:54 cdtg-rtp-sun-1 ntpd[6419]: Listening on interface #1969 bond0, xxx.18.188.202#123 Enabled
Jan 18 15:49:54 cdtg-rtp-sun-1 rgmanager[6559]: Service service:CDTG-NFS-share started

From another machine, I can see that the NFS service is up:

#showmount -e xxx.18.188.202
Export list for xxx.18.188.202:
/data/cluster-storage xxx.18.188.0/25

... then about a minute later...

Jan 18 15:51:00 cdtg-rtp-sun-1 rgmanager[24355]: nfsclient:CDTG-NFS-Service is missing!
Jan 18 15:51:00 cdtg-rtp-sun-1 rgmanager[6559]: status on nfsclient "CDTG-NFS-Service" returned 1 (generic error)
Jan 18 15:51:00 cdtg-rtp-sun-1 rgmanager[24412]: Removing export: xxx.18.188.0/25:/data/cluster-storage/
Jan 18 15:51:00 cdtg-rtp-sun-1 rgmanager[24447]: Adding export: xxx.18.188.0/25:/data/cluster-storage/ (fsid=40050,rw)
Jan 18 15:51:10 cdtg-rtp-sun-1 rgmanager[6559]: Stopping service service:CDTG-NFS-share
Jan 18 15:51:10 cdtg-rtp-sun-1 rgmanager[24658]: Removing IPv4 address xxx.18.188.202/25 from bond0
Jan 18 15:51:12 cdtg-rtp-sun-1 ntpd[6419]: Deleting interface #1969 bond0, xxx.18.188.202#123, interface stats: received=0, sent=0, dropped=0, active_time=78 secs
Jan 18 15:51:20 cdtg-rtp-sun-1 rgmanager[24720]: Removing export: xxx.18.188.0/25:/data/cluster-storage/
Jan 18 15:51:20 cdtg-rtp-sun-1 rgmanager[6559]: Service service:CDTG-NFS-share is recovering
Jan 18 15:51:24 cdtg-rtp-sun-1 rgmanager[6559]: Service service:CDTG-NFS-share is now running on member 2

... why is this? The export works, but is continuously failed-over by rgmanager on both nodes! :-(

Version-Release number of selected component (if applicable):
rgmanager-3.0.12-10.el6.x86_64
Created attachment 474147 [details] sanitized cluster.conf config file
Not sure if this has any bearing on the issue, but I found this similar complaint: http://www.redhat.com/archives/linux-cluster/2010-March/msg00019.html
The xxx.18.188.202 IP address didn't have a hostname before, and as such "clufindhostname -i xxx.18.188.202" would fail with error code 2. I've added a reverse lookup entry for it, and clufindhostname is much happier now and exits with error code 0. However, this doesn't stop the otherwise working service from being failed over.
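For anyone wanting to confirm the reverse lookup entry independently of the cluster tools, here is a minimal Python sketch. It is not how clufindhostname is implemented; it only asks the system resolver whether an address maps back to a name. The function name `has_reverse_entry` and the injectable `resolver` parameter are made up for illustration.

```python
import socket


def has_reverse_entry(ip, resolver=socket.gethostbyaddr):
    """Return True if a reverse lookup for `ip` yields a hostname.

    `resolver` defaults to the system resolver; it is a parameter only
    so the check can be exercised without touching DNS.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError
        return False
    return bool(hostname)


if __name__ == "__main__":
    import sys

    # Usage: python check_reverse.py <ip> [<ip> ...]
    for ip in sys.argv[1:]:
        state = "resolves" if has_reverse_entry(ip) else "has no reverse entry"
        print(ip, state)
```

Run it on each cluster node with the service IP as the argument; both nodes should agree on the result.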
Ah ha - *** This bug has been marked as a duplicate of bug 661881 ***
clufindhostname is not working right; also, remove the trailing slash from your mount point name in your fs resource.
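A quick way to spot the trailing-slash case is to scan cluster.conf for fs resources whose mountpoint attribute ends in "/". The Python sketch below assumes fs resources appear as `<fs ... mountpoint="..."/>` elements; the sample XML is a made-up fragment (the resource name and device are hypothetical), not the attached config.

```python
import xml.etree.ElementTree as ET


def fs_with_trailing_slash(conf_xml):
    """Return (name, mountpoint, suggested) for each fs resource whose
    mountpoint ends in a slash. A bare "/" is left alone."""
    root = ET.fromstring(conf_xml)
    hits = []
    for fs in root.iter("fs"):
        mountpoint = fs.get("mountpoint", "")
        if len(mountpoint) > 1 and mountpoint.endswith("/"):
            hits.append((fs.get("name"), mountpoint, mountpoint.rstrip("/")))
    return hits


# Hypothetical fragment for illustration only.
SAMPLE = """
<cluster name="example" config_version="1">
  <rm>
    <resources>
      <fs name="cluster-storage" device="/dev/example"
          mountpoint="/data/cluster-storage/" fstype="ext4"/>
      <fs name="other" device="/dev/example2"
          mountpoint="/data/other" fstype="ext4"/>
    </resources>
  </rm>
</cluster>
"""

if __name__ == "__main__":
    for name, current, suggested in fs_with_trailing_slash(SAMPLE):
        print(f"fs {name}: mountpoint {current!r} should be {suggested!r}")
```

To check a real config, read /etc/cluster/cluster.conf into a string and pass it to the function instead of SAMPLE.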