Bug 670630

Summary: NFS share is working fine, but keeps failing-over
Product: Red Hat Enterprise Linux 6
Reporter: joshua
Component: resource-agents
Assignee: Marek Grac <mgrac>
Status: CLOSED DUPLICATE
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Docs Contact:
Priority: low
Version: 6.0
CC: cluster-maint, djansa, lhh
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2011-03-18 15:10:32 UTC
Attachments:
sanitized cluster.conf config file (flags: none)

Description joshua 2011-01-18 20:59:31 UTC
Description of problem:

The NFS share I've set up on RHEL6 Cluster Suite does in fact work; however, it keeps "failing":

Jan 18 15:49:50 cdtg-rtp-sun-1 rgmanager[6559]: Recovering failed service service:CDTG-NFS-share
Jan 18 15:49:51 cdtg-rtp-sun-1 rgmanager[23610]: Adding export: xxx.18.188.0/25:/data/cluster-storage/ (fsid=40050,rw)
Jan 18 15:49:51 cdtg-rtp-sun-1 rgmanager[23683]: Adding IPv4 address xxx.18.188.202/25 to bond0
Jan 18 15:49:54 cdtg-rtp-sun-1 ntpd[6419]: Listening on interface #1969 bond0, xxx.18.188.202#123 Enabled
Jan 18 15:49:54 cdtg-rtp-sun-1 rgmanager[6559]: Service service:CDTG-NFS-share started

From another machine, I can see that the NFS service is up:
# showmount -e xxx.18.188.202
Export list for xxx.18.188.202:
/data/cluster-storage xxx.18.188.0/25

... then about a minute later...

Jan 18 15:51:00 cdtg-rtp-sun-1 rgmanager[24355]: nfsclient:CDTG-NFS-Service is missing!
Jan 18 15:51:00 cdtg-rtp-sun-1 rgmanager[6559]: status on nfsclient "CDTG-NFS-Service" returned 1 (generic error)
Jan 18 15:51:00 cdtg-rtp-sun-1 rgmanager[24412]: Removing export: xxx.18.188.0/25:/data/cluster-storage/
Jan 18 15:51:00 cdtg-rtp-sun-1 rgmanager[24447]: Adding export: xxx.18.188.0/25:/data/cluster-storage/ (fsid=40050,rw)
Jan 18 15:51:10 cdtg-rtp-sun-1 rgmanager[6559]: Stopping service service:CDTG-NFS-share
Jan 18 15:51:10 cdtg-rtp-sun-1 rgmanager[24658]: Removing IPv4 address xxx.18.188.202/25 from bond0
Jan 18 15:51:12 cdtg-rtp-sun-1 ntpd[6419]: Deleting interface #1969 bond0, xxx.18.188.202#123, interface stats: received=0, sent=0, dropped=0, active_time=78 secs
Jan 18 15:51:20 cdtg-rtp-sun-1 rgmanager[24720]: Removing export: xxx.18.188.0/25:/data/cluster-storage/
Jan 18 15:51:20 cdtg-rtp-sun-1 rgmanager[6559]: Service service:CDTG-NFS-share is recovering
Jan 18 15:51:24 cdtg-rtp-sun-1 rgmanager[6559]: Service service:CDTG-NFS-share is now running on member 2

... why is this?  The export works, but is continuously failed-over by rgmanager on both nodes! :-(
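
To put a number on "about a minute later", the interval between the service start and the first failed status check can be computed from the log lines above. This is only a small sketch: the helper names are made up for illustration, and the year 2011 is assumed from the report date because syslog timestamps omit it.

```python
from datetime import datetime

# rgmanager lines copied verbatim from the log excerpts above.
LOG = """\
Jan 18 15:49:54 cdtg-rtp-sun-1 rgmanager[6559]: Service service:CDTG-NFS-share started
Jan 18 15:51:00 cdtg-rtp-sun-1 rgmanager[24355]: nfsclient:CDTG-NFS-Service is missing!
Jan 18 15:51:10 cdtg-rtp-sun-1 rgmanager[6559]: Stopping service service:CDTG-NFS-share
"""


def parse_stamp(line, year=2011):
    """Parse the leading syslog timestamp (first 15 characters) of a line.

    The year is an assumption taken from the report date; syslog omits it.
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def seconds_between(log, first_marker, second_marker):
    """Seconds from the first line containing first_marker to the next
    later line containing second_marker, or None if either is absent."""
    start = None
    for line in log.splitlines():
        if start is None and first_marker in line:
            start = parse_stamp(line)
        elif start is not None and second_marker in line:
            return (parse_stamp(line) - start).total_seconds()
    return None


print(seconds_between(LOG, "started", "is missing!"))           # 66.0
print(seconds_between(LOG, "is missing!", "Stopping service"))  # 10.0
```

So the service ran for 66 seconds before the nfsclient status check reported it missing, and rgmanager began stopping it 10 seconds after that.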


Version-Release number of selected component (if applicable):

rgmanager-3.0.12-10.el6.x86_64

Comment 2 joshua 2011-01-18 21:05:38 UTC
Created attachment 474147
sanitized cluster.conf config file

Comment 3 joshua 2011-01-18 22:45:41 UTC
Not sure if this has any bearing on the issue, but I found this similar complaint:

http://www.redhat.com/archives/linux-cluster/2010-March/msg00019.html

Comment 4 joshua 2011-01-18 22:51:05 UTC
The xxx.18.188.202 IP address didn't have a hostname before, and as such "clufindhostname -i xxx.18.188.202" would fail with error code 2.  I've added a reverse lookup entry for it, and clufindhostname is much happier now and exits with error code 0.  However, this doesn't stop an otherwise working service from being failed over.

Comment 5 Lon Hohberger 2011-03-18 15:10:32 UTC
Ah ha -

*** This bug has been marked as a duplicate of bug 661881 ***

Comment 6 Lon Hohberger 2011-03-18 15:11:34 UTC
clufindhostname is not working right; also, remove the trailing slash from your mount point name in your fs resource.
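
The trailing slash is visible in the logs above: rgmanager adds the export as /data/cluster-storage/ while showmount lists /data/cluster-storage. A small sketch of why that can matter for any exact string match, and of the normalization being suggested, is below. Whether the status check performs this kind of comparison is an assumption made for illustration (bug 661881 has the actual diagnosis), and normalize_mountpoint is a hypothetical helper, not part of the resource agent.

```python
def normalize_mountpoint(path):
    """Strip trailing slashes from a mount point, keeping a bare "/" intact."""
    stripped = path.rstrip("/")
    return stripped or "/"


configured = "/data/cluster-storage/"  # as logged by rgmanager in "Adding export"
exported = "/data/cluster-storage"     # as listed by showmount -e

# An exact string match treats the two spellings as different paths.
print(configured == exported)                        # False
# After normalization they compare equal.
print(normalize_mountpoint(configured) == exported)  # True
```

In cluster.conf the equivalent fix is simply to write the mount point without the trailing slash.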