Bug 674508 - rgmanager does not detect when a remote NFS share resource is not available
Summary: rgmanager does not detect when a remote NFS share resource is not available
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: rgmanager
Version: 5.8
Hardware: All
OS: All
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-02-02 08:50 UTC by Pierre Amadio
Modified: 2018-11-14 16:07 UTC
CC: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-02-02 16:16:45 UTC
Target Upstream Version:



Description Pierre Amadio 2011-02-02 08:50:22 UTC
* Description of problem:

When using netfs in a cluster to mount an NFS export from a third server, if the NFS share gets disconnected or the NFS server crashes, rgmanager does not notice that the export is no longer available: the mount point remains mounted, and the "status" check only verifies that it is listed in the output of the "mount" command.
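
For illustration, a minimal sketch of the kind of check described (not the actual netfs agent code), using the /data/nfstest mount point from the configuration below:

    # Passes as long as the mount point appears in the mount table,
    # even after the NFS server has gone away:
    if mount | grep -q " /data/nfstest "; then
        echo "status OK"      # still "OK" although any real I/O would hang
    else
        echo "status FAIL"
    fi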


* How reproducible:

Two-node cluster, mounting an NFS share as a service from a third server.
All servers are virtualized and updated to 5.6.

<?xml version="1.0"?>
<cluster config_version="5" name="rhcs5">
	<fence_daemon post_fail_delay="0" post_join_delay="3"/>
	<clusternodes>
		<clusternode name="jc_mycluster1" nodeid="1" votes="1">
			<fence/>
		</clusternode>
		<clusternode name="jc_mycluster2" nodeid="2" votes="1">
			<fence/>
		</clusternode>
	</clusternodes>
	<cman expected_votes="1" two_node="1"/>
	<fencedevices/>
	<rm>
		<failoverdomains/>
		<resources>
			<netfs export="/nfstest" force_unmount="1" fstype="nfs" host="192.168.122.129" mountpoint="/data/nfstest" name="nfstest_data" options="rw,sync"/>
		</resources>
		<service autostart="1" exclusive="0" name="nfs_client_test" recovery="relocate">
			<netfs ref="nfstest_data"/>
		</service>
	</rm>
</cluster>


1. Start the service that contains the netfs resource.
2. Disconnect the NFS server that exports the mount point from the cluster nodes, either using iptables, taking its interface down with ifconfig, or switching it off (one way is shown below).
3. Check the status of the service on the nodes with the 'clustat' command.
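
For example, from one of the cluster nodes (assuming the NFS server address 192.168.122.129 from the configuration above):

    # Simulate the NFS server disappearing:
    iptables -A OUTPUT -d 192.168.122.129 -j DROP

    # The service still reports "started" despite the outage:
    clustat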
  
Actual results:
The 'clustat' command will show the service with "State" "started", and no failover will occur because the "status/monitor" check will not return an error.
Any command trying to access the mount point will hang.

Expected results:
When the NFS server gets disconnected from the nodes, rgmanager should be able to detect that the export is inaccessible and stop the service.

Comment 1 Lon Hohberger 2011-02-02 16:16:45 UTC
NFS by default never returns I/O errors to userspace applications; this means that anything accessing the mount point (including rgmanager) will hang.  As such, you have two options:

  Add these options to the netfs *resource definition*:

    options="soft,tcp"       

  Add this directive to the netfs *reference*:

    __enforce_timeouts="1"

You can tune the soft I/O timeout using the retrans=x and timeo=x NFS options (see nfs(5)).  The rgmanager __enforce_timeouts directive is not likely to actually resolve the issue; it will cause the service to go into recovery, but the mount point may fail to unmount when rgmanager brings the service down.
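
Put together, a sketch of how those two changes would look in the cluster.conf from the description above (the retrans and timeo values are illustrative placeholders, not recommended settings):

    <resources>
        <!-- soft,tcp make the NFS client return I/O errors instead of
             hanging; retrans/timeo values are placeholders, see nfs(5) -->
        <netfs export="/nfstest" force_unmount="1" fstype="nfs"
               host="192.168.122.129" mountpoint="/data/nfstest"
               name="nfstest_data" options="rw,sync,soft,tcp,retrans=3,timeo=30"/>
    </resources>
    <service autostart="1" exclusive="0" name="nfs_client_test" recovery="relocate">
        <!-- make rgmanager enforce operation timeouts on this resource -->
        <netfs ref="nfstest_data" __enforce_timeouts="1"/>
    </service>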

