Description of problem: I have two RH AS 2.1 systems clustered, with two
NFS services built. When the clients mount the services and I do a
cluster stop on the prime server, the NFS services fail over correctly
to the second server and the mount points remain intact.
When I then do a cluster start on the prime, so that the services move
back automatically, they do not both come back correctly: only one of
the two returns to the prime. The clients of the other service report
NFS timeouts and hang.
The messages in the cluster.log:
clusvcmgrd: <err> service error Cannot stop filesystems /filesystemx
<crit> Not all services stopped cleanly, reboot needed.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. cluster stop on system1; services fail over to system2. Verify.
2. cluster start on system1; run df to check that the services are back.
One of the two services will not return.
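The check in step 2 can be scripted. A minimal sketch, assuming
/filesystemx (the path from the log below) and a hypothetical
/filesystemy as the two service mount points; substitute your real
paths:

```shell
#!/bin/sh
# Verify that both clustered NFS mount points are visible after failback.
# Note: df will block on a hung NFS mount, so capture its output once.

DF_OUT=$(df -P 2>/dev/null)   # -P: one POSIX-format line per filesystem

check_mount() {
    # Succeed if the last field (mount point) of any df line matches $1.
    echo "$DF_OUT" | awk -v mp="$1" '$NF == mp { found = 1 } END { exit !found }'
}

for mp in /filesystemx /filesystemy; do
    if check_mount "$mp"; then
        echo "$mp: mounted"
    else
        echo "$mp: MISSING"
    fi
done
```

A mount point that prints MISSING (or a df that never returns) matches
the hang the clients report.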
Expected results: We would expect both services to return to system1
upon restarting the cluster daemons. The services are configured to
fail back to system1 when that system returns to normal.
If you have not done so already, it would be beneficial to open a
support ticket so that the appropriate information can be obtained to
debug/solve this issue.
BTW - you can also try enabling "force unmount" support. You'll need
to upload your cluster.conf as well.
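The "Cannot stop filesystems" error in the log usually means umount
failed because processes still had files open on the service
filesystem; "force unmount" kills those holders so the service can
stop. You can inspect them by hand with fuser. A sketch, using
/filesystemx from the log above:

```shell
#!/bin/sh
# List processes holding the service filesystem open -- these are what
# makes "Cannot stop filesystems" fail, and what "force unmount" kills.
fsdir=/filesystemx
if [ -d "$fsdir" ]; then
    fuser -vm "$fsdir"    # -m: report all processes using the mounted fs
else
    echo "$fsdir not present on this node"
fi
```

Running this on the failing node just before the failback should show
which clients or daemons are pinning the mount.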
Not enough data to debug; closing.