Description of problem:
When a typical NFS service is relocated, the old node still maintains
established TCP/IP connections under the service IP (which has been removed).
When that node gets the service and service IP relocated back, it will try to
resume those connections. This can lead to DUP/ACK storms (to be filed as a
different bug; bug ID will follow).
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Mount a clustered NFS export.
2. Relocate that export.
3. Check with netstat on the old node that there are still established TCP/IP
connections to the client, even though the service IP has been deleted.
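The check in step 3 can be scripted. The following is a minimal sketch, not part of the cluster tooling: it assumes the common net-tools `netstat -tn` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and IPv4 addresses. The function name `established_on_ip`, the sample text, and the address 10.0.0.100 standing in for the service IP are all hypothetical.

```python
def established_on_ip(netstat_output, service_ip):
    """Return (local, remote) pairs for ESTABLISHED TCP connections
    whose local address is service_ip, given `netstat -tn` text."""
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: tcp 0 0 LOCAL:PORT REMOTE:PORT STATE
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        # Strip the port; assumes IPv4 "address:port" formatting.
        local_ip = local.rsplit(":", 1)[0]
        if state == "ESTABLISHED" and local_ip == service_ip:
            matches.append((local, remote))
    return matches


# Hypothetical sample output; 10.0.0.100 stands in for the service IP.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.100:2049         10.0.0.20:800           ESTABLISHED
tcp        0      0 10.0.0.1:22             10.0.0.20:51514         ESTABLISHED
tcp        0      0 10.0.0.100:2049         10.0.0.21:801           TIME_WAIT
"""

if __name__ == "__main__":
    for local, remote in established_on_ip(SAMPLE, "10.0.0.100"):
        print(local, "->", remote)
```

On the old node, the same function could be fed the live output of `netstat -tn` (for example via `subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout`). Any pair it returns after the relocation is a stale connection of the kind described here.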
netstat shows established TCP/IP connections on the old node after the relocation.
ip.sh should abort any established TCP/IP connections upon stopping (is that
Shutting down or restarting nfs on the old node after (or during) the relocation
is a workaround to get the old TCP/IP connections defused.
The DUP/ACK storm bug triggered by this one is filed as bug #167572
I've almost got a workaround ready which will stop/start the entire NFS server,
while maintaining the ability to define clients using the cluster GUI.
The caveat is that it won't be useful if you want to run more than one NFS
server in the cluster (assuming two nodes), and it will obviously clobber
non-cluster NFS services when nfsd is brought down.
This may solve your problem. However, I still have a bit more work to do to
make locking work properly (currently, reclaims are not correctly handled).
Here's a drop-in agent which you can use instead of <nfsexport> which should
work (sans locking for now).
Created attachment 119144
Simple nfs server RA (no UI support for it though).
Assigning to NFS maintainer, but staying on CC list for now.
Axel, our NFS maintainer fixed a reference-count bug that caused problems
during failover, and the fix was released with the most recent kernel errata.
While this was not the only problem with NFS failover at the moment, the fix in
the latest errata may also fix the TCP connection / dup ACK problems.
Can you please reproduce the problem and let us know if it still happens?
(This bug and https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=167572 )