Description of problem:
When a typical NFS service is relocated, the old node still maintains established TCP/IP connections on the service IP, even though that IP has been removed. When the service and its service IP are later relocated back to that node, the node tries to resume those connections. This can lead to DUP/ACK storms (to be filed as a separate bug; bug ID will follow).

Version-Release number of selected component (if applicable):
1.9.34

How reproducible:
Always

Steps to Reproduce:
1. Mount a clustered NFS export.
2. Relocate that export.
3. On the old node, use netstat to check that there are still established TCP/IP connections to the client, even though the service IP has been deleted.

Actual results:
netstat shows established TCP/IP connections on the old node after the relocation.

Expected results:
ip.sh should abort any established TCP/IP connections when it stops (is that possible?).

Additional info:
As a workaround, shutting down or restarting NFS on the old node after (or during) the relocation clears the old TCP/IP connections.
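The check in step 3 can also be scripted. The following is a minimal Python sketch, not part of any cluster tooling, that reads the kernel's IPv4 TCP table (the text of /proc/net/tcp) and lists ESTABLISHED sockets (state code 01) whose local address is the removed service IP. The function names and the example address 192.168.1.50 are hypothetical; it assumes the standard /proc/net/tcp layout, in which addresses are printed as hex integers in host byte order.

```python
import socket
import struct

# State code that /proc/net/tcp uses for ESTABLISHED sockets.
TCP_ESTABLISHED = "01"


def decode_addr(field: str) -> tuple[str, int]:
    """Decode an 'AAAAAAAA:PPPP' IPv4 address field from /proc/net/tcp.

    The kernel prints the 32-bit address as a hex integer in host byte
    order, so packing it with native byte order ("=I") recovers the
    network-order bytes. The port is already printed in host order.
    """
    hex_ip, hex_port = field.split(":")
    ip = socket.inet_ntoa(struct.pack("=I", int(hex_ip, 16)))
    return ip, int(hex_port, 16)


def established_on(proc_tcp_text: str, service_ip: str) -> list[tuple[str, int, str, int]]:
    """Return (local_ip, local_port, remote_ip, remote_port) for every
    ESTABLISHED IPv4 TCP socket whose local address equals service_ip."""
    matches = []
    for line in proc_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # fields[1] = local address, fields[2] = remote address, fields[3] = state
        if len(fields) < 4 or fields[3] != TCP_ESTABLISHED:
            continue
        local_ip, local_port = decode_addr(fields[1])
        if local_ip != service_ip:
            continue
        remote_ip, remote_port = decode_addr(fields[2])
        matches.append((local_ip, local_port, remote_ip, remote_port))
    return matches
```

On the old node, after relocating the service, one could call `established_on(open("/proc/net/tcp").read(), "<service IP>")`; a non-empty result corresponds to the stale connections that netstat shows in step 3. IPv6 sockets are listed in /proc/net/tcp6 and are not handled by this sketch.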
The DUP/ACK storm bug triggered by this one has been filed as bug #167572.
I almost have a workaround ready that stops and starts the entire NFS server while preserving the ability to define clients in the cluster GUI. There are two caveats: it won't help if you want to run more than one NFS server in the cluster (assuming two nodes), and it will clobber any non-cluster NFS services whenever nfsd is brought down. This may solve your problem. However, I still have some work to do to make locking work properly (lock reclaims are currently not handled correctly). Here's a drop-in agent you can use instead of <nfsexport>; it should work, apart from locking for now.
Created attachment 119144: Simple nfs server RA (no UI support for it though).
Assigning to NFS maintainer, but staying on CC list for now.
Axel, our NFS maintainer fixed a reference count problem that caused problems during failover, and the fix was released in the most recent kernel errata. This was not the only NFS failover problem, but the fix in the latest errata may also resolve the TCP connection and duplicate ACK problems. Can you please try to reproduce the problem and let us know whether it still happens, both for this bug and for https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=167572 ?