Bug 167571

Summary: Relocating NFS over TCP exports leaves established TCP/IP connections behind
Product: Red Hat Enterprise Linux 4
Reporter: Axel Thimm <axel.thimm>
Component: kernel
Assignee: Steve Dickson <steved>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 4.0
CC: hgarcia, lhh
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2006-10-17 14:42:37 UTC
Bug Blocks: 132823, 167572
Attachments:
  Simple nfs server RA (no UI support for it, though).

Description Axel Thimm 2005-09-05 18:15:15 UTC
Description of problem:
When a typical NFS service is relocated, the old node still maintains
established TCP connections under the service IP (which has since been removed).

When the service and service IP are later relocated back to that node, it will
try to resume those stale connections. This can lead to DUP/ACK storms (to be
filed as a separate bug; the bug ID will follow).

Version-Release number of selected component (if applicable):
1.9.34

How reproducible:
always

Steps to Reproduce:
1. Mount a clustered NFS export.
2. Relocate that export.
3. Check with netstat on the old node: established TCP connections to the
   client remain, even though the service IP has been deleted.
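The check in step 3 can be sketched as a small shell filter. The service IP
(192.168.1.100) and the sample netstat capture below are hypothetical, for
illustration only:

```shell
#!/bin/sh
# Hypothetical service IP of the relocated NFS export.
SERVICE_IP=192.168.1.100

# Keep only ESTABLISHED TCP connections whose local address is the
# (supposedly removed) service IP on the standard NFS port, 2049.
# Field 4 of `netstat -tn` output is the local address, field 6 the state.
filter_stale() {
    awk -v ip="$SERVICE_IP" '$6 == "ESTABLISHED" && $4 == ip":2049"'
}

# Sample `netstat -tn` capture from the old node; in practice you would
# pipe the live output instead:  netstat -tn | filter_stale
filter_stale <<'EOF'
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.168.1.100:2049      192.168.1.50:793        ESTABLISHED
tcp        0      0 192.168.1.1:22          192.168.1.50:40112      ESTABLISHED
EOF
```

Any line the filter prints is a connection the old node is still holding on
the deleted service IP.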
  
Actual results:
netstat shows established TCP connections on the old node after the relocation.

Expected results:
ip.sh should abort any established TCP/IP connections upon stopping (is that
possible?)

Additional info:
Shutting down or restarting nfs on the old node after (or during) the
relocation works around the problem by tearing down the old TCP connections.

Comment 1 Axel Thimm 2005-09-05 18:17:37 UTC
The DUP/ACK storm bug triggered by this one is filed as bug #167572

Comment 2 Lon Hohberger 2005-09-22 15:11:22 UTC
I've almost got a workaround ready which will stop/start the entire NFS server,
while maintaining the ability to define clients using the cluster GUI.

The caveat is that it won't be useful if you want to run more than one NFS
server in the cluster (assuming two nodes), and it will obviously clobber
non-cluster NFS services when nfsd is brought down.

This may solve your problem.  However, I still have a bit more work to do to
make locking work properly (currently, reclaims are not correctly handled).

Here's a drop-in agent which you can use instead of <nfsexport> which should
work (sans locking for now).


Comment 3 Lon Hohberger 2005-09-22 15:12:45 UTC
Created attachment 119144 [details]
Simple nfs server RA (no UI support for it though).

Comment 4 Lon Hohberger 2005-10-04 21:27:53 UTC
Assigning to NFS maintainer, but staying on CC list for now.

Comment 8 Lon Hohberger 2005-11-08 14:56:43 UTC
Axel, our NFS maintainer fixed a reference count problem which caused problems
during failover, and the fix was released with the most recent kernel errata.

While this was not the only outstanding problem with NFS failover, the fix in
the latest errata may also resolve the TCP connection / dup ACK problems.

Can you please reproduce the problem and let us know if it still happens?

(this and: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=167572 )