167571 – Relocating NFS over TCP exports leaves established TCP/IP connections behind

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Steve Dickson
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	RHEL4NFSFailover 167572

Reported:	2005-09-05 18:15 UTC by Axel Thimm
Modified:	2008-08-02 23:40 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-10-17 14:42:37 UTC
Target Upstream Version:
Embargoed:

Attachments
Simple nfs server RA (no UI support for it though). (5.02 KB, text/plain) 2005-09-22 15:12 UTC, Lon Hohberger

Description Axel Thimm 2005-09-05 18:15:15 UTC

Description of problem:
When a typical NFS service is relocated, the old node still maintains
established TCP/IP connections under the service IP (which has been removed).

When that node gets the service and service IP relocated back, it will try to
resume those connections. This can lead to DUP/ACK storms (to be filed as a
separate bug; the bug ID will follow).

Version-Release number of selected component (if applicable):
1.9.34

How reproducible:
always

Steps to Reproduce:
1. Mount a clustered NFS export.
2. Relocate that export.
3. Check with netstat on the old node that there are still established TCP/IP
   connections to the client, even though the service IP has been deleted.
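
The check in step 3 can be scripted. The following is only a rough sketch, not part of the original report: it assumes the numeric output format of `netstat -tn` on Linux with IPv4 addresses, and the function name, the sample addresses, and the service IP 10.0.0.50 are all made up for illustration.

```python
# Hypothetical helper: list ESTABLISHED TCP connections whose local
# address is the (already removed) service IP. Input is the text that
# `netstat -tn` prints; nothing here runs netstat itself.

# Made-up sample output; all addresses are illustrative only.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.50:2049          10.0.0.7:800            ESTABLISHED
tcp        0      0 10.0.0.50:2049          10.0.0.8:801            TIME_WAIT
tcp        0      0 10.0.0.1:22             10.0.0.9:51514          ESTABLISHED
"""


def established_on_ip(netstat_output, service_ip):
    """Return (local, remote) pairs for ESTABLISHED connections on service_ip."""
    hits = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: tcp 0 0 local:port remote:port STATE
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        # Split off the port at the last colon to get the local host part.
        host, _, _port = local.rpartition(":")
        if state == "ESTABLISHED" and host == service_ip:
            hits.append((local, remote))
    return hits


for local, remote in established_on_ip(SAMPLE, "10.0.0.50"):
    print("stale connection:", local, "->", remote)
```

On the old node, the text passed in would be the captured output of `netstat -tn` taken after the relocation; a non-empty result means connections under the removed service IP are still present.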
  
Actual results:
netstat shows established TCP/IP connections on the old node after the relocation.

Expected results:
ip.sh should abort any established TCP/IP connections upon stopping (is that
possible?)

Additional info:
Shutting down or restarting nfs on the old node after (or during) the relocation
is a workaround that clears the old TCP/IP connections.

Comment 1 Axel Thimm 2005-09-05 18:17:37 UTC

The DUP/ACK storm bug triggered by this one is filed as bug #167572.

Comment 2 Lon Hohberger 2005-09-22 15:11:22 UTC

I've almost got a workaround ready which will stop/start the entire NFS server,
while maintaining the ability to define clients using the cluster GUI.

The caveat is that it won't be useful if you want to run more than one NFS
server in the cluster (assuming two nodes), and it will obviously clobber
non-cluster NFS services when nfsd is brought down.

This may solve your problem.  However, I still have a bit more work to do to
make locking work properly (currently, reclaims are not correctly handled).

Here's a drop-in agent which you can use instead of <nfsexport> which should
work (sans locking for now).

Comment 3 Lon Hohberger 2005-09-22 15:12:45 UTC

Created attachment 119144
Simple nfs server RA (no UI support for it though).

Comment 4 Lon Hohberger 2005-10-04 21:27:53 UTC

Assigning to NFS maintainer, but staying on CC list for now.

Comment 8 Lon Hohberger 2005-11-08 14:56:43 UTC

Axel, our NFS maintainer fixed a reference count bug which caused problems
during failover, and the fix was released with the most recent kernel errata.

While this is not the only known problem with NFS failover at the moment, the
fix in the latest errata may also fix the TCP connection / dup ACK problems.

Can you please reproduce the problem and let us know if it still happens?

(this and: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=167572 )