Bug 167571

Summary: Relocating NFS over TCP exports leaves established TCP/IP connections behind
Product: Red Hat Enterprise Linux 4
Reporter: Axel Thimm <axel.thimm>
Component: kernel
Assignee: Steve Dickson <steved>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 4.0
CC: hgarcia, lhh
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2006-10-17 14:42:37 UTC
Bug Blocks: 132823, 167572
Attachments:
  Simple nfs server RA (no UI support for it, though).

Description Axel Thimm 2005-09-05 18:15:15 UTC
Description of problem:
When a typical NFS service is relocated, the old node still maintains
established TCP connections under the service IP (which has since been removed).

When the service and service IP are later relocated back to that node, it will
try to resume those stale connections. This can lead to DUP/ACK storms (to be
filed as a separate bug; the bug ID will follow).

Version-Release number of selected component (if applicable):
1.9.34

How reproducible:
always

Steps to Reproduce:
1. Mount a clustered NFS export.
2. Relocate that export.
3. Check with netstat on the old node: established TCP connections to the
   client remain, even though the service IP has been deleted.
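The check in step 3 can be sketched as a small shell filter. The service IP
(192.168.1.100) and the sample netstat capture below are hypothetical, for
illustration only:

```shell
#!/bin/sh
# Hypothetical service IP of the relocated NFS export.
SERVICE_IP=192.168.1.100

# Keep only ESTABLISHED TCP connections whose local address is the
# (supposedly removed) service IP on the standard NFS port, 2049.
# Field 4 of `netstat -tn` output is the local address, field 6 the state.
filter_stale() {
    awk -v ip="$SERVICE_IP" '$6 == "ESTABLISHED" && $4 == ip":2049"'
}

# Sample `netstat -tn` capture from the old node; in practice you would
# pipe the live output instead:  netstat -tn | filter_stale
filter_stale <<'EOF'
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.168.1.100:2049      192.168.1.50:793        ESTABLISHED
tcp        0      0 192.168.1.1:22          192.168.1.50:40112      ESTABLISHED
EOF
```

Any line the filter prints is a connection the old node is still holding on
the deleted service IP.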
  
Actual results:
netstat shows established TCP connections on the old node after the relocation.

Expected results:
ip.sh should abort any established TCP/IP connections upon stopping (is that
possible?)

Additional info:
Shutting down or restarting nfs on the old node after (or during) the
relocation works around the problem by tearing down the old TCP connections.

Comment 1 Axel Thimm 2005-09-05 18:17:37 UTC
The DUP/ACK storm bug triggered by this one is filed as bug #167572

Comment 2 Lon Hohberger 2005-09-22 15:11:22 UTC
I've almost got a workaround ready which will stop/start the entire NFS server,
while maintaining the ability to define clients using the cluster GUI.

The caveat is that it won't be useful if you want to run more than one NFS
server in the cluster (assuming two nodes), and it will obviously clobber
non-cluster NFS services when nfsd is brought down.

This may solve your problem.  However, I still have a bit more work to do to
make locking work properly (currently, reclaims are not correctly handled).

Here's a drop-in agent which you can use instead of <nfsexport> which should
work (sans locking for now).


Comment 3 Lon Hohberger 2005-09-22 15:12:45 UTC
Created attachment 119144 [details]
Simple nfs server RA (no UI support for it though).

Comment 4 Lon Hohberger 2005-10-04 21:27:53 UTC
Assigning to NFS maintainer, but staying on CC list for now.

Comment 8 Lon Hohberger 2005-11-08 14:56:43 UTC
Axel, our NFS maintainer fixed a reference count problem which caused problems
during failover, and the fix was released with the most recent kernel errata.

While this was not the only outstanding problem with NFS failover, the fix in
the latest errata may also resolve the TCP connection / dup ACK problems.

Can you please reproduce the problem and let us know if it still happens?

(this and: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=167572 )