Bug 154040 - Kernel RPC client doesn't reuse TCP port when reconnecting
Summary: Kernel RPC client doesn't reuse TCP port when reconnecting
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ric Wheeler
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 430698
 
Reported: 2005-04-06 18:31 UTC by Chuck Lever
Modified: 2010-03-16 17:41 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-16 17:41:27 UTC
Target Upstream Version:
Embargoed:



Description Chuck Lever 2005-04-06 18:31:42 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1

Description of problem:
NFS servers use a duplicate reply cache (DRC) to detect client retransmissions and minimize the risk of replaying non-idempotent RPC requests.  Such reply caches are typically keyed on the client's IP address, the client's source port, and the RPC transaction ID (XID).

If an NFS server or the network causes an RPC-over-TCP connection to be dropped, the Linux RPC client does not reconnect to the server from the same source port.  Any entries in the server's DRC that are keyed to the old port number can no longer be matched, so the server cannot detect replays that arrive on the new connection.
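A toy model may make the failure mode concrete. The Python sketch below is a hypothetical, simplified DRC keyed only on (client IP, client port, XID) as described above; real server caches also track the procedure, request checksums, and expiry, and the names ToyDrc and remove_report are illustrative only. A retransmission from the same port is answered from the cache, while the same XID arriving from a new port is executed a second time.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical cache key: (client IP, client source port, RPC XID).
DrcKey = Tuple[str, int, int]


@dataclass
class ToyDrc:
    """Simplified duplicate reply cache; not a real server implementation."""
    replies: Dict[DrcKey, str] = field(default_factory=dict)
    executions: int = 0  # number of times a request was actually executed

    def handle(self, ip: str, port: int, xid: int, op: Callable[[], str]) -> str:
        key = (ip, port, xid)
        if key in self.replies:
            # Recognized as a retransmission: return the cached reply.
            return self.replies[key]
        self.executions += 1
        reply = op()
        self.replies[key] = reply
        return reply


files = {"report.txt"}


def remove_report() -> str:
    # Non-idempotent operation: succeeds once, fails if executed again.
    if "report.txt" in files:
        files.remove("report.txt")
        return "NFS3_OK"
    return "NFS3ERR_NOENT"


drc = ToyDrc()
r1 = drc.handle("192.0.2.10", 800, 42, remove_report)  # original request
r2 = drc.handle("192.0.2.10", 800, 42, remove_report)  # retransmit, same port
r3 = drc.handle("192.0.2.10", 801, 42, remove_report)  # retransmit after reconnect
print(r1, r2, r3, drc.executions)
```

This prints "NFS3_OK NFS3_OK NFS3ERR_NOENT 2": the retransmission from port 801 misses the cache, so the toy server executes the REMOVE a second time and the client sees an error for an operation that actually succeeded.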

Version-Release number of selected component (if applicable):
kernel-2.6.9-5.0.3-EL

How reproducible:
Always

Steps to Reproduce:
1. Start an I/O workload on an NFS client (TCP mount).
2. Note the TCP connection information via "netstat".
3. Cause the connection to drop (temporary server or network outage).
4. Note the TCP connection information again.
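As an alternative to netstat for steps 2 and 4, the client's connections can be read directly from /proc/net/tcp, the file net-tools netstat parses on Linux. The sketch below assumes an IPv4 mount to the standard NFS port 2049 (IPv6 sockets appear in /proc/net/tcp6 instead) and decodes only the hexadecimal port and state columns; the function name nfs_connections is hypothetical, and the sample lines are illustrative, not captured from the reporter's system.

```python
from typing import Iterable, List, Tuple

NFS_PORT = 2049  # assumption: standard NFS server port

# Subset of the kernel's TCP state codes as shown in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED",
    "02": "SYN_SENT",
    "06": "TIME_WAIT",
    "07": "CLOSE",
    "08": "CLOSE_WAIT",
}


def nfs_connections(lines: Iterable[str], server_port: int = NFS_PORT) -> List[Tuple[int, str]]:
    """Return (local_port, state) for sockets whose remote port is server_port."""
    result = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or fields[0] == "sl":
            continue  # skip the header row and malformed lines
        local, remote, state = fields[1], fields[2], fields[3]
        # Addresses are "HEXIP:HEXPORT"; only the port part is decoded here.
        if int(remote.split(":")[1], 16) != server_port:
            continue
        local_port = int(local.split(":")[1], 16)
        result.append((local_port, TCP_STATES.get(state, state)))
    return result


sample = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0A00000A:0320 1400000A:0801 06 00000000:00000000 03:00001770 00000000     0        0 0 3 0000000000000000
   1: 0A00000A:0321 1400000A:0801 01 00000000:00000000 00:00000000 00000000     0        0 0 2 0000000000000000
"""
print(nfs_connections(sample.splitlines()))
```

On the sample this prints [(800, 'TIME_WAIT'), (801, 'ESTABLISHED')]. On a live client, passing open("/proc/net/tcp") before and after the outage shows whether the ESTABLISHED connection's local port changed.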

Actual Results:  The client reconnects to the server from a different source port than the one used by the old connection.

Expected Results:  The client should reconnect to the server from the same source port, regardless of TCP TIME_WAIT requirements.  (This is the behavior exhibited by the NFS/RPC reference implementation on Solaris.)
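The difference between the actual and expected results comes down to how the transport picks its source port on reconnect. The hypothetical model below (ToyTransport is not a kernel structure) draws ports from an iterator standing in for the ephemeral port allocator: with reuse_port=False each connect takes a fresh port, as reported; with reuse_port=True the first port is remembered and reused, as expected. It deliberately ignores the TIME_WAIT collision a real stack must handle when rebinding to the old port.

```python
import itertools
from typing import Iterator, List, Optional


class ToyTransport:
    """Hypothetical model of source-port selection for a TCP RPC transport."""

    def __init__(self, ephemeral_ports: Iterator[int], reuse_port: bool) -> None:
        self._ephemeral = ephemeral_ports  # stands in for the port allocator
        self._reuse_port = reuse_port
        self.srcport: Optional[int] = None  # port used by the last connection

    def connect(self) -> int:
        """Return the source port used for this (re)connection."""
        if self._reuse_port and self.srcport is not None:
            return self.srcport  # expected behavior: rebind to the old port
        self.srcport = next(self._ephemeral)  # reported behavior: new port
        return self.srcport


def ports_over_reconnects(reuse_port: bool, attempts: int = 3) -> List[int]:
    transport = ToyTransport(itertools.count(800), reuse_port)
    return [transport.connect() for _ in range(attempts)]


print(ports_over_reconnects(reuse_port=False))  # [800, 801, 802]
print(ports_over_reconnects(reuse_port=True))   # [800, 800, 800]
```

Only the reuse_port=True sequence keeps the (client IP, client port) part of the server's DRC key stable across reconnects.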

Additional info:

This is a potential data corruption bug, so I am marking its severity "High."  The problem affects all releases of RHEL.

Note that the "expected behavior" is the same as exhibited by the NFS/RPC reference implementation on Solaris.

I am working on a fix for 2.6 mainline.

Comment 3 RHEL Program Management 2007-05-09 11:25:38 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 12 RHEL Program Management 2008-08-03 02:27:41 UTC
Product Management has reviewed and declined this request.  You may appeal this
decision by reopening this request.

