Bug 154040

Summary: Kernel RPC client doesn't reuse TCP port when reconnecting
Product: Red Hat Enterprise Linux 4
Reporter: Chuck Lever <cel>
Component: kernel
Assignee: Ric Wheeler <rwheeler>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: high
Docs Contact:
Priority: high
Version: 4.0
CC: davej, jbaron, riel, shillman, steved, xdl-redhat-bugzilla
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-03-16 17:41:27 UTC
Bug Blocks: 430698

Description Chuck Lever 2005-04-06 18:31:42 UTC

Description of problem:
NFS servers use a duplicate reply cache to detect client retransmissions and minimize the risk of replaying non-idempotent RPC requests.  Such reply caches are typically keyed on client IP address, client port, and RPC XID.

If an NFS server or the network causes an RPC-over-TCP connection to be dropped, the Linux RPC client does not reconnect to the server from the same source port.  Any entries in the server's duplicate reply cache (DRC) that are keyed to the old port number are therefore no longer usable, and the server cannot detect replays on the new connection.
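
To illustrate why the port matters, here is a minimal sketch of a reply cache keyed on client IP address, client port, and RPC XID.  It is a toy model, not the code of any real NFS server: the class name, the function names, and the addresses in the usage note are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical key layout, following the description above:
# (client IP address, client source port, RPC XID).
DrcKey = Tuple[str, int, int]


class DuplicateReplyCache:
    """Toy model of a server-side duplicate reply cache (DRC)."""

    def __init__(self) -> None:
        self._replies: Dict[DrcKey, bytes] = {}

    def lookup(self, ip: str, port: int, xid: int) -> Optional[bytes]:
        # Returns the cached reply, or None if this key was never seen.
        return self._replies.get((ip, port, xid))

    def store(self, ip: str, port: int, xid: int, reply: bytes) -> None:
        self._replies[(ip, port, xid)] = reply


def handle_request(
    drc: DuplicateReplyCache,
    ip: str,
    port: int,
    xid: int,
    execute: Callable[[], bytes],
) -> Tuple[bytes, bool]:
    """Return (reply, served_from_cache).

    A retransmission is only recognized when IP, port, and XID all match.
    Otherwise the (possibly non-idempotent) operation is executed again.
    """
    cached = drc.lookup(ip, port, xid)
    if cached is not None:
        return cached, True
    reply = execute()
    drc.store(ip, port, xid, reply)
    return reply, False
```

In this model, a retransmission of XID 0x1234 from 192.0.2.10 port 800 is answered from the cache, but the same XID arriving from port 801 after a reconnect misses the cache and the operation runs a second time.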

Version-Release number of selected component (if applicable):
kernel-2.6.9-5.0.3-EL

How reproducible:
Always

Steps to Reproduce:
1.  Start an I/O workload on an NFS client (TCP mount)
2.  Note the TCP connection information via "netstat"
3.  Cause the connection to drop (temporary server or network outage)
4.  Note the TCP connection again
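
Steps 2 and 4 can be compared mechanically.  The sketch below assumes numeric output in the usual Linux "netstat -tn" column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and assumes the server listens on the standard NFS port 2049.  The function name is hypothetical, and the sample lines are made up for illustration; they are not captured from the affected system.

```python
from typing import Set


def local_ports_to_server(netstat_output: str, server_port: int = 2049) -> Set[int]:
    """Collect the local TCP ports of connections to the given remote port."""
    ports: Set[int] = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP connection line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign = fields[3], fields[4]
        # The port is whatever follows the last colon (also true for IPv6).
        if foreign.rsplit(":", 1)[-1] != str(server_port):
            continue
        ports.add(int(local.rsplit(":", 1)[-1]))
    return ports


# Made-up sample lines, for illustration only.
BEFORE = "tcp        0      0 192.0.2.10:800    192.0.2.20:2049    ESTABLISHED"
AFTER = "tcp        0      0 192.0.2.10:801    192.0.2.20:2049    ESTABLISHED"

# True here means the client came back from a different source port.
port_changed = local_ports_to_server(BEFORE) != local_ports_to_server(AFTER)
```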

Actual Results:  The client reconnects to the server using a different port on the new connection than was used on the old one.

Expected Results:  The client should reconnect to the server using the same source port, regardless of TCP TIME_WAIT requirements.  (Note that this is the behavior exhibited by the NFS/RPC reference implementation on Solaris.)
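
As a user-space analogy only (this is not the kernel RPC client and not the fix being worked on for mainline), reusing a source port means binding the new socket to the old local port before connecting.  The function names below are hypothetical.  SO_REUSEADDR is set so that bind() is not refused merely because the previous connection left the port in TIME_WAIT; the later connect() can still fail if the kernel refuses to reuse the exact address/port pair.

```python
import socket
from typing import Tuple


def bound_tcp_socket(local_port: int, local_ip: str = "") -> socket.socket:
    """Create a TCP socket bound to a specific local (source) port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Permit binding even if an earlier connection from this port
        # is still in TIME_WAIT.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((local_ip, local_port))
    except OSError:
        sock.close()
        raise
    return sock


def reconnect_from_port(server: Tuple[str, int], local_port: int) -> socket.socket:
    """Reconnect to the server from the same source port as before.

    May raise OSError if the port or the full connection tuple is
    still unavailable.
    """
    sock = bound_tcp_socket(local_port)
    try:
        sock.connect(server)
    except OSError:
        sock.close()
        raise
    return sock
```

A caller would record the old port with getsockname() before the connection is lost and pass it to reconnect_from_port() afterwards.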

Additional info:

This is a potential data corruption bug, so I am marking the severity of this bugzilla "High."  This is a problem in all releases of RHEL.

Note that the "expected behavior" is the same as exhibited by the NFS/RPC reference implementation on Solaris.

I am working on a fix for 2.6 mainline.

Comment 3 RHEL Program Management 2007-05-09 11:25:38 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 12 RHEL Program Management 2008-08-03 02:27:41 UTC
Product Management has reviewed and declined this request.  You may appeal this
decision by reopening this request.