Bug 499756 - TCP connection terminated with RST
Summary: TCP connection terminated with RST
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 4.9
Assignee: Steve Dickson
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-05-08 00:29 UTC by Bikash
Modified: 2018-11-14 18:30 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-08 15:41:16 UTC
Target Upstream Version:
Embargoed:


Attachments
Trace file (16.15 KB, application/octet-stream)
2009-05-08 00:32 UTC, Bikash
Handle RST - attempt 1 (2.26 KB, patch)
2009-10-02 18:26 UTC, Sachin Prabhu

Description Bikash 2009-05-08 00:29:07 UTC
Description of problem:

When using NFS over TCP, the client very occasionally sends RPC calls in such a way that the end of one RPC call contains the header for the next, and the next RPC call begins with data and appears to run on for a very long time because the RPC headers are out of place. The filer's response is to a) complain about a nonsense RPC call, b) complain about a too-long RPC call ('nfsd.record.too.long'), and c) kill the TCP connection to the client with an RST.
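
To make the failure mode concrete, here is a minimal sketch of ONC RPC record marking over TCP, where each fragment is preceded by a 4-byte big-endian marker (top bit: last fragment, low 31 bits: length). It is not taken from the customer trace. The helper names, the payloads and the MAX_RECORD limit are assumptions for illustration, and each fragment is treated as a whole record for simplicity. It shows how a marker that swallows the next call's header leaves the receiver reading payload bytes as a length.

```python
import struct

LAST_FRAGMENT = 0x80000000   # top bit of the record marker
LENGTH_MASK = 0x7FFFFFFF     # low 31 bits carry the fragment length
MAX_RECORD = 1 << 20         # hypothetical receiver limit, not the filer's real value


def frame(payload: bytes) -> bytes:
    """Prefix a payload with a record marker for a single, last fragment."""
    return struct.pack(">I", LAST_FRAGMENT | len(payload)) + payload


def parse_records(stream: bytes, max_len: int = MAX_RECORD) -> list:
    """Split a byte stream into records, rejecting implausibly long ones."""
    records = []
    offset = 0
    while offset + 4 <= len(stream):
        (marker,) = struct.unpack_from(">I", stream, offset)
        length = marker & LENGTH_MASK
        if length > max_len:
            # Roughly what a server would report as a too-long record.
            raise ValueError(f"record too long: {length} bytes at offset {offset}")
        records.append(stream[offset + 4 : offset + 4 + length])
        offset += 4 + length
    return records


if __name__ == "__main__":
    call1, call2 = b"first call", b"A" * 64

    # Correct framing: both calls are recovered.
    print(parse_records(frame(call1) + frame(call2)))

    # Misframed: the first marker also covers the second call's 4-byte header,
    # so the receiver reads the start of call2's data as the next marker.
    bad = struct.pack(">I", LAST_FRAGMENT | (len(call1) + 4)) + call1 + frame(call2)
    try:
        parse_records(bad)
    except ValueError as exc:
        print(exc)
```

Once the framing slips by even a few bytes, the receiver cannot resynchronise on the stream, which is consistent with the filer dropping the connection rather than trying to recover.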


The customer was able to reproduce it "several times in 2 hours" with 70+ clients using a tool to generate large sequential writes. I am attaching the trace from the customer.

They call it out as being present in RHEL4 and in Fedora Core 3/4/5 and kernels from 2.6.9 to 2.6.15. They also report that it no longer occurs as of 2.6.16, so it seems to have been fixed by the Linux folks already.

Is this a known issue in the 2.6.9 kernel (RHEL4.x)? Will it be identified and a fix targeted for RHEL4.x?

Comment 1 Bikash 2009-05-08 00:32:13 UTC
Created attachment 342955
Trace file

This is the trace that the customer has provided.

Comment 22 Sachin Prabhu 2009-06-29 13:52:46 UTC
The problem of an NFS client not responding quickly to RSTs from the server has been reported at
http://bugzilla.kernel.org/show_bug.cgi?id=11154

The following patch was proposed to fix this problem:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=2a9e1cfa23fb62da37739af81127dab5af095d99

Comment 38 Sachin Prabhu 2009-10-02 18:26:08 UTC
Created attachment 363511
Handle RST - attempt 1

This is the first attempt at backporting the patch from 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=2a9e1cfa23fb62da37739af81127dab5af095d99

The patch is _NOT_ KABI safe. 

It adds a function pointer old_error_report to structure rpc_xprt. 

rpc_xprt is not directly exported. However, it is passed as a parameter to rpc_create_client(), which is exported. Hence the modification to rpc_xprt breaks KABI.
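
As a rough illustration of why adding a member breaks KABI, the sketch below models two versions of a structure with ctypes. The field names other than old_error_report, their types and the position of the new pointer are hypothetical. This is not the real layout of rpc_xprt, nor where the patch places the member.

```python
import ctypes

# Hypothetical callback type standing in for a kernel function pointer.
ERROR_REPORT = ctypes.CFUNCTYPE(None, ctypes.c_void_p)


class XprtOld(ctypes.Structure):
    """Layout that already-built modules were compiled against (made up)."""
    _fields_ = [
        ("sock", ctypes.c_void_p),
        ("timeout", ctypes.c_ulong),
        ("shutdown", ctypes.c_uint),
    ]


class XprtNew(ctypes.Structure):
    """Same structure with one extra function pointer added (made up)."""
    _fields_ = [
        ("sock", ctypes.c_void_p),
        ("old_error_report", ERROR_REPORT),
        ("timeout", ctypes.c_ulong),
        ("shutdown", ctypes.c_uint),
    ]


if __name__ == "__main__":
    print("size   old/new:", ctypes.sizeof(XprtOld), ctypes.sizeof(XprtNew))
    print("timeout offset old/new:", XprtOld.timeout.offset, XprtNew.timeout.offset)
```

A module built against the old layout would read timeout at the old offset, which in the new layout holds the function pointer. Appending the member at the end would keep the existing offsets but still change the structure size, which matters to any code that allocates or embeds the structure.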

The module compiles fine. However, we are not sure whether it works as intended, since testing it requires an NFS server that sends a RST packet.
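
One way to make a TCP peer emit a RST instead of a FIN is to close the socket with SO_LINGER enabled and a zero timeout. The sketch below shows this on the loopback interface. It is not an NFS server and does not exercise the patch. The function names are made up for illustration, and the packed linger layout assumes Linux.

```python
import socket
import struct


def reset_next_connection(listener: socket.socket) -> None:
    """Accept one connection and abort it so the peer sees a RST."""
    conn, _ = listener.accept()
    # l_onoff=1, l_linger=0: close() discards the connection and sends RST.
    # Two C ints is the struct linger layout on Linux (an assumption here).
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    conn.close()


def client_sees_reset() -> bool:
    """Return True if a loopback client observes the reset on recv()."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=5)
    try:
        reset_next_connection(listener)
        try:
            client.recv(1)
        except ConnectionResetError:
            return True
        return False
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    print("client saw RST:", client_sees_reset())
```

A real test of the backport would still need the reset to arrive on the NFS client's mounted transport, so this only covers the "server sends RST" half of the setup.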

