Bug 499756 - TCP connection terminated with RST
Status: CLOSED CANTFIX
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 4.9
Assigned To: Steve Dickson
QA Contact: Red Hat Kernel QE team
Reported: 2009-05-07 20:29 EDT by Bikash
Modified: 2011-12-08 10:41 EST
CC: 8 users

Doc Type: Bug Fix
Last Closed: 2011-12-08 10:41:16 EST


Attachments
Trace file (16.15 KB, application/octet-stream), 2009-05-07 20:32 EDT, Bikash
Handle RST - attempt 1 (2.26 KB, patch), 2009-10-02 14:26 EDT, Sachin Prabhu

Description Bikash 2009-05-07 20:29:07 EDT
Description of problem:

When using NFS over TCP, the client very occasionally sends RPC calls in such a way that the end of one RPC call contains the record header for the next, and the next RPC call begins with data and appears to run on for a very long time because its RPC record header is out of step with the data. The filer responds by a) complaining about a nonsensical RPC call, b) complaining about an over-long RPC call ('nfsd.record.too.long'), and c) killing the TCP connection to the client with an RST.
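For context, ONC RPC over TCP delimits messages with record marking (RFC 5531, section 11): each fragment is preceded by a 4-byte big-endian word whose high bit flags the last fragment and whose low 31 bits give the fragment length. The Python sketch below shows why framing that falls out of step with the data produces the symptoms described: payload bytes get decoded as a marker, yielding an implausibly large length that the receiver rejects as too long. The helper names and the `max_len` limit are hypothetical; they are not the filer's actual implementation or limit.

```python
import struct
from typing import List, Tuple

LAST_FRAGMENT = 0x80000000  # high bit of the marker: "this is the final fragment"
LENGTH_MASK = 0x7FFFFFFF    # low 31 bits: fragment length in bytes


def encode_record(payload: bytes) -> bytes:
    """Frame one RPC message as a single, final fragment."""
    return struct.pack(">I", LAST_FRAGMENT | len(payload)) + payload


def parse_marker(header: bytes) -> Tuple[bool, int]:
    """Decode a 4-byte big-endian record marker into (is_last, length)."""
    (word,) = struct.unpack(">I", header[:4])
    return bool(word & LAST_FRAGMENT), word & LENGTH_MASK


def split_records(stream: bytes, max_len: int = 1 << 16) -> List[bytes]:
    """Reassemble RPC messages from a byte stream, rejecting over-long records.

    max_len is a hypothetical limit standing in for the server's own check.
    """
    records: List[bytes] = []
    current = b""
    offset = 0
    while offset + 4 <= len(stream):
        last, length = parse_marker(stream[offset:offset + 4])
        if len(current) + length > max_len:
            raise ValueError(f"record too long: {length} bytes")
        offset += 4
        current += stream[offset:offset + length]
        offset += length
        if last:
            records.append(current)
            current = b""
    return records


# Two well-framed calls parse cleanly.
stream = encode_record(b"A" * 8) + encode_record(b"B" * 8)
good = split_records(stream)

# Starting two bytes late, payload bytes are misread as a record marker.
bogus_marker = parse_marker(stream[2:6])
try:
    split_records(stream[2:])
    misaligned_error = None
except ValueError as exc:
    misaligned_error = str(exc)
```

Here the misaligned read decodes the bytes `00 08 41 41` as a non-final fragment of 540,993 bytes, which is exactly the kind of nonsense length a server would reject before dropping the connection.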


The customer was able to reproduce it "several times in 2 hours" with 70+ clients using a tool to generate large sequential writes. I am attaching the trace from the customer.

They report that it is present in RHEL4, in Fedora Core 3/4/5, and in kernels 2.6.9 through 2.6.15, and that it no longer occurs as of 2.6.16, so it appears to have already been fixed upstream.

Is this a known issue in the 2.6.9 kernel (RHEL 4.x)? Will it be identified and a fix targeted for RHEL 4.x?
Comment 1 Bikash 2009-05-07 20:32:13 EDT
Created attachment 342955
Trace file

This is the trace that the customer provided.
Comment 22 Sachin Prabhu 2009-06-29 09:52:46 EDT
The problem of an NFS client not responding promptly to RSTs from the server has been reported at
http://bugzilla.kernel.org/show_bug.cgi?id=11154

The following patch was proposed to fix this problem:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=2a9e1cfa23fb62da37739af81127dab5af095d99
Comment 38 Sachin Prabhu 2009-10-02 14:26:08 EDT
Created attachment 363511
Handle RST - attempt 1

This is the first attempt at backporting the patch from 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=2a9e1cfa23fb62da37739af81127dab5af095d99

The patch is _NOT_ KABI safe. 

It adds a function pointer, old_error_report, to struct rpc_xprt.
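Upstream, the socket transport saves a socket's original callbacks when it takes the socket over and restores them when it releases it; the new old_error_report slot presumably serves the same purpose for the socket's error callback. The Python model below is a sketch under that assumption: `Sock`, `Transport`, and their methods are hypothetical stand-ins rather than kernel code, and waking pending tasks with -EAGAIN is modeled loosely on the upstream error handler, not taken from the attached patch.

```python
import errno
from typing import Callable, List, Optional


class Sock:
    """Hypothetical stand-in for the kernel's struct sock (error path only)."""

    def __init__(self) -> None:
        self.err = 0
        self.default_calls = 0
        self.error_report: Callable[[], None] = self.default_error_report

    def default_error_report(self) -> None:
        self.default_calls += 1

    def receive_rst(self) -> None:
        # The TCP layer records the error, then invokes the installed callback.
        self.err = errno.ECONNRESET
        self.error_report()


class Transport:
    """Hypothetical stand-in for struct rpc_xprt with an old_error_report slot."""

    def __init__(self, sock: Sock) -> None:
        self.sock = sock
        self.old_error_report: Optional[Callable[[], None]] = None
        self.pending: List[Callable[[int], None]] = []  # tasks awaiting replies

    def error_report(self) -> None:
        if self.sock.err == 0:
            return
        # Wake every waiting task with -EAGAIN so it retries on a new connection.
        tasks, self.pending = self.pending, []
        for wake in tasks:
            wake(-errno.EAGAIN)

    def connect(self) -> None:
        # Save the socket's original callback before installing our own.
        self.old_error_report = self.sock.error_report
        self.sock.error_report = self.error_report

    def close(self) -> None:
        # Put the original callback back when the transport releases the socket.
        if self.old_error_report is not None:
            self.sock.error_report = self.old_error_report
            self.old_error_report = None


sock = Sock()
xprt = Transport(sock)
xprt.connect()
woken: List[int] = []
xprt.pending.append(woken.append)
sock.receive_rst()  # handled by Transport.error_report
xprt.close()
sock.receive_rst()  # handled by the restored default callback
```

The saved pointer has to live somewhere that persists for the life of the connection, which is why the backport needs a new member in the transport structure rather than a local variable.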

rpc_xprt is not directly exported. However, it is passed as a parameter to rpc_create_client(), which is exported. Hence the modification to rpc_xprt breaks KABI.
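The layout side of this can be illustrated with Python's ctypes, which lays out `Structure` fields using C rules. The field list below is hypothetical apart from old_error_report, and where the backport actually places the new member is an assumption; the point is only that inserting a member grows the structure and shifts the offsets of every field after it, so a third-party module compiled against the old header would read the wrong memory. Separately, KABI checking is based on symbol version checksums, which also change when the definition of a type reachable from an exported symbol's signature changes.

```python
import ctypes


class RpcXprtOld(ctypes.Structure):
    """Hypothetical layout before the patch (not the real rpc_xprt)."""
    _fields_ = [
        ("sock", ctypes.c_void_p),
        ("old_data_ready", ctypes.c_void_p),    # saved socket callbacks
        ("old_state_change", ctypes.c_void_p),
        ("timeout", ctypes.c_void_p),           # a field that follows them
    ]


class RpcXprtNew(ctypes.Structure):
    """Same hypothetical layout with old_error_report inserted."""
    _fields_ = [
        ("sock", ctypes.c_void_p),
        ("old_data_ready", ctypes.c_void_p),
        ("old_state_change", ctypes.c_void_p),
        ("old_error_report", ctypes.c_void_p),  # the new member
        ("timeout", ctypes.c_void_p),
    ]


PTR = ctypes.sizeof(ctypes.c_void_p)
size_growth = ctypes.sizeof(RpcXprtNew) - ctypes.sizeof(RpcXprtOld)
offset_shift = RpcXprtNew.timeout.offset - RpcXprtOld.timeout.offset

# A module built against the old header reads "timeout" at its old offset,
# which in the new layout is occupied by a different field.
misread_field = next(
    name for name, _ in RpcXprtNew._fields_
    if getattr(RpcXprtNew, name).offset == RpcXprtOld.timeout.offset
)
```

With all-pointer fields there is no padding, so the structure grows by exactly one pointer and the old module would read old_error_report where it expects timeout.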

The module compiles cleanly. However, we cannot yet confirm that it works as intended, because testing it requires an NFS server that sends an RST packet.
