499756 – TCP connection terminated with RST

Keywords:
Status:	CLOSED CANTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.0
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	4.9
Assignee:	Steve Dickson
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-05-08 00:29 UTC by Bikash
Modified:	2018-11-14 18:30 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-12-08 15:41:16 UTC
Target Upstream Version:
Embargoed:

Attachments
Trace file (16.15 KB, application/octet-stream) 2009-05-08 00:32 UTC, Bikash
Handle RST - attempt 1 (2.26 KB, patch) 2009-10-02 18:26 UTC, Sachin Prabhu

Description Bikash 2009-05-08 00:29:07 UTC

Description of problem:

When using NFS over TCP, the client very occasionally sends RPC calls in such a way that the end of one RPC call contains the header for the next, so the next RPC call begins with data and appears to run on for a very long time because its RPC headers are malformed. The filer's response is to a) complain about a nonsense RPC call, b) complain about a too-long RPC call ('nfsd.record.too.long'), and c) kill the TCP connection to the client with an RST.
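
To illustrate why a misframed stream produces a "too long" record, here is a minimal user-space sketch of ONC RPC record marking over TCP (RFC 5531): each fragment is preceded by a 4-byte big-endian marker whose top bit flags the last fragment and whose low 31 bits give the fragment length. The names rpc_frag_hdr, rpc_decode_marker, rpc_record_too_long and the MAX_RECORD_LEN limit are hypothetical and chosen for illustration; this is not the filer's or the kernel's code.

```c
#include <stdint.h>

/* Decoded form of the 4-byte record marker that precedes each RPC fragment. */
struct rpc_frag_hdr {
    int last_fragment;   /* top bit of the marker */
    uint32_t length;     /* low 31 bits: fragment length in bytes */
};

/* Decode the big-endian marker stored in buf[0..3]. */
static struct rpc_frag_hdr rpc_decode_marker(const unsigned char *buf)
{
    uint32_t word = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                    ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    struct rpc_frag_hdr hdr;

    hdr.last_fragment = (word & 0x80000000u) != 0;
    hdr.length = word & 0x7fffffffu;
    return hdr;
}

/* Hypothetical receiver-side sanity limit (1 MiB), for illustration only. */
#define MAX_RECORD_LEN (1u << 20)

/* Nonzero if the advertised fragment length exceeds the sanity limit. */
static int rpc_record_too_long(struct rpc_frag_hdr hdr)
{
    return hdr.length > MAX_RECORD_LEN;
}
```

A well-formed marker such as 80 00 00 64 decodes to a final fragment of 100 bytes. If the receiver instead lands on four bytes of write payload, say the ASCII text "ABCD", the same decoding yields a length of 0x41424344 (about 1 GB), which a receiver with a limit like the one assumed here would reject as too long.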


The customer was able to reproduce it "several times in 2 hours" with 70+ clients using a tool to generate large sequential writes. I am attaching the trace from the customer.

They call it out as being present in RHEL4 and in Fedora Core 3/4/5 and kernels from 2.6.9 to 2.6.15. They also report that it no longer occurs as of 2.6.16, so it seems to have been fixed by the Linux folks already.

Is this a known issue in the 2.6.9 kernel (RHEL4.x)? Will the cause be identified and a fix targeted for RHEL4.x?

Comment 1 Bikash 2009-05-08 00:32:13 UTC

Created attachment 342955
Trace file

This is the trace that the customer has provided.

Comment 22 Sachin Prabhu 2009-06-29 13:52:46 UTC

The problem of an NFS client not responding quickly to RSTs from the server has been reported at
http://bugzilla.kernel.org/show_bug.cgi?id=11154

The following patch was proposed to fix this problem
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=2a9e1cfa23fb62da37739af81127dab5af095d99

Comment 38 Sachin Prabhu 2009-10-02 18:26:08 UTC

Created attachment 363511
Handle RST - attempt 1

This is the first attempt at backporting the patch from 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=2a9e1cfa23fb62da37739af81127dab5af095d99

The patch is _NOT_ KABI safe. 

It adds a function pointer old_error_report to structure rpc_xprt. 

rpc_xprt is not directly exported. However, it is passed as a parameter to rpc_create_client(), which is exported. Hence the modification to rpc_xprt breaks KABI.
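
As a rough illustration of why adding a member changes the binary interface, the sketch below compares two heavily simplified, hypothetical stand-ins for the structure before and after the change. These are NOT the real rpc_xprt definitions, and the position of old_error_report is an assumption made only to show the effect; xprt_before, xprt_after and the two helper functions are invented for this example.

```c
#include <stddef.h>

struct sock;  /* opaque here; only pointers to it are used */

/* Hypothetical layout that already-built modules were compiled against. */
struct xprt_before {
    struct sock *inet;
    unsigned long timeout;
};

/* Hypothetical layout after the patch adds a saved callback pointer. */
struct xprt_after {
    struct sock *inet;
    void (*old_error_report)(struct sock *);  /* newly added member */
    unsigned long timeout;
};

/* Nonzero if the structure size differs between the two layouts. */
static int xprt_size_changed(void)
{
    return sizeof(struct xprt_after) != sizeof(struct xprt_before);
}

/* Nonzero if an existing member moved to a different offset. */
static int timeout_offset_moved(void)
{
    return offsetof(struct xprt_after, timeout) !=
           offsetof(struct xprt_before, timeout);
}
```

Under these assumptions both helpers return nonzero: a module built against the old layout would read timeout at the wrong offset when handed the new structure through an exported function. Appending the member at the end would keep existing offsets but still change the size, which matters to any code that allocates or embeds the structure.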

The module compiles fine. However, we are not sure whether it works as intended, since testing it requires an nfsd server that sends an RST packet.
