57189 – tcp receive window on Oracle socket dwindles to 0

Summary: tcp receive window on Oracle socket dwindles to 0

Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Version:	7.1
Hardware:	i386
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Arjan van de Ven
QA Contact:	Brock Organ

Reported:	2001-12-06 18:06 UTC by Need Real Name
Modified:	2005-10-31 22:00 UTC
CC List:	0 users
Last Closed:	2003-02-13 18:10:18 UTC

Description Need Real Name 2001-12-06 18:06:55 UTC

Version-Release number of selected component (if applicable):
2.4.9-6enterprise

How Reproducible:
Somewhat regularly.

Steps to Reproduce:
1. Run the application.
2. Watch tcpdump: the receive window advertised by the Linux box falls to 0 bytes.

Description of Problem:
The application runs a large query against an Oracle database. Soon
afterwards, the application goes almost completely idle. An strace of the
process shows the application doing:
read(10,"U",2000)=1
read(10,"N",1999)=1
etc.
That is, it receives 1 byte at a time from the socket (/proc/pid/fd/10 is a
socket).
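The shrinking length argument in the strace output (2000, then 1999, ...) is what a "read until the whole message has arrived" loop looks like when every read() returns a single byte: each call asks only for the bytes still missing. The sketch below illustrates that pattern under the assumption that the client reads this way; it is not the Oracle client's actual code, and `read_exact` is a hypothetical name.

```python
import os
import socket


def read_exact(fd: int, n: int) -> bytes:
    """Read exactly n bytes from fd, re-issuing read() for the bytes still missing.

    Illustrative only (hypothetical helper, not Oracle's code): when each
    read() returns 1 byte, this loop produces read(fd, buf, 2000),
    read(fd, buf, 1999), ... exactly as in the strace above.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        # read() may return fewer bytes than requested: it hands back only
        # what is currently queued in the socket's receive buffer.
        chunk = os.read(fd, remaining)
        if not chunk:
            # An empty read means the peer closed the connection.
            raise ConnectionError(f"peer closed with {remaining} bytes outstanding")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # A connected socket pair stands in for the Oracle connection.
    left, right = socket.socketpair()
    left.sendall(b"UN")
    print(read_exact(right.fileno(), 2))  # prints b'UN'
    left.close()
    right.close()
```

The loop itself is correct; the symptom in this report is that the kernel was handing it only one byte per call, so a 2000-byte message cost 2000 system calls.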

To dig further, we ran tcpdump on the traffic between the Linux box and the
Oracle server. The dump starts off normally, with the Solaris TCP receive
window constant at 24616 bytes. The Linux window starts at about 8 KB and
quickly rises to roughly 50-64 KB.

At seemingly random points, the TCP receive window advertised by the Linux
box starts to fall quickly, eventually reaching 0. The application then
starts reading the data byte by byte. Once that query completes, a new
socket is opened and the receive window behaves normally again. The problem
can then recur on a later data connection.
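In a long capture, the point where the Linux side's advertised window collapses can be found by filtering tcpdump's text output. This is a minimal sketch, assuming the one-line-per-packet text format `<time> [IP] <src>.<port> > <dst>.<port>: ... win <N> ...`; `advertised_windows` and `first_zero_window` are hypothetical helpers, not part of tcpdump. Note that the printed `win` value may be the raw header field, before any window scaling is applied.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Matches the advertised window field in older ("win 32120") and newer
# ("win 501,") tcpdump output; the exact format is an assumption.
WIN_RE = re.compile(r"\bwin (\d+)")


def advertised_windows(lines: Iterable[str], host: str) -> List[Tuple[int, int]]:
    """Return (line_number, window) for every segment sent by `host`."""
    result = []
    for lineno, line in enumerate(lines, start=1):
        if " > " not in line:
            continue
        tokens = line.split(" > ", 1)[0].split()
        if not tokens:
            continue
        # The last token before " > " is "<host>.<port>"; drop the port.
        sender = tokens[-1].rsplit(".", 1)[0]
        if sender != host:
            continue
        match = WIN_RE.search(line)
        if match:
            result.append((lineno, int(match.group(1))))
    return result


def first_zero_window(windows: List[Tuple[int, int]]) -> Optional[int]:
    """Return the line number of the first zero-window advertisement, if any."""
    for lineno, window in windows:
        if window == 0:
            return lineno
    return None
```

For example, the text lines from `tcpdump -n -r capture.pcap` can be passed to `advertised_windows(lines, "<linux-box-ip>")` (the address is a placeholder); a list of windows that shrinks steadily until `first_zero_window` returns a line number matches the symptom described above.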

All other network traffic to and from the box seems normal.

This is with Oracle client 8.1.7.2. The network card is an Intel
EtherExpress 100. The problem goes away with a custom 2.4.15aa1 kernel.

Other things we tried while diagnosing the problem:
- Replaced the eepro100 driver with e100: same problem.
- Added a second Intel Ethernet card to the box, on the same VLAN as the
  Oracle server: same problem. This ruled out the network card, network
  cables, and Cat port as causes.
- Ruled out routing problems: the Linux box talks directly to the Oracle
  box, and the Oracle box replies directly, without traversing any routers.
- Confirmed the duplex settings are correct, set at 100-FD.
- Checked the Cisco counters: no errors for either the Linux box or the
  Solaris box.

Using 2.4.15aa1 solves our problem. We ran the application 3 times as a
test last night, and it completed successfully.

Comment 1 Arjan van de Ven 2001-12-06 19:43:09 UTC

Fix added to the kernel; kernels 2.4.9-17.6 and later include this fix.
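To check whether a given Red Hat kernel release string is at or after the fixed 2.4.9-17.6, a naive numeric comparison is usually enough. This is a sketch under stated assumptions: release strings look like `2.4.9-6enterprise` (numbers, a dash, a dotted release number, an optional flavor suffix), and the comparison is plain tuple ordering, not RPM's full version-comparison rules. `parse_rh_kernel` and `has_window_fix` are hypothetical names.

```python
import re
from typing import Optional, Tuple

# From Comment 1: the fix is in kernel 2.4.9-17.6 and later.
FIXED_IN = (2, 4, 9, 17, 6)

# "<major>.<minor>.<patch>-<release>[flavor]", e.g. "2.4.9-6enterprise".
_RH_KERNEL = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+(?:\.\d+)*)")


def parse_rh_kernel(release: str) -> Optional[Tuple[int, ...]]:
    """Turn e.g. '2.4.9-17.6enterprise' into (2, 4, 9, 17, 6); None if unrecognised."""
    m = _RH_KERNEL.match(release)
    if m is None:
        return None
    upstream = tuple(int(x) for x in m.group(1, 2, 3))
    rh_release = tuple(int(x) for x in m.group(4).split("."))
    return upstream + rh_release


def has_window_fix(release: str) -> Optional[bool]:
    """Naive check: is this kernel release at or after 2.4.9-17.6?"""
    parsed = parse_rh_kernel(release)
    if parsed is None:
        return None  # e.g. a custom kernel such as 2.4.15aa1
    return parsed >= FIXED_IN
```

On the affected machine, `has_window_fix(platform.release())` would return False for the reporter's 2.4.9-6enterprise kernel. Tuple ordering treats 2.4.9-17 as older than 2.4.9-17.6, which is the intended result here.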

Comment 2 Matt Domsch 2003-02-13 18:10:18 UTC

Fixed a year ago; newer kernels include the fix. Closing.