Bug 190644

Summary: TCP broken after update to 2.6.16-1.2107_FC5
Product: [Fedora] Fedora
Reporter: Stanis Trendelenburg <stanis.trendelenburg>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 5
CC: ahough, bitmage, dac, dowdle, pfrields, rjp_rhb, stefan.hoelldampf, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-05-06 23:59:13 UTC

Description Stanis Trendelenburg 2006-05-04 07:45:40 UTC
Description of problem:
After upgrading to the latest kernel, TCP stops working.

Version-Release number of selected component (if applicable):
kernel-2.6.16-1.2107_FC5

Steps to Reproduce:
1. update to kernel 2.6.16-1.2107_FC5
2. try to connect to any local or remote TCP service
3. wait
  
Actual results:
nothing happens (timeout)

Expected results:
something happens

Additional info:
* This is not a HW problem. Connections to/from localhost are affected in the
same way as to/from other hosts. Collision and Error counters on all interfaces
(including lo) are 0.
* ICMP and UDP seem to work normally
* tcpdump shows that not all services are equally affected: http connections are
stalled right after the initial handshake, imap or ssh after about 70-100
packets have been exchanged.
* everything works as expected when rebooting to 2.6.16-1.2080_FC5
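
The zero collision and error counters mentioned above can be read programmatically. The following is a minimal sketch, assuming the standard Linux /proc/net/dev layout (two header lines, then one line per interface with 8 receive and 8 transmit fields); the function name `error_counters` is hypothetical and not part of this report.

```python
def error_counters(proc_net_dev_text):
    """Return {iface: {"rx_errs": n, "tx_errs": n, "collisions": n}}.

    Assumes the usual /proc/net/dev column order:
    rx: bytes packets errs drop fifo frame compressed multicast
    tx: bytes packets errs drop fifo colls carrier compressed
    """
    counters = {}
    # Skip the two header lines at the top of /proc/net/dev.
    for line in proc_net_dev_text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # unexpected layout; ignore rather than guess
        counters[name.strip()] = {
            "rx_errs": int(fields[2]),
            "tx_errs": int(fields[10]),
            "collisions": int(fields[13]),
        }
    return counters
```

On an affected machine this could be called as `error_counters(open("/proc/net/dev").read())`; all-zero values for every interface, including lo, would match the observation above.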

Comment 1 Jörgen Jonsson 2006-05-04 10:59:18 UTC
I have the same problem with 2.6.16-1.2108_FC5 as well.

Comment 2 Brian Daniels 2006-05-04 14:24:33 UTC
Also happening here with 2107 on an AMD64X2 with an nForce chipset.  Reverting to
the previous kernel corrects it.

Comment 3 Russ Price 2006-05-04 18:25:18 UTC
Happens here with 2107 on AMD64X2, NForce C51 chipset (Shuttle SN21G5 system).

Going back to 2096 fixes the problem.

Comment 4 Scott Dowdle 2006-05-04 18:59:24 UTC
Same thing for me on the following equipment:

[root@scott ~]# lspci
00:00.0 Host bridge: Intel Corporation 82845 845 (Brookdale) Chipset Host Bridge
(rev 03)
00:01.0 PCI bridge: Intel Corporation 82845 845 (Brookdale) Chipset AGP Bridge
(rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 12)
00:1f.0 ISA bridge: Intel Corporation 82801BA ISA Bridge (LPC) (rev 12)
00:1f.1 IDE interface: Intel Corporation 82801BA IDE U100 (rev 12)
00:1f.2 USB Controller: Intel Corporation 82801BA/BAM USB (Hub #1) (rev 12)
00:1f.3 SMBus: Intel Corporation 82801BA/BAM SMBus (rev 12)
00:1f.4 USB Controller: Intel Corporation 82801BA/BAM USB (Hub #2) (rev 12)
00:1f.5 Multimedia audio controller: Intel Corporation 82801BA/BAM AC'97 Audio
(rev 12)
01:00.0 VGA compatible controller: nVidia Corporation NV11 [GeForce2 MX/MX 400]
(rev b2)
02:09.0 Ethernet controller: National Semiconductor Corporation DP83815
(MacPhyter) Ethernet Controller

Comment 5 Scott Dowdle 2006-05-04 19:02:38 UTC
Just to clarify: I *AM NOT* using the proprietary nVidia driver.  Just the
stock Xorg which works fine (without Direct Rendering).  All stock kernel and
modules.  No taint. :)

Comment 6 Mike 2006-05-04 19:45:26 UTC
"me too"

Some traffic works fine, but other traffic stalls after a few kilobytes sent/received.

# lspci
00:00.0 Host bridge: nVidia Corporation nForce2 AGP (different version?) (rev a2)
00:00.1 RAM memory: nVidia Corporation nForce2 Memory Controller 1 (rev a2)
00:00.2 RAM memory: nVidia Corporation nForce2 Memory Controller 4 (rev a2)
00:00.3 RAM memory: nVidia Corporation nForce2 Memory Controller 3 (rev a2)
00:00.4 RAM memory: nVidia Corporation nForce2 Memory Controller 2 (rev a2)
00:00.5 RAM memory: nVidia Corporation nForce2 Memory Controller 5 (rev a2)
00:01.0 ISA bridge: nVidia Corporation nForce2 ISA Bridge (rev a4)
00:01.1 SMBus: nVidia Corporation nForce2 SMBus (MCP) (rev a2)
00:02.0 USB Controller: nVidia Corporation nForce2 USB Controller (rev a4)
00:02.1 USB Controller: nVidia Corporation nForce2 USB Controller (rev a4)
00:02.2 USB Controller: nVidia Corporation nForce2 USB Controller (rev a4)
00:04.0 Ethernet controller: nVidia Corporation nForce2 Ethernet Controller (rev a1)
00:06.0 Multimedia audio controller: nVidia Corporation nForce2 AC97 Audio
Controler (MCP) (rev a1)
00:08.0 PCI bridge: nVidia Corporation nForce2 External PCI Bridge (rev a3)
00:09.0 IDE interface: nVidia Corporation nForce2 IDE (rev a2)
00:1e.0 PCI bridge: nVidia Corporation nForce2 AGP (rev a2)
02:00.0 VGA compatible controller: ATI Technologies Inc RV280 [Radeon 9200] (rev 01)
02:00.1 Display controller: ATI Technologies Inc RV280 [Radeon 9200] (Secondary)
(rev 01)

no taint


Comment 7 David A. Cafaro 2006-05-04 19:53:08 UTC
I can report I have the same issue on my Pentium 4 with an Intel 82540EM Gigabit
Adapter.  Here is the lspci:

$ lspci
00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub
Interface (rev 02)
00:01.0 PCI bridge: Intel Corporation 82865G/PE/P PCI to AGP Controller (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI
Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI
Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI
Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI
Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI
Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface
Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller
(rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER (ICH5/ICH5R)
AC'97 Audio Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation NV18 [GeForce4 MX 440 AGP
8x] (rev a2)
02:07.0 VGA compatible controller: nVidia Corporation NV17 [GeForce4 MX 440]
(rev a3)
02:0c.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet
Controller (rev 02)

Pings work, TCP traffic dies very early in the session.  Reverted to previous
kernel and all works fine.

Comment 8 Pawel Salek 2006-05-04 21:54:52 UTC
"Me too" - apache with this kernel displays only very simple web pages. There is
no response to requests that should return more data than a few kB. Reproducible
even at "telnet localhost 80" level. Reverting to 2.6.16-1.2096_FC5 makes the
problem go away.
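
The "small responses work, larger ones stall" symptom can be checked without Apache. Below is a minimal sketch, assuming Python 3 is available on the affected machine: it starts a throwaway server on the loopback interface, sends a payload of configurable size, and reports whether the client received all of it before a timeout. The function name `loopback_transfer_ok` and the default size and timeout are hypothetical choices, not values taken from this report.

```python
import socket
import threading


def loopback_transfer_ok(num_bytes=256 * 1024, timeout=10.0):
    """Send num_bytes over a loopback TCP connection; True if all arrive."""
    payload = b"x" * num_bytes
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    server.listen(1)
    server.settimeout(timeout)
    port = server.getsockname()[1]

    def serve():
        try:
            conn, _ = server.accept()
            with conn:
                conn.settimeout(timeout)
                conn.sendall(payload)
        except OSError:
            pass  # a stall shows up on the client side as a short read

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()

    received = 0
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=timeout) as client:
            while received < num_bytes:
                chunk = client.recv(65536)
                if not chunk:
                    break  # peer closed before sending everything
                received += len(chunk)
    except OSError:
        pass  # includes socket.timeout when the transfer stalls
    finally:
        thread.join(timeout)
        server.close()
    return received == num_bytes
```

A result of `False` for payloads of more than a few kB, with `True` for very small ones, would be consistent with the behaviour described here; it does not by itself identify the cause.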

Comment 9 Assen Totin 2006-05-04 22:32:14 UTC
Two more cases: 

1. A fetchmail-to-qmail connection (worked perfectly before the upgrade to 2107) now
hangs, leaving the qmail socket open and eventually timing out. Telnet to port 25
works fine; qmail accepts and delivers the message. Fetchmail on its own also
retrieves mail fine, but when it comes to delivering it further via the socket:
nothing (via "qmail-inject" it still works :)

2. CUPS starts behaving weirdly: when a request is sent from a local application
(say, lpr) to a remote printer, cups receives the job and writes some log entries,
but neither spools it nor attempts to connect to the remote print server.

And, of course, the rhgb problem, which has a separate open bug, is traced to a
possible UNIX socket problem... so could this be some misbehaviour of the kernel
with regard to sockets?


Comment 10 Mike 2006-05-04 23:11:49 UTC
... for example, I can get all mail with fetchmail, but if I use stunnel then the
connection stalls at the last line:

01:22:00.114151 IP (tos 0x0, ttl 128, id 27412, offset 0, flags [DF], proto: TCP
(6), length: 40) my-ip.37875 > remote-ip.995: ., cksum 0x9082 (correct), ack
28951 win 15624
01:22:00.124821 IP (tos 0x0, ttl 119, id 62339, offset 0, flags [DF], proto: TCP
(6), length: 1452) remote-ip.995 > my-ip.37875: . 28951:30363(1412) ack 699 win 5840
01:22:00.124874 IP (tos 0x0, ttl 128, id 27413, offset 0, flags [DF], proto: TCP
(6), length: 40) my-ip.37875 > remote-ip.995: ., cksum 0x8973 (correct), ack
30363 win 16019
01:22:00.135950 IP (tos 0x0, ttl 119, id 62340, offset 0, flags [DF], proto: TCP
(6), length: 1452) remote-ip.995 > my-ip.37875: . 30363:31775(1412) ack 699 win 5840
01:22:00.135982 IP (tos 0x0, ttl 128, id 27414, offset 0, flags [DF], proto: TCP
(6), length: 40) my-ip.37875 > remote-ip.995: ., cksum 0x83ef (correct), ack
31775 win 16019
01:22:00.143790 IP (tos 0x0, ttl 119, id 62341, offset 0, flags [DF], proto: TCP
(6), length: 1102) remote-ip.995 > my-ip.37875: P 31775:32837(1062) ack 699 win 5840
01:22:00.143842 IP (tos 0x0, ttl 128, id 27415, offset 0, flags [DF], proto: TCP
(6), length: 40) my-ip.37875 > remote-ip.995: ., cksum 0x7fc9 (correct), ack
32837 win 16019
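
To see how far a transfer got before stalling, the ack numbers can be pulled out of a capture like the one above (here the last ack sent is 32837). This is a minimal sketch that works on wrapped `tcpdump -v` text in the format shown; the function name `acks_from` is hypothetical, and the regular expression is an assumption tailored to this particular output format rather than a general tcpdump parser.

```python
import re


def acks_from(tcpdump_text, sender="my-ip"):
    """Return the ack numbers sent by `sender`, in capture order."""
    # Undo the line wrapping so each packet record is on one logical line.
    flat = " ".join(tcpdump_text.split())
    # Match "<sender>.<port> > <peer>: ... ack N win W" without crossing
    # into the next record (records contain exactly one ">").
    pattern = re.compile(
        re.escape(sender) + r"\.\d+ > \S+: [^>]*?\back (\d+) win \d+"
    )
    return [int(m.group(1)) for m in pattern.finditer(flat)]
```

The last element of the returned list is the highest byte offset the sender acknowledged, assuming tcpdump printed relative sequence numbers as it appears to have done here.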


Comment 11 J. Adam Hough 2006-05-05 00:32:17 UTC
I am getting the same problems with this kernel, 2.6.16-1.2107_FC5, on an IBM T41p,
which uses the e1000 driver, and on my home system, which uses the forcedeth
driver.  However, I can add a twist to this problem: my two work machines
with this same kernel, using the tg3 driver, do work.  My machines at work are on
the LSU network and my home machines are on the Cox network here in Baton Rouge.
I have not tried using my laptop at work with the latest kernel.

From home I am able to do DNS lookups, pings, and the initial HTTP handshake.

Reverting to the previous kernel 2.6.16-1.2096_FC5 does restore connectivity.

Comment 12 David Highley 2006-05-05 03:42:29 UTC
I can confirm that it appears to be network related, and it does not matter
whether you define the network parameters manually or use DHCP.

Note that the new update process appears to remove previous kernels, leaving only
one previous kernel; if we get two bad kernels in a row the systems will be toast.

Comment 13 Brian Daniels 2006-05-05 12:54:47 UTC
>if we get two bad kernels in a row the systems will be toast.

My understanding is that you can avoid this by editing (as root): 
/etc/yum/pluginconf.d/installonlyn.conf

Change 
tokeep = 2

to however many kernels you want to keep on the system.
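
The same change can be scripted. This is a minimal sketch, assuming the file is INI-style with the `tokeep` setting in a `[main]` section; the section name is an assumption, and only the file path and the `tokeep = 2` line come from the comment above. The function name `set_tokeep` is hypothetical. Note that rewriting the file with `configparser` drops any comments it contained.

```python
import configparser


def set_tokeep(conf_path, count):
    """Set tokeep in the [main] section (assumed) of an installonlyn-style file."""
    parser = configparser.ConfigParser()
    parser.read(conf_path)
    if not parser.has_section("main"):
        parser.add_section("main")  # assumption: settings live under [main]
    parser.set("main", "tokeep", str(count))
    with open(conf_path, "w") as f:
        parser.write(f)
    return parser.getint("main", "tokeep")
```

For example, `set_tokeep("/etc/yum/pluginconf.d/installonlyn.conf", 4)` run as root would keep four kernels, provided the file matches the assumed layout; trying it on a copy of the file first is the safer route.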

Comment 14 Russ Price 2006-05-05 20:46:13 UTC
Problem appears fixed in 2111.

Comment 15 Stanis Trendelenburg 2006-05-05 21:04:51 UTC
2.6.16-1.2111_FC5 fixed it for me, too.