Bug 173760 - IPv6 TCP checksum error when using VIA Velocity 6122
Summary: IPv6 TCP checksum error when using VIA Velocity 6122
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: John W. Linville
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-11-20 19:34 UTC by Jay Cliburn
Modified: 2007-11-30 22:11 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2006-01-13 20:24:02 UTC
Type: ---
Embargoed:


Attachments:
Packet capture of failing "ssh -6" session (55.78 KB, text/plain)
2005-11-20 19:38 UTC, Jay Cliburn
jwltest-via-velocity-tx_csum.patch (392 bytes, patch)
2005-11-30 18:38 UTC, John W. Linville

Description Jay Cliburn 2005-11-20 19:34:06 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.7.12) Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7

Description of problem:
ssh, vsftp, and a homemade simple client-server fail to work over IPv6 between an IPv6-enabled FC4 x86_64 host and an FC4 i386 host on my home network.  Each host is configured with site-local IPv6 addresses.  The referenced IPv6-enabled applications work fine between two identical FC4 i386 hosts (Dell Optiplex GX110s) on the same network.  I believe the problem is related to the VIA Velocity 6122 gigabit ethernet driver on the x86_64 machine (Abit AV8 motherboard, VIA K8T800 chipset).  I repeatedly encounter TCP checksum errors on the Abit machine (described in detail below).

The newest version of the VIA Velocity 6122 driver I can find is v1.16, but that version won't build against the kernels listed below; some of the kernel PCI structures were changed in kernel 2.6.12, and apparently VIA hasn't kept up.  The 1.16 driver references nonexistent structure members.  FC4 currently uses v1.13 of the velocity driver.

Things I've tried:

*  kernel 2.6.13-1.1532_FC4 (server and client)
*  kernel 2.6.13-1.1456_FC4 (server and client)
*  kernel 2.6.12-1.1390_FC4 (server and client)
*  kernel 2.6.12-1.1369_FC4 (server and client)
*  kernel 2.6.12-1.1637_FC4 (server and client)
*  ssh v4.2p1 rpm from fedora.redhat.com
*  ssh v4.2p1 source from openssh.org (Note 1)
*  vsftpd v2.0.3-1 from fedora.redhat.com
*  global-scope instead of site local IPv6 addresses (Note 2)

Note 1:  I had to build OpenSSH 4.2p1 with the zlib check turned off because Fedora still ships zlib 1.2.2.2.  The OpenSSH configure script wants zlib 1.2.3 or higher.

Note 2:  I established an account with freenet6.net and started the tspc daemon in router mode, which assigns global-scope addresses to participating hosts on my local network.  From either the server or the client machine, I get the swimming/dancing turtle at kame.net, so IPv6 through a v4 tunnel works.

What DOES work:  I can ping6 either host from the other host. traceroute6 works between the hosts.  

The client in this case is an FC4 x86_64 running 2.6.13-1.1637_FC4, and the server is an FC4 i386 also running 2.6.13-1.1637_FC4.  Both systems are running openssh 4.2p1-fc4.1 from the Fedora updates-released repo.  In the stuff that follows, the client and server can be identified by the last 4 digits of their respective IPv6 addresses as follows:

client: 9069 (Abit AV8, VIA Velocity 6122 GigE)
server: 6dda (Dell Optiplex GX110, 3Com PCI 3c905C Tornado 10/100)

Basically, what's happening is the client and server execute the TCP 3-way handshake in frames 3, 4, and 5.  Then in frame 6 the server asserts its ssh protocol preference (protocol 2), to which the client acks in frame 7.  In frame 8 the trouble begins.  The client asserts that it, too, can speak protocol 2, but that frame suffers a TCP checksum error.  Frames 9, 10, 11, 13, 14, 15, 16, 17, and 25 are retransmissions of frame 8, each bearing a checksum error.  Oddly, the "incorrect" checksum value remains constant at 0x5d72, while the "should be" value is all over the map.  At frame 32 the server throws up its arms in dismay and issues a FIN,ACK and closes the connection.
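
For reference, the check that frame 8 fails can be sketched as follows. This is only a minimal Python illustration of the standard TCP-over-IPv6 checksum (a 16-bit one's-complement sum over the IPv6 pseudo-header plus the TCP segment); it is not code from the driver or the kernel, and the function names are made up for this example.

```python
import ipaddress
import struct


def ones_complement_sum(data: bytes) -> int:
    """Return the folded 16-bit one's-complement sum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, length: int) -> bytes:
    """IPv6 pseudo-header: addresses, upper-layer length, next header 6 (TCP)."""
    return (
        ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
        + struct.pack("!I", length)
        + b"\x00\x00\x00\x06"
    )


def tcp6_checksum(src: str, dst: str, segment: bytes) -> int:
    """Checksum for a TCP segment whose checksum field (bytes 16-17) is zero."""
    data = pseudo_header(src, dst, len(segment)) + segment
    return ~ones_complement_sum(data) & 0xFFFF


def tcp6_checksum_ok(src: str, dst: str, segment: bytes) -> bool:
    """True if a segment with its checksum field filled in verifies."""
    data = pseudo_header(src, dst, len(segment)) + segment
    return ones_complement_sum(data) == 0xFFFF
```

Because the sum covers every byte of the TCP header and payload, any field that differs between retransmissions (a TCP timestamp option, if one is in use, would be one candidate) changes the correct value. That would be consistent with a varying "should be" checksum while the transmitted value stays fixed at 0x5d72.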

A vsftpd session is strikingly similar, though to save space the results are not provided below.  The 3-way handshake executes fine, and the vsftpd server issues the login initiation.  When the client responds with the login username, that packet has a TCP checksum error and is dropped by the server.  The client begins sending retransmits, each bearing a checksum error with the same characteristics as the ssh session; the "incorrect" value stays constant and the "should be" value varies.

The same pattern emerges when I run a simple home-grown IPv6 client-server program I wrote to further narrow this down.  I can provide the source code if desired.
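
A test of this kind might look roughly like the sketch below. It is not the program mentioned above, only a hypothetical minimal Python equivalent: the server speaks first (as sshd and vsftpd do), and the client's reply is the first data segment sent from the client side, which is where the checksum errors show up. All function names, the greeting text, and the default address are assumptions made for this example.

```python
import socket
import threading


def make_listener(host: str = "::1", port: int = 0) -> socket.socket:
    """Create a listening IPv6 TCP socket (port 0 lets the OS pick one)."""
    listener = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(1)
    return listener


def serve_once(listener: socket.socket) -> None:
    """Accept one connection, send a greeting, then echo the client's reply."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(b"HELLO\n")   # server speaks first, like an SSH banner
        data = conn.recv(1024)     # the client's first data segment
        conn.sendall(b"ECHO " + data)


def run_client(host: str, port: int, message: bytes,
               timeout: float = 10.0) -> bytes:
    """Connect over IPv6, read the greeting, send message, return all replies."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        greeting = sock.recv(1024)
        sock.sendall(message)
        return greeting + sock.recv(1024)
```

On an affected setup one would expect the greeting to arrive and the client call to then stall until the timeout, since the client's data segment never passes the server's checksum check.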

A fellow writing in spring 2003 describes identical indications in this thread (http://www.uwsg.iu.edu/hypermail/linux/kernel/0303.2/0734.html), but efforts to contact him by email have been unsuccessful.

A packet capture of the failing ssh session is attached.

Version-Release number of selected component (if applicable):
kernel-2.6.14-1.1637_FC4

How reproducible:
Always

Steps to Reproduce:
1.  From the Abit machine, ssh -6 <DellserverIPv6address>
2.  Session hangs with no password prompt, and eventually times out.

Actual Results:  [jcliburn@osprey ~]$ ssh -6 fec0::2b0:d0ff:fe82:6dda

<lengthy delay ensues>

Connection closed by fec0::2b0:d0ff:fe82:6dda
[jcliburn@osprey ~]$


Expected Results:  [jcliburn@gadwall ~]$ ssh -6 fec0::2b0:d0ff:fe82:6dda
jcliburn@fec0::2b0:d0ff:fe82:6dda's password:
Last login: Sun Nov 20 13:33:30 2005 from 192.168.1.3
[jcliburn@petrel ~]$


Additional info:

See attached packet capture.

Comment 1 Jay Cliburn 2005-11-20 19:38:58 UTC
Created attachment 121275 [details]
Packet capture of failing "ssh -6" session

Comment 2 Jay Cliburn 2005-11-20 21:12:01 UTC
Apologies, but there are errors in my kernel numbers above.  

My current kernel is 2.6.14-1.1637_FC4.

Other kernels that exhibit the same problem:

2.6.13-1.1532_FC4
2.6.12-1.1456_FC4
2.6.12-1.1398_FC4
2.6.11-1.1369_FC4


Comment 3 Jay Cliburn 2005-11-21 00:33:55 UTC
I found a solution.  I modified the via-velocity.c kernel module source file and
changed the TX_CSUM_DEF flag from 1 to 0, then rebuilt and modprobed the module.
All IPv6 functionality that was failing now works.  I guess the Velocity 6122
can't handle offloaded checksumming for IPv6.

[root@osprey net]# diff -c via-velocity.c ~jcliburn/via-velocity.c
*** via-velocity.c      2005-11-20 18:34:23.000000000 -0600
--- /home/jcliburn/via-velocity.c       2005-11-20 18:33:18.000000000 -0600
***************
*** 166,172 ****
  */
  VELOCITY_PARAM(IP_byte_align, "Enable IP header dword aligned");

! #define TX_CSUM_DEF     1
  /* txcsum_offload[] is used for setting the checksum offload ability of NIC.
     (We only support RX checksum offload now)
     0: disable csum_offload[checksum offload
--- 166,172 ----
  */
  VELOCITY_PARAM(IP_byte_align, "Enable IP header dword aligned");

! #define TX_CSUM_DEF     0
  /* txcsum_offload[] is used for setting the checksum offload ability of NIC.
     (We only support RX checksum offload now)
     0: disable csum_offload[checksum offload


Comment 4 John W. Linville 2005-11-22 13:17:06 UTC
Test kernels w/ above patch are available here: 
 
   http://people.redhat.com/linville/kernels/fc4/ 
 
Please verify that they work as expected, and post the results here...thanks! 

Comment 5 Jay Cliburn 2005-11-23 00:11:00 UTC
It works.  Thanks.

[jcliburn@osprey ~]$ uname -a
Linux osprey 2.6.14-1.1641_FC4.jwltest.24 #1 Mon Nov 21 15:49:36 EST 2005 x86_64 x86_64 x86_64 GNU/Linux
[jcliburn@osprey ~]$ ssh -6 fec0::2b0:d0ff:fe82:6dda
jcliburn@fec0::2b0:d0ff:fe82:6dda's password:
Last login: Tue Nov 22 18:14:19 2005 from fec0::250:8dff:feef:9069
[jcliburn@petrel ~]$ YAY!!!


Comment 6 John W. Linville 2005-11-30 18:38:02 UTC
Created attachment 121646 [details]
jwltest-via-velocity-tx_csum.patch

Comment 7 John W. Linville 2005-11-30 18:39:41 UTC
I'd like to try another alternative... 
 
Test kernels w/ the above patch are available here: 
 
   http://people.redhat.com/linville/kernels/fc4/ 
 
Please give those a try and post the results...thanks! 

Comment 8 Jay Cliburn 2005-12-01 01:01:51 UTC
It works for both IPv4 and IPv6.  Only IPv6 shown here.  Thanks.

[jcliburn@osprey ~]$ date;uname -a
Wed Nov 30 18:59:19 CST 2005
Linux osprey 2.6.14-1.1645_FC4.jwltest.25 #1 Wed Nov 30 09:17:21 EST 2005 x86_64 x86_64 x86_64 GNU/Linux
[jcliburn@osprey ~]$ ssh -6 2001:5c0:8c82:0:250:8dff:fed3:7b0d
jcliburn@2001:5c0:8c82:0:250:8dff:fed3:7b0d's password:
Last login: Wed Nov 30 18:53:35 2005 from osprey6
[jcliburn@kite ~]$ IT WORKS!!
[jcliburn@kite ~]$ netstat --inet6 -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address               Foreign Address           State
tcp        0      0 :::22                       :::*                      LISTEN
tcp        0      0 2001:5c0:8c82:0:250:8dff:22 2001:5c0:8c82::1:53641    ESTABLISHED

Comment 9 Daniel Roesen 2005-12-27 02:57:43 UTC
Just for the record: I had the same problem recently on an FC1 box using e1000. I
had to disable sendfile() support in several server apps (Apache, ProFTPD, etc.)
to make it work. Can someone In The Know cross-check whether this problem is
still present in e1000 in the current FC kernel?

Comment 10 John W. Linville 2006-01-03 20:10:01 UTC
The problem is still upstream (so it's still in Fedora too).  Still 
negotiating... 

Comment 11 John W. Linville 2006-01-13 20:24:02 UTC
Looks like the patch got merged...closing this as UPSTREAM.  It may take a 
little while, but the fix will filter into Fedora naturally over the next 
several weeks. 

