Bug 950633 - [NetKVM] Check if fix needed for cases when Windows send 0xffff checksum for packets
Summary: [NetKVM] Check if fix needed for cases when Windows send 0xffff checksum for ...
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: virtio-win
Version: 6.5
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: Dmitry Fleytman
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
 
Reported: 2013-04-10 14:15 UTC by Yan Vugenfirer
Modified: 2013-12-06 07:42 UTC (History)

Cause: 
Wrong implementation of the corner case of checksum testing.

Consequence: 
Some of the packets might be dropped by the driver.

Fix: 
Fix checksum calculation.

Result: 
Packets will not be wrongly dropped by the driver.
Clone Of:
Last Closed: 2013-11-22 00:08:06 UTC


Attachments
wireshark_error_screenshot (257.23 KB, application/zip)
2013-06-19 09:30 UTC, guo jiang
Wireshark screenshot for version before the bugfix (build 49) (138.29 KB, image/tiff)
2013-07-21 08:32 UTC, Dmitry Fleytman
Wireshark screenshot for version after the bugfix (build 64) (133.23 KB, image/tiff)
2013-07-21 08:33 UTC, Dmitry Fleytman
virtio-win-prewhql49-screenshot (113.33 KB, image/png)
2013-07-23 04:36 UTC, guo jiang


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2013:1729 normal SHIPPED_LIVE virtio-win bug fix and enhancement update 2013-11-21 00:39:25 UTC

Description Yan Vugenfirer 2013-04-10 14:15:42 UTC
Description of problem:

Reported by community:

Dmitry Skorodumov - March 14, 2013, 12:45 p.m.
A Windows host with TX checksum offloading disabled may send
packets with tcp->checksum=0xffff. Most likely this is caused
by the Windows algorithm for computing incremental checksums
- see RFC 1624.
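
As a hedged illustration of how an incremental update can end up at 0xffff: RFC 1624 contrasts the older update formula HC' = HC + m + ~m' (from RFC 1141) with the corrected form HC' = ~(~HC + ~m + m'). The sketch below replays the RFC's worked numbers (HC = 0xDD2F, field m = 0x5555 changing to m' = 0x3285). The function names are illustrative only, and whether Windows uses exactly the older formula is an assumption, not something this report confirms.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Older incremental update (RFC 1141 style): HC' = HC + m + ~m'. */
uint16_t csum_update_rfc1141(uint16_t hc, uint16_t m_old, uint16_t m_new)
{
    return fold16((uint32_t)hc + m_old + (uint16_t)~m_new);
}

/* Corrected incremental update (RFC 1624, eqn. 3): HC' = ~(~HC + ~m + m'). */
uint16_t csum_update_rfc1624(uint16_t hc, uint16_t m_old, uint16_t m_new)
{
    return (uint16_t)~fold16((uint32_t)(uint16_t)~hc + (uint16_t)~m_old + m_new);
}

/* Worked example from RFC 1624: same packet, two representations of zero. */
void rfc1624_worked_example(void)
{
    assert(csum_update_rfc1141(0xDD2F, 0x5555, 0x3285) == 0xFFFF);
    assert(csum_update_rfc1624(0xDD2F, 0x5555, 0x3285) == 0x0000);
}
```

Both results describe the same valid packet; a receiver that compares the stored field against a freshly recomputed 0x0000 will reject the 0xFFFF form.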

RFC 1624 (sec. 5) states that 0xffff and 0x0000 are equivalent in
one's-complement arithmetic, because for example

0xCD7A + 0x3285 + 0xFFFF = 0xFFFF
0xCD7A + 0x3285 + 0x0000 = 0xFFFF;

Fix tcp/udp verification algorithm to check that checksum
of (transport-header + pseudo_hdr_csum) == 0
instead of recomputing checksum and comparing its value with
original value in transp_hdr->csum.
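
A minimal sketch of the two verification strategies, under stated assumptions: words are treated as host-order 16-bit values, `pseudo_sum` stands in for the one's-complement sum of the pseudo-header, and the names `verify_by_recompute` and `verify_by_sum` are hypothetical (they are not the driver's functions). It reuses the RFC numbers, with 0xCD7A as the pseudo-header sum and 0x3285 as the only data word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Old approach: skip the checksum field, recompute, compare with the stored value. */
int verify_by_recompute(uint16_t pseudo_sum, const uint16_t *words, size_t n,
                        size_t csum_idx)
{
    uint32_t sum = pseudo_sum;
    for (size_t i = 0; i < n; i++)
        if (i != csum_idx)
            sum += words[i];
    uint16_t computed = (uint16_t)~fold16(sum);
    return computed == words[csum_idx];
}

/* New approach: sum everything, checksum field included; the complement must be 0. */
int verify_by_sum(uint16_t pseudo_sum, const uint16_t *words, size_t n)
{
    uint32_t sum = pseudo_sum;
    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~fold16(sum) == 0;
}

/* The 0xFFFF form of a zero checksum fails the old check but passes the new one. */
void verify_example(void)
{
    const uint16_t ffff_form[] = { 0x3285, 0xFFFF };
    assert(!verify_by_recompute(0xCD7A, ffff_form, 2, 1));
    assert(verify_by_sum(0xCD7A, ffff_form, 2));
}
```

The sum-based check accepts both 0x0000 and 0xFFFF in the checksum field when the rest of the packet sums to 0xFFFF, and still rejects any other value.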

The problem can be reproduced only when the host Linux network
interface has RX offloading disabled; otherwise the guest driver
receives packets from the host with the HDR_DATA_VALID flag set
and VerifyTcpChecksum does no work.
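
The condition described above can be sketched as follows. This is only an illustration: the flag value is the one defined for the virtio-net header, while the helper name `needs_sw_checksum_verify` is hypothetical and the driver's actual receive path is not shown in this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* virtio-net header flag: the host has already validated the checksum. */
#define VIRTIO_NET_HDR_F_DATA_VALID 2

/* Software verification is only needed when the host did not vouch for the data. */
bool needs_sw_checksum_verify(uint8_t virtio_hdr_flags)
{
    return (virtio_hdr_flags & VIRTIO_NET_HDR_F_DATA_VALID) == 0;
}

/* With host RX offload on, the flag is set and the buggy path is never reached. */
void data_valid_example(void)
{
    assert(!needs_sw_checksum_verify(VIRTIO_NET_HDR_F_DATA_VALID));
    assert(needs_sw_checksum_verify(0));
}
```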

---
 NetKVM/Common/sw-offload.c |   46 ++++++++++++++++++++++++++-----------------
 1 files changed, 28 insertions(+), 18 deletions(-)
Dmitry Skorodumov - March 14, 2013, 12:52 p.m.
> Dmitry Skorodumov <sdmitry <at> parallels.com> writes:
> 
> Windows-host with disabled TX-checksum offloading may send
> packets with tcp->checksum=0xffff. Most likely it is caused
> by the Windows algorithm for computing incremental checksum
> - see RFC 1624.

Forgot to mention that packets with checksum 0xffff appear almost 100%
of the time when running the following from a Windows host to a KVM guest:

netperf -H <guest_addr> -t TCP_RR

Dmitry


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Yan Vugenfirer 2013-04-10 14:16:06 UTC
diff --git a/NetKVM/Common/sw-offload.c b/NetKVM/Common/sw-offload.c
index 24acfdc..c8e1a39 100644
--- a/NetKVM/Common/sw-offload.c
+++ b/NetKVM/Common/sw-offload.c
@@ -372,6 +372,7 @@  static __inline tTcpIpPacketParsingResult
 VerifyTcpChecksum( IPHeader *pIpHeader, ULONG len, tTcpIpPacketParsingResult known, ULONG whatToFix)
 {
 	USHORT  phcs;
+	USHORT checksum;
 	tTcpIpPacketParsingResult res = known;
 	TCPHeader *pTcpHeader = (TCPHeader *)RtlOffsetToPointer(pIpHeader, res.ipHeaderSize);
 	USHORT saved = pTcpHeader->tcp_xsum;
@@ -395,16 +396,21 @@  VerifyTcpChecksum( IPHeader *pIpHeader, ULONG len, tTcpIpPacketParsingResult kno
 			else if (res.xxpFull)
 			{
 				//USHORT ipFullLength = swap_short(pIpHeader->v4.ip_length);
-				pTcpHeader->tcp_xsum = phcs;
-				CalculateTcpChecksumGivenPseudoCS(pTcpHeader, xxpHeaderAndPayloadLen);
-				if (saved == pTcpHeader->tcp_xsum)
-					res.xxpCheckSum = ppresCSOK;
-
 				if (!(whatToFix & pcrFixXxpChecksum))
-					pTcpHeader->tcp_xsum = saved;
+				{
+					checksum = CheckSumCalculator(phcs, pTcpHeader, xxpHeaderAndPayloadLen);
+					if (checksum == 0)
+						res.xxpCheckSum = ppresCSOK;
+				}
 				else
+				{
+					pTcpHeader->tcp_xsum = phcs;
+					CalculateTcpChecksumGivenPseudoCS(pTcpHeader, xxpHeaderAndPayloadLen);
+					if (saved == pTcpHeader->tcp_xsum)
+						res.xxpCheckSum = ppresCSOK;
 					res.fixedXxpCS =
 						res.xxpCheckSum == ppresCSBad || res.xxpCheckSum == ppresPCSOK;
+				}
 			}
 			else if (whatToFix)
 			{
@@ -416,10 +422,9 @@  VerifyTcpChecksum( IPHeader *pIpHeader, ULONG len, tTcpIpPacketParsingResult kno
 			// we have correct PHCS and we do not need to fix anything
 			// there is a very small chance that it is also good TCP CS
 			// in such rare case we give a priority to TCP CS
-			CalculateTcpChecksumGivenPseudoCS(pTcpHeader, xxpHeaderAndPayloadLen);
-			if (saved == pTcpHeader->tcp_xsum)
+			checksum = CheckSumCalculator(phcs, pTcpHeader, xxpHeaderAndPayloadLen);
+			if (checksum == 0)
 				res.xxpCheckSum = ppresCSOK;
-			pTcpHeader->tcp_xsum = saved;
 		}
 	}
 	else
@@ -437,6 +442,7 @@  static __inline tTcpIpPacketParsingResult
 VerifyUdpChecksum( IPHeader *pIpHeader, ULONG len, tTcpIpPacketParsingResult known, ULONG whatToFix)
 {
 	USHORT  phcs;
+	USHORT checksum;
 	tTcpIpPacketParsingResult res = known;
 	UDPHeader *pUdpHeader = (UDPHeader *)RtlOffsetToPointer(pIpHeader, res.ipHeaderSize);
 	USHORT saved = pUdpHeader->udp_xsum;
@@ -459,16 +465,21 @@  VerifyUdpChecksum( IPHeader *pIpHeader, ULONG len, tTcpIpPacketParsingResult kno
 		{
 			if (res.xxpFull)
 			{
-				pUdpHeader->udp_xsum = phcs;
-				CalculateUdpChecksumGivenPseudoCS(pUdpHeader, xxpHeaderAndPayloadLen);
-				if (saved == pUdpHeader->udp_xsum)
-					res.xxpCheckSum = ppresCSOK;
-
 				if (!(whatToFix & pcrFixXxpChecksum))
-					pUdpHeader->udp_xsum = saved;
+				{
+					checksum = CheckSumCalculator(phcs, pUdpHeader, xxpHeaderAndPayloadLen);
+					if (checksum == 0)
+						res.xxpCheckSum = ppresCSOK;
+				}
 				else
+				{
+					pUdpHeader->udp_xsum = phcs;
+					CalculateUdpChecksumGivenPseudoCS(pUdpHeader, xxpHeaderAndPayloadLen);
+					if (saved == pUdpHeader->udp_xsum)
+						res.xxpCheckSum = ppresCSOK;
 					res.fixedXxpCS =
 						res.xxpCheckSum == ppresCSBad || res.xxpCheckSum == ppresPCSOK;
+				}
 			}
 			else
 				res.xxpCheckSum = ppresXxpIncomplete;
@@ -478,10 +489,9 @@  VerifyUdpChecksum( IPHeader *pIpHeader, ULONG len, tTcpIpPacketParsingResult kno
 			// we have correct PHCS and we do not need to fix anything
 			// there is a very small chance that it is also good UDP CS
 			// in such rare case we give a priority to UDP CS
-			CalculateUdpChecksumGivenPseudoCS(pUdpHeader, xxpHeaderAndPayloadLen);
-			if (saved == pUdpHeader->udp_xsum)
+			checksum = CheckSumCalculator(phcs, pUdpHeader, xxpHeaderAndPayloadLen);
+			if (checksum == 0)
 				res.xxpCheckSum = ppresCSOK;
-			pUdpHeader->udp_xsum = saved;
 		}
 	}
 	else

Comment 5 guo jiang 2013-06-17 10:14:54 UTC
Tested this issue on virtio-win-prewhql-0.1.49
Tested this issue on virtio-win-prewhql-0.1.64

Steps:
1.Boot guest with CLI:
/usr/libexec/qemu-kvm -M rhel6.4.0 -m 6G -smp 8 -cpu cpu64-rhel6,+x2apic,+sep,family=0xf -usb -device usb-tablet -drive file=win2k8-R2.qcow2,format=qcow2,if=none,id=drive-ide0-0-0,werror=stop,rerror=stop,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,sndbuf=0,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=no -device virtio-net-pci,netdev=hostnet0,mac=00:13:23:43:22:41,bus=pci.0,addr=0x4,id=virtio-net-pci0 -uuid 744e5d6b-8a99-4754-a27a-ae7f3b73844a -rtc-td-hack -no-kvm-pit-reinjection -rtc base=localtime,clock=host,driftfix=slew -chardev socket,id=111a,path=/tmp/monitor-win2k8-R2-network,server,nowait -mon chardev=111a,mode=readline -name win2k8-R2-network -spice port=5931,disable-ticketing -vga qxl -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor stdio

2. Install netperf on the Linux host and the Windows guest, respectively.
On the guest: first install cygwin, then install netperf under cygwin.

3. Disable TX checksum offloading on both host and guest:
On the guest: right-click "Computer" and choose "Properties" --> "Device Manager" --> "Network adapters" --> right-click "Red Hat VirtIO Ethernet Adapter" and choose "Properties" --> "Advanced" --> set the value of "Offload.Tx.Checksum" to "Disabled".
On the host: #ethtool -K eth0 tx off

4.Transfer from host to guest and from guest to host with netperf2.4.5.
Send side:#netserver start
Receive side:#netperf -H <guest_addr> -t TCP_RR

5.Disable Linux-host's Rx-offloading with command "ethtool -K eth0 rx off", and repeat step 4.

Actual Result:
on virtio-win-prewhql-0.1.49
  Wireshark shows the following error:
 Header checksum: 0x0000 [incorrect, should be 0x9582 (may be caused by "IP checksum offload"?)]
  [Good: False]
  [Bad: True]
    [Message: Bad checksum]
    [Severity level: Error]
    [Group: checksum]

On virtio-win-prewhql-0.1.64, steps 4 and 5 report no errors.

Yan, QE would like to know how to reproduce this bug. Can you give detailed steps? Thanks!

Comment 6 guo jiang 2013-06-18 07:45:13 UTC
Yan,Dima

Any updates ?

Comment 7 Dmitry Fleytman 2013-06-18 10:40:08 UTC
Hello,

The steps are as follows:
1. Run 2 Windows guests
2. Disable TX checksum offload in NetKVM's property page on both sides ("Properties" --> "Device Manager" --> "Network adapters" --> right-click "Red Hat VirtIO Ethernet Adapter" and choose "Properties" --> "Advanced" --> set the value of "Offload.Tx.Checksum")
3. Check that RX checksum offload is enabled in NetKVM's property page on both sides ("Properties" --> "Device Manager" --> "Network adapters" --> right-click "Red Hat VirtIO Ethernet Adapter" and choose "Properties" --> "Advanced" --> check the value of "Offload.Rx.Checksum")
4. Run a TCP stream between the guests with any traffic generator
5. Check the packet flow with Wireshark. On the old version, packets with TCP checksum 0xFFFF are dropped by the receiver side and TCP retransmissions occur. On the new version, packets with this checksum pass smoothly.

Dmitry.

Comment 8 guo jiang 2013-06-19 09:14:05 UTC
Tested this issue on virtio-win-prewhql-0.1.49
Tested this issue on virtio-win-prewhql-0.1.64

Steps:
1.Boot two windows guests with CLI:
guest 1:
/usr/libexec/qemu-kvm -m 2G -smp 2,cores=2 -cpu cpu64-rhel6,+x2apic -usb -device usb-tablet -drive file=win7-64-nic1.raw,if=none,id=drive-ide0-0-0,werror=stop,rerror=stop,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,sndbuf=0,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=no -device virtio-net-pci,netdev=hostnet0,mac=00:12:34:42:32:11,bus=pci.0,addr=0x4,id=virtio-net-pci0 -uuid ab5f8aec-e2d8-4fdf-803b-d73184b28d56 -no-kvm-pit-reinjection -chardev socket,id=111a,path=/tmp/monitor-win2012-nic1,server,nowait -mon chardev=111a,mode=readline -vnc :1 -vga cirrus -name win7-64-nic1-64-HCK -rtc base=localtime,clock=host,driftfix=slew -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor stdio
guest 2:
/usr/libexec/qemu-kvm -m 2G -smp 2,cores=2 -cpu cpu64-rhel6,+x2apic -usb -device usb-tablet -drive file=win7-64-nic2.raw,if=none,id=drive-ide0-0-0,werror=stop,rerror=stop,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,sndbuf=0,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=no -device virtio-net-pci,netdev=hostnet0,mac=00:12:43:c2:32:21,bus=pci.0,addr=0x4,id=virtio-net-pci0 -uuid f05e3ecd-b9c2-4ee7-913e-e4059f803743 -no-kvm-pit-reinjection -chardev socket,id=111a,path=/tmp/monitor-win2012-nic2,server,nowait -mon chardev=111a,mode=readline -vnc :2 -vga cirrus -name win7-64-nic2-64-HCK -rtc base=localtime,clock=host,driftfix=slew -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor stdio

2. Install the NetKVM driver, Wireshark and netperf on both guests.
For netperf: install cygwin first, then install netperf under cygwin.

3. Disable TX checksum offloading on both Windows guests:
right-click "Computer" and choose "Properties" --> "Device Manager" --> "Network adapters" --> right-click "Red Hat VirtIO Ethernet Adapter" and choose "Properties" --> "Advanced" --> set the value of "Offload.Tx.Checksum" to "Disabled".

4. Check that RX checksum offloading is enabled on both Windows guests:
right-click "Computer" and choose "Properties" --> "Device Manager" --> "Network adapters" --> right-click "Red Hat VirtIO Ethernet Adapter" and choose "Properties" --> "Advanced" --> confirm the value of "Offload.Rx.Checksum" is "All".

5.Transfer from guest1 to guest2 with netperf2.4.5.
guest2 side:#netserver
guest1 side:#netperf -H <guest2_addr> -t TCP_RR

Actual Result:
on virtio-win-prewhql-0.1.49
QE tested four times; every time, Wireshark (on guest1) showed the following errors:
Flags    Checksum   Errors
0x018    0xffff     4
0x019    0xfffd     1
0x010    random     1
Note:
1. Every run shows the same flags and error counts; the four 0xffff errors all have Flags 0x018.
2. For Flags 0x010 the checksum differs on every run, hence "random"; my four runs gave 0x999c, 0xd031, 0xdf82 and 0xe931, respectively.
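
For readers less used to raw TCP flag values, the sketch below decodes the values seen above using the standard TCP flag bits. The helper name `tcp_flags_to_string` is made up for this illustration and is not part of any tool mentioned in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Standard TCP flag bits (low six bits of the flags field). */
static const struct { unsigned bit; const char *name; } tcp_flag_names[] = {
    { 0x01, "FIN" }, { 0x02, "SYN" }, { 0x04, "RST" },
    { 0x08, "PSH" }, { 0x10, "ACK" }, { 0x20, "URG" },
};

/* Write the names of the set flags into out, separated by '|'. */
void tcp_flags_to_string(unsigned flags, char *out, size_t out_len)
{
    size_t used = 0;
    if (out_len == 0)
        return;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof tcp_flag_names / sizeof tcp_flag_names[0]; i++) {
        if (!(flags & tcp_flag_names[i].bit))
            continue;
        int n = snprintf(out + used, out_len - used, "%s%s",
                         used ? "|" : "", tcp_flag_names[i].name);
        if (n < 0 || (size_t)n >= out_len - used)
            return; /* output truncated; stop appending */
        used += (size_t)n;
    }
}

/* The three flag values reported in the table. */
void tcp_flags_example(void)
{
    char buf[32];
    tcp_flags_to_string(0x018, buf, sizeof buf);
    assert(strcmp(buf, "PSH|ACK") == 0);
    tcp_flags_to_string(0x019, buf, sizeof buf);
    assert(strcmp(buf, "FIN|PSH|ACK") == 0);
    tcp_flags_to_string(0x010, buf, sizeof buf);
    assert(strcmp(buf, "ACK") == 0);
}
```

So the 0xffff errors were all on PSH+ACK data segments, while the 0x010 entries were plain ACKs.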

On virtio-win-prewhql-0.1.64, steps 4 and 5 report no errors.
The error screenshots can be found in the attachments.

Comment 9 guo jiang 2013-06-19 09:30:19 UTC
Created attachment 762815 [details]
wireshark_error_screenshot

After unzipping, there are nine files,
for example: 49-Flags0x018-0xffff.png, 64-Flags0x018.png
49 -- virtio-win-prewhql-0.1.49
64 -- virtio-win-prewhql-0.1.64
"Flags0x018" and "0xffff" are the flags and checksum of the packet in error.

Comment 10 Dmitry Fleytman 2013-06-19 14:23:26 UTC
It is OK that Wireshark shows an error for packets with checksum 0xFFFF; its verification algorithm is not 100% compliant with the RFC.
You have to verify that a packet with checksum 0xFFFF passes successfully and that you see no TCP retransmission after it.

Comment 11 Mike Cao 2013-06-19 16:31:26 UTC
(In reply to Dmitry Fleytman from comment #10)
> This is OK that Wireshark shows error for packets with checksum 0xFFFF,
> their verification algorithm is not 100% compatible with RFC.
> You have to verify that packet with checksum 0xFFFF passes successfully and
> you see no TCP retransmission after it.

Sorry to ask again

Could you give QE more details on how to verify that a packet with checksum 0xFFFF passes successfully, and how to verify that there is no TCP retransmission after it?

Thanks,
Mike

Comment 15 Dmitry Fleytman 2013-07-21 08:31:12 UTC
(In reply to Mike Cao from comment #11)
> (In reply to Dmitry Fleytman from comment #10)
> > This is OK that Wireshark shows error for packets with checksum 0xFFFF,
> > their verification algorithm is not 100% compatible with RFC.
> > You have to verify that packet with checksum 0xFFFF passes successfully and
> > you see no TCP retransmission after it.
> 
> Sorry to ask again
> 
> Could you show QE more details how to to verify that packet with checksum
> 0xFFFF passes successfully how to verify no TCP retransmission after it?
> 
> Thanks,
> Mike

Hi Mike,

No problem.
I'm attaching 2 Wireshark screenshots:
 - "virtio-win-build-49" - old version without fix
 - "virtio-win-build-64" - new version with fix

From "virtio-win-build-49" one can see the following sequence:
 1. Packet #56593 is sent with TCP checksum 0xFFFF and TCP sequence number 66005889
 2. The receiver side does not acknowledge it, which means the packet was dropped (see packets #56606-#56602, they acknowledge up to sequence number 66005889)
 3. The transmitter side starts retransmissions for the same sequence number (66005889), see packets #56610, #56613, #56615, #56616; they all fail due to the same checksum problem

The sequence in "virtio-win-build-64" is different:
 1. Packet #40126 is transmitted with checksum 0xFFFF and sequence number 49980877
 2. It is immediately acknowledged by packet #40131, which acknowledges up to sequence number 49987609
 3. No retransmissions are observed in the sniffer
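
The check described in the two sequences above can be approximated in code. The sketch below is a deliberately simplified heuristic (not Wireshark's analysis): it counts data segments whose starting sequence number has already appeared earlier in the trace. The `struct segment` layout and the segment lengths in the example are hypothetical; only the sequence numbers come from the screenshots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one captured segment sent by the transmitter. */
struct segment {
    uint32_t seq; /* TCP sequence number of the first payload byte */
    uint32_t len; /* payload length; 0 for pure ACKs */
};

/* Count data segments that repeat an earlier sequence number.
 * O(n^2), which is fine for short traces. */
size_t count_retransmissions(const struct segment *segs, size_t n)
{
    size_t count = 0;
    for (size_t i = 1; i < n; i++) {
        if (segs[i].len == 0)
            continue; /* pure ACKs carry no data to retransmit */
        for (size_t j = 0; j < i; j++) {
            if (segs[j].len > 0 && segs[j].seq == segs[i].seq) {
                count++;
                break;
            }
        }
    }
    return count;
}

/* Build 49 pattern: one send plus four resends of 66005889 (lengths assumed). */
void retransmission_example(void)
{
    const struct segment build49[] = {
        { 66005889u, 1460 }, { 66005889u, 1460 }, { 66005889u, 1460 },
        { 66005889u, 1460 }, { 66005889u, 1460 },
    };
    const struct segment build64[] = { { 49980877u, 1460 } };
    assert(count_retransmissions(build49, 5) == 4);
    assert(count_retransmissions(build64, 1) == 0);
}
```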

Note that in order to reproduce this issue you need to disable all IPv4 checksum offloads on the transmitter side.

Basically that is it.
I hope it is clear enough,
if you have any other issues please let me know.

Comment 16 Dmitry Fleytman 2013-07-21 08:32:58 UTC
Created attachment 776381 [details]
Wireshark screenshot for version before the bugfix (build 49)

Comment 17 Dmitry Fleytman 2013-07-21 08:33:42 UTC
Created attachment 776382 [details]
Wireshark screenshot for version after the bugfix (build 64)

Comment 18 guo jiang 2013-07-22 11:51:36 UTC
(In reply to Dmitry Fleytman from comment #15)
Hi Dmitry,

Very sorry, QE cannot reproduce this issue: QE found no packet transmitted with checksum 0xFFFF.

Steps

1.Boot two windows guest with CLI as comment #8

2.Install NetKVM driver, wireshark and netperf on both guests,respectively.
wireshark version: Version 1.10.0(SVN Rev 49790 from /trunk-1.10)

3. Tx and Rx settings (on both guests):
right-click "Computer" and choose "Properties" --> "Device Manager" --> "Network adapters" --> right-click "Red Hat VirtIO Ethernet Adapter" and choose "Properties" --> "Advanced" --> set the following values:
  1).IPv4 Checksum Offload: Rx Enabled
  2).Offload.Rx.CheckSum: All
  3).Offload.Tx.CheckSum: Disable
  4).TCP Checksum Offload(IPv4): Rx Enable

4. Start wireshark on both sides and run TCP stream between guests with netperf.  
   on guest2(10.66.9.71): running netserver.exe in command
   on guest1(10.66.10.189): running netperf.exe
   >netperf.exe -H 10.66.9.71 -t TCP_STREAM

If my steps and settings are right, do you have any ideas about this? Thanks.

Best regards,
Jiang Guo

Comment 19 guo jiang 2013-07-23 04:36:02 UTC
Created attachment 777152 [details]
virtio-win-prewhql49-screenshot

Comment 20 guo jiang 2013-07-23 04:37:15 UTC
Reproduced this issue on virtio-win-prewhql-49
Verified this issue on virtio-win-prewhql-65

Steps as comment #18 and comment #8
Additional setting: on send side, Offload.Tx.LSO: Disable

Actual Result
On build 49: a packet with checksum 0xFFFF is followed by TCP retransmissions (screenshot attached).
On build 65: a packet with checksum 0xFFFF passes successfully and there is no TCP retransmission after it.

Based on the above, this issue has been fixed. Thanks.

Comment 21 Mike Cao 2013-08-07 09:23:59 UTC
Move Status to VERIFIED according to comment #20

Comment 23 errata-xmlrpc 2013-11-22 00:08:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1729.html

