Bug 648333

Summary: TCP checksum overflows in qemu's e1000 emulation code when TSO is enabled in guest OS
Product: Red Hat Enterprise Linux 6 Reporter: Mark Wu <dwu>
Component: qemu-kvm Assignee: Alex Williamson <alex.williamson>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: high    
Version: 6.0 CC: akong, ehabkost, mjenner, mkenneth, virt-maint, ykaul
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-0.12.1.2-2.119.el6 Doc Type: Bug Fix
Doc Text:
Cause: when TSO is enabled, the e1000 emulation code was not properly accounting for overflow when adding the length to the pseudo header.
Consequence: poor performance of e1000 emulation when using TSO.
Fix: Fix TCP checksum overflow with TSO on e1000 emulation code.
Result: improved performance of e1000 emulation when using TSO.
Story Points: ---
Clone Of: 648328 Environment:
Last Closed: 2011-05-19 11:23:41 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 648328    
Bug Blocks: 580954    

Description Mark Wu 2010-11-01 02:59:10 UTC
+++ This bug was initially created as a clone of Bug #648328 +++

Description of problem:
Downloading a big file via ftp from a kvm guest that has TSO enabled is very slow (approximately 10k/s). It works fine (approx 10M/s) after changing the IP address to another subnet (172.18.33.X).

In the bad case, almost all ftp-data packets need to be retransmitted because the first transmission of each packet has a bad checksum, like this: "Checksum: 0x8906 [incorrect, should be 0x8905 (maybe caused by "TCP checksum offload"?)]". The ftp client therefore does not send an ACK to the kvm guest, and the data only gets through on retransmission.

The following two packets were captured from ftp client:

No.     Time                       Source                Destination           Protocol Info
    456 2010-11-01 23:08:25.027685 172.18.36.42          192.168.106.106       FTP-DATA FTP Data: 1448 bytes

Frame 456 (1514 bytes on wire, 1514 bytes captured)
Ethernet II, Src: 3com_0f:39:0f (00:50:04:0f:39:0f), Dst: Usi_55:6e:f6 (00:16:41:55:6e:f6)
Internet Protocol, Src: 172.18.36.42 (172.18.36.42), Dst: 192.168.106.106 (192.168.106.106)
Transmission Control Protocol, Src Port: 15319 (15319), Dst Port: 47404 (47404), Seq: 176657, Ack: 1, Len: 1448
    Source port: 15319 (15319)
    Destination port: 47404 (47404)
    [Stream index: 3]
    Sequence number: 176657    (relative sequence number)
    [Next sequence number: 178105    (relative sequence number)]
    Acknowledgement number: 1    (relative ack number)
    Header length: 32 bytes
    Flags: 0x10 (ACK)
    Window size: 5888 (scaled)
    Checksum: 0x8906 [incorrect, should be 0x8905 (maybe caused by "TCP checksum offload"?)]
    Options: (12 bytes)
    [SEQ/ACK analysis]
FTP Data

...

No.     Time                       Source                Destination           Protocol Info
    459 2010-11-01 23:08:25.228855 172.18.36.42          192.168.106.106       FTP-DATA [TCP Retransmission] FTP Data: 1448 bytes

Frame 459 (1514 bytes on wire, 1514 bytes captured)
Ethernet II, Src: 3com_0f:39:0f (00:50:04:0f:39:0f), Dst: Usi_55:6e:f6 (00:16:41:55:6e:f6)
Internet Protocol, Src: 172.18.36.42 (172.18.36.42), Dst: 192.168.106.106 (192.168.106.106)
Transmission Control Protocol, Src Port: 15319 (15319), Dst Port: 47404 (47404), Seq: 176657, Ack: 1, Len: 1448
    Source port: 15319 (15319)
    Destination port: 47404 (47404)
    [Stream index: 3]
    Sequence number: 176657    (relative sequence number)
    [Next sequence number: 178105    (relative sequence number)]
    Acknowledgement number: 1    (relative ack number)
    Header length: 32 bytes
    Flags: 0x10 (ACK)
    Window size: 5888 (scaled)
    Checksum: 0x883c [correct]
    Options: (12 bytes)
    [SEQ/ACK analysis]
FTP Data
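
The off-by-one checksum above and the dependence on the subnet can be checked by hand. The sketch below assumes that the guest driver seeds the TCP checksum field with the one's-complement sum of source address, destination address and protocol, and leaves the TCP length (32 + 1448 = 1480 for these frames) to be added by the emulated NIC, as Linux does for TSO. The function names are made up for illustration and are not taken from qemu.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum >> 16) + (sum & 0xffff);
    assert(sum <= 0xffff);
    return (uint16_t)sum;
}

/*
 * Pseudo-header sum over source, destination and protocol, WITHOUT the
 * TCP length (assumed seed value written by the guest driver).
 * Addresses are given in host order, e.g. 172.18.36.42 == 0xAC12242A.
 */
static uint16_t pseudo_sum_no_len(uint32_t src, uint32_t dst, uint8_t proto)
{
    uint32_t sum = 0;

    sum += src >> 16;
    sum += src & 0xffff;
    sum += dst >> 16;
    sum += dst & 0xffff;
    sum += proto;
    return fold16(sum);
}

/* Returns 1 if adding len to the seed carries out of 16 bits. */
static int add_carries(uint16_t seed, uint16_t len)
{
    return ((uint32_t)seed + len) > 0xffff;
}
```

For 172.18.36.42 -> 192.168.106.106 and protocol 6, pseudo_sum_no_len() gives 0xFB55, and adding 1480 (0x5C8) yields 0x1011D, which carries out of 16 bits. For any guest address in 172.18.33.X with the same client, the seed is at most 0xF92A and no carry occurs, which would be consistent with the good case if only the guest address changed.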


Version-Release number of selected component (if applicable):
kvm-83-164.el5

How reproducible:
100%

Steps to Reproduce:
1. Set up a kvm guest with an e1000 NIC and IP address 172.18.36.42, and enable the ftp service
2. Download a big file via ftp from the kvm guest. The IP address of the client is 192.168.106.106

  
Actual results:
The speed is about 10k/s.

Expected results:
The speed should be about 10M/s.

Additional info:

--- Additional comment from dwu on 2010-10-31 22:46:21 EDT ---

Created attachment 456783 [details]
Fix the checksum overflow

--- Additional comment from dwu on 2010-10-31 22:51:14 EDT ---

I have verified that this patch can fix the overflow when adding tcp length to the pseudo-header checksum.
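
For illustration only (this is not the attached patch, and the function names are hypothetical): a minimal sketch of the difference between a truncating 16-bit add and a one's-complement add that folds the carry back in when the TCP length is added to the pseudo-header checksum.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Truncating add: the carry out of bit 15 is silently dropped, so the
 * partial sum comes out one too small whenever seed + len > 0xffff.
 */
static uint16_t add_len_truncating(uint16_t seed, uint16_t len)
{
    return (uint16_t)(seed + len);
}

/*
 * One's-complement add: compute in 32 bits and fold the carry back into
 * the low 16 bits. A single fold is enough for two 16-bit operands,
 * because 0xffff + 0xffff = 0x1fffe folds to 0xffff.
 */
static uint16_t add_len_folded(uint16_t seed, uint16_t len)
{
    uint32_t sum = (uint32_t)seed + len;

    sum = (sum >> 16) + (sum & 0xffff);
    assert(sum <= 0xffff);
    return (uint16_t)sum;
}
```

With a seed of 0xFB55 (the one's-complement sum of 172.18.36.42, 192.168.106.106 and protocol 6) and a length of 1480, the truncating version returns 0x011D and the folded version 0x011E. A partial sum that is one too small gives a final (complemented) checksum that is one too large, which matches the 0x8906 vs 0x8905 seen in the capture.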

Comment 5 Amos Kong 2011-01-11 03:52:25 UTC
Could not reproduce this bug with qemu-kvm-0.12.1.2-2.113.el6.x86_64 and qemu-kvm-0.12.1.2-2.129.el6.x86_64.

The speeds are all about 12 M/s.

This bug is cloned from 5.5, so can I move it to VERIFIED?

Comment 6 Alex Williamson 2011-01-11 04:26:27 UTC
Amos, that seems odd that the original slow speed problem doesn't exist on rhel6, the code is very similar here.  Did you follow the same steps as https://bugzilla.redhat.com/show_bug.cgi?id=648328#c9?

Comment 7 Amos Kong 2011-01-11 04:34:54 UTC
(In reply to comment #6)
> Amos, that seems odd that the original slow speed problem doesn't exist on
> rhel6, the code is very similar here.  Did you follow the same steps as
> https://bugzilla.redhat.com/show_bug.cgi?id=648328#c9?

Yes.

Comment 8 Amos Kong 2011-01-25 08:15:01 UTC
Can I move this bug to VERIFIED?

Comment 9 Alex Williamson 2011-01-25 21:36:20 UTC
Yes, let's move to VERIFIED since this was originally cloned from a rhel5 bz.  The change shows no regression and keeps us in sync with both rhel5 and upstream.

Comment 10 Amos Kong 2011-01-26 04:43:16 UTC
According to comment #9, moving this bug to VERIFIED.

Comment 12 Eduardo Habkost 2011-05-05 19:06:24 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: when TSO is enabled, the e1000 emulation code was not properly accounting for overflow when adding the length to the pseudo header.

Consequence: poor performance of e1000 emulation when using TSO.

Fix: Fix TCP checksum overflow with TSO on e1000 emulation code.

Result: improved performance of e1000 emulation when using TSO.

Comment 13 errata-xmlrpc 2011-05-19 11:23:41 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0534.html
