Red Hat Bugzilla – Bug 648333
TCP checksum overflows in qemu's e1000 emulation code when TSO is enabled in guest OS
Last modified: 2011-05-19 08:49:53 EDT
+++ This bug was initially created as a clone of Bug #648328 +++

Description of problem:
Performance is really bad (approximately 10k/s) when I download a big file via ftp from a kvm guest which has TSO enabled. But it works fine (approx 10M/s) after changing the IP address to another subnet (172.18.33.X).

In the bad case, almost all ftp-data packets need to be re-transmitted because the first transmission of each packet has a bad checksum, like this:
"Checksum: 0x8906 [incorrect, should be 0x8905 (maybe caused by "TCP checksum offload"?)]"
As a result, the ftp client does not send an ack to the kvm guest.

The following two packets were captured on the ftp client:

No.  Time                        Source        Destination      Protocol  Info
456  2010-11-01 23:08:25.027685  172.18.36.42  192.168.106.106  FTP-DATA  FTP Data: 1448 bytes

Frame 456 (1514 bytes on wire, 1514 bytes captured)
Ethernet II, Src: 3com_0f:39:0f (00:50:04:0f:39:0f), Dst: Usi_55:6e:f6 (00:16:41:55:6e:f6)
Internet Protocol, Src: 172.18.36.42 (172.18.36.42), Dst: 192.168.106.106 (192.168.106.106)
Transmission Control Protocol, Src Port: 15319 (15319), Dst Port: 47404 (47404), Seq: 176657, Ack: 1, Len: 1448
    Source port: 15319 (15319)
    Destination port: 47404 (47404)
    [Stream index: 3]
    Sequence number: 176657 (relative sequence number)
    [Next sequence number: 178105 (relative sequence number)]
    Acknowledgement number: 1 (relative ack number)
    Header length: 32 bytes
    Flags: 0x10 (ACK)
    Window size: 5888 (scaled)
    Checksum: 0x8906 [incorrect, should be 0x8905 (maybe caused by "TCP checksum offload"?)]
    Options: (12 bytes)
    [SEQ/ACK analysis]
FTP Data ...

No.  Time                        Source        Destination      Protocol  Info
459  2010-11-01 23:08:25.228855  172.18.36.42  192.168.106.106  FTP-DATA  [TCP Retransmission] FTP Data: 1448 bytes

Frame 459 (1514 bytes on wire, 1514 bytes captured)
Ethernet II, Src: 3com_0f:39:0f (00:50:04:0f:39:0f), Dst: Usi_55:6e:f6 (00:16:41:55:6e:f6)
Internet Protocol, Src: 172.18.36.42 (172.18.36.42), Dst: 192.168.106.106 (192.168.106.106)
Transmission Control Protocol, Src Port: 15319 (15319), Dst Port: 47404 (47404), Seq: 176657, Ack: 1, Len: 1448
    Source port: 15319 (15319)
    Destination port: 47404 (47404)
    [Stream index: 3]
    Sequence number: 176657 (relative sequence number)
    [Next sequence number: 178105 (relative sequence number)]
    Acknowledgement number: 1 (relative ack number)
    Header length: 32 bytes
    Flags: 0x10 (ACK)
    Window size: 5888 (scaled)
    Checksum: 0x883c [correct]
    Options: (12 bytes)
    [SEQ/ACK analysis]
FTP Data

Version-Release number of selected component (if applicable):
kvm-83-164.el5

How reproducible:
100%

Steps to Reproduce:
1. Set up a kvm guest with an e1000 NIC and IP address 172.18.36.42, and enable the ftp service.
2. Download a big file via ftp from the kvm guest. The IP address of the client is 192.168.106.106.

Actual results:
The speed is about 10k/s.

Expected results:
The speed should be about 10M/s.

Additional info:

--- Additional comment from dwu@redhat.com on 2010-10-31 22:46:21 EDT ---

Created attachment 456783 [details]
Fix the checksum overflow

--- Additional comment from dwu@redhat.com on 2010-10-31 22:51:14 EDT ---

I have verified that this patch fixes the overflow that occurs when adding the tcp length to the pseudo-header checksum.
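The dependence on the IP address can be illustrated with a small sketch. It is not taken from the attached patch. It assumes that the guest driver seeds the TCP checksum field with the folded pseudo-header sum of source address, destination address and protocol (without the length), that the device then adds the segment's TCP length to that 16-bit value, and that the client address stays 192.168.106.106 in both cases. The function names are hypothetical.

```python
import ipaddress
import struct

IPPROTO_TCP = 6


def fold16(value):
    """Fold carries back into the low 16 bits (one's-complement addition)."""
    while value > 0xFFFF:
        value = (value & 0xFFFF) + (value >> 16)
    return value


def pseudo_header_sum(src, dst):
    """Folded sum of source address, destination address and protocol.

    Assumption: the TCP length is left out here and added later per segment.
    """
    raw = ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
    words = struct.unpack("!4H", raw)
    return fold16(sum(words) + IPPROTO_TCP)


def length_add_overflows(src, dst, tcp_len):
    """True if a plain 16-bit add of tcp_len to the partial sum drops a carry."""
    return pseudo_header_sum(src, dst) + tcp_len > 0xFFFF


# 32-byte TCP header + 1448 bytes of payload, as in the capture above.
TCP_LEN = 32 + 1448
CLIENT = "192.168.106.106"

print(hex(pseudo_header_sum("172.18.36.42", CLIENT)))        # 0xfb55
print(length_add_overflows("172.18.36.42", CLIENT, TCP_LEN))  # True

# No host address in 172.18.33.0/24 overflows under the same assumptions.
print(any(length_add_overflows(f"172.18.33.{host}", CLIENT, TCP_LEN)
          for host in range(256)))                            # False
```

Under these assumptions, the partial sum for 172.18.36.42 sits close enough to 0xFFFF that adding 1480 carries out of 16 bits, while any address in 172.18.33.X leaves enough headroom. That would be consistent with the behaviour described in the report.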
Could not reproduce this bug with qemu-kvm-0.12.1.2-2.113.el6.x86_64 and qemu-kvm-0.12.1.2-2.129.el6.x86_64. The speeds are all about 12M/s. This bug is cloned from 5.5, so can I move it to VERIFIED?
Amos, that seems odd that the original slow speed problem doesn't exist on rhel6, the code is very similar here. Did you follow the same steps as https://bugzilla.redhat.com/show_bug.cgi?id=648328#c9?
(In reply to comment #6)
> Amos, that seems odd that the original slow speed problem doesn't exist on
> rhel6, the code is very similar here. Did you follow the same steps as
> https://bugzilla.redhat.com/show_bug.cgi?id=648328#c9?

Yes.
Can I move this bug to VERIFIED?
Yes, let's move to VERIFIED since this was originally cloned from a rhel5 bz. The change shows no regression and keeps us in sync with both rhel5 and upstream.
According to comment #9, moving this bug to VERIFIED.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: when TSO is enabled, the e1000 emulation code was not properly accounting for overflow when adding the length to the pseudo header.
Consequence: poor performance of e1000 emulation when using TSO.
Fix: Fix TCP checksum overflow with TSO on e1000 emulation code.
Result: improved performance of e1000 emulation when using TSO.
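The cause described in the note can be sketched as simple arithmetic. The snippet below is an illustration only, not the emulation code or the patch itself: it compares a 16-bit add that drops the carry with a one's-complement add that folds the carry back in. PARTIAL and REST_SUM are made-up values (REST_SUM is chosen so that the correct result equals the 0x8905 reported by the capture), and all function names are hypothetical.

```python
def add_truncating(partial, length):
    """Plain 16-bit add: a carry out of bit 15 is silently lost."""
    return (partial + length) & 0xFFFF


def add_folding(partial, length):
    """One's-complement add: the carry is folded back into the low 16 bits."""
    total = partial + length
    return (total >> 16) + (total & 0xFFFF)


def finish_checksum(seed, rest_sum):
    """Complete the checksum from the seeded field and the sum of the remaining words."""
    total = seed + rest_sum
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


PARTIAL = 0xFB55   # illustrative pseudo-header sum without the length
TCP_LEN = 1480     # 32-byte header + 1448-byte payload
REST_SUM = 0x75DC  # hypothetical sum of the remaining segment words

bad = finish_checksum(add_truncating(PARTIAL, TCP_LEN), REST_SUM)
good = finish_checksum(add_folding(PARTIAL, TCP_LEN), REST_SUM)

print(hex(bad), hex(good))  # 0x8906 0x8905
```

Dropping the carry leaves the running sum one short, so the complemented checksum comes out one too high. That is the same off-by-one pattern as the 0x8906 versus 0x8905 mismatch in the original report.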
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0534.html