Bug 814898
Summary: | virtio net driver does not properly checksum | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Steve Meyers <steve-redhat> |
Component: | kernel | Assignee: | Vlad Yasevich <vyasevic> |
Status: | CLOSED WORKSFORME | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.2 | CC: | areis, bcao, davem, herbert.xu, juzhang, mst, rhod |
Target Milestone: | rc | Flags: | vyasevic: needinfo+ |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2012-06-25 14:39:59 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Steve Meyers
2012-04-21 04:59:33 UTC
Hi Steve,

Thank you for taking the time to enter a bug report with us. We do appreciate the feedback and look to use reports such as this to guide our efforts at improving our products. That being said, this bug tracking system is not a mechanism for getting support, and as such we are not able to make any guarantees as to the timeliness or suitability of a resolution.

If this issue is critical or in any way time sensitive, please raise a ticket through your regular Red Hat support channels to make certain that it gets the proper attention and prioritization to assure a timely resolution.

For information on how to contact the Red Hat production support team, please see:
https://www.redhat.com/support/process/production/#howto

We will look into it anyhow.

Thanks,
Ronen.

Thank you, Ronen. We've already worked around the bug in this case by moving our OpenVPN installation to a physical server with no virtual hosts, but I wanted to make you aware of the bug anyway so it could be addressed.

(In reply to comment #0)
>
> Always
>
> Steps to Reproduce:
> 1. Install 2 VMs on a single RHEL 6 machine. At least one should be running
> RHEL 6.
> 2. On the other one, run OpenVPN to connect to another network.
> 3. Try to connect to the RHEL 6 machine from the other network. Ping works,
> but UDP and TCP do not, due to checksum errors.
>

Hi Steve

I'd like to understand a little how the 2 VMs are connected. Can you provide a block or ASCII diagram with base connectivity between the VMs and how they tie to the physical network.

Also, I'd like to make sure that I understand correctly what you are trying to do. As I understand it, you wish to connect the RHEL6 VM to a remote network and route the traffic through OpenVPN that is running on the second VM. Both VMs are running on the same physical host. Is that correct?

Thanks
-vlad

+--------------------------+             +------------------+
|    PHYSICAL HOST "A"     |             |                  |
|--------------------------|             |   Computer "H"   |
|                          |             |                  |
| +---------+  +---------+ |             |                  |
| | VM "B"  |  | VM "C"  | |             +---------+--------+
| |         |  | OpenVPN | |                       |
| +----+----+  +----+--+-+ |                       |
|      |            |  |   |                       |
|      +-----+------+  |   |  OpenVPN    +---------+--------+
|     Bridged|to eth0  +-----------------+OpenWRT Router "G"|
|            |             |  tunnel     | w/ OpenVPN server|
+------------|-------------+             +---------+--------+
             |                                     |
             |          +--------+                 |
       +-----+-----+    |        |                 |
       |  Switch   +----+ Router +-----------------+
       +-----+-----+    |        |    Internet
             |          +--------+
             |
+------------|-------------+
|    PHYSICAL|HOST "D"     |
|------------|-------------|
|            |             |
|      +-----+------+      |
|      | Bridged to |      |
|      |    eth0    |      |
|      |            |      |
| +----+----+  +----+----+ |
| | VM "E"  |  | VM "F"  | |
| |         |  |         | |
| +----+----+  +----+----+ |
+--------------------------+

Given this setup, hosts C, D, E, and F can talk to G and H across the OpenVPN tunnel, while A and B cannot. A and B cannot send properly checksummed TCP or UDP packets across the OpenVPN tunnel because they are on the same physical host as the OpenVPN client on C.

My understanding is that they communicate with C using the virtio network driver, which does not do a proper checksum if the packet will not be leaving the physical host. This makes sense in most cases, since doing a checksum when no actual network is involved is redundant. VM server C forwards the packets unchanged onto its tunnel interface, so they end up over at G without ever having received a proper checksum.

> VM server C forwards the packets unchanged onto its tunnel interface
As packets lack checksum VM server C should checksum the packets before
sending them out of tun device.
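
For readers unfamiliar with what a "proper checksum" means here, the following is a minimal Python sketch of the standard Internet checksum (RFC 1071) applied to a TCP segment with its IPv4 pseudo-header. It is not code from this bug report; the function names (`internet_checksum`, `fill_tcp_checksum`, `tcp_checksum_ok`) and the sample addresses are hypothetical and only illustrate why a segment whose checksum field was never filled in fails verification at the receiver.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def _pseudo_header(src_ip: str, dst_ip: str, tcp_len: int) -> bytes:
    # IPv4 pseudo-header: source, destination, zero byte, protocol 6 (TCP), TCP length
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 6, tcp_len))


def fill_tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> bytes:
    """Return the segment with its checksum field (bytes 16-17) computed."""
    zeroed = segment[:16] + b"\x00\x00" + segment[18:]
    csum = internet_checksum(_pseudo_header(src_ip, dst_ip, len(zeroed)) + zeroed)
    return zeroed[:16] + struct.pack("!H", csum) + zeroed[18:]


def tcp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """A correctly checksummed segment sums (with the pseudo-header) to zero."""
    return internet_checksum(_pseudo_header(src_ip, dst_ip, len(segment)) + segment) == 0


# Hypothetical 20-byte TCP header (checksum field left at 0) plus a small payload.
unfilled = struct.pack("!HHIIHHHH", 1234, 80, 1, 0, 0x5002, 8192, 0, 0) + b"hello"
filled = fill_tcp_checksum("10.0.0.1", "10.0.0.2", unfilled)
```

With these sample values, `tcp_checksum_ok("10.0.0.1", "10.0.0.2", filled)` is true, while the same check on `unfilled` is false, which is the kind of failure the remote end of the tunnel would report.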
Michael - I agree. As I stated in the original, I tried the "-j CHECKSUM --checksum-fill" option for iptables, but that did not seem to have any effect.

I understand the benefit of not doing the checksum in the virtio network driver. Assuming that continues to be the case, I can't think of a good way to take care of this problem inside the virtio network driver, since it has no way to know that the packet will be tunneled to another network.

CHECKSUM target only handles CHECKSUM_PARTIAL packets, so one reason for it to be ineffective would be some other value for ip checksum.

A systemtap script within guest C, printing skb->ip_summed at netif_receive_skb and tun_net_xmit points, would tell us whether that is the case. Do you know how to do it, or do you need us to write you such a script?

You've surpassed my expertise now. :) I'm happy to run whatever script you want me to, but it's not something I could write myself.

something like this then:

# The heredoc delimiter is quoted so the shell does not expand $skb.
cat <<'EOF' > smeyers.stp
probe module("tun").function("tun_net_xmit@drivers/net/tun.c") {
    printf("tun_net_xmit %x\n", $skb->ip_summed);
}
probe kernel.function("dev_hard_start_xmit@net/core/dev.c") {
    printf("dev_hard_start_xmit %x\n", $skb->ip_summed);
}
probe kernel.function("netif_receive_skb@net/core/dev.c") {
    printf("netif_receive_skb %x\n", $skb->ip_summed);
}
EOF
stap smeyers.stp

Run it for a bit, then press Ctrl-C to cancel.

Note that you need the kernel debuginfo, kernel-devel, and stap installed.

or the below more elaborate one to also see callers of each function:

function caller_n(n) {
    f = backtrace()
    sym = tokenize(f, " ")
    for (i = 1; i < n; i++)
        sym = tokenize("", " ")
    return sym
}
function my_caller() {
    return (symdata(strtol(caller_n(1), 16)))
}
probe module("tun").function("tun_net_xmit@drivers/net/tun.c") {
    printf("tun_net_xmit %x from %s\n", $skb->ip_summed, my_caller());
}
probe kernel.function("dev_hard_start_xmit@net/core/dev.c") {
    printf("dev_hard_start_xmit %x from %s\n", $skb->ip_summed, my_caller());
}
probe kernel.function("netif_receive_skb@net/core/dev.c") {
    printf("netif_receive_skb %x from %s\n", $skb->ip_summed, my_caller());
}

I've spent quite a bit of time trying to reproduce and isolate this issue and I can't get it to happen.

In my configuration, I've set up a RHEL 6.2 host with 2 RHEL 6.2 guests. On one of the guests I've installed the OpenVPN client and created an OpenVPN tunnel to a Fedora 16 system acting as a VPN server. The configuration assumed a routed VPN configuration with a subnet behind the VPN client.

In this configuration, I can successfully send TCP traffic to the Fedora VPN server and any VMs running on it. The traffic to VMs is routed through the VPN server.

To get a bit more information, I've written a small utility that establishes an unencrypted TCP tunnel (similar to OpenVPN), and this utility also attempts to validate TCP checksums that it receives. The utility shows that any time data is read from the tunnel interface, it already has a fully computed checksum.

As another experiment, I've tried adding the mssfix option to force OpenVPN to change the checksum, and that worked as expected also.

If you can still reproduce this issue, can you please run the SystemTap scripts that Michael has provided.

Thanks
-vlad

Closing, since we do not seem to be able to reproduce it. We'll be happy to re-open once we have additional info.

Thanks,
Ronen.
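
Should anyone pick this up again, the output of the SystemTap probes is a stream of lines such as `tun_net_xmit 3` (optionally followed by `from <caller>`). The sketch below is one possible way to summarize such output; it is an illustration, not part of the original report. The numeric mapping reflects the `CHECKSUM_*` definitions in `include/linux/skbuff.h` for 2.6.32-era kernels as far as I know, so verify it against the headers of the kernel actually being traced. The names `IP_SUMMED`, `LINE_RE`, and `tally` are hypothetical.

```python
import re
from collections import Counter

# Assumed mapping of skb->ip_summed values; check include/linux/skbuff.h
# for the kernel under test before relying on it.
IP_SUMMED = {
    0: "CHECKSUM_NONE",
    1: "CHECKSUM_UNNECESSARY",
    2: "CHECKSUM_COMPLETE",
    3: "CHECKSUM_PARTIAL",
}

# Matches "<probe> <hex value>" with an optional " from <caller>" suffix.
LINE_RE = re.compile(r"^(\w+) ([0-9a-fA-F]+)(?: from (.*))?$")


def tally(lines):
    """Count how often each probe point saw each ip_summed state."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not probe output
        probe = match.group(1)
        value = int(match.group(2), 16)
        counts[(probe, IP_SUMMED.get(value, f"unknown({value})"))] += 1
    return counts


# Hypothetical sample lines, not captured from a real run.
sample = [
    "netif_receive_skb 3",
    "tun_net_xmit 3 from dev_hard_start_xmit+0x224",
    "tun_net_xmit 0",
    "not probe output",
]
summary = tally(sample)
```

If `tun_net_xmit` were seen with `CHECKSUM_PARTIAL`, that would suggest packets reach the tun device before their checksum has been filled in, which is the condition the CHECKSUM iptables target is meant to handle.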