Description of problem:
Xen guest name resolution fails with TX checksumming enabled. Possibly related to the xen-3.0.3-3.fc6 or kernel-xen-2.6.19-1.2895.fc6 update, as it used to work previously. The guests have a bridged networking setup and run kernel-xen-2.6.18-1.2869.fc6. Connections opened by IP address work as expected.
How reproducible:
Always, with TX checksumming enabled.
I've looked at the patches in xen-3.0.3-3.fc6 and there is nothing there that should affect guest networking, so I suspect the problem is in the kernel instead. Can you downgrade your kernel, keep userspace on xen-3.0.3-3.fc6, and confirm that networking then works? If so, we can reassign this ticket to the kernel.
I already had kernel-xen-2.6.18-1.2868.fc6 installed for dom0, so I tested with it, and guest networking seems to work fine. So it seems kernel-related to me. Although I can ssh out of the guests by IP address with the problematic kernel, it is unusually slow. Name resolution probably fails because of that.
NFS mounting fails with "mount.nfs: Input/output error" even when TX checksumming is disabled, and nfslock dies immediately after startup. This failure is also related to the kernel-xen-2.6.19-1.2895.fc6 update.
Please disregard the previous post; it was a user error.
I just tried reproducing this with 2911 and failed. Can you still reproduce this with the latest fc6 kernel? Thanks.
2911 seems to work fine with the other guest systems, but the one running the bind service for the local network still works only when TX checksumming is off. This sounds so strange that I'll have to examine that system more closely when I have time.
Hmm, I thought you were talking about DNS clients failing as Xen guests. Was the problem with running a DNS server inside a Xen guest all along? Can you please generate a packet dump inside the Xen guest running bind? Thanks.
This is about DNS clients failing as Xen guests.
Domain0: Queries to remote DNS servers work, and queries to DomU1 work if TX checksumming is off in DomU1.
DomU1 (running bind): If TX checksumming is on, both local and remote DNS queries fail.
DomU2: If TX checksumming is on here and off in DomU1, queries to DomU1 work but remote queries fail (I hadn't tested remote queries from this DomU earlier today). If TX checksumming is off here, remote queries work too.
I'll find out how to generate those packet dumps a bit later.
Can you still reproduce this with the 2911 kernel in both dom0 and domU2? What NIC are you using in dom0?
Domain0 and the domUs were all running the 2911 kernel during yesterday's tests, so the problem is still there. There are two NICs in dom0, both Unex ND010 cards using the 8139too driver:
00:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
        Subsystem: Unex Technology Corp. ND010
        Flags: bus master, medium devsel, latency 32, IRQ 12
        I/O ports at dc00 [size=256]
        Memory at ea021000 (32-bit, non-prefetchable) [size=256]
        Capabilities: [50] Power Management version 2
eth0 is used by the domUs with the default bridged setup.
Hmm, I still can't reproduce this. Unfortunately I don't have a test machine with an 8139too here. Could you try a new test? Leave TX checksums on everywhere except on peth0. This should tell us whether we have a problem with the 8139too's checksum offload emulation. Thanks.
When trying to read the current settings from peth0:
[user@host ~]# ethtool -k peth0
Offload parameters for peth0:
Cannot get device rx csum settings: Operation not supported
Cannot get device tx csum settings: Operation not supported
Cannot get device scatter-gather settings: Operation not supported
Cannot get device tcp segmentation offload settings: Operation not supported
no offload info available
[user@host ~]# ethtool -k eth0
Offload parameters for eth0:
Cannot get device rx csum settings: Operation not supported
rx-checksumming: off
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
So I can't change the settings for peth0.
Ah yes, I had forgotten that 8139too doesn't let you tune that option. I'll try to get you a version that has TX checksumming disabled.
Created attachment 150551: Modified 8139too driver
OK, I've modified the 8139too driver for 2.6.19-1.2911.fc6xen so that checksums are always disabled. Please let me know whether this lets the domUs work with checksums enabled.
I'm having problems loading that module: module signature verification fails. modinfo shows the same properties for this module and the original one. As far as I know this kernel should load unsigned modules, so this one must carry a signature that differs from the kernel's. I even tried stripping the signature without really knowing what I was doing, and all I got were ELF verification errors :) Could you make an unsigned module, or is there another way around this?
Created attachment 150639: New module with sig removed
Sorry, I forgot to remove the signature. This one should work better.
I'm sorry to report that the problem is still there with the provided driver. One new symptom appeared with this driver when querying from DomU1, which runs bind (193.229.0.40 is one of my service provider's nameservers):
[user@host ~]# nslookup www.google.fi 193.229.0.40
;; reply from unexpected source: 127.0.0.1#53, expected 193.229.0.40#53
;; Warning: ID mismatch: expected ID 61472, got 15736
;; reply from unexpected source: 127.0.0.1#53, expected 193.229.0.40#53
;; Warning: ID mismatch: expected ID 61472, got 15736
;; reply from unexpected source: 127.0.0.1#53, expected 193.229.0.40#53
;; Warning: ID mismatch: expected ID 61472, got 15736
;; connection timed out; no servers could be reached
Previously nslookup just timed out. Again, when TX checksumming in the domUs is switched off, everything works as expected.
OK, with that change your driver side is now identical to mine, so the problem is probably somewhere above that. Do you have any custom netfilter rules? How about getting a packet dump in dom0 as well as in the domU to show what the checksum says? Thanks.
The firewalls on all hosts are quite simple. The INPUT policy is changed to DROP, with:
-A INPUT -i lo -j ACCEPT
-A INPUT -i eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
and the required service ports are opened as needed by each host, for example:
-A INPUT -i eth0 -m state --state NEW -p tcp -m tcp --dport ssh -j ACCEPT
There are no port redirects that could produce output like that shown in my previous posts. I'll start looking for information on how to do packet dumps; pointers welcome.
As root in dom0, run:
tcpdump -s 1600 -w dom0.dump -p -i peth0
In the domU, run:
tcpdump -s 1600 -w domU.dump -p -i eth0
The dumps have been sent directly to Mr. Xu because they may contain sensitive information. I just discovered a program called Wireshark, and I have been monitoring Ethernet traffic on another system that runs the same OS and has the same network adapter but doesn't use Xen. Its network traffic also seems to contain quite a lot of bad checksums, but somehow it just works.
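Capture tools commonly flag checksums as bad when a packet is captured before checksum offload has filled them in, so such flags are not by themselves proof of corruption on the wire. To check a captured UDP packet by hand, the checksum can be recomputed from the IPv4 pseudo-header, the UDP header and the payload (RFC 768, using the one's-complement sum of RFC 1071). The Python below is a minimal sketch of that calculation; the function names are hypothetical, and it assumes the IPv4 addresses are given as 4-byte strings and the UDP segment (header plus payload) as raw bytes.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum of `data`, as a 16-bit int."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    # Fold the carries back into the low 16 bits (end-around carry).
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def udp_checksum(src_ip: bytes, dst_ip: bytes, udp_segment: bytes) -> int:
    """Checksum for a UDP segment over IPv4, including the pseudo-header.

    `udp_segment` must have its checksum field (bytes 6-7) set to zero.
    """
    pseudo_header = src_ip + dst_ip + struct.pack("!BBH", 0, 17, len(udp_segment))
    csum = internet_checksum(pseudo_header + udp_segment)
    # A computed value of 0 is transmitted as 0xFFFF, since 0 means "no checksum".
    return csum or 0xFFFF


def udp_checksum_ok(src_ip: bytes, dst_ip: bytes, udp_segment: bytes) -> bool:
    """True if the checksum stored in `udp_segment` is correct."""
    stored = struct.unpack_from("!H", udp_segment, 6)[0]
    if stored == 0:
        return True  # sender did not use a checksum (allowed for UDP over IPv4)
    pseudo_header = src_ip + dst_ip + struct.pack("!BBH", 0, 17, len(udp_segment))
    # Summing data that already contains a correct checksum gives 0xFFFF,
    # whose complement is 0.
    return internet_checksum(pseudo_header + udp_segment) == 0
```

Running a check like this on packets captured where they arrive, rather than on the sending host, avoids most false alarms caused by offload.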
OK, there's definitely something bad going on here. Your packets aren't showing up in the peth0 dump at all, so dom0 isn't forwarding them. Please do another dump for the same situation:
tcpdump -s 1600 -w vifX.dump -p -i vifX.0
where vifX.0 is the interface for domU2. I'd also like to see the actual iptables rules in dom0, from:
iptables -v -L -n
iptables -t nat -v -L -n
Thanks.
Hmm, your network setup is more complicated than I thought. Could you please include the output of these commands in dom0:
ifconfig
ip ru
ip r l
brctl show
Thanks.
OK, what does brctl showmacs xenbr0 say after the DNS requests to an external site? Also what does ip r l say in domU2?
brctl showmacs xenbr0:
port no  mac addr           is local?  ageing timer
  2      00:10:a7:05:a8:a8  no           1.68
  1      00:10:a7:05:a8:ae  no           0.68
  2      00:10:a7:05:af:cf  no          91.36
  2      00:15:99:03:60:19  no          23.92
  4      00:16:3e:13:9b:f0  no           1.60
  3      00:16:3e:13:9b:f1  no           0.64
  1      fe:ff:ff:ff:ff:ff  yes          0.00
ip r l in domU2:
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.6
169.254.0.0/16 dev eth0 scope link
default via 192.168.1.1 dev eth0
OK, this makes more sense. You're not bridging the traffic but routing it through dom0. Since I was testing bridging, that explains why I didn't see it. I presume eth1 is an 8139too as well? I'll test this scenario. Thanks.
My nonexistent network knowledge says that the domUs are bridged to the 192.168.1.0 network and remote connections are then routed through eth1, which is also an RTL8139-based card. Sorry if I gave you misinformation.
OK, I've reproduced it now. The new checksum code in 2.6.19 has not been merged properly with Xen. I'll try to fix it up.
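As a rough illustration of the kind of hand-off involved: with TX checksum offload, the sending stack typically leaves the TCP/UDP checksum unfinished, storing only the pseudo-header part in the checksum field and recording where the rest must be summed; the outgoing device, or a software fallback when the device cannot do it, completes it. The Python below is a simplified model of that idea under those assumptions, not the actual kernel or Xen code; the function names and the `csum_start`/`csum_offset` parameters are hypothetical. In a setup like this one, a packet left unfinished by a domU and then routed out through a dom0 NIC would reach the receiver with a bad checksum if nobody performed the completion step, which would be consistent with the symptoms reported here.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum of `data`, as a 16-bit int."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip: bytes, dst_ip: bytes, udp_len: int) -> bytes:
    """IPv4 pseudo-header for UDP (protocol 17), per RFC 768."""
    return src_ip + dst_ip + struct.pack("!BBH", 0, 17, udp_len)


def start_partial_checksum(src_ip: bytes, dst_ip: bytes, segment: bytearray) -> None:
    """Sender side (hypothetical model): store only the folded, uncomplemented
    pseudo-header sum in the UDP checksum field (bytes 6-7) and leave the
    rest for the device or a software fallback to finish."""
    partial = ~internet_checksum(pseudo_header(src_ip, dst_ip, len(segment))) & 0xFFFF
    struct.pack_into("!H", segment, 6, partial)


def complete_partial_checksum(packet: bytearray, csum_start: int, csum_offset: int) -> None:
    """Completion step (hypothetical model): sum from csum_start to the end,
    which already includes the pseudo-header sum stored in the checksum field,
    and write the complemented result at csum_start + csum_offset."""
    csum = internet_checksum(bytes(packet[csum_start:]))
    # UDP-specific: a computed checksum of 0 is sent as 0xFFFF.
    struct.pack_into("!H", packet, csum_start + csum_offset, csum or 0xFFFF)
```

Skipping `complete_partial_checksum` in this model leaves only the pseudo-header sum in the checksum field, which a receiver would reject as a bad checksum.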
Created attachment 152098: Fix Xen checksum with 2.6.19 and beyond
Here's the aggregate patch that will get FC6 to operate correctly with checksums. I've verified that it fixes the original problem for me.
*** Bug 186183 has been marked as a duplicate of this bug. ***
Has this patch been applied to any fc6 kernel updates?
(In reply to comment #31)
> Has this patch been applied to any fc6 kernel updates?
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=234008#c54 says that it has not been yet.
I've built test RPMs including Herbert's checksum patches. They are available at:
http://raisama.net/rh/bz223258/
md5sums:
37783626bc8c13f98d21ea83f6dfeb02  kernel-2.6.20-1.2954.fc6.src.rpm
c8f3a603d12c7c0e9381c93fa00ba3a6  kernel-xen-2.6.20-1.2954.fc6.i686.rpm
Could you test whether this kernel solves the problems?
Did you test the kernel-xen-2.6.20-1.2954.fc6 package?
I just did the testing. The patch has solved the problem I was experiencing.
Marking as MODIFIED. The fix in kernel-xen-2.6.20-1.2954.fc6 worked.
change QA contact