Bug 186183

Summary: Masqueraded tcp connections from guest get stuck after syn/ack - checksum problem?
Product: [Fedora] Fedora
Reporter: Robin Green <greenrd>
Component: xen
Assignee: Herbert Xu <herbert.xu>
Status: CLOSED DUPLICATE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 5
CC: ehabkost, katzj, mcepl, mcepl, xen-maint
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2007-04-24 22:50:41 UTC
Bug Blocks: 179629    

Description Robin Green 2006-03-22 01:11:11 UTC
Description of problem:     
Masqueraded TCP connections from a Xen guest get stuck after the first packet
that follows the SYN/ACK. This did not occur on an FC4 host with stock Xen and
the Xen kernel from XenSource. It does occur with Fedora's xen and the XenSource
kernel-xenU. Unfortunately, I cannot easily retest with stock Xen and the stock
XenSource kernels any more, because the stock xen0 kernel is incompatible with
udev in FC5.
     
Ethereal on the host shows that the SYN/ACK has a correct checksum, but the
next packet does not, both as it comes in on the vif and as it goes out on eth0.
     
Networking between guest and host works fine. Ethereal on the host shows that
some of those checksums are bogus too, but I suspect nothing checks them there,
perhaps because it is a virtual interface.
   
I'm aware that ethereal and tcpdump can report bogus checksum errors for
outgoing packets when checksum offloading is enabled, but here incoming packets
are affected as well.
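For reference, the check ethereal performs on a TCP segment is the standard Internet checksum (RFC 793, computed as described in RFC 1071): the inverted 16-bit one's-complement sum of an IPv4 pseudo-header (source address, destination address, a zero byte, protocol 6, TCP length) plus the TCP header and payload, with the checksum field counted as zero. A receiver accepts the segment if summing everything, including the stored checksum, gives 0xFFFF. The sketch below illustrates this in Python; the helper names, addresses, and header values are hypothetical and not taken from this report.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data (RFC 1071), before the final inversion."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def pseudo_header(src_ip: str, dst_ip: str, tcp_len: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero byte, protocol 6 (TCP), TCP length."""
    return struct.pack("!4s4sBBH", socket.inet_aton(src_ip),
                       socket.inet_aton(dst_ip), 0, 6, tcp_len)


def fill_checksum(src_ip: str, dst_ip: str, segment: bytes) -> bytes:
    """Return a copy of the TCP segment with a correct checksum in bytes 16-17."""
    seg = bytearray(segment)
    seg[16:18] = b"\x00\x00"  # the checksum field counts as zero while summing
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(seg)) + bytes(seg))
    seg[16:18] = struct.pack("!H", ~total & 0xFFFF)
    return bytes(seg)


def checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Receiver-side check: the sum including the stored checksum must be 0xFFFF."""
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return total == 0xFFFF


# Hypothetical 20-byte TCP header (port 57464 -> 80, PSH+ACK) plus a small payload.
hdr = struct.pack("!HHIIBBHHH", 57464, 80, 1, 1, 5 << 4, 0x18, 1460, 0, 0)
seg = fill_checksum("10.0.0.2", "192.0.2.10", hdr + b"GET / HTTP/1.0\r\n\r\n")
print(checksum_ok("10.0.0.2", "192.0.2.10", seg))   # True
print(checksum_ok("192.0.2.1", "192.0.2.10", seg))  # False: source address changed, checksum not
```

Because the pseudo-header includes the IP addresses, anything that rewrites the source address, as masquerading does, also has to adjust the TCP checksum.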
   
Version-Release number of selected component (if applicable):     
xen-3.0.1-4  
     
How reproducible:     
Always  
     
Steps to Reproduce:     
1. Create a Xen guest that uses NAT networking.
2. Boot the guest and start networking.
3. On the host, run ethereal and filter on port 80.
4. From the guest, run links http://www.google.ie/
       
Actual results:     
The browser hangs forever waiting for a reply.
     
Expected results:     
Quick reply

Comment 1 Robin Green 2006-03-22 01:13:03 UTC
Oh, I should have made this clear: it occurs whether you use the xenU kernel
from XenSource or the Fedora xenU kernel.

Comment 2 Robin Green 2006-03-22 02:35:28 UTC
From the Xen guest, I connected to another machine here on campus
(secure.ucd.ie) and ran tcpdump -v on secure.ucd.ie. The packets reach the
server, but after the SYN/ACK handshake the server does not reply.
 
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes 
01:50:58.313460 IP (tos 0x0, ttl  63, id 7455, offset 0, flags [DF], proto 6, length: 60) greenrd.ucd.ie.57464 > secure.ucd.ie.http: S [tcp sum ok] 3318021448:3318021448(0) win 5840 <mss 1460,sackOK,timestamp 4294959191 0,nop,wscale 2>
01:50:58.316452 IP (tos 0x0, ttl  64, id 0, offset 0, flags [DF], proto 6, length: 60) secure.ucd.ie.http > greenrd.ucd.ie.57464: S [tcp sum ok] 3457191582:3457191582(0) ack 3318021449 win 5792 <mss 1460,sackOK,timestamp 19629476 4294959191,nop,wscale 2>
01:50:58.313689 IP (tos 0x0, ttl  63, id 7456, offset 0, flags [DF], proto 6, length: 52) greenrd.ucd.ie.57464 > secure.ucd.ie.http: . [tcp sum ok] ack 1 win 1460 <nop,nop,timestamp 4294959192 19629476>
01:50:58.314112 IP (tos 0x0, ttl  63, id 7457, offset 0, flags [DF], proto 6, length: 227) greenrd.ucd.ie.57464 > secure.ucd.ie.http: P 1:176(175) ack 1 win 1460 <nop,nop,timestamp 4294959193 19629476>
01:50:58.518081 IP (tos 0x0, ttl  63, id 7458, offset 0, flags [DF], proto 6, length: 227) greenrd.ucd.ie.57464 > secure.ucd.ie.http: P 1:176(175) ack 1 win 1460 <nop,nop,timestamp 4294959244 19629476>
01:50:58.926064 IP (tos 0x0, ttl  63, id 7459, offset 0, flags [DF], proto 6, length: 227) greenrd.ucd.ie.57464 > secure.ucd.ie.http: P 1:176(175) ack 1 win 1460 <nop,nop,timestamp 4294959346 19629476>
01:50:59.742032 IP (tos 0x0, ttl  63, id 7460, offset 0, flags [DF], proto 6, length: 227) greenrd.ucd.ie.57464 > secure.ucd.ie.http: P 1:176(175) ack 1 win 1460 <nop,nop,timestamp 4294959550 19629476>

Comment 3 Robin Green 2006-03-22 16:52:26 UTC
This is indeed a bad checksum generated in the guest, which the masquerading
on the host modifies but does not correct.
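A minimal model of why masquerading on the host leaves a bad checksum bad, assuming the NAT code adjusts the TCP checksum incrementally (RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m')) rather than recomputing it over the whole segment. The incremental update only accounts for the fields NAT changes (here the source address in the pseudo-header and the TCP source port), so any error already present in the guest's checksum is carried straight through. The names, addresses, and ports below are hypothetical, and this is not the Linux netfilter code.

```python
import socket
import struct


def ones_sum(data: bytes, start: int = 0) -> int:
    """16-bit one's-complement sum (RFC 1071) of data, added onto start."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = start
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def pseudo_header(src_ip: str, dst_ip: str, tcp_len: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero byte, protocol 6 (TCP), length."""
    return struct.pack("!4s4sBBH", socket.inet_aton(src_ip),
                       socket.inet_aton(dst_ip), 0, 6, tcp_len)


def fill_checksum(src_ip: str, dst_ip: str, segment: bytes) -> bytes:
    """Copy of the TCP segment with a freshly computed checksum in bytes 16-17."""
    seg = bytearray(segment)
    seg[16:18] = b"\x00\x00"
    total = ones_sum(pseudo_header(src_ip, dst_ip, len(seg)) + bytes(seg))
    seg[16:18] = struct.pack("!H", ~total & 0xFFFF)
    return bytes(seg)


def checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Receiver-side check: the sum including the stored checksum is 0xFFFF."""
    return ones_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment) == 0xFFFF


def incremental_update(old_csum: int, old: bytes, new: bytes) -> int:
    """RFC 1624 eqn. 3, HC' = ~(~HC + ~m + m'), for each changed 16-bit word m -> m'."""
    total = ~old_csum & 0xFFFF
    for (m,), (m_new,) in zip(struct.iter_unpack("!H", old), struct.iter_unpack("!H", new)):
        total = ones_sum(struct.pack("!HH", ~m & 0xFFFF, m_new), total)
    return ~total & 0xFFFF


def masquerade(segment: bytes, old_ip: str, new_ip: str, new_port: int) -> bytes:
    """Hypothetical SNAT step: rewrite the source port and patch the checksum for the
    port change plus the source-address change in the pseudo-header. It never
    recomputes the checksum from scratch, so it cannot notice an existing error."""
    seg = bytearray(segment)
    old = socket.inet_aton(old_ip) + bytes(seg[0:2])
    new = socket.inet_aton(new_ip) + struct.pack("!H", new_port)
    (csum,) = struct.unpack("!H", seg[16:18])
    seg[0:2] = struct.pack("!H", new_port)
    seg[16:18] = struct.pack("!H", incremental_update(csum, old, new))
    return bytes(seg)


# Hypothetical guest packet 10.0.0.2:57464 -> 192.0.2.10:80, masqueraded as 198.51.100.7.
GUEST, SERVER, HOST = "10.0.0.2", "192.0.2.10", "198.51.100.7"
hdr = struct.pack("!HHIIBBHHH", 57464, 80, 1, 1, 5 << 4, 0x18, 1460, 0, 0)
good = fill_checksum(GUEST, SERVER, hdr + b"GET / HTTP/1.0\r\n\r\n")
(csum,) = struct.unpack("!H", good[16:18])
bad = good[:16] + struct.pack("!H", csum ^ 0x1234) + good[18:]  # simulate a bogus guest checksum

for name, seg in (("good", good), ("bad", bad)):
    natted = masquerade(seg, GUEST, HOST, 40000)
    print(name, checksum_ok(GUEST, SERVER, seg), checksum_ok(HOST, SERVER, natted))
# good True True
# bad False False
```

Under this model the translated packet is only as correct as the packet the guest handed over, which matches the observation that the checksum is modified by the host but never repaired.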
  
Workaround: modprobe iptable_nat in the _guest_.   
   
I found this workaround at
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=495, which may or may
not be the same bug. However, it looks like recent patches in Xen CVS may fix
checksum-related problems.
   

Comment 4 Herbert Straub 2007-01-21 22:57:57 UTC
I can confirm this bug on Fedora Core 6 with
kernel-xen-2.6.19-1.2895.fc6. I can work around it with

ethtool -K eth0 tx off

in the Xen guest domain. I found these references on the net:

http://wiki.xensource.com/xenwiki/XenFaq#head-4ce9767df34fe1c9cf4f85f7e07cb10110eae9b7
--> 3.5 TCP and UDP checksum errors, ping but nothing else, ipsec tunnels don't
form, DNAT translation doesn't work

http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=447

http://www.redhat.com/archives/fedora-xen/2006-June/msg00020.html

Comment 5 Matěj Cepl 2007-04-24 12:42:42 UTC
Fully reproducible with kernel-xen-2.6.20-1.2944.fc6.x86_64 and RHEL4 as a guest.

Comment 6 Herbert Xu 2007-04-24 22:50:41 UTC

*** This bug has been marked as a duplicate of 223258 ***