Bug 189112

Summary:      XenU guest kernel reports "Received packet is 10 bytes before head."
Product:      [Fedora] Fedora
Component:    xen
Version:      5
Hardware:     i386
OS:           Linux
Status:       CLOSED WONTFIX
Severity:     high
Priority:     medium
Reporter:     lannet
Assignee:     Herbert Xu <herbert.xu>
QA Contact:   Martin Jenner <mjenner>
CC:           bstein, herbert.xu, hps, katzj, russell, xen-maint
Doc Type:     Bug Fix
Last Closed:  2008-02-26 23:04:35 UTC

Description lannet 2006-04-16 12:18:02 UTC

Description of problem:
The kernel of a XenU guest reports and logs "Received packet is 10 bytes before
head." for all packets received via the Internet.

This is occurring on an i386 single-CPU (AMD Sempron(tm) Processor 2800+) system
with 768 MB RAM and two Ethernet interfaces.
eth0 (VIA Technologies, Inc. VT6102 [Rhine-II]) is the connection to the local
intranet and can have either a static IP or one assigned by a DHCP server;
eth1 (D-Link System Inc RTL8139 Ethernet) has no IP address and is the
connection to the DSL modem, so that the DSL connection is bridged to the ppp0
interface on the Xen0 host.

This problem does not occur in conjunction with a Xen0 host that uses only one
Ethernet interface.

In the /etc/xen/xend-config.sxp configuration file, the following is the only
modification made:

        #(network-script network-bridge)
        (network-script 'network-bridge netdev=eth0')

This modification is necessary because eth0 is not the default route; otherwise
the default route (ppp0) gets removed when /etc/init.d/xend is started.

A typical XenU guest config file, generated by xenguest-install.py and then
slightly modified, looks like this:

        name    =       "NS_Server"
        memory  =       "64"
        disk    =       [ 'file:/opt/vns,xvda,w' ]
        vif     =       ['mac=AA:00:C0:A8:FF:1C']
        bootloader      =       "/usr/bin/pygrub"
        on_poweroff     =       'destroy'
        on_reboot       =       'restart'
        on_crash        =       'restart'

A study of the packet traffic at the ppp0 interface on the Xen0 host using
Ethereal shows that at least the received TCP packets have incorrect checksums.
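Ethereal performs this check itself. For anyone who wants to recompute a checksum independently, here is a minimal sketch of the RFC 1071 Internet checksum that TCP uses. It is an illustrative helper, not part of Xen or the kernel; extracting the segment bytes from a capture and building the TCP pseudo-header are assumed to happen elsewhere.

```python
def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum over `data`.

    Sums the data as big-endian 16-bit words in one's-complement
    arithmetic and returns the inverted 16-bit result. To verify a
    received TCP segment, run it over the pseudo-header plus the whole
    segment (checksum field included); a correct segment gives 0.
    """
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"  # pad an odd trailing byte with a zero low byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Example words from RFC 1071, section 3: 0001 f203 f4f5 f6f7.
sample = bytes.fromhex("0001f203f4f5f6f7")
print(hex(inet_checksum(sample)))  # 0x220d
```

Appending the computed checksum (0x22, 0x0d) to the sample and running the function again returns 0, which is the condition a receiver checks.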

Trying various combinations of interface MTU settings did not change the behavior.

Version-Release number of selected component (if applicable):
(from /boot/grub/grub.conf)
        kernel /boot/xen.gz-2.6.16-1.2080_FC5 dom0_mem=128M
        module /boot/vmlinuz-2.6.16-1.2080_FC5xen0 ro root=LABEL=/
        module /boot/initrd-2.6.16-1.2080_FC5xen0.img


How reproducible:
For all XenU guests on a Xen0 host with this interface configuration.

Comment 1 Russell McOrmond 2006-04-19 19:49:55 UTC

I have a similar setup:

I have two DSL connections (both PPPoE): one on a separate router box
(running FC4) that I connect to over Ethernet (eth0), and the other on xen0.

Kernel: 2.6.16-1.2080_FC5xen0

On the 2.6.16-1.2080_FC5xenU guest:

- If I route via the separate router box over the Ethernet port, I don't get the
error.
- If I route via the PPPoE connection, I get the error. The connection to the
network also seems extremely slow, likely because the packets triggering the
error are being dropped.

I am assuming the issue is specific to the bridging code when it relays packets
coming from the PPP connection.

eth0: RealTek RTL-8029 found at 0xcc00, IRQ 17, 00:60:67:4E:01:46.
eth1: VIA Rhine II at 0xfebff400, 00:15:f2:6f:37:7e, IRQ 18.

Comment 2 Russell McOrmond 2006-04-20 02:48:53 UTC

I just tested with the latest updates-released versions of the xen0 and xenU,
and can confirm I see the same problem.

2.6.16-1.2096_FC5xen0 #1 SMP Wed Apr 19 05:49:52 EDT 2006 i686 athlon i386 GNU/Linux

2.6.16-1.2096_FC5xenU #1 SMP Wed Apr 19 06:07:11 EDT 2006 i686 athlon i386 GNU/Linux

...
Received packet is 10 bytes before head.
printk: 17 messages suppressed.
Received packet is 10 bytes before head.


I also swapped the eth1 card (eth0 is on the ASUS A8V-MX motherboard) to see
whether that would make a difference. I suspect this problem is unique to the
combination of PPPoE and the VIF driver.

eth1: RealTek RTL8139 at 0xf486ec00, 00:50:ba:50:62:53, IRQ 17
eth1:  Identified 8139 chip type 'RTL-8139C'

Comment 3 lannet 2006-05-25 00:09:02 UTC

Is there any estimate of when this bug will be looked at?

Comment 4 Brian Stein 2006-05-25 13:00:27 UTC

There is active work in this area now, so the bz is queued.  Can you confirm
this is the case with the later xen and kernel-xen packages for FC5 (or rawhide)?

Comment 5 Herbert Xu 2006-05-26 06:20:35 UTC

Yes, this is a bug in the way Xen tries to avoid crossing a page boundary when
going from dom0 to domU. As part of my work on adding scatter-gather (SG)
support, this problem should go away.
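The page-boundary condition mentioned here is easy to state precisely. Below is a minimal sketch of the check, assuming (this is not stated in the report) that without scatter-gather each received packet has to fit inside a single page. The function name is hypothetical and this is not the Xen netfront/netback code; the 4096-byte page size is the i386 default.

```python
PAGE_SIZE = 4096  # i386 page size in bytes


def crosses_page_boundary(offset: int, length: int, page_size: int = PAGE_SIZE) -> bool:
    """Return True if `length` bytes starting at byte `offset` span more than one page."""
    if length <= 0:
        return False
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return first_page != last_page


print(crosses_page_boundary(0, 1514))     # False: a full Ethernet frame fits in page 0
print(crosses_page_boundary(3000, 1514))  # True: the frame runs past byte 4095
```

Any scheme that shifts packet data to keep it within one page has to borrow space somewhere, which is where a fixed headroom assumption can come in.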

Comment 6 Henning Schmiedehausen 2006-07-12 07:15:41 UTC

I think this problem is related to iptables/the firewall. I have the following setup:

+---------------------------------+
| Host box                        |
|                                 |
|                         ppp0    | - Internet Uplink
|       Unused Bridge --  eth1    | - private subnet (192.168.2.0/24)
|                         eth3    | - Internet Subnet (a.b.c.d/28)
|                           |     |
| +---------+               |     | eth3 has an IP address from the
| | Virtual |             Bridge  | a.b.c.d/28 range
| | System  |               |     |
| |         | eth0 ---------+     | eth0 has another IP address from the 
| +---------+                     | a.b.c.d/28 range, Default route for virtual
|                                 | points to the address on host/eth3
+---------------------------------+

                 Host/eth1     Host/eth3    Host/ppp0  Virtual/eth0
Host/eth1           -            No NAT        NAT       No NAT
Host/eth3         No NAT            -        No NAT      No NAT
Host/ppp0           NAT          No NAT         -        No NAT
Virtual/eth0      No NAT         No NAT      No NAT      No NAT

The virtual host must route all its traffic through the host system (which
should act as firewall for the virtual host). So its default route points to the
IP address on Host/eth3 (connected to the internal bridge).

When no iptables rules are loaded on the host, no error messages are logged.
As soon as I load my firewall rules (including the NAT rules for the eth1
interface, which has a bridge attached that the virtual host does not use),
the logging starts:

Received packet is 10 bytes before head.
Received packet is 10 bytes before head.
[ad infinitum]

This happens _only_ for traffic from the virtual system to the internet (which
ironically enough isn't currently affected by any firewall rule). It does not
happen for traffic from the virtual host to systems connected to eth1 (in the
private subnet). 

Both the host and the virtual server have all current Fedora Core 5 updates
installed (running kernel 2.6.17-1.2145_FC5).

Comment 7 Herbert Xu 2006-07-12 07:19:24 UTC

The problem is simply that Xen assumes that all packets passing from dom0 to
domU have 16 bytes of headroom, which isn't the case. My SG patches for
dom0 => domU remove this assumption.
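The arithmetic behind the log message can be illustrated with a toy model: if the receive path relies on 16 bytes of headroom in front of the packet data, a packet that arrives with only h bytes falls 16 - h bytes short of the buffer start. This is a hypothetical sketch, not the actual Xen netfront/netback code; `headroom_deficit` and `ASSUMED_HEADROOM` are invented names, and the 6-byte case is chosen only to reproduce the "10 bytes" figure, since the report does not say how much headroom the PPPoE or NAT paths actually leave.

```python
ASSUMED_HEADROOM = 16  # bytes; the assumption described in the comment above


def headroom_deficit(head: int, data: int) -> int:
    """Hypothetical model of the failing assumption.

    `head` and `data` stand in for the buffer-start and packet-start
    offsets of a socket buffer. Returns how many bytes before `head`
    the code would land if it relied on ASSUMED_HEADROOM bytes being
    available in front of `data` (0 means the assumption holds).
    """
    available = data - head
    return max(0, ASSUMED_HEADROOM - available)


print(headroom_deficit(head=0, data=32))  # 0: enough headroom
print(headroom_deficit(head=0, data=6))   # 10: "10 bytes before head"
```

Removing the fixed assumption, as the SG patches are described as doing, makes the deficit case impossible rather than merely logging it.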

Comment 8 Henning Schmiedehausen 2006-07-12 07:40:02 UTC

Now that we know where the problem lies, can we expect an updated kernel/Xen
package for FC5 soon? The printk output effectively limits the rate of packets
that a virtual server can deliver, and it messes up my log files. :-)

Comment 9 Henning Schmiedehausen 2006-07-15 09:22:20 UTC

2.6.17-1.2157_FC5 does _NOT_ change this problem for me.

Comment 10 Russell McOrmond 2006-07-27 19:13:48 UTC

Herbert,

Is your SG patch in 2.6.17-1.2157_FC5?

I have been tracking a different problem which might be related, although
it is so intermittent that it is hard to test.

See comment #1 on https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=199944#c1

Short form: sockets get into a state where programs which are listening on a
port no longer answer.  Other ports are working at that point.  Restarting the
program temporarily fixes the problem.

Everything was working until a recent upgrade. I've tried moving back to
2.6.17-1.2145, but don't yet know for certain whether this avoids the problem.

Comment 11 Herbert Xu 2006-08-02 12:22:10 UTC

The SG support for dom0 => domU has been merged upstream, so hopefully we won't
see this bug anymore. Well, at least we won't see it in its current form, since I
deleted that printk :)

Russell, this may be related in the sense that the SG patches may have fixed
latent bugs in the networking code. So once the patches have filtered through, I
encourage you to test them and see whether you can still reproduce that problem.

Comment 12 Russell McOrmond 2006-08-18 14:54:27 UTC

Herbert,

Do you have a sense of when the SG patches will make it into a kernel I can
receive via Yum?

Just so we have it somewhere, I created a separate bug for that listen()
problem.  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=203122

Comment 13 Henning Schmiedehausen 2006-08-19 08:36:12 UTC

This issue has still not been fixed with 2.6.17-1.2174_FC5

Comment 14 Herbert Xu 2006-08-21 10:25:21 UTC

Sorry it's taking such a long time for these fixes to filter through.  Part of
the reason is we want to make very sure that the new code does not end up
causing bigger problems than the old bugs :)

Hopefully the kernels might be ready this week.

Comment 15 Herbert Xu 2006-09-27 10:49:33 UTC

The 2.6.18 (2189) kernel in FC5 testing should fix this.

Comment 16 Brian Stein 2006-10-26 20:59:03 UTC

Please confirm this is fixed in the current release.

Comment 17 Henning Schmiedehausen 2006-10-29 21:54:51 UTC

I would love to; however, bug #211090 currently blocks me (no Xen at all at the moment).

Comment 18 Red Hat Bugzilla 2007-07-24 23:54:00 UTC

change QA contact

Comment 19 Chris Lalancette 2008-02-26 23:04:35 UTC

This report targets FC5, which is now end-of-life.

Please re-test against Fedora 7 or later, and if the issue persists, open a new bug.

Thanks