Description of problem:
* Network throughput drops seriously on DomU to DomU node traffic on RHEL5.3 Xen when NIC performs RSC.
* The issue can be seen on any RHEL5.3 Xen host with intra-node traffic (between peer DomUs).
* Dom-0 and Dom-U display errors that indicate why performance dropped (shown below).
* It looks like netback to netfront packets (receive side traffic) trigger this error when the packet size is > 4K.
* Not seen on PV guests.

Version-Release number of selected component (if applicable):
* Red Hat Enterprise Linux 5.3
* Xen

How reproducible:
Consistently

Steps to Reproduce:
This is a customer-reported issue and Red Hat Support hasn't been able to reproduce it.

Reproducer from customer:
1) Configure two HVM Linux guests on a RHEL5.3 Xen host and test intra-Xen node network bandwidth (netperf or such). The errors will be seen easily with intra-droplet traffic.
2) Please use TCP. No special parameters needed. Just a regular TCP stream to the default port. E.g. 'netserver' on one side; 'netperf -H <IP address>' on the other side.

Actual results:
Network throughput drops seriously on DomU to DomU node traffic on RHEL5.3 Xen when NIC performs RSC. Dom-0 and Dom-U display errors that indicate why performance dropped.

Dom-U:
[root@domU-12-31-39-0D-13-05 ~]# tail /var/log/messages
Mar 1 14:01:13 domU-12-31-39-0D-13-05 kernel: printk: 11 messages suppressed.
Mar 1 14:01:13 domU-12-31-39-0D-13-05 kernel: netfront: rx->offset: 0, size: 4294967295
Mar 1 14:01:15 domU-12-31-39-0D-13-05 kernel: printk: 4 messages suppressed.
Mar 1 14:01:15 domU-12-31-39-0D-13-05 kernel: netfront: rx->offset: 0, size: 4294967295
Mar 1 14:01:21 domU-12-31-39-0D-13-05 kernel: printk: 3 messages suppressed.
Mar 1 14:01:21 domU-12-31-39-0D-13-05 kernel: netfront: rx->offset: 0, size: 4294967295
Mar 1 14:01:27 domU-12-31-39-0D-13-05 kernel: printk: 2 messages suppressed.
Mar 1 14:01:27 domU-12-31-39-0D-13-05 kernel: netfront: rx->offset: 0, size: 4294967295
Mar 1 14:01:31 domU-12-31-39-0D-13-05 kernel: printk: 3 messages suppressed.
Mar 1 14:01:31 domU-12-31-39-0D-13-05 kernel: netfront: rx->offset: 0, size: 4294967295

Dom0:
(XEN) printk: 8 messages suppressed.
(XEN) grant_table.c:1373:d0 copy beyond page area.
(XEN) printk: 11 messages suppressed.
(XEN) grant_table.c:1373:d0 copy beyond page area.
(XEN) printk: 4 messages suppressed.
(XEN) grant_table.c:1373:d0 copy beyond page area.
(XEN) printk: 3 messages suppressed.
(XEN) grant_table.c:1373:d0 copy beyond page area.
(XEN) printk: 2 messages suppressed.
(XEN) grant_table.c:1373:d0 copy beyond page area.
(XEN) printk: 3 messages suppressed.
(XEN) grant_table.c:1373:d0 copy beyond page area.

Expected results:
Fully functional network stack

Additional info:
The following patch seems to have resolved the issue for the customer.
http://article.gmane.org/gmane.comp.emulators.xen.cvs/12093

diff -up ./drivers/xen/netback/netback.c.orig1 ./drivers/xen/netback/netback.c
--- ./drivers/xen/netback/netback.c.orig1 2010-03-13 17:28:49.000000000 +0000
+++ ./drivers/xen/netback/netback.c 2010-03-13 17:31:41.000000000 +0000
@@ -250,7 +250,11 @@ int netif_be_start_xmit(struct sk_buff *
 	 * Copy the packet here if it's destined for a flipping interface
 	 * but isn't flippable (e.g. extra references to data).
 	 */
-	if (!netif->copying_receiver) {
+	/* Current netback grant copy code doesn't seem to handle the case
+	   where headlen crosses page boundary. Handling that here - Pradeep
+	   Vincent*/
+	if (!netif->copying_receiver ||
+	    ((skb_headlen(skb) + offset_in_page(skb->data)) >= PAGE_SIZE)) {
 		struct sk_buff *nskb = netbk_copy_skb(skb);
 		if ( unlikely(nskb == NULL) )
 			goto drop;
This is c/s 14893 in upstream Xen.
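For anyone who wants to reason about when the new check in the patch fires, the condition can be exercised in isolation. The following is only a userspace sketch, not the netback code: the `sketch_` names are hypothetical stand-ins for the kernel's `offset_in_page()` and the `skb_headlen(skb)` / `skb->data` values, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. The kernel uses PAGE_SIZE here. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/* Userspace stand-in for the kernel's offset_in_page():
 * byte offset of an address within its page. */
static size_t sketch_offset_in_page(uintptr_t addr)
{
	return (size_t)(addr & (SKETCH_PAGE_SIZE - 1));
}

/* Mirrors the condition added by the patch: true when the linear head
 * (headlen bytes starting at data_addr) reaches or passes the end of
 * its page, in which case the patched code copies the skb first. */
static bool sketch_head_needs_copy(uintptr_t data_addr, size_t headlen)
{
	return sketch_offset_in_page(data_addr) + headlen >= SKETCH_PAGE_SIZE;
}

/* Hypothetical self-check with a few illustrative addresses. */
static void sketch_self_check(void)
{
	assert(!sketch_head_needs_copy(0x10000, 1500)); /* fits in one page */
	assert(sketch_head_needs_copy(0x10f00, 1500));  /* 3840 + 1500 > 4096 */
	assert(sketch_head_needs_copy(0x10000, 4096));  /* ends exactly on the boundary */
	assert(sketch_head_needs_copy(0x10000, 8192));  /* head larger than a page */
}
```

Note that the comparison is `>=`, so a head that ends exactly at the end of its page is also copied. That is slightly conservative, but it matches the condition as written in the patch.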
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-206.el5

You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
I can't reproduce this bug against the 2.6.18-194 kernel. Confirmed with Paolo that it's safe to include this patch in the build and that it's OK to verify it with a sanity check.

linux-2.6-virt-xen-netback-copy-skbuffs-if-head-crosses-pages.patch is applied correctly in this build. I also ran the test against the 2.6.18-238 build with the steps mentioned by the customer, and no issue was found.

Changing this bug to VERIFIED.
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html