Bug 578259 - Network throughput drops seriously on DomU to DomU node traffic on RHEL5.3 Xen when NIC performs RSC.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.3
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Paolo Bonzini
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 514489
 
Reported: 2010-03-30 18:00 UTC by Nandini Chandra
Modified: 2018-11-14 20:13 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-01-13 21:22:25 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHSA-2011:0017
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update
Last Updated: 2011-01-13 10:37:42 UTC

Description Nandini Chandra 2010-03-30 18:00:01 UTC
Description of problem:
* Network throughput drops seriously on DomU to DomU node traffic on RHEL5.3 Xen when NIC performs RSC.
* The issue can be seen on any RHEL5.3 Xen host with intra-node traffic (between peer DomUs).
* Dom-0 and Dom-U display errors that indicate why performance dropped (shown below).
* It looks like netback-to-netfront packets (receive-side traffic) trigger this error when the packet size is > 4K.
* Not seen on PV guests.


Version-Release number of selected component (if applicable):
* Red Hat Enterprise Linux 5.3
* Xen

How reproducible:
Consistently


Steps to Reproduce:
This is a customer reported issue and Red Hat Support hasn't been able to reproduce this.
Reproducer from customer:
1) Configure two HVM Linux guests on a RHEL5.3 Xen host and test intra-Xen node network bandwidth (netperf or similar). The errors will be seen easily with intra-droplet traffic.
2) Please use TCP. No special parameters are needed, just a regular TCP stream to the default port. E.g. 'netserver' on one side; 'netperf -H <IP address>' on the other side.

  
Actual results:
Network throughput drops seriously on DomU to DomU node traffic on RHEL5.3 Xen when NIC performs RSC.

Dom-0 and Dom-U display errors that indicate why performance dropped. 

Dom-U:

[root@domU-12-31-39-0D-13-05 ~]# tail /var/log/messages
Mar  1 14:01:13 domU-12-31-39-0D-13-05 kernel: printk: 11 messages suppressed.
Mar  1 14:01:13 domU-12-31-39-0D-13-05 kernel: netfront: rx->offset: 0, size: 4294967295
Mar  1 14:01:15 domU-12-31-39-0D-13-05 kernel: printk: 4 messages suppressed.
Mar  1 14:01:15 domU-12-31-39-0D-13-05 kernel: netfront: rx->offset: 0, size: 4294967295
Mar  1 14:01:21 domU-12-31-39-0D-13-05 kernel: printk: 3 messages suppressed.
Mar  1 14:01:21 domU-12-31-39-0D-13-05 kernel: netfront: rx->offset: 0, size: 4294967295
Mar  1 14:01:27 domU-12-31-39-0D-13-05 kernel: printk: 2 messages suppressed.
Mar  1 14:01:27 domU-12-31-39-0D-13-05 kernel: netfront: rx->offset: 0, size: 4294967295
Mar  1 14:01:31 domU-12-31-39-0D-13-05 kernel: printk: 3 messages suppressed.
Mar  1 14:01:31 domU-12-31-39-0D-13-05 kernel: netfront: rx->offset: 0, size: 4294967295

Dom0:

(XEN) printk: 8 messages suppressed.
(XEN) grant_table.c:1373:d0 copy beyond page area.
(XEN) printk: 11 messages suppressed.
(XEN) grant_table.c:1373:d0 copy beyond page area.
(XEN) printk: 4 messages suppressed.
(XEN) grant_table.c:1373:d0 copy beyond page area.
(XEN) printk: 3 messages suppressed.
(XEN) grant_table.c:1373:d0 copy beyond page area.
(XEN) printk: 2 messages suppressed.
(XEN) grant_table.c:1373:d0 copy beyond page area.
(XEN) printk: 3 messages suppressed.
(XEN) grant_table.c:1373:d0 copy beyond page area.
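
A note on the Dom-U log above: the reported size of 4294967295 equals 2^32 - 1, which is how -1 appears when it is printed as an unsigned 32-bit integer. That suggests, as an assumption not confirmed anywhere in this report, that the guest received a negative error status in place of a real length. The helper name below is hypothetical and only demonstrates the arithmetic in userspace:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical userspace helper, not kernel code.
 * Assumption: the netfront message formats a signed status value
 * with an unsigned conversion. This reproduces that formatting.
 */
static int sketch_format_as_unsigned(int32_t status, char *buf, size_t len)
{
    /* Reinterpret the signed value as unsigned 32-bit, then print it. */
    return snprintf(buf, len, "%u", (unsigned int)(uint32_t)status);
}
```

Calling sketch_format_as_unsigned(-1, buf, sizeof buf) yields the same digits as the "size:" field in the log.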



Expected results:
Fully functional network stack


Additional info:
Following patch seems to have resolved the issue for the customer.

http://article.gmane.org/gmane.comp.emulators.xen.cvs/12093

diff -up ./drivers/xen/netback/netback.c.orig1 ./drivers/xen/netback/netback.c
--- ./drivers/xen/netback/netback.c.orig1    2010-03-13 17:28:49.000000000 +0000
+++ ./drivers/xen/netback/netback.c    2010-03-13 17:31:41.000000000 +0000
@@ -250,7 +250,11 @@ int netif_be_start_xmit(struct sk_buff *
    * Copy the packet here if it's destined for a flipping interface
    * but isn't flippable (e.g. extra references to data).
    */
-    if (!netif->copying_receiver) {
+    /* Current netback grant copy code doesn't seem to handle the case
+       where headlen crosses page boundary. Handling that here - Pradeep
+                                Vincent*/
+        if (!netif->copying_receiver ||
+            ((skb_headlen(skb) + offset_in_page(skb->data)) >= PAGE_SIZE)) {
       struct sk_buff *nskb = netbk_copy_skb(skb);
       if ( unlikely(nskb == NULL) )
           goto drop;
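
The condition added by the patch can be tried out in userspace. The following is a minimal sketch, assuming 4 KiB pages; the sketch_* names and SKETCH_PAGE_SIZE are hypothetical stand-ins for the kernel's offset_in_page(), skb_headlen() result and PAGE_SIZE, not the kernel definitions themselves:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Userspace stand-in for offset_in_page(): byte offset of an
 * address within its page. */
static size_t sketch_offset_in_page(uintptr_t addr)
{
    return (size_t)(addr & (SKETCH_PAGE_SIZE - 1));
}

/* Mirrors the test added in the patch: true when the linear head
 * [addr, addr + headlen) reaches or passes the end of its first
 * page, in which case netback would copy the skb first. */
static bool sketch_head_needs_copy(uintptr_t addr, size_t headlen)
{
    return headlen + sketch_offset_in_page(addr) >= SKETCH_PAGE_SIZE;
}
```

For example, a 1500-byte head starting at page offset 0 stays within one page, while the same head starting at offset 0x0F00 (3840) does not. Because the comparison is >=, a head that ends exactly on the page boundary is also copied, which is conservative.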

Comment 1 Paolo Bonzini 2010-03-31 10:54:47 UTC
This is c/s 14893 in upstream Xen.

Comment 6 RHEL Program Management 2010-06-22 16:19:18 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 8 Jarod Wilson 2010-07-12 15:45:28 UTC
Included in kernel-2.6.18-206.el5.
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 10 Qixiang Wan 2010-12-24 06:13:11 UTC
I can't reproduce this bug against the 2.6.18-194 kernel. I confirmed with Paolo that it's safe to include this patch in the build and that it's OK to verify it with a sanity check. linux-2.6-virt-xen-netback-copy-skbuffs-if-head-crosses-pages.patch is applied correctly in this build. I also ran the test against the 2.6.18-238 build with the steps mentioned by the customer and found no issue. Changing this bug to VERIFIED.

Comment 12 errata-xmlrpc 2011-01-13 21:22:25 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html

