Bug 668934 - UDP transmit under VLAN causes guest freeze
Summary: UDP transmit under VLAN causes guest freeze
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5.z
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Paolo Bonzini
QA Contact: Liang Zheng
URL:
Whiteboard:
Depends On:
Blocks: 514489
 
Reported: 2011-01-12 04:44 UTC by Douglas Schilling Landgraf
Modified: 2018-11-14 19:07 UTC
CC List: 19 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-21 09:22:06 UTC
Target Upstream Version:
Embargoed:


Links
System ID: Red Hat Product Errata RHSA-2011:1065
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.7 kernel security and bug fix update
Last Updated: 2011-07-21 09:21:37 UTC

Description Douglas Schilling Landgraf 2011-01-12 04:44:55 UTC
Description of problem:

UDP transmit under VLAN causes guest freeze

If a VM is started with a VLAN device as the vif, UDP transmit in the
VM causes very high CPU load, and the host is almost frozen until the
UDP transmit finishes.

Version-Release number of selected component (if applicable):

Red Hat Enterprise Linux Version Number: RHEL5
Release Number: 5.4GA
Architecture: x86_64
Kernel Version: kernel-2.6.18-164.el5xen
Related Package Version: none
Related Middleware / Application: none

Steps to Reproduce:

Test this with two machines linked with e1000 network devices. Run
Xen on both machines.

The Dom0 kernel options in /boot/grub/grub.conf should restrict Dom0
to a single vcpu, as follows:
...
kernel /boot/xen.gz-2.6.18-164.el5 dom0_max_vcpus=1 dom0_mem=1024M
...

The DomU is a PV guest, and its kernel version is 2.6.18-164.el5xen,
the same as Dom0's.

1. start VM1 and VM2 with the default xenbr0 as vif.
2. set VM1 and VM2 to use only one vcpu.
# xm vcpu-pin DOM-ID 0 0,1
3. start netserver on VM1.
# netserver
4. On VM2, use netperf to send UDP datagrams to VM1.
# netperf -c -H 10.167.100.43 -l 30 -t UDP_STREAM -- -m 1472
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.167.100.43 (10.167.100.43) port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB

110592    1472   30.00       2437319      0       956.7    13.86     1.187
129024           30.00       2437319              956.7    -1.00    -1.000

We get 13.86% CPU utilization on the sender.

5. shut down VM1 and VM2.
6. change xenbr0 on Machine1 and Machine2 to use a VLAN device
instead of the physical network device.
# /etc/xen/scripts/network-bridge stop netdev=eth0
# vconfig add eth0 1001
# /etc/xen/scripts/network-bridge start netdev=eth0.1001
# ifconfig eth0.1001 up
# brctl show
bridge name     bridge id               STP enabled     interfaces
xenbr0          8000.feffffffffff       no              peth0.1001

7. start VM1 and VM2 with the xenbr0 as vif.
8. repeat steps 2-4; this time the CPU load is very high, and
Machine2 is frozen until netperf finishes.

# netperf -c -H 10.167.105.43 -l 30 -t UDP_STREAM -- -m 1472
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.167.105.43 (10.167.105.43) port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB

110592    1472   30.00       8969064      0      3520.2   100.00     8.578
129024           30.00       2433305              955.0    -1.00    -1.000

Actual Results:
The VM on the VLAN device causes very high CPU load, and the machine
is frozen until netperf finishes.

Expected Results:
The VM on the VLAN device should have roughly the same CPU load as
on the physical network device.

Comment 4 Paolo Bonzini 2011-01-18 13:47:36 UTC
Assigning to myself for triaging.

Comment 7 Paolo Bonzini 2011-03-22 17:52:24 UTC
I can reproduce it with a tg3.  On the machine running netperf:

xentop output for VLANs:
  Domain-0 -----r         78   91.4    1048764   12.5   no limit       n/a 
rhel55-64pv --b---         56   90.5    1048192   12.5    1048576      12.5

xentop output for no VLANs:
  Domain-0 -----r         87   36.0    1048612   12.5   no limit       n/a 
rhel55-64pv --b---         66   37.1    1048216   12.5    1048576      12.5

and CPU utilization is 3% without VLANs, 97% with.

On the machine running netserver:

VLAN:
  Domain-0 -----r        125   50.7    1048756   33.3   no limit       n/a 
rhel55-64pv -----r        100   60.5    1048056   33.3    1048576      33.3

no VLAN:
  Domain-0 -----r        127   12.5    1048628   33.3   no limit       n/a 
rhel55-64pv --b---        102   13.9    1048136   33.3    1048576      33.3

but the CPU utilization here is 6% with VLANs and 1% without (so no substantial variation).

Comment 8 Paolo Bonzini 2011-03-23 12:28:47 UTC
systemtap shows a vastly higher number of event channel notifications from dom0 to domU.  The first number is the number of calls to force_evtchn_callback, the second is the number of calls to evtchn_do_upcall, the third is the number of packets sent:

with vlans:
  42493 77045 100000
  84276 152564 200000
  128419 229934 300000
  172263 307474 400000

without vlans:
  163 2223 100000
  199 4067 200000
  236 5735 300000
  282 7378 400000

systemtap script:
  global force_evtchn_callback, sys_sendto, evtchn_do_upcall
  probe kernel.function("evtchn_do_upcall").call {
    evtchn_do_upcall++
  }
  probe kernel.function("force_evtchn_callback").call {
    force_evtchn_callback++
  }
  probe kernel.function("sys_sendto").call {
    sys_sendto++
    if (sys_sendto % 100000 == 0)
      printf ("%d %d %d\n",force_evtchn_callback, evtchn_do_upcall, sys_sendto)
  }
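
The script can be run as-is while the netperf transfer is in progress
(the file name below is just illustrative); note that kernel.function
probes require the matching kernel-debuginfo package to be installed:

  # stap guest-evtchn.stp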

Comment 9 Paolo Bonzini 2011-03-23 13:36:16 UTC
Similar results on the host show that notify_remote_via_irq is called more often in the vlan case:

vlans:
  8630 37606 100000
  17145 74748 200000
  25441 111629 300000
  33666 148509 400000

no vlans:
  932 50749 100000
  1850 101209 200000
  2766 151573 300000
  3679 202078 400000

script:
  global notify_remote_via_irq, dev_queue_xmit, evtchn_do_upcall
  probe kernel.function("notify_remote_via_irq").call {
    notify_remote_via_irq++
  }
  probe kernel.function("evtchn_do_upcall").call {
    evtchn_do_upcall++
  }
  probe kernel.function("dev_queue_xmit").call {
    dev_queue_xmit++
    if (dev_queue_xmit % 100000 == 0)
      printf ("%d %d %d\n",notify_remote_via_irq, evtchn_do_upcall, 
              dev_queue_xmit)
  }

Profiling also shows skb_copy_bits relatively high in the profile; it
does not appear at all in the no-vlan case, so we're hitting a different code path.

Comment 10 Paolo Bonzini 2011-03-23 13:42:18 UTC
Even better results with s/evtchn_do_upcall/skb_copy_bits/g from the script in comment 9.

vlans:
7148 49996 100000
13937 99994 200000
20715 149993 300000
27562 199991 400000

no vlans:
926 33 100000
1823 40 200000
2718 45 300000
3604 50 400000

Comment 11 Paolo Bonzini 2011-03-23 15:01:42 UTC
More systemtap...

global skb_copy_bits
probe kernel.function("skb_copy_bits").call {
  skb_copy_bits++
  if (skb_copy_bits % 100000 == 0)
    print_stack(backtrace())
}

shows:

   skb_copy_bits
   __pskb_pull_tail
   dev_queue_xmit+0x1c2

This is the second call to __pskb_pull_tail in dev_queue_xmit...

0xffffffff80230d3b <dev_queue_xmit+391>:        mov    0x8c(%rbp),%esi
0xffffffff80230d41 <dev_queue_xmit+397>:        mov    %rbp,%rdi
0xffffffff80230d44 <dev_queue_xmit+400>:        callq  0xffffffff8041f5a7 <__pskb_pull_tail>
...
0xffffffff80230d68 <dev_queue_xmit+436>:        mov    0x8c(%rbp),%esi
0xffffffff80230d6e <dev_queue_xmit+442>:        mov    %rbp,%rdi
0xffffffff80230d71 <dev_queue_xmit+445>:        callq  0xffffffff8041f5a7 <__pskb_pull_tail>
0xffffffff80230d76 <dev_queue_xmit+450>:

... and comparison with the source shows that __pskb_pull_tail is really __skb_linearize:

        /* Fragmented skb is linearized if device does not support SG,
         * or if at least one of fragments is in highmem and device
         * does not support DMA from it.
         */
        if (skb_shinfo(skb)->nr_frags &&
            (!(dev->features & NETIF_F_SG) || illegal_highdma(dev, skb)) &&
            __skb_linearize(skb))
                goto out_kfree_skb;
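
__skb_linearize itself is a thin inline wrapper around __pskb_pull_tail
(in include/linux/skbuff.h of kernels of this vintage), which is why the
backtrace shows __pskb_pull_tail directly:

        static inline int __skb_linearize(struct sk_buff *skb)
        {
                return __pskb_pull_tail(skb, skb->data_len) ? 0 : -ENOMEM;
        }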

Indeed, peth0.100 does not support scatter-gather:

# ethtool -k peth0.100
Offload parameters for peth0.100:
Cannot get device tx csum settings: Operation not supported
Cannot get device scatter-gather settings: Operation not supported
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: off
scatter-gather: off
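
The natural fix direction is to have the VLAN device inherit NETIF_F_SG
from the underlying physical device whenever the latter supports it, so
that dev_queue_xmit() no longer linearizes every fragmented skb. A
minimal sketch of the idea (a hypothetical helper, not the actual
shipped patch):

  #include <linux/netdevice.h>

  /* Hypothetical sketch, not the shipped patch: let the VLAN device
   * advertise scatter-gather when the real device supports it, so
   * dev_queue_xmit() skips the per-packet __skb_linearize() copy. */
  static void vlan_inherit_sg(struct net_device *vlandev,
                              struct net_device *real_dev)
  {
          if (real_dev->features & NETIF_F_SG)
                  vlandev->features |= NETIF_F_SG;
  }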

Changing component, but keeping the bug.

Comment 12 RHEL Program Management 2011-03-23 15:10:09 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 15 Paolo Bonzini 2011-03-24 15:14:33 UTC
systemtap output with the patch looks much more like the output without VLANs, especially wrt skb_copy_bits:

on the guest (force_evtchn_callback, evtchn_do_upcall, packets):
  381 1645 100000
  724 3326 200000
  1245 5514 300000
  1637 7288 400000

on the host (notify_remote_via_irq, skb_copy_bits, packets):
  684 6 100000
  1444 10 200000
  2297 15 300000
  3027 19 400000
  3787 25 500000

Comment 21 Jarod Wilson 2011-04-01 22:04:49 UTC
Patch(es) available in kernel-2.6.18-254.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.

Comment 37 errata-xmlrpc 2011-07-21 09:22:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html

