Description of problem:

UDP transmit under VLAN causes guest freeze.

If a VM is started with a VLAN device as the vif, UDP transmit from the VM causes very high CPU load and the host is almost frozen until the UDP transmit finishes.

Version-Release number of selected component (if applicable):

Red Hat Enterprise Linux Version Number: RHEL5
Release Number: 5.4 GA
Architecture: x86_64
Kernel Version: kernel-2.6.18-164.el5xen
Related Package Version: none
Related Middleware / Application: none

Steps to Reproduce:

Test this with two machines linked with e1000 network devices, both running Xen. The kernel options for Dom0 should restrict it to a single vcpu, as follows in /boot/grub/grub.conf:

...
kernel /boot/xen.gz-2.6.18-164.el5 dom0_max_vcpus=1 dom0_mem=1024M
...

The DomU is a PV guest running kernel 2.6.18-164.el5xen, the same as Dom0.

1. Start VM1 and VM2 with the default xenbr0 as vif.

2. Set VM1 and VM2 to use only one vcpu:

# xm vcpu-pin DOM-ID 0 0,1

3. Start netserver on VM1:

# netserver

4. On VM2, use netperf to send UDP datagrams to VM1:

# netperf -c -H 10.167.100.43 -l 30 -t UDP_STREAM -- -m 1472
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.167.100.43 (10.167.100.43) port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB

110592    1472   30.00       2437319      0       956.7    13.86    1.187
129024           30.00       2437319              956.7    -1.00   -1.000

The CPU load is 13.86%.

5. Shut down VM1 and VM2.

6. Change xenbr0 on Machine1 and Machine2 to use a VLAN device instead of the physical network device:

# /etc/xen/scripts/network-bridge stop netdev=eth0
# vconfig add eth0 1001
# /etc/xen/scripts/network-bridge start netdev=eth0.1001
# ifconfig eth0.1001 up
# brctl show
bridge name     bridge id               STP enabled     interfaces
xenbr0          8000.feffffffffff       no              peth0.1001

7. Start VM1 and VM2 with xenbr0 as vif.

8. Repeat steps 2-4. This time the CPU load is very high and Machine2 is frozen until netperf finishes.

# netperf -c -H 10.167.105.43 -l 30 -t UDP_STREAM -- -m 1472
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.167.105.43 (10.167.105.43) port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB

110592    1472   30.00       8969064      0      3520.2   100.00    8.578
129024           30.00       2433305              955.0    -1.00   -1.000

Actual Results:

With the vif on a VLAN device, the VM causes very high CPU load and the host is frozen until netperf finishes.

Expected Results:

With the vif on a VLAN device, the VM should have roughly the same CPU load as with the physical network device.
Assigning to myself for triaging.
I can reproduce it with a tg3.

On the machine running netperf, xentop output with VLANs:

Domain-0     -----r      78   91.4    1048764   12.5   no limit    n/a
rhel55-64pv  --b---      56   90.5    1048192   12.5   1048576    12.5

xentop output without VLANs:

Domain-0     -----r      87   36.0    1048612   12.5   no limit    n/a
rhel55-64pv  --b---      66   37.1    1048216   12.5   1048576    12.5

and CPU utilization is 3% without VLANs, 97% with.

On the machine running netserver:

VLAN:

Domain-0     -----r     125   50.7    1048756   33.3   no limit    n/a
rhel55-64pv  -----r     100   60.5    1048056   33.3   1048576    33.3

no VLAN:

Domain-0     -----r     127   12.5    1048628   33.3   no limit    n/a
rhel55-64pv  --b---     102   13.9    1048136   33.3   1048576    33.3

but the CPU utilization here is 6% with VLANs and 1% without (so no substantial variation).
systemtap shows a vastly higher number of event channel notifications from dom0 to domU. The first number is the number of calls to force_evtchn_callback, the second is the number of calls to evtchn_do_upcall, the third is the number of packets sent.

with vlans:

 42493  77045 100000
 84276 152564 200000
128419 229934 300000
172263 307474 400000

without vlans:

   163   2223 100000
   199   4067 200000
   236   5735 300000
   282   7378 400000

systemtap script:

global force_evtchn_callback, sys_sendto, evtchn_do_upcall

probe kernel.function("evtchn_do_upcall").call
{
    evtchn_do_upcall++
}

probe kernel.function("force_evtchn_callback").call
{
    force_evtchn_callback++
}

probe kernel.function("sys_sendto").call
{
    sys_sendto++
    if (sys_sendto % 100000 == 0)
        printf ("%d %d %d\n", force_evtchn_callback, evtchn_do_upcall, sys_sendto)
}
Similar results on the host show that notify_remote_via_irq is called more often in the vlan case.

vlans:

  8630  37606 100000
 17145  74748 200000
 25441 111629 300000
 33666 148509 400000

no vlans:

   932  50749 100000
  1850 101209 200000
  2766 151573 300000
  3679 202078 400000

script:

global notify_remote_via_irq, dev_queue_xmit, evtchn_do_upcall

probe kernel.function("notify_remote_via_irq").call
{
    notify_remote_via_irq++
}

probe kernel.function("evtchn_do_upcall").call
{
    evtchn_do_upcall++
}

probe kernel.function("dev_queue_xmit").call
{
    dev_queue_xmit++
    if (dev_queue_xmit % 100000 == 0)
        printf ("%d %d %d\n", notify_remote_via_irq, evtchn_do_upcall, dev_queue_xmit)
}

Profiling also shows skb_copy_bits relatively high in the profile, but it doesn't show in the no-vlan case, so we're hitting a different code path.
Even better results with s/evtchn_do_upcall/skb_copy_bits/g from the script in comment 9.

vlans:

  7148  49996 100000
 13937  99994 200000
 20715 149993 300000
 27562 199991 400000

no vlans:

   926     33 100000
  1823     40 200000
  2718     45 300000
  3604     50 400000
More systemtap...

global skb_copy_bits

probe kernel.function("skb_copy_bits").call
{
    skb_copy_bits++
    if (skb_copy_bits % 100000 == 0)
        print_stack(backtrace())
}

shows:

 skb_copy_bits
 __pskb_pull_tail
 dev_queue_xmit+0x1c2

This is the second call to __pskb_pull_tail in dev_queue_xmit...

0xffffffff80230d3b <dev_queue_xmit+391>:  mov    0x8c(%rbp),%esi
0xffffffff80230d41 <dev_queue_xmit+397>:  mov    %rbp,%rdi
0xffffffff80230d44 <dev_queue_xmit+400>:  callq  0xffffffff8041f5a7 <__pskb_pull_tail>
...
0xffffffff80230d68 <dev_queue_xmit+436>:  mov    0x8c(%rbp),%esi
0xffffffff80230d6e <dev_queue_xmit+442>:  mov    %rbp,%rdi
0xffffffff80230d71 <dev_queue_xmit+445>:  callq  0xffffffff8041f5a7 <__pskb_pull_tail>
0xffffffff80230d76 <dev_queue_xmit+450>:  ...

and comparison with the source shows that this call to __pskb_pull_tail is the inlined __skb_linearize:

    /* Fragmented skb is linearized if device does not support SG,
     * or if at least one of fragments is in highmem and device
     * does not support DMA from it.
     */
    if (skb_shinfo(skb)->nr_frags &&
        (!(dev->features & NETIF_F_SG) || illegal_highdma(dev, skb)) &&
        __skb_linearize(skb))
            goto out_kfree_skb;

Indeed, peth0.100 does not support scatter-gather:

# ethtool -k peth0.100
Offload parameters for peth0.100:
Cannot get device tx csum settings: Operation not supported
Cannot get device scatter-gather settings: Operation not supported
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: off
scatter-gather: off

Changing component, but keeping the bug.
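For context, here is a minimal sketch of the general direction such a fix can take (illustrative only, not the actual RHEL 5 patch; the helper name and feature mask are made up for the example). The skbs that netback hands to the bridge are fragmented, and because the 802.1q device does not advertise NETIF_F_SG, dev_queue_xmit() linearizes every one of them via skb_copy_bits. Letting the VLAN device inherit the relevant offload bits from its real device avoids that path:

#include <linux/netdevice.h>

/*
 * Illustrative sketch only -- not the actual RHEL 5 patch.
 * Feature bits the 802.1q device could inherit from the real device,
 * so that dev_queue_xmit() no longer hits the
 * __skb_linearize()/skb_copy_bits() path for fragmented skbs.
 */
#define VLAN_INHERITED_FEATURES \
        (NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_IP_CSUM | NETIF_F_HIGHDMA)

/* Hypothetical helper: copy the supported offload bits from the real
 * device (e.g. eth0) to the VLAN device stacked on top of it
 * (e.g. eth0.100). */
static void vlan_dev_copy_features(struct net_device *vlandev,
                                   const struct net_device *realdev)
{
        vlandev->features &= ~VLAN_INHERITED_FEATURES;
        vlandev->features |= realdev->features & VLAN_INHERITED_FEATURES;
}

With scatter-gather advertised on the VLAN device, the NETIF_F_SG test quoted above passes and the skb_copy_bits counts should drop back toward the no-vlan levels, which is what the post-patch numbers below show.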
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
systemtap output with the patch looks much more like the output without VLANs, especially wrt skb_copy_bits.

on the guest (force_evtchn_callback, evtchn_do_upcall, packets):

   381   1645 100000
   724   3326 200000
  1245   5514 300000
  1637   7288 400000

on the host (notify_remote_via_irq, skb_copy_bits, packets):

   684      6 100000
  1444     10 200000
  2297     15 300000
  3027     19 400000
  3787     25 500000
Patch(es) available in kernel-2.6.18-254.el5.

You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html