See also bug #495863 which contains the kernel side of this fix.

With a configuration like this:

  Net |              Host              |          Guest
      |                                |
      |  +-----+  +--------+  +-----+  |  +------------+  +---------+
  ----|--+ NIC +--+ Bridge +--+ TAP |--|--+ virtio_net |--| UDP app |
      |  +-----+  +--------+  +-----+  |  +------------+  +---------+
      |                                |

when the UDP application blasts packets as fast as it can, the TX queue on the
host NIC fills up and packets get dropped just before they would reach the NIC
TX queue. Currently, there is no way for the UDP app to be notified that
packets are being dropped and that it should slow down.

The recently added TUNSETSNDBUF ioctl() allows qemu to set a limit on the
amount of data which can be waiting to be sent. When this limit is hit,
write() to the tap device returns EAGAIN and qemu can stop processing packets
from the virtio_net queue, which in turn causes the UDP socket queue in the
guest to fill up, which in turn causes the UDP app to block.

Testing this is quite straightforward:

1) Run: netperf -t UDP_STREAM -f m -H <dest_ip> -l 10

2) Observe that without the patch you see a result like:

     124928    1024   10.00      843206      0     690.69
     124928            10.00     115031             94.23

   i.e. >85% of the packets sent by netperf are dropped

3) Check that /proc/sys/net/bridge/bridge-nf-call-iptables is set to zero

4) Check that the txqueuelen on the physical NIC is set to 1000

5) Retry the test with the patch applied and you should see e.g.:

     124928    1024   10.00      115297      0      94.44
     124928            10.00     115297             94.44

   i.e. no packets are lost; the app in the guest is correctly constrained by
   the physical NIC in the host

The patch adds a sndbuf= parameter for '-net tap' and sets the default to
1048576. The sndbuf= parameter is upstream as of this commit:

  http://git.savannah.gnu.org/cgit/qemu.git/commit/?id=0df0ff6de7

I've just now sent a patch upstream to set a default value.

The proposed patch for 5.4 isn't a simple cherry-pick of the code from
upstream, since upstream has an internal API for buffering which would have
been too invasive to backport.
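To make the mechanism concrete, here is a minimal, self-contained C sketch of
how a process can apply TUNSETSNDBUF to a tap device. This is illustrative
only, not the actual qemu code: the interface name "tap0", the error handling
and the 1048576-byte limit (the same default the patch proposes) are
assumptions for the example.

/* Minimal sketch: attach to a tap device and set its send-buffer limit.
 * Not the qemu implementation; "tap0" and the 1048576 limit are examples. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <linux/if_tun.h>

int main(void)
{
    struct ifreq ifr;
    int sndbuf = 1048576;   /* same default the patch sets for sndbuf= */

    /* O_NONBLOCK so that write() returns EAGAIN instead of blocking once
     * the send-buffer limit is reached. */
    int fd = open("/dev/net/tun", O_RDWR | O_NONBLOCK);
    if (fd < 0) {
        perror("open /dev/net/tun");
        return 1;
    }

    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI;     /* tap device, no extra header */
    strncpy(ifr.ifr_name, "tap0", IFNAMSIZ);
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {    /* create/attach to tap0 */
        perror("TUNSETIFF");
        return 1;
    }

    /* Limit the amount of data that may be queued waiting to be sent;
     * this ioctl only exists on kernels with the fix from bug #495863. */
    if (ioctl(fd, TUNSETSNDBUF, &sndbuf) < 0) {
        perror("TUNSETSNDBUF");
        return 1;
    }

    /* From here on, write(fd, frame, len) fails with EAGAIN once sndbuf
     * bytes are pending, giving the caller a chance to stop draining the
     * virtio_net queue and so push back on the guest application. */
    close(fd);
    return 0;
}

qemu applies the same ioctl to the tap fd it already has open; the sndbuf=
option simply controls the value passed in.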
Created attachment 349933 [details] net-add-net-tap-sndbuf-with-a-sensible-default.patch
1. Command used:

   # netperf -H 10.66.70.31 -t UDP_STREAM -l 10 -- -m 2048

2. Reproduced on kvm-83-83.el5, but not as serious as comment #0:

   129024    1024   10.00      162582      0     133.17
   129024            10.00     162274            132.91
   129024    2048   10.00       93171      0     152.63
   129024            10.00      93158            152.61
   129024   65507   10.00        5041      0     264.12
   129024            10.00       5026            263.33

3. Checked on kvm-83-90.el5:

   129024    1024   10.00      231046      0     189.26
   129024            10.00     231046            189.26
   129024    2048   10.00       99867      0     163.60
   129024            10.00      99867            163.60
   129024   65507   10.00        5272      0     276.22
   129024            10.00       5272            276.22

Can I mark this issue VERIFIED, based on these test results?
shuang: yep, that looks good - no packets were dropped and performance was improved
Issue reproduced on kvm-83-94.el5; packets were dropped starting from message
size 1473.

Start VM with a virtio network interface.

guest->host:

# for i in 32 64 128 256 512 1024 1278 1407 1472 1473 1475 2048 4096 8192 16834 32768; do netperf -t UDP_STREAM -f m -H 192.168.20.6 -P 0 -l 10 -- -m $i; done

kvm-83-94.el5:

129024      32   10.00     1514106      0      38.76
129024           10.00     1514106             38.76
129024      64   10.00     1536076      0      78.64
129024           10.00     1536076             78.64
129024     128   10.00     1361436      0     139.40
129024           10.00     1361436            139.40
129024     256   10.00     1359981      0     278.51
129024           10.00     1359981            278.51
129024     512   10.00     1304934      0     534.37
129024           10.00     1304934            534.37
129024    1024   10.00      992948      0     813.29
129024           10.00      992948            813.29
129024    1278   10.00      867703      0     887.02
129024           10.00      867703            887.02
129024    1407   10.00      816792      0     919.26
129024           10.00      816792            919.26
129024    1472   10.00      793871      0     934.75
129024           10.00      793871            934.75
129024    1473   10.00      941008      0    1108.76
129024           10.00      551144            649.40
129024    1475   10.00      877103      0    1034.87
129024           10.00      505451            596.37
129024    2048   10.00      789477      0    1293.39
129024           10.00      276077            452.29
129024    4096   10.00      595492      0    1951.04
129024           10.00       89496            293.22
129024    8192   10.00      310392      0    2033.95
129024           10.00       38422            251.77
129024   16834   10.00      158573      0    2135.31
129024           10.00       12290            165.49
129024   32768   10.00       85726      0    2246.81
129024           10.00        1121             29.38

kvm-83-90.el5:

129024      32   10.00     1718361      0      43.99
129024           10.00     1718361             43.99
129024      64   10.00     1414834      0      72.43
129024           10.00     1414834             72.43
129024     128   10.00     1675115      0     171.51
129024           10.00     1675115            171.51
129024     256   10.00     1250219      0     256.00
129024           10.00     1250183            255.99
129024     512   10.00     1196838      0     490.18
129024           10.00     1196838            490.18
129024    1024   10.00      464854      0     380.74
129024           10.00      464854            380.74
129024    1278   10.00      390095      0     398.76
129024           10.00      390095            398.76
129024    1407   10.00      365242      0     411.04
129024           10.00      365242            411.04
129024    1472   10.00      365555      0     430.45
129024           10.00      365555            430.45
129024    1473   10.00      300405      0     353.93
129024           10.00      300405            353.93
129024    1475   10.00      299979      0     353.91
129024           10.00      299979            353.91
129024    2048   10.00      244993      0     401.37
129024           10.00      244993            401.37
129024    4096   10.00      130086      0     426.19
129024           10.00      130086            426.19
129024    8192   10.00      112190      0     735.19
129024           10.00      112190            735.19
129024   16834   10.00       35714      0     480.88
129024           10.00       35714            480.88
129024   32768   10.00       20015      0     524.64
129024           10.00       20015            524.64
host->host:

129024      32   10.00     2836039      0      72.60
129024           10.00     2831649             72.49
129024      64   10.00     2825320      0     144.64
129024           10.00     2821296            144.44
129024     128   10.00     2099014      0     214.92
129024           10.00     2099014            214.92
129024     256   10.00     2065180      0     422.92
129024           10.00     2065180            422.92
129024     512   10.00     1618174      0     662.78
129024           10.00     1618174            662.78
129024    1024   10.00      993023      0     813.45
129024           10.00      993023            813.45
129024    1278   10.00      864798      0     884.16
129024           10.00      864798            884.16
129024    1407   10.00      813930      0     916.08
129024           10.00      813930            916.08
129024    1472   10.00      791482      0     931.97
129024           10.00      791482            931.97
129024    1473   10.00      645177      0     760.22
129024           10.00      645177            760.22
129024    1475   10.00      644466      0     760.43
129024           10.00      644466            760.43
129024    1056   10.00      969495      0     819.01
129024           10.00      969495            819.01
129024    2048   10.00      526325      0     862.31
129024           10.00      526325            862.31
129024    4096   10.00      277663      0     909.76
129024           10.00      277663            909.76
129024    8192   10.00      141111      0     924.70
129024           10.00      141111            924.70
129024   16834   10.00       68699      0     925.11
129024           10.00       68699            925.11
129024   32768   10.00       35372      0     927.20
129024           10.00       35372            927.20
Hi Mark, is this because of cancelling the tx timer? Since it is not a super
blocker, I tend to postpone it to 5.5.
(In reply to comment #11)
> Issue reproduced on kvm-83-94.el5; packets were dropped starting from
> message size 1473.
>
> Start VM with a virtio network interface.
>
> guest->host

This bug is not about guest->host UDP packets being dropped, it is about
guest->external UDP packets being dropped.

With guest->host, the guest can send packets faster than the host can receive
them and the host drops them. This is a known issue and the fix for this bug
does not help it.

With guest->external, without the fix for this bug, you'll see the dropped
packets accounted for in 'tc -s qdisc' output for the NIC whose txqueuelen
we're exceeding.

With guest->host, you'll see the dropped packets accounted for in the output
of:

  awk '/^Udp: / { print $4; }' /proc/net/snmp

This is Udp/InErrors and means that we are exceeding the receiver's socket
buffer (see net.core.rmem_default).

Please re-test guest->external and move back to VERIFIED if there hasn't been
a regression since comment #9.
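For reference, a rough C equivalent of that awk one-liner (purely
illustrative, not part of any patch; it assumes the RHEL-5-era /proc/net/snmp
layout where the Udp counters are InDatagrams, NoPorts, InErrors,
OutDatagrams):

/* Illustrative only: print the Udp InErrors counter from /proc/net/snmp,
 * i.e. the same value the awk one-liner above extracts. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *fp = fopen("/proc/net/snmp", "r");
    char line[1024];
    int udp_lines = 0;

    if (!fp) {
        perror("/proc/net/snmp");
        return 1;
    }

    /* The first "Udp:" line is the header, the second carries the values;
     * InErrors is the third counter on that line. */
    while (fgets(line, sizeof(line), fp)) {
        if (strncmp(line, "Udp:", 4) == 0 && ++udp_lines == 2) {
            unsigned long in_datagrams, no_ports, in_errors;
            if (sscanf(line, "Udp: %lu %lu %lu",
                       &in_datagrams, &no_ports, &in_errors) == 3)
                printf("Udp InErrors: %lu\n", in_errors);
            break;
        }
    }

    fclose(fp);
    return 0;
}

A count that keeps climbing during a guest->host run points at receiver-side
socket-buffer drops, as opposed to the qdisc drops that 'tc -s qdisc' reports
for guest->external.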
And just to explain further why shuang's figures look like a regression, but
they're not:

With kvm-83-90.el5 we see:

  129024   32768   10.00       20015      0     524.64
  129024           10.00       20015            524.64

i.e. the guest is only managing to send 524 Mbit/s to the host.

In kvm-83-94.el5 we removed the tx mitigation timer (bug #504647), allowing
the guest to send much, much faster:

  129024   32768   10.00       85726      0    2246.81
  129024           10.00        1121             29.38

except that because it's sending so fast now, the host is dropping heaps of
packets.

But again, the send buffer limit only helps guest->external, not guest->host.
The results above in comment #11 were for guest->external. I tested again:

1. Stop iptables on the external machine:

   # service iptables stop

2. On the external machine:

   [root@dhcp-66-70-31 ~]# sysctl net.bridge.bridge-nf-call-iptables=0
   net.bridge.bridge-nf-call-iptables = 0
   [root@dhcp-66-70-31 ~]# sysctl net.bridge.bridge-nf-call-iptables
   net.bridge.bridge-nf-call-iptables = 0

3. Start the VM on another machine with a virtio network interface and run:

   # for i in 32 64 128 256 512 1024 1278 1407 1472 1473 1475 2048 4096 8192 16834 32768; do netperf -t UDP_STREAM -f m -H 192.168.20.6 -P 0 -l 10 -- -m $i; done

Result:

129024      32   10.00     1884503      0      48.24
129024           10.00     1272852             32.58
129024      64   10.00     1810618      0      92.68
129024           10.00      959737             49.13
129024     128   10.00     1726797      0     176.81
129024           10.00      645245             66.07
129024     256   10.00     2394939      0     490.47
129024           10.00      388075             79.48
129024     512   10.00     2557121      0    1047.14
129024           10.00      217027             88.87
129024    1024   10.00      115435      0      94.56
129024           10.00      115435             94.56
129024    1278   10.00       93703      0      95.78
129024           10.00       93703             95.78
129024    1407   10.00       84475      0      95.07
129024           10.00       84475             95.07
129024    1472   10.00       81074      0      95.46
129024           10.00       81074             95.46
129024    1473   10.00     1388354      0    1635.83
129024           10.00       11133             13.12
udp_send: data send error: Message too long
129024    4096   10.00      911530      0    2986.44
129024           10.00        3170             10.39
129024    8192   10.00      477551      0    3129.31
129024           10.00         429              2.81
Started the VM on another machine with a virtio network interface and ran:

# for i in 32 64 128 256 512 1024 1278 1407 1472 1473 1475 2048 4096 8192 16834 32768; do netperf -t UDP_STREAM -f m -H 10.66.70.31 -P 0 -l 10 -- -m $i; done
Comment #11, comment #12 and this one were tested over a crossover connection,
on kvm-83-94.el5:

# for i in 32 64 128 256 512 1024 1278 1407 1472 1473 1475 2048 4096 8192 16834 32768; do netperf -t UDP_STREAM -f m -H 192.168.20.8 -P 0 -l 10 -- -m $i; done

129024      32   10.00     1570558      0      40.20
129024           10.00     1570558             40.20
129024      64   10.00     1615932      0      82.72
129024           10.00     1615932             82.72
129024     128   10.00     1405154      0     143.88
129024           10.00     1405154            143.88
129024     256   10.00     1325752      0     271.49
129024           10.00     1325752            271.49
129024     512   10.00     1355074      0     555.02
129024           10.00     1355074            555.02
129024    1024   10.00      993922      0     814.07
129024           10.00      993922            814.07
129024    1278   10.00      871561      0     890.93
129024           10.00      871561            890.93
129024    1407   10.00      821584      0     924.75
129024           10.00      821584            924.75
129024    1472   10.00      799208      0     941.09
129024           10.00      799208            941.09
129024    1473   10.00      926606      0    1091.74
129024           10.00      545836            643.11
129024    1056   10.00      970946      0     820.15
129024           10.00      970946            820.15
129024    2048   10.00      795668      0    1303.47
129024           10.00      272712            446.76
129024    4096   10.00      545874      0    1788.40
129024           10.00       83105            272.27
129024    8192   10.00      292311      0    1915.35
129024           10.00       35432            232.17
129024   16834   10.00      152800      0    2057.62
129024           10.00       11023            148.44
129024   32768   10.00       78703      0    2062.89
129024           10.00         686             17.98
(In reply to comment #16)
> 2. On the external machine:
>
>    [root@dhcp-66-70-31 ~]# sysctl net.bridge.bridge-nf-call-iptables=0
>    net.bridge.bridge-nf-call-iptables = 0
>    [root@dhcp-66-70-31 ~]# sysctl net.bridge.bridge-nf-call-iptables
>    net.bridge.bridge-nf-call-iptables = 0

Please run these two sysctl commands on the host - i.e. the machine the VM is
running on.

Also do the following:

1) On the external machine run:

   $> awk '/^Udp: / { print $4; }' /proc/net/snmp

2) On the host machine (i.e. the machine the VM is running on) run:

   $> tc -s qdisc

3) Run e.g.:

   $> netperf -t UDP_STREAM -f m -H 192.168.20.8 -P 0 -l 10 -- -m 16834

4) Repeat (1) and (2)
Stopped iptables and ran 'sysctl net.bridge.bridge-nf-call-iptables=0' on both
the host and the external machine.

# awk '/^Udp: / { print $4; }' /proc/net/snmp

1. Before transfer:

   Udp: InDatagrams NoPorts InErrors OutDatagrams
   Udp: 8 297 0 305

2. After transfer:

   Udp: InDatagrams NoPorts InErrors OutDatagrams
   Udp: 13306163 489 968 497

# tc -s qdisc

1. Before transfer:

   qdisc pfifo_fast 0: dev eth0 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    Sent 42216 bytes 345 pkt (dropped 0, overlimits 0 requeues 0)
    rate 0bit 0pps backlog 0b 0p requeues 0

2. After transfer:

   qdisc pfifo_fast 0: dev eth0 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    Sent 11277062809 bytes 17277606 pkt (dropped 776750, overlimits 0 requeues 0)
    rate 0bit 0pps backlog 0b 0p requeues 0
   qdisc pfifo_fast 0: dev tap0 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    Sent 31530 bytes 254 pkt (dropped 0, overlimits 0 requeues 0)
    rate 0bit 0pps backlog 0b 0p requeues 0

netperf results:

129024      32   10.00     2210611      0      56.58
129024           10.00     2210250             56.58
129024      64   10.00     2371876      0     121.42
129024           10.00     2371753            121.42
129024     128   10.00     1808021      0     185.12
129024           10.00     1808021            185.12
129024     256   10.00     1770731      0     362.62
129024           10.00     1770731            362.62
129024     512   10.00     2444342      0    1001.07
129024           10.00     1667592            682.95
129024    1024   10.00      991550      0     812.12
129024           10.00      991550            812.12
129024    1472   10.00      790218      0     930.40
129024           10.00      790218            930.40
129024    1473   10.00      646686      0     761.94
129024           10.00      646686            761.94
129024    2048   10.00      526457      0     862.43
129024           10.00      526457            862.43
129024    4096   10.00      277568      0     909.44
129024           10.00      277568            909.44
129024    8192   10.00      141145      0     924.82
129024           10.00      141145            924.82
129024   16834   10.00       68698      0     924.99
129024           10.00       68698            924.99
129024   32768   10.00       35486      0     930.09
129024           10.00       35486            930.09
lihuang makes a good point - bridge-nf-call-iptables=0 needs to be the default
for rhev-h. I'll file a new bug.

(In reply to comment #20)
> 129024     512   10.00     2444342      0    1001.07
> 129024           10.00     1667592            682.95

This data point is strange, but all the other data points show the fix is
working; I think we have enough to mark this as VERIFIED.
Setting to VERIFIED according to comment #20 and comment #21.
(In reply to comment #21)
> lihuang makes a good point - bridge-nf-call-iptables=0 needs to be the default
> for rhev-h. I'll file a new bug.

Filed as bug #514905
An advisory has been issued which should help the problem described in this
bug report. This report is therefore being closed with a resolution of ERRATA.
For more information on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report if the solution
does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1272.html