Bug 508861

Summary: kvm: add tap send buffer limit to help UDP networking
Product: Red Hat Enterprise Linux 5 Reporter: Mark McLoughlin <markmc>
Component: kvm Assignee: Mark McLoughlin <markmc>
Status: CLOSED ERRATA QA Contact: Lawrence Lim <llim>
Severity: medium Docs Contact:
Priority: high    
Version: 5.4 CC: herbert.xu, jiabwang, lihuang, mwagner, sghosh, shuang, syeghiay, tburke, tools-bugs, virt-maint, ykaul
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: kvm-83-90.el5 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-09-02 05:27:45 EDT Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Bug Depends On: 495863    
Bug Blocks:    
Attachments:
Description Flags
net-add-net-tap-sndbuf-with-a-sensible-default.patch none

Description Mark McLoughlin 2009-06-30 05:54:15 EDT
See also bug #495863 which contains the kernel side of this fix.

With a configuration like this:
 
 Net |           Host                 |      Guest
     |                                |
     |  +-----+  +--------+  +-----+  |  +------------+  +---------+
 ----|--+ NIC +--+ Bridge +--+ TAP |--|--+ virtio_net |--| UDP app |
     |  +-----+  +--------+  +-----+  |  +------------+  +---------+
     |                                |

When the UDP application blasts packets as fast as it can, the TX queue on the NIC overflows and packets get dropped just before they reach the NIC TX queue.

Currently, there is no way for the UDP app to be notified that packets are being dropped and that it should slow down.

The TUNSETSNDBUF ioctl() added recently allows qemu to set a limit on the packets which can be waiting to be sent. When this limit is hit, write() to the tap device returns EAGAIN and qemu can stop processing packets from the virtio_net queue, which in turn causes the UDP socket queue in the guest to fill up, which in turn causes the UDP app to block.
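The chain of backpressure described above can be illustrated with a toy model in Python. This is only a sketch of the idea, not QEMU's or the kernel's code: the class and function names are hypothetical, and the limit is counted in payload bytes for simplicity, which is an assumption about the accounting.

```python
import errno
from collections import deque


class ToyTap:
    """Toy tap device with a send-buffer limit (hypothetical model)."""

    def __init__(self, sndbuf):
        self.sndbuf = sndbuf      # limit on bytes waiting to be sent
        self.pending = deque()    # packets accepted but not yet on the wire
        self.pending_bytes = 0

    def write(self, packet):
        # Once the limit is hit, fail with EAGAIN instead of queueing.
        if self.pending_bytes + len(packet) > self.sndbuf:
            raise BlockingIOError(errno.EAGAIN, "tap send buffer full")
        self.pending.append(packet)
        self.pending_bytes += len(packet)
        return len(packet)

    def nic_sent_one(self):
        # The NIC transmitted one packet, so its buffer space is released.
        if self.pending:
            self.pending_bytes -= len(self.pending.popleft())


def flush_tx_queue(tx_queue, tap):
    """Move packets from the guest TX queue to the tap until EAGAIN."""
    sent = 0
    while tx_queue:
        try:
            tap.write(tx_queue[0])
        except BlockingIOError:
            break  # stop processing; the packet stays queued in the guest
        tx_queue.popleft()
        sent += 1
    return sent


tap = ToyTap(sndbuf=4096)
guest_queue = deque(bytes(1024) for _ in range(10))
first = flush_tx_queue(guest_queue, tap)   # fills the send buffer
tap.nic_sent_one()                         # one packet leaves via the NIC
second = flush_tx_queue(guest_queue, tap)  # only the freed space is reused
print(first, second, len(guest_queue))
```

Packets that are refused stay in the guest-side queue rather than being dropped, which is what eventually makes the sending application block.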

Testing this is quite straightforward:

  1) Run netperf -t UDP_STREAM -f m -H <dest_ip> -l 10

  2) Observe that without the patch, you see a result like:

       124928    1024   10.00      843206      0     690.69
       124928           10.00      115031             94.23

     i.e. >85% of the packets sent by netperf are dropped

  3) Check that /proc/sys/net/bridge/bridge-nf-call-iptables is set to zero

  4) Check that the txqueuelen on the physical NIC is set to 1000

  5) Retry the test with the patch applied and you should see e.g.

       124928    1024   10.00      115297      0      94.44
       124928           10.00      115297             94.44

     i.e. no packets are lost, the app in the guest is correctly constrained
     by the physical NIC in the host
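The drop rate quoted in step 2 follows directly from the two message counts. A minimal Python sketch, assuming the two-line UDP_STREAM layout shown above (sending side: socket size, message size, elapsed time, messages, errors, throughput; receiving side: socket size, elapsed time, messages, throughput); the function name is hypothetical:

```python
def udp_stream_loss(send_line, recv_line):
    """Return the fraction of sent messages that never reached the receiver."""
    sent = int(send_line.split()[3])      # message count on the sending side
    received = int(recv_line.split()[2])  # message count on the receiving side
    return (sent - received) / sent


without_patch = udp_stream_loss(
    "124928    1024   10.00      843206      0     690.69",
    "124928           10.00      115031             94.23",
)
with_patch = udp_stream_loss(
    "124928    1024   10.00      115297      0      94.44",
    "124928           10.00      115297             94.44",
)
print(f"lost without patch: {without_patch:.1%}, with patch: {with_patch:.1%}")
```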


The patch adds a sndbuf= parameter for '-net tap' and sets the default to 1048576.
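For reference, a sketch of how a process could apply such a limit to an already-open tap file descriptor from Python. The request number is derived from the TUNSETSNDBUF definition in linux/if_tun.h (_IOW('T', 212, int)) using the generic Linux ioctl encoding, which holds on x86 but not on every architecture. The helper names are hypothetical and this is not the code from the patch.

```python
import fcntl
import struct


def _iow(type_char, nr, size):
    """Generic Linux _IOW() encoding: direction | size | type | number."""
    ioc_write = 1
    return (ioc_write << 30) | (size << 16) | (ord(type_char) << 8) | nr


# From linux/if_tun.h: #define TUNSETSNDBUF _IOW('T', 212, int)
TUNSETSNDBUF = _iow("T", 212, struct.calcsize("i"))

# Default value named in this report.
DEFAULT_SNDBUF = 1048576


def set_tap_sndbuf(tap_fd, sndbuf=DEFAULT_SNDBUF):
    """Apply a send-buffer limit to an open tap device file descriptor."""
    fcntl.ioctl(tap_fd, TUNSETSNDBUF, struct.pack("i", sndbuf))


print(hex(TUNSETSNDBUF))
```

Calling set_tap_sndbuf() needs a descriptor obtained from /dev/net/tun after TUNSETIFF, which is outside the scope of this sketch.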

The sndbuf= parameter is upstream as of this commit:

  http://git.savannah.gnu.org/cgit/qemu.git/commit/?id=0df0ff6de7

I've just now sent a patch upstream to set a default value.

The proposed patch for 5.4 isn't a simple cherry-pick of the code from upstream, as upstream has an internal API for buffering which would have been too invasive to backport.
Comment 1 Mark McLoughlin 2009-06-30 06:02:46 EDT
Created attachment 349933
net-add-net-tap-sndbuf-with-a-sensible-default.patch
Comment 9 Suqin Huang 2009-07-20 05:41:06 EDT
1. command used:
#netperf -H 10.66.70.31 -t UDP_STREAM -l 10  -- -m 2048

2. Reproduced on kvm-83-83.el5, but not as severe as in comment #0

129024    1024   10.00      162582      0     133.17
129024           10.00      162274            132.91

129024    2048   10.00       93171      0     152.63
129024           10.00       93158            152.61

129024   65507   10.00        5041      0     264.12
129024           10.00        5026            263.33


3. check on kvm-83-90.el5

129024    1024   10.00      231046      0     189.26
129024           10.00      231046            189.26


129024    2048   10.00       99867      0     163.60
129024           10.00       99867            163.60


129024   65507   10.00        5272      0     276.22
129024           10.00        5272            276.22


Can I mark this issue as *VERIFIED*, based on these test results?
Comment 10 Mark McLoughlin 2009-07-22 08:51:51 EDT
shuang: yep, that looks good - no packets were dropped and performance was improved
Comment 11 Suqin Huang 2009-07-29 22:56:58 EDT
Issue reproduced on kvm-83-94.el5; packets were dropped from message size 1473 upwards

start vm with virtio network interface

guest->host

for i in 32 64 128 256 512 1024 1278 1407 1472 1473 1475 2048 4096 8192 16834 32768; do netperf -t UDP_STREAM -f m -H 192.168.20.6 -P 0 -l 10 -- -m $i; done


129024      32   10.00     1514106      0      38.76
129024           10.00     1514106             38.76

129024      64   10.00     1536076      0      78.64
129024           10.00     1536076             78.64

129024     128   10.00     1361436      0     139.40
129024           10.00     1361436            139.40

129024     256   10.00     1359981      0     278.51
129024           10.00     1359981            278.51

129024     512   10.00     1304934      0     534.37
129024           10.00     1304934            534.37

129024    1024   10.00      992948      0     813.29
129024           10.00      992948            813.29

129024    1278   10.00      867703      0     887.02
129024           10.00      867703            887.02

129024    1407   10.00      816792      0     919.26
129024           10.00      816792            919.26

129024    1472   10.00      793871      0     934.75
129024           10.00      793871            934.75

129024    1473   10.00      941008      0    1108.76
129024           10.00      551144            649.40

129024    1475   10.00      877103      0    1034.87
129024           10.00      505451            596.37

129024    2048   10.00      789477      0    1293.39
129024           10.00      276077            452.29

129024    4096   10.00      595492      0    1951.04
129024           10.00       89496            293.22

129024    8192   10.00      310392      0    2033.95
129024           10.00       38422            251.77

129024   16834   10.00      158573      0    2135.31
129024           10.00       12290            165.49

129024   32768   10.00       85726      0    2246.81
129024           10.00        1121             29.38




kvm-83-90.el5:

129024      32   10.00     1718361      0      43.99
129024           10.00     1718361             43.99

129024      64   10.00     1414834      0      72.43
129024           10.00     1414834             72.43

129024     128   10.00     1675115      0     171.51
129024           10.00     1675115            171.51

129024     256   10.00     1250219      0     256.00
129024           10.00     1250183            255.99

129024     512   10.00     1196838      0     490.18
129024           10.00     1196838            490.18

129024    1024   10.00      464854      0     380.74
129024           10.00      464854            380.74

129024    1278   10.00      390095      0     398.76
129024           10.00      390095            398.76

129024    1407   10.00      365242      0     411.04
129024           10.00      365242            411.04

129024    1472   10.00      365555      0     430.45
129024           10.00      365555            430.45

129024    1473   10.00      300405      0     353.93
129024           10.00      300405            353.93

129024    1475   10.00      299979      0     353.91
129024           10.00      299979            353.91

129024    2048   10.00      244993      0     401.37
129024           10.00      244993            401.37

129024    4096   10.00      130086      0     426.19
129024           10.00      130086            426.19

129024    8192   10.00      112190      0     735.19
129024           10.00      112190            735.19

129024   16834   10.00       35714      0     480.88
129024           10.00       35714            480.88

129024   32768   10.00       20015      0     524.64
129024           10.00       20015            524.64
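The 1473-byte threshold in the kvm-83-94.el5 run is consistent with IP fragmentation. The report does not state the MTU; assuming a standard 1500-byte Ethernet MTU, a 20-byte IPv4 header and an 8-byte UDP header leave 1472 bytes of payload in an unfragmented datagram. A small Python sketch of that arithmetic, where the MTU default and the helper name are assumptions:

```python
import math


def ipv4_fragments(udp_payload, mtu=1500, ip_header=20, udp_header=8):
    """Number of IPv4 fragments needed for one UDP datagram (no IP options)."""
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil((udp_payload + udp_header) / per_fragment)


for size in (1472, 1473, 2048, 32768):
    print(size, ipv4_fragments(size))
```

Losing any single fragment loses the whole datagram, which may be part of why the loss grows with message size in the figures above.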
Comment 12 Suqin Huang 2009-07-29 22:58:59 EDT
host->host:

129024      32   10.00     2836039      0      72.60
129024           10.00     2831649             72.49

129024      64   10.00     2825320      0     144.64
129024           10.00     2821296            144.44

129024     128   10.00     2099014      0     214.92
129024           10.00     2099014            214.92

129024     256   10.00     2065180      0     422.92
129024           10.00     2065180            422.92

129024     512   10.00     1618174      0     662.78
129024           10.00     1618174            662.78

129024    1024   10.00      993023      0     813.45
129024           10.00      993023            813.45

129024    1278   10.00      864798      0     884.16
129024           10.00      864798            884.16

129024    1407   10.00      813930      0     916.08
129024           10.00      813930            916.08

129024    1472   10.00      791482      0     931.97
129024           10.00      791482            931.97

129024    1473   10.00      645177      0     760.22
129024           10.00      645177            760.22

129024    1475   10.00      644466      0     760.43
129024           10.00      644466            760.43

129024    1056   10.00      969495      0     819.01
129024           10.00      969495            819.01

129024    2048   10.00      526325      0     862.31
129024           10.00      526325            862.31

129024    4096   10.00      277663      0     909.76
129024           10.00      277663            909.76

129024    8192   10.00      141111      0     924.70
129024           10.00      141111            924.70

129024   16834   10.00       68699      0     925.11
129024           10.00       68699            925.11

129024   32768   10.00       35372      0     927.20
129024           10.00       35372            927.20
Comment 13 Dor Laor 2009-07-30 03:55:26 EDT
Hi Mark, is it because of cancelling the tx timer?
Since it is not a hard blocker, I am inclined to postpone it to 5.5
Comment 14 Mark McLoughlin 2009-07-30 04:48:07 EDT
(In reply to comment #11)
> issue reproduce on kvm-83-94.el5, packets were dropped from 1473
> 
> start vm with virtio network interface
> 
> guest->host

This bug is not about guest->host UDP packets being dropped, it is about guest->external UDP packets being dropped

With guest->host, the guest can send packets faster than the host can receive them and the host drops them. This is a known issue and the fix for this bug does not help it.

With guest->external, without the fix for this bug, you'll see the dropped packets accounted for in 'tc -s qdisc' output for the NIC whose txqueuelen we're exceeding

With guest->host, you'll see the dropped packets accounted for in the output of 'awk '/^Udp: / { print $4; }' /proc/net/snmp'. This is Udp/InErrors and means that we are exceeding the receiver's socket buffer (see net.core.rmem_default)
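The awk one-liner above prints the fourth whitespace-separated field of the Udp: lines, which is the InErrors column. A rough Python equivalent that keeps the column names is sketched below. The helper name is hypothetical, the sample values are the "after transfer" figures reported later in comment #20, and kernels that expose more Udp columns are simply carried along.

```python
def udp_counters(snmp_text):
    """Map Udp column names to values from /proc/net/snmp-style text."""
    rows = [line.split()[1:] for line in snmp_text.splitlines()
            if line.startswith("Udp: ")]
    names, values = rows[0], rows[1]  # header row, then value row
    return dict(zip(names, map(int, values)))


sample = """\
Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 13306163 489 968 497
"""
counters = udp_counters(sample)
print(counters["InErrors"])
```

On a live system the same function could be fed the contents of /proc/net/snmp read before and after a test run.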

Please re-test guest->external and move back to VERIFIED if there hasn't been a regression since comment #9
Comment 15 Mark McLoughlin 2009-07-30 05:05:32 EDT
And just to explain further why shuang's figures look like a regression but are not:

with kvm-83-90.el5 we see:

129024   32768   10.00       20015      0     524.64
129024           10.00       20015            524.64

i.e. the guest is only managing to send 524Mbit/s to the host

in kvm-83-94.el5 we removed the tx mitigation timer (bug #504647) allowing the guest to send much much faster:

129024   32768   10.00       85726      0    2246.81
129024           10.00        1121             29.38

except that because it's sending so fast now, the host is dropping heaps of packets

But again, the send buffer limit only helps guest->external, not guest->host
Comment 16 Suqin Huang 2009-07-30 23:28:22 EDT
The results above in comment #11 were tested guest->external.

I tested again:
1. stop iptables on external machine: #service iptables stop
2. on external machine:  
[root@dhcp-66-70-31 ~]# sysctl net.bridge.bridge-nf-call-iptables=0
net.bridge.bridge-nf-call-iptables = 0
[root@dhcp-66-70-31 ~]# sysctl net.bridge.bridge-nf-call-iptables
net.bridge.bridge-nf-call-iptables = 0

3. Start the VM on another machine with a virtio network interface and run:
#for i in 32 64 128 256 512 1024 1278 1407 1472 1473 1475 2048 4096 8192 16834 32768; do netperf -t UDP_STREAM -f m -H 192.168.20.6 -P 0 -l 10 -- -m $i; done

result:
129024      32   10.00     1884503      0      48.24
129024           10.00     1272852             32.58

129024      64   10.00     1810618      0      92.68
129024           10.00      959737             49.13

129024     128   10.00     1726797      0     176.81
129024           10.00      645245             66.07

129024     256   10.00     2394939      0     490.47
129024           10.00      388075             79.48

129024     512   10.00     2557121      0    1047.14
129024           10.00      217027             88.87

129024    1024   10.00      115435      0      94.56
129024           10.00      115435             94.56

129024    1278   10.00       93703      0      95.78
129024           10.00       93703             95.78

129024    1407   10.00       84475      0      95.07
129024           10.00       84475             95.07

129024    1472   10.00       81074      0      95.46
129024           10.00       81074             95.46

129024    1473   10.00     1388354      0    1635.83
129024           10.00       11133             13.12

udp_send: data send error: Message too long
129024    4096   10.00      911530      0    2986.44
129024           10.00        3170             10.39

129024    8192   10.00      477551      0    3129.31
129024           10.00         429              2.81
Comment 17 Suqin Huang 2009-07-30 23:31:30 EDT
Start the VM on another machine with a virtio network interface and run:
#for i in 32 64 128 256 512 1024 1278 1407 1472 1473 1475 2048 4096 8192 16834 32768; do netperf -t UDP_STREAM -f m -H 10.66.70.31 -P 0 -l 10 -- -m $i; done
Comment 18 Suqin Huang 2009-07-31 00:46:12 EDT
The results in comment #11, comment #12 and this comment were tested with crossover

kvm-83-94.el5

#for i in 32 64 128 256 512 1024 1278 1407 1472 1473 1475 2048 4096 8192 16834 32768; do netperf -t UDP_STREAM -f m -H 192.168.20.8 -P 0 -l 10 -- -m $i; done

129024      32   10.00     1570558      0      40.20
129024           10.00     1570558             40.20

129024      64   10.00     1615932      0      82.72
129024           10.00     1615932             82.72

129024     128   10.00     1405154      0     143.88
129024           10.00     1405154            143.88

129024     256   10.00     1325752      0     271.49
129024           10.00     1325752            271.49

129024     512   10.00     1355074      0     555.02
129024           10.00     1355074            555.02

129024    1024   10.00      993922      0     814.07
129024           10.00      993922            814.07

129024    1278   10.00      871561      0     890.93
129024           10.00      871561            890.93

129024    1407   10.00      821584      0     924.75
129024           10.00      821584            924.75

129024    1472   10.00      799208      0     941.09
129024           10.00      799208            941.09

129024    1473   10.00      926606      0    1091.74
129024           10.00      545836            643.11

129024    1056   10.00      970946      0     820.15
129024           10.00      970946            820.15

129024    2048   10.00      795668      0    1303.47
129024           10.00      272712            446.76

129024    4096   10.00      545874      0    1788.40
129024           10.00       83105            272.27

129024    8192   10.00      292311      0    1915.35
129024           10.00       35432            232.17

129024   16834   10.00      152800      0    2057.62
129024           10.00       11023            148.44

129024   32768   10.00       78703      0    2062.89
129024           10.00         686             17.98
Comment 19 Mark McLoughlin 2009-07-31 03:44:05 EDT
(In reply to comment #16)
> 2. on external machine:  
> [root@dhcp-66-70-31 ~]# sysctl net.bridge.bridge-nf-call-iptables=0
> net.bridge.bridge-nf-call-iptables = 0
> [root@dhcp-66-70-31 ~]# sysctl net.bridge.bridge-nf-call-iptables
> net.bridge.bridge-nf-call-iptables = 0

Please run these two sysctl commands on the host - i.e. the machine the VM is running on

Also do the following:

  1) On the external machine run:

       $> awk '/^Udp: / { print $4; }' /proc/net/snmp

  2) On the host machine (i.e. the machine the vm is running on) run:

       $> tc -s qdisc

  3) Run e.g.

       $> netperf -t UDP_STREAM -f m -H 192.168.20.8 -P 0 -l 10 -- -m 16834

  4) Repeat (1) and (2)
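What matters in this procedure is the difference between the counters before and after the run. For the tc side, a minimal Python sketch that extracts the per-device dropped count from 'tc -s qdisc' output is shown below. It assumes the output layout seen in this report (a "qdisc ... dev NAME" line followed by a "Sent ... (dropped N, ...)" line). The function names are hypothetical, and the sample text is abbreviated from the output reported in comment #20.

```python
import re


def qdisc_drops(tc_output):
    """Return {device: dropped packet count} parsed from `tc -s qdisc` text."""
    drops, dev = {}, None
    for line in tc_output.splitlines():
        header = re.match(r"qdisc \S+ \S+ dev (\S+)", line)
        if header:
            dev = header.group(1)  # remember which device the stats belong to
            continue
        stats = re.search(r"\(dropped (\d+),", line)
        if stats and dev is not None:
            drops[dev] = drops.get(dev, 0) + int(stats.group(1))
    return drops


def drop_delta(before, after):
    """Drops that occurred between two snapshots, per device."""
    return {dev: n - before.get(dev, 0) for dev, n in after.items()}


before = qdisc_drops("""\
qdisc pfifo_fast 0: dev eth0 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 42216 bytes 345 pkt (dropped 0, overlimits 0 requeues 0)
""")
after = qdisc_drops("""\
qdisc pfifo_fast 0: dev eth0 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 11277062809 bytes 17277606 pkt (dropped 776750, overlimits 0 requeues 0)
qdisc pfifo_fast 0: dev tap0 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 31530 bytes 254 pkt (dropped 0, overlimits 0 requeues 0)
""")
delta = drop_delta(before, after)
print(delta)
```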
Comment 20 Suqin Huang 2009-07-31 07:38:28 EDT
Stopped iptables and ran sysctl net.bridge.bridge-nf-call-iptables=0 on both the host and the external machine.


#awk '/^Udp: / { print $4; }' /proc/net/snmp

1. before transfer
Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 8 297 0 305


2. after transfer
Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 13306163 489 968 497


#tc -s qdisc

1. before transfer
qdisc pfifo_fast 0: dev eth0 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 42216 bytes 345 pkt (dropped 0, overlimits 0 requeues 0) 
 rate 0bit 0pps backlog 0b 0p requeues 0 


2. after transfer

qdisc pfifo_fast 0: dev eth0 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 11277062809 bytes 17277606 pkt (dropped 776750, overlimits 0 requeues 0) 
 rate 0bit 0pps backlog 0b 0p requeues 0 
qdisc pfifo_fast 0: dev tap0 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 31530 bytes 254 pkt (dropped 0, overlimits 0 requeues 0) 
 rate 0bit 0pps backlog 0b 0p requeues 0 



129024      32   10.00     2210611      0      56.58
129024           10.00     2210250             56.58

129024      64   10.00     2371876      0     121.42
129024           10.00     2371753            121.42

129024     128   10.00     1808021      0     185.12
129024           10.00     1808021            185.12

129024     256   10.00     1770731      0     362.62
129024           10.00     1770731            362.62

129024     512   10.00     2444342      0    1001.07
129024           10.00     1667592            682.95

129024    1024   10.00      991550      0     812.12
129024           10.00      991550            812.12

129024    1472   10.00      790218      0     930.40
129024           10.00      790218            930.40

129024    1473   10.00      646686      0     761.94
129024           10.00      646686            761.94

129024    2048   10.00      526457      0     862.43
129024           10.00      526457            862.43

129024    4096   10.00      277568      0     909.44
129024           10.00      277568            909.44

129024    8192   10.00      141145      0     924.82
129024           10.00      141145            924.82

129024   16834   10.00       68698      0     924.99
129024           10.00       68698            924.99

129024   32768   10.00       35486      0     930.09
129024           10.00       35486            930.09
Comment 21 Mark McLoughlin 2009-07-31 08:54:12 EDT
lihuang makes a good point - bridge-nf-call-iptables=0 needs to be the default for rhev-h. I'll file a new bug

(In reply to comment #20)
 
> 129024     512   10.00     2444342      0    1001.07
> 129024           10.00     1667592            682.95

This data point is strange, but all the other data points show the fix is working. I think we have enough to mark this as VERIFIED
Comment 22 lihuang 2009-07-31 09:05:20 EDT
Setting to *VERIFIED* according to comment #20 and comment #21.
Comment 23 Mark McLoughlin 2009-07-31 09:07:25 EDT
(In reply to comment #21)
> lihuang makes a good point - bridge-nf-call-iptables=0 needs to be the default
> for rhev-h. I'll file a new bug

Filed as bug #514905
Comment 25 errata-xmlrpc 2009-09-02 05:27:45 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1272.html