Bug 2061622 - ovn qos doesn't work on rt-kernel
Summary: ovn qos doesn't work on rt-kernel
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn-2021
Version: FDP 22.B
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: OVN Team
QA Contact: LiLiang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-03-08 03:30 UTC by LiLiang
Modified: 2022-03-15 00:31 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-15 00:31:43 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-1821 0 None None None 2022-03-08 03:32:21 UTC

Description LiLiang 2022-03-08 03:30:28 UTC
Description of problem:
ovn qos rate limitation doesn't work on rt-kernel

Version-Release number of selected component (if applicable):
[root@dell-per740-07 basic]# uname -r
4.18.0-369.rt7.154.el8.x86_64

[root@dell-per740-07 basic]# rpm -qa|grep ovn
ovn-2021-host-21.12.0-30.el8fdp.x86_64
ovn-2021-21.12.0-30.el8fdp.x86_64
ovn-2021-central-21.12.0-30.el8fdp.x86_64

[root@dell-per740-07 basic]# rpm -qa|grep openv
openvswitch-selinux-extra-policy-1.0-29.el8fdp.noarch
python3-openvswitch2.16-2.16.0-58.el8fdp.x86_64
openvswitch2.16-2.16.0-58.el8fdp.x86_64


How reproducible:
always

Steps to Reproduce:
1. create two vms on a host
```
# define default vnet
virsh net-define /usr/share/libvirt/networks/default.xml
virsh net-start default
virsh net-autostart default

# define vm name and mac
vm_name=v0
mac4vm=a4:a4:a4:a4:a4:a0

# download image
wget http://netqe-bj.usersys.redhat.com/share/vms/rhel7.9-rt.qcow2 -O /var/lib/libvirt/images/$vm_name.qcow2

# install vm
virt-install \
        --name $vm_name \
        --vcpus=2 \
        --ram=2048 \
        --disk path=/var/lib/libvirt/images/$vm_name.qcow2,device=disk,bus=virtio,format=qcow2 \
        --network bridge=virbr0,model=virtio,mac=$mac4vm \
        --boot hd \
        --accelerate \
        --graphics vnc,listen=0.0.0.0 \
        --force \
        --os-type=linux \
        --noautoconsole

# define vm name and mac
vm_name=v1
mac4vm=a4:a4:a4:a4:a4:a1

# download image
wget http://netqe-bj.usersys.redhat.com/share/vms/rhel7.9-rt.qcow2 -O /var/lib/libvirt/images/$vm_name.qcow2

# install vm
virt-install \
        --name $vm_name \
        --vcpus=2 \
        --ram=2048 \
        --disk path=/var/lib/libvirt/images/$vm_name.qcow2,device=disk,bus=virtio,format=qcow2 \
        --network bridge=virbr0,model=virtio,mac=$mac4vm \
        --boot hd \
        --accelerate \
        --graphics vnc,listen=0.0.0.0 \
        --force \
        --os-type=linux \
        --noautoconsole
```

2. start ovn and attach virtual ports to vms
```
systemctl start openvswitch
systemctl start ovn-northd
ovn-sbctl set-connection ptcp:6642
ovn-nbctl set-connection ptcp:6641

ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=tcp:20.0.32.25:6642
ovs-vsctl set Open_vSwitch . external-ids:ovn-encap-type=geneve
ovs-vsctl set Open_vSwitch . external-ids:ovn-encap-ip=20.0.32.25
systemctl restart ovn-controller


mac_v0_vnet1=04:ac:10:ff:01:94
mac_v0_vnet2=04:ac:10:ff:01:95
mac_v1_vnet1=04:ac:10:ff:01:96
mac_v1_vnet2=04:ac:10:ff:01:97

cat <<-EOF > v0-vnet1.xml 
<interface type='bridge'>
        <target dev='h0_v0_vnet1'/>
        <mac address='${mac_v0_vnet1}'/>
        <source bridge='br-int'/>
        <virtualport type='openvswitch'/>
        <model type='virtio'/>
</interface>
EOF


cat <<-EOF > v0-vnet2.xml 
<interface type='bridge'>
        <target dev='h0_v0_vnet2'/>
        <mac address='${mac_v0_vnet2}'/>
        <source bridge='br-int'/>
        <virtualport type='openvswitch'/>
        <model type='virtio'/>
</interface>
EOF

cat <<-EOF > v1-vnet1.xml 
<interface type='bridge'>
        <target dev='h0_v1_vnet1'/>
        <mac address='${mac_v1_vnet1}'/>
        <source bridge='br-int'/>
        <virtualport type='openvswitch'/>
        <model type='virtio'/>
</interface>
EOF

cat <<-EOF > v1-vnet2.xml 
<interface type='bridge'>
        <target dev='h0_v1_vnet2'/>
        <mac address='${mac_v1_vnet2}'/>
        <source bridge='br-int'/>
        <virtualport type='openvswitch'/>
        <model type='virtio'/>
</interface>
EOF

virsh attach-device v0 v0-vnet1.xml
virsh attach-device v0 v0-vnet2.xml
virsh attach-device v1 v1-vnet1.xml
virsh attach-device v1 v1-vnet2.xml

ovs-vsctl set interface h0_v0_vnet1 external-ids:iface-id=h0_v0_vnet1
ovs-vsctl set interface h0_v0_vnet2 external-ids:iface-id=h0_v0_vnet2
ovs-vsctl set interface h0_v1_vnet1 external-ids:iface-id=h0_v1_vnet1
ovs-vsctl set interface h0_v1_vnet2 external-ids:iface-id=h0_v1_vnet2
```

3. create ovn topology
```
mac_h0_v0_vnet1=04:ac:10:ff:01:94
mac_h0_v0_vnet2=04:ac:10:ff:01:95
mac_h0_v1_vnet1=04:ac:10:ff:01:96
mac_h0_v1_vnet2=04:ac:10:ff:01:97
mac_h1_v0_vnet1=02:ac:10:ff:01:94
mac_h1_v0_vnet2=02:ac:10:ff:01:95
mac_h1_v1_vnet1=02:ac:10:ff:01:96
mac_h1_v1_vnet2=02:ac:10:ff:01:97

# add logical switch
ovn-nbctl ls-add ls1 -- add Logical_Switch ls1 other_config subnet=172.16.1.0/24
ovn-nbctl ls-add ls2 -- add Logical_Switch ls2 other_config subnet=172.16.2.0/24

# setup ls ipv6_prefix
ovn-nbctl set Logical-switch ls1 other_config:ipv6_prefix=2001:db8:1::0
ovn-nbctl set Logical-switch ls2 other_config:ipv6_prefix=2001:db8:2::0

# create dhcp_options
dhcp_options1=$(ovn-nbctl create DHCP_Options cidr=172.16.1.0/24 \
        options="\"server_id\"=\"172.16.1.254\" \"server_mac\"=\"00:00:00:00:01:00\" \
        \"lease_time\"=\"$((36000 + RANDOM % 3600))\" \"router\"=\"172.16.1.254\" \"dns_server\"=\"172.16.1.254\"")
dhcp_options2=$(ovn-nbctl create DHCP_Options cidr=172.16.2.0/24 \
        options="\"server_id\"=\"172.16.2.254\" \"server_mac\"=\"00:00:00:00:02:00\" \
        \"lease_time\"=\"$((36000 + RANDOM % 3600))\" \"router\"=\"172.16.2.254\" \"dns_server\"=\"172.16.2.254\"")

dhcpv6_options1=$(ovn-nbctl create DHCP_Options cidr="2001\:db8\:1\:\:0/64" \
                                options="\"server_id\"=\"00:00:00:00:01:00\" \"dns_server\"=\"2001:db8:1::254\"")
dhcpv6_options2=$(ovn-nbctl create DHCP_Options cidr="2001\:db8\:2\:\:0/64" \
                                options="\"server_id\"=\"00:00:00:00:02:00\" \"dns_server\"=\"2001:db8:2::254\"")

# create logical switch port and setup dhcp_option
lsp_name=h0_v0_vnet1
mac=$mac_h0_v0_vnet1
ovn-nbctl lsp-add ls1 $lsp_name
ovn-nbctl lsp-set-addresses $lsp_name "$mac dynamic"
ovn-nbctl lsp-set-dhcpv4-options $lsp_name ${dhcp_options1}
ovn-nbctl add Logical_Switch_Port $lsp_name dhcpv6_options ${dhcpv6_options1}

lsp_name=h0_v0_vnet2
mac=$mac_h0_v0_vnet2
ovn-nbctl lsp-add ls2 $lsp_name
ovn-nbctl lsp-set-addresses $lsp_name "$mac dynamic"
ovn-nbctl lsp-set-dhcpv4-options $lsp_name ${dhcp_options2}
ovn-nbctl add Logical_Switch_Port $lsp_name dhcpv6_options ${dhcpv6_options2}

lsp_name=h0_v1_vnet1
mac=$mac_h0_v1_vnet1
ovn-nbctl lsp-add ls1 $lsp_name
ovn-nbctl lsp-set-addresses $lsp_name "$mac dynamic"
ovn-nbctl lsp-set-dhcpv4-options $lsp_name ${dhcp_options1}
ovn-nbctl add Logical_Switch_Port $lsp_name dhcpv6_options ${dhcpv6_options1}

lsp_name=h0_v1_vnet2
mac=$mac_h0_v1_vnet2
ovn-nbctl lsp-add ls2 $lsp_name
ovn-nbctl lsp-set-addresses $lsp_name "$mac dynamic"
ovn-nbctl lsp-set-dhcpv4-options $lsp_name ${dhcp_options2}
ovn-nbctl add Logical_Switch_Port $lsp_name dhcpv6_options ${dhcpv6_options2}
```

4. create qos, limit rate to 500 Mbps for traffic from h0_v0_vnet1 to other ports
```
mac_v0_vnet1=04:ac:10:ff:01:94
mac_v0_vnet2=04:ac:10:ff:01:95
mac_v1_vnet1=04:ac:10:ff:01:96
mac_v1_vnet2=04:ac:10:ff:01:97
ip_v0_vnet1=172.16.1.2
ip_v1_vnet1=172.16.1.3

ofport1=$(ovs-vsctl --bare --columns=ofport find  interface external-ids:attached-mac=\"$mac_v0_vnet1\")
ofport2=$(ovs-vsctl --bare --columns=ofport find  interface external-ids:attached-mac=\"$mac_v1_vnet1\")
echo "openflow port for h0_v0_vnet1 is $ofport1"
echo "openflow port for h0_v1_vnet1 is $ofport2"


qos_id=`ovn-nbctl --wait=hv -- --id=@lp1-qos create QoS priority=100 action=dscp=48 match="inport\=\=\"h0_v0_vnet1\"\ &&\ is_chassis_resident(\"h0_v0_vnet1\")" direction="from-lport" -- set Logical_Switch ls1 qos_rules=@lp1-qos`

# rate is in kbps (500000 kbps = 500 Mbps); burst is in kilobits
ovn-nbctl --wait=hv set QoS $qos_id bandwidth=rate=500000,burst=4294967295
```

5. check qos using iperf3 in vm h0_v0
[root@dell-per740-07 ~]# virsh console v0
Connected to domain 'v0'
Escape character is ^] (Ctrl + ])

[root@localhost ~]# iperf3 -c 172.16.1.3
Connecting to host 172.16.1.3, port 5201
[  4] local 172.16.1.2 port 33982 connected to 172.16.1.3 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  2.19 GBytes  18.8 Gbits/sec    0   2.77 MBytes       
[  4]   1.00-2.00   sec  2.48 GBytes  21.3 Gbits/sec    0   2.99 MBytes       
[  4]   2.00-3.00   sec  2.09 GBytes  17.9 Gbits/sec    0   3.01 MBytes       
[  4]   3.00-4.00   sec  2.20 GBytes  18.9 Gbits/sec    0   3.01 MBytes       
[  4]   4.00-5.00   sec  1.59 GBytes  13.7 Gbits/sec    0   3.01 MBytes       
[  4]   5.00-6.00   sec  1.60 GBytes  13.7 Gbits/sec    0   3.01 MBytes       
[  4]   6.00-7.00   sec  1.56 GBytes  13.4 Gbits/sec    0   3.01 MBytes       
[  4]   7.00-8.00   sec  1.62 GBytes  13.9 Gbits/sec    0   3.01 MBytes       
[  4]   8.00-9.00   sec  1.90 GBytes  16.3 Gbits/sec    0   3.01 MBytes       
[  4]   9.00-10.00  sec  1.89 GBytes  16.3 Gbits/sec    0   3.01 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  19.1 GBytes  16.4 Gbits/sec    0             sender
[  4]   0.00-10.00  sec  19.1 GBytes  16.4 Gbits/sec                  receiver


Actual results:
rate limitation doesn't work

Expected results:
rate limitation works


Additional info:

Comment 1 LiLiang 2022-03-08 03:58:27 UTC
[root@dell-per740-07 ~]# ovn-nbctl show
switch 5c89c144-2e96-41af-9a34-0af86f7c6236 (ls1)
    port h0_v1_vnet1
        addresses: ["04:ac:10:ff:01:96 dynamic"]
    port h0_v0_vnet1
        addresses: ["04:ac:10:ff:01:94 dynamic"]
switch 1d4ae76e-9b11-45ba-a63b-47f25daeaadf (ls2)
    port h0_v0_vnet2
        addresses: ["04:ac:10:ff:01:95 dynamic"]
    port h0_v1_vnet2
        addresses: ["04:ac:10:ff:01:97 dynamic"]

[root@dell-per740-07 ~]# ovn-sbctl show
Chassis "3f5a199a-1b5c-4490-a762-6ce434d67a85"
    hostname: dell-per740-07.rhts.eng.pek2.redhat.com
    Encap geneve
        ip: "20.0.32.25"
        options: {csum="true"}
    Port_Binding h0_v1_vnet1
    Port_Binding h0_v0_vnet1
    Port_Binding h0_v0_vnet2
    Port_Binding h0_v1_vnet2

[root@dell-per740-07 basic]# ovn-nbctl list qos
_uuid               : 580a606c-b0af-4356-ad71-4010c0dcb99b
action              : {dscp=48}
bandwidth           : {burst=4294967295, rate=500000}
direction           : from-lport
external_ids        : {}
match               : "inport==\"h0_v0_vnet1\" && is_chassis_resident(\"h0_v0_vnet1\")"
priority            : 100

[root@dell-per740-07 ~]# ovn-nbctl qos-list ls1
from-lport   100 (inport=="h0_v0_vnet1" && is_chassis_resident("h0_v0_vnet1")) rate=500000 burst=4294967295 dscp=48

Comment 2 Ilya Maximets 2022-03-08 12:01:40 UTC
"burst" is the amount of traffic allowed to pass through without
being rate-limited.  Burst size here is about 4.3 Tbit (roughly
537 GB).  Such a configuration doesn't make a lot of sense.
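
To put that burst value in perspective, here is a rough back-of-the-envelope sketch. It assumes 1 kilobit = 1000 bits and a simple full-bucket model; the helper names are made up for illustration, and the 16400000 kbps send rate is taken from the iperf3 average in the description.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; not part of OVN or OVS.

# burst_bytes BURST_KBIT
# Convert a burst value in kilobits to bytes (assumes 1 kilobit = 1000 bits).
burst_bytes() { echo $(( $1 * 1000 / 8 )); }

# drain_seconds BURST_KBIT SEND_KBPS RATE_KBPS
# Whole seconds needed to empty a full bucket when sending at SEND_KBPS
# while tokens are refilled at RATE_KBPS. Assumes SEND_KBPS > RATE_KBPS.
drain_seconds() { echo $(( $1 / ($2 - $3) )); }

# burst=4294967295 kilobits, as configured in step 4
burst_bytes 4294967295                      # size of the burst in bytes

# ~16.4 Gbit/s observed by iperf3, rate=500000 kbps
drain_seconds 4294967295 16400000 500000    # seconds before the limit kicks in
```

Under these assumptions the burst is about 537 GB and would take roughly 270 seconds to use up at the observed speed, so a 10 second iperf3 run would never reach the point where the rate is enforced.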

Comment 3 LiLiang 2022-03-09 01:10:25 UTC
(In reply to Ilya Maximets from comment #2)
> "burst" is the amount of traffic allowed to pass through without
> being rate-limited.  Burst size here is 4TB.  Such configuration
> doesn't make a lot of sense.

Ilya,

bandwidth: map of string-integer pairs, key either burst or rate, value
       in range 1 to 4,294,967,295
              When specified, matching packets will  have  bandwidth  metering
              applied. Traffic over the limit will be dropped.

              ·      rate: The value of rate limit in kbps.

              ·      burst: The value of burst rate limit in kilobits. This is
                     optional and needs to specify the rate.

From this doc, I see burst should be a rate, not an amount of traffic.

And when I set rate to 5M and burst to 100M, the rate is limited to 5M correctly:
# ovn-nbctl --wait=hv set QoS 792b3cd8-f706-4ffd-8c82-b5b59f17fd31 bandwidth=rate=5000,burst=100000
# iperf3 -c 172.16.1.3 -t 10
Connecting to host 172.16.1.3, port 5201
[  4] local 172.16.1.2 port 39684 connected to 172.16.1.3 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  13.2 MBytes   110 Mbits/sec  265   45.2 KBytes       
[  4]   1.00-2.00   sec   764 KBytes  6.26 Mbits/sec  108   18.4 KBytes       
[  4]   2.00-3.00   sec   382 KBytes  3.13 Mbits/sec   60   18.4 KBytes       
[  4]   3.00-4.00   sec   764 KBytes  6.26 Mbits/sec   51   18.4 KBytes       
[  4]   4.00-5.00   sec   382 KBytes  3.13 Mbits/sec   60   18.4 KBytes       
[  4]   5.00-6.00   sec   764 KBytes  6.26 Mbits/sec   60   18.4 KBytes       
[  4]   6.00-7.00   sec   382 KBytes  3.13 Mbits/sec   48   18.4 KBytes       
[  4]   7.00-8.00   sec   764 KBytes  6.26 Mbits/sec   61   18.4 KBytes       
[  4]   8.00-9.00   sec   764 KBytes  6.26 Mbits/sec   60   18.4 KBytes       
[  4]   9.00-10.00  sec   382 KBytes  3.13 Mbits/sec   48   18.4 KBytes 
From the iperf3 output we can see that the bandwidth in the first interval reaches 110Mbps, which is close to our burst value of 100Mbps.
The bandwidth in the following intervals is close to 5Mbps.

But if I set a very large burst value, the rate limitation doesn't work.
# ovn-nbctl --wait=hv set QoS 792b3cd8-f706-4ffd-8c82-b5b59f17fd31 bandwidth=rate=5000,burst=2000000000
# iperf3 -c 172.16.1.3 -t 10
Connecting to host 172.16.1.3, port 5201
[  4] local 172.16.1.2 port 39688 connected to 172.16.1.3 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  2.66 GBytes  22.8 Gbits/sec    0   3.00 MBytes       
[  4]   1.00-2.00   sec  3.17 GBytes  27.2 Gbits/sec    0   3.02 MBytes       
[  4]   2.00-3.00   sec  3.15 GBytes  27.1 Gbits/sec    0   3.02 MBytes       
[  4]   3.00-4.00   sec  2.78 GBytes  23.9 Gbits/sec    0   3.02 MBytes       
[  4]   4.00-5.00   sec  2.22 GBytes  19.1 Gbits/sec    0   3.02 MBytes       
[  4]   5.00-6.00   sec  2.44 GBytes  21.0 Gbits/sec    0   3.02 MBytes       
[  4]   6.00-7.00   sec  2.37 GBytes  20.4 Gbits/sec    0   3.02 MBytes       
[  4]   7.00-8.00   sec  2.24 GBytes  19.2 Gbits/sec    0   3.02 MBytes       
[  4]   8.00-9.00   sec  2.22 GBytes  19.0 Gbits/sec    0   3.02 MBytes       
[  4]   9.00-10.00  sec  2.33 GBytes  20.0 Gbits/sec    0   3.02 MBytes

Comment 4 Ilya Maximets 2022-03-09 12:06:20 UTC
(In reply to LiLiang from comment #3)
> bandwidth: map of string-integer pairs, key either burst or rate, value
>        in range 1 to 4,294,967,295
>               When specified, matching packets will  have  bandwidth 
>               metering applied. Traffic over the limit will be dropped.
>               ·      rate: The value of rate limit in kbps.
>               ·      burst: The value of burst rate limit in kilobits. This
>                      is optional and needs to specify the rate.
> From this doc, I see burst should be a rate, not a amount of traffic.

In the doc above it's stated that burst is in kilobits, not in kbps.
So it's an amount, not a rate.

If burst were a rate and rate is also a rate, what would the difference be?

In general in various metering and policing implementations, burst
size means the granularity of the meter.  For example, OpenFlow spec
says: 

  It defines the granularity of the meter band, for all packet or byte
  bursts whose length is greater than burst value, the meter rate will
  always be strictly enforced.

https://opennetworking.org/wp-content/uploads/2014/10/openflow-switch-v1.5.1.pdf

Meaning the rate may not be enforced if the traffic burst didn't
exceed the burst size.

Linux traffic policers work the same way (and they are used by OVN, IIUC).

In practice some variation of a token bucket algorithm is used, with
the burst size being the size of the bucket and the rate being the rate
of regeneration of tokens.  Algorithms typically start with a full
bucket, meaning that until the initial burst is exhausted, the rate
limit is not applied.  That matches your observations.
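
As a rough illustration of the token bucket behaviour described above, here is a minimal per-second model in plain bash. It is a sketch, not the actual OVS or kernel meter code: real meters work at a much finer time granularity and TCP backs off when packets are dropped. The `simulate` function and the 20000000 kbps offered load are assumptions chosen to resemble the unthrottled iperf3 numbers in this bug.

```shell
#!/usr/bin/env bash
# Idealized token bucket; all quantities are in kilobits or kbps.
# simulate RATE_KBPS BURST_KBIT OFFERED_KBPS SECONDS
# Prints the kilobits passed in each one-second interval.
simulate() {
    local rate=$1 burst=$2 offered=$3 seconds=$4
    local tokens=$burst          # the bucket starts full
    local t passed
    for (( t = 0; t < seconds; t++ )); do
        tokens=$(( tokens + rate ))                        # refill for this interval
        passed=$(( offered < tokens ? offered : tokens ))  # pass what tokens allow
        tokens=$(( tokens - passed ))
        if (( tokens > burst )); then
            tokens=$burst                                  # bucket never exceeds burst
        fi
        echo "$passed"
    done
}

echo "rate=5000 burst=100000:"
simulate 5000 100000 20000000 3

echo "rate=5000 burst=2000000000:"
simulate 5000 2000000000 20000000 10
```

In this model the first case passes 105000 kilobits in the first second (the burst plus one second of rate) and 5000 per second afterwards, which is close to the 110 Mbits/sec followed by about 5 Mbits/sec seen in comment 3. In the second case the bucket is still far from empty after 10 seconds, so every interval passes the full offered load.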

See the following article about choosing the right burst size:
  https://www.juniper.net/documentation/us/en/software/junos/routing-policy/topics/concept/policer-mx-m120-m320-burstsize-determining.html
It's about policers, but it's largely applicable to all metering
and policing implementations.
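
As a sketch of how a more reasonable burst could be derived for the 500 Mbps limit in this bug: one common rule of thumb is to size the burst as the rate multiplied by a short time window. The 5 ms window and the `suggested_burst_kbit` helper below are assumptions for illustration only, not values taken from OVN or from this bug; check the article above for how to pick the window.

```shell
#!/usr/bin/env bash
# suggested_burst_kbit RATE_KBPS WINDOW_MS
# Burst (kilobits) that covers WINDOW_MS milliseconds of traffic at RATE_KBPS.
suggested_burst_kbit() { echo $(( $1 * $2 / 1000 )); }

rate=500000                                  # 500 Mbps, as in step 4
burst=$(suggested_burst_kbit "$rate" 5)      # assumed 5 ms window

# Only print the command; $qos_id is the QoS row UUID from step 4.
echo "ovn-nbctl --wait=hv set QoS \$qos_id bandwidth=rate=$rate,burst=$burst"
```

With these numbers the burst comes out as 2500 kilobits, many orders of magnitude smaller than the 4294967295 used in the reproducer.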

Comment 5 LiLiang 2022-03-15 00:31:43 UTC
glad to know this, thank you!

