Bug 2106570 - [RFE] support no-mlockall and CPUAffinity in OVS
Summary: [RFE] support no-mlockall and CPUAffinity in OVS
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: openvswitch2.17
Version: FDP 22.A
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Michael Santana
QA Contact: liting
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-07-13 03:37 UTC by zenghui.shi
Modified: 2025-02-06 04:25 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-10-08 17:49:14 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-2117 0 None None None 2022-07-13 03:40:51 UTC

Description zenghui.shi 2022-07-13 03:37:14 UTC
Description of problem:

OVN-Kubernetes is chosen as the default CNI solution for MicroShift [0], an optimized and simplified version of OpenShift built for Internet-of-Things and edge-computing environments that are both CPU and memory constrained. Since OVN-Kubernetes uses OVS as the underlying datapath, several optimizations are applied in OVS to meet the small-footprint requirement:

1) CPUAffinity

CPUAffinity restricts the CPU cores that OVS services run on. The option is applied to both the ovs-vswitchd.service and ovsdb-server.service unit files. It also helps reduce the number of handler threads (n-handler-threads), which is not configurable in the latest OVS versions, such as openvswitch2.17.

2) no-mlockall

--no-mlockall, when used when starting the ovs-vswitchd and/or ovsdb-server services, is observed to reduce memory pre-allocation significantly: from 70M down to 15M in a 2G, 1-vCPU VM.

Allowing these two options to be set at ovs-vswitchd and ovsdb-server service startup/restart would improve the overall footprint of running OVN-Kubernetes in MicroShift.


This RFE requests official support for the no-mlockall and CPUAffinity options in OVS so that layered products such as MicroShift can make use of them.

[0]: https://github.com/openshift/microshift


Additional info:

An example of applying both options to ovs-vswitchd.service

[Service]
Type=forking
CPUAffinity=0           <=== CPUAffinity
PIDFile=/var/run/openvswitch/ovs-vswitchd.pid
Restart=on-failure
Environment=XDG_RUNTIME_DIR=/var/run/openvswitch
EnvironmentFile=/etc/openvswitch/default.conf
EnvironmentFile=-/etc/sysconfig/openvswitch
EnvironmentFile=-/run/openvswitch.useropts
LimitSTACK=2M
ExecStartPre=-/bin/sh -c '/usr/bin/chown :$${OVS_USER_ID##*:} /dev/hugepages'
ExecStartPre=-/usr/bin/chmod 0775 /dev/hugepages
ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-mlockall \           <=== no-mlockall
          --no-ovsdb-server --no-monitor --system-id=random \
          ${OVS_USER_OPT} \
          start $OPTIONS
ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop
ExecReload=/usr/share/openvswitch/scripts/ovs-ctl --no-mlockall --no-ovsdb-server \           <=== no-mlockall
          --no-monitor --system-id=random \
          ${OVS_USER_OPT} \
          restart $OPTIONS
TimeoutSec=300

Comment 1 Eelco Chaudron 2022-07-13 08:43:38 UTC
From an engineering perspective, the following questions arise:

- Setting CPUAffinity=0 limits OVS to a single core. This might impact upcall handling with the new implementation, so from an ENG side we need some investigation.
- Performance might be lower, as memory will probably be allocated on the fly. The existing performance tests might capture this, QA?
- Same might be true for limiting the CPUs being used. 
- The testing matrix should not explode, so how would we test these additional configuration changes? QA?

Comment 2 Christian Trautman 2022-07-13 18:14:58 UTC
On the QA side we can see how performance is impacted by these changes/options.
Will provide some sample data as time permits.

Comment 3 zenghui.shi 2022-07-15 09:53:18 UTC
Adding performance and scalability requirements.

MicroShift is being designed to run in a single-node configuration including the control plane and node components in one package. Expanding to Highly
Available deployments will cause a resource budget expansion because of the CNI requirements, so we will favor two MicroShift deployments running
active/active where HA is needed.

The expected workload from the early customers in the MicroShift scenarios is:
kubernetes Pods: 12-25
kubernetes Services: 5-10
kubernetes NetworkPolicies: 2-3 (really depends on what customer is trying to do)

With the understanding that there will be performance penalties when enabling these two OVS parameters, customers who are focused on network performance will be instructed to use the default Open vSwitch settings where applicable.

Comment 4 Eelco Chaudron 2022-07-19 08:48:04 UTC
Note that the mlockall option can already be configured via the /etc/sysconfig/openvswitch file:

# Pass or not --mlockall option to ovs-vswitchd.
# This option should be set to "yes" or "no".  The default is "yes".
# Enabling this option can avoid networking interruptions due to
# system memory pressure in extraordinary situations, such as multiple
# concurrent VM import operations.
# --mlockall=yes


Also, note that setting CPUAffinity=0 directly in the service file is not really user-friendly.
It should be overridden using the systemd drop-in feature; this is an example for stack size that I used in the past:

$ mkdir -p /etc/systemd/system/ovs-vswitchd.service.d/
$ echo -e "[Service]\nLimitSTACK=8M" > /etc/systemd/system/ovs-vswitchd.service.d/limitstack.conf
$ systemctl daemon-reload
$ systemctl restart openvswitch
$ cat /proc/$(pidof ovs-vswitchd)/limits | grep stack -
Max stack size            8388608              8388608              bytes
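
The same drop-in approach should work for CPUAffinity as well; a minimal sketch (the drop-in file name cpuaffinity.conf is arbitrary):

$ mkdir -p /etc/systemd/system/ovs-vswitchd.service.d/
$ echo -e "[Service]\nCPUAffinity=0" > /etc/systemd/system/ovs-vswitchd.service.d/cpuaffinity.conf
$ systemctl daemon-reload
$ systemctl restart openvswitch
$ taskset -cp $(pidof ovs-vswitchd)    # should report an affinity list of 0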

Comment 5 liting 2022-07-21 08:47:38 UTC
I ran a comparison test between our standard container performance results and results with the CPUAffinity=0 and --no-mlockall options enabled. The 1q 2pmd case got a similar result. The 2q 4pmd case with the options enabled got slightly lower performance than with them disabled. The 4q 8pmd case failed to run when the two options were enabled. Please help me have a look. Thanks.
The OVS version and kernel version:
[root@dell-per740-57 ~]# rpm -qa|grep openvs
kernel-kernel-networking-openvswitch-perf-1.0-334.noarch
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
openvswitch2.17-2.17.0-30.el9fdp.x86_64
[root@dell-per740-57 ~]# uname -r
5.14.0-130.el9.x86_64

our standard container performance results
1q 2pmd case: 9863248pps
2q 4pmd case: 19419076pps
4q 8pmd case: 21338436pps
enable CPUAffinity=0 and --no-mlockall options results
1q 2pmd case: 9853706pps
2q 4pmd case: 18831290pps
4q 8pmd case: failed 

Without no-mlockall and CPUAffinity=0 enabled, the ovs-vswitchd memory is 237916 kB.
[root@dell-per740-57 perf]# ps axu|grep ovs
openvsw+  111715  0.0  0.0  45508 20712 ?        S<s  03:21   0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
openvsw+  111765  399  0.3 269263432 237916 ?    S<Lsl 03:21 122:58 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root      119184  0.0  0.0   6408  2212 pts/1    S+   03:52   0:00 grep --color=auto ovs

I used the following steps to enable no-mlockall and CPUAffinity=0.
Add CPUAffinity=0 and --no-mlockall to /usr/lib/systemd/system/ovs-vswitchd.service and /usr/lib/systemd/system/ovsdb-server.service as follows.
[Service]
Type=forking
CPUAffinity=0           
PIDFile=/var/run/openvswitch/ovs-vswitchd.pid
Restart=on-failure
Environment=XDG_RUNTIME_DIR=/var/run/openvswitch
EnvironmentFile=/etc/openvswitch/default.conf
EnvironmentFile=-/etc/sysconfig/openvswitch
EnvironmentFile=-/run/openvswitch.useropts
LimitSTACK=2M
ExecStartPre=-/bin/sh -c '/usr/bin/chown :$${OVS_USER_ID##*:} /dev/hugepages'
ExecStartPre=-/usr/bin/chmod 0775 /dev/hugepages
ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-mlockall \           <=== no-mlockall
          --no-ovsdb-server --no-monitor --system-id=random \
          ${OVS_USER_OPT} \
          start $OPTIONS
ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop
ExecReload=/usr/share/openvswitch/scripts/ovs-ctl --no-mlockall --no-ovsdb-server \           <=== no-mlockall
          --no-monitor --system-id=random \
          ${OVS_USER_OPT} \
          restart $OPTIONS
TimeoutSec=300
 
Then restart the OVS services and check that the OVS memory has changed. The ovs-vswitchd memory is 28400 kB.
systemctl daemon-reload 
systemctl restart ovs-vswitchd
systemctl restart ovsdb-server
systemctl restart openvswitch

[root@dell-per740-57 perf]# ps axu|grep ovs
openvsw+  119549  259  0.0 269218476 28400 ?     S<Lsl 03:59   4:22 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
openvsw+  119602  0.0  0.0  45416 19768 ?        S<s  03:59   0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
root      119672  0.0  0.0   6408  2308 pts/1    S+   04:01   0:00 grep --color=auto ovs

After enabling no-mlockall and CPUAffinity=0, the OVS memory is 56284 kB when running the 1q 2pmd case.
[root@dell-per740-57 ~]# ps aux | grep ovs-vswitchd
openvsw+   50766  199  0.0 275663864 56284 ?     S<Lsl 00:15  53:55 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root       58799  0.0  0.0   6408  2156 pts/0    S+   00:42   0:00 grep --color=auto ovs-vswitchd
After enabling no-mlockall and CPUAffinity=0, the OVS memory is 73832 kB when running the 2q 4pmd case.
[root@dell-per740-57 ~]# ps aux | grep ovs-vswitchd
openvsw+   80751  397  0.1 275554888 73832 ?     S<Lsl 01:52  27:37 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root       83224  0.0  0.0   6408  2240 pts/0    S+   01:59   0:00 grep --color=auto ovs-vswitchd

After enabling no-mlockall and CPUAffinity=0, the OVS memory is 109468 kB when running the 4q 8pmd case.
[root@dell-per740-57 perf]# ps aux | grep ovs-vswitchd
openvsw+   97643  796  0.1 269307624 109468 ?    S<Lsl 02:41  91:50 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root      101531  0.0  0.0   6408  2148 pts/1    R+   02:53   0:00 grep --color=auto ovs-vswitchd

For the failed 4q 8pmd case, the test info follows.
The OVS DPDK topology is built as follows, but starting the container with podman failed; the same command starts successfully without the CPUAffinity=0 and --no-mlockall options.
[root@dell-per740-57 perf]# ovs-vsctl show
072811d6-6da2-4092-891e-74323b7ad2e6
    Bridge ovsbr0
        datapath_type: netdev
        Port ovsbr0
            Interface ovsbr0
                type: internal
        Port vhost1
            Interface vhost1
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser/vhost1"}
        Port vhost0
            Interface vhost0
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser/vhost0"}
        Port dpdk1
            Interface dpdk1
                type: dpdk
                options: {dpdk-devargs="0000:3b:00.1", n_rxq="4", n_rxq_desc="1024", n_txq_desc="1024"}
        Port dpdk0
            Interface dpdk0
                type: dpdk
                options: {dpdk-devargs="0000:3b:00.0", n_rxq="4", n_rxq_desc="1024", n_txq_desc="1024"}
    ovs_version: "2.17.3"

[root@dell-per740-57 ~]# ovs-vsctl get Open_vSwitch . other_config
{dpdk-init="true", dpdk-lcore-mask="0x1", dpdk-socket-mem="4096", pmd-cpu-mask="555500000000", userspace-tso-enable="false", vhost-iommu-support="true"}

[root@dell-per740-57 perf]# taskset -c 2,4,6,8,10,12,14,16,18 podman run -i -t --privileged -v /tmp/vhostuser:/tmp/vhostuser -v /dev/hugepages:/dev/hugepages 4f4c841655b8 dpdk-testpmd -l 0-8 -n 1 -m 1024 --no-pci --vdev=virtio_user0,path=/tmp/vhostuser/vhost0,queues=4,server=1 --vdev=virtio_user1,path=/tmp/vhostuser/vhost1,queues=4,server=1 -- -i --forward-mode=io --burst=32 --rxd=8192 --txd=8192 --max-pkt-len=9600 --mbuf-size=9728 --nb-cores=8 --rxq=4 --txq=4 --mbcache=512 --auto-start
EAL: Detected 48 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
virtio_user_dev_set_status(): VHOST_USER_SET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Success
virtio_user_dev_set_status(): VHOST_USER_SET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Success
virtio_user_dev_set_status(): VHOST_USER_SET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Success
virtio_user_dev_set_status(): VHOST_USER_SET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Resource temporarily unavailable
virtio_user_dev_set_status(): VHOST_USER_SET_STATUS failed (-1): Resource temporarily unavailable
virtio_user_dev_set_status(): VHOST_USER_SET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Success
virtio_user_dev_set_status(): VHOST_USER_SET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Success
virtio_user_dev_set_status(): VHOST_USER_SET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Success
virtio_user_dev_set_status(): VHOST_USER_SET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Resource temporarily unavailable
virtio_user_dev_set_status(): VHOST_USER_SET_STATUS failed (-1): Resource temporarily unavailable
EAL: No legacy callbacks, legacy socket not created
Interactive-mode selected
Set io packet forwarding mode
Auto-start selected
Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa.
testpmd: create a new mbuf pool <mb_pool_0>: n=278528, size=9728, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mb_pool_1>: n=278528, size=9728, socket=1
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: DA:A1:88:E1:27:15
Configuring Port 1 (socket 0)
Port 1: D6:D2:AB:DC:28:A4
Checking link statuses...
Done
Start automatic packet forwarding
io packet forwarding - ports=2 - cores=8 - streams=8 - NUMA support enabled, MP allocation mode: native
Logical Core 1 (socket 1) forwards packets on 1 streams:
  RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
Logical Core 2 (socket 0) forwards packets on 1 streams:
  RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
Logical Core 3 (socket 1) forwards packets on 1 streams:
  RX P=0/Q=1 (socket 0) -> TX P=1/Q=1 (socket 0) peer=02:00:00:00:00:01
Logical Core 4 (socket 0) forwards packets on 1 streams:
  RX P=1/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00
Logical Core 5 (socket 1) forwards packets on 1 streams:
  RX P=0/Q=2 (socket 0) -> TX P=1/Q=2 (socket 0) peer=02:00:00:00:00:01
Logical Core 6 (socket 0) forwards packets on 1 streams:
  RX P=1/Q=2 (socket 0) -> TX P=0/Q=2 (socket 0) peer=02:00:00:00:00:00
Logical Core 7 (socket 1) forwards packets on 1 streams:
  RX P=0/Q=3 (socket 0) -> TX P=1/Q=3 (socket 0) peer=02:00:00:00:00:01
Logical Core 8 (socket 0) forwards packets on 1 streams:
  RX P=1/Q=3 (socket 0) -> TX P=0/Q=3 (socket 0) peer=02:00:00:00:00:00

  io packet forwarding packets/burst=32
  nb forwarding cores=8 - nb forwarding ports=2
  port 0: RX queue number: 4 Tx queue number: 4
    Rx offloads=0x800 Tx offloads=0x0
    RX queue: 0
      RX desc=8192 - RX free threshold=0
      RX threshold registers: pthresh=0 hthresh=0  wthresh=0
      RX Offloads=0x800
    TX queue: 0
      TX desc=8192 - TX free threshold=0
      TX threshold registers: pthresh=0 hthresh=0  wthresh=0
      TX offloads=0x0 - TX RS bit threshold=0
  port 1: RX queue number: 4 Tx queue number: 4
    Rx offloads=0x800 Tx offloads=0x0
    RX queue: 0
      RX desc=8192 - RX free threshold=0
      RX threshold registers: pthresh=0 hthresh=0  wthresh=0
      RX Offloads=0x800
    TX queue: 0
      TX desc=8192 - TX free threshold=0
      TX threshold registers: pthresh=0 hthresh=0  wthresh=0
      TX offloads=0x0 - TX RS bit threshold=0
testpmd>

Comment 6 Miguel Angel Ajo 2022-07-21 12:42:16 UTC
@eelco, I'm applying the changes from the microshift-networking packages, as "OPTIONS" in /etc/sysconfig/openvswitch and as a systemd .d file.

https://github.com/openshift/microshift/pull/787
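
For reference, a minimal sketch of the sysconfig piece (illustrative only; the exact contents shipped by microshift-networking are in the PR above). The packaged unit files read /etc/sysconfig/openvswitch and append $OPTIONS to the "ovs-ctl ... start" line, so the mlockall part can be done with:

    OPTIONS="--no-mlockall"

and the CPUAffinity part with a drop-in as in comment 4. After systemctl daemon-reload and a service restart, the ovs-vswitchd command line should no longer show --mlockall, and VmLck in /proc/$(pidof ovs-vswitchd)/status should stay at 0 kB.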


With this we don't need any changes to Open vSwitch itself. Everything seems to work.
In our case we don't care about DPDK, and we don't use hugepages.

The lower performance is acceptable, and the tweaks can always be reverted by any customers
working with high pps.

The only thing I see on the kernel traces is:

[  924.127699] openvswitch: cpu_id mismatch with handler threads
[  983.870273] openvswitch: cpu_id mismatch with handler threads
[  988.909174] openvswitch: cpu_id mismatch with handler threads

Comment 7 Miguel Angel Ajo 2022-07-21 13:14:19 UTC
@eelco, btw thanks for the suggestions, I didn't know about the systemd .d service modifications.

Comment 8 Flavio Leitner 2022-07-25 17:44:14 UTC
(In reply to liting from comment #5)
> I run comparing test between our standard container performance results and
> versus enable the CPUAffinity=0 and --no-mlockall options. The 1q 2pmd case
> got the similar result. The 2q 4pmd case with enable the options got
> slightly lower performance than disable the options. The 4q 8pmd case run
> failed when enable the two options. Please help me have a look. Thanks
> The ovs verison and kernel verison:

Just to clarify: the environment we want to focus on at the
moment is the kernel data path, not the userspace data path with DPDK.

fbl

Comment 9 Flavio Leitner 2022-07-25 18:15:12 UTC
(In reply to Miguel Angel Ajo from comment #6)
[...] 
> The only thing I see on the kernel traces is:
> 
> [  924.127699] openvswitch: cpu_id mismatch with handler threads
> [  983.870273] openvswitch: cpu_id mismatch with handler threads
> [  988.909174] openvswitch: cpu_id mismatch with handler threads

This is most probably being fixed by 
https://patchwork.ozlabs.org/project/openvswitch/list/?series=310161

because when systemd sets CPUAffinity, I think not all CPUs get mapped
to handlers and then the warning is issued. I will let Michael S. confirm.

Comment 10 Miguel Angel Ajo 2022-07-26 13:00:00 UTC
(In reply to Flavio Leitner from comment #9)
> (In reply to Miguel Angel Ajo from comment #6)
> [...] 
> > The only thing I see on the kernel traces is:
> > 
> > [  924.127699] openvswitch: cpu_id mismatch with handler threads
> > [  983.870273] openvswitch: cpu_id mismatch with handler threads
> > [  988.909174] openvswitch: cpu_id mismatch with handler threads
> 
> This is most probably being fixed by 
> https://patchwork.ozlabs.org/project/openvswitch/list/?series=310161
> 
> because when systemd sets CPUAffinity, I think not all CPUs get mapped
> to handlers and then the warning is issued. I will let Michael S. to
> confirm.

The CPUAffinity fix was in fact a way of limiting the threads created by ovs-vswitchd.
I'm afraid those patches could bump the memory used by ovs-vswitchd again.

I wonder if we could control the number of threads manually, or add something
to that patch to be able to cap the number of threads and avoid the warning.
Of course this affects and will affect performance, but it is only targeted
at embedded/edge deployments.

Comment 11 Michael Santana 2022-07-27 02:27:26 UTC
Hi Miguel,

The patch set in question does remove the 'cpu mismatch' warning. But it also increases the number of handler threads created. But in your case the CPU affinity is 0, so (IIUC) OVS is bound to run on only one core. The patch set in question has an edge case where if OVS can only run on 2 or fewer cores, it will not create any additional threads. It will fall back to the normal behavior of creating as many handler threads as there are cores available for OVS to run on (as long as that number is less than or equal to 2). In your case that would be 1 handler thread and 1 revalidator. So it is possible the patch set would work as is in your use case.

Of course, I understand the need to create as many threads as you would like. We already had this functionality with n-handler-threads, but that is currently broken with per-cpu dispatch mode. I can start working on a patch on top of the current patch set to re-enable this functionality in per-cpu dispatch mode.
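
For reference, the knob in question is the other_config:n-handler-threads entry on the Open_vSwitch table, normally set with something like:

    ovs-vsctl set Open_vSwitch . other_config:n-handler-threads=1

Per the above, it currently has no effect in per-cpu dispatch mode, so treat this purely as illustration until the follow-up patch lands.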

The cpu mismatch warning is quite harmless. That warning appears because that part of the OVS kernel module assumes OVS will be running on all online cores. IIUC, CPU affinity and CPU isolation weren't very well considered when developing per-cpu dispatch mode.



I was not aware you could change the affinity of the OVS service by setting CPUAffinity= in the service file. Someone else had asked for a way to change the number of threads in OVS; if this workaround works well I will recommend it. My understanding is that when OVS is in per-cpu mode it will create as many handler threads as there are CPUs it can run on, and because you change the affinity via that flag you basically have control over how many threads OVS creates. This is a brilliant workaround!

Also, I just want to say that I am 100% behind this effort. I would like to decrease the memory and CPU usage of OVS. I'm not too familiar with no-mlockall, so I will probably have to spend some time researching it.

Comment 12 Miguel Angel Ajo 2022-07-28 13:19:58 UTC
(In reply to Michael Santana from comment #11)
> Hi Miguel,
> 
> The patch set in question does remove the 'cpu mismatch' warning. But it
> also increases the number of handler threads created. But in your case the
> CPU affinity is 0, so (IIUC) OVS is bound to only run on core. The patch set
> in question has an edge case where if OVS can only run on 2 or less cores,
> it will not create any additional threads. It will fallback to the normal
> behavior to creating as many handler threads as there are cores available
> for OVS to run (as long as that number is less than or euqal to 2). In your
> case that would be 1 handler thread and 1 revalidator. So it is possible the
> patch set would work as is in your use case
> 

Oh, that's great, it should totally work then. Sorry I missed those details
in the patch; I was probably just skimming it at the time.

> Of course, I understand the need of creating as many threads as you would
> like. We already had this functionality with n-handler-threads, but that is
> currently broken with per-cpu dispatch mode. I can start working on a patch
> on top of the current patch set to reenable this functionality in per-cpu
> dispatch mode.
> 
> The cpu mismatch warning is quiet harmless. That warning appears because
> that part of the OVS kernel module assumes OVS will be running on all online
> cores. IIUC CPU affinity and CPU isolation wasnt very well considered when
> developing per-cpu dispatch mode

Ack, good to know it's harmless. It's mostly annoying when working on the device
console, and I suspect customers will ask about it as soon as they see it. I
think we should just include a note about this, and a link to this BZ,
in the MicroShift release notes.

> 
> 
> 
> I was not aware you could change the affinity of OVS service by setting
> CPUAffinity= on the service file. Someone else had asked for a way to change
> the number of threads in OVS. If this workaround works well I will recommend
> it. 

It works like a charm.

> My understanding is that when OVS is in per-cpu mode it will create as
> many handler threads as there are CPUs it can run on, and because you change
> the affinity via that flag you basically have control as to how many threads
> OVS creates. This is a brilliant workaround!

I think it was zshi's idea, not sure.

> 
> Also, I just want to say that I am 100% behind this effort. I would like to
> decrease the memory and CPU usage of OVS. Im not too familiar with
> no-mlockall, so I will probably have to spend some time researching it

Thanks a lot for all the feedback and details, very appreciated.

Comment 13 liting 2022-08-01 03:40:31 UTC
(In reply to Flavio Leitner from comment #8)
> (In reply to liting from comment #5)
> > I run comparing test between our standard container performance results and
> > versus enable the CPUAffinity=0 and --no-mlockall options. The 1q 2pmd case
> > got the similar result. The 2q 4pmd case with enable the options got
> > slightly lower performance than disable the options. The 4q 8pmd case run
> > failed when enable the two options. Please help me have a look. Thanks
> > The ovs verison and kernel verison:
> 
> Just to clarify that the environment we want to focus on at the
> moment is the kernel data path and not the userspace with DPDK.
> 
> fbl

I ran a comparison test between our standard OVS kernel-datapath container result and the result with the CPUAffinity=0 and --no-mlockall options enabled. With the two options enabled, the OVS kernel result shows slightly lower performance than without them.
The OVS kernel topology is as follows.
[root@dell-per740-57 perf]# ovs-vsctl show
7c8dccc0-a28e-4f24-9e2a-78be2ae99c5f
    Bridge ovsbr0
        Port p2
            Interface p2
        Port ovsbr0
            Interface ovsbr0
                type: internal
        Port enp59s0f1
            Interface enp59s0f1
        Port p1
            Interface p1
        Port enp59s0f0
            Interface enp59s0f0
    ovs_version: "2.17.3"
The container command:
[root@dell-per740-57 perf]# ps aux|grep podman
root      181871  0.2  0.0 3486656 52992 pts/0   Sl+  22:51   0:04 podman run -i -t --privileged -v /dev/hugepages:/dev/hugepages 4f4c841655b8 dpdk-testpmd -l 2,4,6 -n 1 -m 1024 --no-pci --vdev=net_tap0,iface=p1,mac=00:de:ad:00:00:01 --vdev=net_tap1,iface=p2,mac=00:de:ad:00:00:02 -- -i --forward-mode=io --burst=32 --rxd=8192 --txd=8192 --nb-cores=2 --rxq=1 --txq=1 --auto-start
The ovs and kernel build:
[root@dell-per740-57 perf]# rpm -qa|grep openvs
kernel-kernel-networking-openvswitch-perf-1.0-334.noarch
openvswitch-selinux-extra-policy-1.0-31.el9fdp.noarch
openvswitch2.17-2.17.0-30.el9fdp.x86_64
[root@dell-per740-57 perf]# uname -r
5.14.0-130.el9.x86_64

During the standard test, the ovs-vswitchd memory is 287224 kB. Running the OVS container kernel case three times, the results are 0.236pps, 0.238pps, and 0.244pps.
[root@dell-per740-57 bash_perf_result]# ps aux | grep ovs
openvsw+  157318  0.0  0.0  45356 20408 ?        S<s  21:35   0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
openvsw+  157367  1.9  0.4 4350716 287224 ?      S<Lsl 21:35   0:50 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root      170137  0.0  0.0   6408  2240 pts/1    S+   22:17   0:00 grep --color=auto ovs

Then, with the CPUAffinity=0 and --no-mlockall options enabled, the case was rerun three times; the results are 0.206pps, 0.201pps, and 0.193pps.
The ovs-vswitchd memory is 21268 kB.
[root@dell-per740-57 perf]# ps aux|grep ovs
openvsw+  179135  0.0  0.0  45352 20548 ?        S<s  22:30   0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
openvsw+  179189  0.0  0.0 363064 21268 ?        S<sl 22:30   0:02 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root      182394  0.0  0.0   6408  2216 pts/1    S+   23:12   0:00 grep --color=auto ovs

Comment 14 Flavio Leitner 2022-08-02 18:20:44 UTC
(In reply to liting from comment #13)
> (In reply to Flavio Leitner from comment #8)
> > Just to clarify that the environment we want to focus on at the
> > moment is the kernel data path and not the userspace with DPDK.
[...]
> The container command:
> [root@dell-per740-57 perf]# ps aux|grep podman
> root      181871  0.2  0.0 3486656 52992 pts/0   Sl+  22:51   0:04 podman
> run -i -t --privileged -v /dev/hugepages:/dev/hugepages 4f4c841655b8
> dpdk-testpmd -l 2,4,6 -n 1 -m 1024 --no-pci
> --vdev=net_tap0,iface=p1,mac=00:de:ad:00:00:01
> --vdev=net_tap1,iface=p2,mac=00:de:ad:00:00:02 -- -i --forward-mode=io
> --burst=32 --rxd=8192 --txd=8192 --nb-cores=2 --rxq=1 --txq=1 --auto-start

Hi Liting, I think there is still some confusion, because the test
is using DPDK inside the container. We don't need any DPDK in this
environment.

Therefore, the test needs to be OVS without DPDK connected to
a normal VM, normal containers, or another host, and then run
iperf3 or netperf to test traffic.


Example test #1: host-to-host

               host A                                 host B
+-------------------------------+        +----------------------------------+
Container
+-------+                                                   
IPADDR1                                                               IPADDR2
veth1 --- veth0 --- ovsbr0 -- eth0   <=> eth0 --- ovsbr0 --- veth0 --- veth1


Example test #2: container-to-container
               host A
+-------------------------------+
Container1
+-------+                                                   
IPADDR1
vethc1 --- veth0---------
                        |
                     ovsbr0
Container2              |
+-------+               |
IPADDR2                 |
vethc2 --- veth1 --------


fbl

Comment 15 liting 2022-08-09 03:04:39 UTC
(In reply to Flavio Leitner from comment #14)
> (In reply to liting from comment #13)
> > (In reply to Flavio Leitner from comment #8)
> > > Just to clarify that the environment we want to focus on at the
> > > moment is the kernel data path and not the userspace with DPDK.
> [...]
> > The container command:
> > [root@dell-per740-57 perf]# ps aux|grep podman
> > root      181871  0.2  0.0 3486656 52992 pts/0   Sl+  22:51   0:04 podman
> > run -i -t --privileged -v /dev/hugepages:/dev/hugepages 4f4c841655b8
> > dpdk-testpmd -l 2,4,6 -n 1 -m 1024 --no-pci
> > --vdev=net_tap0,iface=p1,mac=00:de:ad:00:00:01
> > --vdev=net_tap1,iface=p2,mac=00:de:ad:00:00:02 -- -i --forward-mode=io
> > --burst=32 --rxd=8192 --txd=8192 --nb-cores=2 --rxq=1 --txq=1 --auto-start
> 
> Hi Liting, I think there is still a confusion because the test
> is using DPDK inside the container. We don't need any DPDK in this
> environment.
> 
> Therefore, the test needs to be OVS without DPDK connected to
> normal VM or normal containers or another hosts, and then run
> iperf3 or netperf to test traffic.
> 
> 
> Example test #1: host-to-host
> 
>                host A                                 host B
> +-------------------------------+        +----------------------------------+
> Container
> +-------+                                                   
> IPADDR1                                                               IPADDR2
> veth1 --- veth0 --- ovsbr0 -- eth0   <=> eth0 --- ovsbr0 --- veth0 --- veth1
> 
> 
> Example test #2: container-to-container
>                host A
> +-------------------------------+
> Container1
> +-------+                                                   
> IPADDR1
> vethc1 --- veth0---------
>                         |
>                      ovsbr0
> Container2              |
> +-------+               |
> IPADDR2                 |
> vethc2 --- veth1 --------
> 
> 
> fbl

I built the topology of example test #1; according to the iperf3 results, enabling the two options has no impact on performance.
ice card on dell740-57 <--> i40e card on dell740-73
[root@dell-per740-57 ~]# rpm -qa|grep openvswitch
kernel-kernel-networking-openvswitch-common-2.0-234.noarch
kernel-kernel-networking-openvswitch-perf-1.0-336.noarch
openvswitch-selinux-extra-policy-1.0-31.el9fdp.noarch
openvswitch2.17-2.17.0-30.el9fdp.x86_64
[root@dell-per740-57 ~]# uname -r
5.14.0-70.17.1.el9_0.x86_64
[root@dell-per740-73 ~]# uname -r
4.18.0-305.25.1.el8_4.x86_64
[root@dell-per740-73 ~]# rpm -qa|grep openvswitch
openvswitch2.15-2.15.0-113.el8fdp.x86_64
kernel-kernel-networking-openvswitch-perf-1.0-336.noarch
openvswitch-selinux-extra-policy-1.0-29.el8fdp.noarch
kernel-kernel-networking-openvswitch-common-2.0-234.noarch

On dell740-57:
ip link add veth1 type veth peer name veth0
ip link set veth1 up
ip link set veth0 up
ovs-vsctl add-br ovsbr0
ovs-vsctl add-port ovsbr0 veth0
ovs-vsctl add-port ovsbr0 ens1f0
ip link set ovsbr0 up
ip netns add ns1
ip link set veth1 netns ns1
ip netns exec ns1 ip add add 20.0.0.1/24 dev veth1
ip netns exec ns1 ip link set veth1 up

On dell740-73
ip link add veth1 type veth peer name veth0
ip link set veth1 up
ip link set veth0 up
ovs-vsctl add-br ovsbr0
ovs-vsctl add-port ovsbr0 veth0
ovs-vsctl add-port ovsbr0 ens2f0
ip link set ovsbr0 up
ip netns add ns1
ip link set veth1 netns ns1
ip netns exec ns1 ip add add 20.0.0.2/24 dev veth1
ip netns exec ns1 ip link set veth1 up

[root@dell-per740-57 ~]# ps aux|grep ovs
openvsw+   73499  0.0  0.0  45248 19540 ?        S<s  22:12   0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
openvsw+   73563  2.0  0.4 4350508 286860 ?      S<Lsl 22:12   0:42 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root       74007  0.0  0.0   6412  2184 pts/2    S+   22:47   0:00 grep --color=auto ovs

Start iperf server on dell740-73
[root@dell-per740-73 ~]# ip netns exec ns1 iperf3 -s  -B 20.0.0.2
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 20.0.0.1, port 53716
[  5] local 20.0.0.2 port 5201 connected to 20.0.0.1 port 39056
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec   129 KBytes  1.05 Mbits/sec  311462284.854 ms  0/91 (0%)  
[  5]   1.00-2.00   sec   127 KBytes  1.04 Mbits/sec  935018.797 ms  0/90 (0%)  
[  5]   2.00-3.00   sec   129 KBytes  1.05 Mbits/sec  2631.520 ms  0/91 (0%)  
[  5]   3.00-4.00   sec   127 KBytes  1.04 Mbits/sec  7.901 ms  0/90 (0%)  
[  5]   4.00-5.00   sec   129 KBytes  1.05 Mbits/sec  0.023 ms  0/91 (0%)  
[  5]   5.00-6.00   sec   129 KBytes  1.05 Mbits/sec  0.001 ms  0/91 (0%)  
[  5]   6.00-7.00   sec   127 KBytes  1.04 Mbits/sec  0.002 ms  0/90 (0%)  
[  5]   7.00-8.00   sec   129 KBytes  1.05 Mbits/sec  0.001 ms  0/91 (0%)  
[  5]   8.00-9.00   sec   127 KBytes  1.04 Mbits/sec  0.001 ms  0/90 (0%)  
[  5]   9.00-10.00  sec   129 KBytes  1.05 Mbits/sec  0.002 ms  0/91 (0%)  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  1.25 MBytes  1.05 Mbits/sec  0.002 ms  0/906 (0%)  receiver

Run iperf test on dell740-57, the result is 128 KBytes/sec.
[root@dell-per740-57 ~]# ip netns exec ns1 iperf3 -c  20.0.0.2 -4 -f K -u
Connecting to host 20.0.0.2, port 5201
[  5] local 20.0.0.1 port 39298 connected to 20.0.0.2 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec   129 KBytes   129 KBytes/sec  91  
[  5]   1.00-2.00   sec   127 KBytes   127 KBytes/sec  90  
[  5]   2.00-3.00   sec   129 KBytes   129 KBytes/sec  91  
[  5]   3.00-4.00   sec   127 KBytes   127 KBytes/sec  90  
[  5]   4.00-5.00   sec   129 KBytes   129 KBytes/sec  91  
[  5]   5.00-6.00   sec   129 KBytes   129 KBytes/sec  91  
[  5]   6.00-7.00   sec   127 KBytes   127 KBytes/sec  90  
[  5]   7.00-8.00   sec   129 KBytes   129 KBytes/sec  91  
[  5]   8.00-9.00   sec   127 KBytes   127 KBytes/sec  90  
[  5]   9.00-10.00  sec   129 KBytes   129 KBytes/sec  91  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  1.25 MBytes   128 KBytes/sec  0.000 ms  0/906 (0%)  sender
[  5]   0.00-10.00  sec  1.25 MBytes   128 KBytes/sec  0.001 ms  0/906 (0%)  receiver

iperf Done.

Then enable the CPUAffinity=0 and --no-mlockall options for ovs service.
[root@dell-per740-57 ~]# ps aux|grep ovs
openvsw+   74366  0.0  0.0 362848 20444 ?        S<sl 22:50   0:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
openvsw+   74421  0.0  0.0  45260 19940 ?        S<s  22:50   0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
root       74454  0.0  0.0   6412  2168 pts/2    S+   22:52   0:00 grep --color=auto ovs

Rerun the iperf test; the result is still 128 KBytes/sec.
[root@dell-per740-57 ~]# ip netns exec ns1 iperf3 -c  20.0.0.2 -4 -f K -u -t 10
Connecting to host 20.0.0.2, port 5201
[  5] local 20.0.0.1 port 39056 connected to 20.0.0.2 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec   129 KBytes   129 KBytes/sec  91  
[  5]   1.00-2.00   sec   127 KBytes   127 KBytes/sec  90  
[  5]   2.00-3.00   sec   129 KBytes   129 KBytes/sec  91  
[  5]   3.00-4.00   sec   127 KBytes   127 KBytes/sec  90  
[  5]   4.00-5.00   sec   129 KBytes   129 KBytes/sec  91  
[  5]   5.00-6.00   sec   129 KBytes   129 KBytes/sec  91  
[  5]   6.00-7.00   sec   127 KBytes   127 KBytes/sec  90  
[  5]   7.00-8.00   sec   129 KBytes   129 KBytes/sec  91  
[  5]   8.00-9.00   sec   127 KBytes   127 KBytes/sec  90  
[  5]   9.00-10.00  sec   129 KBytes   129 KBytes/sec  91  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  1.25 MBytes   128 KBytes/sec  0.000 ms  0/906 (0%)  sender
[  5]   0.00-10.00  sec  1.25 MBytes   128 KBytes/sec  0.002 ms  0/906 (0%)  receiver

iperf Done.

Comment 16 liting 2022-08-09 03:25:32 UTC
I also tested example test #2; enabling the two options has no impact on performance.
on dell740-57:
     ip link add vethc1 type veth peer name veth0
     ip link add vethc2 type veth peer name veth1
     ovs-vsctl add-br ovsbr0
     ovs-vsctl add-port ovsbr0 veth0
     ovs-vsctl add-port ovsbr0 veth1
     ip netns add ns1
     ip netns add ns2
     ip link set vethc1 netns ns1
     ip link set vethc2 netns ns2
     ip netns exec ns1 ip add add 20.0.0.1/24 dev vethc1
     ip netns exec ns2 ip add add 20.0.0.2/24 dev vethc2
     ip link set veth0 up
     ip link set veth1 up
     ip link set ovsbr0 up
     ip netns exec ns2 ip link set vethc2 up
     ip netns exec ns1 ip link set vethc1 up

[root@dell-per740-57 ~]# ps aux|grep ovs
openvsw+   75137  2.6  0.3 2458092 229572 ?      S<Lsl 23:17   0:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
openvsw+   75188  0.0  0.0  45260 19908 ?        S<s  23:17   0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
root       75248  0.0  0.0   6412  2372 pts/0    R+   23:17   0:00 grep --color=auto ovs

Start iperf server.
[root@dell-per740-57 ~]# ip netns exec ns1 iperf3 -s  -B 20.0.0.1
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 20.0.0.2, port 47974
[  5] local 20.0.0.1 port 5201 connected to 20.0.0.2 port 34476
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec   129 KBytes  1.05 Mbits/sec  0.001 ms  0/91 (0%)  
[  5]   1.00-2.00   sec   127 KBytes  1.04 Mbits/sec  0.001 ms  0/90 (0%)  
[  5]   2.00-3.00   sec   129 KBytes  1.05 Mbits/sec  0.001 ms  0/91 (0%)  
[  5]   3.00-4.00   sec   127 KBytes  1.04 Mbits/sec  0.000 ms  0/90 (0%)  
[  5]   4.00-5.00   sec   129 KBytes  1.05 Mbits/sec  0.001 ms  0/91 (0%)  
[  5]   5.00-6.00   sec   129 KBytes  1.05 Mbits/sec  0.001 ms  0/91 (0%)  
[  5]   6.00-7.00   sec   127 KBytes  1.04 Mbits/sec  0.001 ms  0/90 (0%)  
[  5]   7.00-8.00   sec   129 KBytes  1.05 Mbits/sec  0.001 ms  0/91 (0%)  
[  5]   8.00-9.00   sec   127 KBytes  1.04 Mbits/sec  0.001 ms  0/90 (0%)  
[  5]   9.00-10.00  sec   129 KBytes  1.05 Mbits/sec  0.001 ms  0/91 (0%)  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  1.25 MBytes  1.05 Mbits/sec  0.001 ms  0/906 (0%)  receiver
-----------------------------------------------------------

Run iperf test, result is 128 KBytes/sec.
[root@dell-per740-57 ~]# ip netns exec ns2 iperf3 -c  20.0.0.1 -4 -f K -u
Connecting to host 20.0.0.1, port 5201
[  5] local 20.0.0.2 port 34476 connected to 20.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec   129 KBytes   129 KBytes/sec  91  
[  5]   1.00-2.00   sec   127 KBytes   127 KBytes/sec  90  
[  5]   2.00-3.00   sec   129 KBytes   129 KBytes/sec  91  
[  5]   3.00-4.00   sec   127 KBytes   127 KBytes/sec  90  
[  5]   4.00-5.00   sec   129 KBytes   129 KBytes/sec  91  
[  5]   5.00-6.00   sec   129 KBytes   129 KBytes/sec  91  
[  5]   6.00-7.00   sec   127 KBytes   127 KBytes/sec  90  
[  5]   7.00-8.00   sec   129 KBytes   129 KBytes/sec  91  
[  5]   8.00-9.00   sec   127 KBytes   127 KBytes/sec  90  
[  5]   9.00-10.00  sec   129 KBytes   129 KBytes/sec  91  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  1.25 MBytes   128 KBytes/sec  0.000 ms  0/906 (0%)  sender
[  5]   0.00-10.00  sec  1.25 MBytes   128 KBytes/sec  0.001 ms  0/906 (0%)  receiver

iperf Done.

Enable the CPUAffinity=0 and --no-mlockall options for ovs service.
[root@dell-per740-57 ~]# ps aux|grep ovs
openvsw+   74366  0.0  0.0 362916 21248 ?        S<sl 22:50   0:01 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
openvsw+   74421  0.0  0.0  45260 19940 ?        S<s  22:50   0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
root       74764  0.0  0.0   6412  2372 pts/1    R+   23:16   0:00 grep --color=auto ovs

Rerun the iperf test; the result is still 128 KBytes/sec.
[root@dell-per740-57 ~]# ip netns exec ns2 iperf3 -c  20.0.0.1 -4 -f K -u
Connecting to host 20.0.0.1, port 5201
[  5] local 20.0.0.2 port 34476 connected to 20.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec   129 KBytes   129 KBytes/sec  91  
[  5]   1.00-2.00   sec   127 KBytes   127 KBytes/sec  90  
[  5]   2.00-3.00   sec   129 KBytes   129 KBytes/sec  91  
[  5]   3.00-4.00   sec   127 KBytes   127 KBytes/sec  90  
[  5]   4.00-5.00   sec   129 KBytes   129 KBytes/sec  91  
[  5]   5.00-6.00   sec   129 KBytes   129 KBytes/sec  91  
[  5]   6.00-7.00   sec   127 KBytes   127 KBytes/sec  90  
[  5]   7.00-8.00   sec   129 KBytes   129 KBytes/sec  91  
[  5]   8.00-9.00   sec   127 KBytes   127 KBytes/sec  90  
[  5]   9.00-10.00  sec   129 KBytes   129 KBytes/sec  91  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  1.25 MBytes   128 KBytes/sec  0.000 ms  0/906 (0%)  sender
[  5]   0.00-10.00  sec  1.25 MBytes   128 KBytes/sec  0.001 ms  0/906 (0%)  receiver

iperf Done.

Comment 17 Flavio Leitner 2022-08-10 14:28:02 UTC
Hi Liting,

Thanks for testing those scenarios. It looks promising.
Can you also test with TCP traffic?
Also, can we add that as part of usual FDP QE cycle?

Thanks,
fbl

Comment 18 liting 2022-08-15 09:09:38 UTC
(In reply to Flavio Leitner from comment #17)
> Hi Liting,
> 
> Thanks for testing those scenarios. It looks promising.
> Can you also test with TCP traffic?
> Also, can we add that as part of usual FDP QE cycle?
> 
> Thanks,
> fbl

I ran a TCP test for the example test #2 topology. The TCP performance with the CPUAffinity=0 and --no-mlockall options enabled for the OVS services is slightly higher than with the standard OVS settings.

With the default options for the OVS services, the result is 4983922 KBytes/sec.
[root@dell-per740-57 ~]# ps aux|grep ovs
openvsw+   23052  0.3  0.0  45248 19952 ?        S<s  04:54   0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
openvsw+   23101  1.6  0.2 159724 159616 ?       S<Ls 04:54   0:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root       23112  0.0  0.0   6412  2176 pts/0    S+   04:54   0:00 grep --color=auto ovs

 [root@dell-per740-57 ~]# ip netns exec ns2 iperf3 -c  20.0.0.1 -4 -f K 
Connecting to host 20.0.0.1, port 5201
[  5] local 20.0.0.2 port 35736 connected to 20.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  4.73 GBytes  4960586 KBytes/sec    0    563 KBytes       
[  5]   1.00-2.00   sec  4.82 GBytes  5057487 KBytes/sec    0    563 KBytes       
[  5]   2.00-3.00   sec  4.80 GBytes  5029721 KBytes/sec    0    563 KBytes       
[  5]   3.00-4.00   sec  4.82 GBytes  5053176 KBytes/sec    0    563 KBytes       
[  5]   4.00-5.00   sec  4.83 GBytes  5060285 KBytes/sec    0    563 KBytes       
[  5]   5.00-6.00   sec  4.83 GBytes  5063124 KBytes/sec    0    563 KBytes       
[  5]   6.00-7.00   sec  4.84 GBytes  5074082 KBytes/sec    0    563 KBytes       
[  5]   7.00-8.00   sec  4.62 GBytes  4838558 KBytes/sec    0    563 KBytes       
[  5]   8.00-9.00   sec  4.63 GBytes  4849993 KBytes/sec    0    563 KBytes       
[  5]   9.00-10.00  sec  4.63 GBytes  4852224 KBytes/sec    0    563 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  47.5 GBytes  4983922 KBytes/sec    0             sender
[  5]   0.00-10.00  sec  47.5 GBytes  4983919 KBytes/sec                  receiver

iperf Done.

With the CPUAffinity=0 and --no-mlockall options enabled for the OVS services, the result is 5453743 KBytes/sec.
[root@dell-per740-57 ~]# ps aux|grep ovs
openvsw+   23854  0.2  0.0 362848 20392 ?        S<sl 04:58   0:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
openvsw+   23909  0.0  0.0  45260 19996 ?        S<s  04:58   0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
root       23935  0.0  0.0   6412  2300 pts/0    R+   04:58   0:00 grep --color=auto ovs

[root@dell-per740-57 ~]# ip netns exec ns2 iperf3 -c  20.0.0.1 -4 -f K 
Connecting to host 20.0.0.1, port 5201
[  5] local 20.0.0.2 port 34972 connected to 20.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  4.70 GBytes  4927113 KBytes/sec    0    563 KBytes       
[  5]   1.00-2.00   sec  5.15 GBytes  5397225 KBytes/sec    0    563 KBytes       
[  5]   2.00-3.00   sec  5.26 GBytes  5515245 KBytes/sec    0    563 KBytes       
[  5]   3.00-4.00   sec  5.28 GBytes  5532652 KBytes/sec    0    563 KBytes       
[  5]   4.00-5.00   sec  5.26 GBytes  5517109 KBytes/sec    0    563 KBytes       
[  5]   5.00-6.00   sec  5.27 GBytes  5528342 KBytes/sec    0    563 KBytes       
[  5]   6.00-7.00   sec  5.27 GBytes  5530524 KBytes/sec    0    563 KBytes       
[  5]   7.00-8.00   sec  5.28 GBytes  5531651 KBytes/sec    0    563 KBytes       
[  5]   8.00-9.00   sec  5.28 GBytes  5537053 KBytes/sec    0    563 KBytes       
[  5]   9.00-10.00  sec  5.26 GBytes  5520596 KBytes/sec    0    662 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  52.0 GBytes  5453743 KBytes/sec    0             sender
[  5]   0.00-10.00  sec  52.0 GBytes  5453740 KBytes/sec                  receiver

iperf Done.

We can add this as part of the usual FDP QE cycle later.

thanks,
Li Ting

Comment 19 zenghui.shi 2022-08-24 04:10:08 UTC
@Flavio, do you have an estimation on which FDP release this feature will be supported?

@Liting, where/how do we track the inclusion of the above testing in FDP?

Comment 20 liting 2022-08-25 08:10:29 UTC
(In reply to zenghui.shi from comment #19)
> @Flavio, do you have an estimation on which FDP release this feature will be
> supported?
> 
> @Liting, where/how do we track the inclusion of the above testing in FDP?

@Zenghui, we are considering adding this test to our topo (networking/openvswitch/topo) case, and jiqiu will help write it into the topo case.

thanks,
Li Ting

Comment 21 Flavio Leitner 2022-08-25 14:41:23 UTC
(In reply to zenghui.shi from comment #19)
> @Flavio, do you have an estimation on which FDP release this feature will be
> supported?

There is a pending question for Michael S. with regard to the per-cpu dispatch
mode and the handler threads with CPU affinity.

Michael, it seems we need the patch applied. Is that correct? If so, we could
enable support in the next FDP release 22.H:
https://source.redhat.com/groups/public/fast_datapath_release_planning_and_scheduling/fdp_releases_2022

BTW, all this is assuming the support is limited to OVS kernel data path, so it
does not include DPDK support. Zenghui, please confirm.

Comment 22 Michael Santana 2022-08-26 04:43:12 UTC
Hi all

The handlers fix has been applied so you should not see a mismatch warning.

With that said, I should correct a mistake I made in my previous comment. The comment was:
"The patch set in question has an edge case where if OVS can only runon 2 or less cores, it will not create any additional threads.
"

This comment is false. The edge case is if the TOTAL number of configured cores is less than or equal to 2, OVS will not create any additional threads. It will fall back to creating threads based on affinity.

The point is, just setting CPUAffinity=0 is not enough to restrict the number of handler threads to one single handler thread. They will also need to be running on a system with 2 or fewer cores for that to happen.

i.e.
1 Core system, CPUAffinity=0
Makes 1 handler

2 Core system, CPUAffinity=0
Makes 1 handler

3 or more Core system, CPUAffinity=0
Makes 2 handlers - This number does not change if we add more cores as long as we set CPUAffinity=0


Based on some documentation I saw, the microshift systems are using 2-4 cores. When using CPUAffinity=0 this will create 2 handler threads as shown above (or one thread in the case of 2 cores)

If you absolutely need to have exactly 1 handler thread as opposed to 2 handler threads, then I think the best solution is to fix n-handler-threads for per-cpu mode. I can start looking into proposing a patch.
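
One quick way to check how many handler threads a given configuration actually produced (a sketch; it relies on ovs-vswitchd naming its upcall threads "handler*" and "revalidator*"):

    ps -T -p $(pidof ovs-vswitchd) -o comm= | grep -cE '^handler'
    ps -T -p $(pidof ovs-vswitchd) -o comm= | grep -cE '^revalidator'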

Comment 23 zenghui.shi 2022-08-26 09:34:08 UTC
(In reply to Flavio Leitner from comment #21)
> (In reply to zenghui.shi from comment #19)
> > @Flavio, do you have an estimation on which FDP release this feature will be
> > supported?
> 
> There is a pending question for Michael S. with regards to the per-cpu
> dispatch
> mode and the handlers threads with cpu affinity.
> 
> Michael, it seems we need the patch applied. Is that correct? If so, we could
> enable support in the next FDP release 22.H:
> https://source.redhat.com/groups/public/
> fast_datapath_release_planning_and_scheduling/fdp_releases_2022
> 
> BTW, all this is assuming the support is limited to OVS kernel data path, so
> it
> does not include DPDK support. Zenghui, please confirm.

Yes, I confirm that DPDK won't be supported in the next MicroShift release.
I will let you know if there are changes in the future.

Comment 24 zenghui.shi 2022-08-26 09:43:01 UTC
(In reply to Michael Santana from comment #22)
> Hi all
> 
> The handlers fix has been applied so you should not see a mismatch warning.

May I know which OVS version contains the fix?
Do we have another Bugzilla that tracks this fix? I was looking at this
one, which is in NEW state: https://bugzilla.redhat.com/show_bug.cgi?id=2102449

> 
> 
> Based on some documentation I saw, the microshift systems are using 2-4
> cores. When using CPUAffinity=0 this will create 2 handler threads as shown
> above (or one thread in the case of 2 cores)
> 
> If you absolutely need to have exactly 1 handler threads as opposed to 2
> handler threads, then I think the best solution is to fix n-handler-threads
> for per-cpu mode. I can start looking into proposing a patch

The general idea is to optimize the resource usage as much as possible.
I think we will need to run some tests on systems with more than 2 cores to see how much
memory is actually allocated to ovs-vswitchd in the MicroShift environment, and whether
it is acceptable to customers, and then come back with a definite answer.

For the testing, which ovs version shall be used?

Comment 26 Michael Santana 2022-08-31 02:06:11 UTC
(In reply to zenghui.shi from comment #24)
> (In reply to Michael Santana from comment #22)
> > Hi all
> > 
> > The handlers fix has been applied so you should not see a mismatch warning.
> 
> May I know which ovs version contains the fix?
ovs2.16, 2.17, 3.0, dpdk-latest
> do we have another bugzilla that tracks this fix? I was looking at this
> one which is in new state:
> https://bugzilla.redhat.com/show_bug.cgi?id=2102449
yes, I need to update that bugzilla. thanks for the reminder
> 
> > 
> > 
> > Based on some documentation I saw, the microshift systems are using 2-4
> > cores. When using CPUAffinity=0 this will create 2 handler threads as shown
> > above (or one thread in the case of 2 cores)
> > 
> > If you absolutely need to have exactly 1 handler threads as opposed to 2
> > handler threads, then I think the best solution is to fix n-handler-threads
> > for per-cpu mode. I can start looking into proposing a patch
> 
> The general idea is to optimize the resource usage as much as possible.
> I think we will need to run some tests with > 2 cores system to see how much
> memory is actually allocated to ovs-vswitchd in microshift env, and whether
> it is acceptable to customers. Then come back with a definite answer. 
> 
> For the testing, which ovs version shall be used?
All the latest versions contain the fix, so I'm not really sure.

Comment 27 zenghui.shi 2022-09-09 02:47:19 UTC
Flavio, may I ask in which OVS version and FDP release we can expect this feature to be supported?

Comment 28 Flavio Leitner 2022-09-09 12:12:11 UTC
Hi,

The feature "--no-mlockall" is already available and QE found no issues, so we can support that right away.

Setting CPUAffinity causes the cpu_id mismatch log message, which is fixed by bz#2102449.
That BZ is queued for the next FDP release 22.I. The schedule is here:
https://source.redhat.com/groups/public/fast_datapath_release_planning_and_scheduling/fdp_releases_2022

The build containing the above fix is available for internal consumption at:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2123539

The last open question is regarding manually setting the number of threads.
If that is required, then we will need to do the work upstream first.

Thanks,
fbl

Comment 30 Christian Trautman 2022-11-11 18:34:23 UTC
Tests have been added to our current suites.  Will check in with Zenghui occasionally to make sure our coverage for this feature is up to date.

Comment 31 Michael Santana 2022-11-15 19:38:38 UTC
I submitted a patch allowing you to specify the number of handlers when using per-cpu mode:

https://mail.openvswitch.org/pipermail/ovs-dev/2022-October/398793.html

Comment 32 ovs-bot 2024-10-08 17:49:14 UTC
This bug did not meet the criteria for automatic migration and is being closed.
If the issue remains, please open a new ticket in https://issues.redhat.com/browse/FDP

Comment 33 Red Hat Bugzilla 2025-02-06 04:25:16 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

