Description of problem:

OVN-Kubernetes is the default CNI solution for MicroShift[0], an optimized and simplified version of OpenShift built for Internet-of-Things and edge computing devices that are both CPU and memory constrained. Since ovn-kubernetes uses OVS as the underlying datapath, several OVS optimizations are used to meet the small footprint requirement:

1) CPUAffinity
CPUAffinity restricts the CPU cores that the OVS services run on. This option is applied to both the ovs-vswitchd.service and ovsdb-server.service profiles. It also helps reduce the number of n-handler-threads, which is not configurable in recent OVS versions such as openvswitch2.17.

2) no-mlockall
Starting ovs-vswitchd and/or ovsdb-server with --no-mlockall is observed to reduce memory pre-allocation significantly: from 70M down to 15M on a VM with 2G of RAM and 1 vCPU.

Allowing these two options to be set at ovs-vswitchd and ovsdb-server service startup/restart improves the overall footprint of running ovn-kubernetes in MicroShift. This RFE requests official support for the no-mlockall and CPUAffinity options in OVS so that layered products such as MicroShift can make use of them.

[0]: https://github.com/openshift/microshift

Additional info:

An example of applying both options to ovs-vswitchd.service:

[Service]
Type=forking
CPUAffinity=0                                                          <=== CPUAffinity
PIDFile=/var/run/openvswitch/ovs-vswitchd.pid
Restart=on-failure
Environment=XDG_RUNTIME_DIR=/var/run/openvswitch
EnvironmentFile=/etc/openvswitch/default.conf
EnvironmentFile=-/etc/sysconfig/openvswitch
EnvironmentFile=-/run/openvswitch.useropts
LimitSTACK=2M
ExecStartPre=-/bin/sh -c '/usr/bin/chown :$${OVS_USER_ID##*:} /dev/hugepages'
ExecStartPre=-/usr/bin/chmod 0775 /dev/hugepages
ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-mlockall \       <=== no-mlockall
          --no-ovsdb-server --no-monitor --system-id=random \
          ${OVS_USER_OPT} \
          start $OPTIONS
ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop
ExecReload=/usr/share/openvswitch/scripts/ovs-ctl --no-mlockall --no-ovsdb-server \   <=== no-mlockall
          --no-monitor --system-id=random \
          ${OVS_USER_OPT} \
          restart $OPTIONS
TimeoutSec=300
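As a quick sanity check after restarting the services, the effect of both options can be observed from the shell. This is a minimal sketch using standard Linux tools; the exact output and PID lookup are illustrative:

$ # Cores ovs-vswitchd is allowed to run on; should match CPUAffinity
$ taskset -cp $(pidof ovs-vswitchd)
$ # Resident memory (RSS, in kB) of ovs-vswitchd
$ ps -o rss= -p $(pidof ovs-vswitchd)
$ # With --no-mlockall the locked memory should drop to (near) zero
$ grep VmLck /proc/$(pidof ovs-vswitchd)/status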
From an engineering perspective, the following questions arise:
- Setting CPUAffinity=0 limits OVS to a single core. This might impact upcall handling with the new implementation, so from the ENG side we need some investigation.
- Performance might be lower, as memory will probably be allocated on the fly. Would the existing performance tests capture this, QA?
- The same might be true for limiting the CPUs being used.
- The testing matrix should not explode, so how would we test these additional configuration changes? QA?
On the QA side we can see how performance is impacted by these changes/options. We will provide some sample data as time permits.
Adding performance and scalability requirements.

MicroShift is being designed to run in a single-node configuration, with the control plane and node components in one package. Expanding to highly available deployments would expand the resource budget because of the CNI requirements, so where HA is needed we will favor two MicroShift deployments running active/active.

The expected workload from the early customers in the MicroShift scenarios is:
- Kubernetes Pods: 12-25
- Kubernetes Services: 5-10
- Kubernetes NetworkPolicies: 2-3 (really depends on what the customer is trying to do)

With the understanding that there will be performance penalties when enabling these two OVS parameters, customers who are focused on network performance will be instructed to use the default Open vSwitch settings if applicable.
Note that the mlockall option can already be configured from the /etc/sysconfig/openvswitch file:

# Pass or not --mlockall option to ovs-vswitchd.
# This option should be set to "yes" or "no". The default is "yes".
# Enabling this option can avoid networking interruptions due to
# system memory pressure in extraordinary situations, such as multiple
# concurrent VM import operations.
# --mlockall=yes

Also note that setting CPUAffinity=0 directly in the service file is not really user-friendly. It should be overridden using the systemd drop-in feature. This is an example for the stack size that I used in the past:

$ mkdir -p /etc/systemd/system/ovs-vswitchd.service.d/
$ echo -e "[Service]\nLimitSTACK=8M" > /etc/systemd/system/ovs-vswitchd.service.d/limitstack.conf
$ systemctl daemon-reload
$ systemctl restart openvswitch
$ cat /proc/$(pidof ovs-vswitchd)/limits | grep stack
Max stack size            8388608              8388608              bytes
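Following the same drop-in approach, here is a minimal sketch of how both options could be applied without editing the shipped unit files. The drop-in file name and the exact OPTIONS line are assumptions for illustration, not the packaged MicroShift configuration:

$ mkdir -p /etc/systemd/system/ovs-vswitchd.service.d/
$ # Hypothetical drop-in pinning ovs-vswitchd to CPU 0
$ printf '[Service]\nCPUAffinity=0\n' > /etc/systemd/system/ovs-vswitchd.service.d/cpuaffinity.conf
$ # Hypothetical: pass --no-mlockall to "ovs-ctl start" via the OPTIONS variable
$ # (add or adjust this line in /etc/sysconfig/openvswitch)
$ grep ^OPTIONS /etc/sysconfig/openvswitch
OPTIONS="--no-mlockall"
$ systemctl daemon-reload
$ systemctl restart openvswitch

The same drop-in could be created for ovsdb-server.service if its affinity should be restricted as well.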
I ran a comparison between our standard container performance results and the results with the CPUAffinity=0 and --no-mlockall options enabled. The 1q 2pmd case got a similar result. The 2q 4pmd case got slightly lower performance with the options enabled than with them disabled. The 4q 8pmd case failed to run when the two options were enabled. Please help me have a look. Thanks.

The OVS version and kernel version:
[root@dell-per740-57 ~]# rpm -qa|grep openvs
kernel-kernel-networking-openvswitch-perf-1.0-334.noarch
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
openvswitch2.17-2.17.0-30.el9fdp.x86_64
[root@dell-per740-57 ~]# uname -r
5.14.0-130.el9.x86_64

Our standard container performance results:
1q 2pmd case: 9863248 pps
2q 4pmd case: 19419076 pps
4q 8pmd case: 21338436 pps

Results with the CPUAffinity=0 and --no-mlockall options enabled:
1q 2pmd case: 9853706 pps
2q 4pmd case: 18831290 pps
4q 8pmd case: failed

Without no-mlockall and CPUAffinity=0 enabled, the ovs-vswitchd memory is 237916 kB.
[root@dell-per740-57 perf]# ps axu|grep ovs
openvsw+ 111715 0.0 0.0 45508 20712 ? S<s 03:21 0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
openvsw+ 111765 399 0.3 269263432 237916 ? S<Lsl 03:21 122:58 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root 119184 0.0 0.0 6408 2212 pts/1 S+ 03:52 0:00 grep --color=auto ovs

I used the following steps to enable no-mlockall and CPUAffinity=0.

Add CPUAffinity=0 and --no-mlockall to /usr/lib/systemd/system/ovs-vswitchd.service and /usr/lib/systemd/system/ovsdb-server.service:

[Service]
Type=forking
CPUAffinity=0
PIDFile=/var/run/openvswitch/ovs-vswitchd.pid
Restart=on-failure
Environment=XDG_RUNTIME_DIR=/var/run/openvswitch
EnvironmentFile=/etc/openvswitch/default.conf
EnvironmentFile=-/etc/sysconfig/openvswitch
EnvironmentFile=-/run/openvswitch.useropts
LimitSTACK=2M
ExecStartPre=-/bin/sh -c '/usr/bin/chown :$${OVS_USER_ID##*:} /dev/hugepages'
ExecStartPre=-/usr/bin/chmod 0775 /dev/hugepages
ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-mlockall \       <=== no-mlockall
          --no-ovsdb-server --no-monitor --system-id=random \
          ${OVS_USER_OPT} \
          start $OPTIONS
ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop
ExecReload=/usr/share/openvswitch/scripts/ovs-ctl --no-mlockall --no-ovsdb-server \   <=== no-mlockall
          --no-monitor --system-id=random \
          ${OVS_USER_OPT} \
          restart $OPTIONS
TimeoutSec=300

Then restart the OVS services and check that the OVS memory has changed. The ovs-vswitchd memory is 28400 kB.

systemctl daemon-reload
systemctl restart ovs-vswitchd
systemctl restart ovsdb-server
systemctl restart openvswitch

[root@dell-per740-57 perf]# ps axu|grep ovs
openvsw+ 119549 259 0.0 269218476 28400 ? S<Lsl 03:59 4:22 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
openvsw+ 119602 0.0 0.0 45416 19768 ? S<s 03:59 0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
root 119672 0.0 0.0 6408 2308 pts/1 S+ 04:01 0:00 grep --color=auto ovs

After enabling no-mlockall and CPUAffinity=0, the OVS memory is 56284 kB when running the 1q 2pmd case.
[root@dell-per740-57 ~]# ps aux | grep ovs-vswitchd
openvsw+ 50766 199 0.0 275663864 56284 ? S<Lsl 00:15 53:55 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root 58799 0.0 0.0 6408 2156 pts/0 S+ 00:42 0:00 grep --color=auto ovs-vswitchd

After enabling no-mlockall and CPUAffinity=0, the OVS memory is 73832 kB when running the 2q 4pmd case.
[root@dell-per740-57 ~]# ps aux | grep ovs-vswitchd
openvsw+ 80751 397 0.1 275554888 73832 ? S<Lsl 01:52 27:37 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root 83224 0.0 0.0 6408 2240 pts/0 S+ 01:59 0:00 grep --color=auto ovs-vswitchd

After enabling no-mlockall and CPUAffinity=0, the OVS memory is 109468 kB when running the 4q 8pmd case.
[root@dell-per740-57 perf]# ps aux | grep ovs-vswitchd
openvsw+ 97643 796 0.1 269307624 109468 ? S<Lsl 02:41 91:50 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root 101531 0.0 0.0 6408 2148 pts/1 R+ 02:53 0:00 grep --color=auto ovs-vswitchd

For the 4q 8pmd case that failed, here is the test info. The OVS DPDK topology was built as follows, but starting the container with podman failed; the same command starts successfully without the CPUAffinity=0 and --no-mlockall options.

[root@dell-per740-57 perf]# ovs-vsctl show
072811d6-6da2-4092-891e-74323b7ad2e6
    Bridge ovsbr0
        datapath_type: netdev
        Port ovsbr0
            Interface ovsbr0
                type: internal
        Port vhost1
            Interface vhost1
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser/vhost1"}
        Port vhost0
            Interface vhost0
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser/vhost0"}
        Port dpdk1
            Interface dpdk1
                type: dpdk
                options: {dpdk-devargs="0000:3b:00.1", n_rxq="4", n_rxq_desc="1024", n_txq_desc="1024"}
        Port dpdk0
            Interface dpdk0
                type: dpdk
                options: {dpdk-devargs="0000:3b:00.0", n_rxq="4", n_rxq_desc="1024", n_txq_desc="1024"}
    ovs_version: "2.17.3"
[root@dell-per740-57 ~]# ovs-vsctl get Open_vSwitch . other_config
{dpdk-init="true", dpdk-lcore-mask="0x1", dpdk-socket-mem="4096", pmd-cpu-mask="555500000000", userspace-tso-enable="false", vhost-iommu-support="true"}

[root@dell-per740-57 perf]# taskset -c 2,4,6,8,10,12,14,16,18 podman run -i -t --privileged -v /tmp/vhostuser:/tmp/vhostuser -v /dev/hugepages:/dev/hugepages 4f4c841655b8 dpdk-testpmd -l 0-8 -n 1 -m 1024 --no-pci --vdev=virtio_user0,path=/tmp/vhostuser/vhost0,queues=4,server=1 --vdev=virtio_user1,path=/tmp/vhostuser/vhost1,queues=4,server=1 -- -i --forward-mode=io --burst=32 --rxd=8192 --txd=8192 --max-pkt-len=9600 --mbuf-size=9728 --nb-cores=8 --rxq=4 --txq=4 --mbcache=512 --auto-start
EAL: Detected 48 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
virtio_user_dev_set_status(): VHOST_USER_SET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Success
virtio_user_dev_set_status(): VHOST_USER_SET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Success
virtio_user_dev_set_status(): VHOST_USER_SET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Success
virtio_user_dev_set_status(): VHOST_USER_SET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Resource temporarily unavailable
virtio_user_dev_set_status(): VHOST_USER_SET_STATUS failed (-1): Resource temporarily unavailable
virtio_user_dev_set_status(): VHOST_USER_SET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Success
virtio_user_dev_set_status(): VHOST_USER_SET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Success
virtio_user_dev_set_status(): VHOST_USER_SET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Success
virtio_user_dev_set_status(): VHOST_USER_SET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Success
virtio_user_dev_update_status(): VHOST_USER_GET_STATUS failed (-1): Resource temporarily unavailable
virtio_user_dev_set_status(): VHOST_USER_SET_STATUS failed (-1): Resource temporarily unavailable
EAL: No legacy callbacks, legacy socket not created
Interactive-mode selected
Set io packet forwarding mode
Auto-start selected
Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa.
testpmd: create a new mbuf pool <mb_pool_0>: n=278528, size=9728, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mb_pool_1>: n=278528, size=9728, socket=1
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: DA:A1:88:E1:27:15
Configuring Port 1 (socket 0)
Port 1: D6:D2:AB:DC:28:A4
Checking link statuses...
Done
Start automatic packet forwarding
io packet forwarding - ports=2 - cores=8 - streams=8 - NUMA support enabled, MP allocation mode: native
Logical Core 1 (socket 1) forwards packets on 1 streams:
  RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
Logical Core 2 (socket 0) forwards packets on 1 streams:
  RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
Logical Core 3 (socket 1) forwards packets on 1 streams:
  RX P=0/Q=1 (socket 0) -> TX P=1/Q=1 (socket 0) peer=02:00:00:00:00:01
Logical Core 4 (socket 0) forwards packets on 1 streams:
  RX P=1/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00
Logical Core 5 (socket 1) forwards packets on 1 streams:
  RX P=0/Q=2 (socket 0) -> TX P=1/Q=2 (socket 0) peer=02:00:00:00:00:01
Logical Core 6 (socket 0) forwards packets on 1 streams:
  RX P=1/Q=2 (socket 0) -> TX P=0/Q=2 (socket 0) peer=02:00:00:00:00:00
Logical Core 7 (socket 1) forwards packets on 1 streams:
  RX P=0/Q=3 (socket 0) -> TX P=1/Q=3 (socket 0) peer=02:00:00:00:00:01
Logical Core 8 (socket 0) forwards packets on 1 streams:
  RX P=1/Q=3 (socket 0) -> TX P=0/Q=3 (socket 0) peer=02:00:00:00:00:00

  io packet forwarding packets/burst=32
  nb forwarding cores=8 - nb forwarding ports=2
  port 0: RX queue number: 4 Tx queue number: 4
    Rx offloads=0x800 Tx offloads=0x0
    RX queue: 0
      RX desc=8192 - RX free threshold=0
      RX threshold registers: pthresh=0 hthresh=0 wthresh=0
      RX Offloads=0x800
    TX queue: 0
      TX desc=8192 - TX free threshold=0
      TX threshold registers: pthresh=0 hthresh=0 wthresh=0
      TX offloads=0x0 - TX RS bit threshold=0
  port 1: RX queue number: 4 Tx queue number: 4
    Rx offloads=0x800 Tx offloads=0x0
    RX queue: 0
      RX desc=8192 - RX free threshold=0
      RX threshold registers: pthresh=0 hthresh=0 wthresh=0
      RX Offloads=0x800
    TX queue: 0
      TX desc=8192 - TX free threshold=0
      TX threshold registers: pthresh=0 hthresh=0 wthresh=0
      TX offloads=0x0 - TX RS bit threshold=0
testpmd>
@eelco, I'm applying the changes from the microshift-networking package, as "OPTIONS" in /etc/sysconfig/openvswitch and as a systemd .d drop-in file:
https://github.com/openshift/microshift/pull/787

With this we don't need any changes in Open vSwitch itself. Everything seems to work. In our case we don't care about DPDK, and we don't use hugepages. The lower performance is acceptable, and the tweaks can always be reverted by any customer working with high pps.

The only thing I see in the kernel traces is:

[ 924.127699] openvswitch: cpu_id mismatch with handler threads
[ 983.870273] openvswitch: cpu_id mismatch with handler threads
[ 988.909174] openvswitch: cpu_id mismatch with handler threads
@eelco, btw thanks for the suggestions, I didn't know about the systemd .d service modifications.
(In reply to liting from comment #5)
> I run comparing test between our standard container performance results and
> versus enable the CPUAffinity=0 and --no-mlockall options. The 1q 2pmd case
> got the similar result. The 2q 4pmd case with enable the options got
> slightly lower performance than disable the options. The 4q 8pmd case run
> failed when enable the two options. Please help me have a look. Thanks
> The ovs verison and kernel verison:

Just to clarify that the environment we want to focus on at the moment is the kernel data path and not the userspace with DPDK.

fbl
(In reply to Miguel Angel Ajo from comment #6)
[...]
> The only thing I see on the kernel traces is:
>
> [ 924.127699] openvswitch: cpu_id mismatch with handler threads
> [ 983.870273] openvswitch: cpu_id mismatch with handler threads
> [ 988.909174] openvswitch: cpu_id mismatch with handler threads

This is most probably fixed by https://patchwork.ozlabs.org/project/openvswitch/list/?series=310161, because when systemd sets CPUAffinity, I think not all CPUs get mapped to handlers and the warning is then issued. I will let Michael S. confirm.
(In reply to Flavio Leitner from comment #9)
> (In reply to Miguel Angel Ajo from comment #6)
> [...]
> > The only thing I see on the kernel traces is:
> >
> > [ 924.127699] openvswitch: cpu_id mismatch with handler threads
> > [ 983.870273] openvswitch: cpu_id mismatch with handler threads
> > [ 988.909174] openvswitch: cpu_id mismatch with handler threads
>
> This is most probably being fixed by
> https://patchwork.ozlabs.org/project/openvswitch/list/?series=310161
>
> because when systemd sets CPUAffinity, I think not all CPUs get mapped
> to handlers and then the warning is issued. I will let Michael S. to
> confirm.

The CPUAffinity fix was in fact a form of limiting the threads created by ovs-vswitchd. I'm afraid that those patches could bump the memory used by ovs-vswitchd again. I wonder if we could control the number of threads manually, or add something to that patch to be able to cap the number of threads and avoid the warning. Of course this is something that affects and will affect performance, but it is only targeted at embedded/edge deployments.
Hi Miguel,

The patch set in question does remove the 'cpu mismatch' warning, but it also increases the number of handler threads created. In your case the CPU affinity is 0, so (IIUC) OVS is bound to run on only one core. The patch set has an edge case where, if OVS can only run on 2 or fewer cores, it will not create any additional threads. It falls back to the normal behavior of creating as many handler threads as there are cores available for OVS to run on (as long as that number is less than or equal to 2). In your case that would be 1 handler thread and 1 revalidator. So it is possible the patch set would work as is in your use case.

Of course, I understand the need to create as many threads as you would like. We already had this functionality with n-handler-threads, but that is currently broken with per-cpu dispatch mode. I can start working on a patch on top of the current patch set to re-enable this functionality in per-cpu dispatch mode.

The cpu mismatch warning is quite harmless. That warning appears because that part of the OVS kernel module assumes OVS will be running on all online cores. IIUC, CPU affinity and CPU isolation weren't very well considered when developing per-cpu dispatch mode.

I was not aware you could change the affinity of the OVS service by setting CPUAffinity= in the service file. Someone else had asked for a way to change the number of threads in OVS; if this workaround works well I will recommend it. My understanding is that when OVS is in per-cpu mode it will create as many handler threads as there are CPUs it can run on, and because you change the affinity via that flag you basically control how many threads OVS creates. This is a brilliant workaround!

Also, I just want to say that I am 100% behind this effort. I would like to decrease the memory and CPU usage of OVS. I'm not too familiar with no-mlockall, so I will probably have to spend some time researching it.
(In reply to Michael Santana from comment #11)
> Hi Miguel,
>
> The patch set in question does remove the 'cpu mismatch' warning. But it
> also increases the number of handler threads created. But in your case the
> CPU affinity is 0, so (IIUC) OVS is bound to only run on one core. The patch set
> in question has an edge case where if OVS can only run on 2 or less cores,
> it will not create any additional threads. It will fall back to the normal
> behavior of creating as many handler threads as there are cores available
> for OVS to run (as long as that number is less than or equal to 2). In your
> case that would be 1 handler thread and 1 revalidator. So it is possible the
> patch set would work as is in your use case
>

Oh that's great, it should totally work then. Sorry I missed those details in the patch; I was probably only skimming it at the time.

> Of course, I understand the need of creating as many threads as you would
> like. We already had this functionality with n-handler-threads, but that is
> currently broken with per-cpu dispatch mode. I can start working on a patch
> on top of the current patch set to reenable this functionality in per-cpu
> dispatch mode.
>
> The cpu mismatch warning is quite harmless. That warning appears because
> that part of the OVS kernel module assumes OVS will be running on all online
> cores. IIUC CPU affinity and CPU isolation wasn't very well considered when
> developing per-cpu dispatch mode

Ack, good to know it's harmless. It's mostly annoying when working on the device console, and I suspect customers will ask about it as soon as they see it. I think we should include a comment about this, and a link to this BZ, in the MicroShift release notes.

> I was not aware you could change the affinity of OVS service by setting
> CPUAffinity= on the service file. Someone else had asked for a way to change
> the number of threads in OVS. If this workaround works well I will recommend
> it.

It works like a charm.

> My understanding is that when OVS is in per-cpu mode it will create as
> many handler threads as there are CPUs it can run on, and because you change
> the affinity via that flag you basically have control as to how many threads
> OVS creates. This is a brilliant workaround!

I think it was zshi's idea, not sure.

> Also, I just want to say that I am 100% behind this effort. I would like to
> decrease the memory and CPU usage of OVS. I'm not too familiar with
> no-mlockall, so I will probably have to spend some time researching it

Thanks a lot for all the feedback and details, much appreciated.
(In reply to Flavio Leitner from comment #8)
> (In reply to liting from comment #5)
> > I run comparing test between our standard container performance results and
> > versus enable the CPUAffinity=0 and --no-mlockall options. The 1q 2pmd case
> > got the similar result. The 2q 4pmd case with enable the options got
> > slightly lower performance than disable the options. The 4q 8pmd case run
> > failed when enable the two options. Please help me have a look. Thanks
> > The ovs verison and kernel verison:
>
> Just to clarify that the environment we want to focus on at the
> moment is the kernel data path and not the userspace with DPDK.
>
> fbl

I ran a comparison between our standard OVS kernel datapath with container results and the results with the CPUAffinity=0 and --no-mlockall options enabled. With the two options enabled, the OVS kernel case got slightly lower performance than without them.

The OVS kernel topology is as follows:
[root@dell-per740-57 perf]# ovs-vsctl show
7c8dccc0-a28e-4f24-9e2a-78be2ae99c5f
    Bridge ovsbr0
        Port p2
            Interface p2
        Port ovsbr0
            Interface ovsbr0
                type: internal
        Port enp59s0f1
            Interface enp59s0f1
        Port p1
            Interface p1
        Port enp59s0f0
            Interface enp59s0f0
    ovs_version: "2.17.3"

The container command:
[root@dell-per740-57 perf]# ps aux|grep podman
root 181871 0.2 0.0 3486656 52992 pts/0 Sl+ 22:51 0:04 podman run -i -t --privileged -v /dev/hugepages:/dev/hugepages 4f4c841655b8 dpdk-testpmd -l 2,4,6 -n 1 -m 1024 --no-pci --vdev=net_tap0,iface=p1,mac=00:de:ad:00:00:01 --vdev=net_tap1,iface=p2,mac=00:de:ad:00:00:02 -- -i --forward-mode=io --burst=32 --rxd=8192 --txd=8192 --nb-cores=2 --rxq=1 --txq=1 --auto-start

The OVS and kernel builds:
[root@dell-per740-57 perf]# rpm -qa|grep openvs
kernel-kernel-networking-openvswitch-perf-1.0-334.noarch
openvswitch-selinux-extra-policy-1.0-31.el9fdp.noarch
openvswitch2.17-2.17.0-30.el9fdp.x86_64
[root@dell-per740-57 perf]# uname -r
5.14.0-130.el9.x86_64

During the standard test the ovs-vswitchd memory is 287224 kB, and running the OVS container kernel case three times gives 0.236pps, 0.238pps, 0.244pps.
[root@dell-per740-57 bash_perf_result]# ps aux | grep ovs
openvsw+ 157318 0.0 0.0 45356 20408 ? S<s 21:35 0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
openvsw+ 157367 1.9 0.4 4350716 287224 ? S<Lsl 21:35 0:50 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root 170137 0.0 0.0 6408 2240 pts/1 S+ 22:17 0:00 grep --color=auto ovs

Then the CPUAffinity=0 and --no-mlockall options were enabled. Rerunning the case three times gives 0.206pps, 0.201pps, 0.193pps. The ovs-vswitchd memory is 21268 kB.
[root@dell-per740-57 perf]# ps aux|grep ovs
openvsw+ 179135 0.0 0.0 45352 20548 ?
S<s 22:30 0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach openvsw+ 179189 0.0 0.0 363064 21268 ? S<sl 22:30 0:02 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach root 182394 0.0 0.0 6408 2216 pts/1 S+ 23:12 0:00 grep --color=auto ovs
(In reply to liting from comment #13)
> (In reply to Flavio Leitner from comment #8)
> > Just to clarify that the environment we want to focus on at the
> > moment is the kernel data path and not the userspace with DPDK.
[...]
> The container command:
> [root@dell-per740-57 perf]# ps aux|grep podman
> root 181871 0.2 0.0 3486656 52992 pts/0 Sl+ 22:51 0:04 podman
> run -i -t --privileged -v /dev/hugepages:/dev/hugepages 4f4c841655b8
> dpdk-testpmd -l 2,4,6 -n 1 -m 1024 --no-pci
> --vdev=net_tap0,iface=p1,mac=00:de:ad:00:00:01
> --vdev=net_tap1,iface=p2,mac=00:de:ad:00:00:02 -- -i --forward-mode=io
> --burst=32 --rxd=8192 --txd=8192 --nb-cores=2 --rxq=1 --txq=1 --auto-start

Hi Liting, I think there is still some confusion because the test is using DPDK inside the container. We don't need any DPDK in this environment.

Therefore, the test needs to be OVS without DPDK connected to a normal VM, normal containers, or another host, and then run iperf3 or netperf to test traffic.

Example test #1: host-to-host

  host A                                        host B
  [Container, IPADDR1]                          [Container, IPADDR2]
  veth1 --- veth0 --- ovsbr0 --- eth0  <=>  eth0 --- ovsbr0 --- veth0 --- veth1

Example test #2: container-to-container

  host A
  [Container1, IPADDR1]  vethc1 --- veth0 ---+
                                             |
                                          ovsbr0
                                             |
  [Container2, IPADDR2]  vethc2 --- veth1 ---+

fbl
(In reply to Flavio Leitner from comment #14)
> (In reply to liting from comment #13)
> > (In reply to Flavio Leitner from comment #8)
> > > Just to clarify that the environment we want to focus on at the
> > > moment is the kernel data path and not the userspace with DPDK.
> [...]
> > The container command:
> > [root@dell-per740-57 perf]# ps aux|grep podman
> > root 181871 0.2 0.0 3486656 52992 pts/0 Sl+ 22:51 0:04 podman
> > run -i -t --privileged -v /dev/hugepages:/dev/hugepages 4f4c841655b8
> > dpdk-testpmd -l 2,4,6 -n 1 -m 1024 --no-pci
> > --vdev=net_tap0,iface=p1,mac=00:de:ad:00:00:01
> > --vdev=net_tap1,iface=p2,mac=00:de:ad:00:00:02 -- -i --forward-mode=io
> > --burst=32 --rxd=8192 --txd=8192 --nb-cores=2 --rxq=1 --txq=1 --auto-start
>
> Hi Liting, I think there is still a confusion because the test
> is using DPDK inside the container. We don't need any DPDK in this
> environment.
>
> Therefore, the test needs to be OVS without DPDK connected to
> normal VM or normal containers or another hosts, and then run
> iperf3 or netperf to test traffic.
>
> Example test #1: host-to-host
>
>   host A                                        host B
>   [Container, IPADDR1]                          [Container, IPADDR2]
>   veth1 --- veth0 --- ovsbr0 --- eth0  <=>  eth0 --- ovsbr0 --- veth0 --- veth1
>
> Example test #2: container-to-container
>
>   host A
>   [Container1, IPADDR1]  vethc1 --- veth0 ---+
>                                              |
>                                           ovsbr0
>                                              |
>   [Container2, IPADDR2]  vethc2 --- veth1 ---+
>
> fbl

I built the topology of example test #1; according to the iperf3 results, enabling the two options has no impact on performance.

ice card on dell740-57 <--> i40e card on dell740-73

[root@dell-per740-57 ~]# rpm -qa|grep openvswitch
kernel-kernel-networking-openvswitch-common-2.0-234.noarch
kernel-kernel-networking-openvswitch-perf-1.0-336.noarch
openvswitch-selinux-extra-policy-1.0-31.el9fdp.noarch
openvswitch2.17-2.17.0-30.el9fdp.x86_64
[root@dell-per740-57 ~]# uname -r
5.14.0-70.17.1.el9_0.x86_64
[root@dell-per740-73 ~]# uname -r
4.18.0-305.25.1.el8_4.x86_64
[root@dell-per740-73 ~]# rpm -qa|grep openvswitch
openvswitch2.15-2.15.0-113.el8fdp.x86_64
kernel-kernel-networking-openvswitch-perf-1.0-336.noarch
openvswitch-selinux-extra-policy-1.0-29.el8fdp.noarch
kernel-kernel-networking-openvswitch-common-2.0-234.noarch

On dell740-57:
ip link add veth1 type veth peer name veth0
ip link set veth1 up
ip link set veth0 up
ovs-vsctl add-br ovsbr0
ovs-vsctl add-port ovsbr0 veth0
ovs-vsctl add-port ovsbr0 ens1f0
ip link set ovsbr0 up
ip netns add ns1
ip link set veth1 netns ns1
ip netns exec ns1 ip add add 20.0.0.1/24 dev veth1
ip netns exec ns1 ip link set veth1 up

On dell740-73:
ip link add veth1 type veth peer name veth0
ip link set veth1 up
ip link set veth0 up
ovs-vsctl add-br ovsbr0
ovs-vsctl add-port ovsbr0 veth0
ovs-vsctl add-port ovsbr0 ens2f0
ip link set ovsbr0 up
ip netns add ns1
ip link set veth1 netns ns1
ip netns exec ns1 ip add add 20.0.0.2/24 dev veth1
ip netns exec ns1 ip link set veth1 up

[root@dell-per740-57 ~]# ps aux|grep ovs
openvsw+ 73499 0.0 0.0 45248 19540 ?
S<s 22:12 0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach openvsw+ 73563 2.0 0.4 4350508 286860 ? S<Lsl 22:12 0:42 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach root 74007 0.0 0.0 6412 2184 pts/2 S+ 22:47 0:00 grep --color=auto ovs Start iperf server on dell740-73 [root@dell-per740-73 ~]# ip netns exec ns1 iperf3 -s -B 20.0.0.2 ----------------------------------------------------------- Server listening on 5201 ----------------------------------------------------------- Accepted connection from 20.0.0.1, port 53716 [ 5] local 20.0.0.2 port 5201 connected to 20.0.0.1 port 39056 [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams [ 5] 0.00-1.00 sec 129 KBytes 1.05 Mbits/sec 311462284.854 ms 0/91 (0%) [ 5] 1.00-2.00 sec 127 KBytes 1.04 Mbits/sec 935018.797 ms 0/90 (0%) [ 5] 2.00-3.00 sec 129 KBytes 1.05 Mbits/sec 2631.520 ms 0/91 (0%) [ 5] 3.00-4.00 sec 127 KBytes 1.04 Mbits/sec 7.901 ms 0/90 (0%) [ 5] 4.00-5.00 sec 129 KBytes 1.05 Mbits/sec 0.023 ms 0/91 (0%) [ 5] 5.00-6.00 sec 129 KBytes 1.05 Mbits/sec 0.001 ms 0/91 (0%) [ 5] 6.00-7.00 sec 127 KBytes 1.04 Mbits/sec 0.002 ms 0/90 (0%) [ 5] 7.00-8.00 sec 129 KBytes 1.05 Mbits/sec 0.001 ms 0/91 (0%) [ 5] 8.00-9.00 sec 127 KBytes 1.04 Mbits/sec 0.001 ms 0/90 (0%) [ 5] 9.00-10.00 sec 129 KBytes 1.05 Mbits/sec 0.002 ms 0/91 (0%) - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams [ 5] 0.00-10.00 sec 1.25 MBytes 1.05 Mbits/sec 0.002 ms 0/906 (0%) receiver Run iperf test on dell740-57, the result is 128 KBytes/sec. [root@dell-per740-57 ~]# ip netns exec ns1 iperf3 -c 20.0.0.2 -4 -f K -u Connecting to host 20.0.0.2, port 5201 [ 5] local 20.0.0.1 port 39298 connected to 20.0.0.2 port 5201 [ ID] Interval Transfer Bitrate Total Datagrams [ 5] 0.00-1.00 sec 129 KBytes 129 KBytes/sec 91 [ 5] 1.00-2.00 sec 127 KBytes 127 KBytes/sec 90 [ 5] 2.00-3.00 sec 129 KBytes 129 KBytes/sec 91 [ 5] 3.00-4.00 sec 127 KBytes 127 KBytes/sec 90 [ 5] 4.00-5.00 sec 129 KBytes 129 KBytes/sec 91 [ 5] 5.00-6.00 sec 129 KBytes 129 KBytes/sec 91 [ 5] 6.00-7.00 sec 127 KBytes 127 KBytes/sec 90 [ 5] 7.00-8.00 sec 129 KBytes 129 KBytes/sec 91 [ 5] 8.00-9.00 sec 127 KBytes 127 KBytes/sec 90 [ 5] 9.00-10.00 sec 129 KBytes 129 KBytes/sec 91 - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams [ 5] 0.00-10.00 sec 1.25 MBytes 128 KBytes/sec 0.000 ms 0/906 (0%) sender [ 5] 0.00-10.00 sec 1.25 MBytes 128 KBytes/sec 0.001 ms 0/906 (0%) receiver iperf Done. Then enable the CPUAffinity=0 and --no-mlockall options for ovs service. [root@dell-per740-57 ~]# ps aux|grep ovs openvsw+ 74366 0.0 0.0 362848 20444 ? S<sl 22:50 0:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach openvsw+ 74421 0.0 0.0 45260 19940 ? 
S<s 22:50 0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach root 74454 0.0 0.0 6412 2168 pts/2 S+ 22:52 0:00 grep --color=auto ovs Rerun the iperf test, the iperf test still is 128 KBytes/sec. [root@dell-per740-57 ~]# ip netns exec ns1 iperf3 -c 20.0.0.2 -4 -f K -u -t 10 Connecting to host 20.0.0.2, port 5201 [ 5] local 20.0.0.1 port 39056 connected to 20.0.0.2 port 5201 [ ID] Interval Transfer Bitrate Total Datagrams [ 5] 0.00-1.00 sec 129 KBytes 129 KBytes/sec 91 [ 5] 1.00-2.00 sec 127 KBytes 127 KBytes/sec 90 [ 5] 2.00-3.00 sec 129 KBytes 129 KBytes/sec 91 [ 5] 3.00-4.00 sec 127 KBytes 127 KBytes/sec 90 [ 5] 4.00-5.00 sec 129 KBytes 129 KBytes/sec 91 [ 5] 5.00-6.00 sec 129 KBytes 129 KBytes/sec 91 [ 5] 6.00-7.00 sec 127 KBytes 127 KBytes/sec 90 [ 5] 7.00-8.00 sec 129 KBytes 129 KBytes/sec 91 [ 5] 8.00-9.00 sec 127 KBytes 127 KBytes/sec 90 [ 5] 9.00-10.00 sec 129 KBytes 129 KBytes/sec 91 - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams [ 5] 0.00-10.00 sec 1.25 MBytes 128 KBytes/sec 0.000 ms 0/906 (0%) sender [ 5] 0.00-10.00 sec 1.25 MBytes 128 KBytes/sec 0.002 ms 0/906 (0%) receiver iperf Done.
I also tested example test #2; enabling the two options has no impact on performance.

On dell740-57:
ip link add vethc1 type veth peer name veth0
ip link add vethc2 type veth peer name veth1
ovs-vsctl add-br ovsbr0
ovs-vsctl add-port ovsbr0 veth0
ovs-vsctl add-port ovsbr0 veth1
ip netns add ns1
ip netns add ns2
ip link set vethc1 netns ns1
ip link set vethc2 netns ns2
ip netns exec ns1 ip add add 20.0.0.1/24 dev vethc1
ip netns exec ns2 ip add add 20.0.0.2/24 dev vethc2
ip link set veth0 up
ip link set veth1 up
ip link set ovsbr0 up
ip netns exec ns2 ip link set vethc2 up
ip netns exec ns1 ip link set vethc1 up

[root@dell-per740-57 ~]# ps aux|grep ovs
openvsw+ 75137 2.6 0.3 2458092 229572 ? S<Lsl 23:17 0:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
openvsw+ 75188 0.0 0.0 45260 19908 ? S<s 23:17 0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
root 75248 0.0 0.0 6412 2372 pts/0 R+ 23:17 0:00 grep --color=auto ovs

Start the iperf server:
[root@dell-per740-57 ~]# ip netns exec ns1 iperf3 -s -B 20.0.0.1
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 20.0.0.2, port 47974
[  5] local 20.0.0.1 port 5201 connected to 20.0.0.2 port 34476
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec   129 KBytes  1.05 Mbits/sec  0.001 ms  0/91 (0%)
[  5]   1.00-2.00   sec   127 KBytes  1.04 Mbits/sec  0.001 ms  0/90 (0%)
[  5]   2.00-3.00   sec   129 KBytes  1.05 Mbits/sec  0.001 ms  0/91 (0%)
[  5]   3.00-4.00   sec   127 KBytes  1.04 Mbits/sec  0.000 ms  0/90 (0%)
[  5]   4.00-5.00   sec   129 KBytes  1.05 Mbits/sec  0.001 ms  0/91 (0%)
[  5]   5.00-6.00   sec   129 KBytes  1.05 Mbits/sec  0.001 ms  0/91 (0%)
[  5]   6.00-7.00   sec   127 KBytes  1.04 Mbits/sec  0.001 ms  0/90 (0%)
[  5]   7.00-8.00   sec   129 KBytes  1.05 Mbits/sec  0.001 ms  0/91 (0%)
[  5]   8.00-9.00   sec   127 KBytes  1.04 Mbits/sec  0.001 ms  0/90 (0%)
[  5]   9.00-10.00  sec   129 KBytes  1.05 Mbits/sec  0.001 ms  0/91 (0%)
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  1.25 MBytes  1.05 Mbits/sec  0.001 ms  0/906 (0%)  receiver
-----------------------------------------------------------

Run the iperf test; the result is 128 KBytes/sec.
[root@dell-per740-57 ~]# ip netns exec ns2 iperf3 -c 20.0.0.1 -4 -f K -u Connecting to host 20.0.0.1, port 5201 [ 5] local 20.0.0.2 port 34476 connected to 20.0.0.1 port 5201 [ ID] Interval Transfer Bitrate Total Datagrams [ 5] 0.00-1.00 sec 129 KBytes 129 KBytes/sec 91 [ 5] 1.00-2.00 sec 127 KBytes 127 KBytes/sec 90 [ 5] 2.00-3.00 sec 129 KBytes 129 KBytes/sec 91 [ 5] 3.00-4.00 sec 127 KBytes 127 KBytes/sec 90 [ 5] 4.00-5.00 sec 129 KBytes 129 KBytes/sec 91 [ 5] 5.00-6.00 sec 129 KBytes 129 KBytes/sec 91 [ 5] 6.00-7.00 sec 127 KBytes 127 KBytes/sec 90 [ 5] 7.00-8.00 sec 129 KBytes 129 KBytes/sec 91 [ 5] 8.00-9.00 sec 127 KBytes 127 KBytes/sec 90 [ 5] 9.00-10.00 sec 129 KBytes 129 KBytes/sec 91 - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams [ 5] 0.00-10.00 sec 1.25 MBytes 128 KBytes/sec 0.000 ms 0/906 (0%) sender [ 5] 0.00-10.00 sec 1.25 MBytes 128 KBytes/sec 0.001 ms 0/906 (0%) receiver iperf Done. Enable the CPUAffinity=0 and --no-mlockall options for ovs service. [root@dell-per740-57 ~]# ps aux|grep ovs openvsw+ 74366 0.0 0.0 362916 21248 ? S<sl 22:50 0:01 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach openvsw+ 74421 0.0 0.0 45260 19940 ? S<s 22:50 0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach root 74764 0.0 0.0 6412 2372 pts/1 R+ 23:16 0:00 grep --color=auto ovs Rerun iperf test, the result still is 128 KBytes/sec. [root@dell-per740-57 ~]# ip netns exec ns2 iperf3 -c 20.0.0.1 -4 -f K -u Connecting to host 20.0.0.1, port 5201 [ 5] local 20.0.0.2 port 34476 connected to 20.0.0.1 port 5201 [ ID] Interval Transfer Bitrate Total Datagrams [ 5] 0.00-1.00 sec 129 KBytes 129 KBytes/sec 91 [ 5] 1.00-2.00 sec 127 KBytes 127 KBytes/sec 90 [ 5] 2.00-3.00 sec 129 KBytes 129 KBytes/sec 91 [ 5] 3.00-4.00 sec 127 KBytes 127 KBytes/sec 90 [ 5] 4.00-5.00 sec 129 KBytes 129 KBytes/sec 91 [ 5] 5.00-6.00 sec 129 KBytes 129 KBytes/sec 91 [ 5] 6.00-7.00 sec 127 KBytes 127 KBytes/sec 90 [ 5] 7.00-8.00 sec 129 KBytes 129 KBytes/sec 91 [ 5] 8.00-9.00 sec 127 KBytes 127 KBytes/sec 90 [ 5] 9.00-10.00 sec 129 KBytes 129 KBytes/sec 91 - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams [ 5] 0.00-10.00 sec 1.25 MBytes 128 KBytes/sec 0.000 ms 0/906 (0%) sender [ 5] 0.00-10.00 sec 1.25 MBytes 128 KBytes/sec 0.001 ms 0/906 (0%) receiver iperf Done.
Hi Liting,

Thanks for testing those scenarios. It looks promising.
Can you also test with TCP traffic?
Also, can we add that as part of the usual FDP QE cycle?

Thanks,
fbl
(In reply to Flavio Leitner from comment #17)
> Hi Liting,
>
> Thanks for testing those scenarios. It looks promising.
> Can you also test with TCP traffic?
> Also, can we add that as part of usual FDP QE cycle?
>
> Thanks,
> fbl

I ran a TCP test for the example test #2 topology. The TCP performance with the CPUAffinity=0 and --no-mlockall options enabled for the OVS services is slightly higher than with the standard OVS settings.

With the default options for the OVS services, the result is 4983922 KBytes/sec.
[root@dell-per740-57 ~]# ps aux|grep ovs
openvsw+ 23052 0.3 0.0 45248 19952 ? S<s 04:54 0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
openvsw+ 23101 1.6 0.2 159724 159616 ? S<Ls 04:54 0:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root 23112 0.0 0.0 6412 2176 pts/0 S+ 04:54 0:00 grep --color=auto ovs

[root@dell-per740-57 ~]# ip netns exec ns2 iperf3 -c 20.0.0.1 -4 -f K
Connecting to host 20.0.0.1, port 5201
[  5] local 20.0.0.2 port 35736 connected to 20.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate              Retr  Cwnd
[  5]   0.00-1.00   sec  4.73 GBytes  4960586 KBytes/sec   0    563 KBytes
[  5]   1.00-2.00   sec  4.82 GBytes  5057487 KBytes/sec   0    563 KBytes
[  5]   2.00-3.00   sec  4.80 GBytes  5029721 KBytes/sec   0    563 KBytes
[  5]   3.00-4.00   sec  4.82 GBytes  5053176 KBytes/sec   0    563 KBytes
[  5]   4.00-5.00   sec  4.83 GBytes  5060285 KBytes/sec   0    563 KBytes
[  5]   5.00-6.00   sec  4.83 GBytes  5063124 KBytes/sec   0    563 KBytes
[  5]   6.00-7.00   sec  4.84 GBytes  5074082 KBytes/sec   0    563 KBytes
[  5]   7.00-8.00   sec  4.62 GBytes  4838558 KBytes/sec   0    563 KBytes
[  5]   8.00-9.00   sec  4.63 GBytes  4849993 KBytes/sec   0    563 KBytes
[  5]   9.00-10.00  sec  4.63 GBytes  4852224 KBytes/sec   0    563 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate              Retr
[  5]   0.00-10.00  sec  47.5 GBytes  4983922 KBytes/sec   0    sender
[  5]   0.00-10.00  sec  47.5 GBytes  4983919 KBytes/sec        receiver

iperf Done.

Enable the CPUAffinity=0 and --no-mlockall options for the OVS services. The result is 5453743 KBytes/sec.
[root@dell-per740-57 ~]# ps aux|grep ovs
openvsw+ 23854 0.2 0.0 362848 20392 ? S<sl 04:58 0:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
openvsw+ 23909 0.0 0.0 45260 19996 ?
S<s 04:58 0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach root 23935 0.0 0.0 6412 2300 pts/0 R+ 04:58 0:00 grep --color=auto ovs [root@dell-per740-57 ~]# ip netns exec ns2 iperf3 -c 20.0.0.1 -4 -f K Connecting to host 20.0.0.1, port 5201 [ 5] local 20.0.0.2 port 34972 connected to 20.0.0.1 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 4.70 GBytes 4927113 KBytes/sec 0 563 KBytes [ 5] 1.00-2.00 sec 5.15 GBytes 5397225 KBytes/sec 0 563 KBytes [ 5] 2.00-3.00 sec 5.26 GBytes 5515245 KBytes/sec 0 563 KBytes [ 5] 3.00-4.00 sec 5.28 GBytes 5532652 KBytes/sec 0 563 KBytes [ 5] 4.00-5.00 sec 5.26 GBytes 5517109 KBytes/sec 0 563 KBytes [ 5] 5.00-6.00 sec 5.27 GBytes 5528342 KBytes/sec 0 563 KBytes [ 5] 6.00-7.00 sec 5.27 GBytes 5530524 KBytes/sec 0 563 KBytes [ 5] 7.00-8.00 sec 5.28 GBytes 5531651 KBytes/sec 0 563 KBytes [ 5] 8.00-9.00 sec 5.28 GBytes 5537053 KBytes/sec 0 563 KBytes [ 5] 9.00-10.00 sec 5.26 GBytes 5520596 KBytes/sec 0 662 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 52.0 GBytes 5453743 KBytes/sec 0 sender [ 5] 0.00-10.00 sec 52.0 GBytes 5453740 KBytes/sec receiver iperf Done. We can add that as part of usual FDP QE cycle later. thanks, Li Ting
@Flavio, do you have an estimate of which FDP release this feature will be supported in?

@Liting, where/how do we track the inclusion of the above testing in FDP?
(In reply to zenghui.shi from comment #19)
> @Flavio, do you have an estimation on which FDP release this feature will be
> supported?
>
> @Liting, where/how do we track the inclusion of the above testing in FDP?

@Zenghui, we are considering adding this test to our topo (networking/openvswitch/topo) suite, and jiqiu will help write this case into the topo suite.

thanks,
Li Ting
(In reply to zenghui.shi from comment #19)
> @Flavio, do you have an estimation on which FDP release this feature will be
> supported?

There is a pending question for Michael S. with regards to the per-cpu dispatch mode and the handler threads with CPU affinity.

Michael, it seems we need the patch applied. Is that correct? If so, we could enable support in the next FDP release, 22.H:
https://source.redhat.com/groups/public/fast_datapath_release_planning_and_scheduling/fdp_releases_2022

BTW, all of this assumes the support is limited to the OVS kernel data path, so it does not include DPDK support. Zenghui, please confirm.
Hi all,

The handlers fix has been applied, so you should not see a mismatch warning.

With that said, I should correct a mistake I made in my previous comment. The comment was: "The patch set in question has an edge case where if OVS can only run on 2 or less cores, it will not create any additional threads." This is false. The edge case is: if the TOTAL number of configured cores is less than or equal to 2, OVS will not create any additional threads; it falls back to creating threads based on affinity.

The point is, just setting CPUAffinity=0 is not enough to restrict the number of handler threads to one single handler thread. The system also needs to have 2 or fewer cores for that to happen, i.e.:

- 1-core system, CPUAffinity=0: 1 handler
- 2-core system, CPUAffinity=0: 1 handler
- 3-or-more-core system, CPUAffinity=0: 2 handlers (this number does not change if we add more cores, as long as CPUAffinity=0 is set)

Based on some documentation I saw, the MicroShift systems are using 2-4 cores. When using CPUAffinity=0 this will create 2 handler threads as shown above (or one thread in the case of 2 cores).

If you absolutely need exactly 1 handler thread as opposed to 2 handler threads, then I think the best solution is to fix n-handler-threads for per-cpu mode. I can start looking into proposing a patch.
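For anyone validating this behavior, here is a small sketch for observing how many upcall threads ovs-vswitchd actually created. The thread names and the upcall counters are reported by OVS itself; the exact counts you see will depend on core count and affinity as described above:

$ # Count handler and revalidator threads by thread name
$ ps -T -p $(pidof ovs-vswitchd) | grep -c handler
$ ps -T -p $(pidof ovs-vswitchd) | grep -c revalidator
$ # Or ask ovs-vswitchd directly
$ ovs-appctl upcall/show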
(In reply to Flavio Leitner from comment #21)
> (In reply to zenghui.shi from comment #19)
> > @Flavio, do you have an estimation on which FDP release this feature will be
> > supported?
>
> There is a pending question for Michael S. with regards to the per-cpu dispatch
> mode and the handlers threads with cpu affinity.
>
> Michael, it seems we need the patch applied. Is that correct? If so, we could
> enable support in the next FDP release 22.H:
> https://source.redhat.com/groups/public/fast_datapath_release_planning_and_scheduling/fdp_releases_2022
>
> BTW, all this is assuming the support is limited to OVS kernel data path, so it
> does not include DPDK support. Zenghui, please confirm.

Yes, I confirm that DPDK won't be supported in the next MicroShift release. I will let you know if that changes in the future.
(In reply to Michael Santana from comment #22)
> Hi all
>
> The handlers fix has been applied so you should not see a mismatch warning.

May I know which OVS version contains the fix? Do we have another bugzilla that tracks this fix? I was looking at this one, which is in NEW state:
https://bugzilla.redhat.com/show_bug.cgi?id=2102449

> Based on some documentation I saw, the microshift systems are using 2-4
> cores. When using CPUAffinity=0 this will create 2 handler threads as shown
> above (or one thread in the case of 2 cores)
>
> If you absolutely need to have exactly 1 handler threads as opposed to 2
> handler threads, then I think the best solution is to fix n-handler-threads
> for per-cpu mode. I can start looking into proposing a patch

The general idea is to optimize resource usage as much as possible. I think we will need to run some tests on a system with more than 2 cores to see how much memory is actually allocated to ovs-vswitchd in the MicroShift environment, and whether it is acceptable to customers. Then we can come back with a definite answer.

For the testing, which OVS version shall be used?
(In reply to zenghui.shi from comment #24)
> (In reply to Michael Santana from comment #22)
> > Hi all
> >
> > The handlers fix has been applied so you should not see a mismatch warning.
>
> May I know which ovs version contains the fix?

ovs2.16, 2.17, 3.0, dpdk-latest

> do we have another bugzilla that tracks this fix? I was looking at this
> one which is in new state:
> https://bugzilla.redhat.com/show_bug.cgi?id=2102449

Yes, I need to update that bugzilla. Thanks for the reminder.

> > Based on some documentation I saw, the microshift systems are using 2-4
> > cores. When using CPUAffinity=0 this will create 2 handler threads as shown
> > above (or one thread in the case of 2 cores)
> >
> > If you absolutely need to have exactly 1 handler threads as opposed to 2
> > handler threads, then I think the best solution is to fix n-handler-threads
> > for per-cpu mode. I can start looking into proposing a patch
>
> The general idea is to optimize the resource usage as much as possible.
> I think we will need to run some tests with > 2 cores system to see how much
> memory is actually allocated to ovs-vswitchd in microshift env, and whether
> it is acceptable to customers. Then come back with a definite answer.
>
> For the testing, which ovs version shall be used?

All the latest versions contain the fix, so I'm not really sure.
Flavio, may I ask in which OVS version and FDP release we can expect this feature to be supported?
Hi,

The "--no-mlockall" feature is already available and QE found no issues, so we can support that right away.

CPUAffinity causes the cpu_id mismatch log message, which is fixed by bz#2102449. That BZ is queued for the next FDP release, 22.I. The schedule is here:
https://source.redhat.com/groups/public/fast_datapath_release_planning_and_scheduling/fdp_releases_2022

The build containing the above fix is available for internal consumption at:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2123539

The last open question is regarding manually setting the number of threads. If that is required, then we will need upstream work first.

Thanks,
fbl
Tests have been added to our current suites. Will check in with Zenghui occasionally to make sure our coverage for this feature is up to date.
I submitted a patch allowing you to specify the number of handlers when using per-cpu mode:
https://mail.openvswitch.org/pipermail/ovs-dev/2022-October/398793.html
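If that patch is accepted, the expectation (an assumption based on the pre-existing n-handler-threads knob, not on merged behavior) is that the thread count could then be capped through the database again, for example:

$ # Hypothetical once per-cpu dispatch mode honors it again: cap handler threads at 1
$ ovs-vsctl set Open_vSwitch . other_config:n-handler-threads=1
$ # Remove the override to return to automatic sizing
$ ovs-vsctl remove Open_vSwitch . other_config n-handler-threads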
This bug did not meet the criteria for automatic migration and is being closed. If the issue remains, please open a new ticket in https://issues.redhat.com/browse/FDP
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days