Description of problem:

Since RHOSP10, a compute node can be configured to dedicate CPUs to the host and dedicate CPUs to the VMs/vSwitch. This is not specific to OVS-DPDK deployments; for instance, when only using SR-IOV and kernel OVS, such partitioning also makes sense. Example, dedicating the first physical core of each NUMA node of a compute node to the host:

HostCpusList: "'0,18,36,54'"

All OpenStack and hypervisor services will run on the HostCpusList, i.e. 4 CPUs in the example above. The problem is that ovs-vswitchd is not aware of such partitioning and spawns many userland threads, as it believes it can run on all of the CPUs (72 in my example):

10574 ?  -  9864:01 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach

Threads of that process (per-thread state and CPU-time columns trimmed):
  main/utility: ovs-vswitchd (x6), dpdk_watchdog1, urcu2, ct_clean3
  PMD:          pmd220, pmd221, pmd222, pmd223
  handlers:     handler728 .. handler781 (53 threads)
  revalidators: revalidator779, revalidator782 .. revalidator799 (19 threads)

Those threads are revalidators and handlers, and are a consequence of the OVS design (see https://www.youtube.com/watch?v=wUJupgOAIgY if you're curious).
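A quick way to see this on a node (a minimal sketch, assuming the default pidfile location shown in the command line above) is to compare the CPU affinity of ovs-vswitchd with the number of online CPUs:

  # how many CPUs ovs-vswitchd is allowed to run on vs. how many are online
  taskset -pc $(cat /var/run/openvswitch/ovs-vswitchd.pid)
  nproc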
Having more threads than available CPUs to run them on makes no sense; this has been confirmed by OVS SMEs. So when we dedicate 4 CPUs to the host, we should have exactly 4 handlers and 4 revalidators:

  ovs-vsctl --no-wait set Open_vSwitch . other_config:n-handler-threads=4
  ovs-vsctl --no-wait set Open_vSwitch . other_config:n-revalidator-threads=4

Version-Release number of selected component (if applicable): since RHOSP10

How reproducible: just deploy with HostCpusList

Additional info, criticality: why do we care? Why is it important? When debugging a live production system during a SEV1, or simply investigating an sosreport, all of those unnecessary threads make the investigation harder. They are also completely useless, and if they all want to run at the same time, we will face a snowball effect. Because we have no time to trigger such corner-case testing proactively, not starting unnecessary threads can only be beneficial. So this is a bug, not an RFE. This is not a regression: it has always been a pending risk.
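To verify the effect of those two settings on a running node, one can count the handler and revalidator threads of ovs-vswitchd (a sketch, again assuming the default pidfile location):

  PID=$(cat /var/run/openvswitch/ovs-vswitchd.pid)
  # thread names are exposed as the per-thread command name
  ps -T -p "$PID" -o comm= | grep -c '^handler'
  ps -T -p "$PID" -o comm= | grep -c '^revalidator'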
The number of handler threads is the number of online CPUs minus the number of revalidator threads. The number of revalidator threads is the number of online CPUs divided by 4, plus 1. So it should not overcommit in the way comment #0 implies.

Regarding limiting the number of threads: if they don't have anything to do, the kernel will not schedule them. The old vswitchd had a duplicate-events issue where more threads could wake up, but even then they should quickly go back to sleep. There is a patch being pushed upstream relying on the epoll exclusive flag to wake up only a single thread.

Anyway, I don't know what this bug is requesting. Do you want OVS to start small and spawn more threads if needed? Do you want a fixed upper limit for the threads, i.e. with more than 16 CPUs, just assume 16? Do you need a CPU mask parameter to tell exactly how many threads should be created and on which CPUs? Please clarify.

Thanks
fbl
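As an illustration with the 72-CPU compute node from comment #0:

  revalidators = 72 / 4 + 1 = 19
  handlers     = 72 - 19    = 53

which matches the 19 revalidator and 53 handler threads in the process listing there.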
Thanks Flavio,

The context of this BZ is that in OpenStack NFV deployments, most of the CPUs are isolated and not available to ovs-vswitchd, as they are dedicated to running vCPUs and PMD threads. The number of revalidator and handler threads is calculated from the total number of CPUs, regardless of whether they are isolated or not. Most of the time there will be only 4 non-isolated CPUs (one core, 2 HT, per NUMA node), while the total number of CPUs is 72 or even more. So my proposal is to configure the number of revalidator and handler threads based on the non-isolated CPU list, which is known by the OpenStack installer and can therefore be configured by it.

Thanks!
Franck
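As a rough sketch of what the installer could do (illustrative only; HOST_CPUS is a placeholder for the HostCpusList value, assumed here to be a plain comma-separated list as in comment #0):

  HOST_CPUS="0,18,36,54"
  N=$(echo "$HOST_CPUS" | tr ',' '\n' | wc -l)
  ovs-vsctl --no-wait set Open_vSwitch . other_config:n-handler-threads="$N"
  ovs-vsctl --no-wait set Open_vSwitch . other_config:n-revalidator-threads="$N"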
(In reply to Franck Baudin from comment #3)

My concern is that the revalidator/handler workload depends on variables we can't predict (the traffic pattern and the flow table), so maybe that number of CPUs is enough, maybe not. However, if the goal is resource isolation, then I think this is on the right track.

OVS exposes the parameters to configure the number of threads, but not the CPU mask. Perhaps we could add a CPU mask to indicate which CPUs are allowed to run the threads, while the existing parameters define how many threads there are. Please let me know how you want to proceed.

fbl
Thanks Flavio! So we will calculate the revalidator/handler thread counts at the TripleO level, based on the number of non-isolated CPUs. Adding a parameter to OVS is not required at this point. Thanks again!

Franck
According to our records, this should be resolved by openstack-tripleo-heat-templates-8.3.1-87.el7ost. This build is available now.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0760