Description of problem:

If we use the Nova option 'vcpu_pin_set' to pin vCPUs to host physical CPUs, the libvirt 'emulatorpin' option currently gets configured to the same physical CPUs on which the vCPUs are pinned. We need a new configuration option so that emulator threads can be configured to run on a different set of isolated physical CPUs. This option is required for Nova with realtime KVM to get the best performance.

Version-Release number of selected component (if applicable):
openstack-nova-2015.1.1-1.el7

How reproducible:
Boot up an instance after configuring the 'vcpu_pin_set' option in the 'nova.conf' file.

Steps to Reproduce:
1. Configure Nova with the 'vcpu_pin_set' option.
2. 'virsh edit <instance>' will show the 'emulatorpin' value (host CPUs) identical to the 'vcpupin' value.

Actual results:
'virsh edit <instance>' shows the 'emulatorpin' value (host CPUs) identical to the 'vcpupin' value.

Expected results:
* We need another option to configure 'emulatorpin' to a separate set of isolated physical CPUs.
* 'virsh edit <instance>' should show an 'emulatorpin' value (host CPUs) different from the 'vcpupin' value.

Additional info:
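For illustration, a minimal sketch of the behaviour described above (the CPU numbers are hypothetical, chosen only for the example):

  # nova.conf
  [DEFAULT]
  vcpu_pin_set = 2,3

  Resulting guest XML today (emulator threads share the vCPU pCPUs):

  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <emulatorpin cpuset='2-3'/>
  </cputune>

  Desired with the requested option (emulator threads on a separate, isolated pCPU):

  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <emulatorpin cpuset='10'/>   <!-- hypothetical isolated host CPU -->
  </cputune>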
A spec has been proposed for Nova by Daniel Berrangé which should address this RFE. https://review.openstack.org/#/c/225893/
Spec was not approved for Mitaka, will need to be re-proposed for Newton.
Besides the new option requirement, the 'cpuset' of the 'emulatorpin' option is using the id of the vcpu, which is incorrect. The id of the host cpu should be used here.

e.g.
<cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <emulatorpin cpuset='0'/>      ### the cpuset should be '2' here, when cpu_realtime_mask=^0.
    <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
</cputune>
(In reply to Peng Liu from comment #5)
> Besides the new option requirement, the 'cpuset' of the 'emulatorpin' option
> is using the id of the vcpu, which is incorrect. The id of the host cpu
> should be used here.
>
> e.g.
> <cputune>
>     <vcpupin vcpu='0' cpuset='2'/>
>     <vcpupin vcpu='1' cpuset='3'/>
>     <emulatorpin cpuset='0'/>      ### the cpuset should be '2' here,
> when cpu_realtime_mask=^0.
>     <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
> </cputune>

This RFE has not been implemented, so this is not relevant to this bug.
Additional requirement for NFV: the emulator thread should be pinned outside of the isolcpus set, because when not using an RT kernel, the emulator thread may never be scheduled if it is part of the isolated CPUs.

Today:
* all CPUs used by Nova are part of the isolcpus set
* NFV does not use an RT kernel
* NFV does not really care about accounting for emulator thread CPU consumption

For NFV, pinning the emulator thread on CPU0 would be acceptable. Any other option is on the table as long as the CPU is outside of the isolcpus list.
(In reply to Franck Baudin from comment #7)
> Additional requirement for NFV: the emulator thread should be pinned outside
> of the isolcpus set, because when not using an RT kernel, the emulator thread
> may never be scheduled if it is part of the isolated CPUs.

If not using an RT kernel, then isolcpus should *never* be used. Instead, systemd CPUAffinity must be used for non-RT kernels. So this requirement you're stating is not relevant.
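For reference, a minimal sketch of the systemd CPUAffinity approach Daniel refers to (the CPU list is hypothetical):

  # /etc/systemd/system.conf
  [Manager]
  # Restrict systemd, and therefore every host process it spawns, to CPUs 0-1,
  # leaving the remaining CPUs free for pinned guest vCPUs.
  CPUAffinity=0 1

A reboot is typically required for the new affinity to apply to all host processes.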
(In reply to Daniel Berrange from comment #8)
> (In reply to Franck Baudin from comment #7)
> > Additional requirement for NFV: the emulator thread should be pinned outside
> > of the isolcpus set, because when not using an RT kernel, the emulator thread
> > may never be scheduled if it is part of the isolated CPUs.
>
> If not using an RT kernel, then isolcpus should *never* be used. Instead,
> systemd CPUAffinity must be used for non-RT kernels. So this requirement
> you're stating is not relevant.

The issue is that isolcpus provides stronger isolation than CPUAffinity (it avoids kernel thread spawning), and all NFV deployments use isolcpus without an RT kernel. I understand that this is technically wrong/irrelevant, but until NFV can get an isolation level comparable to isolcpus with CPUAffinity, our NFV users will keep using isolcpus and manually re-pin the qemu emulator thread to avoid falling into https://bugzilla.redhat.com/show_bug.cgi?id=1321653.
(In reply to Franck Baudin from comment #9)
> (In reply to Daniel Berrange from comment #8)
> > (In reply to Franck Baudin from comment #7)
> > > Additional requirement for NFV: the emulator thread should be pinned outside
> > > of the isolcpus set, because when not using an RT kernel, the emulator thread
> > > may never be scheduled if it is part of the isolated CPUs.
> >
> > If not using an RT kernel, then isolcpus should *never* be used. Instead,
> > systemd CPUAffinity must be used for non-RT kernels. So this requirement
> > you're stating is not relevant.
>
> The issue is that isolcpus provides stronger isolation than CPUAffinity
> (it avoids kernel thread spawning), and all NFV deployments use isolcpus
> without an RT kernel.

Yes, because they followed our original advice on this, which has now proven to be incorrect. We're now updating that advice; the correct approach going forward for the non-RT cases is to use CPUAffinity. Documentation and blog updates to cover this are pending.

> I understand that this is technically wrong/irrelevant,
> but until NFV can get an isolation level comparable to isolcpus with
> CPUAffinity, our NFV users will keep using isolcpus and manually re-pin the
> qemu emulator thread to avoid falling into
> https://bugzilla.redhat.com/show_bug.cgi?id=1321653.

This isn't a scalable approach to operating a cloud. The agreed and recommended approach to avoiding Bug 1321653 (as discussed in the comments thereof) is to use CPUAffinity for the foreseeable future. Do you have an RFE open against RHEL requesting an isolation level equivalent to isolcpus for non-RT workloads?
The Perf team is exploring how to get equivalent isolation with CPUAffinity, see https://bugzilla.redhat.com/show_bug.cgi?id=1394932

Until they find the complete recipe, I'm afraid that customers will use isolcpus. And this is very fine with the proposed spec, as explained by Sahid below:

  In the spec [1], emulator threads are going to be pinned on an
  *additional* and *isolated* pCPU. In other words, for a guest, the pCPU
  used to run emulator threads is not in the set of the pCPUs used to
  run vCPUs.

[1] https://review.openstack.org/#/c/284094/10/specs/ocata/approved/libvirt-emulator-threads-policy.rst

We "just" need to have the option at TripleO level so TripleO can pin the emulator threads out of the isolated CPUs, enforced by CPUAffinity in vanilla RHOSP10, which could be re-enforced with isolcpus by a post-deployment script (not elegant, I concede).
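To make the quoted behaviour concrete, an illustrative guest XML sketch (the CPU numbers are hypothetical, not taken from the spec):

  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <!-- emulator threads land on an additional pCPU, outside the vCPU set -->
    <emulatorpin cpuset='6'/>
  </cputune>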
Vijay, can you provide a TripleO yaml example file showing how CPUs are partitioned between Nova, OVS-DPDK, and Linux/OpenStack [1]? The emulator thread will be pinned in [1], and [1] corresponds to CPUAffinity.
(In reply to Franck Baudin from comment #11)
> The Perf team is exploring how to get equivalent isolation with CPUAffinity,
> see https://bugzilla.redhat.com/show_bug.cgi?id=1394932
>
> Until they find the complete recipe, I'm afraid that customers will use
> isolcpus.

Yes, especially if we ourselves persist with intentionally giving them bad advice!

> And this is very fine with the proposed spec, as explained by Sahid below:
>
>   In the spec [1], emulator threads are going to be pinned on an
>   *additional* and *isolated* pCPU. In other words, for a guest, the pCPU
>   used to run emulator threads is not in the set of the pCPUs used to
>   run vCPUs.

Can you elaborate on the new requirement then? It seems like you are saying there isn't one? Note that what you describe here will only occur for guests that explicitly opt into the new behavior.

> We "just" need to have the option at TripleO level so TripleO can pin the
> emulator threads out of the isolated CPUs, enforced by CPUAffinity in vanilla
> RHOSP10, which could be re-enforced with isolcpus by a post-deployment script
> (not elegant, I concede).

Not sure what you mean here; in none of the scenarios we've discussed would TripleO be responsible for pinning emulator threads - only for setting the right combination of cores (using CPUAffinity - or isolcpus in the RT case - and vcpu_pin_set) as available/unavailable for VM placement, to guide the process actually doing the pinning as guests are created.
Example of RHOSP10 TripleO parameters:

  NovaReservedHostMemory: 4096
  NovaVcpuPinSet: ['8-17','20-71']

NovaReservedHostMemory
  Amount of memory to be reserved for the host.

NovaVcpuPinSet
  Configures the list of CPUs which should be used by Nova to run the guest VMs. Note that this should exclude NeutronDpdkCoreList and the list of CPUs dedicated to the host.

For RHOSP11, we need a new parameter in TripleO:

  NovaEmulatorPinSet: ['0-1','55-56']
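For context, a sketch of how the existing RHOSP10 parameters end up in nova.conf on the compute node (values taken from the example above; the exact rendering is done by TripleO/puppet-nova, so treat this as an approximation):

  # nova.conf (rendered by TripleO)
  [DEFAULT]
  reserved_host_memory_mb = 4096
  vcpu_pin_set = 8-17,20-71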
(In reply to Franck Baudin from comment #14)
> Example of RHOSP10 TripleO parameters:
>
>   NovaReservedHostMemory: 4096
>   NovaVcpuPinSet: ['8-17','20-71']
>
> NovaReservedHostMemory
>   Amount of memory to be reserved for the host.
>
> NovaVcpuPinSet
>   Configures the list of CPUs which should be used by Nova to run the guest
>   VMs. Note that this should exclude NeutronDpdkCoreList and the list of
>   CPUs dedicated to the host.
>
> For RHOSP11, we need a new parameter in TripleO:
>
>   NovaEmulatorPinSet: ['0-1','55-56']

Hrm, I actually really don't think we should be doing it this way - we should only be parameterizing how *many* cores to reserve for each purpose, not exactly which ones. How do you intend the above to work across clusters that support differently sized compute node hardware (which we currently support)?
(In reply to Franck Baudin from comment #14)
> For RHOSP11, we need a new parameter in TripleO:
>
>   NovaEmulatorPinSet: ['0-1','55-56']

From the approved Nova specification:

"""
A user which expresses the desire to isolate emulator threads must use a
flavor configured to accept that specification as:

* hw:cpu_emulator_threads=isolate

Would say that this instance is to be considered to consume 1 additional
host CPU.

That pCPU used to make running emulator threads is going to always be
configured on the related guest NUMA node ID 0, to make it predictable for
users.

Currently there is no desire to make customizable the number of host CPUs
running emulator threads since only one should work for almost every use
case. If in the future there is a desire to isolate more than one host CPU
to run emulator threads, we would implement instead I/O threads to add
granularity on dedicating used resources to run guests on host CPUs.
"""

I don't see how this gels with the settings you are suggesting at the TripleO level, because Nova is going to choose where it places the emulator threads on node 0 - there is no parameterization of this and it will only ever use exactly one core, so I don't see how or where we would use two ranges of CPUs with this feature in 11, esp. when one appears to be on a different node.
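For illustration, this is how an operator would opt a flavor into the behaviour described by the spec once it lands (the flavor name is hypothetical):

  openstack flavor set nfv.small --property hw:cpu_emulator_threads=isolate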
Got it, this doesn't require a TripleO parameter, as it doesn't require a nova.conf parameter; this is purely triggered by the flavor keys. Thanks Steve!
(In reply to Franck Baudin from comment #17)
> Got it, this doesn't require a TripleO parameter, as it doesn't require a
> nova.conf parameter; this is purely triggered by the flavor keys. Thanks
> Steve!

A clarification Q for you Sahid: I am assuming that *if* vcpu_pin_set is in use, the pCPU for the emulator threads is also being allocated to a core from *within* that range, is that correct?
(In reply to Stephen Gordon from comment #18)
> (In reply to Franck Baudin from comment #17)
> > Got it, this doesn't require a TripleO parameter, as it doesn't require a
> > nova.conf parameter; this is purely triggered by the flavor keys. Thanks
> > Steve!
>
> A clarification Q for you Sahid: I am assuming that *if* vcpu_pin_set is in
> use, the pCPU for the emulator threads is also being allocated to a core
> from *within* that range, is that correct?

Yes, correct.
(In reply to Sahid Ferdjaoui from comment #19)
> (In reply to Stephen Gordon from comment #18)
> > A clarification Q for you Sahid: I am assuming that *if* vcpu_pin_set is in
> > use, the pCPU for the emulator threads is also being allocated to a core
> > from *within* that range, is that correct?
>
> Yes, correct.

Right, that makes sense. I think we are good then and don't need an additional TripleO setting for this specific specification.
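To make the clarified behaviour concrete, a sketch of the intended outcome (CPU numbers are hypothetical): with vcpu_pin_set = 2-5 and a 2-vCPU flavor using hw:cpu_emulator_threads=isolate, the instance would consume three pCPUs from that range, e.g.:

  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <!-- emulator threads get their own pCPU, still taken from vcpu_pin_set -->
    <emulatorpin cpuset='4'/>
  </cputune>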
The patch series is well-developed, but still awaiting review traction.
Sahid had requested a feature freeze exception (FFE) for this work [1], but as somewhat expected the Nova PTL has indicated [2] that only items on the upstream Nova priorities list [3] will be FFE candidates for Ocata. Based on this I am moving this out to Pike; we'll need to re-submit the specification and keep the code series up to date so that we can attempt to land it early in the Pike cycle.

[1] http://lists.openstack.org/pipermail/openstack-dev/2017-January/111057.html
[2] http://lists.openstack.org/pipermail/openstack-dev/2017-January/111084.html
[3] https://specs.openstack.org/openstack/nova-specs/priorities/ocata-priorities.html
Spec re-proposed for Pike: https://review.openstack.org/#/c/427066/
*** Bug 1446372 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462