Description of problem:

If we use the Nova option 'vcpu_pin_set' to pin vCPUs to host physical CPUs, the libvirt 'emulatorpin' option currently gets configured to the same physical CPUs on which the vCPUs are pinned. We need a new configuration option so that emulator threads can be configured to run on a different set of isolated physical CPUs. This option is required for Nova with realtime KVM to get the best performance.

Version-Release number of selected component (if applicable):
openstack-nova-2015.1.1-1.el7

How reproducible:
Boot up an instance after configuring the 'vcpu_pin_set' option in the 'nova.conf' file.

Steps to Reproduce:
1. Configure Nova with the 'vcpu_pin_set' option.
2. 'virsh edit <instance>' will show the 'emulatorpin' value (host CPUs) identical to the 'vcpupin' value.

Actual results:
'virsh edit <instance>' shows the 'emulatorpin' value (host CPUs) identical to the 'vcpupin' value.

Expected results:
* We need another option to configure 'emulatorpin' to a separate set of isolated physical CPUs.
* 'virsh edit <instance>' should show an 'emulatorpin' value (host CPUs) different from the 'vcpupin' value.

Additional info:
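For illustration, a minimal sketch of the behaviour described above (the CPU numbers are hypothetical, chosen only for the example):

  # nova.conf
  [DEFAULT]
  vcpu_pin_set = 2,3

  Resulting guest XML today (emulator threads share the vCPU pCPUs):

  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <emulatorpin cpuset='2-3'/>
  </cputune>

  Desired with the requested option (emulator threads on a separate, isolated pCPU):

  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <emulatorpin cpuset='10'/>   <!-- hypothetical isolated host CPU -->
  </cputune>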
A spec has been proposed for Nova by Daniel Berrangé which should address this RFE. https://review.openstack.org/#/c/225893/
Spec was not approved for Mitaka, will need to be re-proposed for Newton.
Besides the new option requirement, the 'cpuset' of the 'emulatorpin' option is using the id of the vcpu, which is incorrect. The id of the host cpu should be used here.

e.g.
<cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <emulatorpin cpuset='0'/>      ### the cpuset should be '2' here, when cpu_realtime_mask=^0.
    <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
</cputune>
(In reply to Peng Liu from comment #5)
> Besides the new option requirement, the 'cpuset' of the 'emulatorpin' option
> is using the id of the vcpu, which is incorrect. The id of the host cpu
> should be used here.
>
> e.g.
> <cputune>
>     <vcpupin vcpu='0' cpuset='2'/>
>     <vcpupin vcpu='1' cpuset='3'/>
>     <emulatorpin cpuset='0'/>      ### the cpuset should be '2' here,
> when cpu_realtime_mask=^0.
>     <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
> </cputune>

This RFE has not been implemented, so this is not relevant to this bug.
Additional requirement for NFV: the emulator thread should be pinned outside of the isolcpus set, because when not using an RT kernel, the emulator thread may never be scheduled if it is part of the isolated CPUs.

Today:
* all CPUs used by Nova are part of the isolcpus set
* NFV does not use an RT kernel
* NFV does not really care about accounting for emulator thread CPU consumption

For NFV, pinning the emulator thread on CPU0 would be acceptable. Any other option is on the table as long as the CPU is outside of the isolcpus list.
(In reply to Franck Baudin from comment #7)
> Additional requirement for NFV: the emulator thread should be pinned outside
> of the isolcpus set, because when not using an RT kernel, the emulator thread
> may never be scheduled if it is part of the isolated CPUs.

If not using an RT kernel, then isolcpus should *never* be used. Instead, systemd CPUAffinity must be used for non-RT kernels. So this requirement you're stating is not relevant.
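For reference, a minimal sketch of the systemd CPUAffinity approach Daniel refers to (the CPU list is hypothetical):

  # /etc/systemd/system.conf
  [Manager]
  # Restrict systemd, and therefore every host process it spawns, to CPUs 0-1,
  # leaving the remaining CPUs free for pinned guest vCPUs.
  CPUAffinity=0 1

A reboot is typically required for the new affinity to apply to all host processes.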
(In reply to Daniel Berrange from comment #8)
> (In reply to Franck Baudin from comment #7)
> > Additional requirement for NFV: the emulator thread should be pinned outside
> > of the isolcpus set, because when not using an RT kernel, the emulator thread
> > may never be scheduled if it is part of the isolated CPUs.
>
> If not using an RT kernel, then isolcpus should *never* be used. Instead,
> systemd CPUAffinity must be used for non-RT kernels. So this requirement
> you're stating is not relevant.

The issue is that isolcpus provides stronger isolation than CPUAffinity (it avoids kernel thread spawning), and all NFV deployments use isolcpus without an RT kernel. I understand that this is technically wrong/irrelevant, but until NFV can get an isolation level comparable to isolcpus with CPUAffinity, our NFV users will keep using isolcpus and manually re-pin the qemu emulator thread to avoid falling into https://bugzilla.redhat.com/show_bug.cgi?id=1321653.
(In reply to Franck Baudin from comment #9)
> (In reply to Daniel Berrange from comment #8)
> > (In reply to Franck Baudin from comment #7)
> > > Additional requirement for NFV: the emulator thread should be pinned outside
> > > of the isolcpus set, because when not using an RT kernel, the emulator thread
> > > may never be scheduled if it is part of the isolated CPUs.
> >
> > If not using an RT kernel, then isolcpus should *never* be used. Instead,
> > systemd CPUAffinity must be used for non-RT kernels. So this requirement
> > you're stating is not relevant.
>
> The issue is that isolcpus provides stronger isolation than CPUAffinity
> (it avoids kernel thread spawning), and all NFV deployments use isolcpus
> without an RT kernel.

Yes, because they followed our original advice on this, which has now proven to be incorrect. We're now updating that advice; the correct approach going forward for the non-RT cases is to use CPUAffinity. Documentation and blog updates to cover this are pending.

> I understand that this is technically wrong/irrelevant,
> but until NFV can get an isolation level comparable to isolcpus with
> CPUAffinity, our NFV users will keep using isolcpus and manually re-pin the
> qemu emulator thread to avoid falling into
> https://bugzilla.redhat.com/show_bug.cgi?id=1321653.

This isn't a scalable approach to operating a cloud. The agreed and recommended approach to avoiding Bug 1321653 (as discussed in the comments thereof) is to use CPUAffinity for the foreseeable future. Do you have an RFE open against RHEL requesting an isolation level equivalent to isolcpus for non-RT workloads?
The Perf team is exploring how to get equivalent isolation with CPUAffinity, see https://bugzilla.redhat.com/show_bug.cgi?id=1394932

Until they find the complete recipe, I'm afraid that customers will use isolcpus. And this is very fine with the proposed spec, as explained by Sahid below:

  In the spec [1], emulator threads are going to be pinned on an
  *additional* and *isolated* pCPU. In other words, for a guest, the pCPU
  used to run emulator threads is not in the set of the pCPUs used to
  run vCPUs.

[1] https://review.openstack.org/#/c/284094/10/specs/ocata/approved/libvirt-emulator-threads-policy.rst

We "just" need to have the option at TripleO level so TripleO can pin the emulator threads out of the isolated CPUs, enforced by CPUAffinity in vanilla RHOSP10, which could be re-enforced with isolcpus by a post-deployment script (not elegant, I concede).
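To make the quoted behaviour concrete, an illustrative guest XML sketch (the CPU numbers are hypothetical, not taken from the spec):

  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <!-- emulator threads land on an additional pCPU, outside the vCPU set -->
    <emulatorpin cpuset='6'/>
  </cputune>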
Vijay, can you provide a TripleO yaml example file showing how CPUs are partitioned between Nova, OVS-DPDK, and Linux/OpenStack [1]? The emulator thread will be pinned in [1], and [1] corresponds to CPUAffinity.
(In reply to Franck Baudin from comment #11)
> The Perf team is exploring how to get equivalent isolation with CPUAffinity,
> see https://bugzilla.redhat.com/show_bug.cgi?id=1394932
>
> Until they find the complete recipe, I'm afraid that customers will use
> isolcpus.

Yes, especially if we ourselves persist with intentionally giving them bad advice!

> And this is very fine with the proposed spec, as explained by Sahid below:
>
>   In the spec [1], emulator threads are going to be pinned on an
>   *additional* and *isolated* pCPU. In other words, for a guest, the pCPU
>   used to run emulator threads is not in the set of the pCPUs used to
>   run vCPUs.

Can you elaborate on the new requirement then? It seems like you are saying there isn't one? Note that what you describe here will only occur for guests that explicitly opt into the new behavior.

> We "just" need to have the option at TripleO level so TripleO can pin the
> emulator threads out of the isolated CPUs, enforced by CPUAffinity in vanilla
> RHOSP10, which could be re-enforced with isolcpus by a post-deployment script
> (not elegant, I concede).

Not sure what you mean here; in none of the scenarios we've discussed would TripleO be responsible for pinning emulator threads - only for setting the right combination of cores (using CPUAffinity - or isolcpus in the RT case - and vcpu_pin_set) as available/unavailable for VM placement, to guide the process actually doing the pinning as guests are created.
Example of RHOSP10 TripleO parameters:

  NovaReservedHostMemory: 4096
  NovaVcpuPinSet: ['8-17','20-71']

NovaReservedHostMemory
  Amount of memory to be reserved for the host.

NovaVcpuPinSet
  Configures the list of CPUs which should be used by Nova to run the guest VMs. Note that this should exclude NeutronDpdkCoreList and the list of CPUs dedicated to the host.

For RHOSP11, we need a new parameter in TripleO:

  NovaEmulatorPinSet: ['0-1','55-56']
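For context, a sketch of how the existing RHOSP10 parameters end up in nova.conf on the compute node (values taken from the example above; the exact rendering is done by TripleO/puppet-nova, so treat this as an approximation):

  # nova.conf (rendered by TripleO)
  [DEFAULT]
  reserved_host_memory_mb = 4096
  vcpu_pin_set = 8-17,20-71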
(In reply to Franck Baudin from comment #14)
> Example of RHOSP10 TripleO parameters:
>
>   NovaReservedHostMemory: 4096
>   NovaVcpuPinSet: ['8-17','20-71']
>
> NovaReservedHostMemory
>   Amount of memory to be reserved for the host.
>
> NovaVcpuPinSet
>   Configures the list of CPUs which should be used by Nova to run the guest
>   VMs. Note that this should exclude NeutronDpdkCoreList and the list of
>   CPUs dedicated to the host.
>
> For RHOSP11, we need a new parameter in TripleO:
>
>   NovaEmulatorPinSet: ['0-1','55-56']

Hrm, I actually really don't think we should be doing it this way - we should only be parameterizing how *many* cores to reserve for each purpose, not exactly which ones. How do you intend the above to work across clusters that support differently sized compute node hardware (which we currently support)?
(In reply to Franck Baudin from comment #14)
> For RHOSP11, we need a new parameter in TripleO:
>
>   NovaEmulatorPinSet: ['0-1','55-56']

From the approved Nova specification:

"""
A user which expresses the desire to isolate emulator threads must use a
flavor configured to accept that specification as:

* hw:cpu_emulator_threads=isolate

Would say that this instance is to be considered to consume 1 additional
host CPU.

That pCPU used to make running emulator threads is going to always be
configured on the related guest NUMA node ID 0, to make it predictable for
users.

Currently there is no desire to make customizable the number of host CPUs
running emulator threads since only one should work for almost every use
case. If in the future there is a desire to isolate more than one host CPU
to run emulator threads, we would implement instead I/O threads to add
granularity on dedicating used resources to run guests on host CPUs.
"""

I don't see how this gels with the settings you are suggesting at the TripleO level, because Nova is going to choose where it places the emulator threads on node 0 - there is no parameterization of this and it will only ever use exactly one core, so I don't see how or where we would use two ranges of CPUs with this feature in 11, esp. when one appears to be on a different node.
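For illustration, this is how an operator would opt a flavor into the behaviour described by the spec once it lands (the flavor name is hypothetical):

  openstack flavor set nfv.small --property hw:cpu_emulator_threads=isolate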
Got it, this doesn't require a TripleO parameter, as it doesn't require a nova.conf parameter; this is purely triggered by the flavor keys. Thanks Steve!
(In reply to Franck Baudin from comment #17)
> Got it, this doesn't require a TripleO parameter, as it doesn't require a
> nova.conf parameter; this is purely triggered by the flavor keys. Thanks
> Steve!

A clarification Q for you Sahid: I am assuming that *if* vcpu_pin_set is in use, the pCPU for the emulator threads is also being allocated to a core from *within* that range, is that correct?
(In reply to Stephen Gordon from comment #18)
> (In reply to Franck Baudin from comment #17)
> > Got it, this doesn't require a TripleO parameter, as it doesn't require a
> > nova.conf parameter; this is purely triggered by the flavor keys. Thanks
> > Steve!
>
> A clarification Q for you Sahid: I am assuming that *if* vcpu_pin_set is in
> use, the pCPU for the emulator threads is also being allocated to a core
> from *within* that range, is that correct?

Yes, correct.
(In reply to Sahid Ferdjaoui from comment #19)
> (In reply to Stephen Gordon from comment #18)
> > A clarification Q for you Sahid: I am assuming that *if* vcpu_pin_set is in
> > use, the pCPU for the emulator threads is also being allocated to a core
> > from *within* that range, is that correct?
>
> Yes, correct.

Right, that makes sense. I think we are good then and don't need an additional TripleO setting for this specific specification.
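To make the clarified behaviour concrete, a sketch of the intended outcome (CPU numbers are hypothetical): with vcpu_pin_set = 2-5 and a 2-vCPU flavor using hw:cpu_emulator_threads=isolate, the instance would consume three pCPUs from that range, e.g.:

  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <!-- emulator threads get their own pCPU, still taken from vcpu_pin_set -->
    <emulatorpin cpuset='4'/>
  </cputune>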
The patch series is well-developed, but still awaiting review traction.
Sahid had requested a feature freeze exception (FFE) for this work [1], but as somewhat expected the Nova PTL has indicated [2] that only items on the upstream Nova priorities list [3] will be FFE candidates for Ocata. Based on this I am moving this out to Pike; we'll need to re-submit the specification and keep the code series up to date so that we can attempt to land it early in the Pike cycle.

[1] http://lists.openstack.org/pipermail/openstack-dev/2017-January/111057.html
[2] http://lists.openstack.org/pipermail/openstack-dev/2017-January/111084.html
[3] https://specs.openstack.org/openstack/nova-specs/priorities/ocata-priorities.html
Spec re-proposed for Pike: https://review.openstack.org/#/c/427066/
*** Bug 1446372 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462