Bug 1468004

Summary: RFE: Dedicate pCPU for emulator thread placement per host rather than per guest.
Product: Red Hat OpenStack
Reporter: Stephen Gordon <sgordon>
Component: openstack-nova
Assignee: OSP DFG:Compute <osp-dfg-compute>
Status: CLOSED ERRATA
QA Contact: Joe H. Rahme <jhakimra>
Severity: high
Priority: high
Docs Contact:
Version: 14.0 (Rocky)
CC: atelang, awaugama, berrange, cfields, dasmith, djuran, egallen, eglynn, fbaudin, fherrman, jraju, kchamart, lyarwood, marjones, rlondhe, sbauza, sclewis, sgordon, srevivo, stephenfin, vromanso
Target Milestone: Upstream M2
Keywords: FutureFeature, Triaged
Target Release: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-nova-18.0.0-0.20180710150340.8469fa7
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Clones: 1591229, 1656068 (view as bug list)
Environment:
Last Closed: 2019-01-11 11:47:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1341176, 1591229, 1656068, 1656069

Description Stephen Gordon 2017-07-05 18:45:26 UTC
Description of problem:

By default, when launching a guest with CPU pinning enabled, Nova pins the emulator threads to the same set of pCPUs to which the guest's vCPUs are assigned. In real-time use cases this is undesirable, as the emulator threads may contend for pCPU time with vCPUs that in turn host RT threads in the guest.

In Pike we took an initial cut at this problem by introducing the hw:emulator_threads_policy=isolate policy. This allowed a guest to consume one additional pCPU on which to place its emulator threads, separate from the pCPUs allocated to its vCPUs. The additional pCPU was allocated on a per-guest basis (that is, each new guest with hw:emulator_threads_policy=isolate got a dedicated pCPU for its emulator threads). An hw:emulator_threads_policy=share policy was also introduced to preserve the existing behavior.
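
For reference, a flavor enabling this Pike-era per-guest isolation can be defined as follows (the flavor name and sizing are illustrative; the policy requires dedicated CPU pinning):

  $ openstack flavor create rt.small --vcpus 2 --ram 2048 --disk 10
  $ openstack flavor set rt.small \
      --property hw:cpu_policy=dedicated \
      --property hw:emulator_threads_policy=isolate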

Feedback from users indicates that while this approach achieves the goal of separating emulator thread placement from RT workload placement, it is somewhat wasteful. It would be preferable if a single core could be allocated per host for all guests' emulator threads to share, rather than dedicating one pCPU per guest for this purpose. This represents an iteration on the current design.

See additional info for more context from the mailing list discussion.

Version-Release number of selected component (if applicable):

RHOSP 12 (Pike)

How reproducible:

Every time.

Steps to Reproduce:
1. Create a guest with a flavor that defines hw:emulator_threads_policy=isolate.
2. Create a second guest with a flavor that defines hw:emulator_threads_policy=isolate (see the sketch below).
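
A minimal reproduction sketch (guest, image, network, and libvirt domain names are illustrative; the flavor is the one sketched under the description above):

  $ openstack server create --flavor rt.small --image rhel7 --network private guest1
  $ openstack server create --flavor rt.small --image rhel7 --network private guest2
  # On the compute node hosting both guests, compare emulator thread pinning:
  $ virsh dumpxml instance-00000001 | grep emulatorpin
  $ virsh dumpxml instance-00000002 | grep emulatorpin

With the isolate policy, each guest reports a different emulatorpin cpuset, which is the wasteful behavior described below.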

Actual results:

Each guest's emulator threads are pinned to its own separate, dedicated pCPU core.

Expected results:

Both guests' emulator threads are pinned to the same shared pCPU core(s).

Additional info:

Pike Specification: https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/libvirt-emulator-threads-policy.html

Upstream List Discussion: http://lists.openstack.org/pipermail/openstack-dev/2017-June/118620.html

Comment 2 Sahid Ferdjaoui 2017-07-06 07:39:41 UTC
I don't think this is something we should do, or at least we should wait before starting work on isolating pCPU(s) to run all guests' emulator threads.

There are two use cases: DPDK and RT.

- For real-time, users want the emulator threads to run on additional CPUs, because a non-RT vCPU holding a kernel lock could negatively impact the RT vCPUs.
- For DPDK that requirement does not apply, since DPDK runs in user space; users just want a way to isolate the emulator threads on specific vCPUs.

On the mailing list, my point was to provide a mask which could be set either in the host configuration or in flavor extra specs.

In nova.conf, the mask would be applied to the set of CPUs dedicated to Nova (vcpu_pin_set), isolating some of them to run only guests' emulator threads. In flavor extra specs, the mask would isolate the emulator threads on specific vCPUs.
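
A hypothetical sketch of the host-side variant (the option name and mask syntax below are purely illustrative; this form was never merged):

  [DEFAULT]
  vcpu_pin_set = 4-15
  # Hypothetical option: exclude CPUs 4-5 from guest vCPU placement and
  # reserve them for guests' emulator threads only.
  emulator_threads_mask = ^4,^5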

But the KVM-RT team is working on resolving the problem for real-time, so in the end it will not be necessary to have additional CPUs to isolate the emulator threads.

So the only change we should make is for the DPDK use case: provide the same kind of mask we already have for real-time (hw:cpu_realtime_mask), but one that only isolates the emulator threads.

  hw:cpu_emulator_threads_mask=^0

So the emulator threads are going to run on vCPU0.
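
In terms of the generated libvirt domain XML, the effect would be along these lines (pCPU numbers are illustrative):

  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <!-- emulator threads are placed on vCPU0's pCPU, not a dedicated core -->
    <emulatorpin cpuset='4'/>
  </cputune>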

Comment 3 Stephen Gordon 2017-08-22 15:01:44 UTC
OK Sahid, are you planning to work on this with the upstream folks for Queens?

Comment 4 Sahid Ferdjaoui 2017-08-22 15:17:00 UTC
Yes, I have already updated the spec and asked Franck to confirm that it is OK.

  https://review.openstack.org/#/c/486617/1/specs/queens/approved/libvirt-emulator-threads-policy.rst

Comment 5 Sahid Ferdjaoui 2017-09-12 14:15:10 UTC
The patches are waiting for review:

   https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/libvirt-emulator-threads-policy

Stephen, is it OK with you if we rename this BZ to something like:

  RFE: add a mask over the guest vCPUs to place the guest's emulator threads on the corresponding host CPUs

(I'm sure you can find something much better :)

Comment 7 Sahid Ferdjaoui 2017-10-02 10:25:47 UTC
(In reply to Sahid Ferdjaoui from comment #5)
> The patches are waiting for review: [...]

Since [0] has been closed as WONTFIX, we have no solution other than providing a config option to isolate the emulator threads of all guests on a set of pCPUs.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1468217

Comment 8 Stephen Gordon 2017-10-10 18:02:55 UTC
(In reply to Sahid Ferdjaoui from comment #5)
> The patches are waiting for review: [...]

Hi Sahid,

I notice these changes were recently abandoned. What is the current plan of record based on your discussions with the various other folks familiar with RT?

Thanks,

Steve

Comment 9 Sahid Ferdjaoui 2017-10-11 08:45:52 UTC
(In reply to Stephen Gordon from comment #8)
> I notice these changes were recently abandoned. What is the current plan of
> record based on your discussions with the various other folks familiar with
> RT?

I pushed a blueprint and code introducing an option, 'overhead_pin_set'. It will be used to isolate a set of host CPUs to which the libvirt driver can pin guests' emulator threads.

  https://review.openstack.org/#/c/510897/

I need to update the already-merged spec to reference that new option, but first I have to discuss it with the community to ensure they agree.

Comment 16 Sahid Ferdjaoui 2018-01-22 10:52:00 UTC
Patches sent upstream:

  https://review.openstack.org/#/c/510897/

Comment 17 Sahid Ferdjaoui 2018-01-30 08:56:55 UTC
Spec re-proposed for Rocky:

  https://review.openstack.org/#/c/511188/

Comment 22 Franck Baudin 2018-06-25 14:27:23 UTC
Inputs for NFV documentation: https://access.redhat.com/solutions/3384881
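
For reference, the approach that eventually merged in Rocky (see the Fixed In Version above) configures the shared set host-side via the [compute]/cpu_shared_set option; a minimal nova.conf sketch, assuming host CPUs 0-1 are set aside for all guests' emulator threads:

  [compute]
  # Emulator threads of guests with hw:emulator_threads_policy=share are
  # pinned to these host CPUs instead of a per-guest dedicated core.
  cpu_shared_set = 0-1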

Comment 29 errata-xmlrpc 2019-01-11 11:47:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045