Bug 1344497

Summary: [RFE] SLIT table in KVM differs from Host SLIT table - nova
Product: Red Hat OpenStack
Reporter: Karen Noel <knoel>
Component: openstack-nova
Assignee: Eoghan Glynn <eglynn>
Status: CLOSED DEFERRED
QA Contact: Prasanth Anbalagan <panbalag>
Severity: medium
Priority: medium
Version: unspecified
CC: berrange, bgray, dasmith, djdumas, eglynn, imammedo, kchamart, knoel, libvirt-maint, mst, rbalakri, sbauza, sferdjao, sgordon, srao, srevivo, trees, virt-bugs, vromanso
Target Milestone: ---
Keywords: FutureFeature, Triaged
Target Release: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Doc Type: Enhancement
Clone Of: 1344494
Last Closed: 2017-06-16 17:43:55 UTC
Type: Bug
Bug Depends On: 1344450, 1344494, 1454889    

Description Karen Noel 2016-06-09 19:45:22 UTC
+++ This bug was initially created as a clone of Bug #1344494 +++

+++ This bug was initially created as a clone of Bug #1344450 +++

Description of problem:
Partner uses information in the SLIT (System Locality Information Table) to make NUMA placement decisions in their application. Without accurate distance information, performance can suffer.

Passing the correct SLIT table information from the host to the guest makes sense as long as the CPUs are pinned in a 1-to-1 fashion between the host and guest. If the guest consumes all the NUMA nodes, the whole table could be passed. If the guest uses only a subset of nodes, the appropriate section (or sections) of the host SLIT table could be used (see the illustration after the host table below).

The following example is from an 8-socket SGI system (though this behavior is not unique to that system):

Host> numactl -H
…

node   0   1   2   3   4   5   6   7
  0:  10  16  19  16  50  50  50  50
  1:  16  10  16  19  50  50  50  50
  2:  19  16  10  16  50  50  50  50
  3:  16  19  16  10  50  50  50  50
  4:  50  50  50  50  10  16  19  16
  5:  50  50  50  50  16  10  16  19
  6:  50  50  50  50  19  16  10  16
  7:  50  50  50  50  16  19  16  10
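
For illustration, if a guest were pinned 1-to-1 onto host nodes 0-3 only, the appropriate section to pass through would be the upper-left 4x4 quadrant of the host table above:

node   0   1   2   3
  0:  10  16  19  16
  1:  16  10  16  19
  2:  19  16  10  16
  3:  16  19  16  10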
 
-------------------------------------------------


Guest> numactl -H
…

node   0   1   2   3   4   5   6   7
  0:  10  20  20  20  20  20  20  20
  1:  20  10  20  20  20  20  20  20
  2:  20  20  10  20  20  20  20  20
  3:  20  20  20  10  20  20  20  20
  4:  20  20  20  20  10  20  20  20
  5:  20  20  20  20  20  10  20  20
  6:  20  20  20  20  20  20  10  20
  7:  20  20  20  20  20  20  20  10


Version-Release number of selected component (if applicable):
Host and guest - RHEL 7.2
qemu-kvm-rhev  10:2.3.0-31

How reproducible:


Steps to Reproduce:
1. Create a guest, pin host and guest CPUs on a 1-to-1 basis
2. Run numactl -H command on host and guest and compare
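
For step 1, a minimal sketch of the relevant libvirt domain XML (the CPU IDs and memory sizes here are illustrative, not taken from the original report) that pins a 2-vCPU, 2-node guest 1-to-1 onto host CPUs 0 and 1:

  <vcpu placement='static'>2</vcpu>
  <cputune>
    <!-- pin each vCPU to exactly one host CPU -->
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
  </cputune>
  <cpu>
    <numa>
      <!-- one guest NUMA cell per pinned vCPU -->
      <cell id='0' cpus='0' memory='1' unit='GiB'/>
      <cell id='1' cpus='1' memory='1' unit='GiB'/>
    </numa>
  </cpu>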


Actual results:
SLIT tables are different between host and guest

Expected results:
The SLIT table in the guest matches the host table (or, for a guest using only a subset of nodes, the appropriate section of it)

Additional info:

--- Additional comment from Karen Noel on 2016-06-09 15:25:14 EDT ---

The NUMA topology is set up by libvirt, while the SLIT table lives in the guest firmware.

Should the SLIT table change after live migration if the destination host is different? I think so, but guests may not expect a dynamic SLIT table. Applications may have to be modified to take migration into account.

Comment 2 Stephen Gordon 2016-08-16 21:46:52 UTC
What is expected of OpenStack here? This seems like it is something that needs to be addressed between Libvirt and QEMU?

Comment 3 Karen Noel 2016-08-17 23:08:48 UTC
(In reply to Stephen Gordon from comment #2)
> What is expected of OpenStack here? This seems like it is something that
> needs to be addressed between Libvirt and QEMU?

If nothing needs to change in OpenStack, then the BZ is just for awareness. TestOnly? Documentation only? We can only document KVM features through the layered products. Thanks.

Comment 4 Stephen Gordon 2016-08-18 14:49:28 UTC
Dan, do you have any thoughts on this? It seems to me like this will implicitly be visible in the guest for those who want to use it, but it's probably not something I would call out as an OpenStack feature (I haven't had any asks for this at the OpenStack layer as yet, which is not to say they aren't out there).

Comment 5 Daniel Berrangé 2016-08-18 15:09:34 UTC
I can certainly see the benefit of exposing the SLIT table info to the guest.

When a guest has NUMA topology enabled, we really have two choices:

 - If CPU pinning is active, we can tell the truth and expose the real distances from the host SLIT info. After live migration this is potentially wrong, though, if the destination host has different SLIT info.

 - We can lie and always expose a fixed set of distances for guest NUMA nodes. We would always want to do this for guests without CPU pinning, and might want to do this for guests with CPU pinning if hosts in the cloud don't have homogeneous SLIT info.

So if libvirt provides a way to configure NUMA node distances in the guest XML and uses that to populate the SLIT info, Nova can make use of it.
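
A sketch of what such guest XML could look like (illustrative only; the exact element names are an assumption, since libvirt does not yet expose this, and the distance values are taken from the host table above):

  <cpu>
    <numa>
      <cell id='0' cpus='0-3' memory='4' unit='GiB'>
        <distances>
          <!-- distance to self is 10; distance to the other cell is 16 -->
          <sibling id='0' value='10'/>
          <sibling id='1' value='16'/>
        </distances>
      </cell>
      <cell id='1' cpus='4-7' memory='4' unit='GiB'>
        <distances>
          <sibling id='0' value='16'/>
          <sibling id='1' value='10'/>
        </distances>
      </cell>
    </numa>
  </cpu>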

Comment 6 Stephen Gordon 2016-08-18 19:26:08 UTC
Right, but what's not clear to me from the original bug is whether we would need to do this explicitly in Nova or whether the proposal is that libvirt would do it transparently.

Comment 7 Stephen Gordon 2016-08-18 19:29:07 UTC
Moved to rhos-future due to the state of the dependencies, further discussion required for a future OpenStack release.

Comment 8 Daniel Berrangé 2016-08-22 17:41:36 UTC
Libvirt only provides mechanisms, not policy, so there will certainly need to be something in Nova that uses the libvirt mechanisms to implement a policy for Nova.
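
For example, the Nova-side policy could be driven by a flavor extra spec (the property name below is purely hypothetical; no such option exists in Nova today):

  # hypothetical extra spec; a real name would come from a Nova spec/blueprint
  openstack flavor set numa-flavor --property hw:numa_slit_passthrough=true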

Comment 11 Stephen Gordon 2017-06-16 17:43:55 UTC
While I agree with Dan's assessment above (if libvirt provides a way to configure NUMA node distances in the guest XML and uses that to populate the SLIT info, Nova can make use of it, and that may be useful), we don't have a clear OSP customer/partner driver for this right now.

Marking this DEFERRED; we can revisit and re-open at a later date if need be.