Bug 1459543 - Virt NUMA placements on NUMA1 using flavor extra specs - places Virt guest on NUMA0
Keywords:
Status: CLOSED DUPLICATE of bug 1187945
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Stephen Finucane
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-06-07 12:20 UTC by Eyal Dannon
Modified: 2019-09-09 16:46 UTC
CC List: 23 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-07-12 13:59:41 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Compute sosreport (10.91 MB, application/x-xz)
2017-06-14 14:42 UTC, Eyal Dannon
no flags

Description Eyal Dannon 2017-06-07 12:20:42 UTC
Description of problem:

Hi,

I'm trying to measure the cross-NUMA performance of OVS 2.6 in a DPDK environment,
which means combining a NIC bound on one NUMA node with vCPUs from another one.

On my existing environment I have 2 NUMA nodes available:
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
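
For reference, this per-node CPU listing is the kind of output the standard NUMA tools produce; a minimal sketch, assuming numactl and util-linux are installed:

  numactl --hardware          # node/CPU/memory layout, equivalent to the listing above
  lscpu --extended=CPU,NODE   # per-CPU to NUMA-node mapping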

I'm trying to boot an instance on NUMA node 1 by setting the following:

- In /etc/nova/nova.conf 
vcpu_pin_set=1,2,3,4,5,8,9,10,17,18,19,20,21,24,25,26
 * NUMA node 1 subset: 8,9,10,24,25,26

- Extra specs in my flavor:
 extra_specs                | {"hw:cpu_policy": "dedicated", "hw:mem_page_size": "1GB", "hw:numa_nodes": "1", "hw:numa_mempolicy": "preferred", "hw:numa_cpus.1": "0,1,2,3", "hw:numa_mem.1": "4096"} |
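
For reference, a minimal sketch of applying these extra specs with the openstack CLI; the flavor name here is only an example:

  openstack flavor set \
    --property hw:cpu_policy=dedicated \
    --property hw:mem_page_size=1GB \
    --property hw:numa_nodes=1 \
    --property hw:numa_mempolicy=preferred \
    --property hw:numa_cpus.1=0,1,2,3 \
    --property hw:numa_mem.1=4096 \
    m1.medium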

But looking at the vCPUs attached to my instance shows the following:
<vcpupin vcpu='0' cpuset='2'/>
<vcpupin vcpu='1' cpuset='18'/>
<vcpupin vcpu='2' cpuset='1'/>
<vcpupin vcpu='3' cpuset='17'/>
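
One way to confirm which host NUMA node those cpusets belong to (a sketch using lscpu from util-linux):

  lscpu -p=CPU,NODE | grep -E '^(1|2|17|18),'
  # each output line is "<cpu>,<node>", e.g. "2,0" means host CPU 2 sits on NUMA node 0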

From my understanding, we currently can't choose which NUMA node the guest will get its lcores from.
Is there any way to implement the required configuration other than by modifying /etc/nova/nova.conf?

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Set vcpu_pin_set in nova.conf as mentioned above
2. Boot an instance with the given extra specs
3. Display the attached vCPUs

Actual results:
vCPUs are selected from NUMA node 0

Expected results:
vCPUs should be selected from NUMA node 1

Additional info:

Comment 2 Stephen Gordon 2017-06-08 12:59:39 UTC
(In reply to Eyal Dannon from comment #0)
> Description of problem:
> 
> Hi,
> 
> I'm trying to measure cross NUMA performance of OVS 2.6 in DPDK environment,
> Which means combinations of NIC binding from one NUMA node and vCPUs from
> another one.
> 
> On my existing environment I got 2 NUMA nodes available;
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
> node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
> 
> I'm trying to boot an instance from NUMA node 1 by setting the following:
> 
> - In /etc/nova/nova.conf 
> vcpu_pin_set=1,2,3,4,5,8,9,10,17,18,19,20,21,24,25,26
>  * NUMA 1 - 8,9,10,24,25,26
> 
> - Extra specs in my flavor:
>  extra_specs                | {"hw:cpu_policy": "dedicated",
> "hw:mem_page_size": "1GB", "hw:numa_nodes": "1", "hw:numa_mempolicy":
> "preferred", "hw:numa_cpus.1": "0,1,2,3", "hw:numa_mem.1": "4096"} |
> 
> But, looking at the vCPUs attached to my instance, gave me the following:
> <vcpupin vcpu='0' cpuset='2'/>
> <vcpupin vcpu='1' cpuset='18'/>
> <vcpupin vcpu='2' cpuset='1'/>
> <vcpupin vcpu='3' cpuset='17'/>
> 
> From my understanding, currently, we can't choose from which NUMA node the
> guest will get his cores.

Correct, selecting optimal guest placement is Nova's job. Can you elaborate on why Nova should consider placing the guest vCPUs on one host NUMA node and the NIC binding on another to be optimal placement? Most requests I have seen are for us to prefer the exact opposite.

> Is there any way to implement the require configuration instead of
> modification of /etc/nova/nova.conf?

Can you provide the rationale for why we would permit the user to explicitly pin their workload to a specific core or node when as a cloud user they should not have to know about the topology of the host in the first place?

Comment 3 Eyal Dannon 2017-06-11 11:11:09 UTC
(In reply to Stephen Gordon from comment #2)
> (In reply to Eyal Dannon from comment #0)
> > Description of problem:
> > 
> > Hi,
> > 
> > I'm trying to measure cross NUMA performance of OVS 2.6 in DPDK environment,
> > Which means combinations of NIC binding from one NUMA node and vCPUs from
> > another one.
> > 
> > On my existing environment I got 2 NUMA nodes available;
> > available: 2 nodes (0-1)
> > node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
> > node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
> > 
> > I'm trying to boot an instance from NUMA node 1 by setting the following:
> > 
> > - In /etc/nova/nova.conf 
> > vcpu_pin_set=1,2,3,4,5,8,9,10,17,18,19,20,21,24,25,26
> >  * NUMA 1 - 8,9,10,24,25,26
> > 
> > - Extra specs in my flavor:
> >  extra_specs                | {"hw:cpu_policy": "dedicated",
> > "hw:mem_page_size": "1GB", "hw:numa_nodes": "1", "hw:numa_mempolicy":
> > "preferred", "hw:numa_cpus.1": "0,1,2,3", "hw:numa_mem.1": "4096"} |
> > 
> > But, looking at the vCPUs attached to my instance, gave me the following:
> > <vcpupin vcpu='0' cpuset='2'/>
> > <vcpupin vcpu='1' cpuset='18'/>
> > <vcpupin vcpu='2' cpuset='1'/>
> > <vcpupin vcpu='3' cpuset='17'/>
> > 
> > From my understanding, currently, we can't choose from which NUMA node the
> > guest will get his cores.
> 
> Correct, selecting optimal guest placement is Nova's job. Can you elaborate
> on why Nova should consider placing the guest vCPUs on one host NUMA node
> and the NIC binding on another optimal placement? Most requests I have seen
> are for us to prefer the exact opposite.

Sure, I don't necessarily want to bind the instance's CPUs to the "wrong" NUMA node; I just need to take them only from the secondary NUMA node.
As part of our performance measurements we need to place the VNF on a specific NUMA node; please refer to:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/11/html/network_functions_virtualization_planning_and_prerequisites_guide/ch-hardware-requirements#hardware_partitioning_for_a_nfv_ovs_dpdk_deployment

As you can see, VNF3 is placed entirely on NUMA node 1.

> 
> > Is there any way to implement the require configuration instead of
> > modification of /etc/nova/nova.conf?
> 
> Can you provide the rationale for why we would permit the user to explicitly
> pin their workload to a specific core or node when as a cloud user they
> should not have to know about the topology of the host in the first place?

Let's assume I'm working on a compute node with 2 NICs, each of them placed on a different NUMA node.
I set 8 CPUs in vcpu_pin_set in the nova.conf file, 4 from each NUMA node.
Now I would like to boot 2 instances, each of them entirely placed on a different NUMA node, because only this way can I achieve zero packet loss.


Beyond that, I've tried the following (this time with "hw:numa_nodes=2"):

- same cores as before
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31

- flavor keys: hw:cpu_policy=dedicated hw:mem_page_size=1GB hw:numa_nodes=2 hw:numa_mempolicy=preferred hw:numa_cpus.1=0,1,2,3 hw:numa_mem.1=4096

- nova.conf vcpu pin set
vcpu_pin_set=1,2,3,4,5,8,9,10,17,18,19,20,21,24,25,26


- The instance got 4 vCPUs attached, 2 from each NUMA node.
[root@compute-0 ~]# virsh dumpxml instance-00000036 | grep vcpu
        <nova:vcpus>4</nova:vcpus>
  <vcpu placement='static'>4</vcpu>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='18'/>
    <vcpupin vcpu='2' cpuset='8'/>
    <vcpupin vcpu='3' cpuset='24'/>
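
For reference, the guest NUMA cells and memory pinning that libvirt created can be inspected in the same dump; a sketch, assuming the same instance name:

  virsh dumpxml instance-00000036 | grep -i -E 'numatune|memnode|cell id'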

Thank you for your response.

Comment 4 Yariv 2017-06-13 07:32:21 UTC
See response to https://bugzilla.redhat.com/show_bug.cgi?id=1459543#c3

Comment 5 Stephen Finucane 2017-06-13 09:24:26 UTC
(In reply to Eyal Dannon from comment #3)
> (In reply to Stephen Gordon from comment #2)
> > (In reply to Eyal Dannon from comment #0)
> > > Description of problem:
> > > 
> > > Hi,
> > > 
> > > I'm trying to measure cross NUMA performance of OVS 2.6 in DPDK environment,
> > > Which means combinations of NIC binding from one NUMA node and vCPUs from
> > > another one.
> > > 
> > > On my existing environment I got 2 NUMA nodes available;
> > > available: 2 nodes (0-1)
> > > node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
> > > node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
> > > 
> > > I'm trying to boot an instance from NUMA node 1 by setting the following:
> > > 
> > > - In /etc/nova/nova.conf 
> > > vcpu_pin_set=1,2,3,4,5,8,9,10,17,18,19,20,21,24,25,26
> > >  * NUMA 1 - 8,9,10,24,25,26
> > > 
> > > - Extra specs in my flavor:
> > >  extra_specs                | {"hw:cpu_policy": "dedicated",
> > > "hw:mem_page_size": "1GB", "hw:numa_nodes": "1", "hw:numa_mempolicy":
> > > "preferred", "hw:numa_cpus.1": "0,1,2,3", "hw:numa_mem.1": "4096"} |
> > > 
> > > But, looking at the vCPUs attached to my instance, gave me the following:
> > > <vcpupin vcpu='0' cpuset='2'/>
> > > <vcpupin vcpu='1' cpuset='18'/>
> > > <vcpupin vcpu='2' cpuset='1'/>
> > > <vcpupin vcpu='3' cpuset='17'/>
> > > 
> > > From my understanding, currently, we can't choose from which NUMA node the
> > > guest will get his cores.
> > 
> > Correct, selecting optimal guest placement is Nova's job. Can you elaborate
> > on why Nova should consider placing the guest vCPUs on one host NUMA node
> > and the NIC binding on another optimal placement? Most requests I have seen
> > are for us to prefer the exact opposite.
> 
> Sure, I don't necessarily want to bind instances CPU's to the "wrong" NUMA
> node, I just need to take them only from the secondary NUMA node.
> As part of our performance measurement we need to place the VNF to specific
> NUMA node, please refer to:
> https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/11/
> html/network_functions_virtualization_planning_and_prerequisites_guide/ch-
> hardware-requirements#hardware_partitioning_for_a_nfv_ovs_dpdk_deployment
> 
> As you could see, VNF3 totally placed on NUMA node1. 
> 
> > 
> > > Is there any way to implement the require configuration instead of
> > > modification of /etc/nova/nova.conf?
> > 
> > Can you provide the rationale for why we would permit the user to explicitly
> > pin their workload to a specific core or node when as a cloud user they
> > should not have to know about the topology of the host in the first place?
> 
> Let's assume I'm working on a compute node with 2 NICs, each of them placed
> on different NUMA node.

When you say NICs, do you mean physical NICs or virtual NICs provided by OVS? If the former, are these two virtual functions from a shared SR-IOV NIC, or are they two discrete NICs shared via full PCI passthrough? I ask because Nova will automatically tie a guest to the NUMA node associated with a PCI device.
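
For reference, a PCI device's NUMA locality can be read from sysfs; a sketch with a placeholder address:

  # 0000:05:00.0 is a placeholder - substitute the NIC's address from lspci
  cat /sys/bus/pci/devices/0000:05:00.0/numa_node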

Further comments below.

> I set 8 CPUs as vcpu_pin_set at nova.conf file, 4 from each NUMA node.
> Now, I would like to boot 2 instances - each of them entirely placed on
> different NUMA node, because only this way I could achieve zero packet loss.

And this works, right? You won't be able to control which NUMA node they go onto but the two instances will go onto separate nodes.

> Beyond that, I've tried to do the following[This time with
> "hw:numa_nodes=2"]:

What you're saying here is that the guest topology should be split into two nodes, and this will generally result in the instance being split over two host nodes. I don't think this is what you want, based on the above.

> - same cores as before
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
> node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
> 
> - flavor keys: hw:cpu_policy=dedicated hw:mem_page_size=1GB hw:numa_nodes=2
> hw:numa_mempolicy=preferred hw:numa_cpus.1=0,1,2,3 hw:numa_mem.1=4096

For future reference, 'hw:numa_mempolicy' doesn't do anything.

> 
> - nova.conf vcpu pin set
> vcpu_pin_set=1,2,3,4,5,8,9,10,17,18,19,20,21,24,25,26
> 
> 
> - Instances got attached with 4 vCPUs, 2 from each NUMA node.
> [root@compute-0 ~]# virsh dumpxml instance-00000036 | grep vcpu
>         <nova:vcpus>4</nova:vcpus>
>   <vcpu placement='static'>4</vcpu>
>     <vcpupin vcpu='0' cpuset='2'/>
>     <vcpupin vcpu='1' cpuset='18'/>
>     <vcpupin vcpu='2' cpuset='8'/>
>     <vcpupin vcpu='3' cpuset='24'/>
> 
> Thank you for your response.

As above, this is totally expected as it's what you asked for (two guest NUMA nodes, ostensibly split over two host NUMA nodes).

Comment 6 Eyal Dannon 2017-06-14 09:35:00 UTC
(In reply to Stephen Finucane from comment #5)
> (In reply to Eyal Dannon from comment #3)
> > (In reply to Stephen Gordon from comment #2)
> > > (In reply to Eyal Dannon from comment #0)
> > > > Description of problem:
> > > > 
> > > > Hi,
> > > > 
> > > > I'm trying to measure cross NUMA performance of OVS 2.6 in DPDK environment,
> > > > Which means combinations of NIC binding from one NUMA node and vCPUs from
> > > > another one.
> > > > 
> > > > On my existing environment I got 2 NUMA nodes available;
> > > > available: 2 nodes (0-1)
> > > > node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
> > > > node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
> > > > 
> > > > I'm trying to boot an instance from NUMA node 1 by setting the following:
> > > > 
> > > > - In /etc/nova/nova.conf 
> > > > vcpu_pin_set=1,2,3,4,5,8,9,10,17,18,19,20,21,24,25,26
> > > >  * NUMA 1 - 8,9,10,24,25,26
> > > > 
> > > > - Extra specs in my flavor:
> > > >  extra_specs                | {"hw:cpu_policy": "dedicated",
> > > > "hw:mem_page_size": "1GB", "hw:numa_nodes": "1", "hw:numa_mempolicy":
> > > > "preferred", "hw:numa_cpus.1": "0,1,2,3", "hw:numa_mem.1": "4096"} |
> > > > 
> > > > But, looking at the vCPUs attached to my instance, gave me the following:
> > > > <vcpupin vcpu='0' cpuset='2'/>
> > > > <vcpupin vcpu='1' cpuset='18'/>
> > > > <vcpupin vcpu='2' cpuset='1'/>
> > > > <vcpupin vcpu='3' cpuset='17'/>
> > > > 
> > > > From my understanding, currently, we can't choose from which NUMA node the
> > > > guest will get his cores.
> > > 
> > > Correct, selecting optimal guest placement is Nova's job. Can you elaborate
> > > on why Nova should consider placing the guest vCPUs on one host NUMA node
> > > and the NIC binding on another optimal placement? Most requests I have seen
> > > are for us to prefer the exact opposite.
> > 
> > Sure, I don't necessarily want to bind instances CPU's to the "wrong" NUMA
> > node, I just need to take them only from the secondary NUMA node.
> > As part of our performance measurement we need to place the VNF to specific
> > NUMA node, please refer to:
> > https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/11/
> > html/network_functions_virtualization_planning_and_prerequisites_guide/ch-
> > hardware-requirements#hardware_partitioning_for_a_nfv_ovs_dpdk_deployment
> > 
> > As you could see, VNF3 totally placed on NUMA node1. 
> > 
> > > 
> > > > Is there any way to implement the require configuration instead of
> > > > modification of /etc/nova/nova.conf?
> > > 
> > > Can you provide the rationale for why we would permit the user to explicitly
> > > pin their workload to a specific core or node when as a cloud user they
> > > should not have to know about the topology of the host in the first place?
> > 
> > Let's assume I'm working on a compute node with 2 NICs, each of them placed
> > on different NUMA node.
> 
> When you say NICs, do you mean physical NICs or virtual NICs provided by
> OVS? If the former, are these two virtual functions from a shared SR-IOV
> NICs, or are they two discrete NICs shared via full PCI passthrough? I ask
> because Nova will automatically tie a guest to the NUMA core associated with
> a PCI device.

When I mention NICs I mean physical NICs; there are 2 NICs, each of them placed on a different NUMA node.
I wish the vCPUs of the instance would automatically be located where the physical NIC is.
> 
> Further comments below.
> 
> > I set 8 CPUs as vcpu_pin_set at nova.conf file, 4 from each NUMA node.
> > Now, I would like to boot 2 instances - each of them entirely placed on
> > different NUMA node, because only this way I could achieve zero packet loss.
> 
> And this works, right? You won't be able to control which NUMA node they go
> onto but the two instances will go onto separate nodes.
No, I could not get an instance that is fully located on NUMA node 1 when vcpu_pin_set contains cores from both NUMA nodes.
> 
> > Beyond that, I've tried to do the following[This time with
> > "hw:numa_nodes=2"]:
> 
> What you're saying here is that the guest topology should be split into two
> nodes, and this will generally result in the the instance being split over
> two host nodes. I don't think this is what you want, based on the above.
OK, this is the second try; please refer to comment 1 and see the result when using hw:numa_nodes=1.
> 
> > - same cores as before
> > available: 2 nodes (0-1)
> > node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
> > node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
> > 
> > - flavor keys: hw:cpu_policy=dedicated hw:mem_page_size=1GB hw:numa_nodes=2
> > hw:numa_mempolicy=preferred hw:numa_cpus.1=0,1,2,3 hw:numa_mem.1=4096
> 
> For future reference, 'hw:numa_mempolicy' doesn't do anything.
Thanks!
> 
> > 
> > - nova.conf vcpu pin set
> > vcpu_pin_set=1,2,3,4,5,8,9,10,17,18,19,20,21,24,25,26
> > 
> > 
> > - Instances got attached with 4 vCPUs, 2 from each NUMA node.
> > [root@compute-0 ~]# virsh dumpxml instance-00000036 | grep vcpu
> >         <nova:vcpus>4</nova:vcpus>
> >   <vcpu placement='static'>4</vcpu>
> >     <vcpupin vcpu='0' cpuset='2'/>
> >     <vcpupin vcpu='1' cpuset='18'/>
> >     <vcpupin vcpu='2' cpuset='8'/>
> >     <vcpupin vcpu='3' cpuset='24'/>
> > 
> > Thank you for your response.
> 
> As above, this is totally expected as it's what you asked for (two guest
> NUMA nodes, ostensibly split over two host NUMA nodes).
OK, please refer to comment 1, where I tried to set:
> > > >  extra_specs                | {"hw:cpu_policy": "dedicated",
> > > > "hw:mem_page_size": "1GB", "hw:numa_nodes": "1", "hw:numa_mempolicy":
> > > > "preferred", "hw:numa_cpus.1": "0,1,2,3", "hw:numa_mem.1": "4096"} |

And got vCPUs from NUMA node 0.

Thank you for your time.

Comment 7 Stephen Finucane 2017-06-14 12:56:39 UTC
(In reply to Eyal Dannon from comment #6)
> (In reply to Stephen Finucane from comment #5)
> > (In reply to Eyal Dannon from comment #3)

[snip]

> > > Let's assume I'm working on a compute node with 2 NICs, each of them placed
> > > on different NUMA node.
> > 
> > When you say NICs, do you mean physical NICs or virtual NICs provided by
> > OVS? If the former, are these two virtual functions from a shared SR-IOV
> > NICs, or are they two discrete NICs shared via full PCI passthrough? I ask
> > because Nova will automatically tie a guest to the NUMA core associated with
> > a PCI device.
> 
> When I mention NICs I mean physical NICs, those are 2 NICs each of them
> places of different NUMA node.
> I wish the vCPUs of the instance will automatically be located where the
> physical NIC is.

OK, let's recollect what we know. You have the following topology:

  node 0 cpus:  0  1  2  3  4  5  6  7 16 17 18 19 20 21 22 23
  node 1 cpus:  8  9 10 11 12 13 14 15 24 25 26 27 28 29 30 31

You also have the following PCI devices:

  node 0 pcis:  xxx
  node 1 pcis:  yyy

where xxx and yyy are two identical physical NICs.

You want to attach these two NICs directly to two instances via PCI passthrough. You expect the instance with PCI xxx to be affined to node 0 and the instance with PCI yyy to be affined to node 1.

Is this all correct? If so, it should be happening automatically. Nova will automatically affine vCPUs to a given NUMA cell if the PCI device is affined to that cell. There is, in fact, an RFE [1] to soften this requirement, but for now it's a very hard requirement.

Could you provide an exact copy of the commands you are running, the topology of the machine (hwloc is a good tool for this), the package versions for nova and libvirt, and a copy of all logs from sosreport? I suspect a misconfiguration, but I'd like to see what's going on here in case it's a bug.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1446311
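
For reference, a minimal sketch of commands that would collect the requested information on the compute node, assuming the usual RHEL tooling is installed:

  lstopo-no-graphics                          # hwloc view of sockets, NUMA nodes and PCI locality
  rpm -qa | grep -E 'openstack-nova|libvirt'  # package versions
  sosreport                                   # bundles logs and configuration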

Comment 8 Eyal Dannon 2017-06-14 14:41:01 UTC
I hope you are right. Let's focus on node 1: we would like to boot an instance using vCPUs from node 1 and a PCI device from node 1.

Packages and versions: 
[root@compute-0 ~]# rpm -qa | grep openstack-nova-scheduler; rpm -qa | grep libvirt-client
openstack-nova-scheduler-15.0.3-3.el7ost.noarch
libvirt-client-2.0.0-10.el7_3.5.x86_64

Flavor extra specs:
[root@controller-0 ~]# openstack flavor show m1.medium
+----------------------------+--------------------------------------------------------------------------+
| Field                      | Value                                                                    |
+----------------------------+--------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                    |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                        |
| access_project_ids         | None                                                                     |
| disk                       | 20                                                                       |
| id                         | 96034ba6-32d7-49ef-a06a-d201b6c1b426                                     |
| name                       | m1.medium                                                                |
| os-flavor-access:is_public | True                                                                     |
| properties                 | hw:cpu_policy='dedicated', hw:mem_page_size='1GB',                       |
|                            | hw:numa_cpus.1='0,1,2,3', hw:numa_mem.1='4096',                          |
|                            | hw:numa_mempolicy='preferred', hw:numa_nodes='1' 
| ram                        | 4096                                                                     |


nova.conf:
vcpu_pin_set=1,2,3,4,5,8,9,10,17,18,19,20,21,24,25,26

As we already know:
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31

Assuming this is our configuration, we're hoping to get an instance with 4 vCPUs from this set: 8,9,10,24,25,26
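
Since the flavor also requests 1GB pages, the guest memory can only land on node 1 if that node has free 1G hugepages; a sketch of that check, assuming 1G hugepages are configured on the host:

  cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
  cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/free_hugepages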

The NIC we're going to assign:
[root@compute-0 ~]# ovs-appctl dpif-netdev/pmd-rxq-show | grep "numa_id 1" -A 2
pmd thread numa_id 1 core_id 9:
	isolated : false
	port: dpdk2	queue-id: 0

Let's make sure this is the interface we're using:
    Bridge "br-link0"
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "phy-br-link0"
            Interface "phy-br-link0"
                type: patch
                options: {peer="int-br-link0"}
        Port "dpdk2"
            Interface "dpdk2"
                type: dpdk

The bridge is mapped to the dpdk_mgmt physnet:

/etc/neutron/plugins/ml2/openvswitch_agent.ini:bridge_mappings =dpdk_mgmt:br-link0,dpdk_data1:br-link1,dpdk_data2:br-link2

Which is connected to the following network:
[root@controller-0 ~]# openstack network show b2924725-d739-4010-ba0e-79d85eeace74
+---------------------------+--------------------------------------+
| Field                     | Value                                |
+---------------------------+--------------------------------------+
| admin_state_up            | UP                                   |
| availability_zone_hints   |                                      |
| availability_zones        | nova                                 |
| created_at                | 2017-05-29T09:13:18Z                 |
| description               |                                      |
| dns_domain                | None                                 |
| id                        | b2924725-d739-4010-ba0e-79d85eeace74 |
| ipv4_address_scope        | None                                 |
| ipv6_address_scope        | None                                 |
| is_default                | False                                |
| mtu                       | 1500                                 |
| name                      | external                             |
| port_security_enabled     | True                                 |
| project_id                | 23fadcf4c4804f39a7cbfd32349aff22     |
| provider:network_type     | vlan                                 |
| provider:physical_network | dpdk_mgmt                            |
| provider:segmentation_id  | 396      

My boot command:
[root@controller-0 ~]# nova boot --image rhel --flavor m1.medium --nic net-id=b2924725-d739-4010-ba0e-79d85eeace74 test



[root@compute-0 ~]# virsh dumpxml instance-00000038 | grep vcpu
        <nova:vcpus>4</nova:vcpus>
  <vcpu placement='static'>4</vcpu>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='18'/>
    <vcpupin vcpu='2' cpuset='1'/>
    <vcpupin vcpu='3' cpuset='17'/>

vCPUs 1, 2, 17, and 18 are from node 0.

I'm attaching the sosreport so you can take a deeper look.
I'll be glad to give you access to the servers; PM me in that case.
Let me know if any further information is needed.

Thanks again

Comment 9 Eyal Dannon 2017-06-14 14:42:29 UTC
Created attachment 1287682 [details]
Compute sosreport

Comment 10 Stephen Finucane 2017-07-03 15:01:15 UTC
Sorry for the delay in responding to this. I took a look through the sosreport and noted that you don't have 'NUMATopologyFilter' enabled. You need to do this for any NUMA-related functionality (including basic CPU pinning) to function correctly [1]. Have you done this?
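
For reference, a minimal sketch of enabling the filter on the controller/scheduler node, using the option name that appears elsewhere in this report (crudini is just one way to edit the file, and the filter list should match your deployment):

  crudini --set /etc/nova/nova.conf DEFAULT scheduler_default_filters \
      RamFilter,ComputeFilter,AvailabilityZoneFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,PciPassthroughFilter,NUMATopologyFilter
  systemctl restart openstack-nova-scheduler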

One other comment below.

> +----------------------------+---------------------------------------------------
> | Field                      | Value
> +----------------------------+---------------------------------------------------
> | OS-FLV-DISABLED:disabled   | False
> | OS-FLV-EXT-DATA:ephemeral  | 0
> | access_project_ids         | None
> | disk                       | 20
> | id                         | 96034ba6-32d7-49ef-a06a-d201b6c1b426
> | name                       | m1.medium
> | os-flavor-access:is_public | True
> | properties                 | hw:cpu_policy='dedicated', hw:mem_page_size='1GB',
> |                            | hw:numa_cpus.1='0,1,2,3', hw:numa_mem.1='4096',

For reference, the '1' in 'hw:numa_cpus.1' and 'hw:numa_mem.1' does not refer to the host node - it refers to the guest node. These parameters should in general only be provided if you want asymmetric placement across guest NUMA nodes. Given that you have only one guest NUMA node here ('hw:numa_nodes=1'), they are totally unnecessary.

> |                            | hw:numa_mempolicy='preferred', hw:numa_nodes='1'
> | ram                        | 4096


[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/11/html/instances_and_images_guide/ch-cpu_pinning#scheduler_configuration
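
For reference, a minimal sketch of trimming the flavor accordingly, assuming the openstack CLI and the m1.medium flavor shown above:

  openstack flavor unset \
    --property hw:numa_cpus.1 --property hw:numa_mem.1 --property hw:numa_mempolicy \
    m1.medium
  openstack flavor set \
    --property hw:cpu_policy=dedicated \
    --property hw:mem_page_size=1GB \
    --property hw:numa_nodes=1 \
    m1.medium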

Comment 11 Eyal Dannon 2017-07-11 14:28:47 UTC
The sosreport I provided belongs to the compute node;
scheduler_default_filters is defined on the controller.
I'll attach the sosreport of the controller as well.

Meanwhile, the value of this parameter is:
scheduler_default_filters=RamFilter,ComputeFilter,AvailabilityZoneFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,PciPassthroughFilter,NUMATopologyFilter

Which means NUMATopologyFilter is indeed enabled.

Regarding hw:numa_cpus.1: you are correct, it can be removed.
But either way I face the same issue.

Are there any other steps I can take to achieve the desired configuration?
Thanks.

Comment 13 Stephen Finucane 2017-07-12 13:59:41 UTC
(In reply to Eyal Dannon from comment #11)
> The sosreport I provided belong to the compute node, 
> the scheduler_default_filters defined at the controller.
> I'll attach the sosreport of the controller also.
> 
> Meanwhile, the values of this parameter are:
> scheduler_default_filters=RamFilter,ComputeFilter,AvailabilityZoneFilter,
> ComputeCapabilitiesFilter,ImagePropertiesFilter,PciPassthroughFilter,
> NUMATopologyFilter
> 
> Which means NUMATopologyFilter does enabled.
> 
> Regarding hw:numa_cpus.1; you are correct, it can be removed.
> But either way I face the same issue.
> 
> Is there any other steps I can do to achieve the wished configuration?
> Thanks.

I'm afraid I was mistaken. I discussed this with other folks and it seems this is a known issue with VIFs provided by neutron (which would encompass SR-IOV devices). While PCI passthrough NICs do have NUMA affinity, neutron-provided VIFs do not.

There's a longstanding RFE open for this particular issue. I'm going to close this issue as a duplicate of that.

*** This bug has been marked as a duplicate of bug 1187945 ***

