Bug 1306698

Summary: NUMA memory mapping is not generated correctly
Product: [oVirt] ovirt-engine
Reporter: Roman Mohr <rmohr>
Component: Backend.Core
Assignee: Andrej Krejcir <akrejcir>
Status: CLOSED CURRENTRELEASE
QA Contact: Artyom <alukiano>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.6.2.1
CC: akrejcir, alukiano, bugs, dfediuck, lbopf, mavital, mpoledni, rmohr
Target Milestone: ovirt-4.1.0-alpha
Keywords: Triaged
Target Release: 4.1.0.2
Flags: rule-engine: ovirt-4.1+
       rule-engine: planning_ack+
       dfediuck: devel_ack+
       mavital: testing_ack+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-02-01 14:49:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Roman Mohr 2016-02-11 15:38:39 UTC
Description of problem:

When pinning virtual NUMA nodes to physical NUMA nodes with the memory allocation policy 'STRICT', we create the required virtual NUMA nodes and pin the virtual NUMA node CPUs to the correct physical NUMA nodes, but we do not create the correct memory mapping.

We produce something like 

>  <numatune>
>    <memory mode='strict' nodeset='0-1'/>
>  </numatune>

instead of

>  <numatune>
>    <memory mode='strict' nodeset='0-1'/>
>    <memnode cellid="0" mode="strict" nodeset="0"/>
>    <memnode cellid="1" mode="strict" nodeset="1"/>
>  </numatune>
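
For illustration, here is a minimal shell sketch (not the actual engine code) of how the expected per-cell mapping could be emitted from a hypothetical virtual-to-physical pinning map. The PIN map is an assumption for the example; a comma-separated nodeset such as '0,1' is equivalent to the range form '0-1':

# Illustration only, not engine code: emit one <memnode> per virtual cell,
# given a hypothetical "virtual cell id -> physical NUMA node" pinning map.
declare -A PIN=( [0]=0 [1]=1 )

# Union of all pinned physical nodes, for the top-level <memory> element.
all_nodes=$(printf '%s\n' "${PIN[@]}" | sort -nu | paste -sd, -)

echo "<numatune>"
echo "  <memory mode='strict' nodeset='$all_nodes'/>"
for cell in $(printf '%s\n' "${!PIN[@]}" | sort -n); do
    echo "  <memnode cellid='$cell' mode='strict' nodeset='${PIN[$cell]}'/>"
done
echo "</numatune>"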

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a VM with two numa nodes (0, 1)
2. Pin the numa nodes to two different host numa nodes (0,1)
3. Start the VM
4. Run virsh -r dumpxml <vmname> (see the sketch below for extracting just the <numatune> section)
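
A small helper for step 4, assuming <vmname> as a placeholder for the actual libvirt domain name:

# Print only the <numatune> section of the running domain's XML.
# <vmname> is a placeholder for the actual libvirt domain name.
virsh -r dumpxml <vmname> | sed -n '/<numatune>/,/<\/numatune>/p'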

Actual results:

Virtual NUMA node CPUs are pinned, but the memory is not pinned.

Expected results:

Both CPUs and memory should be pinned.

Additional info:

Full example with two virtual NUMA nodes which are pinned with the strict policy to two physical NUMA nodes:

> <domain type='kvm' id='3'>
> [...]
>  <cputune>
>    <shares>1020</shares>
>    <vcpupin vcpu='1' cpuset='1'/>
>    <vcpupin vcpu='0' cpuset='0'/>
>  </cputune>
>  <numatune>
>    <memory mode='strict' nodeset='0-1'/>
>  </numatune>
>  <cpu mode='custom' match='exact'>
>    <model fallback='allow'>SandyBridge</model>
>    <topology sockets='16' cores='1' threads='1'/>
>    <numa>
>      <cell id='0' cpus='0' memory='3072' unit='KiB'/>
>      <cell id='1' cpus='1' memory='3072' unit='KiB'/>
>    </numa>
>  </cpu>
> [...]
> </domain>

Comment 1 Roy Golan 2016-02-17 11:55:05 UTC
(In reply to Roman Mohr from comment #0)
> We produce something like 
> 
> >  <numatune>
> >    <memory mode='strict' nodeset='0-1'/>
> >  </numatune>

Prior to RHEL 7 this was the configuration we could use, AFAIR.

Roman, what is the runtime effect of this configuration? Is it really different?

Comment 2 Roman Mohr 2016-02-29 12:11:27 UTC
(In reply to Roy Golan from comment #1)
> Prior to RHEL 7 this was the configuration we could use, AFAIR.
> 
> Roman, what is the runtime effect of this configuration? Is it really
> different?
> 

At least on PPC it seems to be more or less ignored. I will add some more data for x86.

@Martin could you share your PPC findings?

Comment 3 Martin Polednik 2016-03-07 15:10:30 UTC
I have created a VM with 4 virtual NUMA cells pinned to physical cell 0.

Resulting XML looked as follows:

<cpu>
        <model>POWER8</model>
        <topology cores="2" sockets="2" threads="2"/>
        <numa>
                <cell cpus="0,1" memory="2621440"/>
                <cell cpus="2,3" memory="2621440"/>
                <cell cpus="4,5" memory="2621440"/>
                <cell cpus="6,7" memory="2621440"/>
        </numa>
</cpu>
<numatune>
        <memory mode="strict" nodeset="0"/>
</numatune>

but checking the NUMA memory maps of the vCPU PIDs reveals the following:

# Print the NUMA memory policy of each vCPU thread (PIDs 147609-147616):
for vcpu_pid in /proc/1476{09..16}; do
    echo "$vcpu_pid"
    cut -d ' ' -f2 "$vcpu_pid/numa_maps" | uniq
done

/proc/147609
prefer:16
/proc/147610
prefer:16
/proc/147611
prefer:16
/proc/147612
prefer:16
/proc/147613
prefer:16
/proc/147614
prefer:16
/proc/147615
prefer:16
/proc/147616
prefer:16

My conclusion is that, at least on PPC, the memory is not pinned at all and prefers a node that was not chosen at all.
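
For reference, a generic form of the same per-thread check, assuming the VM's QEMU process can be located by name; the pgrep pattern and <vmname> are placeholders, and root privileges may be needed to read numa_maps:

# Illustrative only: show the NUMA policies seen by each thread of the VM's
# QEMU process. The pgrep pattern and <vmname> are placeholders.
pid=$(pgrep -f "qemu.*<vmname>" | head -n1)
for task in /proc/"$pid"/task/*; do
    echo "$task: $(cut -d ' ' -f2 "$task"/numa_maps | sort -u | paste -sd, -)"
done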

Comment 4 Martin Polednik 2016-03-07 15:41:24 UTC
Testing with a Cluster-on-Die PC (Xeon 2650 v3) with COD enabled (2 NUMA nodes, 0 and 1):

<cpu match="exact" mode="host-passthrough">
        <topology cores="2" sockets="16" threads="2"/>
        <numa>
                <cell cpus="0,1" memory="2621440"/>
                <cell cpus="2,3" memory="2621440"/>
                <cell cpus="4,5" memory="2621440"/>
                <cell cpus="6,7" memory="2621440"/>
        </numa>
</cpu>
<numatune>
        <memory mode="strict" nodeset="1"/>
</numatune>

---

/proc/28616
prefer:0
/proc/28617
prefer:1
/proc/28618
prefer:0
/proc/28619
prefer:1
/proc/28620
prefer:0
/proc/28621
prefer:1
/proc/28622
prefer:0
/proc/28623
prefer:0

Comment 7 Sandro Bonazzola 2016-12-12 13:56:00 UTC
The fix for this issue should be included in oVirt 4.1.0 beta 1, released on December 1st. If it is not included, please move the bug back to MODIFIED.

Comment 8 Artyom 2016-12-13 14:37:55 UTC
Verified on ovirt-engine-setup-plugin-ovirt-engine-4.1.0-0.2.master.20161212172238.gitea103bd.el7.centos.noarch

dumpxml output for a VM with two NUMA nodes:
<numatune>
    <memory mode='strict' nodeset='0-1'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
    <memnode cellid='1' mode='strict' nodeset='1'/>
</numatune>


The VM process also looks fine (per-task NUMA policies from numa_maps):
default
bind:1
default
bind:0
default
/proc/16941/task/16943
default
bind:1
default
bind:0
default
/proc/16941/task/16947
default
bind:1
default
bind:0
default
/proc/16941/task/16948
default
bind:1
default
bind:0
default
/proc/16941/task/16950
default
bind:1
default
bind:0
default