Bug 1912967
| Summary: | Unexpected Threads per core on guest for VM when setting NUMA pinning | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Polina <pagranat> |
| Component: | BLL.Virt | Assignee: | Liran Rotenberg <lrotenbe> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Polina <pagranat> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | CC: | ahadas, bugs, lrotenbe |
| Version: | 4.4.4.5 | Flags: | pm-rhel: ovirt-4.5?, ahadas: planning_ack?, ahadas: devel_ack+, pm-rhel: testing_ack+ |
| Target Milestone: | ovirt-4.5.0 | Target Release: | 4.5.0 |
| Hardware: | x86_64 | OS: | Linux |
| Whiteboard: | | | |
| Fixed In Version: | ovirt-engine-4.5.0 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-04-20 06:33:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
**Description** Polina 2021-01-05 17:16:29 UTC
The problem is not with auto pinning policy. It's with the way we automatically set the cpusets for NUMA nodes.
I will pull the HW details out of the internal pastebin:
Host lscpu:

```
[root@ocelot05 ~]# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              48
On-line CPU(s) list: 0-47
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           1
NUMA node(s):        4
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               1
Model name:          AMD EPYC 7451 24-Core Processor
Stepping:            2
CPU MHz:             2859.092
CPU max MHz:         2300.0000
CPU min MHz:         1200.0000
BogoMIPS:            4599.39
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           64K
L2 cache:            512K
L3 cache:            8192K
NUMA node0 CPU(s):   0-5,24-29
NUMA node1 CPU(s):   6-11,30-35
NUMA node2 CPU(s):   12-17,36-41
NUMA node3 CPU(s):   18-23,42-47
```
VM lscpu:

```
[root@vm-30-110 ~]# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              46
On-line CPU(s) list: 0-45
Thread(s) per core:  1
Core(s) per socket:  24
Socket(s):           1
NUMA node(s):        4
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               1
Model name:          AMD EPYC Processor
Stepping:            2
CPU MHz:             2299.994
BogoMIPS:            4599.98
Virtualization:      AMD-V
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           64K
L2 cache:            512K
L3 cache:            8192K
NUMA node0 CPU(s):   0-11
NUMA node1 CPU(s):   12-23
NUMA node2 CPU(s):   24-34
NUMA node3 CPU(s):   35-45
```
The VM domxml was created with the NUMA configuration below; Eduardo Habkost's investigation led to:
```xml
<numa>
  <cell id='0' cpus='0-11,48,52,56,60,64,68,72,76,80,84,88,92,96,100,104,108,112,116,120,124,128,132,136,140,144,148,152,156,160,164,168,172,176,180' memory='262144' unit='KiB'/>
  <cell id='1' cpus='12-23,49,53,57,61,65,69,73,77,81,85,89,93,97,101,105,109,113,117,121,125,129,133,137,141,145,149,153,157,161,165,169,173,177,181' memory='262144' unit='KiB'/>
  <cell id='2' cpus='24-34,46,50,54,58,62,66,70,74,78,82,86,90,94,98,102,106,110,114,118,122,126,130,134,138,142,146,150,154,158,162,166,170,174,178,182' memory='262144' unit='KiB'/>
  <cell id='3' cpus='35-45,47,51,55,59,63,67,71,75,79,83,87,91,95,99,103,107,111,115,119,123,127,131,135,139,143,147,151,155,159,163,167,171,175,179,183' memory='262144' unit='KiB'/>
</numa>
```
vCPU 34 is in node 2, but vCPU 35 is in node 3, which confuses lscpu.

As far as I can see, this configuration is generated by RHV. It looks like the only input in the RHV UI is "number of NUMA nodes", and in this VM the number of cores was not a multiple of the number of NUMA nodes. It's up to the RHV UI designers to decide what to do in this case: it could prevent such a configuration, emit a warning, or split the vCPUs between the NUMA nodes correctly.

I suggest also making sure all cores in a socket stay in the same NUMA node, to avoid similar surprises in the guest's interpretation of the socket/core topology.
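To make the failure mode concrete, here is a minimal sketch (illustrative Python, not oVirt engine code; the function name and the sample data are assumptions derived from the cpusets above) of the check Eduardo describes: with 2 threads per core in the VM's topology, vCPUs 2n and 2n+1 are siblings of one core, so they should always land in the same NUMA cell.

```python
# Hypothetical sketch: flag guest cores whose sibling vCPUs fall into
# different NUMA cells (assumes 2 threads per core, as configured here).
def split_cores(cells, threads_per_core=2):
    vcpu_to_cell = {v: c for c, vcpus in cells.items() for v in vcpus}
    cores = {}
    for vcpu, cell in vcpu_to_cell.items():
        cores.setdefault(vcpu // threads_per_core, set()).add(cell)
    return sorted(core for core, owners in cores.items() if len(owners) > 1)

# Online vCPUs 0-45 as split by the generated domxml above (12/12/11/11).
cells = {0: set(range(0, 12)), 1: set(range(12, 24)),
         2: set(range(24, 35)), 3: set(range(35, 46))}
print(split_cores(cells))  # [17] -> core 17 (vCPUs 34/35) straddles cells 2 and 3
```

With core 17 straddling two cells, the guest cannot present a consistent 2-threads-per-core topology, which is consistent with the VM's lscpu reporting a different threads-per-core value than the VM settings.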
Dr. David Gilbert suggested: it's specifying that one physical CPU (34/35) is split across two NUMA nodes.
If you change it to:

```xml
<cell id='0' cpus='0-11'
<cell id='1' cpus='12-23'
<cell id='2' cpus='24-35'
<cell id='3' cpus='36-45'
```
that depends on the topology within a socket on AMD;
it might want to be at the 'die' level, but if we're not doing dies
then I agree we may as well keep everything in a socket within a NUMA
node.
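For illustration, a minimal sketch (an assumed helper, not the engine's actual allocation code) of splitting the online vCPUs into contiguous per-node blocks that only cut at core boundaries; with 46 vCPUs, 2 threads per core and 4 nodes it reproduces the ranges suggested above.

```python
# Hypothetical sketch: distribute vCPUs across NUMA cells in contiguous
# blocks, cutting only at core boundaries so no core spans two cells.
def allocate(vcpus, numa_nodes, threads_per_core=2):
    cores = (vcpus + threads_per_core - 1) // threads_per_core   # 23 cores for 46 vCPUs
    base, extra = divmod(cores, numa_nodes)                      # 5 cores each, 3 left over
    cells, start = [], 0
    for node in range(numa_nodes):
        n_cores = base + (1 if node < extra else 0)
        end = min(start + n_cores * threads_per_core, vcpus)
        cells.append((start, end - 1))
        start = end
    return cells

print(allocate(46, 4))  # [(0, 11), (12, 23), (24, 35), (36, 45)]
```

Front-loading the remainder keeps each cell's cpuset a single range and never splits a core, at the cost of the last node getting slightly fewer vCPUs.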
We need to decide whether to emit a warning or split it correctly when the number of cores is not a multiple of the number of NUMA nodes.
(In reply to Liran Rotenberg from comment #1)
> We need to decide whether to emit a warning or split it correctly when the number of cores is not a multiple of the number of NUMA nodes.

Note that a warning is not an option for the OCP on RHV use case, so we should probably try to fix the allocation logic.

I think this bug currently talks about two related but different issues:
1. The CPU topology seen from within the guest doesn't match the VM settings (that's what the title says).
2. The allocation of vCPUs to NUMA nodes is not correct/optimal.

The reason for #1 is that when creating a VM with auto_pinning_policy=adjust, we set the CPU topology of the VM according to the CPU topology of the host, but we don't allocate all CPUs (in this case 46 vCPUs are allocated on a host with 48 threads, so we can't reach a 1:24:2 topology from the guest's point of view; maybe in this particular case it would have been better to set the guest CPU topology to 1:23:2, if that's valid). #2 is broader: it can happen regardless of auto_pinning_policy=adjust. Polina, could you please file a separate bug for #2?

Verified on ovirt-engine-tools-4.5.0.2-0.7.el8ev.noarch according to the description:

```
host
CPU(s):              48
On-line CPU(s) list: 0-47
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           1
NUMA node(s):        4
```

This bugzilla is included in the oVirt 4.5.0 release, published on April 20th 2022. Since the problem described in this bug report should be resolved in oVirt 4.5.0, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.