Created attachment 1744652 [details]
logs

Description of problem:
The number of Thread(s) per core reported by lscpu in the guest differs from the host (reproducible on hosts with 4 NUMA nodes).

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.4.5-0.10.el8ev.noarch

How reproducible:
100% on a certain environment

Steps to Reproduce:
1. POST https://{{host}}/ovirt-engine/api/vms?auto_pinning_policy=adjust

<vm>
    <name>auto_cpu_vm</name>
    <template>
        <name>latest-rhel-guest-image-8.3-infra</name>
    </template>
    <cluster>
        <name>golden_env_mixed_1</name>
    </cluster>
    <placement_policy>
        <hosts>
            <host>
                <name>host_mixed_1</name>
            </host>
        </hosts>
    </placement_policy>
</vm>

Actual results:
Thread(s) per core in lscpu is not identical between the host and the VM.

Expected results:
According to the documentation the expectation is:
1. CPU(s) in the VM must be lower than or equal to the host. 45 (on VM) < 46 (host) - ok
2. Thread(s) per core must be identical. Though in this case 2 (on host) != 1 (on VM)
3. Core(s) per socket in the VM must be lower than or equal to the host. In this case 24 = 24 - ok
4. Socket(s) must be identical. In this case 1 = 1
5. NUMA node(s) must be identical. In this case 4 = 4

Additional info:
VM lscpu: http://pastebin.test.redhat.com/927472
Host lscpu: http://pastebin.test.redhat.com/927471
cpuid -r for the VM and the host is in the attached cpuid_vm.txt and cpuid_host.txt.
Both the host and the VM have util-linux-2.32.1-24.el8.x86_64 and kernel 4.18.0-240.4.1.el8_3.x86_64.
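As a rough illustration (not RHV or QE tooling; the helper names and the host_lscpu.txt file are made up for the example), the five checks under "Expected results" can be expressed as a small Python helper that compares lscpu output collected on the host with lscpu output collected inside the guest:

import subprocess

FIELDS = ["CPU(s)", "Thread(s) per core", "Core(s) per socket",
          "Socket(s)", "NUMA node(s)"]

def parse_lscpu(text):
    # Turn "Label:   value" lines from lscpu into a {label: value} dict.
    info = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            info[key.strip()] = value.strip()
    return info

def check_topology(host, guest):
    # Apply the five rules from "Expected results"; returns (rule, ok) pairs.
    h = {f: int(host[f]) for f in FIELDS}
    g = {f: int(guest[f]) for f in FIELDS}
    return [
        ("CPU(s): guest <= host",             g["CPU(s)"] <= h["CPU(s)"]),
        ("Thread(s) per core: identical",     g["Thread(s) per core"] == h["Thread(s) per core"]),
        ("Core(s) per socket: guest <= host", g["Core(s) per socket"] <= h["Core(s) per socket"]),
        ("Socket(s): identical",              g["Socket(s)"] == h["Socket(s)"]),
        ("NUMA node(s): identical",           g["NUMA node(s)"] == h["NUMA node(s)"]),
    ]

# Example: compare the guest's local lscpu against a saved copy of the host output.
guest = parse_lscpu(subprocess.run(["lscpu"], capture_output=True, text=True).stdout)
host = parse_lscpu(open("host_lscpu.txt").read())   # hypothetical file with the host lscpu output
for rule, ok in check_topology(host, guest):
    print(("OK  " if ok else "FAIL"), rule)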
The problem is not with the auto pinning policy. It's with the way we automatically set the cpusets for the NUMA nodes. I'll pull the HW details out of the internal pastebin.

Host lscpu:

[root@ocelot05 ~]# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              48
On-line CPU(s) list: 0-47
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           1
NUMA node(s):        4
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               1
Model name:          AMD EPYC 7451 24-Core Processor
Stepping:            2
CPU MHz:             2859.092
CPU max MHz:         2300.0000
CPU min MHz:         1200.0000
BogoMIPS:            4599.39
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           64K
L2 cache:            512K
L3 cache:            8192K
NUMA node0 CPU(s):   0-5,24-29
NUMA node1 CPU(s):   6-11,30-35
NUMA node2 CPU(s):   12-17,36-41
NUMA node3 CPU(s):   18-23,42-47

VM lscpu:

[root@vm-30-110 ~]# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              46
On-line CPU(s) list: 0-45
Thread(s) per core:  1
Core(s) per socket:  24
Socket(s):           1
NUMA node(s):        4
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               1
Model name:          AMD EPYC Processor
Stepping:            2
CPU MHz:             2299.994
BogoMIPS:            4599.98
Virtualization:      AMD-V
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           64K
L2 cache:            512K
L3 cache:            8192K
NUMA node0 CPU(s):   0-11
NUMA node1 CPU(s):   12-23
NUMA node2 CPU(s):   24-34
NUMA node3 CPU(s):   35-45

The VM domxml was created with:

<numa>
  <cell id='0' cpus='0-11,48,52,56,60,64,68,72,76,80,84,88,92,96,100,104,108,112,116,120,124,128,132,136,140,144,148,152,156,160,164,168,172,176,180' memory='262144' unit='KiB'/>
  <cell id='1' cpus='12-23,49,53,57,61,65,69,73,77,81,85,89,93,97,101,105,109,113,117,121,125,129,133,137,141,145,149,153,157,161,165,169,173,177,181' memory='262144' unit='KiB'/>
  <cell id='2' cpus='24-34,46,50,54,58,62,66,70,74,78,82,86,90,94,98,102,106,110,114,118,122,126,130,134,138,142,146,150,154,158,162,166,170,174,178,182' memory='262144' unit='KiB'/>
  <cell id='3' cpus='35-45,47,51,55,59,63,67,71,75,79,83,87,91,95,99,103,107,111,115,119,123,127,131,135,139,143,147,151,155,159,163,167,171,175,179,183' memory='262144' unit='KiB'/>
</numa>

Eduardo Habkost's investigation led to:

VCPU 34 is in node 2, but CPU 35 is in node 3, which confuses lscpu. As far as I can see, this configuration is generated by RHV. It looks like the only input in the RHV UI is "number of NUMA nodes", and in that VM the number of cores was not a multiple of the number of NUMA nodes. It's up to the RHV UI designers to decide what to do in this case: it could prevent such a configuration, emit a warning, or split the vCPUs between the NUMA nodes correctly. I suggest also making sure all cores in a socket stay in the same NUMA node, to avoid similar surprises in the guest's interpretation of the socket/core topology.

Dr. David Gilbert suggested:

It's specifying that one physical CPU (34/35) is split across two NUMA nodes. It could be changed to:

<cell id='0' cpus='0-11'
<cell id='1' cpus='12-23'
<cell id='2' cpus='24-35'
<cell id='3' cpus='36-45'

That depends on the topology within a socket on AMD; it might want to be at the 'die' level, but if we're not doing dies then I agree we may as well keep everything in a socket within a NUMA node.

We need to decide whether to emit a warning or split the vCPUs correctly when the number of cores is not a multiple of the number of NUMA nodes (a rough sketch of such a split is below).
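To make the "split it correctly" option concrete, here is a minimal Python sketch (not the engine code; the function name is made up) of a contiguous split that aligns the node boundaries to threads-per-core, so the sibling threads of a core never land in two different NUMA cells. For 46 vCPUs, 4 nodes and 2 threads per core it produces exactly the 0-11 / 12-23 / 24-35 / 36-45 cells suggested above:

def split_vcpus_into_cells(num_vcpus, num_nodes, threads_per_core):
    # Distribute whole cores across nodes so a core's sibling threads
    # never end up in two different NUMA cells (the vCPU 34/35 problem above).
    cores, leftover = divmod(num_vcpus, threads_per_core)
    base, extra = divmod(cores, num_nodes)
    cells, start = [], 0
    for node in range(num_nodes):
        size = (base + (1 if node < extra else 0)) * threads_per_core
        if node == num_nodes - 1:
            size += leftover           # any vCPU that doesn't fill a core goes last
        cells.append((start, start + size - 1))
        start += size
    return cells

# 46 vCPUs, 4 nodes, 2 threads per core -> [(0, 11), (12, 23), (24, 35), (36, 45)]
for node_id, (lo, hi) in enumerate(split_vcpus_into_cells(46, 4, 2)):
    print("<cell id='%d' cpus='%d-%d' .../>" % (node_id, lo, hi))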
(In reply to Liran Rotenberg from comment #1)
> We need to decide whether to emit a warning or split the vCPUs correctly
> when the number of cores is not a multiple of the number of NUMA nodes.

Note that a warning is not an option for the OCP on RHV use case, so we should probably try to fix the allocation logic.

I think this bug currently talks about two related but different issues:
1. The CPU topology seen from within the guest doesn't match the VM settings (that's what the title says).
2. The allocation of vCPUs to NUMA nodes is not correct/optimal.

The reason for #1 is that when creating a VM with auto_pinning_policy=adjust, we set the CPU topology of the VM according to the CPU topology of the host, but we don't allocate all CPUs. In this case 46 vCPUs are allocated on a host with 48 threads, so we can't reach a 1:24:2 sockets:cores:threads topology from the guest's point of view; maybe in this particular case it would have been better to set the guest CPU topology to 1:23:2, if that's valid (a rough sketch of that adjustment is below).

#2 is broader: it can happen regardless of auto_pinning_policy=adjust.

Polina, could you please file a separate bug for #2?
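For issue #1, a possible adjustment (just a sketch under my own assumptions, not the engine's actual logic; the function name is made up) is to keep the host's socket count and threads per core, the two values that must stay identical, and shrink cores per socket until the topology fits the vCPU budget:

def fit_guest_topology(vcpu_budget, host_sockets, host_cores, host_threads):
    # Keep sockets and threads-per-core identical to the host; reduce
    # cores-per-socket so sockets * cores * threads fits within the budget.
    cores = min(host_cores, vcpu_budget // (host_sockets * host_threads))
    if cores < 1:
        raise ValueError("budget too small to preserve the host topology")
    return host_sockets, cores, host_threads

# 46 vCPUs on the 1:24:2 host from this bug -> (1, 23, 2), i.e. exactly 46 vCPUs.
print(fit_guest_topology(46, 1, 24, 2))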
Reported the separate bug for #2: https://bugzilla.redhat.com/show_bug.cgi?id=1913269
Verified on ovirt-engine-tools-4.5.0.2-0.7.el8ev.noarch according to the description.

Host:
CPU(s):              48
On-line CPU(s) list: 0-47
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           1
NUMA node(s):        4
This bugzilla is included in the oVirt 4.5.0 release, published on April 20th, 2022. Since the problem described in this bug report should be resolved in the oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.