Red Hat Bugzilla – Bug 888503
libvirt: wrong cpu topology - AMD Bulldozer 62XX family
Last modified: 2014-01-12 18:54:39 EST
Description of problem:

VDSM uses the libvirt CPU topology to determine the number of cores and threads. For AMD Bulldozer 62XX machines the reported topology does not look accurate.

To avoid changing the historic output from libvirt, what about adding the 'total CPU sockets' to the XML output, like lscpu and /proc/cpuinfo do, and leaving the current libvirt output as is:

<topology sockets='1' cores='8' threads='2'/>   (as upstream libvirt-1.0.0 shows)
...
<cells num='4'>

==================================================

It would become:

<topology totalsockets='2' sockets='1' cores='8' threads='2'/>
...
<cells num='4'>

==================================================

This would show totalSockets = 2 and sockets per NUMA node = 1 (as libvirt already shows).

Just to clarify our needs: vdsm gets the total sockets and total cores (without threads) from libvirt. For this system, for example, we are looking for a way to get 2 sockets and 16 total cores from libvirt. With that, vdsm would report the field 'report_host_threads_as_cores' as:

if enabled:
================
(cores = 8) * (threads = 2) * (new_libvirt_field_total_sockets = 2) = 32 total cores

if disabled:
===================
(cores = 8) * (new_libvirt_field_total_sockets = 2) = 16 total cores

Other system resources already expose this split. From /proc/cpuinfo:

<snip>
cpu cores : 8     (number of cores per CPU package)
siblings  : 16    (HT per CPU package) * (number of cores per CPU package)
</snip>

Sockets:
=====================
# cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l
2

Also from lscpu:
=========================
<snip>
Thread(s) per core:    2    (core + thread)
Core(s) per socket:    8
CPU socket(s):         2
On-line CPU(s) list:   0-31
NUMA node(s):          4
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
NUMA node2 CPU(s):     16-23
NUMA node3 CPU(s):     24-31
</snip>

Initial discussion started in BZ#833425.
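For illustration, a minimal sketch of the vdsm-side arithmetic described above, assuming the proposed totalsockets attribute existed. That attribute is exactly the extension requested here, and the function name and XML path are illustrative only, not vdsm's actual code:

# Sketch only: 'totalsockets' is the attribute *proposed* in this report,
# not something current libvirt emits.
import xml.etree.ElementTree as ET

def vdsm_total_cores(capabilities_xml, report_host_threads_as_cores):
    topo = ET.fromstring(capabilities_xml).find("host/cpu/topology")
    cores = int(topo.get("cores"))                 # cores per socket (8)
    threads = int(topo.get("threads"))             # threads per core (2)
    total_sockets = int(topo.get("totalsockets"))  # proposed field (2)
    if report_host_threads_as_cores:
        return cores * threads * total_sockets     # 8 * 2 * 2 = 32
    return cores * total_sockets                   # 8 * 2 = 16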
Hi Peter,

I do believe it will affect customers in 6.4. If possible, yes.

Amador, do you have any customers affected at the moment on this processor family?

Thanks,
Douglas
Yes, Douglas. I attached the cases I'm following so far. Thank you.
> VDSM uses the libvirt CPU topology to determine the number of cores and
> threads. For AMD Bulldozer 62XX machines the reported topology does not
> look accurate.
>
> To avoid changing the historic output from libvirt, what about adding the
> 'total CPU sockets' to the XML output, like lscpu and /proc/cpuinfo do,
> and leaving the current libvirt output as is:

Before we start discussing extensions to the libvirt XML, can you actually tell us what's wrong with the existing data? Please provide the current libvirt XML, a complete copy of the /proc/cpuinfo file, and the full output of 'numactl --hardware'.
The root of the problem with the existing data is that VDSM is unable to tell the actual number of physical CPU sockets/packages in the host. The issue is visible on AMD Bulldozer and AMD Piledriver hosts that have multiple NUMA nodes per physical socket. Because of this, VDSM cannot count the number of sockets: multiplying the "sockets" field by the number of NUMA nodes yields an incorrect count. I'm not sure why the number of actual CPU sockets is that important, but in case the host has a strange NUMA architecture, the numbers will be off.

Note: AMD Piledriver is two 6-core CPUs in one physical package.
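To make the failure concrete, the arithmetic with the numbers from this report (the values come from the description above; the variable names are only illustrative):

# Why multiplying "sockets" by NUMA nodes over-counts on an Opteron 62XX host.
sockets_per_cell = 1   # from <topology sockets='1' cores='8' threads='2'/>
numa_cells = 4         # from <cells num='4'>
naive_sockets = sockets_per_cell * numa_cells   # = 4, what a consumer computes
actual_sockets = 2     # each 62XX package spans two NUMA nodes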
The NUMA topology data were added upstream by:

commit 79a003f9b0042ef4d2cf290e555364565b7bff42
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Fri Jan 18 23:06:55 2013 +0100

    capabilities: Add additional data to the NUMA topology info

    This patch adds data gathering to the NUMA gathering files and adds
    support for outputting the data. The test driver and xend driver need
    to be adapted to fill sensible data to the structure in a future patch.

commit 87b4c10c6cf02251dd8c29b5b895bebc6ec297f9
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Tue Jan 22 18:42:08 2013 +0100

    capabilities: Switch CPU data in NUMA topology to a struct

    This will allow storing additional topology data in the NUMA topology
    definition. This patch changes the storage type and fixes fallout of
    the change across the drivers using it.

    This patch also changes semantics of adding new NUMA cell information.
    Until now the data were re-allocated and copied to the topology
    definition. This patch changes the addition function to steal the
    pointer to a pre-allocated structure to simplify the code.

commit 987fd7db4fc4ed8ff47339d440cdfb02ef1f0b58
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Fri Jan 18 20:39:00 2013 +0100

    conf: Split out NUMA topology formatting to simplify access to data

commit 828820e2d371205d6a6061301165d58a1a92e611
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Fri Jan 18 19:30:00 2013 +0100

    schemas: Add schemas for more CPU topology information in the caps XML

    This patch adds RNG schemas for adding more information in the topology
    output of the NUMA section in the capabilities XML.

    The added elements are designed to provide more information about the
    placement and topology of the processors in the system to management
    applications.

A demonstration of supported XML added by this patch:

<capabilities>
  <host>
    <topology>
      <cells num='3'>
        <cell id='0'>
          <cpus num='4'> <!-- this is a node with Hyperthreading -->
            <cpu id='0' socket_id='0' core_id='0' siblings='0-1'/>
            <cpu id='1' socket_id='0' core_id='0' siblings='0-1'/>
            <cpu id='2' socket_id='0' core_id='1' siblings='2-3'/>
            <cpu id='3' socket_id='0' core_id='1' siblings='2-3'/>
          </cpus>
        </cell>
        <cell id='1'>
          <cpus num='4'> <!-- this is a node with modules (Bulldozer) -->
            <cpu id='4' socket_id='0' core_id='2' siblings='4-5'/>
            <cpu id='5' socket_id='0' core_id='3' siblings='4-5'/>
            <cpu id='6' socket_id='0' core_id='4' siblings='6-7'/>
            <cpu id='7' socket_id='0' core_id='5' siblings='6-7'/>
          </cpus>
        </cell>
        <cell id='2'>
          <cpus num='4'> <!-- this is a normal multi-core node -->
            <cpu id='8' socket_id='1' core_id='0' siblings='8'/>
            <cpu id='9' socket_id='1' core_id='1' siblings='9'/>
            <cpu id='10' socket_id='1' core_id='2' siblings='10'/>
            <cpu id='11' socket_id='1' core_id='3' siblings='11'/>
          </cpus>
        </cell>
      </cells>
    </topology>
  </host>
</capabilities>

The socket_id field represents the identification of the physical socket the CPU is plugged into. This ID may not be identical to the physical socket ID reported by the kernel.

The core_id identifies a core within a socket. This field also may not accurately represent physical IDs. The core_id is guaranteed to be unique within a cell and a socket. There may be duplicates between sockets. Only cores sharing a core_id within one cell and one socket can be considered threads. Cores sharing a core_id in separate cells are distinct cores.

The siblings field is a list of the CPU ids that the CPU is a sibling (thread) with. The list is in the cpuset format.

Moving to POST for 6.5.
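As an illustration of these semantics, a minimal sketch that derives socket/core/thread totals from the new capabilities XML by treating CPUs that list the same siblings set within one cell and socket as one core (or one Bulldozer module). This is an approximation under the assumptions above, not the attached test script, whose exact logic is not reproduced in this bug:

import xml.etree.ElementTree as ET

def topology_totals(caps_xml):
    """Count sockets, cores/modules and threads from the <cells> data."""
    root = ET.fromstring(caps_xml)
    sockets, cores, threads = set(), set(), 0
    for cell in root.findall(".//cells/cell"):
        for cpu in cell.findall("./cpus/cpu"):
            threads += 1
            sockets.add(cpu.get("socket_id"))
            # CPUs sharing a siblings list inside one cell and one socket
            # form a single core (or module).
            cores.add((cell.get("id"), cpu.get("socket_id"),
                       cpu.get("siblings")))
    return len(sockets), len(cores), threads

On the demonstration XML above this yields 2 sockets, 8 cores and 12 threads.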
Created attachment 690965: Test script
I tried to reproduce a funky NUMA distribution I had to deal with some days ago (with memory banks distributed in a bad way). I started a VM with:

... -smp 32,sockets=4,cores=4,threads=2 -numa node,nodeid=0,cpus=0-23 -numa node,nodeid=1,cpus=24-31 -numa node,nodeid=2 -numa node,nodeid=3 ...

Results:

# numactl --hardware
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
node 0 size: 1247 MB
node 0 free: 927 MB
node 1 cpus: 24 25 26 27 28 29 30 31
node 1 size: 1247 MB
node 1 free: 1118 MB
node 2 cpus:
node 2 size: 1248 MB
node 2 free: 1215 MB
node 3 cpus:
node 3 size: 1256 MB
node 3 free: 1221 MB

Libvirt is in fallback probe:

# virsh nodeinfo
...
CPU socket(s):       1
Core(s) per socket:  32
Thread(s) per core:  1
NUMA cell(s):        1
...

New capabilities working as expected:

<topology>
  <cells num='4'>
    <cell id='0'>
      <cpus num='24'>
        <cpu id='0' socket_id='0' core_id='0' siblings='0-1'/>
        <cpu id='1' socket_id='0' core_id='0' siblings='0-1'/>
        <cpu id='2' socket_id='0' core_id='1' siblings='2-3'/>
        <cpu id='3' socket_id='0' core_id='1' siblings='2-3'/>
        <cpu id='4' socket_id='0' core_id='2' siblings='4-5'/>
        <cpu id='5' socket_id='0' core_id='2' siblings='4-5'/>
        <cpu id='6' socket_id='0' core_id='3' siblings='6-7'/>
        <cpu id='7' socket_id='0' core_id='3' siblings='6-7'/>
        <cpu id='8' socket_id='1' core_id='0' siblings='8-9'/>
        <cpu id='9' socket_id='1' core_id='0' siblings='8-9'/>
        <cpu id='10' socket_id='1' core_id='1' siblings='10-11'/>
        <cpu id='11' socket_id='1' core_id='1' siblings='10-11'/>
        <cpu id='12' socket_id='1' core_id='2' siblings='12-13'/>
        <cpu id='13' socket_id='1' core_id='2' siblings='12-13'/>
        <cpu id='14' socket_id='1' core_id='3' siblings='14-15'/>
        <cpu id='15' socket_id='1' core_id='3' siblings='14-15'/>
        <cpu id='16' socket_id='2' core_id='0' siblings='16-17'/>
        <cpu id='17' socket_id='2' core_id='0' siblings='16-17'/>
        <cpu id='18' socket_id='2' core_id='1' siblings='18-19'/>
        <cpu id='19' socket_id='2' core_id='1' siblings='18-19'/>
        <cpu id='20' socket_id='2' core_id='2' siblings='20-21'/>
        <cpu id='21' socket_id='2' core_id='2' siblings='20-21'/>
        <cpu id='22' socket_id='2' core_id='3' siblings='22-23'/>
        <cpu id='23' socket_id='2' core_id='3' siblings='22-23'/>
      </cpus>
    </cell>
    <cell id='1'>
      <cpus num='8'>
        <cpu id='24' socket_id='3' core_id='0' siblings='24-25'/>
        <cpu id='25' socket_id='3' core_id='0' siblings='24-25'/>
        <cpu id='26' socket_id='3' core_id='1' siblings='26-27'/>
        <cpu id='27' socket_id='3' core_id='1' siblings='26-27'/>
        <cpu id='28' socket_id='3' core_id='2' siblings='28-29'/>
        <cpu id='29' socket_id='3' core_id='2' siblings='28-29'/>
        <cpu id='30' socket_id='3' core_id='3' siblings='30-31'/>
        <cpu id='31' socket_id='3' core_id='3' siblings='30-31'/>
      </cpus>
    </cell>
    <cell id='2'>
      <cpus num='0'>
      </cpus>
    </cell>
    <cell id='3'>
      <cpus num='0'>
      </cpus>
    </cell>
  </cells>
</topology>

And my script (attached) returning reasonable results:

# ./with_new_caps_pythonic.py
Sockets: 4
Cores: 16
Threads: 32

Thank you Peter. Great work here.

Amador Pahim
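For reference, a usage example of the topology_totals() sketch shown in an earlier comment, run against this VM's capabilities; the expected tuple is the result the attached script printed above:

# virsh capabilities > capabilities.xml
with open("capabilities.xml") as f:
    print(topology_totals(f.read()))   # expected: (4, 16, 32)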
pkgs:
libvirt-0.10.2-19.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.377.el6.x86_64
kernel-2.6.32-395.el6.x86_64

steps:

On a host with an AMD Bulldozer CPU:

# cat /proc/cpuinfo | grep "model name" | head -1
model name : AMD Opteron(tm) Processor 6282 SE

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             4
NUMA node(s):          8
Vendor ID:             AuthenticAMD
CPU family:            21
Model:                 1
Stepping:              2
CPU MHz:               2593.501
BogoMIPS:              5186.42
Virtualization:        AMD-V
L1d cache:             16K
L1i cache:             64K
L2 cache:              2048K
L3 cache:              6144K
NUMA node0 CPU(s):     0,4,8,12,16,20,24,28
NUMA node1 CPU(s):     32,36,40,44,48,52,56,60
NUMA node2 CPU(s):     1,5,9,13,17,21,25,29
NUMA node3 CPU(s):     33,37,41,45,49,53,57,61
NUMA node4 CPU(s):     2,6,10,14,18,22,26,30
NUMA node5 CPU(s):     34,38,42,46,50,54,58,62
NUMA node6 CPU(s):     35,39,43,47,51,55,59,63
NUMA node7 CPU(s):     3,7,11,15,19,23,27,31

# numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 4 8 12 16 20 24 28
node 0 size: 16349 MB
node 0 free: 15482 MB
node 1 cpus: 32 36 40 44 48 52 56 60
node 1 size: 16384 MB
node 1 free: 15140 MB
node 2 cpus: 1 5 9 13 17 21 25 29
node 2 size: 16384 MB
node 2 free: 15933 MB
node 3 cpus: 33 37 41 45 49 53 57 61
node 3 size: 16384 MB
node 3 free: 15872 MB
node 4 cpus: 2 6 10 14 18 22 26 30
node 4 size: 16384 MB
node 4 free: 15672 MB
node 5 cpus: 34 38 42 46 50 54 58 62
node 5 size: 16384 MB
node 5 free: 15912 MB
node 6 cpus: 35 39 43 47 51 55 59 63
node 6 size: 16384 MB
node 6 free: 15806 MB
node 7 cpus: 3 7 11 15 19 23 27 31
node 7 size: 16367 MB
node 7 free: 15894 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  16  16  22  16  16  22  22
  1:  16  10  16  22  22  22  16  22
  2:  16  16  10  16  22  22  22  16
  3:  22  22  16  10  22  16  22  16
  4:  16  22  22  22  10  16  16  16
  5:  16  22  22  16  16  10  22  22
  6:  22  16  22  22  16  22  10  16
  7:  22  22  16  16  16  22  16  10

# virsh nodeinfo
CPU model:           x86_64
CPU(s):              64
CPU frequency:       2593 MHz
CPU socket(s):       1
Core(s) per socket:  64
Thread(s) per core:  1
NUMA cell(s):        1
Memory size:         132035588 KiB

1. check capabilities

# virsh capabilities
...
<topology>
  <cells num='8'>
    <cell id='0'>
      <cpus num='8'>
        <cpu id='0' socket_id='0' core_id='0' siblings='0,4'/>
        <cpu id='4' socket_id='0' core_id='1' siblings='0,4'/>
        <cpu id='8' socket_id='0' core_id='2' siblings='8,12'/>
        <cpu id='12' socket_id='0' core_id='3' siblings='8,12'/>
        <cpu id='16' socket_id='0' core_id='4' siblings='16,20'/>
        <cpu id='20' socket_id='0' core_id='5' siblings='16,20'/>
        <cpu id='24' socket_id='0' core_id='6' siblings='24,28'/>
        <cpu id='28' socket_id='0' core_id='7' siblings='24,28'/>
      </cpus>
    </cell>
    <cell id='1'>
      <cpus num='8'>
        <cpu id='32' socket_id='0' core_id='0' siblings='32,36'/>
        <cpu id='36' socket_id='0' core_id='1' siblings='32,36'/>
        <cpu id='40' socket_id='0' core_id='2' siblings='40,44'/>
        <cpu id='44' socket_id='0' core_id='3' siblings='40,44'/>
        <cpu id='48' socket_id='0' core_id='4' siblings='48,52'/>
        <cpu id='52' socket_id='0' core_id='5' siblings='48,52'/>
        <cpu id='56' socket_id='0' core_id='6' siblings='56,60'/>
        <cpu id='60' socket_id='0' core_id='7' siblings='56,60'/>
      </cpus>
    </cell>
    <cell id='2'>
      <cpus num='8'>
        <cpu id='1' socket_id='1' core_id='0' siblings='1,5'/>
        <cpu id='5' socket_id='1' core_id='1' siblings='1,5'/>
        <cpu id='9' socket_id='1' core_id='2' siblings='9,13'/>
        <cpu id='13' socket_id='1' core_id='3' siblings='9,13'/>
        <cpu id='17' socket_id='1' core_id='4' siblings='17,21'/>
        <cpu id='21' socket_id='1' core_id='5' siblings='17,21'/>
        <cpu id='25' socket_id='1' core_id='6' siblings='25,29'/>
        <cpu id='29' socket_id='1' core_id='7' siblings='25,29'/>
      </cpus>
    </cell>
    <cell id='3'>
      <cpus num='8'>
        <cpu id='33' socket_id='1' core_id='0' siblings='33,37'/>
        <cpu id='37' socket_id='1' core_id='1' siblings='33,37'/>
        <cpu id='41' socket_id='1' core_id='2' siblings='41,45'/>
        <cpu id='45' socket_id='1' core_id='3' siblings='41,45'/>
        <cpu id='49' socket_id='1' core_id='4' siblings='49,53'/>
        <cpu id='53' socket_id='1' core_id='5' siblings='49,53'/>
        <cpu id='57' socket_id='1' core_id='6' siblings='57,61'/>
        <cpu id='61' socket_id='1' core_id='7' siblings='57,61'/>
      </cpus>
    </cell>
    <cell id='4'>
      <cpus num='8'>
        <cpu id='2' socket_id='2' core_id='0' siblings='2,6'/>
        <cpu id='6' socket_id='2' core_id='1' siblings='2,6'/>
        <cpu id='10' socket_id='2' core_id='2' siblings='10,14'/>
        <cpu id='14' socket_id='2' core_id='3' siblings='10,14'/>
        <cpu id='18' socket_id='2' core_id='4' siblings='18,22'/>
        <cpu id='22' socket_id='2' core_id='5' siblings='18,22'/>
        <cpu id='26' socket_id='2' core_id='6' siblings='26,30'/>
        <cpu id='30' socket_id='2' core_id='7' siblings='26,30'/>
      </cpus>
    </cell>
    <cell id='5'>
      <cpus num='8'>
        <cpu id='34' socket_id='2' core_id='0' siblings='34,38'/>
        <cpu id='38' socket_id='2' core_id='1' siblings='34,38'/>
        <cpu id='42' socket_id='2' core_id='2' siblings='42,46'/>
        <cpu id='46' socket_id='2' core_id='3' siblings='42,46'/>
        <cpu id='50' socket_id='2' core_id='4' siblings='50,54'/>
        <cpu id='54' socket_id='2' core_id='5' siblings='50,54'/>
        <cpu id='58' socket_id='2' core_id='6' siblings='58,62'/>
        <cpu id='62' socket_id='2' core_id='7' siblings='58,62'/>
      </cpus>
    </cell>
    <cell id='6'>
      <cpus num='8'>
        <cpu id='35' socket_id='3' core_id='0' siblings='35,39'/>
        <cpu id='39' socket_id='3' core_id='1' siblings='35,39'/>
        <cpu id='43' socket_id='3' core_id='2' siblings='43,47'/>
        <cpu id='47' socket_id='3' core_id='3' siblings='43,47'/>
        <cpu id='51' socket_id='3' core_id='4' siblings='51,55'/>
        <cpu id='55' socket_id='3' core_id='5' siblings='51,55'/>
        <cpu id='59' socket_id='3' core_id='6' siblings='59,63'/>
        <cpu id='63' socket_id='3' core_id='7' siblings='59,63'/>
      </cpus>
    </cell>
    <cell id='7'>
      <cpus num='8'>
        <cpu id='3' socket_id='3' core_id='0' siblings='3,7'/>
        <cpu id='7' socket_id='3' core_id='1' siblings='3,7'/>
        <cpu id='11' socket_id='3' core_id='2' siblings='11,15'/>
        <cpu id='15' socket_id='3' core_id='3' siblings='11,15'/>
        <cpu id='19' socket_id='3' core_id='4' siblings='19,23'/>
        <cpu id='23' socket_id='3' core_id='5' siblings='19,23'/>
        <cpu id='27' socket_id='3' core_id='6' siblings='27,31'/>
        <cpu id='31' socket_id='3' core_id='7' siblings='27,31'/>
      </cpus>
    </cell>
  </cells>
</topology>
...

socket_id ranges from 0 to 3, and each siblings list has 2 entries, which match the Socket(s) and Thread(s) per core values in the lscpu output.

2. using the attached python script

# virsh capabilities > capabilities.xml
# python with_new_caps_pythonic.py
Sockets: 4
Cores: 32
Threads: 64

This is expected.

Hi Amador, the machine you used in comment #20 is what we call a sparse NUMA box; does this machine happen to be in beaker with internal access?
Hi Wayne,

The machine is not accessible, but a sparse NUMA box can be reproduced with qemu using something like this:

/usr/libexec/qemu-kvm -m 4096 -smp 32,sockets=4,cores=4,threads=2 -numa node,nodeid=0,cpus=0-23 -numa node,nodeid=1,cpus=24-31 -numa node,nodeid=2 -numa node,nodeid=3 /var/lib/vms/vm01.img
(In reply to Amador Pahim from comment #26)
> Hi Wayne,
>
> The machine is not accessible, but a sparse NUMA box can be reproduced
> with qemu using something like this:
>
> /usr/libexec/qemu-kvm -m 4096 -smp 32,sockets=4,cores=4,threads=2 -numa
> node,nodeid=0,cpus=0-23 -numa node,nodeid=1,cpus=24-31 -numa node,nodeid=2
> -numa node,nodeid=3 /var/lib/vms/vm01.img

Hi Amador, thanks for the reply. Are you suggesting testing sparse NUMA inside a qemu-kvm VM? I did start a VM with sparse NUMA, but the problem is that nested KVM is not supported in RHEL 7 yet, so the virsh nodeinfo and capabilities commands fail because they can't find a hypervisor. I'm also not sure guest NUMA topology is fully supported now, as my test with numactl in the VM did not output the same info I passed to qemu. Anyway, this bug is verified on a physical host. Thanks for the help.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1581.html