Red Hat Bugzilla – Bug 874050
virsh nodeinfo can't get the right info on AMD Bulldozer cpu
Last modified: 2013-02-21 02:26:06 EST
Created attachment 639976 [details]
sysfs dump info

Description of problem:
On a host with an AMD 6200 series CPU (the AMD "Interlagos" platform), which consists of two MCMs (Multi-Chip Modules) with 4 "Bulldozer" modules each, 8 "Bulldozer" modules in total, virsh nodeinfo collects wrong thread information, so the reported topology does not match the total CPU count.

Detailed Bulldozer info:
http://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)

The sysfs device info is attached.

The problem is in parsing the thread count: the total number of CPUs should be 64, while the topology shown by nodeinfo multiplies out to 128 (8*2*8). Bulldozer "threads" are somewhere between a core and a thread: they have separate core IDs and separate thread IDs, and they also have the thread_siblings parameter filled in, which might be the cause.

Version-Release number of selected component (if applicable):
libvirt-0.10.2-7.el6.x86_64
qemu-kvm-0.12.1.2-2.295.el6.x86_64
kernel-2.6.32-279.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1.
# cat /proc/cpuinfo |grep "model name"|tail -1
model name : AMD Opteron(tm) Processor 6282 SE

# numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 4 8 12 16 20 24 28
node 0 size: 16349 MB
node 0 free: 15596 MB
node 1 cpus: 32 36 40 44 48 52 56 60
node 1 size: 16384 MB
node 1 free: 15931 MB
node 2 cpus: 1 5 9 13 17 21 25 29
node 2 size: 16384 MB
node 2 free: 15871 MB
node 3 cpus: 33 37 41 45 49 53 57 61
node 3 size: 16384 MB
node 3 free: 15845 MB
node 4 cpus: 2 6 10 14 18 22 26 30
node 4 size: 16384 MB
node 4 free: 15811 MB
node 5 cpus: 34 38 42 46 50 54 58 62
node 5 size: 16384 MB
node 5 free: 15917 MB
node 6 cpus: 35 39 43 47 51 55 59 63
node 6 size: 16384 MB
node 6 free: 15855 MB
node 7 cpus: 3 7 11 15 19 23 27 31
node 7 size: 16367 MB
node 7 free: 15869 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  20  20  20  20  20  20  20
  1:  20  10  20  20  20  20  20  20
  2:  20  20  10  20  20  20  20  20
  3:  20  20  20  10  20  20  20  20
  4:  20  20  20  20  10  20  20  20
  5:  20  20  20  20  20  10  20  20
  6:  20  20  20  20  20  20  10  20
  7:  20  20  20  20  20  20  20  10

2.
# virsh nodeinfo
CPU model: x86_64
CPU(s): 64
CPU frequency: 2593 MHz
CPU socket(s): 1
Core(s) per socket: 8
Thread(s) per core: 2
NUMA cell(s): 8
Memory size: 132101788 KiB

3.

Actual results:
The topology reported by nodeinfo is not right.

Expected results:
The nodeinfo output should be right.

Additional info:
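The mismatch reported above can be checked mechanically. The following is a minimal sketch, not part of libvirt: it parses the nodeinfo text from this report and compares the "CPU(s)" field with the product of the topology fields. The helper names `parse_nodeinfo` and `topology_product` are hypothetical, and the assumption that the four topology fields should multiply out to the CPU count is taken from the reasoning in the description.

```python
# Sample output copied from this report (whitespace is illustrative).
NODEINFO = """\
CPU model: x86_64
CPU(s): 64
CPU frequency: 2593 MHz
CPU socket(s): 1
Core(s) per socket: 8
Thread(s) per core: 2
NUMA cell(s): 8
Memory size: 132101788 KiB
"""


def parse_nodeinfo(text):
    """Split each 'key: value' line of nodeinfo output into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def topology_product(fields):
    """Multiply cells * sockets * cores * threads, the CPU count the topology implies."""
    return (
        int(fields["NUMA cell(s)"])
        * int(fields["CPU socket(s)"])
        * int(fields["Core(s) per socket"])
        * int(fields["Thread(s) per core"])
    )


fields = parse_nodeinfo(NODEINFO)
reported_cpus = int(fields["CPU(s)"])
implied_cpus = topology_product(fields)

# A consistent topology would make these two numbers equal.
print("reported:", reported_cpus, "implied by topology:", implied_cpus)
```

For the output in this report the two numbers differ by a factor of two, which is consistent with each Bulldozer module being counted both as cores and as threads.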
Created attachment 640518 [details]
cpuinfo

/proc/cpuinfo is attached.
Fix/workaround proposed upstream: http://www.redhat.com/archives/libvir-list/2012-November/msg00365.html
Fixed upstream:

commit 7a791677b0e6cc3ae45aafdbca732f0f7ce05cbf
Author: Peter Krempa <pkrempa@redhat.com>
Date: Wed Nov 7 15:50:56 2012 +0100

    nodeinfotest: Add test data from a AMD bulldozer machine.

    The AMD Bulldozer architecture uses so called "Clustered integer core modules" that count both as threads and cores. This patch expects the cpu to be detected using the new fallback condition otherwise twice the number of processors would be detected.

commit 86748976f18423c359e94294bd57df9fd9d98ce4
Author: Peter Krempa <pkrempa@redhat.com>
Date: Wed Nov 7 15:19:47 2012 +0100

    nodeinfotest: Add test data for 2 processor host with broken NUMA

    This test data was gathered on an AMD MagnyCours machine that reports it has only one NUMA node although the hardware is consisting of 4. As duplicate core id's are ignored the reported topology was bogous. This should be fixed by the previous patch.

    Reported and data provided by George-Cristian Bîrzan.

commit 9576afd110b8c3edeb65f9b39448884763ca68bd
Author: Peter Krempa <pkrempa@redhat.com>
Date: Wed Nov 7 14:53:36 2012 +0100

    nodeinfo: Add check and workaround to guarantee valid cpu topologies

    Lately there were a few reports of the output of the virsh nodeinfo command being inaccurate. This patch tries to avoid that by checking if the topology actually makes sense. If it doesn't we then report a synthetic topology that indicates to the user that the host capabilities should be checked for the actual topology.
Should we move this back to ASSIGNED to also take in Viktor's upstream improvements? https://www.redhat.com/archives/libvir-list/2012-November/msg00572.html
# rpm -q libvirt qemu-kvm kernel
libvirt-0.10.2-9.el6.x86_64
qemu-kvm-0.12.1.2-2.295.el6.x86_64
kernel-2.6.32-279.el6.x86_64

On the same box as in the description:

# virsh nodeinfo
CPU model: x86_64
CPU(s): 64
CPU frequency: 2593 MHz
CPU socket(s): 1
Core(s) per socket: 64
Thread(s) per core: 1
NUMA cell(s): 1
Memory size: 132101788 KiB

I have not checked all the patches in detail yet, but nodeinfo still fails to show the right info. Threads per core might be right now, but the host has 8 NUMA nodes, not 1, and I think cores per socket should be 8, not 64.

Hi Peter, what do you think?
In the case of unusual NUMA machines, where we can't accurately detect the topology of the processor, the data reported in the virNodeInfo structure is modified to correctly report the maximum number of processors in the host. The modification is done according to this documentation:

nodes: the number of NUMA cells, 1 for unusual NUMA topologies or uniform memory access; check capabilities XML for the actual NUMA topology
sockets: number of CPU sockets per node if nodes > 1, 1 in case of unusual NUMA topology
cores: number of cores per socket, total number of processors in case of unusual NUMA topology
threads: number of threads per core, 1 in case of unusual NUMA topology
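The fallback rules quoted above can be illustrated with a short sketch. This is not the libvirt implementation: the `NodeInfo` tuple and the `sanitize` function are hypothetical, and the validity condition used here (the four topology fields must multiply out to the CPU count) is an assumption, since the commit message only says the patch checks whether the topology "actually makes sense".

```python
from typing import NamedTuple


class NodeInfo(NamedTuple):
    """Hypothetical stand-in for the topology fields of virNodeInfo."""
    cpus: int
    nodes: int
    sockets: int
    cores: int
    threads: int


def sanitize(info: NodeInfo) -> NodeInfo:
    """Return info unchanged if it is self-consistent, else a synthetic topology.

    Assumed check: nodes * sockets * cores * threads must equal cpus.
    The synthetic values follow the documentation quoted above:
    nodes = 1, sockets = 1, cores = total processors, threads = 1.
    """
    if info.nodes * info.sockets * info.cores * info.threads == info.cpus:
        return info
    return NodeInfo(cpus=info.cpus, nodes=1, sockets=1, cores=info.cpus, threads=1)


# Topology originally detected on the Bulldozer host from this report.
detected = NodeInfo(cpus=64, nodes=8, sockets=1, cores=8, threads=2)
print(sanitize(detected))
```

Under these assumptions, the Bulldozer host's detected values are replaced by 1 node, 1 socket, 64 cores and 1 thread, which matches the nodeinfo output seen with libvirt-0.10.2-9.el6. As the documentation says, the real NUMA layout then has to be read from the capabilities XML.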
Since the patches in comment #7 are not included, they were not checked. In comment #10 Peter explained how nodeinfo behaves on unusual NUMA machines, so the result in comment #9 is now expected. So, this is fixed.

Also tested on one usual NUMA box and one non-NUMA box; both work fine.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0276.html