Bug 888503 - libvirt: wrong cpu topology - AMD Bulldozer 62XX family
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: Peter Krempa
QA Contact: Rami Vaknin
Keywords: ZStream
Depends On:
Blocks: 908836
Reported: 2012-12-18 19:32 UTC by Douglas Schilling Landgraf
Modified: 2018-12-02 18:40 UTC (History)

The AMD Bulldozer CPU architecture consists of so-called "modules". These are represented both as separate cores and as separate threads, and management applications need to choose one of the two approaches. Libvirt wasn't providing enough information to do this.

Management applications weren't able to represent the modules of a Bulldozer CPU according to their needs.

The capabilities XML output now contains more information about the processor topology so that management applications can extract the information they need.
Clone Of:
Last Closed: 2013-11-21 08:35:37 UTC

Attachments (Terms of Use)
Test script (870 bytes, text/plain)
2013-01-31 14:46 UTC, Amador Pahim

External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2013:1581 normal SHIPPED_LIVE libvirt bug fix and enhancement update 2013-11-21 01:11:35 UTC
Red Hat Bugzilla 927128 None None None Never

Internal Trackers: 927128

Description Douglas Schilling Landgraf 2012-12-18 19:32:45 UTC
Description of problem:

VDSM uses libvirt cpu topology to determine the number of cores and threads. 
For AMD Bulldozer 62XX machines it doesn't look accurate.

To avoid changing libvirt's historic output, what about adding the 'total CPU sockets' to the XML output, as lscpu and /proc/cpuinfo do, and leaving the current libvirt output as:

<topology sockets='1' cores='8' threads='2'/> (as upstream libvirt-1.0.0 shows)
<cells num='4'>

Would be like:
<topology totalsockets='2' sockets='1' cores='8' threads='2'/>
<cells num='4'>

This will show the totalSockets = 2 and sockets per numa = 1 (as libvirt already shows)

Just to clarify our needs: vdsm gets the total sockets and total cores (without threads) from libvirt. For this system, for example, we are looking for a way to get 2 sockets and 16 cores in total from libvirt.

With that, we would report in vdsm the field 'report_host_threads_as_cores' as:

if enabled:
(cores = 8) * (threads = 2) * (new_libvirt_field_total_sockets = 2) = 32 total cores

if disabled:
(cores = 8) * (new_libvirt_field_total_sockets = 2) = 16 total cores
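
The arithmetic above can be sketched in Python. This is only an illustration: the function name `total_cores` and its parameters are hypothetical, and `total_sockets` stands for the proposed libvirt field, which does not exist at the time of this report.

```python
def total_cores(cores, threads, total_sockets, report_host_threads_as_cores):
    """Return the core count VDSM would report under the proposal.

    `total_sockets` stands for the proposed (hypothetical) libvirt field.
    """
    if report_host_threads_as_cores:
        return cores * threads * total_sockets
    return cores * total_sockets


# Values from the example host in this report.
print(total_cores(8, 2, 2, True))   # 32
print(total_cores(8, 2, 2, False))  # 16
```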

Other system sources also report this information.
In /proc/cpuinfo we have the split:

  cpu cores	: 8   (number of cores per CPU package)
  siblings	: 16  (HT per CPU package) * (number of cores per CPU package)

# cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l
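
The same count can be taken without a shell pipeline. A minimal Python sketch, assuming the usual x86 `/proc/cpuinfo` layout with one `physical id` line per logical CPU; the function name `count_packages` and the sample text are illustrative only and are not output from the reported host.

```python
def count_packages(cpuinfo_text):
    """Count distinct 'physical id' values in /proc/cpuinfo-style text."""
    ids = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "physical id":
            ids.add(value.strip())
    return len(ids)


# Illustrative sample, not taken from the reported host.
sample = """processor\t: 0
physical id\t: 0
processor\t: 1
physical id\t: 0
processor\t: 2
physical id\t: 1
"""
print(count_packages(sample))  # 2
```

On a live host the input would be the contents of `/proc/cpuinfo`, for example `count_packages(open("/proc/cpuinfo").read())`.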

also from lscpu:
  Thread(s) per core:    2   (core + thread)
  Core(s) per socket:    8
  CPU socket(s):         2
  On-line CPU(s) list:   0-31

  NUMA node(s):          4 
  NUMA node0 CPU(s):     0-7
  NUMA node1 CPU(s):     8-15
  NUMA node2 CPU(s):     16-23
  NUMA node3 CPU(s):     24-31

Initial discussion started in BZ#833425.

Comment 3 Douglas Schilling Landgraf 2012-12-19 11:23:52 UTC
Hi Peter,

I do believe it will affect customers in 6.4. If possible, yes.
Amador, do you have any customers currently affected on this processor family?


Comment 4 Amador Pahim 2012-12-19 13:07:17 UTC
Yes Douglas. I attached the cases I'm following so far.

Thank you.

Comment 5 Daniel Berrange 2013-01-02 15:08:30 UTC
> VDSM uses libvirt cpu topology to determine the number of cores and threads. 
> For AMD Bulldozer 62XX machines it doesn't look accurate.
> To avoid change the historic output from libvirt, what about add into the
> xml output the 'total CPU sockets' like lscpu and /proc/cpuinfo does and
> leave the current libvirt as:

Before we start discussing extensions to the libvirt XML, can you actually tell us what's wrong with the existing data. Please provide the current libvirt XML, a complete copy of the /proc/cpuinfo file, and the full output of 'numactl --hardware'.

Comment 6 Peter Krempa 2013-01-02 16:45:22 UTC
The root of the problem with the existing data is that VDSM is unable to tell the actual number of physical CPU sockets/packages in the host. This issue is visible on AMD Piledriver and AMD Bulldozer hosts that have multiple NUMA nodes per physical socket. As a result, VDSM is unable to count the sockets, because multiplying the "sockets" field by the number of NUMA nodes yields an incorrect count.

I'm not sure why the number of actual CPU sockets is that important, but if the host has a strange NUMA architecture, the numbers will be off.

Note: AMD Piledriver is two 6-core CPUs in one physical package.
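
To illustrate the miscount, here is a small Python sketch using the numbers from the description (per-cell `sockets='1'`, four NUMA cells, two physical packages according to lscpu). The variable names are illustrative only.

```python
# Values reported for the example host in the description.
sockets_per_cell = 1   # <topology sockets='1' .../>
numa_cells = 4         # <cells num='4'>
physical_packages = 2  # "CPU socket(s): 2" from lscpu

# Naive approach: multiply the per-cell socket count by the number of cells.
naive_sockets = sockets_per_cell * numa_cells
print(naive_sockets)                       # 4
print(naive_sockets == physical_packages)  # False
```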

Comment 18 Peter Krempa 2013-01-24 11:05:26 UTC
The NUMA topology data were added upstream by:

commit 79a003f9b0042ef4d2cf290e555364565b7bff42
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Fri Jan 18 23:06:55 2013 +0100

    capabilities: Add additional data to the NUMA topology info
    This patch adds data gathering to the NUMA gathering files and adds
    support for outputting the data. The test driver and xend driver need to
    be adapted to fill sensible data to the structure in a future patch.

commit 87b4c10c6cf02251dd8c29b5b895bebc6ec297f9
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Tue Jan 22 18:42:08 2013 +0100

    capabilities: Switch CPU data in NUMA topology to a struct
    This will allow storing additional topology data in the NUMA topology
    definition.
    This patch changes the storage type and fixes fallout of the change
    across the drivers using it.
    This patch also changes semantics of adding new NUMA cell information.
    Until now the data were re-allocated and copied to the topology
    definition. This patch changes the addition function to steal the
    pointer to a pre-allocated structure to simplify the code.

commit 987fd7db4fc4ed8ff47339d440cdfb02ef1f0b58
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Fri Jan 18 20:39:00 2013 +0100

    conf: Split out NUMA topology formatting to simplify access to data

commit 828820e2d371205d6a6061301165d58a1a92e611
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Fri Jan 18 19:30:00 2013 +0100

    schemas: Add schemas for more CPU topology information in the caps XML
    This patch adds RNG schemas for adding more information in the topology
    output of the NUMA section in the capabilities XML.
    The added elements are designed to provide more information about the
    placement and topology of the processors in the system to management
    applications.
    A demonstration of supported XML added by this patch:
          <cells num='3'>
            <cell id='0'>
              <cpus num='4'> <!-- this is node with Hyperthreading -->
                <cpu id='0' socket_id='0' core_id='0' siblings='0-1'/>
                <cpu id='1' socket_id='0' core_id='0' siblings='0-1'/>
                <cpu id='2' socket_id='0' core_id='1' siblings='2-3'/>
                <cpu id='3' socket_id='0' core_id='1' siblings='2-3'/>
              </cpus>
            </cell>
            <cell id='1'>
              <cpus num='4'> <!-- this is node with modules (Bulldozer) -->
                <cpu id='4' socket_id='0' core_id='2' siblings='4-5'/>
                <cpu id='5' socket_id='0' core_id='3' siblings='4-5'/>
                <cpu id='6' socket_id='0' core_id='4' siblings='6-7'/>
                <cpu id='7' socket_id='0' core_id='5' siblings='6-7'/>
              </cpus>
            </cell>
            <cell id='2'>
              <cpus num='4'> <!-- this is a normal multi-core node -->
                <cpu id='8' socket_id='1' core_id='0' siblings='8'/>
                <cpu id='9' socket_id='1' core_id='1' siblings='9'/>
                <cpu id='10' socket_id='1' core_id='2' siblings='10'/>
                <cpu id='11' socket_id='1' core_id='3' siblings='11'/>
              </cpus>
            </cell>
          </cells>
    The socket_id field represents identification of the physical socket the
    CPU is plugged in. This ID may not be identical to the physical socket
    ID reported by the kernel.
    The core_id identifies a core within a socket. Also this field may not
    accurately represent physical ID's.
    The core_id is guaranteed to be unique within a cell and a socket. There
    may be duplicates between sockets. Only cores sharing core_id within one
    cell and one socket can be considered as threads. Cores sharing core_id
    within separate cells are distinct cores.
    The siblings field is a list of CPU id's the CPU is sibling with - thus
    a thread. The list is in the cpuset format.
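
As a rough illustration of how a management application might consume these fields, here is a minimal Python sketch. It is not the test script attached to this bug. The helper names `parse_cpuset` and `summarize` are hypothetical, the embedded XML is the demonstration data from the commit message with closing tags written out so it parses on its own, and the sketch assumes that socket_id values identify sockets across cells. The cpuset parser handles only comma lists and ranges.

```python
import xml.etree.ElementTree as ET

# Demonstration data from the commit message, closing tags written out.
CAPS = """
<cells num='3'>
  <cell id='0'><cpus num='4'>
    <cpu id='0' socket_id='0' core_id='0' siblings='0-1'/>
    <cpu id='1' socket_id='0' core_id='0' siblings='0-1'/>
    <cpu id='2' socket_id='0' core_id='1' siblings='2-3'/>
    <cpu id='3' socket_id='0' core_id='1' siblings='2-3'/>
  </cpus></cell>
  <cell id='1'><cpus num='4'>
    <cpu id='4' socket_id='0' core_id='2' siblings='4-5'/>
    <cpu id='5' socket_id='0' core_id='3' siblings='4-5'/>
    <cpu id='6' socket_id='0' core_id='4' siblings='6-7'/>
    <cpu id='7' socket_id='0' core_id='5' siblings='6-7'/>
  </cpus></cell>
  <cell id='2'><cpus num='4'>
    <cpu id='8' socket_id='1' core_id='0' siblings='8'/>
    <cpu id='9' socket_id='1' core_id='1' siblings='9'/>
    <cpu id='10' socket_id='1' core_id='2' siblings='10'/>
    <cpu id='11' socket_id='1' core_id='3' siblings='11'/>
  </cpus></cell>
</cells>
"""


def parse_cpuset(text):
    """Expand a cpuset string such as '0-1' or '0,4' into a frozenset."""
    cpus = set()
    for part in text.split(","):
        first, sep, last = part.partition("-")
        cpus.update(range(int(first), int(last if sep else first) + 1))
    return frozenset(cpus)


def summarize(cells_xml):
    """Return (sockets, core_ids, sibling_groups, threads)."""
    root = ET.fromstring(cells_xml)
    sockets, core_ids, groups, threads = set(), set(), set(), 0
    for cell in root.iter("cell"):
        for cpu in cell.iter("cpu"):
            sockets.add(cpu.get("socket_id"))
            # core_id is only unique within one cell and one socket.
            core_ids.add((cell.get("id"), cpu.get("socket_id"), cpu.get("core_id")))
            groups.add(parse_cpuset(cpu.get("siblings")))
            threads += 1
    return len(sockets), len(core_ids), len(groups), threads


print(summarize(CAPS))  # (2, 10, 8, 12)
```

The two middle numbers differ because of the Bulldozer cell: counting distinct core_id values treats each half of a module as a core (10), while counting distinct sibling groups treats each module as one core with two threads (8). Which of the two to report is the choice left to the management application.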

Moving to POST for 6.5

Comment 19 Amador Pahim 2013-01-31 14:46:30 UTC
Created attachment 690965 [details]
Test script

Comment 20 Amador Pahim 2013-01-31 14:47:55 UTC
I tried to reproduce a funky NUMA distribution I had to deal with some days ago (with memory banks distributed in a bad way).

I started a VM with:

.. -smp 32,sockets=4,cores=4,threads=2 -numa node,nodeid=0,cpus=0-23 -numa node,nodeid=1,cpus=24-31 -numa node,nodeid=2 -numa node,nodeid=3 ...


# numactl --hardware
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
node 0 size: 1247 MB
node 0 free: 927 MB
node 1 cpus: 24 25 26 27 28 29 30 31
node 1 size: 1247 MB
node 1 free: 1118 MB
node 2 cpus:
node 2 size: 1248 MB
node 2 free: 1215 MB
node 3 cpus:
node 3 size: 1256 MB
node 3 free: 1221 MB

Libvirt is in fallback probe:

# virsh nodeinfo
CPU socket(s):       1
Core(s) per socket:  32
Thread(s) per core:  1
NUMA cell(s):        1

New capabilities working as expected:

      <cells num='4'>
        <cell id='0'>
          <cpus num='24'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0-1'/>
            <cpu id='1' socket_id='0' core_id='0' siblings='0-1'/>
            <cpu id='2' socket_id='0' core_id='1' siblings='2-3'/>
            <cpu id='3' socket_id='0' core_id='1' siblings='2-3'/>
            <cpu id='4' socket_id='0' core_id='2' siblings='4-5'/>
            <cpu id='5' socket_id='0' core_id='2' siblings='4-5'/>
            <cpu id='6' socket_id='0' core_id='3' siblings='6-7'/>
            <cpu id='7' socket_id='0' core_id='3' siblings='6-7'/>
            <cpu id='8' socket_id='1' core_id='0' siblings='8-9'/>
            <cpu id='9' socket_id='1' core_id='0' siblings='8-9'/>
            <cpu id='10' socket_id='1' core_id='1' siblings='10-11'/>
            <cpu id='11' socket_id='1' core_id='1' siblings='10-11'/>
            <cpu id='12' socket_id='1' core_id='2' siblings='12-13'/>
            <cpu id='13' socket_id='1' core_id='2' siblings='12-13'/>
            <cpu id='14' socket_id='1' core_id='3' siblings='14-15'/>
            <cpu id='15' socket_id='1' core_id='3' siblings='14-15'/>
            <cpu id='16' socket_id='2' core_id='0' siblings='16-17'/>
            <cpu id='17' socket_id='2' core_id='0' siblings='16-17'/>
            <cpu id='18' socket_id='2' core_id='1' siblings='18-19'/>
            <cpu id='19' socket_id='2' core_id='1' siblings='18-19'/>
            <cpu id='20' socket_id='2' core_id='2' siblings='20-21'/>
            <cpu id='21' socket_id='2' core_id='2' siblings='20-21'/>
            <cpu id='22' socket_id='2' core_id='3' siblings='22-23'/>
            <cpu id='23' socket_id='2' core_id='3' siblings='22-23'/>
        <cell id='1'>
          <cpus num='8'>
            <cpu id='24' socket_id='3' core_id='0' siblings='24-25'/>
            <cpu id='25' socket_id='3' core_id='0' siblings='24-25'/>
            <cpu id='26' socket_id='3' core_id='1' siblings='26-27'/>
            <cpu id='27' socket_id='3' core_id='1' siblings='26-27'/>
            <cpu id='28' socket_id='3' core_id='2' siblings='28-29'/>
            <cpu id='29' socket_id='3' core_id='2' siblings='28-29'/>
            <cpu id='30' socket_id='3' core_id='3' siblings='30-31'/>
            <cpu id='31' socket_id='3' core_id='3' siblings='30-31'/>
        <cell id='2'>
          <cpus num='0'>
        <cell id='3'>
          <cpus num='0'>

And my script (attached) returning reasonable results:

Sockets: 4
Cores: 16
Threads: 32
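
As a quick sanity check, the reported numbers can be compared against the guest topology passed to qemu with `-smp 32,sockets=4,cores=4,threads=2`. A small Python sketch, with illustrative variable names:

```python
# Guest topology passed to qemu: -smp 32,sockets=4,cores=4,threads=2
smp = {"cpus": 32, "sockets": 4, "cores": 4, "threads": 2}

expected_cores = smp["sockets"] * smp["cores"]
expected_threads = expected_cores * smp["threads"]

# Values printed by the test script above.
reported = {"Sockets": 4, "Cores": 16, "Threads": 32}

assert expected_threads == smp["cpus"]
assert reported == {
    "Sockets": smp["sockets"],
    "Cores": expected_cores,
    "Threads": expected_threads,
}
print("script output matches the -smp topology")
```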

Thank you Peter. Great work here.

Amador Pahim

Comment 25 Wayne Sun 2013-07-10 07:23:53 UTC

On a host with AMD bulldozer cpu:
# cat /proc/cpuinfo |grep "model name"|head -1
model name	: AMD Opteron(tm) Processor 6282 SE  

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             4
NUMA node(s):          8
Vendor ID:             AuthenticAMD
CPU family:            21
Model:                 1
Stepping:              2
CPU MHz:               2593.501
BogoMIPS:              5186.42
Virtualization:        AMD-V
L1d cache:             16K
L1i cache:             64K
L2 cache:              2048K
L3 cache:              6144K
NUMA node0 CPU(s):     0,4,8,12,16,20,24,28
NUMA node1 CPU(s):     32,36,40,44,48,52,56,60
NUMA node2 CPU(s):     1,5,9,13,17,21,25,29
NUMA node3 CPU(s):     33,37,41,45,49,53,57,61
NUMA node4 CPU(s):     2,6,10,14,18,22,26,30
NUMA node5 CPU(s):     34,38,42,46,50,54,58,62
NUMA node6 CPU(s):     35,39,43,47,51,55,59,63
NUMA node7 CPU(s):     3,7,11,15,19,23,27,31

# numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 4 8 12 16 20 24 28
node 0 size: 16349 MB
node 0 free: 15482 MB
node 1 cpus: 32 36 40 44 48 52 56 60
node 1 size: 16384 MB
node 1 free: 15140 MB
node 2 cpus: 1 5 9 13 17 21 25 29
node 2 size: 16384 MB
node 2 free: 15933 MB
node 3 cpus: 33 37 41 45 49 53 57 61
node 3 size: 16384 MB
node 3 free: 15872 MB
node 4 cpus: 2 6 10 14 18 22 26 30
node 4 size: 16384 MB
node 4 free: 15672 MB
node 5 cpus: 34 38 42 46 50 54 58 62
node 5 size: 16384 MB
node 5 free: 15912 MB
node 6 cpus: 35 39 43 47 51 55 59 63
node 6 size: 16384 MB
node 6 free: 15806 MB
node 7 cpus: 3 7 11 15 19 23 27 31
node 7 size: 16367 MB
node 7 free: 15894 MB
node distances:
node   0   1   2   3   4   5   6   7 
  0:  10  16  16  22  16  16  22  22 
  1:  16  10  16  22  22  22  16  22 
  2:  16  16  10  16  22  22  22  16 
  3:  22  22  16  10  22  16  22  16 
  4:  16  22  22  22  10  16  16  16 
  5:  16  22  22  16  16  10  22  22 
  6:  22  16  22  22  16  22  10  16 
  7:  22  22  16  16  16  22  16  10 

# virsh nodeinfo
CPU model:           x86_64
CPU(s):              64
CPU frequency:       2593 MHz
CPU socket(s):       1
Core(s) per socket:  64
Thread(s) per core:  1
NUMA cell(s):        1
Memory size:         132035588 KiB

1. check capabilities
# virsh capabilities
      <cells num='8'>
        <cell id='0'>
          <cpus num='8'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0,4'/>
            <cpu id='4' socket_id='0' core_id='1' siblings='0,4'/>
            <cpu id='8' socket_id='0' core_id='2' siblings='8,12'/>
            <cpu id='12' socket_id='0' core_id='3' siblings='8,12'/>
            <cpu id='16' socket_id='0' core_id='4' siblings='16,20'/>
            <cpu id='20' socket_id='0' core_id='5' siblings='16,20'/>
            <cpu id='24' socket_id='0' core_id='6' siblings='24,28'/>
            <cpu id='28' socket_id='0' core_id='7' siblings='24,28'/>
        <cell id='1'>
          <cpus num='8'>
            <cpu id='32' socket_id='0' core_id='0' siblings='32,36'/>
            <cpu id='36' socket_id='0' core_id='1' siblings='32,36'/>
            <cpu id='40' socket_id='0' core_id='2' siblings='40,44'/>
            <cpu id='44' socket_id='0' core_id='3' siblings='40,44'/>
            <cpu id='48' socket_id='0' core_id='4' siblings='48,52'/>
            <cpu id='52' socket_id='0' core_id='5' siblings='48,52'/>
            <cpu id='56' socket_id='0' core_id='6' siblings='56,60'/>
            <cpu id='60' socket_id='0' core_id='7' siblings='56,60'/>
        <cell id='2'>
          <cpus num='8'>
            <cpu id='1' socket_id='1' core_id='0' siblings='1,5'/>
            <cpu id='5' socket_id='1' core_id='1' siblings='1,5'/>
            <cpu id='9' socket_id='1' core_id='2' siblings='9,13'/>
            <cpu id='13' socket_id='1' core_id='3' siblings='9,13'/>
            <cpu id='17' socket_id='1' core_id='4' siblings='17,21'/>
            <cpu id='21' socket_id='1' core_id='5' siblings='17,21'/>
            <cpu id='25' socket_id='1' core_id='6' siblings='25,29'/>
            <cpu id='29' socket_id='1' core_id='7' siblings='25,29'/>
        <cell id='3'>
          <cpus num='8'>
            <cpu id='33' socket_id='1' core_id='0' siblings='33,37'/>
            <cpu id='37' socket_id='1' core_id='1' siblings='33,37'/>
            <cpu id='41' socket_id='1' core_id='2' siblings='41,45'/>
            <cpu id='45' socket_id='1' core_id='3' siblings='41,45'/>
            <cpu id='49' socket_id='1' core_id='4' siblings='49,53'/>
            <cpu id='53' socket_id='1' core_id='5' siblings='49,53'/>
            <cpu id='57' socket_id='1' core_id='6' siblings='57,61'/>
            <cpu id='61' socket_id='1' core_id='7' siblings='57,61'/>
        <cell id='4'>
          <cpus num='8'>
            <cpu id='2' socket_id='2' core_id='0' siblings='2,6'/>
            <cpu id='6' socket_id='2' core_id='1' siblings='2,6'/>
            <cpu id='10' socket_id='2' core_id='2' siblings='10,14'/>
            <cpu id='14' socket_id='2' core_id='3' siblings='10,14'/>
            <cpu id='18' socket_id='2' core_id='4' siblings='18,22'/>
            <cpu id='22' socket_id='2' core_id='5' siblings='18,22'/>
            <cpu id='26' socket_id='2' core_id='6' siblings='26,30'/>
            <cpu id='30' socket_id='2' core_id='7' siblings='26,30'/>
        <cell id='5'>
          <cpus num='8'>
            <cpu id='34' socket_id='2' core_id='0' siblings='34,38'/>
            <cpu id='38' socket_id='2' core_id='1' siblings='34,38'/>
            <cpu id='42' socket_id='2' core_id='2' siblings='42,46'/>
            <cpu id='46' socket_id='2' core_id='3' siblings='42,46'/>
            <cpu id='50' socket_id='2' core_id='4' siblings='50,54'/>
            <cpu id='54' socket_id='2' core_id='5' siblings='50,54'/>
            <cpu id='58' socket_id='2' core_id='6' siblings='58,62'/>
            <cpu id='62' socket_id='2' core_id='7' siblings='58,62'/>
        <cell id='6'>
          <cpus num='8'>
            <cpu id='35' socket_id='3' core_id='0' siblings='35,39'/>
            <cpu id='39' socket_id='3' core_id='1' siblings='35,39'/>
            <cpu id='43' socket_id='3' core_id='2' siblings='43,47'/>
            <cpu id='47' socket_id='3' core_id='3' siblings='43,47'/>
            <cpu id='51' socket_id='3' core_id='4' siblings='51,55'/>
            <cpu id='55' socket_id='3' core_id='5' siblings='51,55'/>
            <cpu id='59' socket_id='3' core_id='6' siblings='59,63'/>
            <cpu id='63' socket_id='3' core_id='7' siblings='59,63'/>
        <cell id='7'>
          <cpus num='8'>
            <cpu id='3' socket_id='3' core_id='0' siblings='3,7'/>
            <cpu id='7' socket_id='3' core_id='1' siblings='3,7'/>
            <cpu id='11' socket_id='3' core_id='2' siblings='11,15'/>
            <cpu id='15' socket_id='3' core_id='3' siblings='11,15'/>
            <cpu id='19' socket_id='3' core_id='4' siblings='19,23'/>
            <cpu id='23' socket_id='3' core_id='5' siblings='19,23'/>
            <cpu id='27' socket_id='3' core_id='6' siblings='27,31'/>
            <cpu id='31' socket_id='3' core_id='7' siblings='27,31'/>

socket_id ranges from 0 to 3 and each siblings list has 2 entries, which matches the socket and thread counts in the lscpu output.

2. using attached python script
# virsh capabilities > capabilities.xml
# python with_new_caps_pythonic.py 
Sockets: 4
Cores: 32
Threads: 64

this is expected.
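
The expected values can be cross-checked against the lscpu output quoted earlier in this comment. A minimal Python sketch; the helper name `parse_lscpu` is hypothetical and only the relevant lscpu lines are reproduced.

```python
def parse_lscpu(text):
    """Turn 'Key:   value' lines from lscpu into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Excerpt of the lscpu output quoted in this comment.
LSCPU = """\
CPU(s):                64
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             4
"""

info = parse_lscpu(LSCPU)
sockets = int(info["Socket(s)"])
cores = sockets * int(info["Core(s) per socket"])
threads = cores * int(info["Thread(s) per core"])

print((sockets, cores, threads))  # (4, 32, 64)
```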

Hi Amador,

we call the machine you used in comment #20 a sparse NUMA box. Does this machine happen to be in Beaker with internal access?

Comment 26 Amador Pahim 2013-07-10 11:47:17 UTC
Hi Wayne,

The machine is not accessible, but the sparse NUMA box could be reproduced with qemu using something like this:

/usr/libexec/qemu-kvm -m 4096 -smp 32,sockets=4,cores=4,threads=2 -numa node,nodeid=0,cpus=0-23 -numa node,nodeid=1,cpus=24-31 -numa node,nodeid=2 -numa node,nodeid=3 /var/lib/vms/vm01.img

Comment 27 Wayne Sun 2013-07-15 09:10:47 UTC
(In reply to Amador Pahim from comment #26)
> Hi Wayne,
> The machine is not accessible, but the sparse NUMA box could be reproduced
> with qemu using something like this:
> /usr/libexec/qemu-kvm -m 4096 -smp 32,sockets=4,cores=4,threads=2 -numa
> node,nodeid=0,cpus=0-23 -numa node,nodeid=1,cpus=24-31 -numa node,nodeid=2
> -numa node,nodeid=3 /var/lib/vms/vm01.img

Hi Amador,

Thanks for the reply. Are you suggesting testing sparse NUMA inside a qemu-kvm VM?
I did start a VM with sparse NUMA, but the problem is that nested KVM is not supported in RHEL 7 yet, so the virsh nodeinfo and capabilities commands fail because no hypervisor can be found.
I'm also not sure guest NUMA topology is fully supported now, as my test with numactl in the VM did not output the info I passed to qemu.
Anyway, this bug is verified on a physical host. Thanks for the help.

Comment 29 errata-xmlrpc 2013-11-21 08:35:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

