Bug 833425 - 3.1.z - vdsm cpuCores shows the wrong number of cores on multi node systems - AMD (Magny-Cours 61XX)
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
6.2
Hardware: Unspecified / OS: Unspecified
Priority: urgent / Severity: medium
: rc
: ---
Assigned To: Douglas Schilling Landgraf
Ido Begun
infra sla
: Patch, ZStream
: 860507 866708
Depends On: 825095 864543 874050 877024
Blocks:
Reported: 2012-06-19 08:58 EDT by Amador Pahim
Modified: 2013-08-09 01:40 EDT (History)
23 users

See Also:
Fixed In Version: vdsm-4.9.6-39.0
Doc Type: Release Note
Doc Text:
On systems with AMD Magny-Cours and Bulldozer CPUs, the number of CPU cores reported always includes hyperthreads. This allows virtual machines running on the host to use up to double the recommended number of virtual CPUs. Additionally, this issue may lead to biased scheduling to favor affected hosts over others in the cluster if all hosts do not have the same number and type of CPU.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-12-04 14:00:11 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Affected system /proc/cpuinfo file (41.92 KB, text/plain)
2012-06-19 13:10 EDT, Amador Pahim
amd-6172-cpuinfo (41.83 KB, text/plain)
2012-06-20 08:37 EDT, Qin Guan
6272 arch (54.19 KB, image/png)
2012-12-04 13:29 EST, Amador Pahim

Description Amador Pahim 2012-06-19 08:58:50 EDT
Multi-node systems have the same combination of "physical id" and "core id" for different physical cores.

Vdsm treats every occurrence of the same "physical id"/"core id" pair as a single core. This is useful to avoid counting Hyper-Threading siblings, but it does not work on multi-node systems.

Example:

A 4-socket AMD Opteron Processor 6164 HE multi-node system. Notice that "physical id: 0" and "core id: 0" are repeated for processors 0 and 24.

processor	: 0
physical id	: 0
siblings	: 12
core id		: 0
cpu cores	: 12

processor	: 24
physical id	: 0
siblings	: 12
core id		: 0
cpu cores	: 12

And it's not caused by multithreading, since "cpu cores" and "siblings" have the identical value of 12.

On this server, vdsm reports 24 cpuCores. The correct number is 48:

# vdsClient -s 0 getVdsCapabilities | grep cpuCores
        cpuCores = 24
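
The flawed deduplication can be illustrated with a minimal sketch (an editorial illustration, not vdsm's actual source): keying cores on the (physical id, core id) pair collapses distinct cores that repeat those IDs across NUMA nodes.

```python
def count_cores(cpuinfo_text):
    """Count cores by deduplicating on (physical id, core id).
    This mirrors the buggy behavior: distinct cores on different
    nodes that share those two IDs are counted only once."""
    cores = set()
    phys_id = None
    for line in cpuinfo_text.splitlines():
        if ':' not in line:
            continue
        key, _, value = line.partition(':')
        key, value = key.strip(), value.strip()
        if key == 'physical id':
            phys_id = value
        elif key == 'core id':
            cores.add((phys_id, value))  # collision across nodes!
    return len(cores)

# Processors 0 and 24 from the example above carry identical IDs,
# so two physical cores are counted as one:
sample = """\
processor : 0
physical id : 0
core id : 0

processor : 24
physical id : 0
core id : 0
"""
print(count_cores(sample))  # 1, although these are two physical cores
```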
Comment 1 Amador Pahim 2012-06-19 09:12:39 EDT
Patch: http://gerrit.ovirt.org/5481
Comment 3 Amador Pahim 2012-06-19 13:10:27 EDT
Created attachment 593015 [details]
Affected system /proc/cpuinfo file
Comment 4 Qin Guan 2012-06-20 08:36:12 EDT
The same problem was found on the AMD Opteron(tm) Processor 6172, which has 48 cores (4 sockets * 12 cores per socket) but is also recognized as having 24 cores by vdsm.
Comment 5 Qin Guan 2012-06-20 08:37:12 EDT
Created attachment 593196 [details]
amd-6172-cpuinfo
Comment 7 Barak 2012-07-03 10:26:04 EDT
(In reply to comment #0)
> Multi node systems has the same combination of "physical id" and "core id"
> to different physical cores.

Does "node" above refer to a NUMA node?
Comment 8 Amador Pahim 2012-07-03 10:40:30 EDT
Yes. lscpu from AMD Opteron Processor 6164 HE: 

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                48
On-line CPU(s) list:   0-47
Thread(s) per core:    1
Core(s) per socket:    12
CPU socket(s):         4
NUMA node(s):          8
Vendor ID:             AuthenticAMD
CPU family:            16
Model:                 9
Stepping:              1
CPU MHz:               1700.038
BogoMIPS:              3400.05
Virtualization:        AMD-V
L1d cache:             64K
L1i cache:             64K
L2 cache:              512K
L3 cache:              5118K
NUMA node0 CPU(s):     0,4,8,12,16,20
NUMA node1 CPU(s):     24,28,32,36,40,44
NUMA node2 CPU(s):     1,5,9,13,17,21
NUMA node3 CPU(s):     25,29,33,37,41,45
NUMA node4 CPU(s):     2,6,10,14,18,22
NUMA node5 CPU(s):     26,30,34,38,42,46
NUMA node6 CPU(s):     3,7,11,15,19,23
NUMA node7 CPU(s):     27,31,35,39,43,47


Kernel people had a good discussion here:
http://kerneltrap.org/mailarchive/linux-kernel/2010/8/12/4606365
Comment 9 Barak 2012-07-17 09:43:33 EDT
I think it's fixed in libvirt already (BZ#836919).

It may be a duplicate.
Comment 10 Amador Pahim 2012-07-17 09:50:03 EDT
No, it isn't. vdsm's cpuCores value does not come from libvirt.
Comment 11 Douglas Schilling Landgraf 2012-07-17 16:00:34 EDT
Hi Dan,

     Since we don't have a needinfo flag in gerrit, can you please share your thoughts about my comment in http://gerrit.ovirt.org/#/c/5481/ ?

Thanks
Douglas
Comment 12 Douglas Schilling Landgraf 2012-08-02 15:17:01 EDT
Moving the bugzilla to POST since we have a patch available.
Comment 14 Douglas Schilling Landgraf 2012-08-15 23:00:33 EDT
New patch from Dan available, review in progress:
http://gerrit.ovirt.org/#/c/7097/
Comment 15 Douglas Schilling Landgraf 2012-08-17 09:12:59 EDT
Hi,

   Just to share that Amador's patch resolves the wrong number of cores on multi-node systems. Dan's patch resolves counting threads as cores again.

Review in progress for Amador's patch.

Cheers
Douglas
Comment 16 Douglas Schilling Landgraf 2012-09-06 12:05:27 EDT
For reference only, patch downstream: 
http://gerrit.usersys.redhat.com/#change,1397

Thanks
Douglas
Comment 19 Itamar Heim 2012-09-25 06:41:20 EDT
The current patches seem to resolve the ad-hoc config, which is relevant to Intel hyperthreading.
What about the AMD multi-node case, which isn't about hyperthreading?
Comment 20 Douglas Schilling Landgraf 2012-09-25 10:08:21 EDT
> the current patches seem to resolve ad-hoc config, which is relevant to intel
>  hyperthreading.
> what about the amd multi node case which isn't about hyperthreading?

Yes, it works. Amador shared a few versions with the same behavior (patchset 1 is simple; other versions include a libvirt approach and reading data from the filesystem as well).
Comment 21 Amador Pahim 2012-09-25 10:54:56 EDT
The proposed patch http://gerrit.ovirt.org/#/c/5481/ solves the multi-NUMA issue regardless of vendor and HT availability.
Comment 26 Amador Pahim 2012-10-01 16:42:41 EDT
Hi Douglas,

In order to move forward with this bz: the libvirt 0.9 series has the same issue probing CPU topology as vdsm, but the 0.10 series is OK.

Tested system:
AMD Opteron(tm) Processor 6134
- 4 Sockets
- 8 Cores per Socket
- 1 Thread per Core
- Total processors = 32 

Instead of being one CPU with 8 cores, this AMD Magny-Cours is actually two 4-core dies combined into one "package". Each package has two NUMA nodes, and the two NUMA nodes share the same core ID set.

So, the expected result in libvirt "nodeinfo" AND "capabilities" would be like this:
- NUMA cells = 8
- Sockets per numa = 1
- Cores per socket = 4
- Threads per core = 1
- Total CPUS = 32

----------------------------------------
# rpm -qa libvirt
libvirt-0.9.11.5-3.fc17.x86_64

# virsh nodeinfo
CPU model:           x86_64
CPU(s):              32
CPU frequency:       2300 MHz
CPU socket(s):       4
Core(s) per socket:  4
Thread(s) per core:  1
NUMA cell(s):        1
Memory size:         65964292 kB

# virsh capabilities
snip...
      <topology sockets='4' cores='4' threads='1'/>
snip...
      <cells num='8'>
snip...
----------------------------------------

As we can see, nodeinfo and capabilities are completely out of sync with each other. But newer libvirt seems to solve the issue:

----------------------------------------
# rpm -qa libvirt
libvirt-0.10.2-3.fc17.x86_64

# virsh nodeinfo
CPU model:           x86_64
CPU(s):              32
CPU frequency:       2300 MHz
CPU socket(s):       1
Core(s) per socket:  4
Thread(s) per core:  1
NUMA cell(s):        8
Memory size:         65964292 KiB

# virsh capabilities
snip...
      <topology sockets='1' cores='4' threads='1'/>
snip...
      <cells num='8'>
snip...
----------------------------------------

Now libvirt is coherent, showing 8 NUMA cells (two per package), with 1 quad core socket each.
As the upstream .spec requires libvirt >= 0.10.1-1, I think we are now safe to use the libvirt API and solve this BZ.
A new patch set has been sent, now using libvirt capabilities(), as pointed out by http://www.redhat.com/archives/libvir-list/2010-November/msg01093.html
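
The capabilities-based counting can be sketched as follows (a simplified editorial illustration, not the patch itself; the XML fragment mirrors the 0.10.2 output quoted above):

```python
from xml.dom import minidom

# Minimal stand-in for virConnect.getCapabilities() output on the
# tested 6134 box: 8 NUMA cells, 1 socket/cell, 4 cores, 1 thread.
CAPS = """<capabilities><host><cpu>
  <topology sockets='1' cores='4' threads='1'/>
</cpu><topology><cells num='8'/></topology></host></capabilities>"""

def cpu_topology(caps_xml):
    doc = minidom.parseString(caps_xml)
    # Scope to <cpu> so we read its <topology>, not the NUMA one.
    cpu = doc.getElementsByTagName('cpu')[0]
    topo = cpu.getElementsByTagName('topology')[0]
    cells = int(doc.getElementsByTagName('cells')[0].getAttribute('num'))
    sockets = int(topo.getAttribute('sockets'))
    cores = int(topo.getAttribute('cores'))
    threads = int(topo.getAttribute('threads'))
    # 8 cells * 1 socket per cell * 4 cores per socket = 32 cores
    return cells * sockets * cores, threads

print(cpu_topology(CAPS))  # (32, 1)
```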
 
Regards,
Amador Pahim
Comment 27 Douglas Schilling Landgraf 2012-10-01 17:37:38 EDT
Hi Amador,

> Hi Douglas,
> 
> In order to go forth with this bz, checking libvirt 0.9 series, it has the 
> same issue probing cpu topology as in vdsm. But 0.10 series is ok.

Right. As we talked previously, it looks like the libvirt folks have improved this CPU/NUMA area a lot in recent versions.

> As we can see, nodeinfo and capabilities are completely out of sync each
> other. But newer libvirt seems to solve the issue:
>
> ----------------------------------------
> # rpm -qa libvirt
> libvirt-0.10.2-3.fc17.x86_64
> 
> # virsh nodeinfo
> CPU model:           x86_64
> CPU(s):              32
> CPU frequency:       2300 MHz
> CPU socket(s):       1
> Core(s) per socket:  4
> Thread(s) per core:  1
> NUMA cell(s):        8
> Memory size:         65964292 KiB
>
> # virsh capabilities
> snip...
>      <topology sockets='1' cores='4' threads='1'/>
> snip...
>      <cells num='8'>
> snip...
> ----------------------------------------
> 
> Now libvirt is coherent, showing 8 NUMA cells (two per package), with 1 quad 
> core socket each.

That makes sense, thanks for sending the new version/tests. At this point, let's wait for more people to review your upstream patch. After that, if it is merged, we will need to handle it downstream with the libvirt guys too.

Cheers
Douglas
Comment 29 Douglas Schilling Landgraf 2012-10-03 13:18:08 EDT
*** Bug 860507 has been marked as a duplicate of this bug. ***
Comment 33 Antoni Segura Puimedon 2012-10-11 04:57:53 EDT
Hi, I think I might have reproduced the bug with:

libvirt-0.9.10-21.el6_3.5.x86_64
vdsm change id: I1ebd7c424e03942d6a13fa1c993dec3e3e78c9ed


Oct 11 10:00:49 dev-09 vdsm vds ERROR Exception raised
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm", line 82, in run
    serve_clients(log)
  File "/usr/share/vdsm/vdsm", line 50, in serve_clients
    cif = clientIF.getInstance(log)
  File "/usr/share/vdsm/clientIF.py", line 126, in getInstance
    cls._instance = clientIF(log)
  File "/usr/share/vdsm/clientIF.py", line 93, in __init__
    caps.CpuTopology().cores())
  File "/usr/share/vdsm/caps.py", line 88, in __init__
    self._topology = _getCpuTopology(capabilities)
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 822, in __call__
    value = self.func(*args)
  File "/usr/share/vdsm/caps.py", line 116, in _getCpuTopology
    'sockets': int(cpu.getElementsByTagName('topology')[0].
  IndexError: list index out of range
Oct 11 10:00:53 dev-09 respawn: slave '/usr/share/vdsm/vdsm' died too quickly for more than 30 seconds, master sleeping for 900 seconds
Comment 34 Dan Kenigsberg 2012-10-11 06:32:20 EDT
(In reply to comment #33)
> Hi, I think I might have reproduced the bug with:

Toni, it would be more exact to say that the patch introduced a regression on your hardware, which for some reason is missing an element like
  <topology sockets='1' cores='4' threads='2'/>
in <capabilities><host><cpu>.

We should either handle the missing element somehow, or require a newer libvirt that reports it for your hardware.

Could you post some information about your /proc/cpuinfo (here, or in a libvirt bz)?
Comment 35 Antoni Segura Puimedon 2012-10-11 06:48:42 EDT
Dan, I just sorted it out; the culprit was a missing:
/usr/share/libvirt/cpu_map.xml

Reinstalling libvirt-client fixed the issue for me.
Comment 36 Barak 2012-10-16 05:33:53 EDT
*** Bug 866708 has been marked as a duplicate of this bug. ***
Comment 42 Ido Begun 2012-11-14 05:10:51 EST
Tested this on SI24.1, a host with 2 AMD Opteron(TM) Processor 6272 CPUs (8 cores each, hyperthreading supported).

cpuinfo lists 32 CPUs, as expected.

I got these results on libvirt-0.9.10-21.el6_3.5:
With report_host_threads_as_cores=false (the default), vdsClient reported 64 CPU cores (instead of 16).
With report_host_threads_as_cores=true, vdsClient reported 32 cores (as expected).

However, when testing on libvirt-0.9.10-21.el6_3.6 (considering https://bugzilla.redhat.com/show_bug.cgi?id=869723), vdsClient reported 32 CPU cores in both cases.

Seeing as values are still off, moving this back to ASSIGNED.
Comment 44 Peter Krempa 2012-11-14 08:06:31 EST
The node info detection as of libvirt-0.9.10-21.el6_3.6 still doesn't work 100% correctly on some machines, especially AMD Bulldozer, which has "modules" that count both as cores _and_ threads. That throws off the detection, so we report the machine as having 2x the number of CPUs. This issue is already fixed upstream:

https://bugzilla.redhat.com/show_bug.cgi?id=874050

With the fix libvirt checks if the topology that is detected is compatible with the number of CPUs the system has. If it isn't (that happens on AMD Bulldozer) libvirt returns a compatibility fallback topology and the actual NUMA topology has to be determined from the capabilities XML.
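
The consistency check described above can be sketched roughly like this (a hypothetical simplification for illustration, not libvirt's actual code):

```python
def node_topology(n_cpus, nodes, sockets, cores, threads):
    """Return a (nodes, sockets, cores, threads) tuple, falling back
    to a flat compatibility topology when the detected one does not
    multiply out to the real CPU count."""
    if nodes * sockets * cores * threads == n_cpus:
        return (nodes, sockets, cores, threads)
    # Inconsistent (e.g. Bulldozer modules counted as cores AND
    # threads): report 1 node / 1 socket / n_cpus cores / 1 thread,
    # and leave the real NUMA layout to the capabilities XML.
    return (1, 1, n_cpus, 1)

print(node_topology(32, 4, 1, 8, 2))  # (1, 1, 32, 1): Bulldozer fallback
print(node_topology(32, 8, 1, 4, 1))  # (8, 1, 4, 1): consistent, kept as-is
```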
Comment 51 Rami Vaknin 2012-11-25 03:23:18 EST
The above scratch build seems to have a regression in comparison to libvirt-0.9.10-21.el6_3.6.x86_64, tested on AMD Opteron(TM) Processor 6272.

The original issue reproduces with libvirt-0.9.10-21.el6_3.6.x86_64; the number of cpuCores reported by vdsm is:
report_host_threads_as_cores = false - 32 (should be 16)
report_host_threads_as_cores = true - 32


When using libvirt-0.9.10-21.el6_3.6bulldozer.x86_64, the number of cpuCores reported by vdsm is:
report_host_threads_as_cores = false - 128 (instead of 16!!!)
report_host_threads_as_cores = true - 32


# virsh -r capabilities
<capabilities>

  <host>
    <uuid>74cabcdd-43c9-44ec-8452-d50d61ff028a</uuid>
    <cpu>
      <arch>x86_64</arch>
      <model>Opteron_G4</model>
      <vendor>AMD</vendor>
      <topology sockets='1' cores='32' threads='1'/>
      <feature name='nodeid_msr'/>
      <feature name='wdt'/>
      <feature name='skinit'/>
      <feature name='ibs'/>
      <feature name='osvw'/>
      <feature name='cr8legacy'/>
      <feature name='extapic'/>
      <feature name='cmp_legacy'/>
      <feature name='fxsr_opt'/>
      <feature name='mmxext'/>
      <feature name='osxsave'/>
      <feature name='monitor'/>
      <feature name='ht'/>
      <feature name='vme'/>
    </cpu>
    <power_management>
      <suspend_disk/>
    </power_management>
    <migration_features>
      <live/>
      <uri_transports>
        <uri_transport>tcp</uri_transport>
      </uri_transports>
    </migration_features>
    <topology>
      <cells num='4'>
        <cell id='0'>
          <cpus num='8'>
            <cpu id='0'/>
            <cpu id='1'/>
            <cpu id='2'/>
            <cpu id='3'/>
            <cpu id='4'/>
            <cpu id='5'/>
            <cpu id='6'/>
            <cpu id='7'/>
          </cpus>
        </cell>
        <cell id='1'>
          <cpus num='8'>
            <cpu id='8'/>
            <cpu id='9'/>
            <cpu id='10'/>
            <cpu id='11'/>
            <cpu id='12'/>
            <cpu id='13'/>
            <cpu id='14'/>
            <cpu id='15'/>
          </cpus>
        </cell>
        <cell id='2'>
          <cpus num='8'>
            <cpu id='16'/>
            <cpu id='17'/>
            <cpu id='18'/>
            <cpu id='19'/>
            <cpu id='20'/>
            <cpu id='21'/>
            <cpu id='22'/>
            <cpu id='23'/>
          </cpus>
        </cell>
        <cell id='3'>
          <cpus num='8'>
            <cpu id='24'/>
            <cpu id='25'/>
            <cpu id='26'/>
            <cpu id='27'/>
            <cpu id='28'/>
            <cpu id='29'/>
            <cpu id='30'/>
            <cpu id='31'/>
          </cpus>
        </cell>
      </cells>
    </topology>
  </host>

  <guest>
    <os_type>hvm</os_type>
    <arch name='i686'>
      <wordsize>32</wordsize>
      <emulator>/usr/libexec/qemu-kvm</emulator>
      <machine>rhel6.3.0</machine>
      <machine canonical='rhel6.3.0'>pc</machine>
      <machine>rhel6.2.0</machine>
      <machine>rhel6.1.0</machine>
      <machine>rhel6.0.0</machine>
      <machine>rhel5.5.0</machine>
      <machine>rhel5.4.4</machine>
      <machine>rhel5.4.0</machine>
      <domain type='qemu'>
      </domain>
      <domain type='kvm'>
        <emulator>/usr/libexec/qemu-kvm</emulator>
      </domain>
    </arch>
    <features>
      <cpuselection/>
      <deviceboot/>
      <pae/>
      <nonpae/>
      <acpi default='on' toggle='yes'/>
      <apic default='on' toggle='no'/>
    </features>
  </guest>

  <guest>
    <os_type>hvm</os_type>
    <arch name='x86_64'>
      <wordsize>64</wordsize>
      <emulator>/usr/libexec/qemu-kvm</emulator>
      <machine>rhel6.3.0</machine>
      <machine canonical='rhel6.3.0'>pc</machine>
      <machine>rhel6.2.0</machine>
      <machine>rhel6.1.0</machine>
      <machine>rhel6.0.0</machine>
      <machine>rhel5.5.0</machine>
      <machine>rhel5.4.4</machine>
      <machine>rhel5.4.0</machine>
      <domain type='qemu'>
      </domain>
      <domain type='kvm'>
        <emulator>/usr/libexec/qemu-kvm</emulator>
      </domain>
    </arch>
    <features>
      <cpuselection/>
      <deviceboot/>
      <acpi default='on' toggle='yes'/>
      <apic default='on' toggle='no'/>
    </features>
  </guest>

</capabilities>
Comment 52 Daniele 2012-11-27 10:31:16 EST
The customer is currently using this workaround:

ps -L -C qemu-kvm | awk '/qemu-kvm/ {print $2}' | xargs -n1 taskset -c -p 0-7

This changes the affinity of every thread ID belonging to qemu-kvm processes to use all 8 cores.


Is this actually relevant to this BZ?
Comment 53 Douglas Schilling Landgraf 2012-11-27 11:49:23 EST
Hi Peter,

   Could you please check Rami's comment #51?
   https://bugzilla.redhat.com/show_bug.cgi?id=833425#c51

Should we share the comment into https://bugzilla.redhat.com/show_bug.cgi?id=877024 as well?

Thanks
Douglas
Comment 55 Peter Krempa 2012-11-27 17:57:45 EST
In the case of AMD Magny-Cours, the output of nodeinfo/capabilities is correct. The problem arises on AMD Bulldozer and its new core/thread topology, where one Bulldozer "module" is reported as both separate cores and separate threads.

This confused libvirt so that the output of nodeinfo (prior to my patch) was actually twice the number of processors (when counted by multiplying all nodeinfo fields). I fixed this for Bulldozer and many other strange architectures so that libvirt reports a synthetic (or maybe it would be better to call it "made up") topology that has 1 NUMA node, 1 socket, 1 thread, and a number of cores equal to the number of CPUs in the host. This is meant as a final solution for every possible NUMA machine that we might come across (hopefully).

In this case, the only sane way to detect the actual topology is to use the output of capabilities, where the CPUs are grouped into NUMA nodes by the actual topology. Unfortunately, due to historic reasons, we can't actually change nodeinfo to report better results. The output of capabilities contains valuable information that can be used to assign CPUs to guests in a way that won't hurt performance. This kind of information cannot be acquired from nodeinfo. Nodeinfo isn't really useful on modern machines.

The data shown in comment #51 are the result of the synthetic topology reported on the AMD Bulldozer machine. The machine (prior to that patch) would report a topology of 4 nodes, 1 socket (per node), 8 cores (per socket) and 2 threads (per core). Multiplying those fields would yield 64, which wouldn't be correct. The corrected topology, on the other hand, is 1 node, 1 socket, 32 cores and 1 thread, yielding the correct result of 32 CPUs.
Comment 56 Douglas Schilling Landgraf 2012-11-28 09:32:02 EST
Hi Peter,

Thanks a lot for the clarification. We have two different reports here: one for the AMD 61XX processor family, which this bugzilla describes (tests in comment #54), and another about AMD Bulldozer (comment #51).

I believe we should move these AMD Bulldozer tests to https://bugzilla.redhat.com/show_bug.cgi?id=877024. Agreed?

Thanks
Douglas
Comment 57 Douglas Schilling Landgraf 2012-11-30 16:48:45 EST
Hi,

   Just to clarify this long bugzilla: comments #42 and #51 are tests based on a processor from the 62XX family (Bulldozer) [1].

The original report from Amador is based on the 61XX family (Magny-Cours), and comment #54 shows that we fixed the original report. I have changed the bugzilla subject to track both cases.

Peter, could you please give additional info about comment #51?

The capabilities output in #51 shows:
================================================
<topology sockets='1' cores='32' threads='1'/>
...
<cells num='4'>
==================================================

Shouldn't it be:
=================================================
<topology sockets='1' cores='4' threads='2'/>
...
<cells num='4'>
==================================================

Where:
- NUMA cells = 4
- sockets per NUMA = 1
- cores per NUMA = 4
- threads per core = 2 (1 physical core + 1 thread)

4 Numa * 1 Socket per Numa * 4 cores per Numa * 2 Threads = 32

Based on the above, here is how VDSM handles report_host_threads_as_cores:

If disabled, it calculates cells * sockets * cores = 4 * 1 * 4 = 16. On the other hand, if enabled, it takes:
cells.getElementsByTagName('cpu').length
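
In code terms, the two modes amount to something like this (a sketch of the logic just described, not vdsm's exact source):

```python
def cpu_cores(cells_num, sockets, cores, n_cpu_elements,
              report_host_threads_as_cores):
    if report_host_threads_as_cores:
        # Count every <cpu> element under <cells>: threads included.
        return n_cpu_elements
    # Physical cores only: cells * sockets-per-cell * cores-per-socket.
    return cells_num * sockets * cores

# With the topology proposed in this comment (4 cells, 1 socket per
# cell, 4 cores, 2 threads -> 32 <cpu> elements in the XML):
print(cpu_cores(4, 1, 4, 32, False))  # 16
print(cpu_cores(4, 1, 4, 32, True))   # 32
```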


Thanks
Douglas

[1]
http://www.amd.com/us/press-releases/Pages/amd-opteron-6200-series-processor-family-wins-2012jan23.aspx
Comment 59 Amador Pahim 2012-12-04 13:28:47 EST
We are used to having one or more sockets inside one NUMA cell. But according to [1] (and confirmed by the /sys filesystem), the AMD 6200 series has two NUMA cells inside the same socket.
As libvirt shows sockets per NUMA cell instead of total sockets, counting here is difficult, and having libvirt report 0.5 sockets per NUMA cell is not reasonable. So, the statement in #57 seems the best way to represent the 6200 topology:

<topology sockets='1' cores='4' threads='2'/>
...
<cells num='4'>

I drew an image (attached) based on [1], [2] and the "lscpu" info to clarify the 6272 architecture.

[1] - http://www.redhat.com/archives/libvir-list/2012-May/msg00663.html
[2] - http://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)
Comment 60 Amador Pahim 2012-12-04 13:29:51 EST
Created attachment 657675 [details]
6272 arch
Comment 61 errata-xmlrpc 2012-12-04 14:00:11 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1508.html
Comment 62 Peter Krempa 2012-12-05 09:09:46 EST
The picture is correct, although each of the "threads" in the picture has a separate core ID, so they technically also count as cores.

From a management app's point of view, the physical topology of the machine is irrelevant. What counts is the NUMA topology, as that is what limits memory bandwidth. Guests should be scheduled on cores within one NUMA node. The nodeinfo output in libvirt is limited due to historic reasons and is basically usable just for determining the maximum number of CPUs in a system.
Comment 63 Douglas Schilling Landgraf 2012-12-05 14:45:53 EST
Hi Peter,

Thanks for your feedback.

From comment #62:

> The picture is correct, although each of the "threads" on the picture has 
> separate core_IDs, so they technically count also as cores. 

Understood, it will be like:
<topology sockets='1' cores='32' threads='1'/>

It's OK, but other system sources report a different split:

In /proc/cpuinfo we have the split:

amd-dinar-07.lab.bos.redhat.com (Bulldozer machine):
==========================================================
<snip>
  cpu cores	: 8   (number of cores per CPU package)
  siblings	: 16  (HT per CPU package) * (number of cores per CPU package)
</snip>

Socket:
=====================
# cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l
2

also from lscpu:
=========================
<snip>
  Thread(s) per core:    2   (core + thread)
  Core(s) per socket:    8
  CPU socket(s):         2
  On-line CPU(s) list:   0-31

  NUMA node(s):          4 
  NUMA node0 CPU(s):     0-7
  NUMA node1 CPU(s):     8-15
  NUMA node2 CPU(s):     16-23
  NUMA node3 CPU(s):     24-31
</snip>

From comment #62:
> output in libvirt is limited due to historic reasons and it's basicaly usable
> just for determining the maximum number of CPUS in a system.

To avoid changing libvirt's historic output, what about adding a 'total CPU sockets' field to the XML output, as lscpu and /proc/cpuinfo do, and leaving the current libvirt output as:

<topology sockets='1' cores='8' threads='2'/> (as upstream libvirt-1.0.0 too)
...
<cells num='4'>
==================================================

It would be like:
<topology totalsockets='2' sockets='1' cores='8' threads='2'/>
...
<cells num='4'>
==================================================

This would show totalsockets = 2 and sockets per NUMA = 1 (as libvirt already shows).

Just to clarify our needs: vdsm gets the total sockets and total cores (without threads) from libvirt. For this system, for example, we are looking for a way to get 2 sockets and 16 total cores from libvirt.

With that, we would report in vdsm the field 'report_host_threads_as_cores' as:

if enabled:
================
(cores = 8) * (threads = 2) * (new_libvirt_field_total_sockets = 2) = 32 total cores

if disabled:
===================
(cores = 8) * (new_libvirt_field_total_sockets = 2) = 16 total cores
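
With the hypothetical totalsockets attribute, the proposed arithmetic above would look like this (a sketch of the suggestion, not an existing libvirt or vdsm API):

```python
def cpu_cores(total_sockets, cores, threads, threads_as_cores):
    # 'total_sockets' stands in for the proposed new libvirt field;
    # 'cores' and 'threads' come from the existing <topology> element.
    count = total_sockets * cores
    return count * threads if threads_as_cores else count

# Bulldozer machine from this comment: 2 sockets, 8 cores, 2 threads.
print(cpu_cores(2, 8, 2, True))   # 32 total cores
print(cpu_cores(2, 8, 2, False))  # 16 total cores
```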

Thanks
