Bug 653293
| Summary: | 'virsh vcpuinfo' doesn't report the number of CPU cores in 'CPU Affinity' correctly on a host with NUMA | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Mark Wu <dwu> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 6.0 | CC: | berrange, dallan, eblake, esammons, gerrit.slomma, mjenner, moshiro, mzhan, vbian, xen-maint, yoyzhang |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-0.8.7-1.el6 | Doc Type: | Bug Fix |
| Doc Text: | When running "virsh vcpuinfo" or setting up virtual CPU pinning on a host machine that used NUMA, "virsh vcpuinfo" showed the incorrect number of virtual CPUs. Virtual CPU pinning could also fail because libvirt reported an incorrect number of CPU sockets per NUMA node. Virtual CPUs are now counted correctly. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-05-19 13:24:01 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Mark Wu
2010-11-15 08:31:58 UTC
This problem is caused by how the macro VIR_NODEINFO_MAXCPUS calculates the total number of CPUs:

/**
 * VIR_NODEINFO_MAXCPUS:
 * @nodeinfo: virNodeInfo instance
 *
 * This macro is to calculate the total number of CPUs supported
 * but not necessary active in the host.
 */
#define VIR_NODEINFO_MAXCPUS(nodeinfo) ((nodeinfo).nodes*(nodeinfo).sockets*(nodeinfo).cores*(nodeinfo).threads)

In this case nodes=2, sockets=2, cores=2, threads=1, so virsh thinks the maximum total number of CPUs is 8. But I think sockets*cores*threads already gives the maximum number of CPUs, so it needn't be multiplied by the number of NUMA nodes.

'sockets' gives the number of sockets per node, so it *does* need to be multiplied by nodes. More likely the QEMU driver is reporting the wrong value for 'sockets'.
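To make the disagreement concrete, here is a minimal standalone sketch (not libvirt code; the formula simply mirrors VIR_NODEINFO_MAXCPUS, and the numbers are the ones reported for this host) showing what the macro yields when 'sockets' holds the system-wide count versus the per-node count:

#include <stdio.h>

/* Same formula as VIR_NODEINFO_MAXCPUS in libvirt.h. */
#define MAXCPUS(nodes, sockets, cores, threads) \
    ((nodes) * (sockets) * (cores) * (threads))

int main(void)
{
    /* This host: 2 NUMA nodes, 2 sockets in total, 2 cores per socket,
     * 1 thread per core -> 4 real CPUs. */
    printf("sockets as system-wide total: %d\n", MAXCPUS(2, 2, 2, 1)); /* prints 8 */
    printf("sockets per NUMA node:        %d\n", MAXCPUS(2, 1, 2, 1)); /* prints 4 */
    return 0;
}

The difference is exactly the factor of two that doubles the reported CPU count on this host.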
Daniel,

My understanding is that the number of sockets kept in nodeinfo is the total for the whole system, not just one node, because the code traverses all CPUs under the /sys/devices/system/cpu directory when populating the nodeinfo:
<snip>
static int parse_socket(unsigned int cpu)
{
return get_cpu_value(cpu, "topology/physical_package_id", false);
}
...
socket = parse_socket(cpu);
if (socket < 0) {
closedir(cpudir);
return -1;
}
if (!(socket_mask & (1 << socket))) {
socket_mask |= (1 << socket);
nodeinfo->sockets++;
}
</snip>
And in this case, the number of nodes is 2, which can be verified by the output of 'numactl -H'. According to cpuinfo, the numbers of sockets and cores are also correct:
$ egrep "^processor|^physical|^siblings|^core id|^cpu cores|^$" /proc/cpuinfo
processor : 0
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
processor : 1
physical id : 0
siblings : 2
core id : 2
cpu cores : 2
processor : 2
physical id : 1
siblings : 2
core id : 0
cpu cores : 2
processor : 3
physical id : 1
siblings : 2
core id : 2
cpu cores : 2
So my question is: what does the nodeinfo structure keep track of, one NUMA node or the whole system? Thanks.
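For reference, the effect of the driver loop quoted above can be reproduced in isolation. The following standalone sketch (hypothetical, not the driver code itself) applies the same bitmask trick to the physical_package_id values of this host's four CPUs and therefore yields the system-wide socket count, 2, with no notion of NUMA nodes at all:

#include <stdio.h>

int main(void)
{
    /* topology/physical_package_id for CPUs 0-3 on this host,
     * as shown by the cpuinfo output above. */
    int package_id[] = { 0, 0, 1, 1 };
    int ncpus = sizeof(package_id) / sizeof(package_id[0]);

    unsigned int socket_mask = 0;
    int sockets = 0;

    /* Each distinct package id bumps the socket count exactly once,
     * just like the quoted code in the QEMU driver's nodeinfo path. */
    for (int i = 0; i < ncpus; i++) {
        if (!(socket_mask & (1u << package_id[i]))) {
            socket_mask |= (1u << package_id[i]);
            sockets++;
        }
    }

    printf("sockets counted: %d\n", sockets); /* prints 2: the machine total */
    return 0;
}

Because this count is system-wide, multiplying it by the node count in VIR_NODEINFO_MAXCPUS double-counts the sockets, which is the mismatch discussed above.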
No, as stated above, the API requirements for the 'sockets' field are that it represents 'sockets per node'. These semantics were defined when we wrote the Xen driver. If the QEMU driver code you quote is not complying with that, it is broken & needs fixing. You can confirm this by booting RHEL5 Xen on the machine and querying 'virsh nodeinfo'. The RHEL6 QEMU driver 'virsh nodeinfo' output must match whatever the RHEL5 Xen driver generated on the same hardware.

This was fixed by v0.8.5-186-gac9dd4a:
commit ac9dd4a676f21b5e3ca6dbe0526f2a6709072beb
Author: Jiri Denemark <jdenemar>
Date: Wed Nov 24 11:25:19 2010 +0100
Fix host CPU counting on unusual NUMA topologies
The nodeinfo structure includes
nodes : the number of NUMA cell, 1 for uniform mem access
sockets : number of CPU socket per node
cores : number of core per socket
threads : number of threads per core
which does not work well for NUMA topologies where each node does not
consist of integral number of CPU sockets.
We also have VIR_NODEINFO_MAXCPUS macro in public libvirt.h which
computes maximum number of CPUs as (nodes * sockets * cores * threads).
As a result, we can't just change sockets to report total number of
sockets instead of sockets per node. This would probably be the easiest
since I doubt anyone is using the field directly. But because of the
macro, some apps might be using sockets indirectly.
This patch leaves sockets to be the number of CPU sockets per node (and
fixes qemu driver to comply with this) on machines where sockets can be
divided by nodes. If we can't divide sockets by nodes, we behave as if
there was just one NUMA node containing all sockets. Apps interested in
NUMA should consult capabilities XML, which is what they probably do
anyway.
This way, the only case in which apps that care about NUMA may break is
on machines with funky NUMA topology. And there is a chance libvirt
wasn't able to start any guests on those machines anyway (although it
depends on the topology, total number of CPUs and kernel version).
Nothing changes at all for apps that don't care about NUMA.
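The following is a rough standalone sketch of the rule the commit message describes (a hypothetical helper, not the actual libvirt implementation): keep 'sockets' as sockets per node when the system-wide count divides evenly by the node count, otherwise fold everything into a single node.

#include <stdio.h>

/* Hypothetical sketch of the normalisation described above, not libvirt code. */
static void normalize_topology(unsigned int *nodes, unsigned int *sockets)
{
    /* 'sockets' comes in as the system-wide socket count. */
    if (*nodes > 1 && *sockets % *nodes == 0)
        *sockets /= *nodes;   /* report sockets per NUMA node */
    else
        *nodes = 1;           /* uneven topology: behave as one big node */
}

int main(void)
{
    unsigned int nodes = 2, sockets = 2;   /* this host */
    normalize_topology(&nodes, &sockets);
    printf("nodes=%u, sockets per node=%u\n", nodes, sockets); /* prints 2 and 1 */
    return 0;
}

With the normalised values, VIR_NODEINFO_MAXCPUS for this host becomes 2*1*2*1 = 4, matching the real CPU count.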
# cat /proc/cpuinfo | grep processor
processor   : 0
processor   : 1
processor   : 2
processor   : 3
processor   : 4
processor   : 5
processor   : 6
processor   : 7
processor   : 8
processor   : 9
processor   : 10
processor   : 11

# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10
node 0 size: 16383 MB
node 0 free: 13607 MB
node 1 cpus: 1 3 5 7 9 11
node 1 size: 16384 MB
node 1 free: 14256 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

[reproducer]
libvirt-client-0.8.1-27.el6.x86_64
# virsh vcpuinfo vr-rhel5u5-x86_64-kvm
VCPU:           0
CPU:            0
State:          running
CPU time:       2808.1s
CPU Affinity:   yyyyyyyyyyyy------------

[verification]
libvirt-client-0.8.7-6.el6.x86_64
# virsh vcpuinfo vr-rhel5u5-x86_64-kvm
VCPU:           0
CPU:            0
State:          running
CPU time:       2808.1s
CPU Affinity:   yyyyyyyyyyyy

On this NUMA machine with 12 processors, the affinity map now shows exactly 12 processors instead of the previous 24 (as illustrated in the sketch below), so the bug status is set to VERIFIED.

(In reply to comment #13)
> However, please compare the following report of "virsh nodeinfo".
>
> [Before update]
> CPU(s): 4
> CPU socket(s): 2
> Core(s) per socket: 2
> Thread(s) per core: 1
> NUMA cell(s): 2
>
> [After update]
> CPU(s): 4
> CPU socket(s): 1
> Core(s) per socket: 2
> Thread(s) per core: 1
> NUMA cell(s): 2
>
> <The "CPU socket(s)" is changed from 2 to 1 after updating to your test
> package.>
>
> Why does the value of "CPU socket(s)" change?
> The behavior is incorrect because we don't change a setting of cpu on guest
> domain.

That is actually the bug that was fixed; the values reported before were incorrect. As the nodeinfo documentation says, "CPU socket(s)" means the number of sockets per NUMA cell. So you currently have 2 cells with 1 dual-core socket in each of them, which gives a total of 4 CPUs, and that is correct. Before the fix, the nodeinfo output suggested that you had 2 cells with 2 dual-core sockets in each of them, which resulted in 8 CPUs although you only had 4. Comment #6 contains more background information about this issue.

*** Bug 694539 has been marked as a duplicate of this bug. ***
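As a side illustration of the symptom in the reproducer and verification above, here is a sketch assuming, as the output suggests, that virsh prints one 'y' or '-' per possible host CPU as computed by VIR_NODEINFO_MAXCPUS (the per-node core count used below is illustrative, not taken from the verification host):

#include <stdio.h>

/* Same formula as VIR_NODEINFO_MAXCPUS in libvirt.h. */
#define MAXCPUS(nodes, sockets, cores, threads) \
    ((nodes) * (sockets) * (cores) * (threads))

/* Print 'y' for CPUs the vCPU may run on and '-' for the rest,
 * one character per possible host CPU. */
static void print_affinity(int maxcpus, int usable)
{
    for (int i = 0; i < maxcpus; i++)
        putchar(i < usable ? 'y' : '-');
    putchar('\n');
}

int main(void)
{
    /* 12-CPU verification host: 2 NUMA nodes; assume one 6-core,
     * single-threaded socket per node. */
    print_affinity(MAXCPUS(2, 2, 6, 1), 12); /* before the fix: 24 characters */
    print_affinity(MAXCPUS(2, 1, 6, 1), 12); /* after the fix:  12 characters */
    return 0;
}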
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Cause
running virsh vcpuinfo or setting up vCPU pinning on NUMA machines
Consequence
virsh vcpuinfo shows incorrect number of vCPUs and vCPU pinning may fail
because libvirt reports wrong number of CPU sockets per NUMA node
Fix
virsh nodeinfo (and corresponding libvirt API) reports correct number of
CPU sockets per NUMA node instead of total number of sockets
Result
virsh vcpuinfo shows correct number of vCPUs
Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -1,13 +1 @@
-Cause
+When running "virsh vcpuinfo" or setting up virtual CPU pinning on a host machine that used NUMA, "virsh vcpuinfo" showed the incorrect number of virtual CPUs. Virtual CPU pinning could also fail because libvirt reported an incorrect number of CPU sockets per NUMA node. Virtual CPUs are now counted correctly.
- running virsh vcpuinfo or setting up vCPU pinning on NUMA machines
-
-Consequence
- virsh vcpuinfo shows incorrect number of vCPUs and vCPU pinning may fail
- because libvirt reports wrong number of CPU sockets per NUMA node
-
-Fix
- virsh nodeinfo (and corresponding libvirt API) reports correct number of
- CPU sockets per NUMA node instead of total number of sockets
-
-Result
- virsh vcpuinfo shows correct number of vCPUs
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html