Bug 1192360
| Summary: | [ppc] virsh nodeinfo show the wrong cpu cores and numa cells | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Luyao Huang <lhuang> |
| Component: | libvirt | Assignee: | Andrea Bolognani <abologna> |
| Status: | CLOSED NOTABUG | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.1 | CC: | dgibson, dyuan, michen, mzhan, ngu, pkrempa, rbalakri, weizhan, xuhan, ypu |
| Target Milestone: | rc | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | ppc64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-04-28 08:26:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Luyao Huang
2015-02-13 09:05:14 UTC
What you're seeing, while confusing, is actually the expected and documented
behavior.
From <libvirt/libvirt-host.h>:
struct _virNodeInfo {
    /* [...] */
    unsigned int nodes;   /* the number of NUMA cell, 1 for unusual NUMA
                             topologies or uniform memory access; check
                             capabilities XML for the actual NUMA topology */
    unsigned int sockets; /* number of CPU sockets per node if nodes > 1,
                             1 in case of unusual NUMA topology */
    unsigned int cores;   /* number of cores per socket, total number of
                             processors in case of unusual NUMA topology */
    unsigned int threads; /* number of threads per core, 1 in case of
                             unusual numa topology */
};
Here "unusual NUMA topology" really means any situation where the Linux kernel
is not exposing enough information for libvirt to figure out the complete NUMA
topology, which happens not only on PPC64 but also on other architectures
where the offlining of CPUs is supported.
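For illustration, here is a minimal C sketch of how an application reads these
fields. This is not from the bug report; it assumes the libvirt development
headers are installed and a local hypervisor connection is available (build
with something like: gcc nodeinfo.c -lvirt).

#include <stdio.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpenReadOnly(NULL); /* default URI */
    virNodeInfo info;

    if (conn == NULL)
        return 1;

    if (virNodeGetInfo(conn, &info) == 0) {
        printf("CPU(s):             %u\n", info.cpus);
        printf("CPU socket(s):      %u\n", info.sockets);
        printf("Core(s) per socket: %u\n", info.cores);
        printf("Thread(s) per core: %u\n", info.threads);
        printf("NUMA cell(s):       %u\n", info.nodes);
    }

    virConnectClose(conn);
    return 0;
}

Per the header comments above, whenever info.nodes comes back as 1 the other
fields may be placeholders rather than the real topology.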
Here's the situation reported by my laptop (dual-core Intel processor with
Hyperthreading support) when all CPUs are online:
[abologna@pandorica ~]$ virsh nodeinfo
CPU(s): 4
CPU socket(s): 1
Core(s) per socket: 2
Thread(s) per core: 2
NUMA cell(s): 1
[abologna@pandorica ~]$ lscpu
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Both virsh and lscpu report the correct topology information. I have edited
the output to remove information not relevant to the issue at hand.
If I take one thread per core offline, so as to reflect the configuration of
the PowerPC machine, this is what I get (the sysfs offlining step itself is
sketched after the output below):
[abologna@pandorica ~]$ virsh nodeinfo
CPU(s): 2
CPU socket(s): 1
Core(s) per socket: 4
Thread(s) per core: 1
NUMA cell(s): 1
[abologna@pandorica ~]$ lscpu
CPU(s): 4
On-line CPU(s) list: 0,2
Off-line CPU(s) list: 1,3
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
As you can see, both commands are now reporting incorrect topology
information: they're just lying in different ways :)
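For reference, the offlining goes through sysfs. A minimal sketch, assuming
root privileges and picking cpu3 (one of the CPUs offlined in the lscpu
output above); it is equivalent to echo 0 > /sys/devices/system/cpu/cpu3/online:

#include <stdio.h>

int main(void)
{
    /* Write "0" to take the CPU offline, "1" to bring it back online. */
    FILE *f = fopen("/sys/devices/system/cpu/cpu3/online", "w");

    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    fputs("0\n", f);
    fclose(f);
    return 0;
}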
Closing the bug.
Given that numactl is able to get the right node information, I don't think this can really be CANTFIX, at least for the number of NUMA nodes. The NUMA information really shouldn't be dependent on whether CPUs are all active: AIUI the NUMA topology is tied to the sockets/cores/threads hierarchy on x86, but that's not the case on power.

David recommended adding the following in-depth information to the bug report
and closing it again as NOTABUG.
---
libvirt obtains the data it stores in a virNodeInfo object (the same data
that is eventually displayed to the user when virsh nodeinfo is called)
by looking at the contents of /sys/devices/system/node.
The topology information comes from the files in
/sys/devices/system/node/node*/cpu*/topology, but that data is not
available when a CPU is offline, which means that when SMT is off it
obtains the following information about the system:
nodes: 4
sockets: 1
cores: 5
threads: 1
cpus: 20
which is a decent approximation of the actual topology. However, near
the end of the linuxNodeInfoCPUPopulate() function, which is called by
nodeGetInfo(), we have the following code:
    /* Now check if the topology makes sense. There are machines that
     * don't expose their real number of nodes or for example the AMD
     * Bulldozer architecture that exposes their Clustered integer core
     * modules as both threads and cores. This approach throws off our
     * detection. Unfortunately the nodeinfo structure isn't designed to
     * carry the full topology so we're going to lie about the detected
     * topology to notify the user to check the host capabilities for
     * the actual topology. */
    if ((nodeinfo->nodes *
         nodeinfo->sockets *
         nodeinfo->cores *
         nodeinfo->threads) != (nodeinfo->cpus + offline)) {
        nodeinfo->nodes = 1;
        nodeinfo->sockets = 1;
        nodeinfo->cores = nodeinfo->cpus + offline;
        nodeinfo->threads = 1;
    }
In our case:
nodes * sockets * cores * threads == cpus + offline
4 * 1 * 5 * 1 == 20 + 140
that is, 20 on the left against 160 on the right, which obviously doesn't
add up, which in turn means the virNodeInfo object actually ends up looking
like this:
nodes: 1
sockets: 1
cores: 160
threads: 1
cpus: 20
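To make the arithmetic concrete, here is a standalone sketch (not libvirt's
actual code, just the quoted check re-run with the values detected on this
machine) showing why the fallback fires:

#include <stdio.h>

int main(void)
{
    /* Values detected with SMT off, as listed above. */
    unsigned int nodes = 4, sockets = 1, cores = 5, threads = 1;
    unsigned int cpus = 20, offline = 140;

    if (nodes * sockets * cores * threads != cpus + offline) {
        /* 20 != 160, so the topology gets flattened. */
        nodes = 1;
        sockets = 1;
        cores = cpus + offline;   /* 160 */
        threads = 1;
    }

    printf("nodes=%u sockets=%u cores=%u threads=%u cpus=%u\n",
           nodes, sockets, cores, threads, cpus);
    return 0;
}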
I talked to Jirka and he confirmed that this is expected and that
well-behaved, non-legacy applications are supposed to disregard the
information stored in virNodeInfo and look up the detailed topology
described in the capabilities XML whenever nodeinfo->nodes == 1.
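A minimal sketch of that recommended pattern, assuming a read-only connection
(the XML parsing a real client would do is left out):

#include <stdio.h>
#include <stdlib.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpenReadOnly(NULL);
    virNodeInfo info;

    if (conn == NULL)
        return 1;

    if (virNodeGetInfo(conn, &info) < 0) {
        virConnectClose(conn);
        return 1;
    }

    if (info.nodes > 1) {
        printf("trusting virNodeInfo: %u NUMA cell(s)\n", info.nodes);
    } else {
        /* nodes == 1 may mean "unusual topology": consult the
         * <topology> data in the capabilities XML instead. */
        char *caps = virConnectGetCapabilities(conn);
        if (caps != NULL) {
            puts(caps);
            free(caps);
        }
    }

    virConnectClose(conn);
    return 0;
}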