Bug 1466735

Summary: turbostat does report only half of CPUs on AMD with Opteron Processor 6276
Product: Red Hat Enterprise Linux 6 Reporter: Jiri Hladky <jhladky>
Component: cpupowerutils Assignee: Prarit Bhargava <prarit>
Status: CLOSED WONTFIX QA Contact: Erik Hamera <ehamera>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.9 CC: jhladky, kkolakow, mpetlan
Target Milestone: rc Keywords: Reopened
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1466743 (view as bug list) Environment:
Last Closed: 2017-12-06 10:59:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1456386, 1466743    
Attachments:
Description: Output of /usr/bin/hwloc-gather-topology /tmp/$(uname -n) Flags: none
Description: CPU topology as displayed with lstopo Flags: none

Description Jiri Hladky 2017-06-30 10:59:53 UTC
Description of problem:

On an AMD server with 2x Opteron Processor 6276, turbostat reports only half of the CPUs:

turbostat reports only CPUs 8-15 and 24-31. CPUs 0-7 and 16-23 ARE MISSING.

turbostat output:

     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
       -    1122   97.64    1149    1150
       8    2219   96.55    2298    2300
       9    2242   97.57    2298    2300
      10    2247   97.76    2298    2300
      11    2249   97.87    2298    2300
      12    2246   97.74    2298    2300
      13    2246   97.71    2298    2300
      14    2246   97.74    2298    2300
      15    2246   97.74    2298    2300
      24    2237   97.30    2299    2300
      25    2246   97.69    2299    2300
      26    2248   97.77    2299    2300
      27    2250   97.90    2298    2300
      28    2247   97.74    2299    2300
      29    2247   97.74    2299    2300
      30    2247   97.74    2299    2300
      31    2247   97.74    2299    2300

lscpu output:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          4
Vendor ID:             AuthenticAMD
CPU family:            21
Model:                 1
Model name:            AMD Opteron(TM) Processor 6276
Stepping:              2
CPU MHz:               1400.000
BogoMIPS:              4599.35
Virtualization:        AMD-V
L1d cache:             16K
L1i cache:             64K
L2 cache:              2048K
L3 cache:              6144K
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
NUMA node2 CPU(s):     16-23
NUMA node3 CPU(s):     24-31


Version-Release number of selected component (if applicable):

turbostat --version
turbostat version 4.8 26-Sep, 2015 - Len Brown <lenb>


How reproducible:
On a system with an Opteron Processor 6276:

yum -y install cpupowerutils
turbostat ls

and check whether all CPUs appear in the turbostat output. Compare it with the output of lscpu from the util-linux-ng package.


Actual results:

turbostat shows only half of CPUs

Expected results:

turbostat shows all CPUs
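
The comparison described in the reproduction steps can be sketched in Python. This is only a minimal sketch, not part of turbostat or util-linux-ng: the helper names parse_cpu_list, reported_cpus and missing_cpus are hypothetical, and it assumes the turbostat layout shown above (CPU number in the first column, "-" for the summary row) and an on-line CPU list in the format lscpu prints.

```python
def parse_cpu_list(text):
    """Expand a CPU list such as "0-7,16-23" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def reported_cpus(turbostat_output):
    """Collect CPU numbers from the first column of turbostat output.

    Assumption: the header row starts with "CPU" and the summary row
    starts with "-", so only rows whose first field is numeric count.
    """
    cpus = set()
    for line in turbostat_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            cpus.add(int(fields[0]))
    return cpus


def missing_cpus(online_list, turbostat_output):
    """Return the sorted CPUs that are on-line but absent from turbostat."""
    return sorted(parse_cpu_list(online_list) - reported_cpus(turbostat_output))
```

Given the "On-line CPU(s) list" value 0-31 and the turbostat output above, missing_cpus would return CPUs 0-7 and 16-23.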

Comment 2 Jiri Hladky 2017-06-30 11:02:01 UTC
I have informed upstream maintainer of turbostat about the issue

Len Brown <lenb>

Comment 3 Neil Horman 2017-06-30 17:22:03 UTC
Unless this gets fixed upstream very soon, I expect the fix won't make it into RHEL6 at all

Comment 4 Neil Horman 2017-07-10 10:57:13 UTC
do we have an upstream commit on this yet?

Comment 5 Jiri Hladky 2017-07-10 11:06:54 UTC
No, we don't.

Comment 6 Neil Horman 2017-07-10 18:59:44 UTC
Looking at this, I think this is actually a strange topology reporting bug.  The expectation for hyperthreaded processors is that for thread siblings, their core_id is the same.  You can see this on my desktop processor:

[nhorman@hmswarspite cpu]$ cat cpu0/topology/core_id
0
[nhorman@hmswarspite cpu]$ cat cpu0/topology/thread_siblings_list
0,4
[nhorman@hmswarspite cpu]$ cat cpu4/topology/core_id
0
[nhorman@hmswarspite cpu]$ cat cpu4/topology/thread_siblings_list
0,4
[nhorman@hmswarspite cpu]$

however on the opteron 6272 system I found in the lab, this is not the case:
[root@hp-bl465cgen8-01 cpu]# cat cpu0/topology/core_id 
0
[root@hp-bl465cgen8-01 cpu]# cat cpu0/topology/thread_siblings_list 
0-1
[root@hp-bl465cgen8-01 cpu]# cat cpu1/topology/core_id 
1
[root@hp-bl465cgen8-01 cpu]# cat cpu1/topology/thread_siblings_list 
0-1
[root@hp-bl465cgen8-01 cpu]#

The fact that the thread siblings are on different cores is somewhat nonsensical in the topology model in turbostat, and, really, in general.

On this opteron, cpu8 is the other cpu that exists on core 0, and so the thread_siblings_list should read 0,8, not 0-1. 

Given that this information is derived from the apic id that a given cpu is assigned to (which in turn I believe is a firmware defined setting), I don't believe there is going to be anything we can do about this (though a firmware update may correct the problem).
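
The check done by hand above can be sketched in Python. This is a rough sketch under stated assumptions, not code from turbostat or the kernel: the helper names read_topology and mismatched_siblings are hypothetical, and the only sysfs files used are the core_id and thread_siblings_list files shown in the transcripts.

```python
import glob
import os
import re


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as "0-1" or "0,4" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_topology(base="/sys/devices/system/cpu"):
    """Map cpu number -> (core_id, set of thread siblings) from sysfs."""
    topo = {}
    for path in glob.glob(os.path.join(base, "cpu[0-9]*")):
        match = re.fullmatch(r"cpu(\d+)", os.path.basename(path))
        topo_dir = os.path.join(path, "topology")
        if not match or not os.path.isdir(topo_dir):
            continue  # not a cpuN directory, or no topology exported
        with open(os.path.join(topo_dir, "core_id")) as f:
            core_id = int(f.read())
        with open(os.path.join(topo_dir, "thread_siblings_list")) as f:
            siblings = parse_cpu_list(f.read())
        topo[int(match.group(1))] = (core_id, siblings)
    return topo


def mismatched_siblings(topo):
    """Return sorted (cpu, sibling) pairs whose core_id values differ."""
    pairs = []
    for cpu, (core_id, siblings) in topo.items():
        for sib in siblings:
            if sib > cpu and sib in topo and topo[sib][0] != core_id:
                pairs.append((cpu, sib))
    return sorted(pairs)


# Values quoted in the two transcripts above.
desktop = {0: (0, {0, 4}), 4: (0, {0, 4})}
opteron = {0: (0, {0, 1}), 1: (1, {0, 1})}
print(mismatched_siblings(desktop))  # []
print(mismatched_siblings(opteron))  # [(0, 1)]
```

On a live system, mismatched_siblings(read_topology()) would list every sibling pair that reports different core ids.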

Comment 7 Jiri Hladky 2017-07-11 13:03:22 UTC
Hi Neil,

I think that from the customer perspective it doesn't matter if this is a kernel bug (/sys/devices/system/cpu) or turbostat bug - customer just expects turbostat to report all CPUs correctly. 

Other tools can recognize the CPU topology correctly (for example lstopo) so IMHO this is a turbostat bug. If you think this is a kernel bug could you please open appropriate kernel bug so that we get it resolved?

I have run 
/usr/bin/hwloc-gather-topology /tmp/$(uname -n)

on this Opteron server

https://beaker.cluster-qe.lab.eng.brq.redhat.com/bkr/view/kiff-02.cluster-qe.lab.eng.brq.redhat.com

and I then verified that lstopo (both tools are part of the hwloc package) can correctly parse the topology from /sys/devices/system/cpu

tar jxvf kiff-02.cluster-qe.lab.eng.brq.redhat.com.tar.bz2
lstopo --input kiff-02.cluster-qe.lab.eng.brq.redhat.com kiff-02.cluster-qe.lab.eng.brq.redhat.com.png

Please check the PNG output. 

I will attach both kiff-02.cluster-qe.lab.eng.brq.redhat.com.tar.bz2 and kiff-02.cluster-qe.lab.eng.brq.redhat.com.png files for you to check.

Based on this I believe that this is not a problem with information in /sys/devices/system/cpu but rather a turbostat problem. 

Thanks
Jirka

Comment 8 Jiri Hladky 2017-07-11 13:07:29 UTC
Created attachment 1296249 [details]
Output of /usr/bin/hwloc-gather-topology /tmp/$(uname -n)

Output of /usr/bin/hwloc-gather-topology /tmp/$(uname -n) on this AMD system:

https://beaker.cluster-qe.lab.eng.brq.redhat.com/bkr/view/kiff-02.cluster-qe.lab.eng.brq.redhat.com

Unpack it and use 

lstopo --input kiff-02.cluster-qe.lab.eng.brq.redhat.com

to check the topology graphically. It shows all 32 CPUs. 

turbostat shows only half of CPUs:

turbostat ls
     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
       -      29    3.92     749    1171
       8      53    3.60    1462    2387
       9      47    3.19    1463    2381
      10      55    3.78    1449    2373
      11     126    8.72    1443    2363
      12      60    4.16    1439    2358
      13      42    2.90    1442    2360
      14      52    3.62    1439    2355
      15      44    3.07    1438    2347
      24      57    3.96    1430    2341
      25      44    3.10    1423    2331
      26      62    2.66    2343    2323
      27      47    2.01    2328    2316
      28      54    3.92    1384    2316
      29      43    3.02    1418    2313
      30      55    3.88    1409    2309
      31     100    7.08    1410    2310

Comment 9 Jiri Hladky 2017-07-11 13:09:42 UTC
Created attachment 1296250 [details]
CPU topology as displayed with lstopo

Output of 

lstopo --input kiff-02.cluster-qe.lab.eng.brq.redhat.com kiff-02.cluster-qe.lab.eng.brq.redhat.com.png

(Use it with kiff-02.cluster-qe.lab.eng.brq.redhat.com.tar.bz2 I have submitted earlier)

lstopo can correctly recognize the CPU topology based on info in /sys/devices/system/cpu

Comment 10 Neil Horman 2017-07-11 14:58:59 UTC
Your image in comment 9 illustrates the problem quite well.

1) You will note that for each set of paired thread siblings (e.g. PU 0 and PU 1), the siblings are listed on separate cores.  Sibling processing units must by definition be on the same core; otherwise they aren't siblings.

2) Core ids within a single package should be unique, yet lstopo is clearly showing the same core ids multiple times within a single package.

So it's not working; it's just acting broken in a different way than turbostat, based on an erroneous topology map as exported by the kernel.

And the kernel in turn isn't generating that map; it's just reporting it based on the ids that it reads from combinations of the cpuid instruction and APIC registers, both of which are established by the system firmware.

So there's nothing for us to do here.  Either we write a quirk into the kernel to fix up maps for this system (if we can uniquely detect it), or we get the vendor to fix their firmware to configure the topology correctly.  We're not going to do the former in RHEL6 at this late date, and so we're left with the latter, which is the proper fix anyway.

Comment 11 Prarit Bhargava 2017-07-11 15:53:19 UTC
(In reply to Neil Horman from comment #10)
> Your image in comment 9 illustrates the problem quite well.
> 
> 1) You will note that for each set of paired thread siblings (e.g. PU 0
> and PU 1), the siblings are listed on separate cores.  Sibling processing
> units must by definition be on the same core; otherwise they aren't siblings.
> 
> 2) Core ids within a single package should be unique, yet lstopo is clearly
> showing the same core ids multiple times within a single package.

You're right ... but I'm wondering if the real bug here is that AMD 0x16 doesn't have a "Core" and has Processing Units.  Those *are* unique AFAICT, and maybe that's the real problem here.

/me thinks ... and will get back to everyone in a bit

P.

Comment 12 Prarit Bhargava 2017-07-11 18:05:33 UTC
So this is the situation.  The way turbostat handles the topology is not correct.  First let's get the terminology right.  There are processors, cores, and threads.  The processor is the thingy that is stuck to the mobo.  Cores are the thingys that _can_ execute code, but more modern processors have a pair of what they call threads to execute code for every Core.

There's this other thing that can be used to group Cores together for processing power that is called a Node.  It doesn't really matter why it exists but just note that it is only a way of grouping Cores together.

In Intel's world, each Processor, Node (group of Cores), Core, and Thread are uniquely identified.

In AMD's world, each Processor, Node, and Thread are uniquely identified with a number.  This is NOT the case with Cores.  In AMD's world, each Core is uniquely identified _within a Node_.

And this results in an enumeration problem in turbostat.  Turbostat enumerates assuming that each Core is a unique object -- but it isn't.  This results in the overwriting of existing data in turbostat.  So, for example, suppose we have the following simple topology:

Processor 0 has Node 0, which contains Core 0 and Threads 0 and 1.
Processor 0 has Node 1, which contains Core 0 and Threads 2 and 3.

When turbostat enumerates, it considers the *Core* as the main thing to enumerate, so it sets

Core[0].thread_ids = { 0, 1 }
Core[0].node = 0
Core[0].processor = 0

and then *overwrites that data* with


Core[0].thread_ids = { 2, 3 }
Core[0].node = 1
Core[0].processor = 0

.... which results in the first set of data being overwritten so we only see 1/2 of the data.

This isn't an issue with NUMA, ACPI, etc.  This is solely an issue with turbostat not handling enumeration of AMD 0x16 and 0x17 processors.  Again, AFAICT what AMD is doing is valid; the threads are uniquely identifiable.  The problem is how turbostat is enumerating the data.

This can be fixed IMO but I'm going to have to think about the easiest way to do it.


P.
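
The overwrite described above can be modelled with a small Python sketch. It is an illustration of the explanation only, not turbostat's actual C data structures: the names THREADS, enumerate_by_core and enumerate_by_node_and_core are hypothetical, and the topology is the simple two-node example given in the comment.

```python
# Example topology from the comment: (processor, node, core, thread).
THREADS = [
    (0, 0, 0, 0),
    (0, 0, 0, 1),
    (0, 1, 0, 2),
    (0, 1, 0, 3),
]


def enumerate_by_core(threads):
    """Index by core id alone, as the comment describes.

    When the same core id shows up under a different node, the slot is
    reset, so the data recorded for the earlier node is lost.
    """
    cores = {}
    for processor, node, core, thread in threads:
        entry = cores.get(core)
        if entry is None or (entry["processor"], entry["node"]) != (processor, node):
            entry = {"processor": processor, "node": node, "thread_ids": []}
            cores[core] = entry  # overwrites any earlier Core[core]
        entry["thread_ids"].append(thread)
    return cores


def enumerate_by_node_and_core(threads):
    """Index by (processor, node, core), which is unique in this example."""
    cores = {}
    for processor, node, core, thread in threads:
        cores.setdefault((processor, node, core), []).append(thread)
    return cores


print(enumerate_by_core(THREADS))
print(enumerate_by_node_and_core(THREADS))
```

With the core id as the only key, just threads 2 and 3 survive; keyed by (processor, node, core), all four threads remain visible. Whether a real fix would key the data this way is an assumption of the sketch.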

Comment 13 Jiri Hladky 2017-07-13 11:45:55 UTC
Hi Prarit,

thanks a lot for the detailed analysis. I fully agree with it. 

Please note also that turbostat does not report all CPUs on a wide range of AMD systems - including the Ryzen CPU - see BZ1454489. The issue is not server vendor specific.

Jirka

Comment 16 Jan Kurik 2017-12-06 10:59:16 UTC
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/