Bug 1882730 - z systems not reporting correct core count in recording rule
Summary: z systems not reporting correct core count in recording rule
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Simon Pasquier
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks: 1892706
 
Reported: 2020-09-25 13:08 UTC by Clayton Coleman
Modified: 2021-02-24 15:21 UTC
CC List: 12 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1892706 (view as bug list)
Environment:
Last Closed: 2021-02-24 15:21:16 UTC
Target Upstream Version:
Embargoed:
psundara: needinfo-


Attachments
node_cpu_info metric from x86_64 machine (894 bytes, text/plain) - 2020-09-29 14:49 UTC, Simon Pasquier
cpuinfo and lscpu for LPAR (5.71 KB, text/plain) - 2020-10-01 12:17 UTC, Prashanth Sundararaman
node_cpu_info_zkvm (4.91 KB, text/plain) - 2020-10-01 18:26 UTC, Prashanth Sundararaman
topology_lscpu (1.30 KB, text/plain) - 2020-10-02 13:10 UTC, Prashanth Sundararaman


Links
Github openshift cluster-monitoring-operator pull 953 (closed): Bug 1882730: fix cluster:cpu_core_hyperthreading rule for s390x - last updated 2021-02-19 14:21:15 UTC
Github prometheus procfs issue 333 (closed): Improve CPU information for physical s390x platforms - last updated 2021-02-19 14:21:15 UTC
Red Hat Product Errata RHSA-2020:5633 - last updated 2021-02-24 15:21:56 UTC

Description Clayton Coleman 2020-09-25 13:08:51 UTC
node_role_os_version_machine:cpu_capacity_cores:sum{label_kubernetes_io_arch="s390x"} should be recording the number of cores for all machines with the given node roles, but it appears to report 1 for each machine. No clusters were reporting any number other than 1 per cluster, so it’s likely that the metric from node_exporter is reporting an incorrect value on z across all versions (versus the less likely outcome that every z user is running only a single core on every machine).
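
As a quick cross-check, two hypothetical diagnostic queries (not taken from this bug report, just a sketch assuming the standard node_exporter metric) would be to compare the recorded value against a per-instance core count derived from the node_cpu_info "package"/"core" label pairs:

    # The value currently produced by the recording rule for s390x machines.
    node_role_os_version_machine:cpu_capacity_cores:sum{label_kubernetes_io_arch="s390x"}

    # Distinct (package, core) pairs per instance, i.e. physical cores as
    # node_exporter sees them.
    count by (instance) (count by (instance, package, core) (node_cpu_info))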

This metric is used to associate subscription data with clusters, so this is a high severity bug (we cannot correctly communicate to customers the amount they owe).

If this is a limitation of the z virtualization environment we may need to return an estimated value (if possible) since 1 is definitely not correct.

When looking at cluster:capacity_cpu_cores:sum for a z cluster, I’m seeing node role labels appear, so it’s possible that we significantly regressed the calculation of that metric as part of a recent change; bumping this to urgent (we can’t regress reporting these metrics) until that is resolved. If reverting does not fix the z core count, we can drop this back to high.

Comment 1 Clayton Coleman 2020-09-25 13:11:53 UTC
The capacity change appears to be correct (sum of capacity by _id works ok on 4.5 and 4.6) so dropping severity back to high.  The z core reporting bug is likely independent.

Comment 6 Prashanth Sundararaman 2020-09-29 12:12:19 UTC
This is the cpuinfo from an s390x zVM guest. It has no "core" field as such, and I am not sure what the equivalent is, so I am adding the IBM folks here so they can chime in:

[root@rock-zvm-3-1 ~]# cat /proc/cpuinfo
vendor_id       : IBM/S390
# processors    : 2
bogomips per cpu: 1048.00
max thread id   : 0
features	: esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx vxd vxe gs sie 
facilities      : 0 1 2 3 4 6 7 8 9 10 12 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 30 31 32 33 34 35 36 37 38 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 57 58 59 60 73 74 75 76 77 80 81 82 128 129 130 131 133 134 135 146 147 156 168
cache0          : level=1 type=Data scope=Private size=128K line_size=256 associativity=8
cache1          : level=1 type=Instruction scope=Private size=128K line_size=256 associativity=8
cache2          : level=2 type=Data scope=Private size=4096K line_size=256 associativity=8
cache3          : level=2 type=Instruction scope=Private size=2048K line_size=256 associativity=8
cache4          : level=3 type=Unified scope=Shared size=131072K line_size=256 associativity=32
cache5          : level=4 type=Unified scope=Shared size=688128K line_size=256 associativity=42
processor 0: version = FF,  identification = 01F2E8,  machine = 3907
processor 1: version = FF,  identification = 01F2E8,  machine = 3907

cpu number      : 0
cpu MHz dynamic : 4504
cpu MHz static  : 4504

cpu number      : 1
cpu MHz dynamic : 4504
cpu MHz static  : 4504

Comment 7 Simon Pasquier 2020-09-29 14:49:36 UTC
Created attachment 1717565 [details]
node_cpu_info metric from x86_64 machine

For the record, this is what the node_cpu_info metric looks like for an x86_64 machine with 4 vCPUs (2 cores).
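
For context, an illustrative (not the attached) set of series, assuming the usual node_cpu_info labels, would look roughly like this:

    # Illustrative only: on x86_64 every logical CPU carries "package" and "core"
    # labels, so counting distinct (package, core) pairs yields 2 physical cores
    # for the 4 vCPUs.
    node_cpu_info{cpu="0", package="0", core="0", ...} 1
    node_cpu_info{cpu="1", package="0", core="0", ...} 1
    node_cpu_info{cpu="2", package="0", core="1", ...} 1
    node_cpu_info{cpu="3", package="0", core="1", ...} 1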

Comment 10 Dan Li 2020-09-30 19:37:53 UTC
Changing the Target Release to 4.6.z after chatting with Clayton

Comment 11 Prashanth Sundararaman 2020-09-30 19:38:10 UTC
A few things here: most of the Z installations are on zVM guests or zKVM guests, both of which are VMs and therefore have vCPUs. Running lscpu on one of the zVM nodes shows:

[core@master-2 ~]$ lscpu
Architecture:        s390x
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Big Endian
CPU(s):              2
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s) per book:  1
Book(s) per drawer:  1
Drawer(s):           4
NUMA node(s):        1
--snip--

The cores per socket here is one, and as pasted above, there is no core information in /proc/cpuinfo.


Contrast this with the lscpu and /proc/cpuinfo output from an LPAR:

[sharkcz@rock-kvmlp-fedora ~]$ lscpu
Architecture:                    s390x
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Big Endian
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              2
Core(s) per socket:              10
Socket(s) per book:              3
--snip--

There are 10 cores per socket, and the cpuinfo has all the information you would expect on any other architecture, including a "core id" field, unlike the output in a VM, which lacks most of this info.

--snip--
cpu number      : 0
physical id     : 2
core id         : 0
book id         : 2
drawer id       : 1
dedicated       : 0
address         : 0
siblings        : 8
cpu cores       : 4
version         : 00
identification  : 06F2E8
machine         : 3907
cpu MHz dynamic : 4504
cpu MHz static  : 4504
--snip--

I would say that for a zVM guest/zKVM installation the values displayed are correct. If we ever support LPAR installations, this would need to change, but that is a conversation for later, I guess.

Comment 12 Dan Li 2020-09-30 19:45:09 UTC
Changing the Target Release back to 4.6 as agreement is needed from stakeholders to move this bug off the blocker list

@Dennis, @Clayton, @Hendrick, please see Prashanth's Comment 11. If I am understanding the problem correctly: if we support LPAR installations, then this bug would be high severity; if we only support VMs, then it would not be a blocker.

Comment 13 Simon Pasquier 2020-10-01 07:43:32 UTC
@Prashanth, could you attach the full /proc/cpuinfo from an LPAR to the ticket? It would help to fix node_exporter & procfs if we eventually need to.

> I would say that for a zVM guest/ zKVM installation the values displayed are correct.

Can we also get the node_cpu_info metric from a zKVM cluster to confirm that the "core" and "package" labels are missing? If that is the case, we need to fix either the recording rules or node_exporter.
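
One way to check that from the Prometheus console would be something along these lines (a sketch, assuming the standard node_exporter metric):

    # Instances returned here expose node_cpu_info series without "core"/"package"
    # values, which is the corner case that would break the recording rule.
    count by (instance) (node_cpu_info{core="", package=""})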

Comment 14 Prashanth Sundararaman 2020-10-01 12:17:33 UTC
Created attachment 1718159 [details]
cpuinfo and lscpu for LPAR

Comment 15 Prashanth Sundararaman 2020-10-01 18:26:59 UTC
Created attachment 1718237 [details]
node_cpu_info_zkvm

Comment 16 Prashanth Sundararaman 2020-10-01 18:28:10 UTC
Added the CPU info for zKVM, and yes, it is the same there too.

Comment 19 Simon Pasquier 2020-10-02 09:32:32 UTC
Can you also attach a copy of /sys/devices/system/cpu/cpu*/topology for a zVM guest/zKVM installation?

Comment 20 Prashanth Sundararaman 2020-10-02 13:10:19 UTC
Created attachment 1718447 [details]
topology_lscpu

I have attached the topology info provided by lscpu and lscpu -e, which should reflect what is present in /sys/devices/system/cpu/*/topology/.

After further discussion with the IBM team, it looks like in an LPAR environment each logical processor in Linux is one core (with SMT, one core becomes 2 processors), whereas for zKVM/zVM it looks like each processor is one core. I am wondering whether for zVM/zKVM we should report it such that cores = cpus.
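
In PromQL terms, that "cores = cpus" reading would amount to something like the following sketch (an assumption for illustration, not an agreed fix):

    # Count distinct (package, core) pairs where topology labels exist (LPAR-like
    # output), and fall back to one core per logical CPU where they are empty
    # (zVM/zKVM guests).
    count by (instance) (count by (instance, package, core) (node_cpu_info{core!=""}))
    or
    count by (instance) (node_cpu_info{core=""})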

Comment 21 Prashanth Sundararaman 2020-10-06 15:29:46 UTC
@Simon - anything else you need from me? How do we proceed on this?

Comment 22 Simon Pasquier 2020-10-07 08:47:13 UTC
After more testing, I confirm that the "cluster:cpu_core_hyperthreading" recording rule is broken because node_exporter/procfs reports no "core" and "package" label values for zVM guests and zKVMs.

I can see 2 options:

a) Update the "cluster:cpu_core_hyperthreading" recording rule to account for this corner case: if the "core" label is empty, we fall back to the "cpu" label value. The updated rule would look like this:

            clamp_max(
              (
                label_replace( ( ( sum ( label_replace(label_join(node_cpu_info{core="",package=""}, "core", "", "cpu"), "package", "0", "package", "") ) by (instance, package, core) )  > 1 ), "label_node_hyperthread_enabled", "true", "instance", "(.*)" )
                or on (instance, package)
                label_replace( ( ( sum ( label_replace(label_join(node_cpu_info{core="",package=""}, "core", "", "cpu"), "package", "0", "package", "") ) by (instance, package, core) ) <= 1 ), "label_node_hyperthread_enabled", "false", "instance", "(.*)" )
              ), 1
            )


b) "Fix" procfs upstream for s390x arch to set the "core" label to "cpu" if it's missing from /proc/cpuing.

Personally I'd go for option a), because I don't think it makes sense to hack procfs if the information isn't present in /proc/cpuinfo (and upstream maintainers would probably feel the same, since it isn't a pattern already followed in procfs). But I'm happy to get more input from SMEs.
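
For what it's worth, the fallback at the heart of option a) can be shown in isolation (a sketch of just the label manipulation, not the full rule):

    # label_join copies the "cpu" value into the empty "core" label, and
    # label_replace fills the empty "package" label with "0", so each logical CPU
    # is then counted as its own core on the affected s390x guests.
    count by (instance, package, core) (
      label_replace(
        label_join(node_cpu_info{core="", package=""}, "core", "", "cpu"),
        "package", "0", "package", ""
      )
    )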

Comment 23 Simon Pasquier 2020-10-08 14:44:37 UTC
@Clayton what would be your recommendation regarding the options listed in comment 22?

Comment 32 errata-xmlrpc 2021-02-24 15:21:16 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

