Bug 1996718

Summary:	KSM flag --node should be --nodes in CMO assets
Product:	OpenShift Container Platform	Reporter:	Jan Fajerski <jfajersk>
Component:	Monitoring	Assignee:	Jan Fajerski <jfajersk>
Status:	CLOSED ERRATA	QA Contact:	Junqi Zhao <juzhao>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	4.9	CC:	amuller, anpicker, aos-bugs, erooth, juzhao, spasquie
Target Milestone:	---
Target Release:	4.9.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-10-18 17:47:55 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Jan Fajerski 2021-08-23 14:20:29 UTC

The wrong flag causes a test failure now that we have updated KSM and dropped a patch that was hiding the issue.

Test failure:
fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:454]: Unexpected error:
    <errors.aggregate | len:2, cap:2>: [
        {
            s: "promQL query returned unexpected results:\nsum(node_role_os_version_machine:cpu_capacity_cores:sum{label_kubernetes_io_arch!=\"\",label_node_role_kubernetes_io_master!=\"\"}) > 0\n[]",
        },
        {
            s: "promQL query returned unexpected results:\nsum(node_role_os_version_machine:cpu_capacity_sockets:sum{label_kubernetes_io_arch!=\"\",label_node_hyperthread_enabled!=\"\",label_node_role_kubernetes_io_master!=\"\"}) > 0\n[]",
        },
    ]
    [promQL query returned unexpected results:
    sum(node_role_os_version_machine:cpu_capacity_cores:sum{label_kubernetes_io_arch!="",label_node_role_kubernetes_io_master!=""}) > 0
    [], promQL query returned unexpected results:
    sum(node_role_os_version_machine:cpu_capacity_sockets:sum{label_kubernetes_io_arch!="",label_node_hyperthread_enabled!="",label_node_role_kubernetes_io_master!=""}) > 0
    []]
occurred

seen in PR https://github.com/openshift/kube-state-metrics/pull/56

Comment 3 Junqi Zhao 2021-08-25 04:07:12 UTC

Checked with 4.9.0-0.nightly-2021-08-24-203710; the resource key in --metric-labels-allowlist is now nodes:
# oc -n openshift-monitoring get deploy kube-state-metrics -oyaml | grep metric-labels-allowlist
        - --metric-labels-allowlist=pods=[*],nodes=[*]
However, https://github.com/openshift/kube-state-metrics/pull/56 is still open and the kube-state-metrics version is still v2.0.0,
so moving this back to ASSIGNED.

Comment 5 Jan Fajerski 2021-09-03 07:36:03 UTC

The PR https://github.com/openshift/kube-state-metrics/pull/56 only makes this issue visible. Passing --node instead of --nodes is a bug regardless; currently it's simply masked.
@juzhao feel free to close this.
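
For anyone who wants to catch this kind of typo before it is masked again, here is a minimal sketch of a check on the allowlist value. The function names allowlist_keys and check_allowlist are hypothetical, and the rule "every resource key ends in s" is only a heuristic assumed for illustration, not something KSM itself enforces:

```shell
# Hypothetical helper: print the resource keys found in a
# --metric-labels-allowlist value such as 'pods=[*],nodes=[*]'.
allowlist_keys() {
  # A key is the token directly in front of "=["; label names inside
  # the brackets are never followed by "=[" and are therefore skipped.
  printf '%s\n' "$1" | grep -oE '[A-Za-z0-9_.-]+=\[' | sed 's/=\[$//'
}

# Hypothetical helper: return 0 if every key looks plural, 1 otherwise.
# Assumption: resource keys are plural names ending in "s" (heuristic only).
check_allowlist() {
  bad=0
  for key in $(allowlist_keys "$1"); do
    case "$key" in
      *s) ;;  # looks plural, accept
      *) echo "suspicious singular key: $key" >&2; bad=1 ;;
    esac
  done
  return "$bad"
}
```

Under these assumptions, check_allowlist 'pods=[*],nodes=[*]' succeeds silently, while check_allowlist 'pods=[*],node=[*]' prints "suspicious singular key: node" to stderr and returns 1.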

Comment 6 Junqi Zhao 2021-09-03 09:18:15 UTC

Based on Comment 3 and Comment 5, setting to VERIFIED.

Comment 11 errata-xmlrpc 2021-10-18 17:47:55 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759