Bug 1996718

Summary: KSM flag --node should be --nodes in CMO assets
Product: OpenShift Container Platform Reporter: Jan Fajerski <jfajersk>
Component: Monitoring Assignee: Jan Fajerski <jfajersk>
Status: CLOSED ERRATA QA Contact: Junqi Zhao <juzhao>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.9 CC: amuller, anpicker, aos-bugs, erooth, juzhao, spasquie
Target Milestone: ---   
Target Release: 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-10-18 17:47:55 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jan Fajerski 2021-08-23 14:20:29 UTC
This causes a test failure after we updated KSM and dropped a patch that was hiding the issue.

Test failure:
fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:454]: Unexpected error:
    <errors.aggregate | len:2, cap:2>: [
        {
            s: "promQL query returned unexpected results:\nsum(node_role_os_version_machine:cpu_capacity_cores:sum{label_kubernetes_io_arch!=\"\",label_node_role_kubernetes_io_master!=\"\"}) > 0\n[]",
        },
        {
            s: "promQL query returned unexpected results:\nsum(node_role_os_version_machine:cpu_capacity_sockets:sum{label_kubernetes_io_arch!=\"\",label_node_hyperthread_enabled!=\"\",label_node_role_kubernetes_io_master!=\"\"}) > 0\n[]",
        },
    ]
    [promQL query returned unexpected results:
    sum(node_role_os_version_machine:cpu_capacity_cores:sum{label_kubernetes_io_arch!="",label_node_role_kubernetes_io_master!=""}) > 0
    [], promQL query returned unexpected results:
    sum(node_role_os_version_machine:cpu_capacity_sockets:sum{label_kubernetes_io_arch!="",label_node_hyperthread_enabled!="",label_node_role_kubernetes_io_master!=""}) > 0
    []]
occurred

Seen in PR https://github.com/openshift/kube-state-metrics/pull/56
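
The empty result `[]` in the failure above is what PromQL matcher semantics produce when a label is absent: a matcher such as `label_kubernetes_io_arch!=""` only selects series on which that label is present and non-empty. The Python sketch below mimics that filtering. The sample series and their values are hypothetical, and the link between the wrong allowlist key and the missing node labels is inferred from this report rather than stated in it.

```python
def matches_not_empty(series, required_labels):
    """Mimic PromQL `label!=""` matchers: a missing label counts as empty."""
    return all(series.get(name, "") != "" for name in required_labels)


def select(series_list, required_labels):
    """Return only the series that satisfy every `label!=""` matcher."""
    return [s for s in series_list if matches_not_empty(s, required_labels)]


# One of the matchers used by the failing queries.
REQUIRED = ["label_kubernetes_io_arch"]

# Hypothetical series: node labels exported vs. not exported.
with_labels = [{"node": "master-0", "label_kubernetes_io_arch": "amd64"}]
without_labels = [{"node": "master-0"}]

selected_with = select(with_labels, REQUIRED)
selected_without = select(without_labels, REQUIRED)
```

With the label present the series is selected; without it the selection is empty, so a `sum(...) > 0` over it returns nothing, which the test reports as an unexpected result.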

Comment 3 Junqi Zhao 2021-08-25 04:07:12 UTC
Checked with 4.9.0-0.nightly-2021-08-24-203710; the resource key in --metric-labels-allowlist is now nodes:
#  oc -n openshift-monitoring get deploy kube-state-metrics -oyaml | grep metric-labels-allowlist
        - --metric-labels-allowlist=pods=[*],nodes=[*]
However, https://github.com/openshift/kube-state-metrics/pull/56 is still open and the kube-state-metrics version is still v2.0.0, so moving this back to ASSIGNED.

Comment 5 Jan Fajerski 2021-09-03 07:36:03 UTC
The PR https://github.com/openshift/kube-state-metrics/pull/56 only makes this issue visible. Passing --node instead of --nodes is a bug regardless. Currently it's simply masked.
@juzhao feel free to close this.
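
A check along the following lines could catch a singular resource key before it ships. This is a minimal sketch, not part of CMO or kube-state-metrics: the function names are hypothetical, `KNOWN_PLURAL_RESOURCES` is an illustrative subset rather than the full list of resources KSM accepts, and the pre-fix value `pods=[*],node=[*]` is assumed from the bug summary.

```python
import re

# Assumption: illustrative subset of plural resource names, not KSM's full list.
KNOWN_PLURAL_RESOURCES = {"pods", "nodes", "namespaces", "deployments"}

# One allowlist entry looks like: resource=[label1,label2] or resource=[*]
ENTRY = re.compile(r"([^=,\[\]]+)=\[([^\]]*)\]")


def parse_allowlist(value):
    """Parse 'pods=[*],nodes=[a,b]' into {'pods': ['*'], 'nodes': ['a', 'b']}."""
    parsed = {}
    for match in ENTRY.finditer(value):
        resource = match.group(1).strip()
        labels = [l.strip() for l in match.group(2).split(",") if l.strip()]
        parsed[resource] = labels
    return parsed


def unknown_resources(value):
    """Return resource keys that are not in the known plural set, sorted."""
    return sorted(r for r in parse_allowlist(value) if r not in KNOWN_PLURAL_RESOURCES)


broken = unknown_resources("pods=[*],node=[*]")   # assumed pre-fix value
fixed = unknown_resources("pods=[*],nodes=[*]")   # value shown in Comment 3
```

Here `broken` contains `node` while `fixed` is empty, which matches the flag value verified in Comment 3.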

Comment 6 Junqi Zhao 2021-09-03 09:18:15 UTC
Based on Comment 3 and Comment 5, setting to VERIFIED.

Comment 11 errata-xmlrpc 2021-10-18 17:47:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759