Bug 1948037
Summary: | Telemetry info not completely available to identify windows nodes | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> | |
Component: | Windows Containers | Assignee: | Mansi Kulkarni <mankulka> | |
Status: | CLOSED ERRATA | QA Contact: | gaoshang <sgao> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 4.8 | CC: | aos-bugs, aravindh, mankulka, ssoto | |
Target Milestone: | --- | |||
Target Release: | 4.8.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1955319 | Environment: | |
Last Closed: | 2021-08-03 20:29:16 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1955319 |
Description
Clayton Coleman
2021-04-09 19:51:52 UTC
This bug has been verified on OCP 4.8.0-0.nightly-2021-05-06-210840 and passed, thanks.

Version-Release number of selected component (if applicable):
WMCO built from https://github.com/openshift/windows-machine-config-operator/commit/1ca41c250ff937d1543559ba19e805a7473d45bf
OCP version 4.8.0-0.nightly-2021-05-06-210840

Steps:
1. Install WMCO operator on OCP 4.8, make sure WMCO namespace is monitored by selecting checkbox "Enable Operator recommended cluster monitoring on this Namespace".
2. Create Windows machineset and scale up Windows nodes.
3. Check cluster reports node_role_os_version_machine:cpu_capacity_cores:sum with label_node_openshift_io_os_id="Windows" via prometheus.
e.g. Search `node_role_os_version_machine:cpu_capacity_cores:sum` in https://prometheus-k8s-openshift-monitoring.apps.sgao-a1.qe.devcluster.openshift.com/graph, got:
node_role_os_version_machine:cpu_capacity_cores:sum{label_kubernetes_io_arch="amd64", label_node_hyperthread_enabled="false", label_node_openshift_io_os_id="Windows"} 2
node_role_os_version_machine:cpu_capacity_cores:sum{label_kubernetes_io_arch="amd64", label_node_hyperthread_enabled="true", label_node_openshift_io_os_id="rhcos"} 3
node_role_os_version_machine:cpu_capacity_cores:sum{label_kubernetes_io_arch="amd64", label_node_hyperthread_enabled="true", label_node_openshift_io_os_id="rhcos", label_node_role_kubernetes_io_master="true"} 6
4. Check rules in https://prometheus-k8s-openshift-monitoring.apps.sgao-a1.qe.devcluster.openshift.com/rules, did not find anything wrong.

node.rules

record: node_namespace_pod:kube_pod_info:
expr: topk by(namespace, pod) (1, max by(node, namespace, pod) (label_replace(kube_pod_info{job="kube-state-metrics",node!=""}, "pod", "$1", "pod", "(.*)")))
State: OK   Last Evaluation: 8.731s ago   Evaluation Time: 3.970ms

record: node:node_num_cpu:sum
expr: count by(cluster, node) (sum by(node, cpu) (node_cpu_seconds_total{job="node-exporter"} * on(namespace, pod) group_left(node) topk by(namespace, pod) (1, node_namespace_pod:kube_pod_info:)))
State: OK   Last Evaluation: 8.727s ago   Evaluation Time: 3.897ms

record: :node_memory_MemAvailable_bytes:sum
expr: sum by(cluster) (node_memory_MemAvailable_bytes{job="node-exporter"} or (node_memory_Buffers_bytes{job="node-exporter"} + node_memory_Cached_bytes{job="node-exporter"} + node_memory_MemFree_bytes{job="node-exporter"} + node_memory_Slab_bytes{job="node-exporter"}))
State: OK   Last Evaluation: 8.724s ago   Evaluation Time: 0.650ms

windows.rules

record: instance:node_cpu_utilisation:rate1m
expr: avg without(core, mode) (rate(windows_cpu_time_total{mode="idle"}[1m]))
State: OK   Last Evaluation: 4m 13s ago   Evaluation Time: 0.276ms

record: instance:node_cpu:rate:sum
expr: sum by(instance) (rate(windows_cpu_time_total{mode!="iowait",mode="idle"}[3m]))
State: OK   Last Evaluation: 4m 13s ago   Evaluation Time: 0.177ms

record: node_filesystem_size_bytes
expr: windows_logical_disk_size_bytes
State: OK   Last Evaluation: 4m 13s ago   Evaluation Time: 0.092ms

record: node_filesystem_avail_bytes
expr: windows_logical_disk_free_bytes
State: OK   Last Evaluation: 4m 13s ago   Evaluation Time: 0.087ms

record: node_network_receive_bytes_total
expr: rate(windows_net_bytes_received_total[1m])
State: OK   Last Evaluation: 4m 13s ago   Evaluation Time: 0.122ms

record: node_network_transmit_bytes_total
expr: rate(windows_net_bytes_sent_total[1m])
State: OK   Last Evaluation: 4m 13s ago   Evaluation Time: 0.096ms

record: node_filesystem_free_bytes
expr: windows_logical_disk_free_bytes
State: OK   Last Evaluation: 4m 13s ago   Evaluation Time: 0.084ms

record: node_memory_MemAvailable_bytes
expr: windows_memory_available_bytes
State: OK   Last Evaluation: 4m 13s ago   Evaluation Time: 0.087ms

record: node_memory_MemTotal_bytes
expr: windows_cs_physical_memory_bytes
State: OK   Last Evaluation: 4m 13s ago   Evaluation Time: 0.078ms

record: node_cpu_info
expr: windows_cpu_info
State: OK   Last Evaluation: 4m 13s ago   Evaluation Time: 0.102ms

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Platform for Windows Containers 3.0.0 security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3001
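
The check in step 3 of the verification above could also be scripted rather than done through the graph UI. The following is a minimal sketch, not the procedure QA actually used: it builds an instant-query URL for the standard Prometheus HTTP API endpoint `/api/v1/query` and totals the reported cores per `label_node_openshift_io_os_id`. The helper names `build_query_url` and `cores_by_os`, the host `prometheus.example`, and the timestamps in `SAMPLE` are hypothetical; the label sets and values in `SAMPLE` are copied from the query output in the verification comment. On a real cluster the request would typically also need a bearer token, which is left out here.

```python
from urllib.parse import urlencode

METRIC = "node_role_os_version_machine:cpu_capacity_cores:sum"


def build_query_url(base_url, os_id):
    """Return an instant-query URL filtering the metric by OS id (hypothetical helper)."""
    query = f'{METRIC}{{label_node_openshift_io_os_id="{os_id}"}}'
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': query})}"


def cores_by_os(payload):
    """Sum sample values per OS id from a Prometheus instant-query response."""
    if payload.get("status") != "success":
        raise ValueError("query did not succeed")
    totals = {}
    for sample in payload["data"]["result"]:
        os_id = sample["metric"].get("label_node_openshift_io_os_id", "unknown")
        # Each sample value is a [timestamp, "string value"] pair.
        totals[os_id] = totals.get(os_id, 0.0) + float(sample["value"][1])
    return totals


# Response shaped like the API output; timestamps are placeholders (assumption).
SAMPLE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"label_kubernetes_io_arch": "amd64",
                        "label_node_hyperthread_enabled": "false",
                        "label_node_openshift_io_os_id": "Windows"},
             "value": [0, "2"]},
            {"metric": {"label_kubernetes_io_arch": "amd64",
                        "label_node_hyperthread_enabled": "true",
                        "label_node_openshift_io_os_id": "rhcos"},
             "value": [0, "3"]},
            {"metric": {"label_kubernetes_io_arch": "amd64",
                        "label_node_hyperthread_enabled": "true",
                        "label_node_openshift_io_os_id": "rhcos",
                        "label_node_role_kubernetes_io_master": "true"},
             "value": [0, "6"]},
        ],
    },
}

if __name__ == "__main__":
    print(build_query_url("https://prometheus.example", "Windows"))
    print(cores_by_os(SAMPLE))
```

With the sample data, a "Windows" key in the result is the signal the bug asks for: Windows nodes are identifiable in the capacity metric alongside the rhcos nodes.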