Bug 1952149
Summary: | oc adm top reporting unknown status for Windows node | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Mansi Kulkarni <mankulka> |
Component: | Monitoring | Assignee: | Mansi Kulkarni <mankulka> |
Status: | CLOSED ERRATA | QA Contact: | Ronnie Rasouli <rrasouli> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 4.7 | CC: | alegrand, anpicker, aos-bugs, dgrisonn, erooth, kakkoyun, lcosic, mankulka, obulatov, pkrupa, rrasouli, sgao, spasquie, team-winc, vhire |
Target Milestone: | --- | ||
Target Release: | 4.7.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | 1920903 | Environment: | |
Last Closed: | 2021-05-24 17:14:37 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1920903 | ||
Bug Blocks: |
Description
Mansi Kulkarni
2021-04-21 15:41:50 UTC
>oc adm top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-10-0-148-35.us-east-2.compute.internal 746m 21% 7566Mi 51%
ip-10-0-156-90.us-east-2.compute.internal 370m 24% 3463Mi 52%
ip-10-0-173-191.us-east-2.compute.internal 104m 6% 1932Mi 29%
ip-10-0-184-76.us-east-2.compute.internal 698m 19% 7261Mi 49%
ip-10-0-203-37.us-east-2.compute.internal 466m 31% 4802Mi 73%
ip-10-0-207-184.us-east-2.compute.internal 518m 14% 5547Mi 37%
ip-10-0-133-203.us-east-2.compute.internal <unknown> <unknown> <unknown> <unknown>
ip-10-0-132-187.us-east-2.compute.internal <unknown> <unknown> <unknown> <unknown>
Server Version: 4.7.0-0.nightly-2021-05-17-040457
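The symptom shown above (rows whose metric columns read `<unknown>`) can also be checked from a script rather than by eye. The following is a minimal sketch, not part of the fix: the function names `nodes_missing_metrics` and `check_cluster` are hypothetical, and it assumes `oc` is on the PATH, already logged in to the cluster, and prints whitespace-separated columns as in the output above.

```python
import subprocess
from typing import List, Optional

UNKNOWN = "<unknown>"

# Two rows copied from the output in this report, used as sample input.
SAMPLE = """NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-10-0-148-35.us-east-2.compute.internal 746m 21% 7566Mi 51%
ip-10-0-133-203.us-east-2.compute.internal <unknown> <unknown> <unknown> <unknown>
"""


def nodes_missing_metrics(top_output: str) -> List[str]:
    """Return the names of nodes with at least one <unknown> metric column."""
    missing = []
    for line in top_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and a pasted ">oc ..." prompt line.
        if not fields or fields[0] == "NAME" or fields[0].startswith(">"):
            continue
        if UNKNOWN in fields[1:]:
            missing.append(fields[0])
    return missing


def check_cluster(selector: Optional[str] = None) -> List[str]:
    """Run `oc adm top node` (optionally with -l <selector>) and parse it.

    Hypothetical helper: requires a reachable cluster and a logged-in `oc`.
    """
    cmd = ["oc", "adm", "top", "node"]
    if selector:
        cmd += ["-l", selector]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return nodes_missing_metrics(result.stdout)


if __name__ == "__main__":
    # Parse the sample only; call check_cluster() against a live cluster.
    print(nodes_missing_metrics(SAMPLE))
```

Against a live cluster, `check_cluster("kubernetes.io/os=windows")` would limit the check to Windows nodes; an empty list means every listed node reported metrics.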
@rrasouli since the fix was merged on May 14th, it might not be available in a nightly build and would have to be tested on a CI cluster. Could you provide more details on how the operator was installed? It should be built from the release-4.7 branch of WMCO; the released 2.0.0 version of WMCO does not include the latest developments with metrics configuration.

@rrasouli tested this out on the latest CI cluster and it worked.
Server version: 4.7.0-0.ci-2021-05-17-153541
Steps:
1. Install the WMCO operator by building from the release-4.7 operator branch on OCP 4.7; ensure cluster monitoring is enabled in the operator namespace.
2. Create a Windows machineset and scale up Windows nodes.
3. Check `oc adm top nodes`; it should report metrics for Windows nodes.
>oc adm top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-10-0-134-241.us-east-2.compute.internal 285m 19% 3401Mi 51%
ip-10-0-139-95.us-east-2.compute.internal 529m 15% 5918Mi 40%
ip-10-0-152-127.us-east-2.compute.internal 90m 6% 1533Mi 22%
ip-10-0-164-118.us-east-2.compute.internal 671m 19% 6029Mi 41%
ip-10-0-170-159.us-east-2.compute.internal 219m 14% 3432Mi 51%
ip-10-0-212-23.us-east-2.compute.internal 174m 11% 2702Mi 40%
ip-10-0-220-59.us-east-2.compute.internal 718m 20% 6681Mi 45%
>oc adm top node -l kubernetes.io/os=windows
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-10-0-152-127.us-east-2.compute.internal 91m 6% 1521Mi 22%
Can you verify this?

@rrasouli Please ensure that the commit that adds this fix to release-4.7, "Bug 1952149: oc adm top reporting unknown status for Windows node" [https://github.com/openshift/cluster-monitoring-operator/pull/1130/commits/1c9b296b55fc36175d39b4e7230a5c0674db69fa], is part of the cluster payload when testing this.
@rrasouli the WMCO should be built by pulling in the latest from the release-4.7 branch, since some renaming changes related to the metrics job went in (windows-machine-config-operator-metrics -> windows-exporter). Please make sure the following commits, which were part of this change, are pulled in when building the operator: https://github.com/openshift/windows-machine-config-operator/pull/353/commits

"version": "2.0.1+ae13f4c" was built from the latest 4.7 branch.
Server Version: 4.7.0-0.nightly-2021-05-17-040457
Indeed, after a few minutes the metrics are working:
oc adm top node --selector=beta.kubernetes.io/os=windows
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-10-0-154-238.us-east-2.compute.internal 1119m 74% 1569Mi 23%

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.12 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:1561