Description of problem:
In OCP 4.7, the Node Tuning Operator (NTO) implemented a set of metrics that are reported by the operator. One of them, nto_profile_set_total, is not reported correctly after a reboot of the node the operator runs on.

Version-Release number of selected component (if applicable):
OCP 4.7

How reproducible:
Always

Steps to Reproduce:
1. Reboot a node NTO runs on.
2. oc project openshift-cluster-node-tuning-operator
3. oc rsh cluster-node-tuning-operator-<id>
4. sh-4.4$ curl --insecure https://localhost:60000/metrics

Actual results:
nto_profile_set_total is not reported.

Expected results:
A metric such as nto_profile_set_total, or a similar one (nto_profile_calculated_total), is reported.

Additional info:
https://github.com/openshift/cluster-node-tuning-operator/pull/189
Cluster version: 4.7.0-0.nightly-2020-12-20-055006

$ oc project openshift-cluster-node-tuning-operator
Now using project "openshift-cluster-node-tuning-operator" on server "https://api.skordas1218a.qe.devcluster.openshift.com:6443".

$ oc get pods
NAME                                            READY   STATUS    RESTARTS   AGE
cluster-node-tuning-operator-7d89b84b6c-m5xz7   1/1     Running   0          4h1m
tuned-8kdkg                                     1/1     Running   0          4h19m
tuned-8lqsn                                     1/1     Running   0          4h19m
tuned-b6lm4                                     1/1     Running   0          4h24m
tuned-k9ms6                                     1/1     Running   0          4h24m
tuned-kzq5s                                     1/1     Running   0          4h24m
tuned-wjcsx                                     1/1     Running   0          4h19m

$ oc rsh cluster-node-tuning-operator-7d89b84b6c-m5xz7
sh-4.4$ curl --insecure https://localhost:60000/metrics
# HELP nto_build_info A metric with a constant '1' value labeled version from which Node Tuning Operator was built.
# TYPE nto_build_info gauge
nto_build_info{version="v4.7.0-202012190243.p0-0-g5c99b95-dirty"} 1
# HELP nto_degraded_info Indicates whether the Node Tuning Operator is degraded.
# TYPE nto_degraded_info gauge
nto_degraded_info 0
# HELP nto_pod_labels_used_info Is the Pod label functionality turned on (1) or off (0)?
# TYPE nto_pod_labels_used_info gauge
nto_pod_labels_used_info 0
# HELP nto_profile_calculated_total The number of times a Tuned profile was calculated for a given node.
# TYPE nto_profile_calculated_total counter
nto_profile_calculated_total{node="ip-10-0-129-18.us-east-2.compute.internal",profile="openshift-node"} 3
nto_profile_calculated_total{node="ip-10-0-147-65.us-east-2.compute.internal",profile="openshift-control-plane"} 3
nto_profile_calculated_total{node="ip-10-0-165-118.us-east-2.compute.internal",profile="openshift-control-plane"} 3
nto_profile_calculated_total{node="ip-10-0-167-70.us-east-2.compute.internal",profile="openshift-node"} 3
nto_profile_calculated_total{node="ip-10-0-203-237.us-east-2.compute.internal",profile="openshift-control-plane"} 3
nto_profile_calculated_total{node="ip-10-0-216-106.us-east-2.compute.internal",profile="openshift-node"} 3
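For anyone re-running this check after a reboot, scanning the /metrics output by eye can be replaced with a small script. The following is only a minimal sketch, not part of NTO or its test suite. It assumes the output of the curl command above has been captured as text. The helper names reported_metrics and missing_metrics are hypothetical, and SAMPLE is a shortened excerpt of the output shown in this comment.

```python
import re

# Matches the metric name at the start of a sample line, e.g.
# 'nto_degraded_info 0' or 'nto_build_info{version="x"} 1'.
_SAMPLE_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def reported_metrics(metrics_text):
    """Return the set of metric names that have at least one sample line.

    Lines starting with '#' (HELP/TYPE comments) are skipped, so a metric
    that is only declared but never emitted does not count as reported.
    """
    names = set()
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_metrics(metrics_text, expected):
    """Return the expected metric names that have no sample in the text."""
    reported = reported_metrics(metrics_text)
    return [name for name in expected if name not in reported]


# Shortened excerpt of the curl output from this comment (hypothetical input
# for illustration; in practice, pass the full captured output).
SAMPLE = """\
# HELP nto_degraded_info Indicates whether the Node Tuning Operator is degraded.
# TYPE nto_degraded_info gauge
nto_degraded_info 0
# HELP nto_profile_calculated_total The number of times a Tuned profile was calculated for a given node.
# TYPE nto_profile_calculated_total counter
nto_profile_calculated_total{node="ip-10-0-129-18.us-east-2.compute.internal",profile="openshift-node"} 3
"""

if __name__ == "__main__":
    wanted = ["nto_profile_calculated_total", "nto_profile_set_total"]
    print(missing_metrics(SAMPLE, wanted))  # prints ['nto_profile_set_total']
```

Against the excerpt, nto_profile_calculated_total counts as reported and nto_profile_set_total as missing, which matches the output above. Only sample lines are counted, so a metric that appears solely in # HELP or # TYPE lines is still treated as not reported.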
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633