1907936 – NTO is not reporting nto_profile_set_total metrics correctly after reboot

Summary: NTO is not reporting nto_profile_set_total metrics correctly after reboot

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node Tuning Operator
Sub Component:
Version:	4.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.7.0
Assignee:	Jiří Mencák
QA Contact:	Simon
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-12-15 14:42 UTC by Jiří Mencák
Modified:	2021-02-24 15:44 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-02-24 15:44:28 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-node-tuning-operator pull 189	0	None	closed	Bug 1907936: Switch to nto_profile_calculated_total.	2021-01-07 19:57:51 UTC
Red Hat Product Errata	RHSA-2020:5633	0	None	None	None	2021-02-24 15:44:50 UTC

Description Jiří Mencák 2020-12-15 14:42:40 UTC

Description of problem:
In OCP 4.7, NTO introduced a set of metrics reported by the operator. One of them, nto_profile_set_total, is not reported correctly after a reboot of the node the operator runs on.

Version-Release number of selected component (if applicable):
OCP 4.7

How reproducible:
Always

Steps to Reproduce:
1. Reboot the node NTO runs on.
2. oc project openshift-cluster-node-tuning-operator
3. oc rsh cluster-node-tuning-operator-<id>
4. sh-4.4$ curl --insecure https://localhost:60000/metrics

Actual results:
nto_profile_set_total is not reported.

Expected results:
nto_profile_set_total, or a similar metric (nto_profile_calculated_total), is reported.
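
The expected result can be checked mechanically. The following is a minimal Python sketch, not part of the operator or its test suite. It assumes the text returned by the metrics endpoint has already been saved to a string; the helper name reported_profile_metrics is hypothetical.

```python
# Hypothetical helper for illustration; not part of NTO or its test suite.
PROFILE_METRICS = ("nto_profile_set_total", "nto_profile_calculated_total")


def reported_profile_metrics(metrics_text):
    """Return the profile metric names that have at least one sample line."""
    found = set()
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name in PROFILE_METRICS:
            found.add(name)
    return found
```

An empty result would correspond to the failure described in this bug: neither profile metric has a sample in the endpoint output.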

Additional info:
https://github.com/openshift/cluster-node-tuning-operator/pull/189

Comment 3 Simon 2020-12-21 20:14:04 UTC

Cluster version: 4.7.0-0.nightly-2020-12-20-055006

$ oc project openshift-cluster-node-tuning-operator
Now using project "openshift-cluster-node-tuning-operator" on server "https://api.skordas1218a.qe.devcluster.openshift.com:6443".

$ oc get pods
NAME                                            READY   STATUS    RESTARTS   AGE
cluster-node-tuning-operator-7d89b84b6c-m5xz7   1/1     Running   0          4h1m
tuned-8kdkg                                     1/1     Running   0          4h19m
tuned-8lqsn                                     1/1     Running   0          4h19m
tuned-b6lm4                                     1/1     Running   0          4h24m
tuned-k9ms6                                     1/1     Running   0          4h24m
tuned-kzq5s                                     1/1     Running   0          4h24m
tuned-wjcsx                                     1/1     Running   0          4h19m

$ oc rsh cluster-node-tuning-operator-7d89b84b6c-m5xz7
sh-4.4$ curl --insecure https://localhost:60000/metrics
# HELP nto_build_info A metric with a constant '1' value labeled version from which Node Tuning Operator was built.
# TYPE nto_build_info gauge
nto_build_info{version="v4.7.0-202012190243.p0-0-g5c99b95-dirty"} 1
# HELP nto_degraded_info Indicates whether the Node Tuning Operator is degraded.
# TYPE nto_degraded_info gauge
nto_degraded_info 0
# HELP nto_pod_labels_used_info Is the Pod label functionality turned on (1) or off (0)?
# TYPE nto_pod_labels_used_info gauge
nto_pod_labels_used_info 0
# HELP nto_profile_calculated_total The number of times a Tuned profile was calculated for a given node.
# TYPE nto_profile_calculated_total counter
nto_profile_calculated_total{node="ip-10-0-129-18.us-east-2.compute.internal",profile="openshift-node"} 3
nto_profile_calculated_total{node="ip-10-0-147-65.us-east-2.compute.internal",profile="openshift-control-plane"} 3
nto_profile_calculated_total{node="ip-10-0-165-118.us-east-2.compute.internal",profile="openshift-control-plane"} 3
nto_profile_calculated_total{node="ip-10-0-167-70.us-east-2.compute.internal",profile="openshift-node"} 3
nto_profile_calculated_total{node="ip-10-0-203-237.us-east-2.compute.internal",profile="openshift-control-plane"} 3
nto_profile_calculated_total{node="ip-10-0-216-106.us-east-2.compute.internal",profile="openshift-node"} 3
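
To confirm that every node has a counter sample, output like the above can be parsed into a per-node mapping. This is a rough Python sketch under stated assumptions: the labels appear in the order node, profile as in the output shown, and the function name calculated_totals is hypothetical. A general-purpose Prometheus text parser would be more robust than this regular expression.

```python
import re

# Matches sample lines such as:
#   nto_profile_calculated_total{node="...",profile="..."} 3
# Assumes the label order (node, profile) seen in the output above.
SAMPLE_RE = re.compile(
    r'^nto_profile_calculated_total\{node="([^"]*)",profile="([^"]*)"\}\s+(\S+)$'
)


def calculated_totals(metrics_text):
    """Map (node, profile) to the counter value found in the metrics text."""
    totals = {}
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match:
            node, profile, value = match.groups()
            totals[(node, profile)] = float(value)
    return totals
```

Comparing the node names in the returned mapping against the cluster's node list would show whether any node is missing from the metric.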

Comment 5 errata-xmlrpc 2021-02-24 15:44:28 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
