Bug 1917327

Summary: annotations.message may be wrong for NTOPodsNotReady alert
Product: OpenShift Container Platform
Component: Node Tuning Operator
Version: 4.7
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Severity: medium
Priority: unspecified
Status: CLOSED ERRATA
Reporter: Junqi Zhao <juzhao>
Assignee: Jiří Mencák <jmencak>
QA Contact: Simon <skordas>
CC: sejug
Doc Type: No Doc Update
Type: Bug
Last Closed: 2021-02-24 15:53:53 UTC
Attachments:
NTOPodsNotReady alert (see comment 1)

Description Junqi Zhao 2021-01-18 10:06:09 UTC
Description of problem:
As shown in the attached screenshot, an NTOPodsNotReady alert appears in the administrator console with the message:
Pod {{ $labels.pod }} on node {{ $labels.node }} is not ready.

The {{ $labels.pod }} and {{ $labels.node }} placeholders are not replaced with the actual values.

Checked in Prometheus: pod tuned-96d6k is not ready because node compute-1 is NotReady. kube_pod_status_ready reflects the pod's Ready condition, which the node lifecycle controller sets to False for pods on a NotReady node, so it reports 0 even though "oc get pod" below still shows the stale 1/1 Running status:
kube_pod_status_ready{condition="true",namespace="openshift-cluster-node-tuning-operator"} == 0
Element 	Value
kube_pod_status_ready{condition="true",container="kube-rbac-proxy-main",endpoint="https-main",instance="10.128.2.12:8443",job="kube-state-metrics",namespace="openshift-cluster-node-tuning-operator",pod="tuned-96d6k",service="kube-state-metrics"}	0
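Note that the series above carries a pod label but no node label, so even with correct templating {{ $labels.node }} would expand to an empty string. One way to pull the node name into the alert (a sketch only, not necessarily what the upstream fix does) would be to join against kube_pod_info, which does carry a node label:

# hypothetical variant of the alert expression; group_left copies the
# node label from the matching kube_pod_info series onto each result
(kube_pod_status_ready{condition="true",namespace="openshift-cluster-node-tuning-operator"} == 0)
  * on (namespace, pod) group_left (node) kube_pod_info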


# oc -n openshift-cluster-node-tuning-operator get pod -o wide
NAME                                            READY   STATUS    RESTARTS   AGE     IP              NODE              NOMINATED NODE   READINESS GATES
cluster-node-tuning-operator-7df4fdb8c9-nt2w4   1/1     Running   0          115m    10.129.0.14     control-plane-0   <none>           <none>
tuned-8rmp8                                     1/1     Running   0          6h10m   172.31.248.31   compute-0         <none>           <none>
tuned-96d6k                                     1/1     Running   0          6h10m   172.31.248.60   compute-1         <none>           <none>
tuned-jwpnq                                     1/1     Running   0          6h15m   172.31.248.29   control-plane-2   <none>           <none>
tuned-q9tp8                                     1/1     Running   0          6h15m   172.31.248.28   control-plane-1   <none>           <none>
tuned-wxxsg                                     1/1     Running   0          6h15m   172.31.248.39   control-plane-0   <none>           <none>

# oc get node compute-1
NAME        STATUS     ROLES    AGE     VERSION
compute-1   NotReady   worker   6h13m   v1.20.0+d9c52cc


alert: NTOPodsNotReady
expr: kube_pod_status_ready{condition="true",namespace="openshift-cluster-node-tuning-operator"} == 0
for: 30m
labels:
  severity: warning
annotations:
  message: Pod {{"{{"}} $labels.pod {{"}}"}} on node {{"{{"}} $labels.node {{"}}"}} is not ready.
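For context, Prometheus expands alert annotations with Go's template engine, and in that engine {{"{{"}} is an action that emits the literal string "{{". The escaped form above therefore renders the braces verbatim instead of evaluating the label references. A minimal standalone Go sketch (illustration only, not operator code) of what the templater effectively does with this string:

package main

import (
	"os"
	"text/template"
)

func main() {
	// The annotation as deployed: {{"{{"}} and {{"}}"}} are template
	// actions that print literal braces, so the " $labels.pod " text
	// between them is never parsed or evaluated.
	const msg = `Pod {{"{{"}} $labels.pod {{"}}"}} on node {{"{{"}} $labels.node {{"}}"}} is not ready.`
	t := template.Must(template.New("msg").Parse(msg))
	_ = t.Execute(os.Stdout, nil)
	// Prints: Pod {{ $labels.pod }} on node {{ $labels.node }} is not ready.
}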
********************************************
I think the following is better, as the placeholders would then be replaced with the actual values:
annotations:
  message: Pod {{ $labels.pod }} on node {{ $labels.node }} is not ready. 
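Assuming the alert ships as a PrometheusRule object in the operator's namespace (the exact resource name may differ per release), the deployed annotation could be checked with something like:

# inspect the rendered alert annotations in the operator namespace
oc -n openshift-cluster-node-tuning-operator get prometheusrules -o yaml | grep -B1 -A1 'message:'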



Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2021-01-18-000316

How reproducible:
always

Steps to Reproduce:
1. See the description.

Actual results:
The alert message shows the literal placeholders {{ $labels.pod }} and {{ $labels.node }}.

Expected results:
The alert message shows the actual pod and node names.

Additional info:

Comment 1 Junqi Zhao 2021-01-18 10:07:44 UTC
Created attachment 1748424 [details]
NTOPodsNotReady alert

Comment 2 Jiří Mencák 2021-01-18 12:48:34 UTC
Thank you for the report, upstream PR to fix this: https://github.com/openshift/cluster-node-tuning-operator/pull/193

Comment 6 errata-xmlrpc 2021-02-24 15:53:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633