Description of problem:
The CheckNodeProviderID and CheckNodeDiskUUID checks both report under the same name, CheckNodeDiskUUID. As a result, the vsphere_node_check_total metric is incorrect: the CheckNodeDiskUUID series is inflated and there is no series for CheckNodeProviderID.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-12-14-165231

How reproducible:
Always

Steps to Reproduce:
1. Install an OCP cluster on vSphere.
2. Check the metrics:
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=vsphere_node_check_total' | jq

Actual results:
After 8 hours, vsphere_node_check_total.CheckNodeDiskUUID has value 4:

{
  "metric": {
    "__name__": "vsphere_node_check_total",
    "check": "CheckNodeDiskUUID",
    "container": "vsphere-problem-detector",
    "endpoint": "https",
    "instance": "10.129.0.64:8444",
    "job": "vsphere-problem-detector-metrics",
    "namespace": "openshift-cluster-storage-operator",
    "node": "piqin-1216-qffmm-master-0",
    "pod": "cluster-storage-operator-54d959f78b-v4vf2",
    "service": "vsphere-problem-detector-metrics"
  },
  "value": [1608124501.267, "4"]
},

while vsphere_node_check_total.CollectNodeESXiVersion and vsphere_node_check_total.CollectNodeHWVersion have value 2:

{
  "metric": {
    "__name__": "vsphere_node_check_total",
    "check": "CollectNodeESXiVersion",
    "container": "vsphere-problem-detector",
    "endpoint": "https",
    "instance": "10.129.0.64:8444",
    "job": "vsphere-problem-detector-metrics",
    "namespace": "openshift-cluster-storage-operator",
    "node": "piqin-1216-qffmm-worker-lh6vx",
    "pod": "cluster-storage-operator-54d959f78b-v4vf2",
    "service": "vsphere-problem-detector-metrics"
  },
  "value": [1608124501.267, "2"]
},
{
  "metric": {
    "__name__": "vsphere_node_check_total",
    "check": "CollectNodeHWVersion",
    "container": "vsphere-problem-detector",
    "endpoint": "https",
    "instance": "10.129.0.64:8444",
    "job": "vsphere-problem-detector-metrics",
    "namespace": "openshift-cluster-storage-operator",
    "node": "piqin-1216-qffmm-worker-lh6vx",
    "pod": "cluster-storage-operator-54d959f78b-v4vf2",
    "service": "vsphere-problem-detector-metrics"
  },
  "value": [1608124501.267, "2"]
}

There is no metric for vsphere_node_check_total.CheckNodeProviderID. The doubled CheckNodeDiskUUID count is consistent with both checks incrementing the same "check" label.

Expected results:
A vsphere_node_check_total.CheckNodeProviderID metric is reported, and vsphere_node_check_total.CheckNodeDiskUUID counts only the disk UUID check.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
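To illustrate why a shared check name produces exactly this pattern, here is a minimal Go simulation. It is not the vsphere-problem-detector source. It assumes that each check increments a counter once per run, keyed by its "check" and "node" labels. The slices buggyChecks and fixedChecks and the functions runChecks and printCounts are hypothetical; only the check names come from this report.

```go
package main

import (
	"fmt"
	"sort"
)

// checkKey mirrors the "check" and "node" labels of vsphere_node_check_total.
type checkKey struct {
	check string
	node  string
}

// buggyChecks (hypothetical) reproduces the reported naming: the provider ID
// check reports under the disk UUID name.
var buggyChecks = []string{
	"CheckNodeDiskUUID",
	"CheckNodeDiskUUID", // should have been "CheckNodeProviderID"
	"CollectNodeESXiVersion",
	"CollectNodeHWVersion",
}

// fixedChecks (hypothetical) gives every check its own name.
var fixedChecks = []string{
	"CheckNodeDiskUUID",
	"CheckNodeProviderID",
	"CollectNodeESXiVersion",
	"CollectNodeHWVersion",
}

// runChecks runs every check once per round on one node and increments a
// counter keyed by (check name, node). Samples with identical label values
// land in the same series, just as they would in a labelled counter.
func runChecks(checkNames []string, node string, rounds int) map[checkKey]int {
	counts := make(map[checkKey]int)
	for r := 0; r < rounds; r++ {
		for _, name := range checkNames {
			counts[checkKey{check: name, node: node}]++
		}
	}
	return counts
}

// printCounts prints one line per series, sorted by check name for stable output.
func printCounts(title string, counts map[checkKey]int) {
	keys := make([]checkKey, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i].check < keys[j].check })
	fmt.Println(title)
	for _, k := range keys {
		fmt.Printf("  check=%s node=%s value=%d\n", k.check, k.node, counts[k])
	}
}

func main() {
	printCounts("buggy:", runChecks(buggyChecks, "master-0", 2))
	printCounts("fixed:", runChecks(fixedChecks, "master-0", 2))
}
```

After two rounds, the buggy list yields CheckNodeDiskUUID=4, CollectNodeESXiVersion=2 and CollectNodeHWVersion=2, with no CheckNodeProviderID series. The fixed list yields four series, each with value 2.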
Verified with: 4.7.0-0.nightly-2021-01-05-220959
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633