Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1908344

Summary: [vsphere-problem-detector] CheckNodeProviderID and CheckNodeDiskUUID have the same name
Product: OpenShift Container Platform
Reporter: Qin Ping <piqin>
Component: Storage
Assignee: Jan Safranek <jsafrane>
Storage sub component: Operators
QA Contact: Qin Ping <piqin>
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
CC: aos-bugs, jsafrane, xtian
Version: 4.7
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2021-02-24 15:45:05 UTC
Type: Bug

Description Qin Ping 2020-12-16 13:31:23 UTC
Description of problem:
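CheckNodeProviderID and CheckNodeDiskUUID are both registered under the same name, CheckNodeDiskUUID. As a result the metrics are wrong: both checks are reported as CheckNodeDiskUUID, and there is no series for CheckNodeProviderID.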
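To show why a shared name skews the counter, here is a minimal Go sketch. It is not the detector's actual code. The `check` type, `runOnce`, and the map standing in for a Prometheus counter vector labelled by `check` and `node` are all hypothetical. Because the provider ID check reuses the disk UUID name, every run increments the CheckNodeDiskUUID series twice per node. After two runs that series reads 4 while the other checks read 2, and no CheckNodeProviderID series is ever created. This pattern is consistent with the values under "Actual results".

```go
package main

import "fmt"

// seriesKey mimics the label set of one vsphere_node_check_total series
// (only the "check" and "node" labels are modelled here).
type seriesKey struct {
	check string
	node  string
}

// check is a hypothetical stand-in for a registered node check.
type check struct {
	name string
	run  func(node string) error
}

// runOnce runs every check on every node and increments the series selected
// by the check's name, as a counter vector labelled by name would.
func runOnce(checks []check, nodes []string, total map[seriesKey]int) {
	for _, node := range nodes {
		for _, c := range checks {
			_ = c.run(node) // the check result does not matter for counting
			total[seriesKey{c.name, node}]++
		}
	}
}

func pass(string) error { return nil }

// buggyChecks reproduces the naming mistake: the provider ID check
// reuses the disk UUID check's name.
func buggyChecks() []check {
	return []check{
		{name: "CheckNodeDiskUUID", run: pass},
		{name: "CheckNodeDiskUUID", run: pass}, // meant to be "CheckNodeProviderID"
		{name: "CollectNodeESXiVersion", run: pass},
		{name: "CollectNodeHWVersion", run: pass},
	}
}

func main() {
	total := map[seriesKey]int{}
	nodes := []string{"master-0"}
	for i := 0; i < 2; i++ { // two hypothetical detector runs
		runOnce(buggyChecks(), nodes, total)
	}
	fmt.Println("CheckNodeDiskUUID:", total[seriesKey{"CheckNodeDiskUUID", "master-0"}])           // 4
	fmt.Println("CollectNodeESXiVersion:", total[seriesKey{"CollectNodeESXiVersion", "master-0"}]) // 2
	_, ok := total[seriesKey{"CheckNodeProviderID", "master-0"}]
	fmt.Println("CheckNodeProviderID series exists:", ok) // false
}
```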

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-12-14-165231

How reproducible:
Always

Steps to Reproduce:
1. Install an OCP cluster on vsphere
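2. Query the vsphere_node_check_total metric from Prometheus. $token must hold a bearer token that is authorized to query the Prometheus API:
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=vsphere_node_check_total' | jq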

Actual results:
After 8 hours, vsphere_node_check_total{check="CheckNodeDiskUUID"} has the value 4:
      {
        "metric": {
          "__name__": "vsphere_node_check_total",
          "check": "CheckNodeDiskUUID",
          "container": "vsphere-problem-detector",
          "endpoint": "https",
          "instance": "10.129.0.64:8444",
          "job": "vsphere-problem-detector-metrics",
          "namespace": "openshift-cluster-storage-operator",
          "node": "piqin-1216-qffmm-master-0",
          "pod": "cluster-storage-operator-54d959f78b-v4vf2",
          "service": "vsphere-problem-detector-metrics"
        },
        "value": [
          1608124501.267,
          "4"
        ]
      },
By contrast, vsphere_node_check_total{check="CollectNodeESXiVersion"} and vsphere_node_check_total{check="CollectNodeHWVersion"} have the value 2:
      {
        "metric": {
          "__name__": "vsphere_node_check_total",
          "check": "CollectNodeESXiVersion",
          "container": "vsphere-problem-detector",
          "endpoint": "https",
          "instance": "10.129.0.64:8444",
          "job": "vsphere-problem-detector-metrics",
          "namespace": "openshift-cluster-storage-operator",
          "node": "piqin-1216-qffmm-worker-lh6vx",
          "pod": "cluster-storage-operator-54d959f78b-v4vf2",
          "service": "vsphere-problem-detector-metrics"
        },
        "value": [
          1608124501.267,
          "2"
        ]
      },

      {
        "metric": {
          "__name__": "vsphere_node_check_total",
          "check": "CollectNodeHWVersion",
          "container": "vsphere-problem-detector",
          "endpoint": "https",
          "instance": "10.129.0.64:8444",
          "job": "vsphere-problem-detector-metrics",
          "namespace": "openshift-cluster-storage-operator",
          "node": "piqin-1216-qffmm-worker-lh6vx",
          "pod": "cluster-storage-operator-54d959f78b-v4vf2",
          "service": "vsphere-problem-detector-metrics"
        },
        "value": [
          1608124501.267,
          "2"
        ]
      }
There is no vsphere_node_check_total{check="CheckNodeProviderID"} series at all.


Expected results:
vsphere_node_check_total reports a separate CheckNodeProviderID series, and the CheckNodeDiskUUID series counts only CheckNodeDiskUUID runs.
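One way to keep this kind of mistake from going unnoticed is to reject duplicate names when checks are registered. The Go sketch below is purely illustrative. `validateNames` is a hypothetical helper, and this report does not describe how the actual fix was implemented. With distinct names, each check gets its own series, which matches the expected result above.

```go
package main

import (
	"fmt"
	"sort"
)

// validateNames returns an error listing every name used by more than one check.
// Hypothetical helper; the real detector may register checks differently.
func validateNames(names []string) error {
	count := map[string]int{}
	for _, n := range names {
		count[n]++
	}
	var dups []string
	for n, c := range count {
		if c > 1 {
			dups = append(dups, n)
		}
	}
	if len(dups) > 0 {
		sort.Strings(dups) // deterministic error message
		return fmt.Errorf("duplicate check names: %v", dups)
	}
	return nil
}

func main() {
	buggy := []string{"CheckNodeDiskUUID", "CheckNodeDiskUUID", "CollectNodeESXiVersion", "CollectNodeHWVersion"}
	fixed := []string{"CheckNodeDiskUUID", "CheckNodeProviderID", "CollectNodeESXiVersion", "CollectNodeHWVersion"}
	fmt.Println(validateNames(buggy)) // duplicate check names: [CheckNodeDiskUUID]
	fmt.Println(validateNames(fixed)) // <nil>
}
```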

Comment 2 Qin Ping 2021-01-06 06:00:09 UTC
Verified with: 4.7.0-0.nightly-2021-01-05-220959

Comment 5 errata-xmlrpc 2021-02-24 15:45:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633