Bug 1908344 - [vsphere-problem-detector] CheckNodeProviderID and CheckNodeDiskUUID have the same name
Summary: [vsphere-problem-detector] CheckNodeProviderID and CheckNodeDiskUUID have the...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Jan Safranek
QA Contact: Qin Ping
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-12-16 13:31 UTC by Qin Ping
Modified: 2021-02-24 15:45 UTC
CC List: 3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:45:05 UTC
Target Upstream Version:
Embargoed:



Links
GitHub openshift/vsphere-problem-detector pull 16 (closed): "Bug 1908344: Fix CheckNodeProviderID name", last updated 2021-01-15 09:57:48 UTC
Red Hat Product Errata RHSA-2020:5633, last updated 2021-02-24 15:45:25 UTC

Description Qin Ping 2020-12-16 13:31:23 UTC
Description of problem:
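CheckNodeProviderID and CheckNodeDiskUUID report the same name, CheckNodeDiskUUID. This makes the vsphere_node_check_total metric incorrect: runs of both checks are counted under check="CheckNodeDiskUUID", and CheckNodeProviderID has no series of its own.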
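The following sketch reproduces the failure mode in a self-contained way. It assumes, hypothetically, that each node check is registered as a (name, function) pair. It also assumes that a counter keyed by the reported name, standing in for the `check` label of vsphere_node_check_total, is incremented every time a check runs. The names `Check`, `BUGGY`, `FIXED`, `run_checks` and the placeholder check bodies are illustrative only and are not taken from the vsphere-problem-detector source.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical check registry entry: (name reported in the "check" label, check body).
Check = Tuple[str, Callable[[], bool]]


def check_disk_uuid() -> bool:
    return True  # placeholder body; a real check would inspect the node's VM


def check_provider_id() -> bool:
    return True  # placeholder body


def collect_esxi_version() -> bool:
    return True  # placeholder body


def collect_hw_version() -> bool:
    return True  # placeholder body


# Buggy registry: the provider-ID check reports the disk-UUID name.
BUGGY: List[Check] = [
    ("CheckNodeDiskUUID", check_disk_uuid),
    ("CheckNodeDiskUUID", check_provider_id),  # wrong name, collides with the line above
    ("CollectNodeESXiVersion", collect_esxi_version),
    ("CollectNodeHWVersion", collect_hw_version),
]

# Fixed registry: every check reports its own, unique name.
FIXED: List[Check] = [
    ("CheckNodeDiskUUID", check_disk_uuid),
    ("CheckNodeProviderID", check_provider_id),
    ("CollectNodeESXiVersion", collect_esxi_version),
    ("CollectNodeHWVersion", collect_hw_version),
]


def run_checks(checks: List[Check], rounds: int) -> Counter:
    """Run each check `rounds` times and count runs per reported name,
    the way a counter keyed by a "check" label would."""
    totals: Counter = Counter()
    for _ in range(rounds):
        for name, check in checks:
            check()
            totals[name] += 1
    return totals


if __name__ == "__main__":
    print("buggy:", dict(run_checks(BUGGY, rounds=2)))
    print("fixed:", dict(run_checks(FIXED, rounds=2)))
```

With two rounds, the buggy registry counts 4 for CheckNodeDiskUUID, 2 for each of the other checks, and has no CheckNodeProviderID entry. This is the same pattern reported under Actual results. The fixed registry counts 2 for every check.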

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-12-14-165231

How reproducible:
Always

Steps to Reproduce:
1. Install an OCP cluster on vSphere.
2. Query the vsphere_node_check_total metric from Prometheus:
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=vsphere_node_check_total' | jq

Actual results:
After 8 hours, vsphere_node_check_total for check CheckNodeDiskUUID has value 4:
      {
        "metric": {
          "__name__": "vsphere_node_check_total",
          "check": "CheckNodeDiskUUID",
          "container": "vsphere-problem-detector",
          "endpoint": "https",
          "instance": "10.129.0.64:8444",
          "job": "vsphere-problem-detector-metrics",
          "namespace": "openshift-cluster-storage-operator",
          "node": "piqin-1216-qffmm-master-0",
          "pod": "cluster-storage-operator-54d959f78b-v4vf2",
          "service": "vsphere-problem-detector-metrics"
        },
        "value": [
          1608124501.267,
          "4"
        ]
      },
vsphere_node_check_total.CollectNodeESXiVersion and vsphere_node_check_total.CollectNodeHWVersion have value 2.
      {
        "metric": {
          "__name__": "vsphere_node_check_total",
          "check": "CollectNodeESXiVersion",
          "container": "vsphere-problem-detector",
          "endpoint": "https",
          "instance": "10.129.0.64:8444",
          "job": "vsphere-problem-detector-metrics",
          "namespace": "openshift-cluster-storage-operator",
          "node": "piqin-1216-qffmm-worker-lh6vx",
          "pod": "cluster-storage-operator-54d959f78b-v4vf2",
          "service": "vsphere-problem-detector-metrics"
        },
        "value": [
          1608124501.267,
          "2"
        ]
      },

      {
        "metric": {
          "__name__": "vsphere_node_check_total",
          "check": "CollectNodeHWVersion",
          "container": "vsphere-problem-detector",
          "endpoint": "https",
          "instance": "10.129.0.64:8444",
          "job": "vsphere-problem-detector-metrics",
          "namespace": "openshift-cluster-storage-operator",
          "node": "piqin-1216-qffmm-worker-lh6vx",
          "pod": "cluster-storage-operator-54d959f78b-v4vf2",
          "service": "vsphere-problem-detector-metrics"
        },
        "value": [
          1608124501.267,
          "2"
        ]
      }
There is no vsphere_node_check_total series for CheckNodeProviderID.


Expected results:
vsphere_node_check_total has a separate series for CheckNodeProviderID, and the CheckNodeDiskUUID series counts only runs of the CheckNodeDiskUUID check.
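The step 2 output can be summarized per `check` label so the expected result can be checked without reading the JSON by eye. This sketch assumes the standard Prometheus HTTP API instant-query response shape: `data.result[]`, where each entry has a `metric` label map and a `[timestamp, "value"]` pair. The function names `totals_by_check` and `missing_checks` are hypothetical. `SAMPLE` is a trimmed copy of the series shown under Actual results, with most labels dropped.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Set


def totals_by_check(response: dict) -> Dict[str, float]:
    """Sum vsphere_node_check_total across nodes for each "check" label."""
    totals: Dict[str, float] = defaultdict(float)
    for series in response["data"]["result"]:
        check = series["metric"]["check"]
        _timestamp, value = series["value"]
        totals[check] += float(value)  # Prometheus returns sample values as strings
    return dict(totals)


def missing_checks(response: dict, expected: Iterable[str]) -> Set[str]:
    """Return the expected check names that have no series at all."""
    present = {s["metric"]["check"] for s in response["data"]["result"]}
    return set(expected) - present


# Trimmed copy of the series shown under Actual results.
SAMPLE = json.loads("""
{"status": "success",
 "data": {"resultType": "vector", "result": [
  {"metric": {"check": "CheckNodeDiskUUID", "node": "piqin-1216-qffmm-master-0"},
   "value": [1608124501.267, "4"]},
  {"metric": {"check": "CollectNodeESXiVersion", "node": "piqin-1216-qffmm-worker-lh6vx"},
   "value": [1608124501.267, "2"]},
  {"metric": {"check": "CollectNodeHWVersion", "node": "piqin-1216-qffmm-worker-lh6vx"},
   "value": [1608124501.267, "2"]}
 ]}}
""")

if __name__ == "__main__":
    print(totals_by_check(SAMPLE))
    print(missing_checks(SAMPLE, ["CheckNodeDiskUUID", "CheckNodeProviderID"]))
```

To check a live cluster, replace SAMPLE with the saved output of the query (loaded with `json.load`). On an affected build, `missing_checks` returns {'CheckNodeProviderID'}. On a fixed build it should return an empty set.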


Comment 2 Qin Ping 2021-01-06 06:00:09 UTC
Verified with: 4.7.0-0.nightly-2021-01-05-220959

Comment 5 errata-xmlrpc 2021-02-24 15:45:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

