Bug 1905141 - vsphere-problem-detector: report metrics through telemetry
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Jan Safranek
QA Contact: Qin Ping
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-12-07 16:07 UTC by Jan Safranek
Modified: 2021-02-24 15:40 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:40:32 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-monitoring-operator pull 1037 | 0 | None | closed | Bug 1905141: Add vsphere-problem-detector to telemetry | 2021-02-01 06:15:57 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:40:49 UTC

Description Jan Safranek 2020-12-07 16:07:22 UTC
As described in https://github.com/openshift/enhancements/blob/master/enhancements/storage/vsphere-problem-detector.md#metrics, we want to report the following through telemetry:

* List of failed periodic tests.
* HW version of vSphere VMs.
* vCenter version.
* ESXi host version (of each host).

As a stretch goal:
* List of installed storage plugins (3rd party vendor VIBs) - if possible.
* List of enabled features:
  * HA
  * DRS
  * SDRS w/DatastoreCluster

Comment 2 Qin Ping 2021-01-22 13:13:25 UTC
Tried to verify this bug with 4.7.0-0.nightly-2021-01-21-235301.
I can find the metrics in the telemeter client, but it looks like they are not pushed to the telemeter server.

@Jan, could you help check whether we need to push the metrics to the telemeter server? Thanks!



$ oc -n openshift-monitoring get cm telemetry-config -o jsonpath="{.data.metrics\.yaml}"|grep vsphere
- '{__name__="cluster:vsphere_vcenter_info:sum"}'
- '{__name__="cluster:vsphere_esxi_version_total:sum"}'
- '{__name__="cluster:vsphere_node_hw_version_total:sum"}'
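The grep output above shows the allowlist entries as Prometheus label matchers on `__name__`. The following is a minimal sketch, assuming the `metrics.yaml` payload consists of quoted `{__name__="..."}` entries as shown, of how one might check that every expected vSphere metric is allowlisted. The names `allowlisted_metric_names`, `SAMPLE`, and `EXPECTED` are hypothetical and not part of any OpenShift tooling; the sample text is copied from the output above.

```python
import re

# Matches the metric name inside a matcher such as {__name__="cluster:foo:sum"}.
NAME_MATCHER = re.compile(r'\{__name__="([^"]+)"\}')


def allowlisted_metric_names(text):
    """Return the set of metric names found in allowlist matcher lines."""
    return set(NAME_MATCHER.findall(text))


# Sample input copied from the grep output in this comment.
SAMPLE = """\
- '{__name__="cluster:vsphere_vcenter_info:sum"}'
- '{__name__="cluster:vsphere_esxi_version_total:sum"}'
- '{__name__="cluster:vsphere_node_hw_version_total:sum"}'
"""

# Hypothetical list of metrics one expects to see (an assumption for illustration).
EXPECTED = {
    "cluster:vsphere_vcenter_info:sum",
    "cluster:vsphere_esxi_version_total:sum",
    "cluster:vsphere_node_hw_version_total:sum",
}

# Any name left in this set is expected but absent from the allowlist text.
missing = EXPECTED - allowlisted_metric_names(SAMPLE)
print("missing from allowlist:", sorted(missing))
```

In practice the text would come from the `oc ... get cm telemetry-config` command rather than from an inline string.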

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=cluster:vsphere_vcenter_info:sum'|jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "cluster:vsphere_vcenter_info:sum",
          "version": "7.0.0"
        },
        "value": [
          1611320313.197,
          "1"
        ]
      }
    ]
  }
}

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=cluster:vsphere_esxi_version_total:sum'|jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "cluster:vsphere_esxi_version_total:sum",
          "version": "7.0.0"
        },
        "value": [
          1611320397.471,
          "3"
        ]
      }
    ]
  }
}

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=cluster:vsphere_node_hw_version_total:sum'|jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "cluster:vsphere_node_hw_version_total:sum",
          "hw_version": "vmx-13"
        },
        "value": [
          1611320510.435,
          "5"
        ]
      }
    ]
  }
}
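Each of the three responses above has the shape of a Prometheus instant-query result: a `vector` whose samples carry a label set and a `[timestamp, value]` pair. The following is a minimal sketch of flattening such a response into a label-to-value mapping. `summarize_vector` is a hypothetical helper, and the sample is the ESXi response copied from this comment.

```python
import json


def summarize_vector(response_text, label):
    """Map each sample's `label` value to its numeric sample value."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError("query failed: %r" % body.get("status"))
    data = body["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector, got %r" % data["resultType"])
    summary = {}
    for sample in data["result"]:
        # Each value is [unix_timestamp, "value-as-string"].
        _timestamp, raw_value = sample["value"]
        summary[sample["metric"].get(label, "")] = float(raw_value)
    return summary


# Sample copied from the cluster:vsphere_esxi_version_total:sum query above.
ESXI_RESPONSE = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "cluster:vsphere_esxi_version_total:sum",
          "version": "7.0.0"
        },
        "value": [1611320397.471, "3"]
      }
    ]
  }
}
"""

print(summarize_vector(ESXI_RESPONSE, "version"))
```

Read this way, the three responses appear to report one vCenter at version 7.0.0, three ESXi hosts at version 7.0.0, and five nodes at hardware version vmx-13.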

Comment 3 Jan Safranek 2021-01-25 16:14:34 UTC
I opened https://gitlab.cee.redhat.com/observatorium/configuration/-/merge_requests/270 to add the new metrics to the Observatorium allow list, hoping that's all that is needed.

Comment 13 errata-xmlrpc 2021-02-24 15:40:32 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

