As described in https://github.com/openshift/enhancements/blob/master/enhancements/storage/vsphere-problem-detector.md#metrics, we want to get through telemetry:

* List of failed periodic tests.
* HW version of vSphere VMs.
* vCenter version.
* ESXi host version (of each host).

As a stretch goal:

* List of installed storage plugins (3rd party vendor VIBs) - if possible.
* List of enabled features:
  * HA
  * DRS
  * SDRS w/DatastoreCluster
Tried to verify this bug with: 4.7.0-0.nightly-2021-01-21-235301

I can find the metrics in the telemeter client, but it looks like these metrics are not pushed to the telemeter server.
@Jan Could you help check whether we need to push the metrics to the telemeter server? Thanks!

$ oc -n openshift-monitoring get cm telemetry-config -o jsonpath="{.data.metrics\.yaml}"|grep vsphere
- '{__name__="cluster:vsphere_vcenter_info:sum"}'
- '{__name__="cluster:vsphere_esxi_version_total:sum"}'
- '{__name__="cluster:vsphere_node_hw_version_total:sum"}'

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=cluster:vsphere_vcenter_info:sum'|jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "cluster:vsphere_vcenter_info:sum",
          "version": "7.0.0"
        },
        "value": [
          1611320313.197,
          "1"
        ]
      }
    ]
  }
}

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=cluster:vsphere_esxi_version_total:sum'|jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "cluster:vsphere_esxi_version_total:sum",
          "version": "7.0.0"
        },
        "value": [
          1611320397.471,
          "3"
        ]
      }
    ]
  }
}

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=cluster:vsphere_node_hw_version_total:sum'|jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "cluster:vsphere_node_hw_version_total:sum",
          "hw_version": "vmx-13"
        },
        "value": [
          1611320510.435,
          "5"
        ]
      }
    ]
  }
}
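For anyone re-checking these metrics, here is a minimal sketch of how the query responses above could be summarized programmatically instead of being read by eye. It assumes the response has the instant-query "vector" shape shown in the output above; the function name summarize_vector is hypothetical and not part of any OpenShift or Prometheus tooling, and the sample text is copied from the vCenter query result in this comment.

```python
import json

# Sample copied from the cluster:vsphere_vcenter_info:sum query output above.
SAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "cluster:vsphere_vcenter_info:sum",
          "version": "7.0.0"
        },
        "value": [1611320313.197, "1"]
      }
    ]
  }
}
"""


def summarize_vector(response_text):
    """Return (metric name, labels, value) tuples from an instant-query response.

    Hypothetical helper for illustration only; it assumes the JSON layout
    seen in the output above ("status", "data.resultType", "data.result").
    """
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("query did not succeed: %r" % payload.get("status"))

    data = payload["data"]
    if data.get("resultType") != "vector":
        raise ValueError("expected a vector result, got %r" % data.get("resultType"))

    rows = []
    for sample in data["result"]:
        metric = sample["metric"]
        # Everything except __name__ is a label (e.g. version, hw_version).
        labels = {k: v for k, v in metric.items() if k != "__name__"}
        # Each value is a [timestamp, "string value"] pair.
        _timestamp, raw_value = sample["value"]
        rows.append((metric["__name__"], labels, float(raw_value)))
    return rows


if __name__ == "__main__":
    for name, labels, value in summarize_vector(SAMPLE_RESPONSE):
        print(name, labels, value)
```

The same function should work on the ESXi and HW version responses, since they only differ in the metric name and label key. An empty list would mean the recording rule returned no series.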
I opened https://gitlab.cee.redhat.com/observatorium/configuration/-/merge_requests/270 to add the new metrics to the observatorium allow list, hoping that's all that is needed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633