Report API version of ESXi host and vCenter server
Reporting the API version lets us have accurate metrics on the patch release, etc., of ESXi and vCenter versions.
Tested on 4.9.0-0.nightly-2021-09-05-204238.

There are 3 masters + 2 workers in the cluster:

$ oc get node
NAME                             STATUS   ROLES    AGE    VERSION
wduan-0906a-rlj9t-master-0       Ready    master   108m   v1.22.0-rc.0+75ee307
wduan-0906a-rlj9t-master-1       Ready    master   108m   v1.22.0-rc.0+75ee307
wduan-0906a-rlj9t-master-2       Ready    master   108m   v1.22.0-rc.0+75ee307
wduan-0906a-rlj9t-worker-cd2rr   Ready    worker   98m    v1.22.0-rc.0+75ee307
wduan-0906a-rlj9t-worker-p5ktm   Ready    worker   98m    v1.22.0-rc.0+75ee307

From the vsphere-problem-detector log, it looks like only the masters are checked:

$ oc -n openshift-cluster-storage-operator logs vsphere-problem-detector-operator-fbf45bff-dpjl6 | egrep "ESXi version"
I0906 02:00:31.821529 1 node_esxi_version.go:83] Node wduan-0906a-rlj9t-master-0 runs on host host-203583 (10.3.32.4) with ESXi version: 7.0.2
I0906 02:00:31.822167 1 node_esxi_version.go:83] Node wduan-0906a-rlj9t-master-2 runs on host host-259503 (10.3.32.7) with ESXi version: 7.0.2
I0906 02:00:31.822608 1 node_esxi_version.go:83] Node wduan-0906a-rlj9t-master-1 runs on host host-259509 (10.3.32.9) with ESXi version: 7.0.2

Also, in the vsphere_esxi_version_total metric the value is "3":

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=vsphere_esxi_version_total' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   484    0   484    0     0  32266      0 --:--:-- --:--:-- --:--:-- 32266
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "vsphere_esxi_version_total",
          "api_version": "7.0.2.0",
          "container": "vsphere-problem-detector-operator",
          "endpoint": "vsphere-metrics",
          "instance": "10.129.0.4:8444",
          "job": "vsphere-problem-detector-metrics",
          "namespace": "openshift-cluster-storage-operator",
          "pod": "vsphere-problem-detector-operator-fbf45bff-dpjl6",
          "service": "vsphere-problem-detector-metrics",
          "version": "7.0.2"
        },
        "value": [
          1630898344.026,
          "3"
        ]
      }
    ]
  }
}

I have temporarily changed the status to "POST". Please let me know whether this is expected or whether I missed something here.
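The comparison made in this comment (metric value versus number of nodes) can be sketched in Python. This is only an illustration, not part of vsphere-problem-detector: the function name `count_by_version` is hypothetical, the response is an abbreviated copy of the JSON quoted in this comment with most labels trimmed, and it assumes the metric value is expected to match the number of nodes checked, as the comment treats it.

```python
import json

# Abbreviated copy of the Prometheus response quoted in this comment
# (only the labels used below are kept).
RESPONSE = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "vsphere_esxi_version_total",
          "api_version": "7.0.2.0",
          "version": "7.0.2"
        },
        "value": [1630898344.026, "3"]
      }
    ]
  }
}
"""


def count_by_version(response_text):
    """Return {(version, api_version): count} for an instant-vector result."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    counts = {}
    for sample in body["data"]["result"]:
        labels = sample["metric"]
        key = (labels.get("version"), labels.get("api_version"))
        # Each sample value is [timestamp, "value"], with the value as a string.
        counts[key] = counts.get(key, 0) + int(float(sample["value"][1]))
    return counts


counts = count_by_version(RESPONSE)
reported = sum(counts.values())
expected_nodes = 5  # 3 masters + 2 workers, from `oc get node`

print(counts)
if reported != expected_nodes:
    print(f"metric reports {reported}, cluster has {expected_nodes} nodes")
```

With the response above, the reported total is 3 against 5 nodes, which is the mismatch that prompted the question.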
The first check probably happens during installation, when there are only master nodes in the cluster. Please wait some time for the next check.
Checked in another cluster and waited for the second round of checks; all nodes are reported:

$ oc -n openshift-cluster-storage-operator logs vsphere-problem-detector-operator-fbf45bff-l9hkw | grep ESXi
I0906 05:04:52.426170 1 node_esxi_version.go:83] Node control-plane-0 runs on host host-259509 (10.3.32.9) with ESXi version: 7.0.2
I0906 05:04:52.426206 1 operator.go:305] CollectNodeESXiVersion:control-plane-0 passed
I0906 05:04:52.431054 1 node_esxi_version.go:83] Node control-plane-2 runs on host host-221014 (10.3.32.8) with ESXi version: 7.0.2
I0906 05:04:52.431087 1 operator.go:305] CollectNodeESXiVersion:control-plane-2 passed
I0906 05:04:52.443919 1 node_esxi_version.go:83] Node control-plane-1 runs on host host-172909 (10.3.32.5) with ESXi version: 7.0.2
I0906 05:04:52.443952 1 operator.go:305] CollectNodeESXiVersion:control-plane-1 passed
I0906 13:04:57.586066 1 node_esxi_version.go:83] Node compute-0 runs on host host-259503 (10.3.32.7) with ESXi version: 7.0.2
I0906 13:04:57.586097 1 operator.go:305] CollectNodeESXiVersion:compute-0 passed
I0906 13:04:57.588661 1 node_esxi_version.go:83] Node compute-1 runs on host host-203583 (10.3.32.4) with ESXi version: 7.0.2
I0906 13:04:57.588686 1 operator.go:305] CollectNodeESXiVersion:compute-1 passed
I0906 13:04:57.589594 1 node_esxi_version.go:83] Node control-plane-1 runs on host host-172909 (10.3.32.5) with ESXi version: 7.0.2
I0906 13:04:57.589828 1 operator.go:305] CollectNodeESXiVersion:control-plane-1 passed
I0906 13:04:57.593048 1 node_esxi_version.go:83] Node control-plane-0 runs on host host-259509 (10.3.32.9) with ESXi version: 7.0.2
I0906 13:04:57.593064 1 operator.go:305] CollectNodeESXiVersion:control-plane-0 passed
I0906 13:04:57.597787 1 node_esxi_version.go:83] Node control-plane-2 runs on host host-221014 (10.3.32.8) with ESXi version: 7.0.2
I0906 13:04:57.597806 1 operator.go:305] CollectNodeESXiVersion:control-plane-2 passed

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "vsphere_esxi_version_total",
          "api_version": "7.0.2.0",
          "container": "vsphere-problem-detector-operator",
          "endpoint": "vsphere-metrics",
          "instance": "10.129.0.3:8444",
          "job": "vsphere-problem-detector-metrics",
          "namespace": "openshift-cluster-storage-operator",
          "pod": "vsphere-problem-detector-operator-fbf45bff-l9hkw",
          "service": "vsphere-problem-detector-metrics",
          "version": "7.0.2"
        },
        "value": [
          1630934173.332,
          "5"
        ]
      }
    ]
  }
}

Marking this as "VERIFIED".
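To make the per-round node coverage easier to read off the log, the "runs on host" lines can be grouped by check round. The Python sketch below is illustrative only: `nodes_per_round` and `LINE_RE` are hypothetical names, the regular expression is written against the log lines quoted in this comment rather than any documented format guarantee, and treating lines that share the same date and minute as one round is an assumption.

```python
import re
from collections import defaultdict

# The "runs on host" lines quoted in this comment.
LOG = """\
I0906 05:04:52.426170 1 node_esxi_version.go:83] Node control-plane-0 runs on host host-259509 (10.3.32.9) with ESXi version: 7.0.2
I0906 05:04:52.431054 1 node_esxi_version.go:83] Node control-plane-2 runs on host host-221014 (10.3.32.8) with ESXi version: 7.0.2
I0906 05:04:52.443919 1 node_esxi_version.go:83] Node control-plane-1 runs on host host-172909 (10.3.32.5) with ESXi version: 7.0.2
I0906 13:04:57.586066 1 node_esxi_version.go:83] Node compute-0 runs on host host-259503 (10.3.32.7) with ESXi version: 7.0.2
I0906 13:04:57.588661 1 node_esxi_version.go:83] Node compute-1 runs on host host-203583 (10.3.32.4) with ESXi version: 7.0.2
I0906 13:04:57.589594 1 node_esxi_version.go:83] Node control-plane-1 runs on host host-172909 (10.3.32.5) with ESXi version: 7.0.2
I0906 13:04:57.593048 1 node_esxi_version.go:83] Node control-plane-0 runs on host host-259509 (10.3.32.9) with ESXi version: 7.0.2
I0906 13:04:57.597787 1 node_esxi_version.go:83] Node control-plane-2 runs on host host-221014 (10.3.32.8) with ESXi version: 7.0.2
"""

# Captures: date (MMDD), HH:MM, node name, host ID, host IP, ESXi version.
LINE_RE = re.compile(
    r"^I(\d{4})\s+(\d{2}:\d{2}):\d{2}\.\d+\s+\d+\s+\S+\]\s+"
    r"Node (\S+) runs on host (\S+) \(([^)]+)\) with ESXi version: (\S+)$"
)


def nodes_per_round(log_text):
    """Map (date, HH:MM) to {node: ESXi version}.

    Assumption: lines logged within the same minute belong to one check round.
    """
    rounds = defaultdict(dict)
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines such as "CollectNodeESXiVersion:... passed"
        day, minute, node, _host, _ip, version = match.groups()
        rounds[(day, minute)][node] = version
    return dict(rounds)


rounds = nodes_per_round(LOG)
for (day, minute), nodes in sorted(rounds.items()):
    print(day, minute, len(nodes), sorted(nodes))
```

For the log above, the 05:04 round covers the three control-plane nodes and the 13:04 round covers all five nodes, consistent with the metric value of "5".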
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759