Bug 2076637
| Summary: | Configure metrics for vsphere driver to be reported | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Hemant Kumar <hekumar> |
| Component: | Storage | Assignee: | Jan Safranek <jsafrane> |
| Storage sub component: | Operators | QA Contact: | Wei Duan <wduan> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | jsafrane |
| Version: | 4.11 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.11.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-08-10 11:07:44 UTC | Type: | --- |
|
Description
Hemant Kumar
2022-04-19 14:26:02 UTC
Adding driver + syncer metrics to Prometheus.
The driver reports the regular Go runtime and process metrics (process_open_fds, go_threads, ...). The only driver-specific metric is `vsphere_csi_info`:
# HELP vsphere_csi_info CSI Info
# TYPE vsphere_csi_info gauge
vsphere_csi_info{version="09175db5"} 1
The syncer is mostly the same. Its only specific metric is `vsphere_syncer_info`:
# HELP vsphere_syncer_info Syncer Info
# TYPE vsphere_syncer_info gauge
vsphere_syncer_info{version="09175db5"} 1
(to be honest, the metrics don't look very useful)
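The two info gauges above carry the build version as a label, so their main use is reading that label back. A minimal sketch of doing so in Python follows. It uses a simplified regex parser written for this illustration (it is not the official Prometheus client parser and ignores escapes and timestamps), and the helper names `parse_samples` and `info_versions` are hypothetical. The sample text is copied from the description.

```python
import re

# Exposition text copied from the bug description.
SAMPLE = """\
# HELP vsphere_csi_info CSI Info
# TYPE vsphere_csi_info gauge
vsphere_csi_info{version="09175db5"} 1
# HELP vsphere_syncer_info Syncer Info
# TYPE vsphere_syncer_info gauge
vsphere_syncer_info{version="09175db5"} 1
"""

# Simplified line shape: name{label="value",...} value
# Assumption: no escaped quotes in label values and no trailing timestamp.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric name, labels dict, float value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def info_versions(text):
    """Map each *_info metric to the value of its "version" label."""
    return {
        name: labels.get("version")
        for name, labels, _ in parse_samples(text)
        if name.endswith("_info")
    }
```

Calling `info_versions(SAMPLE)` on the text above yields the same version string for the driver and the syncer.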
Verified on 4.11.0-0.nightly-2022-04-23-153426.
The following metrics are reported in Prometheus:
"vsphere_csi_driver_error",
"vsphere_csi_info",
"vsphere_csi_volume_ops_histogram_bucket",
"vsphere_csi_volume_ops_histogram_count",
"vsphere_csi_volume_ops_histogram_sum",
"vsphere_full_sync_ops_histogram_bucket",
"vsphere_full_sync_ops_histogram_count",
"vsphere_full_sync_ops_histogram_sum",
"vsphere_sync_errors",
"vsphere_syncer_info",
Moving status to "VERIFIED".
On an upgraded cluster (from 4.10.0-0.nightly-2022-04-23-095048 to 4.11.0-0.nightly-2022-04-23-153426), only vsphere_csi_driver_error and vsphere_sync_errors are present. Will double check. Moving status to "ON_QA".
This should be fixed by the library-go bump in https://github.com/openshift/vmware-vsphere-csi-driver-operator/pull/85
Verified on the upgrade case:
"vsphere_cns_volume_ops_histogram_bucket",
"vsphere_cns_volume_ops_histogram_count",
"vsphere_cns_volume_ops_histogram_sum",
"vsphere_csi_info",
"vsphere_full_sync_ops_histogram_bucket",
"vsphere_full_sync_ops_histogram_count",
"vsphere_full_sync_ops_histogram_sum",
"vsphere_sync_errors",
"vsphere_syncer_info",
$ oc -n openshift-cluster-csi-drivers get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
vmware-vsphere-csi-driver-controller-metrics ClusterIP 172.30.38.193 <none> 442/TCP,443/TCP,444/TCP,445/TCP,446/TCP,447/TCP 7h48m
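The two verification lists in this report (fresh 4.11 install and upgrade from 4.10) are not identical. A small Python sketch for comparing them is below. The metric names are copied from the two lists; the variable and function names are hypothetical, and the report itself does not comment on the differences.

```python
# Metric names copied from the fresh-install verification comment.
FRESH_INSTALL = {
    "vsphere_csi_driver_error",
    "vsphere_csi_info",
    "vsphere_csi_volume_ops_histogram_bucket",
    "vsphere_csi_volume_ops_histogram_count",
    "vsphere_csi_volume_ops_histogram_sum",
    "vsphere_full_sync_ops_histogram_bucket",
    "vsphere_full_sync_ops_histogram_count",
    "vsphere_full_sync_ops_histogram_sum",
    "vsphere_sync_errors",
    "vsphere_syncer_info",
}

# Metric names copied from the upgrade verification comment.
UPGRADE = {
    "vsphere_cns_volume_ops_histogram_bucket",
    "vsphere_cns_volume_ops_histogram_count",
    "vsphere_cns_volume_ops_histogram_sum",
    "vsphere_csi_info",
    "vsphere_full_sync_ops_histogram_bucket",
    "vsphere_full_sync_ops_histogram_count",
    "vsphere_full_sync_ops_histogram_sum",
    "vsphere_sync_errors",
    "vsphere_syncer_info",
}


def compare(first, second):
    """Return (names only in first, names only in second), each sorted."""
    return sorted(first - second), sorted(second - first)


if __name__ == "__main__":
    only_fresh, only_upgrade = compare(FRESH_INSTALL, UPGRADE)
    print("only on fresh install:", only_fresh)
    print("only on upgrade:", only_upgrade)
```

The same `compare` helper could be pointed at any two sets of metric names, for example an expected list and the names returned by a Prometheus query.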
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069 |