There should be an info-level alert when a non-OCP vSphere CSI driver is found. In https://bugzilla.redhat.com/show_bug.cgi?id=2089419 I added a metric that reports when an upstream CSI driver is installed, but OCP does not emit any alert for it. The fix must also configure OCP to actually collect metrics from the vSphere CSI driver operator.
We need to define the alert in CSO.
1. unsupported csi.vsphere.vmware.com driver detected

$ oc get co storage -o yaml
  - lastTransitionTime: "2022-08-10T03:20:15Z"
    message: 'VSphereCSIDriverOperatorCRUpgradeable: VMwareVSphereControllerUpgradeable: found existing unsupported csi.vsphere.vmware.com driver'
    reason: AsExpected
    status: "True"
    type: Upgradeable

2. no Alert fired:

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/alerts' | jq -r '.data.alerts[] | {alertname: .labels.alertname, state: .state}'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2048    0  2048    0     0   111k      0 --:--:-- --:--:-- --:--:--  111k
{
  "alertname": "AlertmanagerReceiversNotConfigured",
  "state": "firing"
}
{
  "alertname": "Watchdog",
  "state": "firing"
}
{
  "alertname": "PodSecurityViolation",
  "state": "firing"
}

3. Seems vsphere_csi_driver_error metric is not present

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -i vsphere
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 75855    0 75855    0     0  5291k      0 --:--:-- --:--:-- --:--:-- 5291k
    "cloudprovider_vsphere_vcenter_versions",
    "cluster:vsphere_esxi_version_total:sum",
    "cluster:vsphere_node_hw_version_total:sum",
    "cluster:vsphere_vcenter_info:sum",
    "vsphere_cluster_check_errors",
    "vsphere_cluster_check_total",
    "vsphere_datastore_total",
    "vsphere_esxi_version_total",
    "vsphere_node_check_errors",
    "vsphere_node_check_total",
    "vsphere_node_hw_version_total",
    "vsphere_rwx_volumes_total",
    "vsphere_sync_errors",

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -i vsphere_csi
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 75855    0 75855    0     0  3220k      0 --:--:-- --:--:-- --:--:-- 3220k
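The metric-name check in step 3 can also be scripted instead of piped through grep. The following is only a minimal sketch, assuming the response body of the /api/v1/label/__name__/values endpoint has already been saved as text; the function name metric_names_matching and the shortened sample data are hypothetical and not part of the operator or of OCP.

```python
import json


def metric_names_matching(response_text, needle):
    """Return metric names containing `needle`, ignoring case (like `grep -i`)."""
    payload = json.loads(response_text)
    # The label-values endpoint answers with {"status": "success", "data": [names...]}.
    names = payload.get("data", [])
    needle = needle.lower()
    return sorted(name for name in names if needle in name.lower())


# Hypothetical sample, shortened from the names listed in step 3.
sample = json.dumps({
    "status": "success",
    "data": [
        "vsphere_sync_errors",
        "vsphere_node_check_total",
        "vsphere_rwx_volumes_total",
    ],
})

# An empty list here corresponds to the empty grep result for vsphere_csi.
print(metric_names_matching(sample, "vsphere_csi"))
print(metric_names_matching(sample, "vsphere_node"))
```

An empty result only shows that Prometheus has not scraped such a metric; it does not say whether the operator exposes it.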
The cluster was upgraded from 4.9 with the upstream CSI driver to 4.12.0-0.nightly-2022-08-09-223806.
I forgot to add the RBAC rules that allow Prometheus to scrape the openshift-cluster-csi-drivers namespace.
The alert is raised on 4.12.0-0.nightly-2022-08-24-053339.

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/alerts' | jq -r '.data.alerts[] | select(.labels.alertname == "UnsupportedCSIDriverInstalled")'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4169    0  4169    0     0   313k      0 --:--:-- --:--:-- --:--:--  313k
{
  "labels": {
    "alertname": "UnsupportedCSIDriverInstalled",
    "condition": "install_blocked",
    "container": "vmware-vsphere-csi-driver-operator",
    "endpoint": "vsphere-omp",
    "failure_reason": "existing_driver_found",
    "instance": "10.128.0.27:8445",
    "job": "vmware-vsphere-csi-driver-operator-metrics",
    "namespace": "openshift-cluster-csi-drivers",
    "pod": "vmware-vsphere-csi-driver-operator-858bbfcb6f-9jmr2",
    "service": "vmware-vsphere-csi-driver-operator-metrics",
    "severity": "info"
  },
  "annotations": {
    "description": "OpenShift has detected that an unsupported version of vSphere CSI driver is installed.\nIt is OK to use this CSI driver for now, however, Red Hat does not support it.\nIn a future OpenShift version it will be required to use OpenShift's version of the CSI\ndriver to correctly migrate vSphere PersistentVolumes to CSI. Please consult OpenShift\nrelease notes before upgrading to the next version.\nTo get a version of the CSI driver supported by Red Hat, uninstall the CSI driver,\nincluding its Deployment, DaemonSet and CSIDriver objects and OpenShift will\nautomatically install a supported version of the CSI driver.\n",
    "message": "An unsupported version of vSphere CSI driver installation detected.",
    "summary": "Unsupported VSphere CSI driver installed"
  },
  "state": "firing",
  "activeAt": "2022-08-25T10:13:04.427695578Z",
  "value": "1e+00"
}

Marking status as "Verified".
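For repeated verification runs, the jq select filter used above can be mirrored in a short script. This is a rough sketch under the assumption that the /api/v1/alerts response body is available as text; find_alerts and the trimmed sample payload are hypothetical names and data used only for illustration.

```python
import json


def find_alerts(response_text, alertname):
    """Return all alerts whose alertname label equals `alertname`."""
    payload = json.loads(response_text)
    # The alerts endpoint answers with {"status": ..., "data": {"alerts": [...]}}.
    alerts = payload.get("data", {}).get("alerts", [])
    return [a for a in alerts if a.get("labels", {}).get("alertname") == alertname]


# Hypothetical sample, trimmed from the verification output above.
sample = json.dumps({
    "status": "success",
    "data": {
        "alerts": [
            {"labels": {"alertname": "Watchdog"}, "state": "firing"},
            {
                "labels": {
                    "alertname": "UnsupportedCSIDriverInstalled",
                    "severity": "info",
                },
                "state": "firing",
            },
        ]
    },
})

matches = find_alerts(sample, "UnsupportedCSIDriverInstalled")
for alert in matches:
    print(alert["labels"]["alertname"], alert["labels"].get("severity"), alert["state"])
```

Checking the severity label alongside the alert name confirms that the alert fires at the info level requested in the description.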
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399