Bug 2108054

Summary: Report alert when upstream CSI driver is found
Product: OpenShift Container Platform
Reporter: Jan Safranek <jsafrane>
Component: Storage
Assignee: Jan Safranek <jsafrane>
Storage sub component: Operators
QA Contact: Wei Duan <wduan>
Status: CLOSED ERRATA
Docs Contact:
Severity: high    
Priority: unspecified    
Version: 4.10   
Target Milestone: ---   
Target Release: 4.12.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-01-17 19:53:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jan Safranek 2022-07-18 11:43:51 UTC
There should be an info-level alert when a non-OCP vSphere CSI driver is found.

In https://bugzilla.redhat.com/show_bug.cgi?id=2089419 I added a metric that reports when an upstream CSI driver is installed, but OCP does not emit any alert for it.

The fix must also configure OCP to actually collect metrics from the vSphere CSI driver operator.

Comment 2 Jan Safranek 2022-07-20 08:41:21 UTC
We need to define the alert in CSO.

Comment 4 Wei Duan 2022-08-10 13:25:28 UTC
1. Unsupported csi.vsphere.vmware.com driver detected:
$ oc get co storage -o yaml
  - lastTransitionTime: "2022-08-10T03:20:15Z"
    message: 'VSphereCSIDriverOperatorCRUpgradeable: VMwareVSphereControllerUpgradeable:
      found existing unsupported csi.vsphere.vmware.com driver'
    reason: AsExpected
    status: "True"
    type: Upgradeable

2. No alert fired:
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/alerts' | jq -r '.data.alerts[] | {alertname: .labels.alertname, state: .state}'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2048    0  2048    0     0   111k      0 --:--:-- --:--:-- --:--:--  111k
{
  "alertname": "AlertmanagerReceiversNotConfigured",
  "state": "firing"
}
{
  "alertname": "Watchdog",
  "state": "firing"
}
{
  "alertname": "PodSecurityViolation",
  "state": "firing"
}


3. The vsphere_csi_driver_error metric does not seem to be present:
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -i vsphere
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 75855    0 75855    0     0  5291k      0 --:--:-- --:--:-- --:--:-- 5291k
    "cloudprovider_vsphere_vcenter_versions",
    "cluster:vsphere_esxi_version_total:sum",
    "cluster:vsphere_node_hw_version_total:sum",
    "cluster:vsphere_vcenter_info:sum",
    "vsphere_cluster_check_errors",
    "vsphere_cluster_check_total",
    "vsphere_datastore_total",
    "vsphere_esxi_version_total",
    "vsphere_node_check_errors",
    "vsphere_node_check_total",
    "vsphere_node_hw_version_total",
    "vsphere_rwx_volumes_total",
    "vsphere_sync_errors",

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -i vsphere_csi
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 75855    0 75855    0     0  3220k      0 --:--:-- --:--:-- --:--:-- 3220k
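
The same check can be done without grep by parsing the JSON that /api/v1/label/__name__/values returns. This is only a minimal sketch: the helper name metric_names_matching and the sample payload are made up for illustration, and it assumes the standard Prometheus response shape ({"status": "success", "data": [...]}).

```python
import json
from typing import List


def metric_names_matching(label_values_json: str, needle: str) -> List[str]:
    """Return the metric names that contain `needle`, case-insensitively.

    `label_values_json` is the raw body returned by
    /api/v1/label/__name__/values (hypothetical helper, not part of any tool).
    """
    payload = json.loads(label_values_json)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    names = payload.get("data", [])
    return sorted(n for n in names if needle.lower() in n.lower())


# Illustrative payload only; a real cluster returns many more names.
sample_label_values = json.dumps(
    {"status": "success", "data": ["vsphere_sync_errors", "up", "vsphere_node_check_total"]}
)

if __name__ == "__main__":
    print(metric_names_matching(sample_label_values, "vsphere"))
    print(metric_names_matching(sample_label_values, "vsphere_csi"))
```

An empty list for "vsphere_csi" corresponds to the empty grep output above, meaning Prometheus has not scraped any such metric.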

Comment 5 Wei Duan 2022-08-10 13:33:08 UTC
The cluster was upgraded from 4.9, with the upstream CSI driver installed, to 4.12.0-0.nightly-2022-08-09-223806.

Comment 6 Jan Safranek 2022-08-17 11:23:40 UTC
I forgot to add the RBAC rules that allow Prometheus to scan the openshift-cluster-csi-drivers namespace.

Comment 8 Wei Duan 2022-08-25 11:55:48 UTC
The alert is raised on 4.12.0-0.nightly-2022-08-24-053339.

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/alerts' | jq -r '.data.alerts[] | select(.labels.alertname == "UnsupportedCSIDriverInstalled")'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4169    0  4169    0     0   313k      0 --:--:-- --:--:-- --:--:--  313k
{
  "labels": {
    "alertname": "UnsupportedCSIDriverInstalled",
    "condition": "install_blocked",
    "container": "vmware-vsphere-csi-driver-operator",
    "endpoint": "vsphere-omp",
    "failure_reason": "existing_driver_found",
    "instance": "10.128.0.27:8445",
    "job": "vmware-vsphere-csi-driver-operator-metrics",
    "namespace": "openshift-cluster-csi-drivers",
    "pod": "vmware-vsphere-csi-driver-operator-858bbfcb6f-9jmr2",
    "service": "vmware-vsphere-csi-driver-operator-metrics",
    "severity": "info"
  },
  "annotations": {
    "description": "OpenShift has detected that an unsupported version of vSphere CSI driver is installed.\nIt is OK to use this CSI driver for now, however, Red Hat does not support it.\nIn a future OpenShift version it will be required to use OpenShift's version of the CSI\ndriver to correctly migrate vSphere PersistentVolumes to CSI. Please consult OpenShift\nrelease notes before upgrading to the next version.\nTo get a version of the CSI driver supported by Red Hat, uninstall the CSI driver,\nincluding its Deployment, DaemonSet and CSIDriver objects and OpenShift will\nautomatically install a supported version of the CSI driver.\n",
    "message": "An unsupported version of vSphere CSI driver installation detected.",
    "summary": "Unsupported VSphere CSI driver installed"
  },
  "state": "firing",
  "activeAt": "2022-08-25T10:13:04.427695578Z",
  "value": "1e+00"
}
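
For scripted verification, the jq filter above can be mirrored in a few lines of Python. This is a rough sketch, not the procedure actually used here: select_alerts and the sample payload are hypothetical, and it assumes the /api/v1/alerts body has the shape {"data": {"alerts": [...]}} implied by the jq expression.

```python
import json
from typing import Any, Dict, List


def select_alerts(alerts_json: str, alertname: str) -> List[Dict[str, Any]]:
    """Return alerts whose labels.alertname equals `alertname`.

    `alerts_json` is the raw body returned by /api/v1/alerts
    (hypothetical helper for illustration).
    """
    payload = json.loads(alerts_json)
    alerts = payload.get("data", {}).get("alerts", [])
    return [a for a in alerts if a.get("labels", {}).get("alertname") == alertname]


# Illustrative payload, trimmed to the fields the check needs.
sample_alerts = json.dumps({
    "status": "success",
    "data": {"alerts": [
        {"labels": {"alertname": "Watchdog"}, "state": "firing"},
        {"labels": {"alertname": "UnsupportedCSIDriverInstalled", "severity": "info"},
         "state": "firing"},
    ]},
})

if __name__ == "__main__":
    for alert in select_alerts(sample_alerts, "UnsupportedCSIDriverInstalled"):
        print(alert["labels"]["severity"], alert["state"])
```

A non-empty result with state "firing" and severity "info" matches what this bug asks for.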


Marking status as "Verified".

Comment 11 errata-xmlrpc 2023-01-17 19:53:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399