Bug 2108054 - Report alert when upstream CSI driver is found
Summary: Report alert when upstream CSI driver is found
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.12.0
Assignee: Jan Safranek
QA Contact: Wei Duan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-07-18 11:43 UTC by Jan Safranek
Modified: 2023-01-17 19:53 UTC (History)
CC List: 0 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-17 19:53:01 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-storage-operator pull 305 0 None open Bug 2108054: Add alert about unsupported CSI driver 2022-08-02 12:31:36 UTC
Github openshift cluster-storage-operator pull 308 0 None open Bug 2108054: Allow Prometheus to scan openshift-cluster-csi-drivers namespace 2022-08-17 11:31:16 UTC
Github openshift library-go pull 1384 0 None open Bug 2108054: Add ReadGenericWithUnstructured 2022-08-05 10:14:51 UTC
Github openshift vmware-vsphere-csi-driver-operator pull 100 0 None open Bug 2108054: Reset metrics only after all check succeed 2022-07-18 16:02:03 UTC
Red Hat Product Errata RHSA-2022:7399 0 None None None 2023-01-17 19:53:15 UTC

Description Jan Safranek 2022-07-18 11:43:51 UTC
There should be an info-level alert when a non-OCP vSphere CSI driver is found.

In https://bugzilla.redhat.com/show_bug.cgi?id=2089419 I added a metric that reports when an upstream CSI driver is installed, but OCP does not emit any alert for it.

The fix must also configure OCP to actually collect metrics from the vSphere CSI driver operator.

Comment 2 Jan Safranek 2022-07-20 08:41:21 UTC
We need to define the alert in CSO.

Comment 4 Wei Duan 2022-08-10 13:25:28 UTC
1. Unsupported csi.vsphere.vmware.com driver detected:
$ oc get co storage -o yaml
  - lastTransitionTime: "2022-08-10T03:20:15Z"
    message: 'VSphereCSIDriverOperatorCRUpgradeable: VMwareVSphereControllerUpgradeable:
      found existing unsupported csi.vsphere.vmware.com driver'
    reason: AsExpected
    status: "True"
    type: Upgradeable

2. No alert fired:
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/alerts' | jq -r '.data.alerts[] | {alertname: .labels.alertname, state: .state}'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2048    0  2048    0     0   111k      0 --:--:-- --:--:-- --:--:--  111k
{
  "alertname": "AlertmanagerReceiversNotConfigured",
  "state": "firing"
}
{
  "alertname": "Watchdog",
  "state": "firing"
}
{
  "alertname": "PodSecurityViolation",
  "state": "firing"
}
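
For scripted verification, the same check can be done on the JSON returned by /api/v1/alerts. The following is a minimal Python sketch, not part of the original test run: the sample payload is trimmed to the alertname and state fields printed above, and the helper name alert_states is hypothetical.

```python
import json

# Sample payload, reduced to the fields shown by the jq filter above.
SAMPLE = json.loads("""
{"status": "success", "data": {"alerts": [
  {"labels": {"alertname": "AlertmanagerReceiversNotConfigured"}, "state": "firing"},
  {"labels": {"alertname": "Watchdog"}, "state": "firing"},
  {"labels": {"alertname": "PodSecurityViolation"}, "state": "firing"}
]}}
""")


def alert_states(payload, alertname):
    """Return the state of every alert whose alertname label matches."""
    return [
        alert["state"]
        for alert in payload["data"]["alerts"]
        if alert.get("labels", {}).get("alertname") == alertname
    ]


if __name__ == "__main__":
    # An empty list means no alert with that name is pending or firing.
    print(alert_states(SAMPLE, "UnsupportedCSIDriverInstalled"))
```

An empty result for UnsupportedCSIDriverInstalled corresponds to the "no alert fired" observation in this step.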


3. The vsphere_csi_driver_error metric does not seem to be present:
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -i vsphere
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 75855    0 75855    0     0  5291k      0 --:--:-- --:--:-- --:--:-- 5291k
    "cloudprovider_vsphere_vcenter_versions",
    "cluster:vsphere_esxi_version_total:sum",
    "cluster:vsphere_node_hw_version_total:sum",
    "cluster:vsphere_vcenter_info:sum",
    "vsphere_cluster_check_errors",
    "vsphere_cluster_check_total",
    "vsphere_datastore_total",
    "vsphere_esxi_version_total",
    "vsphere_node_check_errors",
    "vsphere_node_check_total",
    "vsphere_node_hw_version_total",
    "vsphere_rwx_volumes_total",
    "vsphere_sync_errors",

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -i vsphere_csi
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 75855    0 75855    0     0  3220k      0 --:--:-- --:--:-- --:--:-- 3220k
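
The two grep filters above can also be expressed as a small Python helper. This is only a sketch: METRIC_NAMES holds just the names matched by the first grep (not the full response of the label values endpoint), and metrics_matching is a hypothetical helper name.

```python
# Metric names returned by the first grep above (not the full list from Prometheus).
METRIC_NAMES = [
    "cloudprovider_vsphere_vcenter_versions",
    "cluster:vsphere_esxi_version_total:sum",
    "cluster:vsphere_node_hw_version_total:sum",
    "cluster:vsphere_vcenter_info:sum",
    "vsphere_cluster_check_errors",
    "vsphere_cluster_check_total",
    "vsphere_datastore_total",
    "vsphere_esxi_version_total",
    "vsphere_node_check_errors",
    "vsphere_node_check_total",
    "vsphere_node_hw_version_total",
    "vsphere_rwx_volumes_total",
    "vsphere_sync_errors",
]


def metrics_matching(names, needle):
    """Case-insensitive substring match, like `grep -i`."""
    needle = needle.lower()
    return [name for name in names if needle in name.lower()]


if __name__ == "__main__":
    # No name contains "vsphere_csi", so the operator metric is not being scraped.
    print(metrics_matching(METRIC_NAMES, "vsphere_csi"))
```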

Comment 5 Wei Duan 2022-08-10 13:33:08 UTC
The cluster was upgraded from 4.9, with the upstream CSI driver installed, to 4.12.0-0.nightly-2022-08-09-223806.

Comment 6 Jan Safranek 2022-08-17 11:23:40 UTC
I forgot to add RBAC rules that allow Prometheus to scan the openshift-cluster-csi-drivers namespace.
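
For illustration, RBAC of this kind usually takes the form of a namespaced Role plus a RoleBinding for the Prometheus service account. The Python sketch below only builds and prints such objects; it is not the content of the actual pull request. The object name example-prometheus-scrape is hypothetical, and the resources, verbs, and the prometheus-k8s service account in openshift-monitoring are assumptions based on the common pattern for letting cluster monitoring discover scrape targets.

```python
import json

NAMESPACE = "openshift-cluster-csi-drivers"  # namespace named in this comment
RBAC_NAME = "example-prometheus-scrape"      # hypothetical object name


def prometheus_rbac(namespace, name):
    """Build a Role and RoleBinding that let Prometheus discover targets in a namespace."""
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [{
            "apiGroups": [""],
            # Assumed set of resources needed for service discovery.
            "resources": ["services", "endpoints", "pods"],
            "verbs": ["get", "list", "watch"],
        }],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": name,
        },
        # Assumed subject: the service account used by the cluster Prometheus.
        "subjects": [{
            "kind": "ServiceAccount",
            "name": "prometheus-k8s",
            "namespace": "openshift-monitoring",
        }],
    }
    return role, binding


if __name__ == "__main__":
    for obj in prometheus_rbac(NAMESPACE, RBAC_NAME):
        print(json.dumps(obj, indent=2))
```

Without permissions of this kind, Prometheus cannot list the operator's metrics endpoints, which would explain why no vsphere_csi metric was visible in comment 4.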

Comment 8 Wei Duan 2022-08-25 11:55:48 UTC
The alert is raised on 4.12.0-0.nightly-2022-08-24-053339.

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/alerts' | jq -r '.data.alerts[] | select(.labels.alertname == "UnsupportedCSIDriverInstalled")'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4169    0  4169    0     0   313k      0 --:--:-- --:--:-- --:--:--  313k
{
  "labels": {
    "alertname": "UnsupportedCSIDriverInstalled",
    "condition": "install_blocked",
    "container": "vmware-vsphere-csi-driver-operator",
    "endpoint": "vsphere-omp",
    "failure_reason": "existing_driver_found",
    "instance": "10.128.0.27:8445",
    "job": "vmware-vsphere-csi-driver-operator-metrics",
    "namespace": "openshift-cluster-csi-drivers",
    "pod": "vmware-vsphere-csi-driver-operator-858bbfcb6f-9jmr2",
    "service": "vmware-vsphere-csi-driver-operator-metrics",
    "severity": "info"
  },
  "annotations": {
    "description": "OpenShift has detected that an unsupported version of vSphere CSI driver is installed.\nIt is OK to use this CSI driver for now, however, Red Hat does not support it.\nIn a future OpenShift version it will be required to use OpenShift's version of the CSI\ndriver to correctly migrate vSphere PersistentVolumes to CSI. Please consult OpenShift\nrelease notes before upgrading to the next version.\nTo get a version of the CSI driver supported by Red Hat, uninstall the CSI driver,\nincluding its Deployment, DaemonSet and CSIDriver objects and OpenShift will\nautomatically install a supported version of the CSI driver.\n",
    "message": "An unsupported version of vSphere CSI driver installation detected.",
    "summary": "Unsupported VSphere CSI driver installed"
  },
  "state": "firing",
  "activeAt": "2022-08-25T10:13:04.427695578Z",
  "value": "1e+00"
}
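
A verification like this could be automated along the following lines. This is a rough Python sketch, assuming the alert object has the shape shown above; the sample is reduced to the fields being checked, and is_expected_alert is a hypothetical helper, not part of any OpenShift tooling.

```python
# Alert object from the output above, reduced to the fields that are checked.
SAMPLE_ALERT = {
    "labels": {
        "alertname": "UnsupportedCSIDriverInstalled",
        "condition": "install_blocked",
        "failure_reason": "existing_driver_found",
        "namespace": "openshift-cluster-csi-drivers",
        "severity": "info",
    },
    "state": "firing",
    "value": "1e+00",
}


def is_expected_alert(alert):
    """Check name, severity, state, and sample value of the alert."""
    labels = alert.get("labels", {})
    return (
        labels.get("alertname") == "UnsupportedCSIDriverInstalled"
        and labels.get("severity") == "info"
        and alert.get("state") == "firing"
        # Prometheus reports the sample value as a string such as "1e+00".
        and float(alert.get("value", "nan")) == 1.0
    )


if __name__ == "__main__":
    print(is_expected_alert(SAMPLE_ALERT))
```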


Marking status as "Verified".

Comment 11 errata-xmlrpc 2023-01-17 19:53:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399

