Bug 1904503 - vsphere-problem-detector: emit alerts
Summary: vsphere-problem-detector: emit alerts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Hemant Kumar
QA Contact: Qin Ping
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-12-04 16:08 UTC by Jan Safranek
Modified: 2021-02-24 15:40 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:38:11 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-storage-operator pull 126 | 0 | None | closed | Bug 1904503: Add prometheus alerts for vsphere | 2021-01-19 06:29:55 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:40:03 UTC

Description Jan Safranek 2020-12-04 16:08:43 UTC
vsphere-problem-detector should emit alerts when it finds a serious issue with the vSphere configuration.

* Decide what to alert on (how long a check must be failing before the alert fires, and when to clear the event).
* Update AlertManager.
* Make sure the events are documented, so users know what to do and where to find details when the alert fires.
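
To illustrate the first item, here is a minimal sketch of "fail for a while before firing, clear as soon as the check passes". It is not the detector's implementation. `CheckAlert`, `hold_seconds`, and the 600-second value are hypothetical; this report does not state the thresholds actually chosen. The state names follow Prometheus alert states (inactive, pending, firing).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckAlert:
    """Tracks one check; hypothetical helper, not part of vsphere-problem-detector."""

    hold_seconds: float                    # assumed: how long a check must fail before firing
    failing_since: Optional[float] = None  # timestamp of the first failure in the current streak

    def observe(self, now: float, failed: bool) -> str:
        """Record one check result and return the resulting alert state."""
        if not failed:
            # A passing check clears the alert immediately.
            self.failing_since = None
            return "inactive"
        if self.failing_since is None:
            self.failing_since = now
        if now - self.failing_since >= self.hold_seconds:
            return "firing"
        return "pending"


alert = CheckAlert(hold_seconds=600)  # 600 s is an illustrative value only
for now, failed in [(0, True), (300, True), (600, True), (660, False), (700, True)]:
    print(now, alert.observe(now, failed))
```

In a Prometheus alerting rule, the hold period would normally be expressed with the rule's `for` field rather than in code.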

Comment 2 Qin Ping 2021-01-22 13:31:42 UTC
Verified with: 4.7.0-0.nightly-2021-01-21-235301

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/alerts' |jq|grep VSphereOpenshiftClusterHealthFail -A 15
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7106    0  7106    0     0   408k      0 --:--:-- --:--:-- --:--:--  385k
          "alertname": "VSphereOpenshiftClusterHealthFail",
          "check": "CheckPVs",
          "container": "vsphere-problem-detector-operator",
          "endpoint": "vsphere-metrics",
          "instance": "10.130.0.75:8444",
          "job": "vsphere-problem-detector-metrics",
          "namespace": "openshift-cluster-storage-operator",
          "pod": "vsphere-problem-detector-operator-f46b5cfb-xclx6",
          "service": "vsphere-problem-detector-metrics",
          "severity": "warning"
        },
        "annotations": {
          "message": "VSphere cluster health checks are failing with CheckPVs"
        },
        "state": "firing",
        "activeAt": "2021-01-22T13:20:52.396347327Z",
--
          "alertname": "VSphereOpenshiftClusterHealthFail",
          "check": "CheckStorageClasses",
          "container": "vsphere-problem-detector-operator",
          "endpoint": "vsphere-metrics",
          "instance": "10.130.0.75:8444",
          "job": "vsphere-problem-detector-metrics",
          "namespace": "openshift-cluster-storage-operator",
          "pod": "vsphere-problem-detector-operator-f46b5cfb-xclx6",
          "service": "vsphere-problem-detector-metrics",
          "severity": "warning"
        },
        "annotations": {
          "message": "VSphere cluster health checks are failing with CheckStorageClasses"
        },
        "state": "firing",
        "activeAt": "2021-01-22T13:20:52.396347327Z",
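
For anyone repeating this verification, the `jq | grep` step can also be done by parsing the JSON. The following is a rough sketch, assuming the standard Prometheus `/api/v1/alerts` response shape (`data.alerts[]` with `labels`, `annotations`, `state`). `firing_alerts` is a hypothetical helper, and `SAMPLE` is a trimmed copy of the output above plus one made-up unrelated alert for contrast.

```python
def firing_alerts(payload, alertname):
    """Return (check, message) pairs for firing alerts with the given alertname."""
    alerts = payload.get("data", {}).get("alerts", [])
    return [
        (a["labels"].get("check"), a.get("annotations", {}).get("message"))
        for a in alerts
        if a.get("state") == "firing"
        and a.get("labels", {}).get("alertname") == alertname
    ]


# Trimmed sample; only the fields used by the helper are kept.
SAMPLE = {
    "status": "success",
    "data": {
        "alerts": [
            {
                "labels": {"alertname": "VSphereOpenshiftClusterHealthFail", "check": "CheckPVs"},
                "annotations": {"message": "VSphere cluster health checks are failing with CheckPVs"},
                "state": "firing",
            },
            {
                "labels": {"alertname": "VSphereOpenshiftClusterHealthFail", "check": "CheckStorageClasses"},
                "annotations": {"message": "VSphere cluster health checks are failing with CheckStorageClasses"},
                "state": "firing",
            },
            {
                # Made-up unrelated alert, included only to show the filter.
                "labels": {"alertname": "SomeOtherAlert"},
                "annotations": {"message": "unrelated"},
                "state": "firing",
            },
        ]
    },
}

for check, message in firing_alerts(SAMPLE, "VSphereOpenshiftClusterHealthFail"):
    print(f"{check}: {message}")
```

To use it against a live cluster, the response body from the `curl` command above could be loaded with `json.load` and passed in place of `SAMPLE`.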

Comment 5 errata-xmlrpc 2021-02-24 15:38:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

