Bug 1772564
Summary: need alerts for aggregated API metrics
Product: OpenShift Container Platform
Reporter: David Eads <deads>
Component: kube-apiserver
Assignee: Lili Cosic <lcosic>
Status: CLOSED ERRATA
QA Contact: Ke Wang <kewang>
Severity: high
Docs Contact:
Priority: high
Version: 4.3.0
CC: aos-bugs, lszaszki, mfojtik, nagrawal, sttts, xxia
Target Milestone: ---
Target Release: 4.4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Release Note
Doc Text: Added the AggregatedAPIErrors Prometheus alert: An aggregated API has reported errors. The number of errors for it has increased in the past five minutes. High values indicate that the availability of the service changes too often.
Story Points: ---
Clone Of:
Clones: 1810424
Environment:
Last Closed: 2020-05-04 11:15:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1810424
Bug Blocks:
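The Doc Text above describes the AggregatedAPIErrors condition as an increase in an aggregated API's error count over a five-minute window. The Python sketch below illustrates that kind of check on raw counter samples. It is not the shipped alerting rule: the `(timestamp, value)` sample format, the `threshold` argument, and the helper names `counter_increase` and `should_fire` are hypothetical, and the reset handling is a simplified version of what Prometheus's `increase()` does.

```python
from typing import List, Optional, Tuple

# Hypothetical sample format: (unix_timestamp_seconds, counter_value).
Sample = Tuple[float, float]


def counter_increase(samples: List[Sample], window_s: float, now: float) -> float:
    """Simplified increase() over the window [now - window_s, now].

    Sums the deltas between consecutive samples inside the window and treats
    any drop in value as a counter reset. Unlike Prometheus, it does not
    extrapolate to the window boundaries.
    """
    in_window = [v for t, v in sorted(samples) if now - window_s <= t <= now]
    total = 0.0
    for prev, cur in zip(in_window, in_window[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # Counter reset: the counter restarted from zero, so the new
            # value is the increase since the reset.
            total += cur
    return total


def should_fire(samples: List[Sample], threshold: float,
                window_s: float = 300.0, now: Optional[float] = None) -> bool:
    """Fire when the error counter grew by more than `threshold` within the
    window (default 300 s, matching the five minutes in the Doc Text)."""
    if now is None:
        now = max(t for t, _ in samples)
    return counter_increase(samples, window_s, now) > threshold
```

For example, samples `[(0, 0), (60, 1), (120, 3), (180, 1), (240, 2)]` contain one reset (3 to 1) and yield an increase of 5 over the five-minute window. Prometheus's real `increase()` also extrapolates to the window edges, so it can return non-integer values even for integer counters.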
Description
David Eads
2019-11-14 15:58:30 UTC
I am assigning this bug to Lili, as she will use it to merge https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/358 into the master branch.

I checked the latest OCP 4.4 nightly build; the related PR https://github.com/openshift/cluster-kube-apiserver-operator/pull/746 has not been merged into it.

    [ke@ke-fedora cluster-kube-apiserver-operator]$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-03-26-041820 |grep kube-apiserver
    cluster-kube-apiserver-operator https://github.com/openshift/cluster-kube-apiserver-operator c1adad084525c6252b5725d237b59d1068d145a2
    [ke@ke-fedora cluster-kube-apiserver-operator]$ git log --date local --pretty="%h %an %cd - %s" c1adad08 | grep '#746'

Nothing found.

@Ke, the alerts were added in https://github.com/openshift/cluster-monitoring-operator/pull/669. Could you check one more time?

@Lukasz, I checked PR 669 against OCP build 4.4.0-0.nightly-2020-03-26-041820; it is already included.

    [ke@ke-fedora cluster-monitoring-operator]$ git log --date local --pretty="%h %an %cd - %s" 76b306f2 | grep '#669'
    dfb08550 OpenShift Merge Robot Sat Mar 7 05:52:14 2020 - Merge pull request #669 from lilic/update-deps-4.4

Verified with OCP build 4.4.0-0.nightly-2020-03-26-225521. Verification steps:

1. Make some APIServices unavailable, for example by setting the openshift-apiserver management state to Removed:

    $ oc patch openshiftapiserver cluster --type=json -p '[{"op": "replace", "path": "/spec/managementState", "value": "Removed"}]'

2. Wait more than 5 minutes, then run:

    $ TK=`oc sa get-token cluster-monitoring-operator -n openshift-monitoring`
    $ oc -n openshift-monitoring exec -c alertmanager alertmanager-main-0 -- curl -s -k -H "Authorization: Bearer $TK" https://localhost:9095/api/v1/alerts | jq -r '.data[] | select(.labels.alertname=="AggregatedAPIDown") | .labels'
    { "alertname": "AggregatedAPIDown", "name": "v1.security.openshift.io", "namespace": "default", "prometheus": "openshift-monitoring/k8s", "severity": "warning" }
    { "alertname": "AggregatedAPIDown", "name": "v1.authorization.openshift.io", "namespace": "default", "prometheus": "openshift-monitoring/k8s", "severity": "warning" }
    { "alertname": "AggregatedAPIDown", "name": "v1.oauth.openshift.io", "namespace": "default", "prometheus": "openshift-monitoring/k8s", "severity": "warning" }
    { "alertname": "AggregatedAPIDown", "name": "v1.image.openshift.io", "namespace": "default", "prometheus": "openshift-monitoring/k8s", "severity": "warning" }
    { "alertname": "AggregatedAPIDown", "name": "v1.route.openshift.io", "namespace": "default", "prometheus": "openshift-monitoring/k8s", "severity": "warning" }
    { "alertname": "AggregatedAPIDown", "name": "v1.project.openshift.io", "namespace": "default", "prometheus": "openshift-monitoring/k8s", "severity": "warning" }
    { "alertname": "AggregatedAPIDown", "name": "v1.build.openshift.io", "namespace": "default", "prometheus": "openshift-monitoring/k8s", "severity": "warning" }
    { "alertname": "AggregatedAPIDown", "name": "v1.apps.openshift.io", "namespace": "default", "prometheus": "openshift-monitoring/k8s", "severity": "warning" }
    { "alertname": "AggregatedAPIDown", "name": "v1.user.openshift.io", "namespace": "default", "prometheus": "openshift-monitoring/k8s", "severity": "warning" }
    { "alertname": "AggregatedAPIDown", "name": "v1.template.openshift.io", "namespace": "default", "prometheus": "openshift-monitoring/k8s", "severity": "warning" }
    { "alertname": "AggregatedAPIDown", "name": "v1.quota.openshift.io", "namespace": "default", "prometheus": "openshift-monitoring/k8s", "severity": "warning" }

In total, 11 APIs are reported down. The alert works as expected now that the PR is merged into the OCP build.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581
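The jq filter used in verification step 2 can also be expressed in Python, which makes it easy to count or assert on the firing alerts in a script. This is a sketch that assumes only the response shape implied by that filter (a top-level `data` list whose entries carry a `labels` object); the function names `firing_labels` and `down_api_names` are hypothetical.

```python
import json
from typing import Any, Dict, List


def firing_labels(response_body: str, alertname: str) -> List[Dict[str, Any]]:
    """Equivalent of: jq '.data[] | select(.labels.alertname==NAME) | .labels'."""
    payload = json.loads(response_body)
    return [
        alert["labels"]
        for alert in payload.get("data", [])
        if alert.get("labels", {}).get("alertname") == alertname
    ]


def down_api_names(response_body: str) -> List[str]:
    """Sorted APIService names that currently have an AggregatedAPIDown alert."""
    return sorted(
        labels["name"]
        for labels in firing_labels(response_body, "AggregatedAPIDown")
    )
```

Feeding it the body returned by the `curl` command in step 2 would give one entry per down APIService, so in the run above `len(down_api_names(body))` would be expected to equal the 11 reported.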