Bug 1772564
| Summary: | need alerts for aggregated API metrics | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | David Eads <deads> | |
| Component: | kube-apiserver | Assignee: | Lili Cosic <lcosic> | |
| Status: | CLOSED ERRATA | QA Contact: | Ke Wang <kewang> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 4.3.0 | CC: | aos-bugs, lszaszki, mfojtik, nagrawal, sttts, xxia | |
| Target Milestone: | --- | |||
| Target Release: | 4.4.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Release Note | ||
| Doc Text: | Added AggregatedAPIErrors Prometheus alert: An aggregated API has reported errors. The number of errors has increased for it in the past five minutes. High values indicate that the availability of the service changes too often. |
| Story Points: | --- | |
| Clone Of: | ||||
| : | 1810424 | Environment: | |
| Last Closed: | 2020-05-04 11:15:35 UTC | Type: | Bug | |
| Bug Depends On: | 1810424 | |||
| Bug Blocks: | ||||
Description
David Eads
2019-11-14 15:58:30 UTC
I am assigning it to Lili as she will use it to merge https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/358 into the master branch.
Checked the latest OCP 4.4 nightly build, the related PR https://github.com/openshift/cluster-kube-apiserver-operator/pull/746 has not been merged in.
[ke@ke-fedora cluster-kube-apiserver-operator]$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-03-26-041820 |grep kube-apiserver
cluster-kube-apiserver-operator https://github.com/openshift/cluster-kube-apiserver-operator c1adad084525c6252b5725d237b59d1068d145a2
[ke@ke-fedora cluster-kube-apiserver-operator]$ git log --date local --pretty="%h %an %cd - %s" c1adad08 | grep '#746'
Nothing found.
@Ke the alerts have been added in https://github.com/openshift/cluster-monitoring-operator/pull/669, could you check one more time?
@Lukasz, I checked PR 669 with OCP build 4.4.0-0.nightly-2020-03-26-041820, it's already in.
[ke@ke-fedora cluster-monitoring-operator]$ git log --date local --pretty="%h %an %cd - %s" 76b306f2 | grep '#669'
dfb08550 OpenShift Merge Robot Sat Mar 7 05:52:14 2020 - Merge pull request #669 from lilic/update-deps-4.4
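The `git log ... | grep '#669'` check can be wrapped in a small helper. This is only a sketch: `pr_merged` is a hypothetical name, and it assumes the repository uses the "Merge pull request #N from ..." subject line seen in the output above. Unlike a plain `grep '#669'`, it does not match a longer PR number such as #6690.

```shell
# Hypothetical helper (not part of oc or git).
# Reads `git log --pretty=...` output on stdin and succeeds only if it
# contains a merge commit for exactly PR number $1.
pr_merged() {
  # The trailing group rejects longer numbers that merely start with $1.
  grep -Eq "Merge pull request #$1([^0-9]|\$)"
}
```

Possible usage, from a checkout of the operator at the commit reported by `oc adm release info --commits`: `git log --pretty="%h %an %cd - %s" 76b306f2 | pr_merged 669 && echo "PR 669 is in this build"`.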
Verified with OCP build 4.4.0-0.nightly-2020-03-26-225521.
Verification steps:
1. Make some APIService fail, e.g. remove openshift-apiserver with:
$ oc patch openshiftapiserver cluster --type=json -p '[{"op": "replace", "path": "/spec/managementState", "value": "Removed"}]'
2. Wait more than 5 minutes, then run the following commands:
$ TK=`oc sa get-token cluster-monitoring-operator -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c alertmanager alertmanager-main-0 -- curl -s -k -H "Authorization: Bearer $TK" https://localhost:9095/api/v1/alerts | jq -r '.data[] | select(.labels.alertname=="AggregatedAPIDown") | .labels'
{
"alertname": "AggregatedAPIDown",
"name": "v1.security.openshift.io",
"namespace": "default",
"prometheus": "openshift-monitoring/k8s",
"severity": "warning"
}
{
"alertname": "AggregatedAPIDown",
"name": "v1.authorization.openshift.io",
"namespace": "default",
"prometheus": "openshift-monitoring/k8s",
"severity": "warning"
}
{
"alertname": "AggregatedAPIDown",
"name": "v1.oauth.openshift.io",
"namespace": "default",
"prometheus": "openshift-monitoring/k8s",
"severity": "warning"
}
{
"alertname": "AggregatedAPIDown",
"name": "v1.image.openshift.io",
"namespace": "default",
"prometheus": "openshift-monitoring/k8s",
"severity": "warning"
}
{
"alertname": "AggregatedAPIDown",
"name": "v1.route.openshift.io",
"namespace": "default",
"prometheus": "openshift-monitoring/k8s",
"severity": "warning"
}
{
"alertname": "AggregatedAPIDown",
"name": "v1.project.openshift.io",
"namespace": "default",
"prometheus": "openshift-monitoring/k8s",
"severity": "warning"
}
{
"alertname": "AggregatedAPIDown",
"name": "v1.build.openshift.io",
"namespace": "default",
"prometheus": "openshift-monitoring/k8s",
"severity": "warning"
}
{
"alertname": "AggregatedAPIDown",
"name": "v1.apps.openshift.io",
"namespace": "default",
"prometheus": "openshift-monitoring/k8s",
"severity": "warning"
}
{
"alertname": "AggregatedAPIDown",
"name": "v1.user.openshift.io",
"namespace": "default",
"prometheus": "openshift-monitoring/k8s",
"severity": "warning"
}
{
"alertname": "AggregatedAPIDown",
"name": "v1.template.openshift.io",
"namespace": "default",
"prometheus": "openshift-monitoring/k8s",
"severity": "warning"
}
{
"alertname": "AggregatedAPIDown",
"name": "v1.quota.openshift.io",
"namespace": "default",
"prometheus": "openshift-monitoring/k8s",
"severity": "warning"
}
In total, 11 APIs are reported down.
The feature works as expected with the PR merged into the OCP build.
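The count of firing alerts can be derived from the printed labels instead of being tallied by eye. The sketch below assumes the label objects are formatted one key per line, as in the jq output above; `count_alerts` and `down_apis` are hypothetical helper names, not existing tools.

```shell
# Hypothetical helpers; both read the jq-printed label objects on stdin.

# Print how many label objects carry the alert name given as $1.
count_alerts() {
  grep -c "\"alertname\": \"$1\""
}

# Print the "name" label (the affected APIService) of each object, sorted.
down_apis() {
  sed -n 's/^ *"name": "\(.*\)",\{0,1\}$/\1/p' | sort
}
```

For example, if the output of the step 2 command were saved to a file such as `labels.txt` (an assumed name), `count_alerts AggregatedAPIDown < labels.txt` would print the number of down APIs and `down_apis < labels.txt` would list them.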
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581