Bug 1888866
Summary: | AggregatedAPIDown permanently firing after removing APIService | ||||||
---|---|---|---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Seth Jennings <sjenning> | ||||
Component: | kube-apiserver | Assignee: | Damien Grisonnet <dgrisonn> | ||||
Status: | CLOSED ERRATA | QA Contact: | Ke Wang <kewang> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 4.6 | CC: | alegrand, anpicker, aos-bugs, dgrisonn, erooth, kakkoyun, kechung, lcosic, lmartinh, mfojtik, palonsor, pkrupa, spasquie, surbania, xxia | ||||
Target Milestone: | --- | ||||||
Target Release: | 4.7.0 | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | No Doc Update | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
: | 1915247 1916660 | Environment: | |||||
Last Closed: | 2021-02-24 15:26:23 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1915247 | ||||||
Attachments: |
|
Description
Seth Jennings
2020-10-16 01:27:32 UTC
Created attachment 1721947 [details]
console.png
See attached console screenshot. APIServices triggering the alert do not exist anymore.
$ oc get apiservices | grep False
<nothing returned, all apiservices are responding>
$ oc get apiservices | grep v1.admission.hive.openshift.io
<nothing returned, apiservice triggering alert does not exist>
I'm observing this issue as well. Does a workaround (even an unsupported one) exist? I have an ephemeral monitoring stack and tried everything from deleting pods and prometheusrules to running 'oc delete --raw /metrics', but after a few minutes the alert still fires in my dashboard:
An aggregated API <name of the apiservice>/default has been only 0% available over the last 5m.
Also, this was previously reported upstream in GitHub, but there doesn't seem to be progress there. I'm going to link it to this BZ.
There is, unfortunately, no workaround for this. However, I noticed that this only affects the API services that were deleted while being unavailable. Maybe this information can help you somehow.
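One way to check the observation above (only APIServices deleted while unavailable are affected) is to compare the unavailability gauge exposed by kube-apiserver with the APIServices that still exist. The following is a minimal sketch, not a supported procedure. It assumes the alert is driven by the aggregator_unavailable_apiservice gauge and that the gauge carries a "name" label; the function name stale_unavailable_apiservices and the two input files are hypothetical and chosen for illustration.

```shell
# Print the names of APIServices that are still reported as unavailable
# in the metrics but are no longer registered in the cluster.
# $1: file with kube-apiserver metrics in Prometheus text format
# $2: file with the names of existing APIServices, one per line
stale_unavailable_apiservices() {
    # Keep only the gauge series (the trailing "{" excludes the *_total counter),
    # drop series whose value is 0, extract the name label, and remove
    # every name that still exists.
    grep '^aggregator_unavailable_apiservice{' "$1" \
        | awk '$NF != 0' \
        | sed -n 's/.*name="\([^"]*\)".*/\1/p' \
        | sort -u \
        | grep -vxF -f "$2" || true
}

# Example usage against a live cluster (requires cluster access):
#   oc get --raw /metrics > metrics.txt
#   oc get apiservices -o name | sed 's|.*/||' > existing.txt
#   stale_unavailable_apiservices metrics.txt existing.txt
```

Note that 'oc get --raw /metrics' is answered by a single kube-apiserver instance, so the check may need to be repeated to see every instance.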
> Also, this was previously reported upstream in GitHub, but there doesn't seem to be progress there. I'm going to link it to this BZ.
Yes, I tried to give it some traction but without any result. Hence, I am currently working on a PR to fix the issue.
I linked the PR I opened against Kubernetes to this BZ.
The upstream PR has an LGTM, so it is now in the hands of the API team to cherry-pick the fix.
We need 4.6 and 4.5 backports of this.
A workaround would be to restart the kube-apiservers after deleting the APIService. This should silence the AggregatedAPIDown alert.
Lowering to high priority and severity as a workaround exists.
Manually moving this BZ to MODIFIED as the upstream PR was synced in 4.7 by the 1.20 rebase.
https://github.com/openshift/kubernetes/pull/471/commits/b525f9e0ed0003471438fb42fa37ff4ebe36d653
$ oc adm release info --commits registry.ci.openshift.org/ocp/release:4.7.0-0.nightly-2021-01-18-214951 | grep 'hyperkube'
hyperkube https://github.com/openshift/kubernetes d9c52cc4e02894215b0d1c2aeea240fe77765c66
$ cd kubernetes
$ git pull
$ git log --date=local --pretty="%h %an %cd - %s" d9c52cc4 | grep '#96421'
5ed4b76a03b Kubernetes Prow Robot Thu Nov 26 23:24:19 2020 - Merge pull request #96421 from dgrisonnet/fix-apiservice-availability
$ git log --date=local --pretty="%h %an %cd - %s" d9c52cc4 | grep '#92671'
No results found.
PR 92671 has not yet landed in the latest OCP 4.7 payload; I will wait for it to land.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:5633 |
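The workaround mentioned in this thread (restart the kube-apiservers after deleting the APIService) does not say how to trigger the restart. The following is a minimal sketch of one possible way, assuming cluster-admin access and assuming that changing spec.forceRedeploymentReason on the "cluster" KubeAPIServer operator resource is used to roll out new kube-apiserver pods. The function names redeploy_patch and delete_apiservice_and_restart are hypothetical, and nothing here was confirmed by the participants of this bug.

```shell
# Build the merge patch that requests a new kube-apiserver rollout.
# The operator redeploys when the value of forceRedeploymentReason changes.
# $1: free-form reason string (must not contain double quotes or backslashes)
redeploy_patch() {
    printf '{"spec":{"forceRedeploymentReason":"%s"}}' "$1"
}

# Hypothetical helper: delete the APIService, then force a rollout so the
# restarted kube-apiservers no longer report it as unavailable.
# $1: APIService name, e.g. v1.admission.hive.openshift.io
delete_apiservice_and_restart() {
    oc delete apiservice "$1" --ignore-not-found
    # A timestamp keeps the reason unique across repeated invocations.
    oc patch kubeapiserver cluster --type=merge \
        -p "$(redeploy_patch "apiservice-$1-removed-$(date +%s)")"
}
```

The rollout replaces the control plane pods one node at a time, so the alert may take several minutes to clear.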