Bug 1942740
| Summary: | [sig-arch] Check if alerts are firing during or after upgrade success | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Michael Gugino <mgugino> |
| Component: | kube-apiserver | Assignee: | Stefan Schimanski <sttts> |
| Status: | CLOSED DUPLICATE | QA Contact: | Ke Wang <kewang> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.8 | CC: | aos-bugs, mfojtik, wking, xxia |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | [sig-arch] Check if alerts are firing during or after upgrade success |
| Last Closed: | 2021-03-25 08:47:04 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Michael Gugino
2021-03-24 20:02:28 UTC
In the linked job [1], the relevant API-server alert was:
alert AggregatedAPIDown fired for 180 seconds with labels: {name="v1beta1.metrics.k8s.io", namespace="default", severity="warning"}
I'm a bit fuzzy on the details, but this might be a dup of bug 1928946. If so, probably mention the AggregatedAPIDown alert and:
[sig-arch] Check if alerts are firing during or after upgrade success
test-case in that bug, to help Sippy find it.
[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade/1374760429821104128
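For reference, that test-case is essentially a query against the cluster's Prometheus for alerts that fired during or after the upgrade window, which is how AggregatedAPIDown shows up here. A minimal sketch of that kind of query, not the actual origin test code; the Prometheus route and bearer token below are placeholder assumptions:

```python
# Sketch only: ask Prometheus for currently firing alerts, the way the upgrade
# check surfaces AggregatedAPIDown. PROM_URL and TOKEN are placeholders for the
# openshift-monitoring Prometheus route and a token with access to it.
import requests

PROM_URL = "https://prometheus-k8s-openshift-monitoring.apps.example.com"  # placeholder route
TOKEN = "sha256~example"  # placeholder bearer token

resp = requests.get(
    f"{PROM_URL}/api/v1/query",
    params={"query": 'ALERTS{alertstate="firing", severity!="info"}'},
    headers={"Authorization": f"Bearer {TOKEN}"},
    verify=False,  # CI clusters often use self-signed certificates
)
resp.raise_for_status()

for result in resp.json()["data"]["result"]:
    labels = result["metric"]
    # An AggregatedAPIDown entry here corresponds to the alert quoted above.
    print(labels.get("alertname"), labels.get("name"), labels.get("namespace"))
```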
In the same run there are 4 kube-apiserver reports of a non-graceful termination. Probably kubelet or CRI-O is not giving the pod time to shut down cleanly. This can lead to connection refused and network I/O timeout errors in other components.

ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-219-80.ec2.internal node/ip-10-0-219-80 - reason/NonGracefulTermination Previous pod kube-apiserver-ip-10-0-219-80.ec2.internal started at 2021-03-24 17:47:35.168007668 +0000 UTC did not terminate gracefully
ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-129-73.ec2.internal node/ip-10-0-129-73 - reason/NonGracefulTermination Previous pod kube-apiserver-ip-10-0-129-73.ec2.internal started at 2021-03-24 17:51:42.007655855 +0000 UTC did not terminate gracefully
ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-129-73.ec2.internal node/ip-10-0-129-73 - reason/NonGracefulTermination Previous pod kube-apiserver-ip-10-0-129-73.ec2.internal started at 2021-03-24 17:51:42.007655855 +0000 UTC did not terminate gracefully
ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-145-108.ec2.internal node/ip-10-0-145-108 - reason/NonGracefulTermination Previous pod kube-apiserver-ip-10-0-145-108.ec2.internal started at 2021-03-24 17:56:45.743304935 +0000 UTC did not terminate gracefully

*** This bug has been marked as a duplicate of bug 1928946 ***
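To illustrate the downstream symptom described in the comment above: while a kube-apiserver instance restarts ungracefully, clients can briefly hit connection refused or I/O timeout errors. A minimal sketch, not part of any OpenShift component, of a caller that retries through that window using the Python kubernetes client:

```python
# Sketch only: retry an API call through a brief apiserver outage window,
# rather than failing hard on the transient "connection refused" errors that a
# non-graceful kube-apiserver termination can cause.
import time

import urllib3
from kubernetes import client, config


def list_nodes_with_retry(attempts=5, delay=2.0):
    """List nodes, retrying if the API server is momentarily unreachable."""
    config.load_kube_config()  # or config.load_incluster_config() inside a pod
    v1 = client.CoreV1Api()
    for attempt in range(1, attempts + 1):
        try:
            return v1.list_node()
        except (urllib3.exceptions.HTTPError, client.exceptions.ApiException) as err:
            # An ungraceful restart leaves a short window of refused connections;
            # back off and retry instead of surfacing the error immediately.
            if attempt == attempts:
                raise
            print(f"attempt {attempt} failed ({err}); retrying in {delay}s")
            time.sleep(delay)


if __name__ == "__main__":
    nodes = list_nodes_with_retry()
    print(f"listed {len(nodes.items)} nodes")
```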