test: [sig-arch] Check if alerts are firing during or after upgrade success

This test is failing frequently in CI; see search results:

https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-arch%5C%5D+Check+if+alerts+are+firing+during+or+after+upgrade+success

Example job:

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade/1374760429821104128

fail [github.com/openshift/origin/test/extended/util/disruption/controlplane/controlplane.go:118]: Mar 24 18:27:40.333: API "oauth-api-available-new-connections" was unreachable during disruption for at least 22s of 1h17m37s (0%):

Mar 24 17:19:22.233 E oauth-apiserver-new-connection oauth-apiserver-new-connection started failing: Get "https://api.**************-8d118.origin-ci-int-aws.dev.rhcloud.com:6443/apis/oauth.openshift.io/v1/oauthclients": dial tcp 3.216.225.134:6443: connect: connection refused
In the linked job [1], the relevant API-server alert was:

alert AggregatedAPIDown fired for 180 seconds with labels: {name="v1beta1.metrics.k8s.io", namespace="default", severity="warning"}

I'm a bit fuzzy on the details, but this might be a dup of bug 1928946. If so, it would probably help to mention the AggregatedAPIDown alert and the:

[sig-arch] Check if alerts are firing during or after upgrade success

test-case in that bug, to help Sippy find it.

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade/1374760429821104128
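For triage scripting, a firing-alert summary line like the one above can be parsed into structured fields. A minimal sketch; the regex, function name, and field names are my own for illustration and not part of any origin or Sippy tooling:

```python
import re

# Matches alert-summary lines of the form quoted above:
#   alert <name> fired for <N> seconds with labels: {k="v", ...}
ALERT_RE = re.compile(
    r'alert (?P<alert>\S+) fired for (?P<seconds>\d+) seconds '
    r'with labels: \{(?P<labels>[^}]*)\}'
)

def parse_alert_line(line):
    """Return (alert name, duration in seconds, labels dict), or None if no match."""
    m = ALERT_RE.search(line)
    if not m:
        return None
    # Split the {k="v", ...} body into a dict of label name -> value.
    labels = dict(re.findall(r'(\w+)="([^"]*)"', m.group('labels')))
    return m.group('alert'), int(m.group('seconds')), labels

line = ('alert AggregatedAPIDown fired for 180 seconds with labels: '
        '{name="v1beta1.metrics.k8s.io", namespace="default", severity="warning"}')
print(parse_alert_line(line))
```

Running it on the quoted line yields the alert name, the 180-second duration, and the three labels as a dict, which is enough to group failures by alert and aggregated-API name.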
In the same run, kube-apiserver reports four non-graceful terminations. Probably the kubelet or CRI-O is not giving it time to shut down cleanly. This can lead to "connection refused" and network I/O timeout errors in other components:

ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-219-80.ec2.internal node/ip-10-0-219-80 - reason/NonGracefulTermination Previous pod kube-apiserver-ip-10-0-219-80.ec2.internal started at 2021-03-24 17:47:35.168007668 +0000 UTC did not terminate gracefully
ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-129-73.ec2.internal node/ip-10-0-129-73 - reason/NonGracefulTermination Previous pod kube-apiserver-ip-10-0-129-73.ec2.internal started at 2021-03-24 17:51:42.007655855 +0000 UTC did not terminate gracefully
ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-129-73.ec2.internal node/ip-10-0-129-73 - reason/NonGracefulTermination Previous pod kube-apiserver-ip-10-0-129-73.ec2.internal started at 2021-03-24 17:51:42.007655855 +0000 UTC did not terminate gracefully
ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-145-108.ec2.internal node/ip-10-0-145-108 - reason/NonGracefulTermination Previous pod kube-apiserver-ip-10-0-145-108.ec2.internal started at 2021-03-24 17:56:45.743304935 +0000 UTC did not terminate gracefully

*** This bug has been marked as a duplicate of bug 1928946 ***
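When triaging similar runs, the NonGracefulTermination events can be tallied per pod straight from the event text. A small sketch assuming the `ns/... pod/... node/... - reason/...` line format quoted above; the helper name is hypothetical, not part of origin:

```python
import re
from collections import Counter

def count_non_graceful(events_text):
    """Count NonGracefulTermination events per pod in a blob of event text."""
    # Capture the pod/<name> token from each matching event line.
    pods = re.findall(
        r'pod/(\S+) node/\S+ - reason/NonGracefulTermination', events_text
    )
    return Counter(pods)

# Abbreviated copy of the event lines above (trailing detail elided).
events = """\
ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-219-80.ec2.internal node/ip-10-0-219-80 - reason/NonGracefulTermination Previous pod did not terminate gracefully
ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-129-73.ec2.internal node/ip-10-0-129-73 - reason/NonGracefulTermination Previous pod did not terminate gracefully
ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-129-73.ec2.internal node/ip-10-0-129-73 - reason/NonGracefulTermination Previous pod did not terminate gracefully
ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-145-108.ec2.internal node/ip-10-0-145-108 - reason/NonGracefulTermination Previous pod did not terminate gracefully
"""
print(count_non_graceful(events))
```

On the four events from this run it would show two hits for kube-apiserver-ip-10-0-129-73.ec2.internal and one each for the other two pods, making repeat offenders easy to spot.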