It looks like the registry is disrupted during upgrade because it does not shut down gracefully. Now that we have fixed the router, this is purely at the workload level. It happens on all platforms that I can see. https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-ci-4.8-e2e-azure-upgrade&include-filter-by-regex=remain%20available https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-e2e-azure-upgrade/1405002280675053568 Needs investigation; like the router, the registry should react gracefully to TERM, attempting to drain fast connections and interrupt slow connections. This bug will be used to block making the flake into a failure.
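For illustration, the "drain fast connections, interrupt slow connections" behavior can be sketched with Go's standard `net/http` server. This is only a minimal sketch under stated assumptions, not the registry's actual code: `drainAndStop`, `runDemo`, `shortWait`, and `longWait` are hypothetical names, and in a real process `drainAndStop` would be called from a SIGTERM handler rather than from a demo.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

// Hypothetical durations used only by the demo below.
const (
	shortWait = 50 * time.Millisecond
	longWait  = 2 * time.Second
)

// drainAndStop waits up to grace for in-flight requests to finish, then
// force-closes anything still open. It reports whether the drain was clean.
func drainAndStop(srv *http.Server, grace time.Duration) bool {
	ctx, cancel := context.WithTimeout(context.Background(), grace)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		_ = srv.Close() // interrupt connections that outlived the grace period
		return false
	}
	return true
}

// runDemo starts a throwaway server whose handler takes handlerDelay, sends
// one request, and shuts the server down while that request is in flight.
func runDemo(grace, handlerDelay time.Duration) bool {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	started := make(chan struct{})
	srv := &http.Server{Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		close(started) // signal that the request is now in flight
		time.Sleep(handlerDelay)
		fmt.Fprintln(w, "ok")
	})}
	go srv.Serve(ln)

	go func() {
		resp, err := http.Get("http://" + ln.Addr().String())
		if err == nil {
			resp.Body.Close()
		}
	}()

	<-started
	return drainAndStop(srv, grace)
}

func main() {
	fmt.Println("fast request drained cleanly:", runDemo(longWait, shortWait))
	fmt.Println("slow request drained cleanly:", runDemo(shortWait, longWait))
}
```

The first call should report a clean drain because the request finishes inside the grace period; the second should not, because the request outlives the grace period and its connection is closed.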
The registry already shuts down gracefully [1][2]. Not sure if there is anything else we can do on the registry side. [1]: https://github.com/openshift/image-registry/pull/192 [2]: https://github.com/openshift/cluster-image-registry-operator/blob/6e88375a583645f65179836027954021eb5fdd30/test/e2e/graceful_shutdown_test.go#L97
No progress so far. I suspect it might be related to problems with `Application behind service load balancer with PDB is not disrupted`.
The image registry test is still flaky as of now. Since this is data from 9/16, I will check again this afternoon. https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#periodic-ci-openshift-release-master-ci-4.10-e2e-azure-upgrade&include-filter-by-regex=remain%20available https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-azure-upgrade/1438587460106850304
That's a tricky one. In order to have upgrades without disruptions, the fix has to be in the previous release. Old pods shouldn't disappear immediately; they should give OCP some time to clean up before they are gone. So I'd expect the flakes to stay there until the 4.9 BZ is merged.
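To make the "old pods shouldn't disappear immediately" idea concrete, here is a rough sketch of one common pattern: on termination the pod starts failing its health check but keeps running for a short delay, so the platform has time to stop routing traffic to it. This is an assumption about how such a delay could look, not the actual fix; `lameDuck`, `beginTermination`, `probe`, and the 100 ms delay are all hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// lameDuck tracks whether the process has been asked to terminate.
type lameDuck struct{ terminating atomic.Bool }

// healthz fails once termination has begun, so that the platform can stop
// routing new traffic to this pod.
func (l *lameDuck) healthz(w http.ResponseWriter, _ *http.Request) {
	if l.terminating.Load() {
		http.Error(w, "terminating", http.StatusServiceUnavailable)
		return
	}
	fmt.Fprintln(w, "ok")
}

// beginTermination flips the flag, then blocks for delay. Regular handlers
// are untouched, so existing traffic keeps being served during the delay.
func (l *lameDuck) beginTermination(delay time.Duration) {
	l.terminating.Store(true)
	time.Sleep(delay)
}

// probe returns the status code that healthz currently reports.
func probe(l *lameDuck) int {
	rec := httptest.NewRecorder()
	l.healthz(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	return rec.Code
}

func main() {
	l := &lameDuck{}
	fmt.Println("health before TERM:", probe(l))
	// In a real process this would be called from a SIGTERM handler,
	// followed by the actual server shutdown.
	l.beginTermination(100 * time.Millisecond)
	fmt.Println("health after TERM:", probe(l))
}
```

The delay has to exist in the pods being replaced, which is why a change like this only helps upgrades that start from a release already containing it.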
Thanks for the response, @oleg. I will verify it first.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056