Bug 1972827
| Summary: | image registry does not remain available during upgrade | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
| Component: | Image Registry | Assignee: | Oleg Bulatov <obulatov> |
| Status: | CLOSED ERRATA | QA Contact: | wewang <wewang> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.9 | CC: | aos-bugs, wking, xiuwang |
| Target Milestone: | --- | | |
| Target Release: | 4.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-03-10 16:03:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2005049 | | |

Doc Text:

- Cause: the registry was exiting immediately on a shutdown request.
- Consequence: the router did not have time to discover that the registry pod was gone and could still send requests to it.
- Fix: when the pod is being deleted, it is kept alive for a few extra seconds to give other components time to discover its deletion.
- Result: the router does not send requests to non-existent pods during upgrades, i.e. there are no disruptions.
Description
Clayton Coleman
2021-06-16 17:15:19 UTC
The registry already shuts down gracefully [1][2]. Not sure if there is anything else we can do on the registry side.

[1]: https://github.com/openshift/image-registry/pull/192
[2]: https://github.com/openshift/cluster-image-registry-operator/blob/6e88375a583645f65179836027954021eb5fdd30/test/e2e/graceful_shutdown_test.go#L97

No progress so far. I suspect it might be related to problems with `Application behind service load balancer with PDB is not disrupted`.

The image registry test is still flaky as of the 9/16 data; will check again this afternoon.

https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#periodic-ci-openshift-release-master-ci-4.10-e2e-azure-upgrade&include-filter-by-regex=remain%20available
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-azure-upgrade/1438587460106850304

That's a tricky one. In order to have upgrades without disruptions, the fix has to be in the previous release. Old pods shouldn't disappear immediately; they should give OCP some time to clean up before they are gone. So I'd expect the flakes to stay until the 4.9 BZ is merged.

Thanks for @oleg's response; will verify it first.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056