Bug 1949040
Summary: | image-registry operator is Degraded when upgrade from 4.6.24 to 4.6.0-0.nightly-2021-04-09-145812 | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Wenjing Zheng <wzheng> |
Component: | Image Registry | Assignee: | Oleg Bulatov <obulatov> |
Status: | CLOSED ERRATA | QA Contact: | Wenjing Zheng <wzheng> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.6.z | CC: | aivaras.laimikis, aos-bugs, jima, mfuruta, obulatov, openshift-bugzilla-robot, rsandu, wduan, wewang, wking, xiuwang |
Target Milestone: | --- | Keywords: | Regression |
Target Release: | 4.6.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-04-20 19:27:50 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1897520 | ||
Bug Blocks: |
Description
Wenjing Zheng
2021-04-13 10:08:41 UTC
*** Bug 1949086 has been marked as a duplicate of this bug. ***

I'm a bit confused. This bug is now a child of bug 1897520, and is backporting a fix that landed in 4.7 in November. How is it only impacting 4.6 now? Has this been an issue with all 4.6->4.6 updates, and we only noticed now? Or is this a corner case that only impacts some fraction of 4.6->4.6 updates? Or...?

Who is impacted?
Anyone who uses 4.6.24 or a later 4.6 release without the fix, if the registry processes crash or restart for any reason after the pod is created.

What is the impact?
The registry does not survive restarts: once the process is restarted, it enters a crash loop. Manual intervention is needed.

How involved is remediation?
Deleting image-registry pods should bring the registry back to the normal state. Updating to a fixed release will also recover the registry.

Is this a regression?
Yes, we regressed in 4.6.24 while fixing bug 1936984.

I am not clear on why QE has been able to consistently reproduce this, since comment 4 claims the need for some kind of initial crash inside the pod to get to the broken state. And [1] shows the cluster-bot update from 4.6.24 to 4.6.0-0.nightly-2021-04-09-145812 that I launched today, which succeeded without hitting this issue [2]. I dunno what could be different between QE's updates and the cluster-bot update run...

[1]: https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.6.24#upgrades-to
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp/1382007608419815424

This bug can also be reproduced sometimes from 4.5.37-x86_64 to 4.6.24-x86_64:
https://mastern-jenkins-csb-openshift-qe.apps.ocp4.prod.psi.redhat.com/job/upgrade_CI/13262/console

Verified with several upgrade paths from/to 4.6.0-0.nightly-2021-04-14-161003:
https://docs.google.com/spreadsheets/d/1T-tmF1tjNmuNTgMvve9ZkeiUvFLXl1Y3-t55Kfj8egQ/edit#gid=0

(In reply to Wenjing Zheng from comment #10)
> This bug can also be reproduced sometime from 4.5.37-x86_64 to
> 4.6.24-x86_64:
> https://mastern-jenkins-csb-openshift-qe.apps.ocp4.prod.psi.redhat.com/job/upgrade_CI/13262/console

Sorry, should be this job:
https://mastern-jenkins-csb-openshift-qe.apps.ocp4.prod.psi.redhat.com/job/upgrade_CI/13164/consoleFull

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.25 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1153

Removing UpgradeBlocker, because I don't think we blocked any update recommendations based on this bug.
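
For anyone applying the manual remediation mentioned in this report (deleting the image-registry pods so they are recreated), the step could be scripted roughly as below. This is only a sketch under assumptions that are not stated in this bug: that the registry pods live in the `openshift-image-registry` namespace, that they carry the `docker-registry=default` label, and that `oc` is on the PATH and logged in to the affected cluster. The function names are hypothetical.

```python
import subprocess

# Assumptions (not taken from this bug report): the namespace and the
# pod label used to select the registry pods. Check both on your cluster.
NAMESPACE = "openshift-image-registry"
SELECTOR = "docker-registry=default"


def delete_registry_pods_cmd(namespace=NAMESPACE, selector=SELECTOR):
    """Build the `oc` command that deletes the selected registry pods."""
    return ["oc", "delete", "pods", "-n", namespace, "-l", selector]


def delete_registry_pods(dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    cmd = delete_registry_pods_cmd()
    print(" ".join(cmd))
    if not dry_run:
        # Raises CalledProcessError if `oc` exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `delete_registry_pods()` only prints the command; pass `dry_run=False` to run it. The dry-run default is deliberate, so that the namespace and label can be checked before anything is deleted.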