Bug 2001505
Summary: | Forever pending auto upgrade in case of breaking API changes | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | fvaleri |
Component: | OLM | Assignee: | Kevin Rizza <krizza> |
OLM sub component: | OLM | QA Contact: | Jian Zhang <jiazha> |
Status: | CLOSED NOTABUG | Docs Contact: | |
Severity: | medium | ||
Priority: | unspecified | Keywords: | Reopened |
Version: | 4.7 | ||
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-09-10 17:00:38 UTC | Type: | Bug |
Description
fvaleri
2021-09-06 09:04:30 UTC
Hi, I'm reopening this; please double-check. The bug is that the OLM upgrade process remains pending even after the required CR conversion steps have been applied (no resources use the old CRD version any longer, so it can now be removed safely). The only way to recover is to uninstall the old operator and the new (pending) operators, and then reinstall the new one. More details here: https://access.redhat.com/solutions/6273981

As I described before, this is because OLM does not and cannot know how to auto-recover in specific situations like this, because of the way the InstallPlan resource reconciles. It is a book-of-record, run-once operation that does not attempt to retry after a failure. This means that, in any arbitrary case, once the InstallPlan is in a failed state, OLM will not recover even after the cluster is put into a configuration that would allow the install to succeed. From OLM's perspective, it tried to install, got partway through the operation, and failed. Part of the reinstall process today involves undoing the existing OLM install steps and then reinstalling from scratch.

We have some future proposals for making OLM more declarative, but we cannot currently track a failure condition like this as a bug that can be trivially fixed within the existing OLM control plane. Semantically, it requires new installation concepts and a new API. See https://github.com/operator-framework/rukpak#rukpak for the beginnings of some of that work, but it will most likely be several OpenShift releases before that replaces the current install workflow.
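
To make the run-once versus declarative distinction concrete, here is a toy Python model. It is not OLM code and does not use any OLM or Kubernetes API; `Cluster`, `RunOncePlan`, `DeclarativePlan`, and `apply_new_crd` are hypothetical names invented for illustration, and the phase strings are only loosely modelled on what is described above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Cluster:
    """Toy cluster state: tracks only whether the old CRD version is still in use."""
    old_crd_version_in_use: bool = True


def apply_new_crd(cluster: Cluster) -> bool:
    """Hypothetical install step: succeeds only once nothing uses the old CRD version."""
    return not cluster.old_crd_version_in_use


@dataclass
class RunOncePlan:
    """Run-once record: after reaching a terminal phase it is never re-evaluated."""
    steps: List[Callable[[Cluster], bool]]
    phase: str = "Pending"

    def reconcile(self, cluster: Cluster) -> str:
        if self.phase in ("Complete", "Failed"):
            return self.phase  # terminal phase: no retry, whatever the cluster looks like now
        ok = all(step(cluster) for step in self.steps)
        self.phase = "Complete" if ok else "Failed"
        return self.phase


@dataclass
class DeclarativePlan:
    """Declarative variant: every reconcile re-checks the steps against current state."""
    steps: List[Callable[[Cluster], bool]]

    def reconcile(self, cluster: Cluster) -> str:
        ok = all(step(cluster) for step in self.steps)
        return "Complete" if ok else "Pending"


cluster = Cluster()
run_once = RunOncePlan([apply_new_crd])
declarative = DeclarativePlan([apply_new_crd])

# First attempt: the old CRD version is still in use, so neither plan can finish.
first = (run_once.reconcile(cluster), declarative.reconcile(cluster))

# The admin performs the CR conversion; the install could now succeed.
cluster.old_crd_version_in_use = False
second = (run_once.reconcile(cluster), declarative.reconcile(cluster))

print(first)
print(second)
```

In this sketch the run-once plan stays in its failed phase after the cluster is fixed, which mirrors why the only recovery today is to remove the operators and reinstall, while the declarative variant picks up the corrected state on its next pass.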