Bug 2097557
Summary: | cannot upgrade. Incorrect reading of olm.maxOpenShiftVersion | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | jroche |
Component: | OLM | Assignee: | Per da Silva <pegoncal> |
OLM sub component: | OLM | QA Contact: | kuiwang |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | agreene, bparees, cblecker, dhellmann, dmesser, jaldinge, jkeister, kramraja, krizza, mbargenq, nmalik, sreber, wking |
Version: | 4.9 | Keywords: | ServiceDeliveryBlocker, Triaged |
Target Milestone: | --- | ||
Target Release: | 4.12.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
* Previously, Operator Lifecycle Manager (OLM) would attempt to update namespaces to apply a label, even if the label was present on the namespace. Consequently, the update requests increased the workload in API and etcd services. With this update, OLM compares existing labels against the expected labels on a namespace before issuing an update. As a result, OLM no longer attempts to make unnecessary update requests on namespaces. (link:https://bugzilla.redhat.com/show_bug.cgi?id=2105045[*BZ#2105045*])
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2023-01-17 19:50:02 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 2114574 |
Description
jroche
2022-06-16 00:47:25 UTC
One question also: should the operator-lifecycle-manager CO be degraded when Upgradeable is false? We cancelled the upgrade using `oc adm upgrade --clear=true`, which had the effect of resolving the Upgradeable: false condition. The customer would like to reschedule the upgrade and would like to know what you think the issue was and whether a retry would work. Thank you.

The issue here is that OLM is using ClusterVersion's status.desired [1] to compute the next 4.y [2]. So:

1. Cluster is running 4.9.
2. Update to 4.10 is requested.
3. The cluster-version operator mulls over whether 4.10 is a good idea. 4.9 CVOs set status.desired to point at the requested target while they do this. More recent CVOs, including 4.10.7 (tombstoned), 4.10.8, and later, leave status.desired alone while considering an update request; see bug 2064991 and bug 1826115.
4. The cluster-version operator fails the first round of preconditions on EtcdRecentBackup, waiting for etcd to perform the pre-minor-update snapshot.
5. Meanwhile, OLM looks at ClusterVersion's status.desired, notices the 4.10 version, knows it has some operators that are not compatible with 4.11, and sets Upgradeable=False in its ClusterOperator.
6. The cluster-version operator comes back around for a new round of precondition checks. Now etcd's ClusterOperator has RecentBackup=True, so we pass that precondition. But because OLM is Upgradeable=False, and the CVO interprets Upgradeable=False from a 4.9 operator as "please don't go to 4.10", we fail preconditions again.

So the 4.9 OLM is trying to say "don't go to 4.11", but the CVO is hearing "don't go to 4.10". Because we will never ask a 4.9 OLM to go straight to 4.11, all the 4.9 OLM has to be concerned about is compatibility with 4.10. One possible fix would be similar to bug 2097431: pivot to using the RELEASE_VERSION environment variable [3] to figure out which version OLM is running as, instead of looking at ClusterVersion.

One possible way to unstick minor updates out of impacted versions is:

1. Request the update to 4.10.
2. Wait until `oc get -o yaml clusteroperator etcd | grep -5 RecentBackup` shows a RecentBackup=True condition.
3. Run `oc adm upgrade --clear` to give up on the update.
4. Give OLM time to cool off, checking `oc get -o yaml clusterversion version | grep -5 Upgradeable` until there are no Upgradeable conditions (possibly taking steps to address any Upgradeable=False conditions that are not OLM complaining about 4.11).
5. Request the update to 4.10 again.

That should get you a fresh round of 4.10 preconditions while RecentBackup=True (etcd does not seem to expire this as "not recent any more" very quickly, at least in 4.9.29). And you will also be going through the preconditions before OLM has time to get worried about 4.11.

[1]: https://github.com/openshift/operator-framework-olm/blame/7f8ad598528b2d029fac23dac6d860c433cbf962/staging/operator-lifecycle-manager/pkg/controller/operators/openshift/helpers.go#L171-L189
[2]: https://github.com/openshift/operator-framework-olm/blame/7f8ad598528b2d029fac23dac6d860c433cbf962/staging/operator-lifecycle-manager/pkg/controller/operators/openshift/helpers.go#L132
[3]: https://github.com/openshift/operator-framework-olm/blob/7f8ad598528b2d029fac23dac6d860c433cbf962/manifests/0000_50_olm_07-olm-operator.deployment.yaml#L79-L80

Thanks, Trevor. So this is a fix for OLM. Is it a matter of retrying the upgrade again for the customer?

Setting to blocker- since we have a workaround and a path forward.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399
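The proposed fix above can be sketched in Go. This is a minimal, hypothetical illustration (not the actual OLM patch): it derives the next 4.y minor from the version the operator itself is running as, read from the RELEASE_VERSION environment variable injected into the olm-operator deployment, instead of from ClusterVersion's status.desired, which already points at the in-flight update target. The function name `nextMinor` and the fallback value are assumptions for the sketch.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// nextMinor returns the next 4.y release for a given operator version,
// e.g. "4.9.29" -> "4.10". Computing this from the operator's own
// version avoids the confusion described above, where status.desired
// already names the requested 4.10 target and OLM wrongly reasons
// about 4.11 compatibility mid-update.
func nextMinor(version string) (string, error) {
	parts := strings.Split(version, ".")
	if len(parts) < 2 {
		return "", fmt.Errorf("unexpected version format: %q", version)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return "", fmt.Errorf("parsing minor of %q: %w", version, err)
	}
	return fmt.Sprintf("%s.%d", parts[0], minor+1), nil
}

func main() {
	// RELEASE_VERSION is set on the olm-operator deployment ([3]).
	version := os.Getenv("RELEASE_VERSION")
	if version == "" {
		version = "4.9.29" // illustrative fallback only
	}
	next, err := nextMinor(version)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("running %s; gating operators incompatible with %s\n", version, next)
}
```

With this approach, a 4.9 OLM always reasons about 4.10 compatibility regardless of what update the CVO is currently considering, so it no longer sets Upgradeable=False for 4.11 concerns while a 4.9-to-4.10 update is in flight.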