Description of problem:

Cluster attempted to upgrade from 4.9.29 -> 4.10.15. The upgrade could not progress because of:

  Last Transition Time:  2022-06-15T16:08:33Z
  Message:               ClusterServiceVersions blocking cluster upgrade: redhat-rhoam-operator/managed-api-service.v1.22.0 is incompatible with OpenShift minor versions greater than 4.10
  Reason:                IncompatibleOperatorsInstalled
  Status:                False
  Type:                  Upgradeable

The mentioned CSV has olm.maxOpenShiftVersion of 4.10:

  {
    "type": "olm.maxOpenShiftVersion",
    "value": "4.10"
  }

Actual results:

The upgrade is blocked on this olm.maxOpenShiftVersion.

Expected results:

The upgrade should not be blocked on this olm.maxOpenShiftVersion.

Additional info:

This particular customer has successfully upgraded twice on different clusters on the same OCP edge and with managed-api-service.v1.22.0 installed. must-gather will be attached in a private comment.
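For context, the olm.maxOpenShiftVersion gate can be reduced to a minor-version comparison: OLM computes the next 4.y the cluster could move to and blocks (Upgradeable=False) if that exceeds any installed CSV's declared maximum. A minimal Python sketch of that comparison (hypothetical helper names; OLM's real implementation is Go and linked in the comments below):

```python
def next_minor(version: str) -> tuple[int, int]:
    """Return the next 4.y a cluster at `version` could update to."""
    major, minor = (int(p) for p in version.split(".")[:2])
    return (major, minor + 1)

def csv_blocks_upgrade(max_openshift_version: str, cluster_version: str) -> bool:
    """Upgradeable=False when the upcoming minor exceeds olm.maxOpenShiftVersion."""
    limit = tuple(int(p) for p in max_openshift_version.split(".")[:2])
    return next_minor(cluster_version) > limit

# A cluster at 4.9.29 is asking to go to 4.10, which is within this CSV's
# declared maximum of 4.10 -- so managed-api-service.v1.22.0 should not block:
print(csv_blocks_upgrade("4.10", "4.9.29"))  # False
```

As the expected results say, with the cluster actually running 4.9.29 this check should pass; the question is why OLM evaluated it differently here.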
One question also, should the operator-lifecycle-manager CO be degraded when Upgradeable is false?
We cancelled the upgrade using `oc adm upgrade --clear=true`, which has had the effect of resolving the Upgradeable: false condition. The customer would like to reschedule the upgrade, and would like to know what you think the issue was and whether a retry would work. ty
The issue here is that OLM is using ClusterVersion's status.desired [1] to compute the next 4.y [2]. So:

1. Cluster is running 4.9.
2. Update to 4.10 is requested.
3. Cluster-version operator is mulling over whether 4.10 is a good idea. 4.9 CVOs set status.desired to point at the requested target while they do this. More recent CVOs, including 4.10.7 (tombstoned), 4.10.8, and later, leave status.desired alone while considering an update request; see bug 2064991 and bug 1826115.
4. Cluster-version operator fails the first round of preconditions on EtcdRecentBackup, waiting for etcd to perform the pre-minor-update snapshot.
5. Meanwhile, OLM is looking at ClusterVersion's status.desired, notices the 4.10 version, knows it has some operators that are not compatible with 4.11, and sets Upgradeable=False in its ClusterOperator.
6. Cluster-version operator comes back around for a new round of precondition checks. Now etcd's ClusterOperator has RecentBackup=True, so we pass that precondition. But because OLM is Upgradeable=False, and the CVO interprets Upgradeable=False from a 4.9 operator as "please don't go to 4.10", we fail preconditions again.

So the 4.9 OLM is trying to say "don't go to 4.11", but the CVO is hearing "don't go to 4.10". And because we'll never ask a 4.9 OLM to go straight to 4.11, all the 4.9 OLM has to be concerned about is compatibility with 4.10. One possible fix would be similar to bug 2097431: pivoting to the RELEASE_VERSION environment variable [3] to figure out which version OLM is running, instead of looking at ClusterVersion.

One possible way to unstick minor updates on impacted versions is:

1. Request the update to 4.10.
2. Wait until 'oc get -o yaml clusteroperator etcd | grep -5 RecentBackup' shows a RecentBackup=True condition.
3. 'oc adm upgrade --clear' to give up on the update.
4. Give OLM time to cool off, checking 'oc get -o yaml clusterversion version | grep -5 Upgradeable' until there are no Upgradeable conditions (possibly taking steps to address any Upgradeable=False conditions that are not OLM complaining about 4.11).
5. Request the update to 4.10 again.

That should get you a fresh round of 4.10 preconditions while RecentBackup=True (etcd does not seem to expire this on "not recent any more" very quickly, at least in 4.9.29). And you'll also be going through the preconditions before OLM has time to get worried about 4.11.

[1]: https://github.com/openshift/operator-framework-olm/blame/7f8ad598528b2d029fac23dac6d860c433cbf962/staging/operator-lifecycle-manager/pkg/controller/operators/openshift/helpers.go#L171-L189
[2]: https://github.com/openshift/operator-framework-olm/blame/7f8ad598528b2d029fac23dac6d860c433cbf962/staging/operator-lifecycle-manager/pkg/controller/operators/openshift/helpers.go#L132
[3]: https://github.com/openshift/operator-framework-olm/blob/7f8ad598528b2d029fac23dac6d860c433cbf962/manifests/0000_50_olm_07-olm-operator.deployment.yaml#L79-L80
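The whole failure mode comes down to which version string OLM feeds into its next-minor computation. A rough Python sketch of the two inputs (hypothetical helper name; OLM's real code is Go, linked above):

```python
def next_minor(version: str) -> str:
    """Compute the next 4.y from a version string like '4.9.29'."""
    major, minor = (int(p) for p in version.split(".")[:2])
    return f"{major}.{minor + 1}"

# Buggy input: ClusterVersion status.desired, which a 4.9 CVO flips to the
# requested target (4.10.15) while preconditions are still being evaluated.
assert next_minor("4.10.15") == "4.11"
# -> OLM concludes "4.11 is next", sees CSVs with maxOpenShiftVersion 4.10,
#    and sets Upgradeable=False, which the CVO reads as "don't go to 4.10".

# Proposed input: the operator's own RELEASE_VERSION env var, which stays
# at the running 4.9.x until the update actually lands.
assert next_minor("4.9.29") == "4.10"
# -> within maxOpenShiftVersion 4.10, so no block on the 4.9 -> 4.10 hop.
```

This is why clearing the update (and letting the Upgradeable=False condition age out) unsticks things: status.desired drops back to the running 4.9.x, and the next precondition round sees no objection from OLM.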
Thanks Trevor. So this is a fix for OLM. Is it a matter of retrying the upgrade again for the customer?
Setting to blocker- since we have a workaround and a path forward.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399