This bug was initially created as a copy of Bug #1927340 I am copying this bug because: Description of problem: There is a potential race when using the `OperatorCondition` Upgradeable supported condition that can do the opposite of what the user expects. What should happen on the happy path: - operator writes operatorcondition with upgradeable=false - olm sees operatorcondition and sets pending = true - operator sets pod ready = true - olm waits for the operator to set operatorcondition upgradeable=true But OLM is not gauranteed to see the updated `OperatorCondition` value that the operator has set before it sees that the operator has set `ready=True` on its pod. In that case, this is what actually happens: - operator writes operatorcondition with upgradeable=false - operator sets pod ready = true - olm sees pod status first and sets success = true - olm upgrades operator to the next one - olm gets a bug report from an user or operator author that `upgradeable=false` didn't prevent the upgrade Version-Release number of selected component (if applicable): 4.7 There are a couple of ways to investigate fixing this: 1. When OLM queries for OperatorCondition status, hit the API directly instead of using the cache. This would fix the issue but will increase API load, especially during rollout of an upgrade, so it's not the ideal. 2. Use a generation / some other mechanism so that OLM can ack that it has seen the latest updates on OperatorCondition. The operator would wait to see the ack (i.e. ObservedGeneration = Generation) before setting pod readiness = true.
[cloud-user@preserve-olm-env jian]$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.7.0-0.nightly-2021-07-01-052427 True False 96m Cluster version is 4.7.0-0.nightly-2021-07-01-052427 [cloud-user@preserve-olm-env jian]$ oc -n openshift-operator-lifecycle-manager exec catalog-operator-5644cf4978-jmgdq -- olm --version OLM version: 0.17.0 git commit: a5f086a0cec32be5577b55c89d58adc40b677eb1 1, Install etcd operator v0.9.2 [cloud-user@preserve-olm-env jian]$ oc project default Now using project "default" on server "https://api.ci-ln-bdpflpb-d5d6b.origin-ci-int-aws.dev.rhcloud.com:6443". [cloud-user@preserve-olm-env jian]$ oc get og NAME AGE default-og 4s [cloud-user@preserve-olm-env jian]$ cat sub-etcd.yaml apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: etcd namespace: default spec: channel: singlenamespace-alpha installPlanApproval: Manual name: etcd source: community-operators sourceNamespace: openshift-marketplace startingCSV: etcdoperator.v0.9.2 [cloud-user@preserve-olm-env jian]$ oc create -f sub-etcd.yaml subscription.operators.coreos.com/etcd created [cloud-user@preserve-olm-env jian]$ oc get sub NAME PACKAGE SOURCE CHANNEL etcd etcd community-operators singlenamespace-alpha [cloud-user@preserve-olm-env jian]$ oc get ip NAME CSV APPROVAL APPROVED install-pghlp etcdoperator.v0.9.2 Manual false [cloud-user@preserve-olm-env jian]$ oc get csv No resources found in default namespace. [cloud-user@preserve-olm-env jian]$ oc edit ip install-pghlp installplan.operators.coreos.com/install-pghlp edited [cloud-user@preserve-olm-env jian]$ [cloud-user@preserve-olm-env jian]$ oc get ip NAME CSV APPROVAL APPROVED install-kpn9l etcdoperator.v0.9.4 Manual false install-pghlp etcdoperator.v0.9.2 Manual true [cloud-user@preserve-olm-env jian]$ oc get csv NAME DISPLAY VERSION REPLACES PHASE etcdoperator.v0.9.2 etcd 0.9.2 etcdoperator.v0.9.0 Succeeded [cloud-user@preserve-olm-env jian]$ oc get operatorcondition NAME AGE etcdoperator.v0.9.2 58s 2, Patch the `spec.conditions[0].Upgradeable` of operatorcondition to False. Looks good. [cloud-user@preserve-olm-env jian]$ oc patch operatorcondition etcdoperator.v0.9.2 -p '{"spec":{"conditions":[{"type":"Upgradeable", "observedGeneration":1,"status":"False","reason":"bug","message":"not ready","lastUpdateTime":"2021-06-16T16:56:44Z","lastTransitionTime":"2021-06-16T16:56:44Z"}]}}' --type=merge operatorcondition.operators.coreos.com/etcdoperator.v0.9.2 patched [cloud-user@preserve-olm-env jian]$ oc get operatorcondition etcdoperator.v0.9.2 -o yaml apiVersion: operators.coreos.com/v2 kind: OperatorCondition metadata: creationTimestamp: "2021-07-01T08:07:22Z" generation: 2 labels: operators.coreos.com/etcd.default: "" name: etcdoperator.v0.9.2 namespace: default ownerReferences: - apiVersion: operators.coreos.com/v1alpha1 blockOwnerDeletion: false controller: true kind: ClusterServiceVersion name: etcdoperator.v0.9.2 uid: 36254607-312f-4b32-931c-20021747ed48 resourceVersion: "59966" selfLink: /apis/operators.coreos.com/v2/namespaces/default/operatorconditions/etcdoperator.v0.9.2 uid: 2ddc5d28-ee38-466f-93f0-f2a1748b964e spec: conditions: - lastTransitionTime: "2021-06-16T16:56:44Z" message: not ready observedGeneration: 1 reason: bug status: "False" type: Upgradeable deployments: - etcd-operator serviceAccounts: - etcd-operator status: conditions: - lastTransitionTime: "2021-06-16T16:56:44Z" message: not ready observedGeneration: 2 reason: bug status: "False" type: Upgradeable 3, Approve the etcdoperator.v0.9.4. [cloud-user@preserve-olm-env jian]$ oc get ip NAME CSV APPROVAL APPROVED install-kpn9l etcdoperator.v0.9.4 Manual false install-pghlp etcdoperator.v0.9.2 Manual true [cloud-user@preserve-olm-env jian]$ oc edit ip install-kpn9l installplan.operators.coreos.com/install-kpn9l edited [cloud-user@preserve-olm-env jian]$ [cloud-user@preserve-olm-env jian]$ oc get csv NAME DISPLAY VERSION REPLACES PHASE etcdoperator.v0.9.2 etcd 0.9.2 etcdoperator.v0.9.0 Replacing etcdoperator.v0.9.4 etcd 0.9.4 etcdoperator.v0.9.2 Pending [cloud-user@preserve-olm-env jian]$ oc get csv etcdoperator.v0.9.4 -o yaml ... status: conditions: - lastTransitionTime: "2021-07-01T08:10:28Z" lastUpdateTime: "2021-07-01T08:10:28Z" message: requirements not yet checked phase: Pending reason: RequirementsUnknown - lastTransitionTime: "2021-07-01T08:10:28Z" lastUpdateTime: "2021-07-01T08:10:28Z" message: 'operator is not upgradeable: The operator is not upgradeable: not ready' phase: Pending reason: OperatorConditionNotUpgradeable lastTransitionTime: "2021-07-01T08:10:28Z" lastUpdateTime: "2021-07-01T08:10:28Z" message: 'operator is not upgradeable: The operator is not upgradeable: not ready' phase: Pending reason: OperatorConditionNotUpgradeable 4, Patch the `spec.conditions[0].Upgradeable` of operatorcondition to True. etcdoperator.v0.9.2 can be upgraded to etcdoperator.v0.9.4, looks good. Verify it. [cloud-user@preserve-olm-env jian]$ oc patch operatorcondition etcdoperator.v0.9.2 -p '{"spec":{"conditions":[{"type":"Upgradeable", "observedGeneration":1,"status":"True","reason":"bug","message":"now ready","lastUpdateTime":"2021-06-16T16:56:44Z","lastTransitionTime":"2021-06-16T16:56:44Z"}]}}' --type=merge operatorcondition.operators.coreos.com/etcdoperator.v0.9.2 patched [cloud-user@preserve-olm-env jian]$ oc get csv NAME DISPLAY VERSION REPLACES PHASE etcdoperator.v0.9.4 etcd 0.9.4 etcdoperator.v0.9.2 Succeeded [cloud-user@preserve-olm-env jian]$ oc get operatorcondition NAME AGE etcdoperator.v0.9.4 2m3s
OCP engineering has decided to not ship 4.7.20 due to a blocker. This bug will be shipped as part of next z-stream release 4.7.21 planned on July 27th
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.7.21 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2762