Description of problem: There is a potential race when using the `OperatorCondition` Upgradeable supported condition that can do the opposite of what the user expects. What should happen on the happy path: - operator writes operatorcondition with upgradeable=false - olm sees operatorcondition and sets pending = true - operator sets pod ready = true - olm waits for the operator to set operatorcondition upgradeable=true But OLM is not gauranteed to see the updated `OperatorCondition` value that the operator has set before it sees that the operator has set `ready=True` on its pod. In that case, this is what actually happens: - operator writes operatorcondition with upgradeable=false - operator sets pod ready = true - olm sees pod status first and sets success = true - olm upgrades operator to the next one - olm gets a bug report from an user or operator author that `upgradeable=false` didn't prevent the upgrade Version-Release number of selected component (if applicable): 4.7 There are a couple of ways to investigate fixing this: 1. When OLM queries for OperatorCondition status, hit the API directly instead of using the cache. This would fix the issue but will increase API load, especially during rollout of an upgrade, so it's not the ideal. 2. Use a generation / some other mechanism so that OLM can ack that it has seen the latest updates on OperatorCondition. The operator would wait to see the ack (i.e. ObservedGeneration = Generation) before setting pod readiness = true.
> 1. When OLM queries for OperatorCondition status, hit the API directly instead of using the cache. This would fix the issue but will increase API load, especially during rollout of an upgrade, so it's not the ideal. While this would reduce the time frame of the race condition, the Operator can still not be sure whether setting the Upgradable condition to True will be respected by OLM. If OLM checked the OperatorCondition right before the Operator set it to False the Operator will be upgraded while the Operator already begun doing something important. > 2. Use a generation / some other mechanism so that OLM can ack that it has seen the latest updates on OperatorCondition. The operator would wait to see the ack (i.e. ObservedGeneration = Generation) before setting pod readiness = true. This would work. When the Operator can wait for an ACK from OLM. The Operator can be sure that the Condition update will be respected. While it's very counter-intuitive, I think that the Conditions reported by the Operator belong into the OperatorCondition.Spec instead of OperatorCondition.Status. This way .metadata.generation is incremented and the Operator can wait for .status.observedGeneration to sync and for a "Acknowledged" condition, because OLM needs to signal to the Operator if the upgrade is already in progress (). I would also like to propose 3.) 3. Instead of the Operator acquiring a lock on OLM, OLM could acquire a lock on the Operator, but this requires a CRD instance from the Operator to interact with. (Which is a common pattern, but not universally adopted) - OLM sets/increments an annotation before starting to upgrade an operator - The Operator reads the condition -> If no upgrade-blocking operation is in progress set Upgradable condition of the Operators CRD to True - don't start anything blocking until upgraded by OLM -> If a upgrade-blocking operation is in progress set Upgradable condition to False - OLM may respect this condition or still choose to force-upgrade the Operator - Operators should update the Upgradable condition to True when finished with the blocking operation - OLM may retry this flow from the start (by updating an annotation) to trigger the Operator to reconsider the Upgradable status I think it might be worth thinking about this for the future: - doesn't need a new API (no OperatorCondition) - could be adopted as a standard pattern for operators, as it can be used via Helm or manually as well - obtaining the Upgrade lock is now a burden on OLM instead of every Operator - Operators may work out-of-the Box by reporting the Upgradable condition and refusing to Terminate till done? With long `terminationGracePeriodSeconds`? - no dependency on operator-sdk runtime libraries or APIs yaml examples here: https://github.com/operator-framework/operator-lifecycle-manager/issues/1973#issuecomment-781314822
[cloud-user@preserve-olm-env jian]$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.8.0-0.nightly-2021-06-14-145150 True False 5h25m Cluster version is 4.8.0-0.nightly-2021-06-14-145150 [cloud-user@preserve-olm-env jian]$ oc exec catalog-operator-5bff594bb5-8gvcp -- olm --version OLM version: 0.17.0 git commit: f25f670c03e849ba0fd53a56daa0d8a697f68d16 1, Create an OperatorGroup in the default namespace. [cloud-user@preserve-olm-env jian]$ oc create -f og.yaml operatorgroup.operators.coreos.com/default-og created [cloud-user@preserve-olm-env jian]$ oc get og -n default NAME AGE default-og 17s 2, Subscribe to the etcd 0.9.2 version with manual approve. [cloud-user@preserve-olm-env jian]$ cat sub-etcd.yaml apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: etcd namespace: default spec: channel: singlenamespace-alpha installPlanApproval: Manual name: etcd source: community-operators sourceNamespace: openshift-marketplace startingCSV: etcdoperator.v0.9.2 [cloud-user@preserve-olm-env jian]$ oc create -f sub-etcd.yaml subscription.operators.coreos.com/etcd created [cloud-user@preserve-olm-env jian]$ oc project default Now using project "default" on server "https://api.jiazha15.qe.devcluster.openshift.com:6443". [cloud-user@preserve-olm-env jian]$ oc get operatorcondition No resources found in default namespace. [cloud-user@preserve-olm-env jian]$ oc get sub NAME PACKAGE SOURCE CHANNEL etcd etcd community-operators singlenamespace-alpha [cloud-user@preserve-olm-env jian]$ oc get ip NAME CSV APPROVAL APPROVED install-cbnkv etcdoperator.v0.9.2 Manual false 3, Approve it. [cloud-user@preserve-olm-env jian]$ oc get ip NAME CSV APPROVAL APPROVED install-cbnkv etcdoperator.v0.9.2 Manual true install-l8g4b etcdoperator.v0.9.4 Manual false [cloud-user@preserve-olm-env jian]$ oc get operatorcondition NAME AGE etcdoperator.v0.9.2 18s [cloud-user@preserve-olm-env jian]$ oc get csv NAME DISPLAY VERSION REPLACES PHASE elasticsearch-operator.5.1.0-56 OpenShift Elasticsearch Operator 5.1.0-56 Succeeded etcdoperator.v0.9.2 etcd 0.9.2 etcdoperator.v0.9.0 Succeeded 4, Set the Upgradeable status to False. [cloud-user@preserve-olm-env jian]$ oc proxy --port=8081 & [1] 773253 [cloud-user@preserve-olm-env jian]$ Starting to serve on 127.0.0.1:8081 [cloud-user@preserve-olm-env jian]$ curl -X PATCH -H 'Content-Type: application/merge-patch+json' --data '{"status": {"conditions": [{"lastTransitionTime": "2021-06-15T15:39:01Z","message": "Bug","reason": "NotUpgradeable","status": "False","type": "Upgradeable"}]}}' localhost:8081/apis/operators.coreos.com/v1/namespaces/default/operatorconditions/etcdoperator.v0.9.2/status {"apiVersion":"operators.coreos.com/v1","kind":"OperatorCondition","metadata":{"creationTimestamp":"2021-06-15T07:41:25Z","generation":1,"labels":{"operators.coreos.com/etcd.default":""},"managedFields":[{"apiVersion":"operators.coreos.com/v2","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:labels":{".":{},"f:operators.coreos.com/etcd.default":{}},"f:ownerReferences":{".":{},"k:{\"uid\":\"2d00c3fd-dc4a-4673-8f23-1732c8f30a2e\"}":{".":{},"f:apiVersion":{},"f:blockOwnerDeletion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}},"f:spec":{".":{},"f:deployments":{},"f:serviceAccounts":{}}},"manager":"olm","operation":"Update","time":"2021-06-15T07:41:26Z"},{"apiVersion":"operators.coreos.com/v1","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:conditions":{}}},"manager":"curl","operation":"Update","time":"2021-06-15T08:15:55Z"}],"name":"etcdoperator.v0.9.2","namespace":"default","ownerReferences":[{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":true,"kind":"ClusterServiceVersion","name":"etcdoperator.v0.9.2","uid":"2d00c3fd-dc4a-4673-8f23-1732c8f30a2e"}],"resourceVersion":"211028","uid":"eea100c9-54f5-4082-acc5-68d39de8a9cb"},"spec":{"deployments":["etcd-operator"],"serviceAccounts":["etcd-operator"]},"status":{"conditions":[{"lastTransitionTime":"2021-06-15T15:39:01Z","message":"Bug","reason":"NotUpgradeable","status":"False","type":"Upgradeable"}]}}[cloud-user@preserve-olm-env jian]$ oc get operatorcondition/etcdoperator.v0.9.2 -o yaml apiVersion: operators.coreos.com/v2 kind: OperatorCondition metadata: creationTimestamp: "2021-06-15T07:41:25Z" generation: 1 labels: operators.coreos.com/etcd.default: "" name: etcdoperator.v0.9.2 namespace: default ownerReferences: - apiVersion: operators.coreos.com/v1alpha1 blockOwnerDeletion: false controller: true kind: ClusterServiceVersion name: etcdoperator.v0.9.2 uid: 2d00c3fd-dc4a-4673-8f23-1732c8f30a2e resourceVersion: "211028" uid: eea100c9-54f5-4082-acc5-68d39de8a9cb spec: deployments: - etcd-operator serviceAccounts: - etcd-operator status: conditions: - lastTransitionTime: "2021-06-15T15:39:01Z" message: Bug reason: NotUpgradeable status: "False" type: Upgradeable 5, Approve etcdoperator.v0.9.4, the corresponding CSV in Pending status, LGTM. [cloud-user@preserve-olm-env jian]$ oc get ip NAME CSV APPROVAL APPROVED install-cbnkv etcdoperator.v0.9.2 Manual true install-l8g4b etcdoperator.v0.9.4 Manual true [cloud-user@preserve-olm-env jian]$ oc get csv NAME DISPLAY VERSION REPLACES PHASE elasticsearch-operator.5.1.0-56 OpenShift Elasticsearch Operator 5.1.0-56 Succeeded etcdoperator.v0.9.2 etcd 0.9.2 etcdoperator.v0.9.0 Replacing etcdoperator.v0.9.4 etcd 0.9.4 etcdoperator.v0.9.2 Pending 6, update the Upgradeable status to True, [cloud-user@preserve-olm-env jian]$ curl -X PATCH -H 'Content-Type: application/merge-patch+json' --data '{"status": {"conditions": [{"lastTransitionTime": "2021-06-15T15:39:01Z","message": "Bug fixed","reason": "Upgradeable","status": "True","type": "Upgradeable"}]}}' localhost:8081/apis/operators.coreos.com/v1/namespaces/default/operatorconditions/etcdoperator.v0.9.2/status {"apiVersion":"operators.coreos.com/v1","kind":"OperatorCondition","metadata":{"creationTimestamp":"2021-06-15T07:41:25Z","generation":1,"labels":{"operators.coreos.com/etcd.default":""},"managedFields":[{"apiVersion":"operators.coreos.com/v2","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:labels":{".":{},"f:operators.coreos.com/etcd.default":{}},"f:ownerReferences":{".":{},"k:{\"uid\":\"2d00c3fd-dc4a-4673-8f23-1732c8f30a2e\"}":{".":{},"f:apiVersion":{},"f:blockOwnerDeletion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}},"f:spec":{".":{},"f:deployments":{},"f:serviceAccounts":{}}},"manager":"olm","operation":"Update","time":"2021-06-15T07:41:26Z"},{"apiVersion":"operators.coreos.com/v1","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:conditions":{}}},"manager":"curl","operation":"Update","time":"2021-06-15T08:23:29Z"}],"name":"etcdoperator.v0.9.2","namespace":"default","ownerReferences":[{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":true,"kind":"ClusterServiceVersion","name":"etcdoperator.v0.9.2","uid":"2d00c3fd-dc4a-4673-8f23-1732c8f30a2e"}],"resourceVersion":"214874","uid":"eea100c9-54f5-4082-acc5-68d39de8a9cb"},"spec":{"deployments":["etcd-operator"],"serviceAccounts":["etcd-operator"]},"status":{"conditions":[{"lastTransitionTime":"2021-06-15T15:39:01Z","message":"Bug fixed","reason":"Upgradeable","status":"True","type":"Upgradeable"}]}} [cloud-user@preserve-olm-env jian]$ oc get operatorcondition etcdoperator.v0.9.2 -o yaml apiVersion: operators.coreos.com/v2 kind: OperatorCondition metadata: creationTimestamp: "2021-06-15T07:41:25Z" generation: 1 labels: operators.coreos.com/etcd.default: "" name: etcdoperator.v0.9.2 namespace: default ownerReferences: - apiVersion: operators.coreos.com/v1alpha1 blockOwnerDeletion: false controller: true kind: ClusterServiceVersion name: etcdoperator.v0.9.2 uid: 2d00c3fd-dc4a-4673-8f23-1732c8f30a2e resourceVersion: "214874" uid: eea100c9-54f5-4082-acc5-68d39de8a9cb spec: deployments: - etcd-operator serviceAccounts: - etcd-operator status: conditions: - lastTransitionTime: "2021-06-15T15:39:01Z" message: Bug fixed reason: Upgradeable status: "True" type: Upgradeable [cloud-user@preserve-olm-env jian]$ oc get csv NAME DISPLAY VERSION REPLACES PHASE elasticsearch-operator.5.1.0-56 OpenShift Elasticsearch Operator 5.1.0-56 Succeeded etcdoperator.v0.9.2 etcd 0.9.2 etcdoperator.v0.9.0 Replacing etcdoperator.v0.9.4 etcd 0.9.4 etcdoperator.v0.9.2 Pending But the CSV still in Pending status, as follows: [cloud-user@preserve-olm-env jian]$ oc get csv etcdoperator.v0.9.4 -o yaml ... ... status: cleanup: {} conditions: - lastTransitionTime: "2021-06-15T08:19:42Z" lastUpdateTime: "2021-06-15T08:19:42Z" message: requirements not yet checked phase: Pending reason: RequirementsUnknown - lastTransitionTime: "2021-06-15T08:19:42Z" lastUpdateTime: "2021-06-15T08:19:42Z" message: 'operator is not upgradeable: The operatorcondition status "Upgradeable"="False" is outdated' phase: Pending reason: OperatorConditionNotUpgradeable lastTransitionTime: "2021-06-15T08:19:42Z" lastUpdateTime: "2021-06-15T08:23:29Z" message: 'operator is not upgradeable: The operatorcondition status "Upgradeable"="True" is outdated' phase: Pending reason: OperatorConditionNotUpgradeable
Hi Vu, Thanks for your information! 1, repeat the above step 1-3. [cloud-user@preserve-olm-env jian]$ oc get sub NAME PACKAGE SOURCE CHANNEL etcd etcd community-operators singlenamespace-alpha [cloud-user@preserve-olm-env jian]$ oc get ip NAME CSV APPROVAL APPROVED install-lkd94 etcdoperator.v0.9.4 Manual false install-whgs5 etcdoperator.v0.9.2 Manual true [cloud-user@preserve-olm-env jian]$ oc get csv NAME DISPLAY VERSION REPLACES PHASE elasticsearch-operator.5.1.0-56 OpenShift Elasticsearch Operator 5.1.0-56 Succeeded etcdoperator.v0.9.2 etcd 0.9.2 etcdoperator.v0.9.0 Succeeded 2, Update the OperatorContion, add `spec.conditions` fileds, and set the Upgradeable to False. But, got the below warnings when modify the OperatorCondition object. # operatorconditions.operators.coreos.com "etcdoperator.v0.9.2" was not valid: # * spec.conditions.reason: Invalid value: "The operator is not upgradeable": spec.conditions.reason in body should match '^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$' # [cloud-user@preserve-olm-env jian]$ oc get operatorconditions etcdoperator.v0.9.2 -o yaml apiVersion: operators.coreos.com/v2 kind: OperatorCondition metadata: creationTimestamp: "2021-06-16T02:53:01Z" generation: 2 labels: operators.coreos.com/etcd.default: "" name: etcdoperator.v0.9.2 namespace: default ownerReferences: - apiVersion: operators.coreos.com/v1alpha1 blockOwnerDeletion: false controller: true kind: ClusterServiceVersion name: etcdoperator.v0.9.2 uid: 2345d4b7-c178-4de3-ba61-6bfe811889e7 resourceVersion: "704994" uid: 9db3774e-d008-4ed2-a7da-001750d55fc5 spec: conditions: - lastTransitionTime: "2021-06-16T16:56:44Z" message: The operator is not ready reason: Bug status: "False" type: Upgradeable deployments: - etcd-operator serviceAccounts: - etcd-operator status: conditions: - lastTransitionTime: "2021-06-16T16:56:44Z" message: The operator is not ready observedGeneration: 2 reason: Bug status: "False" type: Upgradeable 3, Approve InstallPlan etcdoperator.v0.9.4. Its CSV in Pending status looks good. [cloud-user@preserve-olm-env jian]$ oc edit ip install-lkd94 installplan.operators.coreos.com/install-lkd94 edited [cloud-user@preserve-olm-env jian]$ oc get csv NAME DISPLAY VERSION REPLACES PHASE elasticsearch-operator.5.1.0-56 OpenShift Elasticsearch Operator 5.1.0-56 Succeeded etcdoperator.v0.9.2 etcd 0.9.2 etcdoperator.v0.9.0 Replacing etcdoperator.v0.9.4 etcd 0.9.4 etcdoperator.v0.9.2 Pending [cloud-user@preserve-olm-env jian]$ [cloud-user@preserve-olm-env jian]$ oc get csv etcdoperator.v0.9.4 -o yaml ... ... status: cleanup: {} conditions: - lastTransitionTime: "2021-06-16T03:20:38Z" lastUpdateTime: "2021-06-16T03:20:38Z" message: requirements not yet checked phase: Pending reason: RequirementsUnknown - lastTransitionTime: "2021-06-16T03:20:38Z" lastUpdateTime: "2021-06-16T03:20:38Z" message: 'operator is not upgradeable: The operator is not upgradeable: The operator is not ready' phase: Pending reason: OperatorConditionNotUpgradeable lastTransitionTime: "2021-06-16T03:20:38Z" lastUpdateTime: "2021-06-16T03:20:38Z" message: 'operator is not upgradeable: The operator is not upgradeable: The operator is not ready' phase: Pending reason: OperatorConditionNotUpgradeable 4, Check the OperatorConditions status. One question, the `spec.observedGeneration` is gone, is it as expected? [cloud-user@preserve-olm-env jian]$ oc get operatorconditions etcdoperator.v0.9.2 -o yaml apiVersion: operators.coreos.com/v2 kind: OperatorCondition metadata: creationTimestamp: "2021-06-16T02:53:01Z" generation: 2 labels: operators.coreos.com/etcd.default: "" name: etcdoperator.v0.9.2 namespace: default ownerReferences: - apiVersion: operators.coreos.com/v1alpha1 blockOwnerDeletion: false controller: true kind: ClusterServiceVersion name: etcdoperator.v0.9.2 uid: 2345d4b7-c178-4de3-ba61-6bfe811889e7 resourceVersion: "704994" uid: 9db3774e-d008-4ed2-a7da-001750d55fc5 spec: conditions: - lastTransitionTime: "2021-06-16T16:56:44Z" message: The operator is not ready reason: Bug status: "False" type: Upgradeable deployments: - etcd-operator serviceAccounts: - etcd-operator status: conditions: - lastTransitionTime: "2021-06-16T16:56:44Z" message: The operator is not ready observedGeneration: 2 reason: Bug status: "False" type: Upgradeable 5, Set the OperatorCondition `spec.conditions.Upgradeable` to "True". Maybe we should address this error. [cloud-user@preserve-olm-env jian]$ oc edit operatorconditions etcdoperator.v0.9.2 error: operatorconditions.operators.coreos.com "etcdoperator.v0.9.2" is invalid operatorcondition.operators.coreos.com/etcdoperator.v0.9.2 edited 6, etcdoperator.v0.9.2 upgraded to etcdoperator.v0.9.4 successfully, LGTM. [cloud-user@preserve-olm-env jian]$ oc get csv NAME DISPLAY VERSION REPLACES PHASE elasticsearch-operator.5.1.0-56 OpenShift Elasticsearch Operator 5.1.0-56 Succeeded etcdoperator.v0.9.4 etcd 0.9.4 etcdoperator.v0.9.2 Succeeded [cloud-user@preserve-olm-env jian]$ oc get operatorconditions NAME AGE etcdoperator.v0.9.4 5m
Hi Jian, I want to ask where did you see this warning from? I don't see this issue when I tried to update the OperatorCondition. I simply use `oc apply` with the `spec.conditions` added to yaml. You don't need to use `curl` command to update the spec. Vu
Hi Vu, For this warning, seems like it is forbidden to use the `blank` letter in the `reason` field. It's weird, I'm not sure why we do this rule. But, this is not a big issue. # operatorconditions.operators.coreos.com "etcdoperator.v0.9.2" was not valid: # * spec.conditions.reason: Invalid value: "The operator is not upgradeable": spec.conditions.reason in body should match '^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$' # Regarding point 4, the `spec.conditions[0],observedGeneration` is gone, is it as expected?
Hey Jian, When I update the spec with observedGeneration, it persists just fine. I'm not sure why yours was missing. I looked at the curl command and it seemed the `observedGeneration` wasn't set. Vu
Hi Vu, I also used the `patch`, but the `observedGeneration` still gone. $oc patch operatorcondition etcdoperator.v0.9.2 -p '{"spec":{"conditions":[{"type":"Upgradeable", "observedCondition":1,"status":"False","reason":"bug","message":"not ready","lastUpdateTime":"2021-06-16T16:56:44Z","lastTransitionTime":"2021-06-16T16:56:44Z"}]}}' --type=merge
I also test the `spec.overrides`, seems like it doesn't work. 1, patch it to set the Upgradeable to False. [cloud-user@preserve-olm-env jian]$ oc patch operatorcondition etcdoperator.v0.9.2 -p '{"spec":{"conditions":[{"type":"Upgradeable", "observedCondition":1,"status":"False","reason":"bug","message":"not ready","lastUpdateTime":"2021-06-16T16:56:44Z","lastTransitionTime":"2021-06-16T16:56:44Z"}]}}' --type=merge operatorcondition.operators.coreos.com/etcdoperator.v0.9.2 patched 2, the `observedGeneration` still gone. [cloud-user@preserve-olm-env jian]$ oc get operatorcondition etcdoperator.v0.9.2 -o yaml apiVersion: operators.coreos.com/v2 kind: OperatorCondition metadata: creationTimestamp: "2021-06-18T09:36:13Z" generation: 2 labels: operators.coreos.com/etcd.default: "" name: etcdoperator.v0.9.2 namespace: default ownerReferences: - apiVersion: operators.coreos.com/v1alpha1 blockOwnerDeletion: false controller: true kind: ClusterServiceVersion name: etcdoperator.v0.9.2 uid: 3d5d8262-4eb7-4265-ab81-bfaf92000c3b resourceVersion: "835241" uid: 8c0cb630-2c2b-4854-9bf5-103c9764cf44 spec: conditions: - lastTransitionTime: "2021-06-16T16:56:44Z" message: not ready reason: bug status: "False" type: Upgradeable deployments: - etcd-operator serviceAccounts: - etcd-operator status: conditions: - lastTransitionTime: "2021-06-16T16:56:44Z" message: not ready observedGeneration: 2 reason: bug status: "False" type: Upgradeable 3, Approve the etcdoperator.v0.9.4 [cloud-user@preserve-olm-env jian]$ oc get ip NAME CSV APPROVAL APPROVED install-bhb4z etcdoperator.v0.9.2 Manual true install-fb7dx etcdoperator.v0.9.4 Manual false [cloud-user@preserve-olm-env jian]$ oc edit ip install-fb7dx installplan.operators.coreos.com/install-fb7dx edited [cloud-user@preserve-olm-env jian]$ [cloud-user@preserve-olm-env jian]$ 4, etcdoperator.v0.9.4 in Pending status, looks good. [cloud-user@preserve-olm-env jian]$ oc get csv NAME DISPLAY VERSION REPLACES PHASE blacklist.v0.0.1 kaka 0.0.1 Succeeded elasticsearch-operator.5.1.0-59 OpenShift Elasticsearch Operator 5.1.0-59 elasticsearch-operator.5.1.0-58 Succeeded etcdoperator.v0.9.2 etcd 0.9.2 etcdoperator.v0.9.0 Replacing etcdoperator.v0.9.4 etcd 0.9.4 etcdoperator.v0.9.2 Pending 5, Overrides it and set the Upgradeable to True. [cloud-user@preserve-olm-env jian]$ oc patch operatorcondition etcdoperator.v0.9.2 -p '{"spec":{"overrides":[{"type":"Upgradeable", "status":"True","reason":"bug","message":"now ready"}]}}' --type=merge operatorcondition.operators.coreos.com/etcdoperator.v0.9.2 patched 6, The status.conditions still in `False`, seems like `spec.overrides` can overrides the `status.conditions`. [cloud-user@preserve-olm-env jian]$ oc get operatorcondition etcdoperator.v0.9.2 -o yaml apiVersion: operators.coreos.com/v2 kind: OperatorCondition metadata: creationTimestamp: "2021-06-18T09:36:13Z" generation: 3 labels: operators.coreos.com/etcd.default: "" name: etcdoperator.v0.9.2 namespace: default ownerReferences: - apiVersion: operators.coreos.com/v1alpha1 blockOwnerDeletion: false controller: true kind: ClusterServiceVersion name: etcdoperator.v0.9.2 uid: 3d5d8262-4eb7-4265-ab81-bfaf92000c3b resourceVersion: "837086" uid: 8c0cb630-2c2b-4854-9bf5-103c9764cf44 spec: conditions: - lastTransitionTime: "2021-06-16T16:56:44Z" message: not ready reason: bug status: "False" type: Upgradeable deployments: - etcd-operator overrides: - message: now ready reason: bug status: "True" type: Upgradeable serviceAccounts: - etcd-operator status: conditions: - lastTransitionTime: "2021-06-16T16:56:44Z" message: not ready observedGeneration: 3 reason: bug status: "False" type: Upgradeable But, the CSV can be upgraded well. [cloud-user@preserve-olm-env jian]$ oc get csv NAME DISPLAY VERSION REPLACES PHASE blacklist.v0.0.1 kaka 0.0.1 Succeeded elasticsearch-operator.5.1.0-59 OpenShift Elasticsearch Operator 5.1.0-59 elasticsearch-operator.5.1.0-58 Succeeded etcdoperator.v0.9.2 etcd 0.9.2 etcdoperator.v0.9.0 Replacing etcdoperator.v0.9.4 etcd 0.9.4 etcdoperator.v0.9.2 Installing
Hi Jian, I can see in the patch command you have `observedCondition` which is not valid field. The field is `observedGeneration`. Vu
Hi Vu, Oh, sorry, I guess something wrong with the example list in https://hackmd.io/9wG20hu5TU-y1HrkhvcsZQ?view : apiVersion: operators.coreos.com/v1 kind: OperatorCondition metadata: generation: 2 spec: conditions: - lastTransitionTime: "2021-05-05T16:56:44Z" lastUpdateTime: "2021-05-05T16:56:44Z" message: The operator is not ready reason: The operator is not upgradeable status: "False" observedCondition: 1 type: Upgradeable status: conditions: - lastTransitionTime: "2021-05-05T16:56:50Z" lastUpdateTime: "2021-05-05T16:56:50" message: The operator is not ready reason: The operator is not upgradeable status: "False" observedCondition: 2 type: Upgradeable Anyway, I retest it with the `observedGeneration`, [cloud-user@preserve-olm-env jian]$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.8.0-0.nightly-2021-06-19-005119 True False 18h Cluster version is 4.8.0-0.nightly-2021-06-19-005119 [cloud-user@preserve-olm-env jian]$ oc -n openshift-operator-lifecycle-manager exec catalog-operator-78f55c6dfc-zs9pd -- olm --version OLM version: 0.17.0 git commit: f25f670c03e849ba0fd53a56daa0d8a697f68d16 1, Install etcd operator v0.9.2 [cloud-user@preserve-olm-env jian]$ oc project Using project "default" on server "https://api.wewang-share.qe.devcluster.openshift.com:6443". [cloud-user@preserve-olm-env jian]$ oc get og NAME AGE default-og 2m13s [cloud-user@preserve-olm-env jian]$ cat sub-etcd.yaml apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: etcd namespace: default spec: channel: singlenamespace-alpha installPlanApproval: Manual name: etcd source: community-operators sourceNamespace: openshift-marketplace startingCSV: etcdoperator.v0.9.2 [cloud-user@preserve-olm-env jian]$ oc create -f sub-etcd.yaml subscription.operators.coreos.com/etcd created [cloud-user@preserve-olm-env jian]$ oc get sub NAME PACKAGE SOURCE CHANNEL etcd etcd community-operators singlenamespace-alpha [cloud-user@preserve-olm-env jian]$ oc get ip NAME CSV APPROVAL APPROVED install-l9nd7 etcdoperator.v0.9.2 Manual false [cloud-user@preserve-olm-env jian]$ oc get csv NAME DISPLAY VERSION REPLACES PHASE elasticsearch-operator.5.1.0-63 OpenShift Elasticsearch Operator 5.1.0-63 Succeeded [cloud-user@preserve-olm-env jian]$ oc edit ip install-l9nd7 installplan.operators.coreos.com/install-l9nd7 edited [cloud-user@preserve-olm-env jian]$ oc get ip NAME CSV APPROVAL APPROVED install-l9nd7 etcdoperator.v0.9.2 Manual true install-r5sqx etcdoperator.v0.9.4 Manual false [cloud-user@preserve-olm-env jian]$ oc get csv NAME DISPLAY VERSION REPLACES PHASE elasticsearch-operator.5.1.0-63 OpenShift Elasticsearch Operator 5.1.0-63 Succeeded etcdoperator.v0.9.2 etcd 0.9.2 etcdoperator.v0.9.0 Succeeded [cloud-user@preserve-olm-env jian]$ oc get operatorcondition NAME AGE etcdoperator.v0.9.2 13s 2, Patch the `spec.conditions[0].Upgradeable` of operatorcondition to False. Looks good. [cloud-user@preserve-olm-env jian]$ oc patch operatorcondition etcdoperator.v0.9.2 -p '{"spec":{"conditions":[{"type":"Upgradeable", "observedGeneration":1,"status":"False","reason":"bug","message":"not ready","lastUpdateTime":"2021-06-16T16:56:44Z","lastTransitionTime":"2021-06-16T16:56:44Z"}]}}' --type=merge operatorcondition.operators.coreos.com/etcdoperator.v0.9.2 patched [cloud-user@preserve-olm-env jian]$ oc get operatorcondition etcdoperator.v0.9.2 -o yaml apiVersion: operators.coreos.com/v2 kind: OperatorCondition metadata: creationTimestamp: "2021-06-22T07:30:54Z" generation: 2 labels: operators.coreos.com/etcd.default: "" name: etcdoperator.v0.9.2 namespace: default ownerReferences: - apiVersion: operators.coreos.com/v1alpha1 blockOwnerDeletion: false controller: true kind: ClusterServiceVersion name: etcdoperator.v0.9.2 uid: b2b2e7a5-05bf-4ccd-8ad7-24415b3cf2bf resourceVersion: "513365" uid: c2a1f7c0-a348-4c83-ab4b-a852aa34bf98 spec: conditions: - lastTransitionTime: "2021-06-16T16:56:44Z" message: not ready observedGeneration: 1 reason: bug status: "False" type: Upgradeable deployments: - etcd-operator serviceAccounts: - etcd-operator status: conditions: - lastTransitionTime: "2021-06-16T16:56:44Z" message: not ready observedGeneration: 2 reason: bug status: "False" type: Upgradeable 3, Approve the etcdoperator.v0.9.4. [cloud-user@preserve-olm-env jian]$ oc edit ip install-r5sqx installplan.operators.coreos.com/install-r5sqx edited [cloud-user@preserve-olm-env jian]$ [cloud-user@preserve-olm-env jian]$ oc get ip NAME CSV APPROVAL APPROVED install-l9nd7 etcdoperator.v0.9.2 Manual true install-r5sqx etcdoperator.v0.9.4 Manual true [cloud-user@preserve-olm-env jian]$ oc get csv NAME DISPLAY VERSION REPLACES PHASE elasticsearch-operator.5.1.0-63 OpenShift Elasticsearch Operator 5.1.0-63 Succeeded etcdoperator.v0.9.2 etcd 0.9.2 etcdoperator.v0.9.0 Replacing etcdoperator.v0.9.4 etcd 0.9.4 etcdoperator.v0.9.2 Pending [cloud-user@preserve-olm-env jian]$ oc get csv etcdoperator.v0.9.4 -o yaml ... status: cleanup: {} conditions: - lastTransitionTime: "2021-06-22T07:35:22Z" lastUpdateTime: "2021-06-22T07:35:22Z" message: requirements not yet checked phase: Pending reason: RequirementsUnknown - lastTransitionTime: "2021-06-22T07:35:22Z" lastUpdateTime: "2021-06-22T07:35:22Z" message: 'operator is not upgradeable: The operator is not upgradeable: not ready' phase: Pending reason: OperatorConditionNotUpgradeable lastTransitionTime: "2021-06-22T07:35:22Z" lastUpdateTime: "2021-06-22T07:35:22Z" message: 'operator is not upgradeable: The operator is not upgradeable: not ready' phase: Pending reason: OperatorConditionNotUpgradeable 4, Patch the `spec.conditions[0].Upgradeable` of operatorcondition to True. etcdoperator.v0.9.2 can be upgraded to etcdoperator.v0.9.4, looks good. [cloud-user@preserve-olm-env jian]$ oc patch operatorcondition etcdoperator.v0.9.2 -p '{"spec":{"conditions":[{"type":"Upgradeable", "observedGeneration":1,"status":"True","reason":"bug","message":"now ready","lastUpdateTime":"2021-06-16T16:56:44Z","lastTransitionTime":"2021-06-16T16:56:44Z"}]}}' --type=merge operatorcondition.operators.coreos.com/etcdoperator.v0.9.2 patched [cloud-user@preserve-olm-env jian]$ [cloud-user@preserve-olm-env jian]$ [cloud-user@preserve-olm-env jian]$ oc get csv NAME DISPLAY VERSION REPLACES PHASE elasticsearch-operator.5.1.0-63 OpenShift Elasticsearch Operator 5.1.0-63 Succeeded etcdoperator.v0.9.2 etcd 0.9.2 etcdoperator.v0.9.0 Replacing etcdoperator.v0.9.4 etcd 0.9.4 etcdoperator.v0.9.2 Installing [cloud-user@preserve-olm-env jian]$ oc get operatorcondition NAME AGE etcdoperator.v0.9.4 3m18s [cloud-user@preserve-olm-env jian]$ oc get csv NAME DISPLAY VERSION REPLACES PHASE elasticsearch-operator.5.1.0-63 OpenShift Elasticsearch Operator 5.1.0-63 Succeeded etcdoperator.v0.9.4 etcd 0.9.4 etcdoperator.v0.9.2 Succeeded As I mentioned in comment 14, I also test the `spec.overrides`, but, it doesn't work. test steps: repeat step 1-3, [cloud-user@preserve-olm-env jian]$ oc create -f sub-etcd.yaml subscription.operators.coreos.com/etcd created [cloud-user@preserve-olm-env jian]$ oc get sub NAME PACKAGE SOURCE CHANNEL etcd etcd community-operators singlenamespace-alpha [cloud-user@preserve-olm-env jian]$ oc get ip NAME CSV APPROVAL APPROVED install-wvstp etcdoperator.v0.9.2 Manual false [cloud-user@preserve-olm-env jian]$ oc edit ip install-wvstp installplan.operators.coreos.com/install-wvstp edited [cloud-user@preserve-olm-env jian]$ oc get csv NAME DISPLAY VERSION REPLACES PHASE elasticsearch-operator.5.1.0-63 OpenShift Elasticsearch Operator 5.1.0-63 Succeeded etcdoperator.v0.9.2 etcd 0.9.2 etcdoperator.v0.9.0 Installing [cloud-user@preserve-olm-env jian]$ oc get ip NAME CSV APPROVAL APPROVED install-6nh9s etcdoperator.v0.9.4 Manual false install-wvstp etcdoperator.v0.9.2 Manual true [cloud-user@preserve-olm-env jian]$ oc get csv NAME DISPLAY VERSION REPLACES PHASE elasticsearch-operator.5.1.0-63 OpenShift Elasticsearch Operator 5.1.0-63 Succeeded etcdoperator.v0.9.2 etcd 0.9.2 etcdoperator.v0.9.0 Succeeded [cloud-user@preserve-olm-env jian]$ oc patch operatorcondition etcdoperator.v0.9.2 -p '{"spec":{"conditions":[{"type":"Upgradeable", "observedGeneration":1,"status":"False","reason":"bug","message":"not ready","lastUpdateTime":"2021-06-16T16:56:44Z","lastTransitionTime":"2021-06-16T16:56:44Z"}]}}' --type=merge operatorcondition.operators.coreos.com/etcdoperator.v0.9.2 patched [cloud-user@preserve-olm-env jian]$ oc edit ip install-6nh9s installplan.operators.coreos.com/install-6nh9s edited [cloud-user@preserve-olm-env jian]$ [cloud-user@preserve-olm-env jian]$ [cloud-user@preserve-olm-env jian]$ oc get ip NAME CSV APPROVAL APPROVED install-6nh9s etcdoperator.v0.9.4 Manual true install-wvstp etcdoperator.v0.9.2 Manual true [cloud-user@preserve-olm-env jian]$ oc get csv NAME DISPLAY VERSION REPLACES PHASE elasticsearch-operator.5.1.0-63 OpenShift Elasticsearch Operator 5.1.0-63 Succeeded etcdoperator.v0.9.2 etcd 0.9.2 etcdoperator.v0.9.0 Replacing etcdoperator.v0.9.4 etcd 0.9.4 etcdoperator.v0.9.2 Pending 4, Overrides it and sets the Upgradeable to True. But, the `status.conditions` still in `False`, `spec.overrides` cannot overrides the `status.conditions`. Is it as expected? [cloud-user@preserve-olm-env jian]$ oc patch operatorcondition etcdoperator.v0.9.2 -p '{"spec":{"overrides":[{"type":"Upgradeable", "status":"True","reason":"bug","message":"now ready"}]}}' --type=merge operatorcondition.operators.coreos.com/etcdoperator.v0.9.2 patched [cloud-user@preserve-olm-env jian]$ oc get csv NAME DISPLAY VERSION REPLACES PHASE elasticsearch-operator.5.1.0-63 OpenShift Elasticsearch Operator 5.1.0-63 Succeeded etcdoperator.v0.9.2 etcd 0.9.2 etcdoperator.v0.9.0 Replacing etcdoperator.v0.9.4 etcd 0.9.4 etcdoperator.v0.9.2 Installing [cloud-user@preserve-olm-env jian]$ oc get pods NAME READY STATUS RESTARTS AGE etcd-operator-6fc7cf7dd9-b64fp 3/3 Running 0 4m28s etcd-operator-f49544fdd-6bjct 0/3 ContainerCreating 0 67s [cloud-user@preserve-olm-env jian]$ oc get operatorcondition etcdoperator.v0.9.2 -o yaml apiVersion: operators.coreos.com/v2 kind: OperatorCondition metadata: creationTimestamp: "2021-06-22T07:43:08Z" generation: 3 labels: operators.coreos.com/etcd.default: "" name: etcdoperator.v0.9.2 namespace: default ownerReferences: - apiVersion: operators.coreos.com/v1alpha1 blockOwnerDeletion: false controller: true kind: ClusterServiceVersion name: etcdoperator.v0.9.2 uid: 221b55d2-3f19-42e2-abe7-1b0851710a87 resourceVersion: "525586" uid: 9f271bdc-ab10-426b-8614-9a330195b900 spec: conditions: - lastTransitionTime: "2021-06-16T16:56:44Z" message: not ready observedGeneration: 1 reason: bug status: "False" type: Upgradeable deployments: - etcd-operator overrides: - message: now ready reason: bug status: "True" type: Upgradeable serviceAccounts: - etcd-operator status: conditions: - lastTransitionTime: "2021-06-16T16:56:44Z" message: not ready observedGeneration: 3 reason: bug status: "False" type: Upgradeable
The `status.conditions[0].status` is `Flase`, however, it's in upgrading, and can be upgraded successfully.
Hi Jian, I updated the doc. The correct field is `ObservedGeneration`. The `spec.overrides` doesn't change the `spec.conditions`. It simply allows the OLM to proceed/block the upgrade and override the current conditions setting. So what you are seeing is accurate as the condition is False but you set override to True and the upgrade proceeds accordingly.
Hi Vu, > The `spec.overrides` doesn't change the `spec.conditions`. It simply allows the OLM to proceed/block the upgrade and override the current conditions setting. Yeah, but since the `status` is for reflecting the real-time status, the `spec.overrides` should change the `spec.status` at least. Otherwise, it's confusing for the users. What do you think? For this bug, I verified it since the `spec.conditions` works well.
Hey Jian, I can see the reasoning behind what you said. I think an improvement can be made to project the conditions in the overrides to the status as well. Thanks, Vu
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438