Bug 1927340 - Race condition in OperatorCondition reconcilation
Summary: Race condition in OperatorCondition reconcilation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.7
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 4.8.0
Assignee: Vu Dinh
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks: 1972075
TreeView+ depends on / blocked
 
Reported: 2021-02-10 15:01 UTC by Evan Cordell
Modified: 2021-07-27 22:44 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:43:43 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift operator-framework-olm pull 92 0 None closed Bug 1927340: Add OperatorCondition status sync and update operator upgradeable check 2021-06-28 05:45:17 UTC
Github operator-framework operator-lifecycle-manager pull 2160 0 None closed Add OperatorCondition status sync and update operator upgradeable check 2021-06-28 05:45:17 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:44:17 UTC

Description Evan Cordell 2021-02-10 15:01:56 UTC
Description of problem:

There is a potential race when using the `OperatorCondition` Upgradeable supported condition that can do the opposite of what the user expects.

What should happen on the happy path:

 - operator writes operatorcondition with upgradeable=false
 - olm sees operatorcondition and sets pending = true
 - operator sets pod ready = true
 - olm waits for the operator to set operatorcondition upgradeable=true

But OLM is not gauranteed to see the updated `OperatorCondition` value that the operator has set before it sees that the operator has set `ready=True` on its pod. 

In that case, this is what actually happens:

- operator writes operatorcondition with upgradeable=false
- operator sets pod ready = true
- olm sees pod status first and sets success = true
- olm upgrades operator to the next one
- olm gets a bug report from an user or operator author that `upgradeable=false` didn't prevent the upgrade


Version-Release number of selected component (if applicable): 4.7


There are a couple of ways to investigate fixing this:

1. When OLM queries for OperatorCondition status, hit the API directly instead of using the cache. This would fix the issue but will increase API load, especially during rollout of an upgrade, so it's not the ideal.
2. Use a generation / some other mechanism so that OLM can ack that it has seen the latest updates on OperatorCondition. The operator would wait to see the ack (i.e. ObservedGeneration = Generation) before setting pod readiness = true.

Comment 2 Nico Schieder 2021-02-19 13:45:11 UTC
> 1. When OLM queries for OperatorCondition status, hit the API directly instead of using the cache. This would fix the issue but will increase API load, especially during rollout of an upgrade, so it's not the ideal.

While this would reduce the time frame of the race condition, the Operator can still not be sure whether setting the Upgradable condition to True will be respected by OLM.
If OLM checked the OperatorCondition right before the Operator set it to False the Operator will be upgraded while the Operator already begun doing something important.

> 2. Use a generation / some other mechanism so that OLM can ack that it has seen the latest updates on OperatorCondition. The operator would wait to see the ack (i.e. ObservedGeneration = Generation) before setting pod readiness = true.

This would work. When the Operator can wait for an ACK from OLM.
The Operator can be sure that the Condition update will be respected.

While it's very counter-intuitive, I think that the Conditions reported by the Operator belong into the OperatorCondition.Spec instead of OperatorCondition.Status.
This way .metadata.generation is incremented and the Operator can wait for .status.observedGeneration to sync and for a "Acknowledged" condition, because OLM needs to signal to the Operator if the upgrade is already in progress ().

I would also like to propose 3.)

3. Instead of the Operator acquiring a lock on OLM, OLM could acquire a lock on the Operator, but this requires a CRD instance from the Operator to interact with. (Which is a common pattern, but not universally adopted)

- OLM sets/increments an annotation before starting to upgrade an operator
- The Operator reads the condition
-> If no upgrade-blocking operation is in progress set Upgradable condition of the Operators CRD to True
  - don't start anything blocking until upgraded by OLM
  
-> If a upgrade-blocking operation is in progress set Upgradable condition to False
  - OLM may respect this condition or still choose to force-upgrade the Operator
  - Operators should update the Upgradable condition to True when finished with the blocking operation
  - OLM may retry this flow from the start (by updating an annotation) to trigger the Operator to reconsider the Upgradable status

I think it might be worth thinking about this for the future:
- doesn't need a new API (no OperatorCondition)
- could be adopted as a standard pattern for operators, as it can be used via Helm or manually as well
- obtaining the Upgrade lock is now a burden on OLM instead of every Operator
- Operators may work out-of-the Box by reporting the Upgradable condition and refusing to Terminate till done? With long `terminationGracePeriodSeconds`?
- no dependency on operator-sdk runtime libraries or APIs

yaml examples here:
https://github.com/operator-framework/operator-lifecycle-manager/issues/1973#issuecomment-781314822

Comment 7 Jian Zhang 2021-06-15 08:29:31 UTC
[cloud-user@preserve-olm-env jian]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-14-145150   True        False         5h25m   Cluster version is 4.8.0-0.nightly-2021-06-14-145150

[cloud-user@preserve-olm-env jian]$ oc exec catalog-operator-5bff594bb5-8gvcp -- olm --version
OLM version: 0.17.0
git commit: f25f670c03e849ba0fd53a56daa0d8a697f68d16

1, Create an OperatorGroup in the default namespace.
[cloud-user@preserve-olm-env jian]$ oc create -f og.yaml 
operatorgroup.operators.coreos.com/default-og created
[cloud-user@preserve-olm-env jian]$ oc get og -n default
NAME         AGE
default-og   17s

2, Subscribe to the etcd 0.9.2 version with manual approve.
[cloud-user@preserve-olm-env jian]$ cat sub-etcd.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: etcd
  namespace: default
spec:
  channel: singlenamespace-alpha
  installPlanApproval: Manual
  name: etcd
  source: community-operators
  sourceNamespace: openshift-marketplace
  startingCSV: etcdoperator.v0.9.2

[cloud-user@preserve-olm-env jian]$ oc create -f sub-etcd.yaml 
subscription.operators.coreos.com/etcd created

[cloud-user@preserve-olm-env jian]$ oc project default
Now using project "default" on server "https://api.jiazha15.qe.devcluster.openshift.com:6443".
[cloud-user@preserve-olm-env jian]$ oc get operatorcondition 
No resources found in default namespace.
[cloud-user@preserve-olm-env jian]$ oc get sub
NAME   PACKAGE   SOURCE                CHANNEL
etcd   etcd      community-operators   singlenamespace-alpha
[cloud-user@preserve-olm-env jian]$ oc get ip
NAME            CSV                   APPROVAL   APPROVED
install-cbnkv   etcdoperator.v0.9.2   Manual     false

3, Approve it.

[cloud-user@preserve-olm-env jian]$ oc get ip
NAME            CSV                   APPROVAL   APPROVED
install-cbnkv   etcdoperator.v0.9.2   Manual     true
install-l8g4b   etcdoperator.v0.9.4   Manual     false
[cloud-user@preserve-olm-env jian]$ oc get operatorcondition
NAME                  AGE
etcdoperator.v0.9.2   18s
[cloud-user@preserve-olm-env jian]$ oc get csv
NAME                              DISPLAY                            VERSION    REPLACES              PHASE
elasticsearch-operator.5.1.0-56   OpenShift Elasticsearch Operator   5.1.0-56                         Succeeded
etcdoperator.v0.9.2               etcd                               0.9.2      etcdoperator.v0.9.0   Succeeded


4, Set the Upgradeable status to False.
[cloud-user@preserve-olm-env jian]$ oc proxy --port=8081 &
[1] 773253
[cloud-user@preserve-olm-env jian]$ Starting to serve on 127.0.0.1:8081

[cloud-user@preserve-olm-env jian]$ curl -X PATCH -H 'Content-Type: application/merge-patch+json' --data '{"status": {"conditions": [{"lastTransitionTime": "2021-06-15T15:39:01Z","message": "Bug","reason": "NotUpgradeable","status": "False","type": "Upgradeable"}]}}' localhost:8081/apis/operators.coreos.com/v1/namespaces/default/operatorconditions/etcdoperator.v0.9.2/status
{"apiVersion":"operators.coreos.com/v1","kind":"OperatorCondition","metadata":{"creationTimestamp":"2021-06-15T07:41:25Z","generation":1,"labels":{"operators.coreos.com/etcd.default":""},"managedFields":[{"apiVersion":"operators.coreos.com/v2","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:labels":{".":{},"f:operators.coreos.com/etcd.default":{}},"f:ownerReferences":{".":{},"k:{\"uid\":\"2d00c3fd-dc4a-4673-8f23-1732c8f30a2e\"}":{".":{},"f:apiVersion":{},"f:blockOwnerDeletion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}},"f:spec":{".":{},"f:deployments":{},"f:serviceAccounts":{}}},"manager":"olm","operation":"Update","time":"2021-06-15T07:41:26Z"},{"apiVersion":"operators.coreos.com/v1","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:conditions":{}}},"manager":"curl","operation":"Update","time":"2021-06-15T08:15:55Z"}],"name":"etcdoperator.v0.9.2","namespace":"default","ownerReferences":[{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":true,"kind":"ClusterServiceVersion","name":"etcdoperator.v0.9.2","uid":"2d00c3fd-dc4a-4673-8f23-1732c8f30a2e"}],"resourceVersion":"211028","uid":"eea100c9-54f5-4082-acc5-68d39de8a9cb"},"spec":{"deployments":["etcd-operator"],"serviceAccounts":["etcd-operator"]},"status":{"conditions":[{"lastTransitionTime":"2021-06-15T15:39:01Z","message":"Bug","reason":"NotUpgradeable","status":"False","type":"Upgradeable"}]}}[cloud-user@preserve-olm-env jian]$ oc get operatorcondition/etcdoperator.v0.9.2 -o yaml
apiVersion: operators.coreos.com/v2
kind: OperatorCondition
metadata:
  creationTimestamp: "2021-06-15T07:41:25Z"
  generation: 1
  labels:
    operators.coreos.com/etcd.default: ""
  name: etcdoperator.v0.9.2
  namespace: default
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: true
    kind: ClusterServiceVersion
    name: etcdoperator.v0.9.2
    uid: 2d00c3fd-dc4a-4673-8f23-1732c8f30a2e
  resourceVersion: "211028"
  uid: eea100c9-54f5-4082-acc5-68d39de8a9cb
spec:
  deployments:
  - etcd-operator
  serviceAccounts:
  - etcd-operator
status:
  conditions:
  - lastTransitionTime: "2021-06-15T15:39:01Z"
    message: Bug
    reason: NotUpgradeable
    status: "False"
    type: Upgradeable

5, Approve etcdoperator.v0.9.4, the corresponding CSV in Pending status, LGTM.

[cloud-user@preserve-olm-env jian]$ oc get ip
NAME            CSV                   APPROVAL   APPROVED
install-cbnkv   etcdoperator.v0.9.2   Manual     true
install-l8g4b   etcdoperator.v0.9.4   Manual     true

[cloud-user@preserve-olm-env jian]$ oc get csv
NAME                              DISPLAY                            VERSION    REPLACES              PHASE
elasticsearch-operator.5.1.0-56   OpenShift Elasticsearch Operator   5.1.0-56                         Succeeded
etcdoperator.v0.9.2               etcd                               0.9.2      etcdoperator.v0.9.0   Replacing
etcdoperator.v0.9.4               etcd                               0.9.4      etcdoperator.v0.9.2   Pending

6, update the Upgradeable status to True,
[cloud-user@preserve-olm-env jian]$ curl -X PATCH -H 'Content-Type: application/merge-patch+json' --data '{"status": {"conditions": [{"lastTransitionTime": "2021-06-15T15:39:01Z","message": "Bug fixed","reason": "Upgradeable","status": "True","type": "Upgradeable"}]}}' localhost:8081/apis/operators.coreos.com/v1/namespaces/default/operatorconditions/etcdoperator.v0.9.2/status
{"apiVersion":"operators.coreos.com/v1","kind":"OperatorCondition","metadata":{"creationTimestamp":"2021-06-15T07:41:25Z","generation":1,"labels":{"operators.coreos.com/etcd.default":""},"managedFields":[{"apiVersion":"operators.coreos.com/v2","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:labels":{".":{},"f:operators.coreos.com/etcd.default":{}},"f:ownerReferences":{".":{},"k:{\"uid\":\"2d00c3fd-dc4a-4673-8f23-1732c8f30a2e\"}":{".":{},"f:apiVersion":{},"f:blockOwnerDeletion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}},"f:spec":{".":{},"f:deployments":{},"f:serviceAccounts":{}}},"manager":"olm","operation":"Update","time":"2021-06-15T07:41:26Z"},{"apiVersion":"operators.coreos.com/v1","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:conditions":{}}},"manager":"curl","operation":"Update","time":"2021-06-15T08:23:29Z"}],"name":"etcdoperator.v0.9.2","namespace":"default","ownerReferences":[{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":true,"kind":"ClusterServiceVersion","name":"etcdoperator.v0.9.2","uid":"2d00c3fd-dc4a-4673-8f23-1732c8f30a2e"}],"resourceVersion":"214874","uid":"eea100c9-54f5-4082-acc5-68d39de8a9cb"},"spec":{"deployments":["etcd-operator"],"serviceAccounts":["etcd-operator"]},"status":{"conditions":[{"lastTransitionTime":"2021-06-15T15:39:01Z","message":"Bug fixed","reason":"Upgradeable","status":"True","type":"Upgradeable"}]}}

[cloud-user@preserve-olm-env jian]$ oc get operatorcondition etcdoperator.v0.9.2 -o yaml
apiVersion: operators.coreos.com/v2
kind: OperatorCondition
metadata:
  creationTimestamp: "2021-06-15T07:41:25Z"
  generation: 1
  labels:
    operators.coreos.com/etcd.default: ""
  name: etcdoperator.v0.9.2
  namespace: default
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: true
    kind: ClusterServiceVersion
    name: etcdoperator.v0.9.2
    uid: 2d00c3fd-dc4a-4673-8f23-1732c8f30a2e
  resourceVersion: "214874"
  uid: eea100c9-54f5-4082-acc5-68d39de8a9cb
spec:
  deployments:
  - etcd-operator
  serviceAccounts:
  - etcd-operator
status:
  conditions:
  - lastTransitionTime: "2021-06-15T15:39:01Z"
    message: Bug fixed
    reason: Upgradeable
    status: "True"
    type: Upgradeable


[cloud-user@preserve-olm-env jian]$ oc get csv
NAME                              DISPLAY                            VERSION    REPLACES              PHASE
elasticsearch-operator.5.1.0-56   OpenShift Elasticsearch Operator   5.1.0-56                         Succeeded
etcdoperator.v0.9.2               etcd                               0.9.2      etcdoperator.v0.9.0   Replacing
etcdoperator.v0.9.4               etcd                               0.9.4      etcdoperator.v0.9.2   Pending


But the CSV still in Pending status, as follows:
[cloud-user@preserve-olm-env jian]$ oc get csv etcdoperator.v0.9.4 -o yaml
...
...
status:
  cleanup: {}
  conditions:
  - lastTransitionTime: "2021-06-15T08:19:42Z"
    lastUpdateTime: "2021-06-15T08:19:42Z"
    message: requirements not yet checked
    phase: Pending
    reason: RequirementsUnknown
  - lastTransitionTime: "2021-06-15T08:19:42Z"
    lastUpdateTime: "2021-06-15T08:19:42Z"
    message: 'operator is not upgradeable: The operatorcondition status "Upgradeable"="False"
      is outdated'
    phase: Pending
    reason: OperatorConditionNotUpgradeable
  lastTransitionTime: "2021-06-15T08:19:42Z"
  lastUpdateTime: "2021-06-15T08:23:29Z"
  message: 'operator is not upgradeable: The operatorcondition status "Upgradeable"="True"
    is outdated'
  phase: Pending
  reason: OperatorConditionNotUpgradeable

Comment 9 Jian Zhang 2021-06-16 03:39:48 UTC
Hi Vu,

Thanks for your information!

1, repeat the above step 1-3.
[cloud-user@preserve-olm-env jian]$ oc get sub 
NAME   PACKAGE   SOURCE                CHANNEL
etcd   etcd      community-operators   singlenamespace-alpha

[cloud-user@preserve-olm-env jian]$ oc get ip
NAME            CSV                   APPROVAL   APPROVED
install-lkd94   etcdoperator.v0.9.4   Manual     false
install-whgs5   etcdoperator.v0.9.2   Manual     true

[cloud-user@preserve-olm-env jian]$ oc get csv
NAME                              DISPLAY                            VERSION    REPLACES              PHASE
elasticsearch-operator.5.1.0-56   OpenShift Elasticsearch Operator   5.1.0-56                         Succeeded
etcdoperator.v0.9.2               etcd                               0.9.2      etcdoperator.v0.9.0   Succeeded

2, Update the OperatorContion, add `spec.conditions` fileds, and set the Upgradeable to False. But, got the below warnings when modify the OperatorCondition object.
# operatorconditions.operators.coreos.com "etcdoperator.v0.9.2" was not valid:
# * spec.conditions.reason: Invalid value: "The operator is not upgradeable": spec.conditions.reason in body should match '^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$'
#

[cloud-user@preserve-olm-env jian]$ oc get operatorconditions etcdoperator.v0.9.2 -o yaml
apiVersion: operators.coreos.com/v2
kind: OperatorCondition
metadata:
  creationTimestamp: "2021-06-16T02:53:01Z"
  generation: 2
  labels:
    operators.coreos.com/etcd.default: ""
  name: etcdoperator.v0.9.2
  namespace: default
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: true
    kind: ClusterServiceVersion
    name: etcdoperator.v0.9.2
    uid: 2345d4b7-c178-4de3-ba61-6bfe811889e7
  resourceVersion: "704994"
  uid: 9db3774e-d008-4ed2-a7da-001750d55fc5
spec:
  conditions:
  - lastTransitionTime: "2021-06-16T16:56:44Z"
    message: The operator is not ready
    reason: Bug
    status: "False"
    type: Upgradeable
  deployments:
  - etcd-operator
  serviceAccounts:
  - etcd-operator
status:
  conditions:
  - lastTransitionTime: "2021-06-16T16:56:44Z"
    message: The operator is not ready
    observedGeneration: 2
    reason: Bug
    status: "False"
    type: Upgradeable

3, Approve InstallPlan etcdoperator.v0.9.4. Its CSV in Pending status looks good.

[cloud-user@preserve-olm-env jian]$ oc edit ip install-lkd94 
installplan.operators.coreos.com/install-lkd94 edited
[cloud-user@preserve-olm-env jian]$ oc get csv
NAME                              DISPLAY                            VERSION    REPLACES              PHASE
elasticsearch-operator.5.1.0-56   OpenShift Elasticsearch Operator   5.1.0-56                         Succeeded
etcdoperator.v0.9.2               etcd                               0.9.2      etcdoperator.v0.9.0   Replacing
etcdoperator.v0.9.4               etcd                               0.9.4      etcdoperator.v0.9.2   Pending
[cloud-user@preserve-olm-env jian]$ 


[cloud-user@preserve-olm-env jian]$ oc get csv etcdoperator.v0.9.4 -o yaml
...
...
status:
  cleanup: {}
  conditions:
  - lastTransitionTime: "2021-06-16T03:20:38Z"
    lastUpdateTime: "2021-06-16T03:20:38Z"
    message: requirements not yet checked
    phase: Pending
    reason: RequirementsUnknown
  - lastTransitionTime: "2021-06-16T03:20:38Z"
    lastUpdateTime: "2021-06-16T03:20:38Z"
    message: 'operator is not upgradeable: The operator is not upgradeable: The operator
      is not ready'
    phase: Pending
    reason: OperatorConditionNotUpgradeable
  lastTransitionTime: "2021-06-16T03:20:38Z"
  lastUpdateTime: "2021-06-16T03:20:38Z"
  message: 'operator is not upgradeable: The operator is not upgradeable: The operator
    is not ready'
  phase: Pending
  reason: OperatorConditionNotUpgradeable

4, Check the OperatorConditions status. One question, the `spec.observedGeneration` is gone, is it as expected?

[cloud-user@preserve-olm-env jian]$ oc get operatorconditions etcdoperator.v0.9.2 -o yaml
apiVersion: operators.coreos.com/v2
kind: OperatorCondition
metadata:
  creationTimestamp: "2021-06-16T02:53:01Z"
  generation: 2
  labels:
    operators.coreos.com/etcd.default: ""
  name: etcdoperator.v0.9.2
  namespace: default
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: true
    kind: ClusterServiceVersion
    name: etcdoperator.v0.9.2
    uid: 2345d4b7-c178-4de3-ba61-6bfe811889e7
  resourceVersion: "704994"
  uid: 9db3774e-d008-4ed2-a7da-001750d55fc5
spec:
  conditions:
  - lastTransitionTime: "2021-06-16T16:56:44Z"
    message: The operator is not ready
    reason: Bug
    status: "False"
    type: Upgradeable
  deployments:
  - etcd-operator
  serviceAccounts:
  - etcd-operator
status:
  conditions:
  - lastTransitionTime: "2021-06-16T16:56:44Z"
    message: The operator is not ready
    observedGeneration: 2
    reason: Bug
    status: "False"
    type: Upgradeable


5, Set the OperatorCondition `spec.conditions.Upgradeable` to "True". Maybe we should address this error.
[cloud-user@preserve-olm-env jian]$ oc edit operatorconditions etcdoperator.v0.9.2 
error: operatorconditions.operators.coreos.com "etcdoperator.v0.9.2" is invalid
operatorcondition.operators.coreos.com/etcdoperator.v0.9.2 edited

6, etcdoperator.v0.9.2 upgraded to etcdoperator.v0.9.4 successfully, LGTM.
[cloud-user@preserve-olm-env jian]$ oc get csv
NAME                              DISPLAY                            VERSION    REPLACES              PHASE
elasticsearch-operator.5.1.0-56   OpenShift Elasticsearch Operator   5.1.0-56                         Succeeded
etcdoperator.v0.9.4               etcd                               0.9.4      etcdoperator.v0.9.2   Succeeded

[cloud-user@preserve-olm-env jian]$ oc get operatorconditions
NAME                  AGE
etcdoperator.v0.9.4   5m

Comment 10 Vu Dinh 2021-06-16 15:26:37 UTC
Hi Jian,

I want to ask where did you see this warning from? I don't see this issue when I tried to update the OperatorCondition. I simply use `oc apply` with the `spec.conditions` added to yaml. You don't need to use `curl` command to update the spec.

Vu

Comment 11 Jian Zhang 2021-06-17 01:37:56 UTC
Hi Vu,

For this warning, seems like it is forbidden to use the `blank` letter in the `reason` field. It's weird, I'm not sure why we do this rule. But, this is not a big issue.

# operatorconditions.operators.coreos.com "etcdoperator.v0.9.2" was not valid:
# * spec.conditions.reason: Invalid value: "The operator is not upgradeable": spec.conditions.reason in body should match '^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$'
#

Regarding point 4, the `spec.conditions[0],observedGeneration` is gone, is it as expected?

Comment 12 Vu Dinh 2021-06-17 05:54:00 UTC
Hey Jian,

When I update the spec with observedGeneration, it persists just fine. I'm not sure why yours was missing. I looked at the curl command and it seemed the `observedGeneration` wasn't set.

Vu

Comment 13 Jian Zhang 2021-06-18 09:00:23 UTC
Hi Vu,

I also used the `patch`, but the `observedGeneration` still gone.

$oc patch operatorcondition etcdoperator.v0.9.2 -p '{"spec":{"conditions":[{"type":"Upgradeable", "observedCondition":1,"status":"False","reason":"bug","message":"not ready","lastUpdateTime":"2021-06-16T16:56:44Z","lastTransitionTime":"2021-06-16T16:56:44Z"}]}}' --type=merge

Comment 14 Jian Zhang 2021-06-18 09:49:57 UTC
I also test the `spec.overrides`, seems like it doesn't work.

1, patch it to set the Upgradeable to False.
[cloud-user@preserve-olm-env jian]$ oc patch operatorcondition etcdoperator.v0.9.2 -p '{"spec":{"conditions":[{"type":"Upgradeable", "observedCondition":1,"status":"False","reason":"bug","message":"not ready","lastUpdateTime":"2021-06-16T16:56:44Z","lastTransitionTime":"2021-06-16T16:56:44Z"}]}}' --type=merge
operatorcondition.operators.coreos.com/etcdoperator.v0.9.2 patched

2, the `observedGeneration` still gone.

[cloud-user@preserve-olm-env jian]$  oc get operatorcondition etcdoperator.v0.9.2 -o yaml
apiVersion: operators.coreos.com/v2
kind: OperatorCondition
metadata:
  creationTimestamp: "2021-06-18T09:36:13Z"
  generation: 2
  labels:
    operators.coreos.com/etcd.default: ""
  name: etcdoperator.v0.9.2
  namespace: default
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: true
    kind: ClusterServiceVersion
    name: etcdoperator.v0.9.2
    uid: 3d5d8262-4eb7-4265-ab81-bfaf92000c3b
  resourceVersion: "835241"
  uid: 8c0cb630-2c2b-4854-9bf5-103c9764cf44
spec:
  conditions:
  - lastTransitionTime: "2021-06-16T16:56:44Z"
    message: not ready
    reason: bug
    status: "False"
    type: Upgradeable
  deployments:
  - etcd-operator
  serviceAccounts:
  - etcd-operator
status:
  conditions:
  - lastTransitionTime: "2021-06-16T16:56:44Z"
    message: not ready
    observedGeneration: 2
    reason: bug
    status: "False"
    type: Upgradeable

3, Approve the etcdoperator.v0.9.4 
[cloud-user@preserve-olm-env jian]$ oc get ip
NAME            CSV                   APPROVAL   APPROVED
install-bhb4z   etcdoperator.v0.9.2   Manual     true
install-fb7dx   etcdoperator.v0.9.4   Manual     false
[cloud-user@preserve-olm-env jian]$ oc edit ip install-fb7dx
installplan.operators.coreos.com/install-fb7dx edited
[cloud-user@preserve-olm-env jian]$ 
[cloud-user@preserve-olm-env jian]$ 

4, etcdoperator.v0.9.4 in Pending status, looks good.
[cloud-user@preserve-olm-env jian]$ oc get csv
NAME                              DISPLAY                            VERSION    REPLACES                          PHASE
blacklist.v0.0.1                  kaka                               0.0.1                                        Succeeded
elasticsearch-operator.5.1.0-59   OpenShift Elasticsearch Operator   5.1.0-59   elasticsearch-operator.5.1.0-58   Succeeded
etcdoperator.v0.9.2               etcd                               0.9.2      etcdoperator.v0.9.0               Replacing
etcdoperator.v0.9.4               etcd                               0.9.4      etcdoperator.v0.9.2               Pending

5, Overrides it and set the Upgradeable to True.
[cloud-user@preserve-olm-env jian]$ oc patch operatorcondition etcdoperator.v0.9.2 -p '{"spec":{"overrides":[{"type":"Upgradeable", "status":"True","reason":"bug","message":"now ready"}]}}' --type=merge
operatorcondition.operators.coreos.com/etcdoperator.v0.9.2 patched

6, The status.conditions still in `False`, seems like `spec.overrides` can overrides the `status.conditions`.
[cloud-user@preserve-olm-env jian]$  oc get operatorcondition etcdoperator.v0.9.2 -o yaml
apiVersion: operators.coreos.com/v2
kind: OperatorCondition
metadata:
  creationTimestamp: "2021-06-18T09:36:13Z"
  generation: 3
  labels:
    operators.coreos.com/etcd.default: ""
  name: etcdoperator.v0.9.2
  namespace: default
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: true
    kind: ClusterServiceVersion
    name: etcdoperator.v0.9.2
    uid: 3d5d8262-4eb7-4265-ab81-bfaf92000c3b
  resourceVersion: "837086"
  uid: 8c0cb630-2c2b-4854-9bf5-103c9764cf44
spec:
  conditions:
  - lastTransitionTime: "2021-06-16T16:56:44Z"
    message: not ready
    reason: bug
    status: "False"
    type: Upgradeable
  deployments:
  - etcd-operator
  overrides:
  - message: now ready
    reason: bug
    status: "True"
    type: Upgradeable
  serviceAccounts:
  - etcd-operator
status:
  conditions:
  - lastTransitionTime: "2021-06-16T16:56:44Z"
    message: not ready
    observedGeneration: 3
    reason: bug
    status: "False"
    type: Upgradeable


But, the CSV can be upgraded well.
[cloud-user@preserve-olm-env jian]$ oc get csv
NAME                              DISPLAY                            VERSION    REPLACES                          PHASE
blacklist.v0.0.1                  kaka                               0.0.1                                        Succeeded
elasticsearch-operator.5.1.0-59   OpenShift Elasticsearch Operator   5.1.0-59   elasticsearch-operator.5.1.0-58   Succeeded
etcdoperator.v0.9.2               etcd                               0.9.2      etcdoperator.v0.9.0               Replacing
etcdoperator.v0.9.4               etcd                               0.9.4      etcdoperator.v0.9.2               Installing

Comment 15 Vu Dinh 2021-06-18 14:06:39 UTC
Hi Jian,

I can see in the patch command you have `observedCondition` which is not valid field. The field is `observedGeneration`.

Vu

Comment 16 Jian Zhang 2021-06-22 07:51:20 UTC
Hi Vu,

Oh, sorry, I guess something wrong with the example list in https://hackmd.io/9wG20hu5TU-y1HrkhvcsZQ?view :

apiVersion: operators.coreos.com/v1
kind: OperatorCondition
metadata:
  generation: 2
spec:
  conditions:
  - lastTransitionTime: "2021-05-05T16:56:44Z"
    lastUpdateTime: "2021-05-05T16:56:44Z"
    message: The operator is not ready
    reason: The operator is not upgradeable
    status: "False"
    observedCondition: 1
    type: Upgradeable
status:
  conditions:
  - lastTransitionTime: "2021-05-05T16:56:50Z"
    lastUpdateTime: "2021-05-05T16:56:50"
    message: The operator is not ready
    reason: The operator is not upgradeable
    status: "False"
    observedCondition: 2
    type: Upgradeable


Anyway, I retest it with the `observedGeneration`,

[cloud-user@preserve-olm-env jian]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-19-005119   True        False         18h     Cluster version is 4.8.0-0.nightly-2021-06-19-005119

[cloud-user@preserve-olm-env jian]$ oc -n openshift-operator-lifecycle-manager exec catalog-operator-78f55c6dfc-zs9pd  -- olm --version
OLM version: 0.17.0
git commit: f25f670c03e849ba0fd53a56daa0d8a697f68d16

1, Install etcd operator v0.9.2
[cloud-user@preserve-olm-env jian]$ oc project
Using project "default" on server "https://api.wewang-share.qe.devcluster.openshift.com:6443".
[cloud-user@preserve-olm-env jian]$ oc get og
NAME         AGE
default-og   2m13s
[cloud-user@preserve-olm-env jian]$ cat sub-etcd.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: etcd
  namespace: default
spec:
  channel: singlenamespace-alpha
  installPlanApproval: Manual
  name: etcd
  source: community-operators
  sourceNamespace: openshift-marketplace
  startingCSV: etcdoperator.v0.9.2
[cloud-user@preserve-olm-env jian]$ oc create -f sub-etcd.yaml 
subscription.operators.coreos.com/etcd created
[cloud-user@preserve-olm-env jian]$ oc get sub 
NAME   PACKAGE   SOURCE                CHANNEL
etcd   etcd      community-operators   singlenamespace-alpha
[cloud-user@preserve-olm-env jian]$ oc get ip
NAME            CSV                   APPROVAL   APPROVED
install-l9nd7   etcdoperator.v0.9.2   Manual     false
[cloud-user@preserve-olm-env jian]$ oc get csv
NAME                              DISPLAY                            VERSION    REPLACES   PHASE
elasticsearch-operator.5.1.0-63   OpenShift Elasticsearch Operator   5.1.0-63              Succeeded

[cloud-user@preserve-olm-env jian]$ oc edit ip install-l9nd7 
installplan.operators.coreos.com/install-l9nd7 edited
[cloud-user@preserve-olm-env jian]$ oc get ip 
NAME            CSV                   APPROVAL   APPROVED
install-l9nd7   etcdoperator.v0.9.2   Manual     true
install-r5sqx   etcdoperator.v0.9.4   Manual     false
[cloud-user@preserve-olm-env jian]$ oc get csv
NAME                              DISPLAY                            VERSION    REPLACES              PHASE
elasticsearch-operator.5.1.0-63   OpenShift Elasticsearch Operator   5.1.0-63                         Succeeded
etcdoperator.v0.9.2               etcd                               0.9.2      etcdoperator.v0.9.0   Succeeded
[cloud-user@preserve-olm-env jian]$ oc get operatorcondition
NAME                  AGE
etcdoperator.v0.9.2   13s

2, Patch the `spec.conditions[0].Upgradeable` of operatorcondition to False. Looks good.

[cloud-user@preserve-olm-env jian]$ oc patch operatorcondition etcdoperator.v0.9.2 -p '{"spec":{"conditions":[{"type":"Upgradeable", "observedGeneration":1,"status":"False","reason":"bug","message":"not ready","lastUpdateTime":"2021-06-16T16:56:44Z","lastTransitionTime":"2021-06-16T16:56:44Z"}]}}' --type=merge
operatorcondition.operators.coreos.com/etcdoperator.v0.9.2 patched

[cloud-user@preserve-olm-env jian]$ oc get operatorcondition etcdoperator.v0.9.2 -o yaml
apiVersion: operators.coreos.com/v2
kind: OperatorCondition
metadata:
  creationTimestamp: "2021-06-22T07:30:54Z"
  generation: 2
  labels:
    operators.coreos.com/etcd.default: ""
  name: etcdoperator.v0.9.2
  namespace: default
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: true
    kind: ClusterServiceVersion
    name: etcdoperator.v0.9.2
    uid: b2b2e7a5-05bf-4ccd-8ad7-24415b3cf2bf
  resourceVersion: "513365"
  uid: c2a1f7c0-a348-4c83-ab4b-a852aa34bf98
spec:
  conditions:
  - lastTransitionTime: "2021-06-16T16:56:44Z"
    message: not ready
    observedGeneration: 1
    reason: bug
    status: "False"
    type: Upgradeable
  deployments:
  - etcd-operator
  serviceAccounts:
  - etcd-operator
status:
  conditions:
  - lastTransitionTime: "2021-06-16T16:56:44Z"
    message: not ready
    observedGeneration: 2
    reason: bug
    status: "False"
    type: Upgradeable

3, Approve the etcdoperator.v0.9.4.
[cloud-user@preserve-olm-env jian]$ oc edit ip install-r5sqx 
installplan.operators.coreos.com/install-r5sqx edited
[cloud-user@preserve-olm-env jian]$ 
[cloud-user@preserve-olm-env jian]$ oc get ip
NAME            CSV                   APPROVAL   APPROVED
install-l9nd7   etcdoperator.v0.9.2   Manual     true
install-r5sqx   etcdoperator.v0.9.4   Manual     true
[cloud-user@preserve-olm-env jian]$ oc get csv
NAME                              DISPLAY                            VERSION    REPLACES              PHASE
elasticsearch-operator.5.1.0-63   OpenShift Elasticsearch Operator   5.1.0-63                         Succeeded
etcdoperator.v0.9.2               etcd                               0.9.2      etcdoperator.v0.9.0   Replacing
etcdoperator.v0.9.4               etcd                               0.9.4      etcdoperator.v0.9.2   Pending

[cloud-user@preserve-olm-env jian]$ oc get csv etcdoperator.v0.9.4 -o yaml
...
status:
  cleanup: {}
  conditions:
  - lastTransitionTime: "2021-06-22T07:35:22Z"
    lastUpdateTime: "2021-06-22T07:35:22Z"
    message: requirements not yet checked
    phase: Pending
    reason: RequirementsUnknown
  - lastTransitionTime: "2021-06-22T07:35:22Z"
    lastUpdateTime: "2021-06-22T07:35:22Z"
    message: 'operator is not upgradeable: The operator is not upgradeable: not ready'
    phase: Pending
    reason: OperatorConditionNotUpgradeable
  lastTransitionTime: "2021-06-22T07:35:22Z"
  lastUpdateTime: "2021-06-22T07:35:22Z"
  message: 'operator is not upgradeable: The operator is not upgradeable: not ready'
  phase: Pending
  reason: OperatorConditionNotUpgradeable

4, Patch the `spec.conditions[0].Upgradeable` of operatorcondition to True. etcdoperator.v0.9.2 can be upgraded to etcdoperator.v0.9.4, looks good.
[cloud-user@preserve-olm-env jian]$ oc patch operatorcondition etcdoperator.v0.9.2 -p '{"spec":{"conditions":[{"type":"Upgradeable", "observedGeneration":1,"status":"True","reason":"bug","message":"now ready","lastUpdateTime":"2021-06-16T16:56:44Z","lastTransitionTime":"2021-06-16T16:56:44Z"}]}}' --type=merge
operatorcondition.operators.coreos.com/etcdoperator.v0.9.2 patched
[cloud-user@preserve-olm-env jian]$ 
[cloud-user@preserve-olm-env jian]$ 
[cloud-user@preserve-olm-env jian]$ oc get csv
NAME                              DISPLAY                            VERSION    REPLACES              PHASE
elasticsearch-operator.5.1.0-63   OpenShift Elasticsearch Operator   5.1.0-63                         Succeeded
etcdoperator.v0.9.2               etcd                               0.9.2      etcdoperator.v0.9.0   Replacing
etcdoperator.v0.9.4               etcd                               0.9.4      etcdoperator.v0.9.2   Installing
[cloud-user@preserve-olm-env jian]$ oc get operatorcondition
NAME                  AGE
etcdoperator.v0.9.4   3m18s
[cloud-user@preserve-olm-env jian]$ oc get csv
NAME                              DISPLAY                            VERSION    REPLACES              PHASE
elasticsearch-operator.5.1.0-63   OpenShift Elasticsearch Operator   5.1.0-63                         Succeeded
etcdoperator.v0.9.4               etcd                               0.9.4      etcdoperator.v0.9.2   Succeeded


As I mentioned in comment 14, I also test the `spec.overrides`, but, it doesn't work.

test steps:

repeat step 1-3,

[cloud-user@preserve-olm-env jian]$ oc create -f sub-etcd.yaml 
subscription.operators.coreos.com/etcd created
[cloud-user@preserve-olm-env jian]$ oc get sub
NAME   PACKAGE   SOURCE                CHANNEL
etcd   etcd      community-operators   singlenamespace-alpha
[cloud-user@preserve-olm-env jian]$ oc get ip
NAME            CSV                   APPROVAL   APPROVED
install-wvstp   etcdoperator.v0.9.2   Manual     false
[cloud-user@preserve-olm-env jian]$ oc edit ip install-wvstp 
installplan.operators.coreos.com/install-wvstp edited
[cloud-user@preserve-olm-env jian]$ oc get csv
NAME                              DISPLAY                            VERSION    REPLACES              PHASE
elasticsearch-operator.5.1.0-63   OpenShift Elasticsearch Operator   5.1.0-63                         Succeeded
etcdoperator.v0.9.2               etcd                               0.9.2      etcdoperator.v0.9.0   Installing
[cloud-user@preserve-olm-env jian]$ oc get ip
NAME            CSV                   APPROVAL   APPROVED
install-6nh9s   etcdoperator.v0.9.4   Manual     false
install-wvstp   etcdoperator.v0.9.2   Manual     true
[cloud-user@preserve-olm-env jian]$ oc get csv
NAME                              DISPLAY                            VERSION    REPLACES              PHASE
elasticsearch-operator.5.1.0-63   OpenShift Elasticsearch Operator   5.1.0-63                         Succeeded
etcdoperator.v0.9.2               etcd                               0.9.2      etcdoperator.v0.9.0   Succeeded
[cloud-user@preserve-olm-env jian]$ oc patch operatorcondition etcdoperator.v0.9.2 -p '{"spec":{"conditions":[{"type":"Upgradeable", "observedGeneration":1,"status":"False","reason":"bug","message":"not ready","lastUpdateTime":"2021-06-16T16:56:44Z","lastTransitionTime":"2021-06-16T16:56:44Z"}]}}' --type=merge
operatorcondition.operators.coreos.com/etcdoperator.v0.9.2 patched

[cloud-user@preserve-olm-env jian]$ oc edit ip install-6nh9s 
installplan.operators.coreos.com/install-6nh9s edited
[cloud-user@preserve-olm-env jian]$ 
[cloud-user@preserve-olm-env jian]$ 
[cloud-user@preserve-olm-env jian]$ oc get ip
NAME            CSV                   APPROVAL   APPROVED
install-6nh9s   etcdoperator.v0.9.4   Manual     true
install-wvstp   etcdoperator.v0.9.2   Manual     true
[cloud-user@preserve-olm-env jian]$ oc get csv
NAME                              DISPLAY                            VERSION    REPLACES              PHASE
elasticsearch-operator.5.1.0-63   OpenShift Elasticsearch Operator   5.1.0-63                         Succeeded
etcdoperator.v0.9.2               etcd                               0.9.2      etcdoperator.v0.9.0   Replacing
etcdoperator.v0.9.4               etcd                               0.9.4      etcdoperator.v0.9.2   Pending

4, Overrides it and sets the Upgradeable to True. But, the `status.conditions` still in `False`, `spec.overrides` cannot overrides the `status.conditions`. Is it as expected?

[cloud-user@preserve-olm-env jian]$ oc patch operatorcondition etcdoperator.v0.9.2 -p '{"spec":{"overrides":[{"type":"Upgradeable", "status":"True","reason":"bug","message":"now ready"}]}}' --type=merge
operatorcondition.operators.coreos.com/etcdoperator.v0.9.2 patched

[cloud-user@preserve-olm-env jian]$ oc get csv
NAME                              DISPLAY                            VERSION    REPLACES              PHASE
elasticsearch-operator.5.1.0-63   OpenShift Elasticsearch Operator   5.1.0-63                         Succeeded
etcdoperator.v0.9.2               etcd                               0.9.2      etcdoperator.v0.9.0   Replacing
etcdoperator.v0.9.4               etcd                               0.9.4      etcdoperator.v0.9.2   Installing
[cloud-user@preserve-olm-env jian]$ oc get pods
NAME                             READY   STATUS              RESTARTS   AGE
etcd-operator-6fc7cf7dd9-b64fp   3/3     Running             0          4m28s
etcd-operator-f49544fdd-6bjct    0/3     ContainerCreating   0          67s

[cloud-user@preserve-olm-env jian]$ oc get operatorcondition etcdoperator.v0.9.2 -o yaml
apiVersion: operators.coreos.com/v2
kind: OperatorCondition
metadata:
  creationTimestamp: "2021-06-22T07:43:08Z"
  generation: 3
  labels:
    operators.coreos.com/etcd.default: ""
  name: etcdoperator.v0.9.2
  namespace: default
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: true
    kind: ClusterServiceVersion
    name: etcdoperator.v0.9.2
    uid: 221b55d2-3f19-42e2-abe7-1b0851710a87
  resourceVersion: "525586"
  uid: 9f271bdc-ab10-426b-8614-9a330195b900
spec:
  conditions:
  - lastTransitionTime: "2021-06-16T16:56:44Z"
    message: not ready
    observedGeneration: 1
    reason: bug
    status: "False"
    type: Upgradeable
  deployments:
  - etcd-operator
  overrides:
  - message: now ready
    reason: bug
    status: "True"
    type: Upgradeable
  serviceAccounts:
  - etcd-operator
status:
  conditions:
  - lastTransitionTime: "2021-06-16T16:56:44Z"
    message: not ready
    observedGeneration: 3
    reason: bug
    status: "False"
    type: Upgradeable

Comment 17 Jian Zhang 2021-06-22 07:53:58 UTC
The `status.conditions[0].status` is `Flase`, however, it's in upgrading, and can be upgraded successfully.

Comment 18 Vu Dinh 2021-06-22 14:09:59 UTC
Hi Jian,

I updated the doc. The correct field is `ObservedGeneration`. The `spec.overrides` doesn't change the `spec.conditions`. It simply allows the OLM to proceed/block the upgrade and override the current conditions setting. So what you are seeing is accurate as the condition is False but you set override to True and the upgrade proceeds accordingly.

Comment 19 Jian Zhang 2021-06-25 08:19:31 UTC
Hi Vu,

> The `spec.overrides` doesn't change the `spec.conditions`. It simply allows the OLM to proceed/block the upgrade and override the current conditions setting.

Yeah, but since the `status` is for reflecting the real-time status, the `spec.overrides` should change the `spec.status` at least.
Otherwise, it's confusing for the users. What do you think?
For this bug, I verified it since the `spec.conditions` works well.

Comment 21 Vu Dinh 2021-06-25 16:29:48 UTC
Hey Jian,

I can see the reasoning behind what you said. I think an improvement can be made to project the conditions in the overrides to the status as well.

Thanks,
Vu

Comment 23 errata-xmlrpc 2021-07-27 22:43:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438


Note You need to log in before you can comment on or make changes to this bug.