Description of problem (please be as detailed as possible and provide log snippets):

We don't see the upgrade start when upgrading from 4.4.1 (live content) to the 4.4.2 internal build. We still see:

$ oc get csv -n openshift-storage
NAME                            DISPLAY                       VERSION   REPLACES   PHASE
lib-bucket-provisioner.v1.0.0   lib-bucket-provisioner        1.0.0                Succeeded
ocs-operator.v4.4.1             OpenShift Container Storage   4.4.1                Succeeded

In the OLM log I see this error:

time="2020-07-28T11:52:24Z" level=info msg="error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com \"packageserver\": the object has been modified; please apply your changes to the latest version and try again" csv=packageserver id=dqpnY namespace=openshift-operator-lifecycle-manager phase=Succeeded
E0728 11:52:24.994588       1 queueinformer_operator.go:290] sync {"update" "openshift-operator-lifecycle-manager/packageserver"} failed: error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com "packageserver": the object has been modified; please apply your changes to the latest version and try again
time="2020-07-28T11:52:24Z" level=info msg="checking packageserver"
time="2020-07-28T11:52:24Z" level=info msg="updated labels" csv=packageserver labels="olm.api.4bca9f23e412d79d=provided,olm.clusteroperator.name=operator-lifecycle-manager-packageserver,olm.version=0.14.2" ns=openshift-operator-lifecycle-manager
time="2020-07-28T11:52:25Z" level=info msg="operatorgroup incorrect" csv=packageserver error="<nil>" id=gWd3H namespace=openshift-operator-lifecycle-manager phase=Succeeded
time="2020-07-28T11:52:25Z" level=warning msg="error adding operatorgroup annotations" csv=packageserver error="Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com \"packageserver\": the object has been modified; please apply your changes to the latest version and try again" namespace=openshift-operator-lifecycle-manager operatorGroup=olm-operators
time="2020-07-28T11:52:25Z" level=warning msg="failed to annotate CSVs in operatorgroup after group change" error="Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com \"packageserver\": the object has been modified; please apply your changes to the latest version and try again" namespace=openshift-operator-lifecycle-manager operatorGroup=olm-operators
E0728 11:52:25.033435       1 queueinformer_operator.go:290] sync {"update" "openshift-operator-lifecycle-manager/olm-operators"} failed: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com "packageserver": the object has been modified; please apply your changes to the latest version and try again
time="2020-07-28T11:52:25Z" level=info msg="error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com \"packageserver\": the object has been modified; please apply your changes to the latest version and try again" csv=packageserver id=Ol62h namespace=openshift-operator-lifecycle-manager phase=Succeeded
time="2020-07-28T11:52:25Z" level=info msg="not part of any operatorgroup, no annotations" csv=packageserver id=Ol62h namespace=openshift-operator-lifecycle-manager phase=Succeeded
E0728 11:52:25.033629       1 queueinformer_operator.go:290] sync {"update" "openshift-operator-lifecycle-manager/packageserver"} failed: error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com "packageserver": the object has been modified; please apply your changes to the latest version and try again

A similar issue was observed with OCS installed on an OCP cluster without the fix, which was then upgraded to an OCP build with the OLM fix. But that time we saw that lib-bucket-provisioner was v2. I will attach logs for that execution at the end of this BZ as well. For both mentioned scenarios we did not apply any workaround from the KCS.
Version of all relevant components (if applicable):
OCS: 4.4.2-503.ci
OCP: 4.4.0-0.nightly-2020-07-25-091418

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, we cannot upgrade.

Is there any workaround available to the best of your knowledge?
Yes, uninstalling the LBP operator will start the upgrade.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Haven't tried, but most likely yes.

If this is a regression, please provide more details to justify this:
Yes, it worked before.

Steps to Reproduce:
1. Install OCP with the OLM fix
2. Install OCS 4.4.1
3. Add a custom catalog for the internal 4.4.2 build
4. Change the subscription source to point to the custom catalog

Actual results:
The upgrade does not start. In the UI I see the message "Upgrade available" for both operators, LBP and OCS.

Expected results:
The upgrade starts rolling automatically.

Additional info:
Logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-ai3c33-ua/jnk-ai3c33-ua_20200728T093056/logs/failed_testcase_ocs_logs_1595932084/test_upgrade_ocs_logs/
Job: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/10349/

For the other scenario (OCP without the fix, then upgrade to an OCP build with the OLM fix):
Job: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/10315/console
Logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/pbalogh-3u-lbp/pbalogh-3u-lbp_20200727T161208/logs/failed_testcase_ocs_logs_1595870760/test_upgrade_ocs_logs/
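The subscription source change in step 4 and the workaround above can both be done from the CLI. A sketch, assuming the custom catalog is a CatalogSource named `ocs-catalogsource` in `openshift-marketplace` and the subscriptions carry their default names (all names here are illustrative; check `oc get subscription -n openshift-storage` first):

```shell
# Step 4: point the OCS subscription at the custom catalog.
oc patch subscription ocs-operator -n openshift-storage --type merge \
  -p '{"spec":{"source":"ocs-catalogsource","sourceNamespace":"openshift-marketplace"}}'

# Workaround: uninstall the lib-bucket-provisioner operator so the
# stuck upgrade can proceed (delete its Subscription, then its CSV).
oc delete subscription lib-bucket-provisioner -n openshift-storage
oc delete csv lib-bucket-provisioner.v1.0.0 -n openshift-storage
```

These commands require cluster access and are not a substitute for the steps in the KCS; they only mirror what was done manually in this reproduction.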
Since this is being worked on outside of OCS, moving this to MODIFIED. Vu, I will assign this to you for now since you are the one actually working on the fix in OCP.
Can this be moved to ON_QA now?
I think this was an OLM issue and it was opened in OCS as a tracker. We already tested this in the OCS 4.4.2 upgrade with the KCS applied and it worked for QE, so I think we could even mark it as VERIFIED, but let's go with ON_QA first and then VERIFIED? Although I guess QE should not move bugs to ON_QA themselves.
Thanks Petr, I am moving it to ON_QA
IIRC QE already tested this with the KCS https://access.redhat.com/solutions/5319681, as this was the OLM issue. This was tested before OCS 4.4.2 GA. However, I covered the previous version of the KCS; I now see there have been new changes to the KCS since last week, but this new version was probably tested by Ashish. As I understand it, nothing was done on the OCS side, either in 4.4.2 or for 4.5. This was mainly an OCS tracker, but the issue was in OLM, and the solution we suggested was in the first version of the KCS. @Vu, can you please confirm I didn't miss anything?
Hi Petr,

So the underlying issue for this problem is the InstallPlan generation bug (https://bugzilla.redhat.com/show_bug.cgi?id=1869717). The PR is still pending merge for 4.4.z. It was approved last week but for some reason it wasn't passing CI. Hopefully it will get in this week and be available in the next z-stream release for 4.4.

The KCS looks good to me, as Ashish has been using it to address customers' issues. This is an OLM issue, so there is nothing that needs to be done on the OCS side besides using the KCS to address customer cases in the meantime.

Thanks,
Vu
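Until the 4.4.z fix lands, one way to see whether a cluster is hitting the InstallPlan generation bug after a subscription change is to compare the InstallPlans present with the one each Subscription points at. A sketch (namespace and output fields assumed from this report, not from the KCS):

```shell
# If the upgrade started, a new InstallPlan for the 4.4.2 CSV should exist;
# if only the old plan is present, the generation bug is likely being hit.
oc get installplan -n openshift-storage

# Show which InstallPlan each Subscription currently references.
oc get subscription -n openshift-storage \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.installplan.name}{"\n"}{end}'
```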
Thanks Vu. Based on Vu's reply I am marking this as VERIFIED, as there is nothing to do from the OCS side.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3754