Bug 1861364 - Update to 4.4.2 build has not started on the OCP env with OLM fix
Summary: Update to 4.4.2 build has not started on the OCP env with OLM fix
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: OCS 4.5.0
Assignee: Vu Dinh
QA Contact: Petr Balogh
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-07-28 12:17 UTC by Petr Balogh
Modified: 2020-09-15 10:19 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-15 10:18:25 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:3754 0 None None None 2020-09-15 10:18:59 UTC

Description Petr Balogh 2020-07-28 12:17:09 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
We don't see the upgrade start when upgrading from 4.4.1 (live content) to the 4.4.2 internal build.

We still see:
$ oc get csv -n openshift-storage
NAME                            DISPLAY                       VERSION   REPLACES   PHASE
lib-bucket-provisioner.v1.0.0   lib-bucket-provisioner        1.0.0                Succeeded
ocs-operator.v4.4.1             OpenShift Container Storage   4.4.1                Succeeded

In OLM I see this error:
time="2020-07-28T11:52:24Z" level=info msg="error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com \"packageserver\": the object has been modified; please apply your changes to the latest version and try again" csv=packageserver id=dqpnY namespace=openshift-operator-lifecycle-manager phase=Succeeded
E0728 11:52:24.994588       1 queueinformer_operator.go:290] sync {"update" "openshift-operator-lifecycle-manager/packageserver"} failed: error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com "packageserver": the object has been modified; please apply your changes to the latest version and try again
time="2020-07-28T11:52:24Z" level=info msg="checking packageserver"
time="2020-07-28T11:52:24Z" level=info msg="updated labels" csv=packageserver labels="olm.api.4bca9f23e412d79d=provided,olm.clusteroperator.name=operator-lifecycle-manager-packageserver,olm.version=0.14.2" ns=openshift-operator-lifecycle-manager
time="2020-07-28T11:52:25Z" level=info msg="operatorgroup incorrect" csv=packageserver error="<nil>" id=gWd3H namespace=openshift-operator-lifecycle-manager phase=Succeeded
time="2020-07-28T11:52:25Z" level=warning msg="error adding operatorgroup annotations" csv=packageserver error="Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com \"packageserver\": the object has been modified; please apply your changes to the latest version and try again" namespace=openshift-operator-lifecycle-manager operatorGroup=olm-operators
time="2020-07-28T11:52:25Z" level=warning msg="failed to annotate CSVs in operatorgroup after group change" error="Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com \"packageserver\": the object has been modified; please apply your changes to the latest version and try again" namespace=openshift-operator-lifecycle-manager operatorGroup=olm-operators
E0728 11:52:25.033435       1 queueinformer_operator.go:290] sync {"update" "openshift-operator-lifecycle-manager/olm-operators"} failed: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com "packageserver": the object has been modified; please apply your changes to the latest version and try again
time="2020-07-28T11:52:25Z" level=info msg="error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com \"packageserver\": the object has been modified; please apply your changes to the latest version and try again" csv=packageserver id=Ol62h namespace=openshift-operator-lifecycle-manager phase=Succeeded
time="2020-07-28T11:52:25Z" level=info msg="not part of any operatorgroup, no annotations" csv=packageserver id=Ol62h namespace=openshift-operator-lifecycle-manager phase=Succeeded
E0728 11:52:25.033629       1 queueinformer_operator.go:290] sync {"update" "openshift-operator-lifecycle-manager/packageserver"} failed: error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com "packageserver": the object has been modified; please apply your changes to the latest version and try again
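For reference, OLM log snippets like the ones above can be pulled directly from the olm-operator deployment; a sketch (the deployment and namespace names are the OCP 4.4 defaults, adjust if they differ on your cluster):

```shell
# Tail the OLM operator logs and filter for the CSV update conflict
oc logs -n openshift-operator-lifecycle-manager deployment/olm-operator \
  | grep "error updating ClusterServiceVersion"

# Check the current subscriptions and install plans in the OCS namespace
oc get subscriptions,installplans -n openshift-storage
```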


A similar issue was observed with OCS installed on an OCP cluster without the fix, followed by upgrading OCP to a build with the OLM fix. In that scenario, however, the lib-bucket-provisioner CSV was at v2. I will attach logs from that execution at the end of this BZ as well.

In both scenarios, we did not apply any workaround from the KCS article.

Version of all relevant components (if applicable):
OCS - 4.4.2-503.ci
OCP: 4.4.0-0.nightly-2020-07-25-091418


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, cannot upgrade


Is there any workaround available to the best of your knowledge?
Uninstalling the LBP operator will start the upgrade.
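A sketch of the workaround (the subscription name is an assumption; the CSV name is taken from the `oc get csv` listing above — verify both on your cluster before deleting):

```shell
# Remove the lib-bucket-provisioner subscription and CSV so OLM can
# re-resolve the ocs-operator subscription and start the upgrade.
# The subscription name below is assumed, not confirmed in this BZ.
oc delete subscription lib-bucket-provisioner -n openshift-storage
oc delete csv lib-bucket-provisioner.v1.0.0 -n openshift-storage
```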

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
Yes

Can this issue reproduce from the UI?
Haven't tried, but most likely we can.

If this is a regression, please provide more details to justify this:
Yes, it worked before.

Steps to Reproduce:
1. Install OCP with OLM fix
2. Install OCS 4.4.1
3. Add custom catalog to internal 4.4.2 build
4. Change subscription source to point to custom catalog.
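Steps 3 and 4 can be sketched as follows (the catalog name, registry image, and subscription name are illustrative assumptions, not taken from this BZ):

```shell
# 3. Add a custom CatalogSource pointing at the internal 4.4.2 build
#    (replace the image reference with the actual internal build image)
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ocs-custom-catalog
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: <internal-registry>/ocs-registry:4.4.2-503.ci
  displayName: OCS Custom Catalog
EOF

# 4. Point the ocs-operator subscription at the custom catalog
oc patch subscription ocs-operator -n openshift-storage --type merge \
  -p '{"spec":{"source":"ocs-custom-catalog","sourceNamespace":"openshift-marketplace"}}'
```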


Actual results:
We don't see any upgrade start.
In the UI I see the message "Upgrade available" for both operators, LBP and OCS.

Expected results:
The upgrade starts rolling out automatically.

Additional info:
Logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-ai3c33-ua/jnk-ai3c33-ua_20200728T093056/logs/failed_testcase_ocs_logs_1595932084/test_upgrade_ocs_logs/
Job: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/10349/

For another scenario with OCP without fix + upgrade to OCP with OLM fix:
Job: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/10315/console
Logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/pbalogh-3u-lbp/pbalogh-3u-lbp_20200727T161208/logs/failed_testcase_ocs_logs_1595870760/test_upgrade_ocs_logs/

Comment 5 Jose A. Rivera 2020-08-03 17:03:09 UTC
Since this is being worked on outside of OCS, moving this to MODIFIED. Vu, I will assign this to you for now since you are the one actually working on the fix in OCP.

Comment 8 Mudit Agarwal 2020-08-11 11:21:33 UTC
Can this be moved to ON_QA now?

Comment 9 Petr Balogh 2020-08-24 13:33:52 UTC
I think this was an OLM issue and was opened against OCS as a tracker.
We already tested this in the OCS 4.4.2 upgrade with the KCS workaround applied, and it worked for QE, so I think we could even mark it as verified. But let's go with ON_QA and then VERIFIED? QE should not move bugs to ON_QA themselves, I guess.

Comment 10 Mudit Agarwal 2020-08-24 13:48:02 UTC
Thanks Petr, I am moving it to ON_QA

Comment 11 Petr Balogh 2020-08-24 13:59:53 UTC
IIRC, QE already tested this with the KCS https://access.redhat.com/solutions/5319681, since this was an OLM issue. This was tested before OCS 4.4.2 GA, but I covered the previous version of the KCS; I see there have been new changes to the KCS since last week, and that updated version was probably tested by Ashish.

As I understand it, nothing was done on the OCS side, either in 4.4.2 or in 4.5.

This was mainly an OCS tracker; the issue was in OLM and was handled with the solution we suggested in the first version of the KCS.

@Vu, can you please confirm that I didn't miss anything?

Comment 12 Vu Dinh 2020-09-02 18:25:11 UTC
Hi Petr,

So the underlying issue for this problem is the InstallPlan generation bug (https://bugzilla.redhat.com/show_bug.cgi?id=1869717). The PR is still pending merge for 4.4.z; it was approved last week, but for some reason it wasn't passing CI. Hopefully it will get in this week and be available in the next z-stream release for 4.4.

The KCS looks good to me as Ashish has been using it to address customers' issues.

This is an OLM issue, so nothing needs to be done on the OCS side besides using the KCS to address customer cases in the meantime.

Thanks,
Vu

Comment 13 Petr Balogh 2020-09-03 06:57:42 UTC
Thanks Vu. Based on Vu's reply, I am marking this as verified, since there is nothing to do from the OCS side.

Comment 16 errata-xmlrpc 2020-09-15 10:18:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3754

