Bug 1819457

Summary: Package Server is in 'Cannot update' status despite properly working
Product: OpenShift Container Platform
Reporter: Pedro Amoedo <pamoedom>
Component: OLM
Assignee: Evan Cordell <ecordell>
OLM sub component: OLM
QA Contact: Jian Zhang <jiazha>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: agawand, dageoffr, ecordell, francis.kemp, krizza, nhale, oarribas, pamoedom
Version: 4.3.z
Keywords: Reopened, UpcomingSprint
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: No Doc Update
Story Points: ---
Last Closed: 2021-02-24 15:10:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Blocks: 1896051    

Description Pedro Amoedo 2020-03-31 21:07:24 UTC
Description of problem:

An OCP 4.3 cluster was freshly installed over OpenStack 13 using IPI method. 
After that, certificates for API and *.apps were installed according to the documentation.
No other noticeable configuration actions were performed.

Operator Package Server (0.13.0) is in 'Cannot update' status.
Event log contains multiple installation attempts and events "APIServices not installed". 
At the same time, Package Server appears to be fully operational. Nothing suspicious in the pods, operator manifests properly loaded and rendered from all sources.

OLM pod log has multiple entries:

~~~
time="2020-03-21T02:06:49Z" level=info msg="checking packageserver"
time="2020-03-21T02:06:49Z" level=info msg="Labels updated!" labels="olm.api.4bca9f23e412d79d=provided,olm.clusteroperator.name=operator-lifecycle-manager-packageserver,olm.version=0.13.0"
time="2020-03-21T02:06:49Z" level=warning msg="issue ensuring csv api labels" csv=packageserver error="Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com \"packageserver\": the object has been modified; please apply your changes to the latest version and try again" id=JQ2Pt namespace=openshift-operator-lifecycle-manager phase=Succeeded
time="2020-03-21T02:06:49Z" level=info msg="not part of any operatorgroup, no annotations" csv=packageserver id=4VpPC namespace=openshift-operator-lifecycle-manager phase=Succeeded
E0321 02:06:49.225975       1 queueinformer_operator.go:282] sync {"update" "openshift-operator-lifecycle-manager/packageserver"} failed: error transitioning ClusterServiceVersion: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com "packageserver": the object has been modified; please apply your changes to the latest version and try again and error updating CSV status: error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com "packageserver": the object has been modified; please apply your changes to the latest version and try again
~~~
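
To gauge how often these conflicts occur, the log can be scanned for the conflict message. The following is only a rough sketch: the function name `count_conflict_lines` is hypothetical, and the sample lines are shortened copies of the entries above (the elided part is marked with "...").

```python
# Substring that the API server includes in update-conflict errors,
# as seen in the OLM log excerpt above.
CONFLICT_MARKER = "the object has been modified"


def count_conflict_lines(log_lines):
    """Count log lines that report an update conflict."""
    return sum(1 for line in log_lines if CONFLICT_MARKER in line)


# Shortened sample lines, for illustration only.
sample_log = [
    'time="2020-03-21T02:06:49Z" level=info msg="checking packageserver"',
    'time="2020-03-21T02:06:49Z" level=warning msg="issue ensuring csv api labels" '
    'csv=packageserver error="... the object has been modified; please apply '
    'your changes to the latest version and try again"',
]

print(count_conflict_lines(sample_log))
```

In practice the lines would come from the OLM pod log saved to a file rather than from an inline list.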

The `v1-packages-operators-coreos-com` service has multiple ownerReferences entries with an identical uid, and the number of entries grows over time.
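
The duplicated ownerReferences can be confirmed by counting uids in the service manifest. A minimal sketch, assuming the manifest has already been loaded into a dict (for example from `-o json` output); the function name `duplicate_owner_uids` and the sample uid are hypothetical and not taken from the affected cluster.

```python
from collections import Counter


def duplicate_owner_uids(manifest):
    """Return {uid: count} for each ownerReference uid listed more than once."""
    refs = manifest.get("metadata", {}).get("ownerReferences") or []
    counts = Counter(ref.get("uid") for ref in refs)
    return {uid: n for uid, n in counts.items() if n > 1}


# Hypothetical manifest fragment: the same owner is listed three times.
sample = {
    "metadata": {
        "name": "v1-packages-operators-coreos-com",
        "ownerReferences": [
            {"kind": "ClusterServiceVersion", "name": "packageserver", "uid": "example-uid"},
            {"kind": "ClusterServiceVersion", "name": "packageserver", "uid": "example-uid"},
            {"kind": "ClusterServiceVersion", "name": "packageserver", "uid": "example-uid"},
        ],
    }
}

print(duplicate_owner_uids(sample))
```

An empty result means every owner is listed once; a growing count for a single uid matches the behavior described here.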

Version-Release number of selected component (if applicable):

OCP 4.3.8
Package Server (0.13.0)

How reproducible:
Unknown (fresh install)

Steps to Reproduce:
1. OSP 13
2. OCP 4.3.x IPI

Actual results:

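The Package Server operator stays in 'Cannot update' status and OLM logs repeated "object has been modified" errors, even though the Package Server itself works.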

Expected results:

No such error messages and no 'Cannot update' status when the operator is running properly.

Additional info:

There is a related KCS[1] that states the following:

~~~
The Package Server Operator is applied as a ClusterServiceVersion that Operator Lifecycle Manager does install directly without a Subscription or CatalogSource. The application of this ClusterServiceVersion is handled by ClusterVersionOperator, so upgrades to it occur with regular Red Hat OpenShift Container Platform - update and won't be an issue.
~~~

[1] - https://access.redhat.com/solutions/4937981

Comment 14 Fran Kemp 2020-05-19 18:14:40 UTC
Having a very similar situation to this on an OpenShift cluster installed using IPI on the IBM Cloud last week.

OCP 4.3.12 was installed initially - since upgraded to 4.3.18 and seems to be operating normally otherwise.

There is one thing I'm seeing that is not listed in this bug: the Control Plane is listed as "Not Available" on the Administrator Dashboard.
Is that caused by this bug also or do I have another problem? 

Thanks

Comment 15 Evan Cordell 2020-05-20 19:34:47 UTC
>  The Control Plane is listed as "Not Available" on the Administrator Dashboard.

That indicates a more fundamental problem - in general, cluster components being unavailable (discovery in particular) can cause packageserver to report unhealthy, because OLM uses discovery to confirm the health of the apiservice.

Given that the objects in the current report show that the packageserver is healthy and we haven't heard back, I'm going to close this again. Please re-open with additional information describing the failure if one still exists (it is not apparent that there is an error from the current report).

Comment 33 Jian Zhang 2020-11-11 06:34:28 UTC
I see that the fix PR only addresses the packageserver ownerReferences issue. With the fix there is only one ownerReferences uid, so LGTM. I'm going to verify this bug on 4.7.

1. Create the 4.7 cluster
[root@preserve-olm-env data]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-11-033756   True        False         78m     Cluster version is 4.7.0-0.nightly-2020-11-11-033756
[root@preserve-olm-env data]# oc adm release info registry.svc.ci.openshift.org/ocp/release:4.7.0-0.nightly-2020-11-11-033756 --commits|grep lifecycle
  operator-lifecycle-manager                     https://github.com/operator-framework/operator-lifecycle-manager            161c86b215ceae325d7bf8f7f351406a0303ca27

2. Check the packageserver service
[root@preserve-olm-env data]# oc api-resources|grep packages
packagemanifests                                       packages.operators.coreos.com         true         PackageManifest

[root@preserve-olm-env data]# oc get service packageserver-service -n openshift-operator-lifecycle-manager -o yaml
apiVersion: v1
kind: Service
metadata:
...
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: ClusterServiceVersion
    name: packageserver
    uid: 443dba6c-1561-4e30-8faa-f4c0fdd4b9d3
  resourceVersion: "8082"
  selfLink: /api/v1/namespaces/openshift-operator-lifecycle-manager/services/packageserver-service
  uid: 48e27560-80b7-4116-a6d3-dac24cce9bcc

[root@preserve-olm-env data]# oc get csv -n openshift-operator-lifecycle-manager packageserver -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
...
  name: packageserver
  namespace: openshift-operator-lifecycle-manager
  resourceVersion: "59704"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-operator-lifecycle-manager/clusterserviceversions/packageserver
  uid: 443dba6c-1561-4e30-8faa-f4c0fdd4b9d3
...

3. The upgrade to 4.7 from 4.6 works well. I couldn't reproduce the original issue. Please let me know if I missed something, thanks!
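
The check in step 2 (a single ownerReferences entry whose uid matches the CSV) can be sketched as follows. This is only an illustration: the helper name `owned_only_by` is hypothetical, and the dicts are trimmed copies of the fields shown in the output above.

```python
def owned_only_by(service, owner):
    """True if `service` has exactly one ownerReference and it points at `owner`."""
    refs = service.get("metadata", {}).get("ownerReferences") or []
    if len(refs) != 1:
        return False
    ref = refs[0]
    meta = owner.get("metadata", {})
    return (
        ref.get("uid") == meta.get("uid")
        and ref.get("kind") == owner.get("kind")
        and ref.get("name") == meta.get("name")
    )


# Trimmed copies of the objects from step 2.
service = {
    "kind": "Service",
    "metadata": {
        "name": "packageserver-service",
        "ownerReferences": [
            {
                "apiVersion": "operators.coreos.com/v1alpha1",
                "kind": "ClusterServiceVersion",
                "name": "packageserver",
                "uid": "443dba6c-1561-4e30-8faa-f4c0fdd4b9d3",
            }
        ],
    },
}
csv = {
    "kind": "ClusterServiceVersion",
    "metadata": {
        "name": "packageserver",
        "uid": "443dba6c-1561-4e30-8faa-f4c0fdd4b9d3",
    },
}

print(owned_only_by(service, csv))
```

The helper returns False both when the entry is duplicated and when the single entry points at a different object, so it covers the regression described in this bug as well as a stale owner.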

Comment 36 errata-xmlrpc 2021-02-24 15:10:58 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

Comment 37 Red Hat Bugzilla 2023-09-18 00:20:39 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days