Bug 1819457 - Package Server is in 'Cannot update' status despite properly working [NEEDINFO]
Summary: Package Server is in 'Cannot update' status despite properly working
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Evan Cordell
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks: 1896051
Reported: 2020-03-31 21:07 UTC by Pedro Amoedo
Modified: 2021-02-24 15:11 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:10:58 UTC
Target Upstream Version:
agawand: needinfo? (ecordell)
oarribas: needinfo? (ecordell)




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | operator-framework operator-lifecycle-manager pull 1855 | 0 | None | closed | Bug 1819457: Services should not have duplicate ownerrefs | 2021-02-16 13:42:08 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:11:53 UTC

Description Pedro Amoedo 2020-03-31 21:07:24 UTC
Description of problem:

An OCP 4.3 cluster was freshly installed on OpenStack 13 using the IPI method.
After that, certificates for API and *.apps were installed according to the documentation.
No other noticeable configuration actions were performed.

The Package Server operator (0.13.0) is in 'Cannot update' status.
The event log shows multiple installation attempts and "APIServices not installed" events.
At the same time, Package Server appears to be fully operational: nothing looks suspicious in the pods, and the operator manifests are properly loaded and rendered from all sources.

OLM pod log has multiple entries:

~~~
time="2020-03-21T02:06:49Z" level=info msg="checking packageserver"
time="2020-03-21T02:06:49Z" level=info msg="Labels updated!" labels="olm.api.4bca9f23e412d79d=provided,olm.clusteroperator.name=operator-lifecycle-manager-packageserver,olm.version=0.13.0"
time="2020-03-21T02:06:49Z" level=warning msg="issue ensuring csv api labels" csv=packageserver error="Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com \"packageserver\": the object has been modified; please apply your changes to the latest version and try again" id=JQ2Pt namespace=openshift-operator-lifecycle-manager phase=Succeeded
time="2020-03-21T02:06:49Z" level=info msg="not part of any operatorgroup, no annotations" csv=packageserver id=4VpPC namespace=openshift-operator-lifecycle-manager phase=Succeeded
E0321 02:06:49.225975       1 queueinformer_operator.go:282] sync {"update" "openshift-operator-lifecycle-manager/packageserver"} failed: error transitioning ClusterServiceVersion: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com "packageserver": the object has been modified; please apply your changes to the latest version and try again and error updating CSV status: error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com "packageserver": the object has been modified; please apply your changes to the latest version and try again
~~~
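
For context, "the object has been modified" is the Kubernetes API server's optimistic-concurrency conflict: the update was based on a stale resourceVersion. The following is only a minimal sketch of that mechanism. `Store`, `Conflict`, and `update_with_retry` are hypothetical names for an in-memory stand-in, not OLM code or a real client API.

```python
class Conflict(Exception):
    """Raised when an update carries a stale resourceVersion."""


class Store:
    """Hypothetical in-memory stand-in for the API server (holds one object)."""

    def __init__(self, obj):
        self._obj = dict(obj, resourceVersion=1)

    def get(self):
        # Return a copy so callers cannot mutate the stored object directly.
        return dict(self._obj)

    def update(self, obj):
        # Reject writes that were prepared from an outdated copy.
        if obj["resourceVersion"] != self._obj["resourceVersion"]:
            raise Conflict("the object has been modified")
        self._obj = dict(obj, resourceVersion=obj["resourceVersion"] + 1)
        return dict(self._obj)


def update_with_retry(store, mutate, attempts=5):
    """Re-read the latest object and reapply `mutate` after each conflict."""
    for _ in range(attempts):
        latest = store.get()
        mutate(latest)
        try:
            return store.update(latest)
        except Conflict:
            continue
    raise Conflict("gave up after %d attempts" % attempts)
```

A writer that holds an old copy gets `Conflict`; re-reading before each write, as `update_with_retry` does, is the usual way to recover.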

The `v1-packages-operators-coreos-com` service has multiple ownerReferences with an identical uid, and the number grows over time.
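
A minimal sketch of how the duplicates could be counted and collapsed, assuming the service's `metadata` has been parsed into a Python dict (for example from `oc get service ... -o json`). The function names and the sample uid are illustrative assumptions; this is not the change made in the linked pull request.

```python
def dedupe_owner_references(metadata):
    """Return ownerReferences with repeated uids removed, keeping the first seen."""
    seen = set()
    unique = []
    for ref in metadata.get("ownerReferences", []):
        if ref["uid"] in seen:
            continue
        seen.add(ref["uid"])
        unique.append(ref)
    return unique


def count_duplicates(metadata):
    """Number of ownerReferences entries that repeat an earlier uid."""
    refs = metadata.get("ownerReferences", [])
    return len(refs) - len(dedupe_owner_references(metadata))


# Hypothetical sample: three references to the same owner uid.
sample_metadata = {
    "ownerReferences": [
        {"kind": "ClusterServiceVersion", "name": "packageserver", "uid": "example-uid"},
        {"kind": "ClusterServiceVersion", "name": "packageserver", "uid": "example-uid"},
        {"kind": "ClusterServiceVersion", "name": "packageserver", "uid": "example-uid"},
    ]
}
```

On a healthy object `count_duplicates` would be expected to return 0; a value that keeps rising between checks matches the symptom described here.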

Version-Release number of selected component (if applicable):

OCP 4.3.8
Package Server (0.13.0)

How reproducible:
Unknown (fresh install)

Steps to Reproduce:
1. OSP 13
2. OCP 4.3.x IPI

Actual results:

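The Package Server operator reports 'Cannot update' and OLM logs repeated update conflicts, although the operator itself is working.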

Expected results:

Avoid those error messages if the operator is properly running.

Additional info:

There is a related KCS[1] that states the following:

~~~
The Package Server Operator is applied as a ClusterServiceVersion that Operator Lifecycle Manager does install directly without a Subscription or CatalogSource. The application of this ClusterServiceVersion is handled by ClusterVersionOperator, so upgrades to it occur with regular Red Hat OpenShift Container Platform - update and won't be an issue.
~~~

[1] - https://access.redhat.com/solutions/4937981

Comment 14 Fran Kemp 2020-05-19 18:14:40 UTC
Having a very similar situation to this on an OpenShift cluster installed using IPI on the IBM Cloud last week.

OCP 4.3.12 was installed initially - since upgraded to 4.3.18 and seems to be operating normally otherwise.

There is 1 thing I'm seeing that is not listed in this bug:  The Control Plane is listed as "Not Available" on the Administrator Dashboard.
Is that caused by this bug also or do I have another problem? 

Thanks

Comment 15 Evan Cordell 2020-05-20 19:34:47 UTC
>  The Control Plane is listed as "Not Available" on the Administrator Dashboard.

That indicates a more fundamental problem - in general, cluster components being unavailable (discovery in particular) can cause packageserver to report unhealthy, because OLM uses discovery to confirm the health of the apiservice.
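
As a rough illustration only (this is not OLM's actual health check), a discovery-based probe boils down to asking whether the group's discovery document lists the expected resource. The sketch assumes the document is an `APIResourceList` already parsed into a dict, for example from `oc get --raw /apis/packages.operators.coreos.com/v1`; the group version and the helper name `serves_resource` are assumptions.

```python
def serves_resource(api_resource_list, name):
    """True if the discovery document lists a resource with this name."""
    resources = api_resource_list.get("resources", [])
    return any(r.get("name") == name for r in resources)


# Hypothetical, trimmed discovery document for the packages API group.
sample_discovery = {
    "kind": "APIResourceList",
    "groupVersion": "packages.operators.coreos.com/v1",
    "resources": [
        {"name": "packagemanifests", "namespaced": True, "kind": "PackageManifest"},
    ],
}
```

If discovery itself is failing cluster-wide, a probe like this returns nothing useful for any group, which is why an unhealthy control plane can make packageserver look unhealthy too.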

Given that the objects in the current report show that the packageserver is healthy and we haven't heard back, I'm going to close this again. Please re-open with additional information describing the failure if one still exists (it is not apparent that there is an error from the current report).

Comment 33 Jian Zhang 2020-11-11 06:34:28 UTC
The fix PR only addresses the packageserver ownerReferences issue. The service now has a single ownerReferences uid, so LGTM; I'm going to verify this bug on 4.7.

1, Create the 4.7 cluster
[root@preserve-olm-env data]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-11-033756   True        False         78m     Cluster version is 4.7.0-0.nightly-2020-11-11-033756
[root@preserve-olm-env data]# oc adm release info registry.svc.ci.openshift.org/ocp/release:4.7.0-0.nightly-2020-11-11-033756 --commits|grep lifecycle
  operator-lifecycle-manager                     https://github.com/operator-framework/operator-lifecycle-manager            161c86b215ceae325d7bf8f7f351406a0303ca27

2, Check the packageserver service
[root@preserve-olm-env data]# oc api-resources|grep packages
packagemanifests                                       packages.operators.coreos.com         true         PackageManifest

[root@preserve-olm-env data]# oc get service packageserver-service -n openshift-operator-lifecycle-manager -o yaml
apiVersion: v1
kind: Service
metadata:
...
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: ClusterServiceVersion
    name: packageserver
    uid: 443dba6c-1561-4e30-8faa-f4c0fdd4b9d3
  resourceVersion: "8082"
  selfLink: /api/v1/namespaces/openshift-operator-lifecycle-manager/services/packageserver-service
  uid: 48e27560-80b7-4116-a6d3-dac24cce9bcc

[root@preserve-olm-env data]# oc get csv -n openshift-operator-lifecycle-manager packageserver -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
...
  name: packageserver
  namespace: openshift-operator-lifecycle-manager
  resourceVersion: "59704"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-operator-lifecycle-manager/clusterserviceversions/packageserver
  uid: 443dba6c-1561-4e30-8faa-f4c0fdd4b9d3
...

3, The upgrade to 4.7 from 4.6 works well. I couldn't reproduce the original issue. Please let me know if I missed something, thanks!
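
The manual comparison in step 2 can be sketched as a small check, assuming both objects were fetched with `-o json` and parsed into dicts. The helper name `owned_exactly_once_by` is hypothetical; the sample uids are the ones shown in the output above.

```python
def owned_exactly_once_by(service, csv):
    """True if the service has exactly one ownerReference and it points at the CSV."""
    refs = service["metadata"].get("ownerReferences", [])
    matching = [r for r in refs if r["uid"] == csv["metadata"]["uid"]]
    return len(refs) == 1 and len(matching) == 1


# Trimmed copies of the objects from step 2.
service = {
    "metadata": {
        "ownerReferences": [
            {
                "kind": "ClusterServiceVersion",
                "name": "packageserver",
                "uid": "443dba6c-1561-4e30-8faa-f4c0fdd4b9d3",
            }
        ],
        "uid": "48e27560-80b7-4116-a6d3-dac24cce9bcc",
    }
}
csv = {
    "metadata": {
        "name": "packageserver",
        "uid": "443dba6c-1561-4e30-8faa-f4c0fdd4b9d3",
    }
}
```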

Comment 36 errata-xmlrpc 2021-02-24 15:10:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

