Bug 1714140 - [4.2]The generated InstallPlan object didn't refer to itself subscription object
Summary: [4.2]The generated InstallPlan object didn't refer to itself subscription object
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.2.0
Assignee: Evan Cordell
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks: 1740174
Reported: 2019-05-27 08:40 UTC by Jian Zhang
Modified: 2019-10-16 06:29 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1740174 (view as bug list)
Environment:
Last Closed: 2019-10-16 06:29:21 UTC
Target Upstream Version:
Embargoed:


Attachments
The whole catalog-operator logs (152.58 KB, text/plain)
2019-05-27 08:48 UTC, Jian Zhang


Links
System ID Private Priority Status Summary Last Updated
Github operator-framework operator-lifecycle-manager pull 965 0 None None None 2019-07-31 18:25:33 UTC
Github operator-framework operator-lifecycle-manager pull 977 0 None None None 2019-08-05 14:44:00 UTC
Red Hat Product Errata RHBA-2019:2922 0 None None None 2019-10-16 06:29:35 UTC

Description Jian Zhang 2019-05-27 08:40:43 UTC
Description of problem:
Sometimes the generated InstallPlan object refers to the wrong Subscription object, and no CSV object is generated.

Version-Release number of selected component (if applicable):
Cluster version is 4.1.0-0.nightly-2019-05-24-040103
OLM source commit.id=586ffaf57b5da9cc2301b01e2ea10ce6117928c9

How reproducible:
Sometimes, but we encountered this kind of issue several times recently.
Similar to bug 1702540

Steps to Reproduce:
1. Install the Couchbase, anchore-engine and TiDB operators, select the "All Namespaces ..." option.

2. Remove the Couchbase and anchore-engine operators.
3. Install the anchore-engine operator again, select the "All Namespaces ..." option.

Actual results:
There are two problems here:
1) No CSV object is generated for the anchore-engine operator, and its InstallPlan object refers only to the "TiDB" subscription; it lacks a reference to its own subscription object.
mac:~ jianzhang$  oc get sub -n openshift-operators
NAME                      PACKAGE                   SOURCE                                    CHANNEL
anchore-engine            anchore-engine            installed-certified-openshift-operators   alpha
tidb-operator-certified   tidb-operator-certified   installed-certified-openshift-operators   beta
mac:~ jianzhang$  oc get ip -n openshift-operators
NAME            CSV                              SOURCE   APPROVAL    APPROVED
install-c5qpl   anchore-engine-operator.v0.0.1            Automatic   true
install-hhnx7   tidb-operator.1.0.0-beta1                 Automatic   true
install-sbbv9   anchore-engine-operator.v0.0.1            Automatic   true

No anchore-engine CSV object was generated:
mac:~ jianzhang$ oc get csv -n openshift-operators
NAME                        DISPLAY         VERSION       REPLACES   PHASE
tidb-operator.1.0.0-beta1   TiDB Operator   1.0.0-beta1              Succeeded

Both of those InstallPlans refer only to `tidb-operator-certified`; they should refer to the `anchore-engine` subscription object too.
mac:~ jianzhang$  oc get ip -n openshift-operators install-c5qpl -o yaml |grep ownerReferences: -A 9
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: Subscription
    name: tidb-operator-certified
    uid: ec38df40-803e-11e9-b9dc-02c6e457e9d6
  resourceVersion: "76521"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-operators/installplans/install-c5qpl
  uid: 86d4ef99-8041-11e9-b9dc-02c6e457e9d6
mac:~ jianzhang$  oc get ip -n openshift-operators install-sbbv9 -o yaml |grep ownerReferences: -A 9
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: Subscription
    name: tidb-operator-certified
    uid: ec38df40-803e-11e9-b9dc-02c6e457e9d6
  resourceVersion: "79779"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-operators/installplans/install-sbbv9
  uid: bdcce16d-8042-11e9-b9dc-02c6e457e9d6

2) The old InstallPlan wasn't removed, and it refers to two CSV objects. Is that expected? In my opinion, an InstallPlan object should refer only to its own CSV object.
Also, the InstallPlan object still contains the Couchbase CSV object, even though there are no Couchbase Subscription or CSV objects in this cluster anymore.

mac:~ jianzhang$  oc get ip -n openshift-operators install-c5qpl -o yaml |grep clusterServiceVersionNames: -A 3
  clusterServiceVersionNames:
  - anchore-engine-operator.v0.0.1
  - couchbase-operator.v1.1.0
  source: ""


Expected results:
An operator's InstallPlan object should refer to its own subscription object and contain only its own CSV object, and its CSV object should be generated successfully.
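
The expected results above can be expressed as a small check over an InstallPlan. This is only an illustrative sketch, not part of OLM: the helper names `subscription_owners` and `check_install_plan` are hypothetical, and the sample dict copies only the relevant fields of install-c5qpl from the `oc get ip ... -o yaml` output in this report (assuming `clusterServiceVersionNames` sits under `spec`).

```python
from typing import Dict, List


def subscription_owners(install_plan: Dict) -> List[str]:
    """Return the names of Subscription objects in metadata.ownerReferences."""
    refs = install_plan.get("metadata", {}).get("ownerReferences", [])
    return [r["name"] for r in refs if r.get("kind") == "Subscription"]


def check_install_plan(
    install_plan: Dict, subscription: str, allowed_csvs: List[str]
) -> List[str]:
    """Return a list of problems; an empty list means the plan looks as expected."""
    problems = []
    # The plan should be owned by the subscription it was created for.
    if subscription not in subscription_owners(install_plan):
        problems.append(f"missing owner subscription: {subscription}")
    # The plan should not list CSVs outside the allowed set.
    csvs = install_plan.get("spec", {}).get("clusterServiceVersionNames", [])
    for name in csvs:
        if name not in allowed_csvs:
            problems.append(f"unexpected csv: {name}")
    return problems


# install-c5qpl as reported in this bug (relevant fields only).
install_c5qpl = {
    "metadata": {
        "name": "install-c5qpl",
        "ownerReferences": [
            {"kind": "Subscription", "name": "tidb-operator-certified"},
        ],
    },
    "spec": {
        "clusterServiceVersionNames": [
            "anchore-engine-operator.v0.0.1",
            "couchbase-operator.v1.1.0",
        ],
    },
}

problems = check_install_plan(
    install_c5qpl,
    subscription="anchore-engine",
    allowed_csvs=["anchore-engine-operator.v0.0.1"],
)
for problem in problems:
    print(problem)
```

For the reported InstallPlan, the check flags both symptoms: the `anchore-engine` subscription is missing from the owners, and the Couchbase CSV is still listed.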

Additional info:
mac:~ jianzhang$ oc get catalogsource -n openshift-operators installed-certified-openshift-operators -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: "2019-05-27T05:17:05Z"
  generation: 10
  labels:
    csc-owner-name: installed-certified-openshift-operators
    csc-owner-namespace: openshift-marketplace
  name: installed-certified-openshift-operators
  namespace: openshift-operators
  resourceVersion: "80804"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-operators/catalogsources/installed-certified-openshift-operators
  uid: aa9b8594-803e-11e9-bf5b-06c6297e18aa
spec:
  address: 172.30.70.66:50051
  displayName: Certified Operators
  icon:
    base64data: ""
    mediatype: ""
  publisher: Certified
  sourceType: grpc
status:
  lastSync: "2019-05-27T05:48:54Z"
  registryService:
    createdAt: "2019-05-27T05:48:54Z"
    protocol: grpc

mac:~ jianzhang$ oc get svc -n openshift-marketplace
NAME                                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
certified-operators                       ClusterIP   172.30.61.153    <none>        50051/TCP   4h57m
community-operators                       ClusterIP   172.30.112.217   <none>        50051/TCP   4h57m
installed-certified-default               ClusterIP   172.30.149.30    <none>        50051/TCP   4h2m
installed-certified-jian                  ClusterIP   172.30.246.18    <none>        50051/TCP   128m
installed-certified-openshift-operators   ClusterIP   172.30.70.66     <none>        50051/TCP   105m
installed-community-jian                  ClusterIP   172.30.71.92     <none>        50051/TCP   138m
installed-redhat-jian                     ClusterIP   172.30.67.141    <none>        50051/TCP   135m
redhat-operators                          ClusterIP   172.30.13.32     <none>        50051/TCP   4h57m


mac:~ jianzhang$ oc logs installed-certified-openshift-operators-65d48c4456-9mxwx -n  openshift-marketplace
time="2019-05-27T05:48:40Z" level=info msg="Using in-cluster kube client config" port=50051 type=appregistry
time="2019-05-27T05:48:40Z" level=info msg="operator source(s) specified are - [https://quay.io/cnr|certified-operators]" port=50051 type=appregistry
time="2019-05-27T05:48:40Z" level=info msg="package(s) specified are - tidb-operator-certified,anchore-engine" port=50051 type=appregistry
time="2019-05-27T05:48:40Z" level=info msg="input has been sanitized" port=50051 type=appregistry
time="2019-05-27T05:48:40Z" level=info msg="sources: [https://quay.io/cnr/certified-operators]" port=50051 type=appregistry
time="2019-05-27T05:48:40Z" level=info msg="packages: [tidb-operator-certified anchore-engine]" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="resolved the following packages: [certified-operators/anchore-engine:1.0.0 certified-operators/tidb-operator-certified:1.0.0]" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="downloading repository: certified-operators/anchore-engine:1.0.0 from https://quay.io/cnr" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="downloading repository: certified-operators/tidb-operator-certified:1.0.0 from https://quay.io/cnr" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="download complete - 2 repositories have been downloaded" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="decoding the downloaded operator manifest(s)" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="manifest format is - flattened" port=50051 repository="certified-operators/anchore-engine:1.0.0" type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="decoded successfully" port=50051 repository="certified-operators/anchore-engine:1.0.0" type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="manifest format is - flattened" port=50051 repository="certified-operators/tidb-operator-certified:1.0.0" type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="decoded successfully" port=50051 repository="certified-operators/tidb-operator-certified:1.0.0" type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="merging all flattened manifests into a single configmap 'data' section" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="decoded 2 flattened and 0 nested operator manifest(s)" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="loading flattened operator manifest(s) into sqlite" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="using configmap loader to build sqlite database" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="loading CRDs" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="loading Bundles" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="loading Packages" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="extracting provided API information" port=50051 type=appregistry
time="2019-05-27T05:48:41Z" level=info msg="serving registry" port=50051 type=appregistry

Comment 1 Jian Zhang 2019-05-27 08:48:41 UTC
Created attachment 1573849 [details]
The whole catalog-operator logs

Comment 2 Jian Zhang 2019-05-28 03:29:36 UTC
Although it does not occur often, it blocks users from installing operators. Increased severity.

Comment 3 Evan Cordell 2019-05-28 16:03:25 UTC
I believe I have seen this before, and I wrote down what I thought was a description of the bug:

- when you uninstall an operator and remove the subscription, the installplan for that subscription has an ownerreference pointing back to the subscription.
- if you create a new subscription for the same operator before kube GCs the installplan, we detect it as farther along in the install process (because the installplan exists)
and just sit, waiting for the operator to start up (but it never does because we skipped that part)

The mitigation for this is to wait for the installplans to GC before re-installing an operator. Because there is a workaround, I'm moving down to medium severity.

Moved to 4.1.z so that we can work on a fix for this.
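
The mitigation in comment 3 (wait for the old InstallPlans to be garbage-collected before re-installing) could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not an OLM feature: `plans_for_csv` and `wait_for_gc` are hypothetical helpers, "garbage-collected" is approximated as "no remaining InstallPlan lists the operator's CSV name", and `fetch` is a caller-supplied function returning the current InstallPlans as dicts. The report does not establish whether this condition is sufficient in every case.

```python
import time
from typing import Callable, Dict, List


def plans_for_csv(install_plans: List[Dict], csv_name: str) -> List[str]:
    """Names of InstallPlans whose spec still lists csv_name."""
    return [
        p["metadata"]["name"]
        for p in install_plans
        if csv_name in p.get("spec", {}).get("clusterServiceVersionNames", [])
    ]


def wait_for_gc(
    fetch: Callable[[], List[Dict]],
    csv_name: str,
    timeout: float = 120.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll fetch() until no InstallPlan lists csv_name.

    Returns True once none remain, or False on timeout. The timeout is
    measured as the sum of the sleep intervals, not wall-clock time.
    """
    waited = 0.0
    while True:
        if not plans_for_csv(fetch(), csv_name):
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

On a real cluster, `fetch` might wrap `oc get ip -n openshift-operators -o json` and return the `items` list; the new subscription would be created only after `wait_for_gc` returns True.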

Comment 5 Jian Zhang 2019-08-02 03:06:52 UTC
Evan, Jeff

I think the fix PR (https://github.com/operator-framework/operator-lifecycle-manager/pull/965) was merged only into the master branch, not release-4.1. Could you help cherry-pick it to the release-4.1 branch? Thanks! Verification failed because there is no fix PR for the 4.1.z version.

Comment 7 Jeff Peeler 2019-08-05 14:49:15 UTC
The 4.1 version of this is in progress.

Comment 8 Evan Cordell 2019-08-12 12:13:21 UTC
Targeting this to 4.2, will clone for 4.1

Comment 10 Jian Zhang 2019-08-13 05:54:57 UTC
Evan,

> Targeting this to 4.2, will clone for 4.1

OK, created bug 1740491 for the 4.1.z version.

Comment 11 Jian Zhang 2019-08-14 10:02:20 UTC
LGTM. Verified; detailed steps are below:
Cluster version is 4.2.0-0.nightly-2019-08-13-183722
OLM version:
mac:~ jianzhang$ oc exec catalog-operator-5cc66dd5c4-7sw95 -- olm --version
OLM version: 0.11.0
git commit: 586e941bd1f42ea1f331453ed431fb43699fef70

1. Install the Couchbase, anchore-engine and TiDB operators, select the "All Namespaces ..." option.
mac:~ jianzhang$ oc get ip
NAME            CSV                              SOURCE   APPROVAL    APPROVED
install-jlwm7   couchbase-operator.v1.1.0                 Automatic   true
install-s546g   anchore-engine-operator.v0.0.2            Automatic   true
install-zjd7j   tidb-operator.1.0.0-beta1                 Automatic   true
mac:~ jianzhang$ oc get pods
NAME                                       READY   STATUS    RESTARTS   AGE
anchore-engine-operator-5b55589d6f-829md   1/1     Running   0          10m
couchbase-operator-79b995b87d-t8qvh        1/1     Running   0          10m
tidb-controller-manager-7546d898df-6zzbk   1/1     Running   0          4m55s

2. Remove the Couchbase and anchore-engine operators.
3. Install the anchore-engine operator again, select the "All Namespaces ..." option.

mac:~ jianzhang$ oc get ip
NAME            CSV                              SOURCE   APPROVAL    APPROVED
install-6dkxn   anchore-engine-operator.v0.0.2            Automatic   true
install-zjd7j   tidb-operator.1.0.0-beta1                 Automatic   true
mac:~ jianzhang$ oc get pods
NAME                                       READY   STATUS    RESTARTS   AGE
anchore-engine-operator-5b55589d6f-cw5cr   1/1     Running   0          9m6s
tidb-controller-manager-7546d898df-6zzbk   1/1     Running   0          20m


Now the Anchore operator works well, and the TiDB InstallPlan refers only to its own subscription; the others have been deleted.
mac:~ jianzhang$ oc get ip install-zjd7j  -o yaml|grep "ownerReferences" -A 10
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: Subscription
    name: tidb-operator-certified
    uid: 9a80fb2d-be76-11e9-874b-022328418248
  resourceVersion: "126830"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-operators/installplans/install-zjd7j
  uid: 9a8d7a9d-be76-11e9-874b-022328418248
spec:

mac:~ jianzhang$ oc get ip install-6dkxn -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: InstallPlan
metadata:
  creationTimestamp: "2019-08-14T09:45:16Z"
  generateName: install-
  generation: 1
  name: install-6dkxn
  namespace: openshift-operators
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: Subscription
    name: anchore-engine
    uid: 388aabb4-be78-11e9-874b-022328418248
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: Subscription
    name: tidb-operator-certified
    uid: 9a80fb2d-be76-11e9-874b-022328418248
  resourceVersion: "127334"
...

Comment 13 errata-xmlrpc 2019-10-16 06:29:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

