Description of problem:
After applying the CatalogSource YAML, two registry pods are generated and keep restarting.

# oc apply -f deploy/olm-catalog/catalogsource.yaml
...
# cat deploy/olm-catalog/catalogsource.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: wmco
  # Deploying in the openshift-marketplace namespace as it is the global catalog namespace:
  # a subscription in any namespace can refer to catalogsources in this namespace without error.
  namespace: openshift-marketplace
spec:
  displayName: Windows Machine Config operators
  sourceType: grpc
  image: quay.io/sgaoshang/wmco-index:1.0.0
  updateStrategy:
    registryPoll:
      interval: 5m

# oc get all -n openshift-marketplace
NAME             READY   STATUS    RESTARTS   AGE
...
pod/wmco-h5gtn   1/1     Running   0          16m
pod/wmco-xqlz4   1/1     Running   0          10m

NAME           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
...
service/wmco   ClusterIP   172.30.161.103   <none>        50051/TCP   14h

Version-Release number of selected component (if applicable):
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-07-01-213040   True        False         31h     Cluster version is 4.5.0-0.nightly-2020-07-01-213040

How reproducible:
always

Steps to Reproduce:
1. Build the wmco operator image, bundle image, and index image, and push them to quay.io; refer to https://github.com/openshift/windows-machine-config-operator/blob/master/README.md
2. Install an OCP cluster with ovn-kubernetes and apply the catalogsource, e.g.
   # oc apply -f deploy/olm-catalog/catalogsource.yaml
   ...
3. Check the registry-server pods generated in openshift-marketplace: two wmco pods exist and keep restarting. It looks like the CatalogSource object keeps creating a new pod even though there is no update to the operator index image, and the old pod is never removed.
   # oc get all -n openshift-marketplace
   NAME             READY   STATUS    RESTARTS   AGE
   ...
   pod/wmco-h5gtn   1/1     Running   0          16m
   pod/wmco-xqlz4   1/1     Running   0          10m

   NAME           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
   ...
   service/wmco   ClusterIP   172.30.161.103   <none>        50051/TCP   14h

Actual results:

Expected results:

Additional info:
Looks like this is because the registry polling interval is too short. Can you try increasing this to 25m and see if the problem persists? See https://github.com/operator-framework/operator-lifecycle-manager/blob/master/doc/design/catalog-polling.md#caveats
(In reply to Ankita Thomas from comment #1)
> Looks like this is because the registry polling interval is too short. Can
> you try increasing this to 25m and see if the problem persists?
>
> See
> https://github.com/operator-framework/operator-lifecycle-manager/blob/master/
> doc/design/catalog-polling.md#caveats

Thanks for the info. Unfortunately, after changing the registry polling interval to 25m, the issue still exists. Added logs from catalog-operator.

# oc get all -n openshift-marketplace
NAME             READY   STATUS    RESTARTS   AGE
...
pod/wmco-46wlx   1/1     Running   0          9m54s
pod/wmco-qc5b9   1/1     Running   0          9m56s

NAME           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)     AGE
...
service/wmco   ClusterIP   172.30.48.15   <none>        50051/TCP   40m

# oc logs -f deployment.apps/catalog-operator -n openshift-operator-lifecycle-manager
...
time="2020-07-07T08:52:57Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=CONNECTING"
time="2020-07-07T08:52:58Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=TRANSIENT_FAILURE"
time="2020-07-07T08:52:59Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=CONNECTING"
time="2020-07-07T08:52:59Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=TRANSIENT_FAILURE"
time="2020-07-07T08:53:01Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=CONNECTING"
time="2020-07-07T08:53:01Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=TRANSIENT_FAILURE"
time="2020-07-07T08:53:03Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=CONNECTING"
time="2020-07-07T08:53:03Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=TRANSIENT_FAILURE"
time="2020-07-07T08:53:07Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=CONNECTING"
time="2020-07-07T08:53:07Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=TRANSIENT_FAILURE"
time="2020-07-07T08:53:13Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=CONNECTING"
time="2020-07-07T08:53:13Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=READY"
time="2020-07-07T08:54:39Z" level=info msg="Adding related objects for operator-lifecycle-manager-catalog"
time="2020-07-07T08:59:39Z" level=info msg="Adding related objects for operator-lifecycle-manager-catalog"
time="2020-07-07T09:04:39Z" level=info msg="Adding related objects for operator-lifecycle-manager-catalog"
time="2020-07-07T09:09:39Z" level=info msg="Adding related objects for operator-lifecycle-manager-catalog"
time="2020-07-07T09:14:39Z" level=info msg="Adding related objects for operator-lifecycle-manager-catalog"
time="2020-07-07T09:19:39Z" level=info msg="Adding related objects for operator-lifecycle-manager-catalog"
time="2020-07-07T09:23:55Z" level=info msg="no image update for catalogsource pod" CatalogSource=wmco
time="2020-07-07T09:23:55Z" level=info msg="creating new catalog source update pod" CatalogSource=wmco
time="2020-07-07T09:23:55Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=CONNECTING"
time="2020-07-07T09:23:55Z" level=warning msg="pod status unknown" CatalogSource=wmco-qc5b9
time="2020-07-07T09:23:55Z" level=info msg="ImageID " CatalogSource=wmco-qc5b9
time="2020-07-07T09:23:55Z" level=info msg="Update Pod ImageID " CatalogSource=wmco-qc5b9
time="2020-07-07T09:23:55Z" level=info msg="ImageID quay.io/sgaoshang/wmco-index@sha256:e29b54c17d6458488acbd6b860d355ce4b6b58a568f05fee3218538745818fd7" CatalogSource=wmco-bhvcf
time="2020-07-07T09:23:55Z" level=info msg="Serving Pod ImageID quay.io/sgaoshang/wmco-index@sha256:e29b54c17d6458488acbd6b860d355ce4b6b58a568f05fee3218538745818fd7" CatalogSource=wmco-bhvcf
time="2020-07-07T09:23:55Z" level=info msg="detect image update for catalogsource pod" CatalogSource=wmco
time="2020-07-07T09:23:55Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=READY"
E0707 09:23:55.225719       1 queueinformer_operator.go:290] sync {"update" "openshift-marketplace/wmco"} failed: couldn't ensure registry server - error ensuring updated catalog source pod: : error creating new pod: : Operation cannot be fulfilled on pods "wmco-qc5b9": the object has been modified; please apply your changes to the latest version and try again
time="2020-07-07T09:23:55Z" level=info msg="ImageID " CatalogSource=wmco-qc5b9
time="2020-07-07T09:23:55Z" level=info msg="Update Pod ImageID " CatalogSource=wmco-qc5b9
time="2020-07-07T09:23:55Z" level=info msg="ImageID quay.io/sgaoshang/wmco-index@sha256:e29b54c17d6458488acbd6b860d355ce4b6b58a568f05fee3218538745818fd7" CatalogSource=wmco-bhvcf
time="2020-07-07T09:23:55Z" level=info msg="Serving Pod ImageID quay.io/sgaoshang/wmco-index@sha256:e29b54c17d6458488acbd6b860d355ce4b6b58a568f05fee3218538745818fd7" CatalogSource=wmco-bhvcf
time="2020-07-07T09:23:55Z" level=info msg="detect image update for catalogsource pod" CatalogSource=wmco
E0707 09:23:55.254957       1 queueinformer_operator.go:290] sync {"update" "openshift-marketplace/wmco"} failed: couldn't ensure registry server - error ensuring updated catalog source pod: : error creating new pod: : Operation cannot be fulfilled on pods "wmco-qc5b9": the object has been modified; please apply your changes to the latest version and try again
time="2020-07-07T09:23:55Z" level=info msg="ImageID " CatalogSource=wmco-qc5b9
time="2020-07-07T09:23:55Z" level=info msg="Update Pod ImageID " CatalogSource=wmco-qc5b9
time="2020-07-07T09:23:55Z" level=info msg="ImageID quay.io/sgaoshang/wmco-index@sha256:e29b54c17d6458488acbd6b860d355ce4b6b58a568f05fee3218538745818fd7" CatalogSource=wmco-bhvcf
time="2020-07-07T09:23:55Z" level=info msg="Serving Pod ImageID quay.io/sgaoshang/wmco-index@sha256:e29b54c17d6458488acbd6b860d355ce4b6b58a568f05fee3218538745818fd7" CatalogSource=wmco-bhvcf
time="2020-07-07T09:23:55Z" level=info msg="detect image update for catalogsource pod" CatalogSource=wmco
time="2020-07-07T09:23:55Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=CONNECTING"
time="2020-07-07T09:23:55Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=SHUTDOWN"
time="2020-07-07T09:23:55Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=CONNECTING"
time="2020-07-07T09:23:55Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=TRANSIENT_FAILURE"
time="2020-07-07T09:23:56Z" level=info msg="no image update for catalogsource pod" CatalogSource=wmco
time="2020-07-07T09:23:56Z" level=info msg="creating new catalog source update pod" CatalogSource=wmco
time="2020-07-07T09:23:56Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=CONNECTING"
time="2020-07-07T09:23:56Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=CONNECTING"
time="2020-07-07T09:23:57Z" level=info msg="ImageID " CatalogSource=wmco-bs8pq
time="2020-07-07T09:23:57Z" level=info msg="Update Pod ImageID " CatalogSource=wmco-bs8pq
time="2020-07-07T09:23:57Z" level=info msg="ImageID " CatalogSource=wmco-qc5b9
time="2020-07-07T09:23:57Z" level=info msg="Serving Pod ImageID " CatalogSource=wmco-qc5b9
time="2020-07-07T09:23:57Z" level=info msg="no image update for catalogsource pod" CatalogSource=wmco
time="2020-07-07T09:23:57Z" level=info msg="creating new catalog source update pod" CatalogSource=wmco
time="2020-07-07T09:23:57Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=SHUTDOWN"
time="2020-07-07T09:23:57Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=CONNECTING"
time="2020-07-07T09:24:13Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=wmco state.State=READY"
time="2020-07-07T09:24:39Z" level=info msg="Adding related objects for operator-lifecycle-manager-catalog"
time="2020-07-07T09:29:39Z" level=info msg="Adding related objects for operator-lifecycle-manager-catalog"
time="2020-07-07T09:34:39Z" level=info msg="Adding related objects for operator-lifecycle-manager-catalog"
If I remove the following section from catalogsource.yaml, the problem cannot be reproduced:
---
  updateStrategy:
    registryPoll:
      interval: 5m
From the above logs, we can see unnecessary update operations such as the following, where the update pod's ImageID is still empty (not yet resolved) but a new update pod is created anyway. Submitted a PR to fix this: https://github.com/operator-framework/operator-lifecycle-manager/pull/1729

time="2020-07-07T09:23:57Z" level=info msg="Update Pod ImageID " CatalogSource=wmco-bs8pq
time="2020-07-07T09:23:57Z" level=info msg="ImageID " CatalogSource=wmco-qc5b9
time="2020-07-07T09:23:57Z" level=info msg="Serving Pod ImageID " CatalogSource=wmco-qc5b9
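The failure mode suggested by these logs is a digest comparison against an unresolved image ID. A minimal sketch of the kind of guard needed, with hypothetical function names (an illustration of the idea, not the code from the PR above):

```go
package main

import "fmt"

// imageChanged reports whether the update pod resolved a different image
// digest than the current serving pod. An empty updateImageID means the
// update pod's image has not been resolved yet (e.g. the pod is still
// starting), so it must not be treated as a change; otherwise every sync
// "detects" an update and recreates the pod, producing the churn seen above.
func imageChanged(servingImageID, updateImageID string) bool {
	if updateImageID == "" {
		return false // not resolved yet; wait for a later sync
	}
	return servingImageID != updateImageID
}

func main() {
	serving := "quay.io/sgaoshang/wmco-index@sha256:e29b54c17d6458488acbd6b860d355ce4b6b58a568f05fee3218538745818fd7"
	fmt.Println(imageChanged(serving, ""))      // prints false: unresolved, no update
	fmt.Println(imageChanged(serving, serving)) // prints false: same digest, no update
}
```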
[root@preserve-olm-env data]# oc exec catalog-operator-7574bc8948-gx62l -- olm --version
OLM version: 0.16.0
git commit: c3852d57c86707deb80c042c2155ad82c2d9628f

Based on my understanding, having two pods for the catalogsource is expected: one pod is the currently serving pod, and the other is used to check the index image for updates, so it is recreated every poll interval.

[root@preserve-olm-env data]# oc get catalogsource qe-app-registry -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
...
spec:
  displayName: Production Operators
  image: quay.io/openshift-qe-optional-operators/ocp4-index:latest
  publisher: OpenShift QE
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 15m

[root@preserve-olm-env data]# oc get pods
NAME                    READY   STATUS    RESTARTS   AGE
...
qe-app-registry-7wk4v   1/1     Running   0          4m43s
qe-app-registry-ml5p8   1/1     Running   0          62m

As we can see, this pod is the serving pod; it keeps running without restart since there are no updates to the index image.
[root@preserve-olm-env data]# oc get pods qe-app-registry-ml5p8 -o=jsonpath='{.metadata.labels}'
{"olm.catalogSource":"qe-app-registry"}

This one is used to check the index image for updates, so it is recreated every poll interval.
[root@preserve-olm-env data]# oc get pods qe-app-registry-7wk4v -o=jsonpath='{.metadata.labels}'
{"catalogsource.operators.coreos.com/update":"qe-app-registry","olm.catalogSource":""}

Verified.
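The two pods can be told apart by the labels shown in the jsonpath output above. A small sketch of that classification, using the label keys exactly as they appear in this report (the helper itself is hypothetical):

```go
package main

import "fmt"

// role classifies a catalog source registry pod by its labels: the serving
// pod carries olm.catalogSource=<catalog name>, while the temporary update
// pod carries catalogsource.operators.coreos.com/update=<catalog name> and
// an empty olm.catalogSource value.
func role(labels map[string]string) string {
	if labels["catalogsource.operators.coreos.com/update"] != "" {
		return "update"
	}
	if labels["olm.catalogSource"] != "" {
		return "serving"
	}
	return "unknown"
}

func main() {
	serving := map[string]string{"olm.catalogSource": "qe-app-registry"}
	update := map[string]string{
		"catalogsource.operators.coreos.com/update": "qe-app-registry",
		"olm.catalogSource":                         "",
	}
	fmt.Println(role(serving)) // prints serving
	fmt.Println(role(update))  // prints update
}
```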
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196