+++ This bug was initially created as a clone of Bug #1761871 +++

Description of problem:
Some Subscriptions are not processed by the OLM operators. They do not receive a status block, or only receive one after an unreasonable amount of time (10-15 minutes).

Version-Release number of selected component (if applicable):
OCP 4.1.18

How reproducible:
The issue can be observed by repeatedly installing Operators from the same catalog, e.g. in an attempt to manually satisfy all the dependencies of OpenShift Service Mesh: Elasticsearch, Jaeger, Kiali.

Steps to Reproduce:
1. Install Elasticsearch
2. Subscription gets created; processing takes about 1 minute
3. Pod `installed-redhat-openshift-operators-65d87d7cb9-tpvp4` appears in the `openshift-marketplace` namespace as a result of the CatalogSourceConfig in the `openshift-operators` namespace
4. Install Jaeger
5. Observe that no status block gets added to the Jaeger Subscription
6. Pod `installed-redhat-openshift-operators-65d87d7cb9-tpvp4` gets killed
7. Pod `installed-redhat-openshift-operators-5d66657866-htj7k` appears instead in `openshift-marketplace`
8. The OLM catalog operator logs:

```
E1015 13:28:10.469041       1 queueinformer_operator.go:186] Sync "openshift-operators" failed: {jaeger-product stable jaeger-operator.v1.13.1 {installed-redhat-openshift-operators openshift-operators}} not found: rpc error: code = Unknown desc = no bundle found for csv jaeger-operator.v1.13.1
time="2019-10-15T13:28:14Z" level=info msg="retrying openshift-operators"
E1015 13:28:14.557753       1 queueinformer_operator.go:186] Sync "openshift-operators" failed: {jaeger-product stable jaeger-operator.v1.13.1 {installed-redhat-openshift-operators openshift-operators}} not found: CatalogSource {installed-redhat-openshift-operators openshift-operators} not found
```

Actual results:
The Jaeger Subscription never resolves.

Expected results:
The Jaeger Subscription succeeds.

Additional info:
Removing and re-installing Jaeger usually resolves this.
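For reference, a Subscription matching the values in the failing sync above would look roughly like the sketch below. The package (`jaeger-product`), channel (`stable`), starting CSV (`jaeger-operator.v1.13.1`), and catalog source (`installed-redhat-openshift-operators` in `openshift-operators`) are taken from the log line; the exact apiVersion and field layout on 4.1 should be checked against the cluster. When OLM processes such a Subscription it adds a `status` block (with fields such as `state` and `currentCSV`); in this bug that block never appears.

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: jaeger-product            # assumed; named after the package from the log
  namespace: openshift-operators
spec:
  name: jaeger-product            # package, from the log
  channel: stable                 # channel, from the log
  startingCSV: jaeger-operator.v1.13.1
  source: installed-redhat-openshift-operators
  sourceNamespace: openshift-operators
```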
--- Additional comment from Alexander Greene on 2019-10-15 16:59:34 UTC ---

Moving to 4.3 as this is not release blocking for 4.2. We will continue to try to reproduce there and backport any applicable fixes to z-stream releases.

--- Additional comment from Alexander Greene on 2019-10-16 13:49:54 UTC ---

This is not reproducible on a 4.3 cluster, but is reproducible on a 4.1.18 cluster.
I could not replicate this on 4.1.22 - everything installs within about 30s (admittedly slower than 4.2/4.3 because of the CatalogSourceConfigs on 4.1, but still much faster than this report describes).
While installing Jaeger I saw the csc pod being killed. @Evan, is this something we should care about?

```
oc get pods -n openshift-marketplace
NAME                                                    READY   STATUS        RESTARTS   AGE
certified-operators-7bf59cdfbb-pfz2p                    1/1     Running       0          121m
community-operators-6ff5dfc595-tvx5f                    1/1     Running       0          121m
installed-redhat-openshift-operators-5b47dccfbd-427x7   0/1     Terminating   0          2m4s
installed-redhat-openshift-operators-cc48fcd9b-9mzqk    1/1     Running       0          22s
marketplace-operator-f69f7c6d4-tzcpk                    1/1     Running       0          121m
redhat-operators-6f9f896c69-2rvt5                       1/1     Running       0          121m

Events:
72s     Normal   Killing            pod/installed-redhat-openshift-operators-5b47dccfbd-427x7    Stopping container installed-redhat-openshift-operators
3m13s   Normal   SuccessfulCreate   replicaset/installed-redhat-openshift-operators-5b47dccfbd   Created pod: installed-redhat-openshift-operators-5b47dccfbd-427x7
72s     Normal   SuccessfulDelete   replicaset/installed-redhat-openshift-operators-5b47dccfbd   Deleted pod: installed-redhat-openshift-operators-5b47dccfbd-427x7
```

However, the Operators took less than 1 minute to reach Running.

Cluster version: 4.1.0-0.nightly-2019-11-08-121853
OLM version: 0.9.0
git commit: 9e06c0ad9043872e7fc2b87d13bf1d3832b1bac2

Waiting for Evan's response before moving the bug status.
Changing the status to ASSIGNED for developer visibility.
Bruno, the catalog pod restarts like that whenever there is a change to the operators watched by the CatalogSourceConfig. That is expected. This should be all set if there are no other issues.
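For context, the CatalogSourceConfig behind that pod would look roughly like the sketch below. This is an assumption-laden illustration, not pulled from the cluster: the apiVersion, the CSC's namespace, and the package list (beyond `jaeger-product`, which appears in the logs above) are all guesses based on the operators named in this report. Editing `spec.packages` (e.g. adding Jaeger after Elasticsearch) is the kind of change that triggers the expected restart of the registry pod.

```yaml
apiVersion: operators.coreos.com/v1
kind: CatalogSourceConfig
metadata:
  name: installed-redhat-openshift-operators
  namespace: openshift-marketplace   # assumed; CSCs normally live here
spec:
  targetNamespace: openshift-operators   # where the resulting CatalogSource is created
  # Hypothetical package list; only jaeger-product is confirmed by the logs.
  # Appending a package here causes the registry pod to be replaced.
  packages: elasticsearch-operator,jaeger-product
```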
Closing this as NOTABUG, since it cannot be reproduced on 4.1.22 and later. The additional concern about the pod restarting has been confirmed to be normal behavior.