1763838 – Subscriptions are not getting processed / take very long to get processed

Summary: Subscriptions are not getting processed / take very long to get processed

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	OLM
Sub Component:
Version:	4.1.z
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.1.z
Assignee:	Evan Cordell
QA Contact:	Bruno Andrade
Docs Contact:
URL:
Whiteboard:
Depends On:	1761871 1763841
Blocks:

Reported:	2019-10-21 17:43 UTC by Alexander Greene
Modified:	2020-02-18 13:58 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1761871
Environment:
Last Closed:	2020-02-18 13:58:41 UTC
Target Upstream Version:
Embargoed:

Description Alexander Greene 2019-10-21 17:43:16 UTC

+++ This bug was initially created as a clone of Bug #1761871 +++

Description of problem:

Some Subscriptions are not processed by the OLM operators. They either never receive a status block or receive one only after an unreasonably long time (10-15 minutes).


Version-Release number of selected component (if applicable):

OCP 4.1.18


How reproducible:

The issue can be observed by repeatedly installing Operators from the same catalog, for example when manually satisfying all the dependencies of OpenShift Service Mesh (Elasticsearch, Jaeger, Kiali).


Steps to Reproduce:
1. Install Elasticsearch.
2. The Subscription gets created; processing takes about 1 minute.
3. Pod `installed-redhat-openshift-operators-65d87d7cb9-tpvp4` appears in the `openshift-marketplace` namespace as a result of the CatalogSourceConfig in the `openshift-operators` namespace.
4. Install Jaeger.
5. Observe that no status block is added to the Jaeger Subscription.
6. Pod `installed-redhat-openshift-operators-65d87d7cb9-tpvp4` is killed.
7. Pod `installed-redhat-openshift-operators-5d66657866-htj7k` appears in its place in `openshift-marketplace`.
8. The OLM catalog operator logs:

```
E1015 13:28:10.469041       1 queueinformer_operator.go:186] Sync "openshift-operators" failed: {jaeger-product stable jaeger-operator.v1.13.1 {installed-redhat-openshift-operators openshift-operators}} not found: rpc error: code = Unknown desc = no bundle found for csv jaeger-operator.v1.13.1
time="2019-10-15T13:28:14Z" level=info msg="retrying openshift-operators"
E1015 13:28:14.557753       1 queueinformer_operator.go:186] Sync "openshift-operators" failed: {jaeger-product stable jaeger-operator.v1.13.1 {installed-redhat-openshift-operators openshift-operators}} not found: CatalogSource {installed-redhat-openshift-operators openshift-operators} not found
```
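The excerpt contains two distinct failure messages for the same sync key, four seconds apart: "no bundle found for csv" and "CatalogSource ... not found". That timing is consistent with the catalog pod being replaced in steps 6-7, although the report does not confirm the cause. When triaging a longer log, it can help to count how often each message occurs. The following is a minimal sketch, assuming the klog line shape seen in this one excerpt; the labels `bundle-missing` and `catalogsource-missing` and the helper names are hypothetical and not part of OLM.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Matches klog error lines shaped like the excerpt above (inferred, not from a spec):
#   E<MMDD> <HH:MM:SS.micros> <thread> <file>:<line>] Sync "<key>" failed: <message>
FAILED_SYNC = re.compile(
    r'^E\d{4} [\d:.]+\s+\d+ [\w.]+:\d+\] '
    r'Sync "(?P<key>[^"]+)" failed: (?P<msg>.*)$'
)
# Second failure message from the excerpt: the CatalogSource itself is missing.
CATALOG_SOURCE_MISSING = re.compile(r"CatalogSource \{[^}]*\} not found")


def classify(line: str) -> Optional[Tuple[str, str]]:
    """Return (sync key, hypothetical failure label) for a failed-sync line, else None."""
    match = FAILED_SYNC.match(line.strip())
    if match is None:
        return None  # e.g. 'level=info msg="retrying ..."' lines are ignored
    msg = match.group("msg")
    if "no bundle found for csv" in msg:
        kind = "bundle-missing"
    elif CATALOG_SOURCE_MISSING.search(msg):
        kind = "catalogsource-missing"
    else:
        kind = "other"
    return match.group("key"), kind


def summarize(lines: Iterable[str]) -> Counter:
    """Count failure labels across a stream of log lines."""
    counts: Counter = Counter()
    for line in lines:
        result = classify(line)
        if result is not None:
            counts[result[1]] += 1
    return counts
```

For example, `summarize(open(path))` on a saved copy of the catalog operator log (the path is up to you) returns a count per label. A log dominated by `catalogsource-missing` entries would point at the catalog pod being unavailable rather than at missing bundle content.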

Actual results:

The Jaeger Subscription never resolves.


Expected results:

The Jaeger Subscription succeeds.


Additional info:

Removing and re-installing Jaeger usually solves this.

--- Additional comment from Alexander Greene on 2019-10-15 16:59:34 UTC ---

Moving to 4.3, as this is not release-blocking for 4.2. We will continue to try to reproduce it there and backport any applicable fixes to z-stream releases.

--- Additional comment from Alexander Greene on 2019-10-16 13:49:54 UTC ---

This is not reproducible on a 4.3 cluster, but it is reproducible on a 4.1.18 cluster.

Comment 2 Evan Cordell 2019-11-07 13:34:48 UTC

I could not reproduce this on 4.1.22: everything installs within about 30s (admittedly slower than on 4.2/4.3 because 4.1 uses CatalogSourceConfigs, but still much faster than what this report describes).

Comment 4 Bruno Andrade 2019-11-11 20:20:09 UTC

While installing Jaeger, I saw the CatalogSourceConfig (csc) pod being killed. @Evan, is this something we should be concerned about?

```
oc get pods -n openshift-marketplace
NAME                                                    READY   STATUS        RESTARTS   AGE
certified-operators-7bf59cdfbb-pfz2p                    1/1     Running       0          121m
community-operators-6ff5dfc595-tvx5f                    1/1     Running       0          121m
installed-redhat-openshift-operators-5b47dccfbd-427x7   0/1     Terminating   0          2m4s
installed-redhat-openshift-operators-cc48fcd9b-9mzqk    1/1     Running       0          22s
marketplace-operator-f69f7c6d4-tzcpk                    1/1     Running       0          121m
redhat-operators-6f9f896c69-2rvt5                       1/1     Running       0          121m
```

Events:

```
72s         Normal    Killing             pod/installed-redhat-openshift-operators-5b47dccfbd-427x7    Stopping container installed-redhat-openshift-operators
3m13s       Normal    SuccessfulCreate    replicaset/installed-redhat-openshift-operators-5b47dccfbd   Created pod: installed-redhat-openshift-operators-5b47dccfbd-427x7
72s         Normal    SuccessfulDelete    replicaset/installed-redhat-openshift-operators-5b47dccfbd   Deleted pod: installed-redhat-openshift-operators-5b47dccfbd-427x7
```

However, the Operators reached the Running state in less than 1 minute.

Cluster Version: 4.1.0-0.nightly-2019-11-08-121853
OLM version: 0.9.0
git commit: 9e06c0ad9043872e7fc2b87d13bf1d3832b1bac2

Waiting for Evan's response before changing the bug status.

Comment 5 Bruno Andrade 2019-11-12 12:20:13 UTC

Changing the status to ASSIGNED for developer visibility.

Comment 7 Kevin Rizza 2019-11-19 19:18:07 UTC

Bruno,

The catalog pod restarts like that whenever the set of operators watched by the CatalogSourceConfig changes. That is expected. This should be all set if there are no other issues.

Comment 8 Dan Geoffroy 2020-02-18 13:58:41 UTC

Closing this as NOTABUG, as the issue could not be reproduced on 4.1.22 or later. The additional concern about the pod restarting has been confirmed as normal behavior.