Bug 1834136

Summary: failed: couldn't find queue operatorname for event: {update }
Product: OpenShift Container Platform
Reporter: Jatan Malde <jmalde>
Component: OLM
Assignee: Nick Hale <nhale>
OLM sub component: OLM
QA Contact: kuiwang
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: aabhishe, ecordell, kuiwang, nhale
Version: 4.2.z
Keywords: UpcomingSprint
Target Milestone: ---
Target Release: 4.5.0
Hardware: x86_64
OS: Linux
Whiteboard: backport-to: 4.2
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The garbage collection resource event queues were not configured correctly.
Consequence: Cluster-scoped resources generated for operators managed by OLM were never cleaned up when the operator was uninstalled.
Fix: The garbage collection queues were reconfigured so that events for resources whose owner labels reference any namespace are enqueued.
Result: Cluster-scoped resources generated for operators managed by OLM are cleaned up when the operator is uninstalled.
Story Points: ---
Clone Of:
: 1836905 (view as bug list)
Environment:
Last Closed: 2020-07-13 17:36:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1836905    

Description Jatan Malde 2020-05-11 07:06:14 UTC
Description of problem:

I have a customer (IHAC) running OCP 4.2.x where operators are not getting deployed by OLM. The OLM operator logs the following message, and the CSVs show no events.

E0507 00:18:10.044212 1 queueinformer_operator.go:282] sync {"update" "cam-operator.v1.1.1-9bcwg"} failed: couldn't find queue '3scale-project-template' for event: {update 0xc004208dc0}

Version-Release number of selected component (if applicable):


How reproducible:
Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Attaching the latest must-gather and an example operator content from the cluster.
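
For reference, OLM labels the resources it generates on behalf of an operator with owner labels, and garbage collection matches on those labels. A hypothetical sketch of a cluster-scoped resource carrying such labels (the resource name, CSV name, and rules here are assumptions based on common OLM conventions, not taken from this cluster):

```yaml
# Hypothetical cluster-scoped RBAC resource generated by OLM for an
# installed CSV. The olm.owner* labels are what garbage collection
# matches on when the operator is uninstalled.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cam-operator.v1.1.1-example          # assumed name
  labels:
    olm.owner: cam-operator.v1.1.1           # owning CSV (assumed)
    olm.owner.namespace: test-3scale         # namespace of the owning CSV (assumed)
    olm.owner.kind: ClusterServiceVersion
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
```

Per the Doc Text above, the bug was that the garbage collection queues were not hit for owner labels referencing arbitrary namespaces, so resources like this could be orphaned on uninstall.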

Comment 2 Nick Hale 2020-05-11 15:38:44 UTC
@Jatan, I was not able to reproduce the issue described on 4.2.33 using the following inferred steps:

1. Create OpenShift 4.3.33 cluster (using clusterbot)
2. Create test-3scale namespace
3. Create Subscription to the 3scale operator from the redhat-operators catalog in the test-3scale namespace
4. Create Subscription to the cam-operator from the redhat-operators catalog in the test-3scale namespace
5. Create Subscription to the above two operators in the default namespace
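
The subscription steps above can be sketched as a manifest; the channel name is an assumption (available channels vary by catalog version), so treat this as illustrative rather than an exact reproducer:

```yaml
# Illustrative Subscription for step 3: subscribe to the 3scale operator
# from the redhat-operators catalog in the test-3scale namespace.
# The channel value is assumed; check the packagemanifest on the cluster
# for the channels actually available.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: 3scale-operator
  namespace: test-3scale
spec:
  channel: threescale-2.8        # assumed channel name
  name: 3scale-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
```

An analogous Subscription for the cam-operator in the same namespace, plus copies of both in the default namespace, would cover steps 4 and 5 (an OperatorGroup targeting each namespace is also required but omitted here).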

Results:

- CSVs are created in the test-3scale and default namespaces for the 3scale operator and cam-operator (4 total; 2 for each namespace)
- CSVs have status with phases indicating successful installation
- Operator Deployments are healthy

From reading through the collab-shell files, we seem to be missing some data. There are several namespaces mentioned in the attached error logs that are not included in the must-gather; e.g. "3scale-project-template". We're going to need the resources from those namespaces to triage further.

My interpretation of the support thread is that the cluster in question is deployed to the customer's internal environment, and that there were several coincident issues that spawned bugs for other OpenShift components. To help streamline triage, could you attempt to reproduce on a fresh cluster with the same initial configuration as the customer's? After, could you please post the steps to set up the cluster and reproduce?

(I'll keep this bug at high priority -- for now -- since the potential impact on the cluster is the inability to install operators.)

Comment 21 errata-xmlrpc 2020-07-13 17:36:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409