Description of problem:

The marketplace-operator pod updates the package-registry Deployment for a CatalogSourceConfig forever when the CatalogSourceConfig's targetNamespace is missing. Each update triggers the Deployment to create a new replicaset, but for some reason the old replicasets are left around, so the number of replicasets grows without bound. Judging by the logs and the replicaset count after 24 hours, this reconcile fires regularly, producing thousands of replicasets per day. Since the growth is unbounded, eventually there will be too many objects for the master to manage in the controllers, apiserver, or etcd.

Version-Release number of selected component (if applicable):

OCP 4.2.2, sometime after upgrading to 4.1.18, but it may or may not be tied to an upgrade.

How reproducible:

Unknown

Steps to Reproduce:

I don't have exact steps. I just know it was caused by the targetNamespace specified in a CatalogSourceConfig (created via the UI or something) targeting a namespace that had been deleted and didn't exist, and it was resolved by re-creating that namespace. (A reconstructed manifest sketch follows after the expected results below.)

Actual results:

I noticed it after roughly 24 hours, and here's what I found.

Events in the openshift-marketplace namespace: https://gist.github.com/chancez/8680938c4e9fa6e1d591ffb90615f367 - you can see the pods and replicasets changing regularly for what seems to be no reason.

The list of replicasets in the namespace: https://gist.github.com/chancez/b0d161f9ed11475516308fc2e6968fa2

A rough count of replicasets:

kubectl get rs -n openshift-marketplace | wc -l
38251

When I inspect the deployment, it does have a revisionHistoryLimit of 10, so I don't understand why that isn't being applied here.

A snippet of the marketplace-operator pod's logs: https://gist.github.com/chancez/0795be851895ca7753bafb4504bc2338

From the logs, each CatalogSourceConfig whose targetNamespace doesn't exist produces entries like:

time="2019-10-29T14:58:45Z" level=info msg="Reconciling CatalogSourceConfig openshift-marketplace/elasticsearch\n"
time="2019-10-29T14:58:45Z" level=info msg="Updated Deployment elasticsearch with registry command: [appregistry-server -r https://quay.io/cnr|redhat-operators -o elasticsearch-operator]" name=elasticsearch targetNamespace=openshift-operators-redhat type=CatalogSourceConfig
time="2019-10-29T14:58:45Z" level=info msg="Service elasticsearch is present" name=elasticsearch targetNamespace=openshift-operators-redhat type=CatalogSourceConfig
time="2019-10-29T14:58:45Z" level=info msg="Child resource openshift-marketplace/elasticsearch owned by a CatalogSourceConfig was deleted"
time="2019-10-29T14:58:45Z" level=info msg="Deleted Service elasticsearch" name=elasticsearch targetNamespace=openshift-operators-redhat type=CatalogSourceConfig
time="2019-10-29T14:58:45Z" level=info msg="Created Service elasticsearch" name=elasticsearch targetNamespace=openshift-operators-redhat type=CatalogSourceConfig
time="2019-10-29T14:58:45Z" level=info msg="Creating CatalogSource elasticsearch" name=elasticsearch targetNamespace=openshift-operators-redhat type=CatalogSourceConfig
time="2019-10-29T14:58:45Z" level=error msg="Failed to create CatalogSource : namespaces \"openshift-operators-redhat\" not found" name=elasticsearch targetNamespace=openshift-operators-redhat type=CatalogSourceConfig

Expected results:

Old replicasets are deleted, and the operator surfaces the missing targetNamespace somewhere other than the pod's logs, potentially in Kubernetes events.
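For reference, here is a minimal sketch of a CatalogSourceConfig that should trigger the loop, reconstructed from the registry command and targetNamespace in the logs above; the exact field values are my assumptions, not a verified reproducer:

apiVersion: operators.coreos.com/v1
kind: CatalogSourceConfig
metadata:
  name: elasticsearch
  namespace: openshift-marketplace
spec:
  # targetNamespace points at a namespace that has since been deleted
  targetNamespace: openshift-operators-redhat
  source: redhat-operators
  packages: elasticsearch-operator

Deleting openshift-operators-redhat while this object exists should put the operator into the update/delete/create cycle shown in the logs.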
Additional info:

ClusterID is af8bc55b-9ae3-4735-bf65-b6ef43aeced9. I was able to resolve it by creating the missing namespaces.
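For anyone who hits this before a fixed build is available, a sketch of the workaround described above, plus cleanup of the leftover replicasets. It assumes that every replicaset in openshift-marketplace with 0 desired replicas is stale, which held here but is worth double-checking before deleting:

# re-create the missing namespace the CatalogSourceConfig targets
oc create namespace openshift-operators-redhat

# delete the scaled-down replicasets left behind
# (column 2 of 'kubectl get rs' output is DESIRED)
kubectl get rs -n openshift-marketplace --no-headers | \
  awk '$2 == 0 {print $1}' | \
  xargs -r kubectl delete rs -n openshift-marketplace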
Created attachment 1631159: screenshot of the etcd object count for the cluster this occurred on
No nightly build includes the fix PR yet.
*** Bug 1771747 has been marked as a duplicate of this bug. ***
This issue was also identified on v4.2.2 in customer production clusters. The duplicate BZ, which has additional info, is: https://bugzilla.redhat.com/show_bug.cgi?id=1771747
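For anyone checking whether a production cluster is affected, a quick sketch (assuming the default openshift-marketplace namespace and marketplace-operator deployment name): an abnormally high replicaset count plus the "Failed to create CatalogSource" error from the original report are the signatures.

# count replicasets in the marketplace namespace (normally a small number)
kubectl get rs -n openshift-marketplace --no-headers | wc -l

# look for the missing-namespace error in the operator logs
kubectl logs -n openshift-marketplace deploy/marketplace-operator | \
  grep 'Failed to create CatalogSource'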
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062