Bug 1746199

Summary: catalog-operator consumes 11GB RSS
Product: OpenShift Container Platform
Reporter: Evan Cordell <ecordell>
Component: OLM
Assignee: Evan Cordell <ecordell>
OLM sub component: OLM
QA Contact: Mike Fiedler <mifiedle>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: bandrade, chuo, erjones, jfan, jiazha, nmalik, scolange
Version: 4.2.0
Target Milestone: ---
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1740857
Environment:
Last Closed: 2019-10-16 06:38:11 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1740857

Comment 1 Evan Cordell 2019-08-27 23:23:35 UTC
I believe that this is fixed by:

https://github.com/operator-framework/operator-lifecycle-manager/pull/1008 (bumping grpc-go, which had some memory leaks reported against it)

and 

https://github.com/operator-framework/operator-lifecycle-manager/pull/974 

which included the commits for:

https://github.com/operator-framework/operator-lifecycle-manager/pull/906

which refactored our management of the grpc connections in OLM.

Apologies for the many breadcrumbs.

I had a unit test that showed a memory leak before the commits from #906 merged and showed no leak afterwards. However, #974 made that unit test unreliable, so it is not currently included. #974 changed the resolver to use one of the kube client fake libraries, and those record all cluster actions in memory, which makes memory usage difficult to measure.

A better approach would be to write a black-box test that could trigger the leaks in an OpenShift cluster, but we don't currently have that available.
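
For illustration only, the general shape of a heap-growth check like the unit test mentioned above might look like the following Go sketch. It is not the test from #906; `heapGrowth`, `heapInUse`, and the `retained` slice are hypothetical names, and the "leak" is simulated by keeping byte slices reachable rather than by real grpc connections. It also shows why an in-memory recorder (such as a fake client that stores every action) would distort the result: anything that stays reachable counts as growth.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapInUse forces a garbage collection and returns the bytes of
// heap objects that are still allocated afterwards.
func heapInUse() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

// heapGrowth runs op the given number of times and reports how many
// bytes of live heap were added (0 if the heap shrank).
func heapGrowth(iterations int, op func()) uint64 {
	before := heapInUse()
	for i := 0; i < iterations; i++ {
		op()
	}
	after := heapInUse()
	if after < before {
		return 0
	}
	return after - before
}

// retained is a hypothetical stand-in for resources that are never
// released, e.g. connections that are opened but not closed.
var retained [][]byte

func main() {
	// Leaky operation: every call keeps 64 KiB reachable forever.
	leaky := heapGrowth(1000, func() {
		retained = append(retained, make([]byte, 64<<10))
	})
	// Clean operation: the buffer becomes garbage immediately.
	clean := heapGrowth(1000, func() {
		_ = make([]byte, 64<<10)
	})
	fmt.Printf("leaky growth: %d KiB, clean growth: %d KiB\n", leaky>>10, clean>>10)
}
```

A real test would replace the two closures with the code path under suspicion and fail when the growth exceeds a chosen threshold; the threshold is a judgment call, since some allocator noise is expected.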

Comment 4 Mike Fiedler 2019-09-03 19:43:59 UTC
On a reliability cluster based on 4.2.0-0.nightly-2019-08-30-102614, now on its 6th day, cluster-operator RSS is 165MB and holding steady. Marking verified.
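
As a rough sketch of how an RSS figure can be sampled on Linux (not necessarily how the number above was collected), the `VmRSS` field of `/proc/<pid>/status` reports resident set size in kB. The helper name `parseVmRSS` is hypothetical, and the example reads its own process status rather than that of an operator pod.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseVmRSS extracts the VmRSS value (in kB) from the text of a
// Linux /proc/<pid>/status file.
func parseVmRSS(status string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "VmRSS:") {
			continue
		}
		// The remainder looks like "   168960 kB"; take the number.
		fields := strings.Fields(strings.TrimPrefix(line, "VmRSS:"))
		if len(fields) == 0 {
			return 0, fmt.Errorf("malformed VmRSS line: %q", line)
		}
		return strconv.ParseInt(fields[0], 10, 64)
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("VmRSS not found")
}

func main() {
	// Reads this process's own status; only available on Linux.
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	kb, err := parseVmRSS(string(data))
	if err != nil {
		fmt.Println("cannot parse status:", err)
		return
	}
	fmt.Printf("RSS: %d MiB\n", kb/1024)
}
```

Sampling this value periodically over several days is one way to confirm that memory use is flat rather than growing.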

Comment 5 errata-xmlrpc 2019-10-16 06:38:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922