Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1965030

Summary: Controller inventory container has memory leaks and restarts if OpenShift Virtualization is not installed on the cluster
Product: Migration Toolkit for Virtualization
Reporter: Franco Bladilo <fbladilo>
Component: Operator
Assignee: Franco Bladilo <fbladilo>
Status: CLOSED ERRATA
QA Contact: Tzahi Ashkenazi <tashkena>
Severity: low
Docs Contact: Avital Pinnick <apinnick>
Priority: low
Version: 2.0.0
CC: apinnick, fdupont, istein
Target Milestone: ---
Target Release: 2.2.0
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-12-09 19:20:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Prometheus chart showing the leak over 1 week (Flags: none)
Prometheus chart showing the leak over 1 week none

Description Franco Bladilo 2021-05-26 15:11:30 UTC
Description of problem:

The inventory container appears to leak memory slowly when MTV 2.0.0 is deployed on a cluster without CNV installed. The inventory container logs in the controller pod continuously show an error about the missing VirtualMachine kind.
MTV is configured only with the default OCP host provider.

The error is shown below:

{"level":"info","ts":1622041766.0487552,"logger":"provider|8jmmm","msg":"Reconcile ended.","provider":"konveyor-forklift/host","reQ":3}
{"level":"error","ts":1622041767.0330288,"logger":"controller-runtime.source","msg":"if kind is a CRD, it should be installed before calling Start","kind":"VirtualMachine.kubevirt.io","error":"no matches for kind \"VirtualMachine\" in version \"kubevirt.io/v1alpha3\"","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/opt/app-root/pkg/mod/github.com/go-logr/zapr.0/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start\n\t/opt/app-root/pkg/mod/sigs.k8s.io/controller-runtime.4/pkg/source/source.go:117\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/opt/app-root/pkg/mod/sigs.k8s.io/controller-runtime.4/pkg/internal/controller/controller.go:143\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/opt/app-root/pkg/mod/sigs.k8s.io/controller-runtime.4/pkg/internal/controller/controller.go:184\nsigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startRunnable.func1\n\t/opt/app-root/pkg/mod/sigs.k8s.io/controller-runtime.4/pkg/manager/internal.go:676"}
{"level":"info","ts":1622041769.0490327,"logger":"provider|f6hn4","msg":"Reconcile started.","provider":"konveyor-forklift/host"}
{"level":"info","ts":1622041771.7529714,"logger":"provider","msg":"Connection test succeeded."}


Version-Release number of selected component (if applicable):
MTV 2.0.0, OCP 4.7

How reproducible:

Always

Steps to Reproduce:
1. Deploy MTV (or upstream Forklift) on OCP without CNV.
2. Create a forkliftcontroller CR and wait for the deployment to finish.
3. Keep the deployment running and watch the restart count on the controller pod.
4. Examine the controller pod to see the OOM terminations of the inventory container.
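
Step 4 can be checked mechanically against the pod's JSON (for example, the output of `oc get pod <name> -o json`). This is a rough sketch under assumptions: it relies on the standard `status.containerStatuses[].lastState.terminated.reason` field of the Pod API, and the example pod below, including the container name "main" and the restart count, is hypothetical.

```python
def oom_killed_containers(pod):
    """Return (container name, restart count) pairs whose last termination was OOMKilled."""
    hits = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for cs in statuses:
        # lastState.terminated is only set after a container has been restarted
        terminated = cs.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            hits.append((cs.get("name"), cs.get("restartCount", 0)))
    return hits


# Hypothetical pod status, trimmed to the fields used above.
example_pod = {
    "status": {
        "containerStatuses": [
            {"name": "main", "restartCount": 0, "lastState": {}},
            {
                "name": "inventory",
                "restartCount": 4,
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            },
        ]
    }
}

if __name__ == "__main__":
    for name, restarts in oom_killed_containers(example_pod):
        print(f"container {name} was OOMKilled (restarts: {restarts})")
```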

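Actual results:

The inventory container leaks memory and is OOM-terminated, and the restart count on the controller pod increases.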

Expected results:

The inventory container should not leak memory or restart.

Additional info:

Comment 1 Franco Bladilo 2021-05-26 15:13:41 UTC
Created attachment 1787321 [details]
Prometheus chart showing the leak over 1 week

Attached Prometheus memory stats for the controller pod.

Comment 2 Fabien Dupont 2021-06-24 16:02:50 UTC
@fbladilo, can you change the Ansible role to create the "host" provider only if CNV is installed in the cluster?
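
The condition requested here could be expressed as follows. This is a Python illustration of the decision only, not the Ansible role change itself. It assumes that the presence of the `kubevirt.io` API group in the cluster's discovery document (an APIGroupList, as returned by `oc get --raw /apis`) is an adequate signal that CNV is installed; the function names and sample documents are hypothetical.

```python
def cnv_installed(api_group_list, group="kubevirt.io"):
    """Return True if the given API group appears in an APIGroupList document."""
    return any(g.get("name") == group for g in api_group_list.get("groups", []))


def should_create_host_provider(api_group_list):
    """Create the "host" provider only when the kubevirt.io API group is served."""
    return cnv_installed(api_group_list)


# Hypothetical, heavily trimmed discovery documents.
without_cnv = {"kind": "APIGroupList", "groups": [{"name": "apps"}, {"name": "batch"}]}
with_cnv = {"kind": "APIGroupList", "groups": [{"name": "apps"}, {"name": "kubevirt.io"}]}

if __name__ == "__main__":
    print("cluster without CNV:", should_create_host_provider(without_cnv))
    print("cluster with CNV:", should_create_host_provider(with_cnv))
```

Skipping the provider avoids starting a watch on the VirtualMachine kind when no matching CRD exists, which is the situation reported in the log above.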

Comment 3 Fabien Dupont 2021-09-02 21:01:46 UTC
Please verify with build 2.2.0-1 / iib:104622, on OCP 4.9.

To test, install the MTV operator on an OpenShift cluster where CNV is *NOT* installed.
Once MTV is installed, the OpenShift Virtualization provider named "host" should not be present.

Comment 4 Tzahi Ashkenazi 2021-11-17 08:51:47 UTC
Tested for 20 hours with MTV-2.2.0-87 on f06-h36.

No memory leaks: the memory consumption of the forklift-controller pod during those 20 hours was stable at 178MB.

No restarts occurred on the forklift-controller pod:
 
[kni@f06-h36-000-r640 root]$ oc get pods -nopenshift-mtv
NAME                                        READY   STATUS    RESTARTS   AGE
forklift-controller-67fd84598-kwkx9         2/2     Running   0          20h
forklift-must-gather-api-5979b5b97c-gctn2   1/1     Running   0          20h
forklift-operator-7867b4cd45-s7zpv          1/1     Running   0          20h
forklift-ui-b86f47d86-mfdff                 1/1     Running   0          20h
forklift-validation-6c5b5697fb-82xpj        1/1     Running   0          20h

No host provider is present on the OCP cluster:
[kni@f06-h36-000-r640 root]$ oc get providers  -A
No resources found
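
The restart check above can also be scripted. The sketch below parses the tabular output of `oc get pods` and reports each pod's restart count. The helper name restart_counts is hypothetical, and it assumes the default column layout with NAME and RESTARTS headers and no spaces inside the columns that precede RESTARTS.

```python
def restart_counts(table_text):
    """Map pod name to restart count from default `oc get pods` table output."""
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    header = lines[0].split()
    name_idx = header.index("NAME")
    restarts_idx = header.index("RESTARTS")
    counts = {}
    for ln in lines[1:]:
        cols = ln.split()
        counts[cols[name_idx]] = int(cols[restarts_idx])
    return counts


# Output copied from the verification run above.
pods_output = """\
NAME                                        READY   STATUS    RESTARTS   AGE
forklift-controller-67fd84598-kwkx9         2/2     Running   0          20h
forklift-must-gather-api-5979b5b97c-gctn2   1/1     Running   0          20h
forklift-operator-7867b4cd45-s7zpv          1/1     Running   0          20h
forklift-ui-b86f47d86-mfdff                 1/1     Running   0          20h
forklift-validation-6c5b5697fb-82xpj        1/1     Running   0          20h
"""

if __name__ == "__main__":
    restarted = {name: n for name, n in restart_counts(pods_output).items() if n > 0}
    print("pods with restarts:", restarted or "none")
```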

Comment 9 errata-xmlrpc 2021-12-09 19:20:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (MTV 2.2.0 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:5066