Bug 1952121
| Summary: | Inventory container crashes because of OOM condition, causing controller pod restarts | | |
|---|---|---|---|
| Product: | Migration Toolkit for Virtualization | Reporter: | Tzahi Ashkenazi <tashkena> |
| Component: | General | Assignee: | Jeff Ortel <jortel> |
| Status: | CLOSED ERRATA | QA Contact: | Tzahi Ashkenazi <tashkena> |
| Severity: | high | Docs Contact: | Avital Pinnick <apinnick> |
| Priority: | urgent | | |
| Version: | 2.0.0 | CC: | apinnick, dagur, dvaanunu, fdupont, istein, jortel |
| Target Milestone: | --- | | |
| Target Release: | 2.0.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-06-10 17:11:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Tzahi Ashkenazi
2021-04-21 14:42:14 UTC
The fix should be in build 2.0.0-20 / iib:69034.

Reproduced on 2.0.0.20:

```
oc get pods/forklift-controller-86986fd75b-hgr7f -nopenshift-rhmtv -oyaml | less

    imageID: registry.redhat.io/rhmtv/rhmtv-controller@sha256:4a3766c3c467a0d24b34ea1ed692c040c76d14cc13b805fc971089255489a88f
    lastState:
      terminated:
        containerID: cri-o://cd7096c6ae240e953d15a35f8a7f6ed7663a6d8c4273961b537b4589bd4a6a77
        exitCode: 137
        finishedAt: "2021-04-22T12:39:56Z"
        reason: OOMKilled
        startedAt: "2021-04-22T07:53:43Z"
    name: controller
    ready: true
    restartCount: 1
    started: true
    state:
      running:
        startedAt: "2021-04-22T12:39:58Z"
  - containerID: cri-o://3630fe55b0df45050030159c23a892fc360380582359ec8f5aba204bd25cb7af
    image: registry.redhat.io/rhmtv/rhmtv-controller@sha256:4a3766c3c467a0d24b34ea1ed692c040c76d14cc13b805fc971089255489a88f
    imageID: registry.redhat.io/rhmtv/rhmtv-controller@sha256:4a3766c3c467a0d24b34ea1ed692c040c76d14cc13b805fc971089255489a88f
    lastState:
      terminated:
        containerID: cri-o://65896916004c95b4d65820dac52359bda5beb6d16e310d27a2a3b947c1527c10
        exitCode: 137
        finishedAt: "2021-04-22T11:32:16Z"
        reason: OOMKilled
        startedAt: "2021-04-22T07:53:44Z"
    name: inventory
    ready: true
    restartCount: 1
    started: true
    state:
      running:
        startedAt: "2021-04-22T11:32:18Z"
  hostIP: 192.168.208.15
  phase: Running
  podIP: 10.131.0.162
  podIPs:
  - ip: 10.131.0.162
  qosClass: Burstable
  startTime: "2021-04-22T07:53:33Z"
```

I guess this BZ is also affected by https://bugzilla.redhat.com/show_bug.cgi?id=1952450.

```
oc describe node f02-h18-000-r640.rdu2.scalelab.redhat.com

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                       Requests     Limits
  --------                       --------     ------
  cpu                            1179m (1%)   300m (0%)
  memory                         3356Mi (0%)  2400Mi (0%)
  ephemeral-storage              0 (0%)       0 (0%)
  hugepages-1Gi                  0 (0%)       0 (0%)
  hugepages-2Mi                  0 (0%)       0 (0%)
  devices.kubevirt.io/kvm        0            0
  devices.kubevirt.io/tun        0            0
  devices.kubevirt.io/vhost-net  0            0
  openshift.io/sriov_nics        0            0

Events:
  Type     Reason     Age   From     Message
  ----     ------     ----  ----     -------
  Warning  SystemOOM  142m  kubelet  System OOM encountered, victim process: manager, pid: 3564364
  Warning  SystemOOM  75m   kubelet  System OOM encountered, victim process: manager, pid: 3564270
```
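As a side note, the OOMKilled verdict above is read from each container's last terminated state. A minimal client-go sketch of the same check is shown below; the kubeconfig handling and the hard-coded namespace are assumptions for illustration only, not part of MTV.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig; adjust for in-cluster use.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// List pods in the MTV namespace and report containers whose last
	// termination reason was OOMKilled.
	pods, err := client.CoreV1().Pods("openshift-rhmtv").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, pod := range pods.Items {
		for _, cs := range pod.Status.ContainerStatuses {
			t := cs.LastTerminationState.Terminated
			if t != nil && t.Reason == "OOMKilled" {
				fmt.Printf("%s/%s restarted %d times, last OOMKilled at %s (exit %d)\n",
					pod.Name, cs.Name, cs.RestartCount, t.FinishedAt, t.ExitCode)
			}
		}
	}
}
```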
Created attachment 1775187 [details]
cloud38 memory profile.
Created attachment 1775188 [details]
kubectl top output.
Created attachment 1775637 [details]
psi4 profile.
Reproduced on my PSI cluster.
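Memory profiles like the ones attached in this thread are typically heap profiles produced with Go's runtime/pprof package. Purely as an illustration (this is not necessarily how these particular profiles were captured; the helper name and output path are assumptions), a Go process can write such a profile into the pod's profiler cache directory like this:

```go
package profiler

import (
	"os"
	"path/filepath"
	"runtime"
	"runtime/pprof"
)

// WriteHeapProfile dumps a heap profile into dir (for example the pod's
// /var/cache/profiler emptyDir), so it can be copied off the pod and
// inspected with `go tool pprof`. Name and location are illustrative.
func WriteHeapProfile(dir string) error {
	f, err := os.Create(filepath.Join(dir, "heap.pprof"))
	if err != nil {
		return err
	}
	defer f.Close()

	runtime.GC() // run a GC first so the profile reflects live objects only
	return pprof.WriteHeapProfile(f)
}
```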
Created attachment 1775638 [details]
psi4 controller container profile.
Reproduced on my PSI cluster.
This is the "controller" container (not inventory).
The standard Go HTTP library transport defaults to an unlimited number of idle connections in order to support connection reuse. For some reason, an idle connection retains its I/O buffers for that same unlimited duration. The fix is to configure the transport to limit the number and lifespan of idle connections: https://github.com/konveyor/forklift-controller/pull/229 (a sketch of this kind of transport configuration is shown below, after the reproduction details).

Reproduced on MTV 2.0.0.21 on another scale environment (Cloud10):

```
inventory:
  Container ID:  cri-o://60a44db3b8f133849f5999fff8a0420df92f9cb800902a9d88aa6687750249f4
  Image:         registry.redhat.io/rhmtv/rhmtv-controller@sha256:4a3766c3c467a0d24b34ea1ed692c040c76d14cc13b805fc971089255489a88f
  Image ID:      registry.redhat.io/rhmtv/rhmtv-controller@sha256:4a3766c3c467a0d24b34ea1ed692c040c76d14cc13b805fc971089255489a88f
  Port:          8443/TCP
  Host Port:     0/TCP
  Command:
    /usr/local/bin/manager
  State:          Running
    Started:      Wed, 28 Apr 2021 23:50:54 +0000
  Last State:     Terminated
    Reason:       OOMKilled
    Exit Code:    137
    Started:      Wed, 28 Apr 2021 15:17:05 +0000
    Finished:     Wed, 28 Apr 2021 23:50:53 +0000
  Ready:          True
  Restart Count:  2
  Limits:
    cpu:     100m
    memory:  800Mi
  Requests:
    cpu:     100m
    memory:  350Mi
  Environment Variables from:
    forklift-controller-config  ConfigMap  Optional: false
  Environment:
    POD_NAMESPACE:                 openshift-rhmtv (v1:metadata.namespace)
    ROLE:                          inventory
    SECRET_NAME:                   webhook-server-secret
    API_PORT:                      8443
    API_TLS_ENABLED:               true
    API_TLS_CERTIFICATE:           /var/run/secrets/forklift-inventory-serving-cert/tls.crt
    API_TLS_KEY:                   /var/run/secrets/forklift-inventory-serving-cert/tls.key
    METRICS_PORT:                  8081
    POLICY_AGENT_URL:              https://forklift-validation.openshift-rhmtv.svc.cluster.local:8181
    POLICY_AGENT_SEARCH_INTERVAL:  120

oc describe node f01-h26-000-r640.rdu2.scalelab.redhat.com

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                       Requests     Limits
  --------                       --------     ------
  cpu                            1820m (2%)   500m (0%)
  memory                         7890Mi (2%)  500Mi (0%)
  ephemeral-storage              0 (0%)       0 (0%)
  hugepages-1Gi                  0 (0%)       0 (0%)
  hugepages-2Mi                  0 (0%)       0 (0%)
  devices.kubevirt.io/kvm        1            1
  devices.kubevirt.io/tun        1            1
  devices.kubevirt.io/vhost-net  1            1

Events:
```

The fix should be part of build mtv-operator-bundle-container-2.0.0-4 / iib:72115.

Would it be possible to reproduce and note the amount of memory consumed by the pod when it is killed? The guess is that it is consuming more than the 800Mi limit.

This is possible, but surprising. @jortel, what do you think?
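For reference, here is a minimal sketch of the kind of idle-connection limits described above. The package name, helper name, and specific values are illustrative assumptions; the authoritative change is the linked PR #229.

```go
package web

import (
	"net/http"
	"time"
)

// newLimitedClient builds an HTTP client whose transport caps the number and
// lifespan of idle (keep-alive) connections, so that idle connections and the
// I/O buffers they hold are not retained indefinitely.
// The values below are illustrative, not the ones used in forklift-controller.
func newLimitedClient() *http.Client {
	transport := &http.Transport{
		MaxIdleConns:        10,               // total idle connections kept across all hosts
		MaxIdleConnsPerHost: 2,                // idle connections kept per host
		IdleConnTimeout:     30 * time.Second, // close idle connections after this duration
	}
	return &http.Client{
		Transport: transport,
		Timeout:   60 * time.Second,
	}
}
```

Calling (*http.Transport).CloseIdleConnections() at teardown additionally releases any remaining idle connections, and therefore their buffers, immediately.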
Created attachment 1782406 [details]
controller pod memory - grafana screenshot
Continuing from David's comment: OOM also occurred on cloud38. At the time of the OOM, the environment was idle and had 4 plans in Succeeded status.

cloud38: f02-h07-000-r640.rdu2.scalelab.redhat.com (root ; 100yard-)

```
- containerID: cri-o://d9114299068a7ad9221e42619b1a2e7fb1d69156840a8acf1405c82af106f1b1
  image: registry.redhat.io/mtv/mtv-controller@sha256:666e415b74f7d93e5b91faba038b191da65619bed3f1ead7ab5fdb56873c61f7
  imageID: registry.redhat.io/mtv/mtv-controller@sha256:666e415b74f7d93e5b91faba038b191da65619bed3f1ead7ab5fdb56873c61f7
  lastState:
    terminated:
      containerID: cri-o://7210a625d5d7a0fd859eb4e81e9284bede921d5da2c61e463eb557a0d3448f1e
      exitCode: 137
      finishedAt: "2021-05-12T02:21:49Z"
      reason: OOMKilled
      startedAt: "2021-05-10T15:03:53Z"
  name: inventory
  ready: true
  restartCount: 1
  started: true
  state:
    running:
      startedAt: "2021-05-12T02:21:51Z"
hostIP: 192.168.208.14
phase: Running
podIP: 10.128.3.141
podIPs:
- ip: 10.128.3.141
qosClass: Burstable
startTime: "2021-05-10T15:03:46Z"

root@f02-h07-000-r640:~$ oc get pods -nopenshift-mtv -owide
NAME                                   READY   STATUS    RESTARTS   AGE   IP             NODE                                        NOMINATED NODE   READINESS GATES
forklift-controller-5fd7f96df7-dckj2   2/2     Running   1          46h   10.128.3.141   f02-h17-000-r640.rdu2.scalelab.redhat.com   <none>           <none>
forklift-operator-754dbc46dd-4bvcw     1/1     Running   0          2d    10.128.3.102   f02-h17-000-r640.rdu2.scalelab.redhat.com   <none>           <none>
forklift-ui-f46bbcfd9-tvcr9            1/1     Running   0          2d    10.128.3.105   f02-h17-000-r640.rdu2.scalelab.redhat.com   <none>           <none>
forklift-validation-6687f5954d-rxv2h   1/1     Running   0          2d    10.128.3.104   f02-h17-000-r640.rdu2.scalelab.redhat.com   <none>           <none>
```

cloud38: MTV 2.0.0.12, CNV 2.6.2

According to the graph you shared, it is strange that the container is OOMKilled. Would you mind running the following command and sharing its output?
```
$ oc get -o yaml -n openshift-mtv deployment forklift-controller
```

```
root@f01-h14-000-r640:~$ oc get -o yaml -n openshift-mtv deployment forklift-controller
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2021-05-10T10:59:04Z"
  generation: 1
  labels:
    app: forklift
    control-plane: controller-manager
    controller-tools.k8s.io: "1.0"
  managedFields:
  - apiVersion: apps/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:app: {}
          f:control-plane: {}
          f:controller-tools.k8s.io: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"e0334f7b-0bf5-494b-95b3-215a09222750"}:
            .: {}
            f:apiVersion: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        f:progressDeadlineSeconds: {}
        f:replicas: {}
        f:revisionHistoryLimit: {}
        f:selector: {}
        f:strategy:
          f:rollingUpdate:
            .: {}
            f:maxSurge: {}
            f:maxUnavailable: {}
          f:type: {}
        f:template:
          f:metadata:
            f:annotations:
              .: {}
              f:configHash: {}
            f:labels:
              .: {}
              f:app: {}
              f:control-plane: {}
              f:controller-tools.k8s.io: {}
          f:spec:
            f:containers:
              k:{"name":"controller"}:
                .: {}
                f:command: {}
                f:env:
                  .: {}
                  k:{"name":"API_HOST"}:
                    .: {}
                    f:name: {}
                    f:value: {}
                  k:{"name":"API_PORT"}:
                    .: {}
                    f:name: {}
                    f:value: {}
                  k:{"name":"API_TLS_ENABLED"}:
                    .: {}
                    f:name: {}
                    f:value: {}
                  k:{"name":"POD_NAMESPACE"}:
                    .: {}
                    f:name: {}
                    f:valueFrom:
                      .: {}
                      f:fieldRef:
                        .: {}
                        f:apiVersion: {}
                        f:fieldPath: {}
                  k:{"name":"ROLE"}:
                    .: {}
                    f:name: {}
                    f:value: {}
                  k:{"name":"SECRET_NAME"}:
                    .: {}
                    f:name: {}
                    f:value: {}
                f:envFrom: {}
                f:image: {}
                f:imagePullPolicy: {}
                f:name: {}
                f:ports:
                  .: {}
                  k:{"containerPort":9876,"protocol":"TCP"}:
                    .: {}
                    f:containerPort: {}
                    f:name: {}
                    f:protocol: {}
                f:resources:
                  .: {}
                  f:limits:
                    .: {}
                    f:cpu: {}
                    f:memory: {}
                  f:requests:
                    .: {}
                    f:cpu: {}
                    f:memory: {}
                f:terminationMessagePath: {}
                f:terminationMessagePolicy: {}
                f:volumeMounts:
                  .: {}
                  k:{"mountPath":"/tmp/cert"}:
                    .: {}
                    f:mountPath: {}
                    f:name: {}
                    f:readOnly: {}
                  k:{"mountPath":"/var/cache/profiler"}:
                    .: {}
                    f:mountPath: {}
                    f:name: {}
              k:{"name":"inventory"}:
                .: {}
                f:command: {}
                f:env:
                  .: {}
                  k:{"name":"API_PORT"}:
                    .: {}
                    f:name: {}
                    f:value: {}
                  k:{"name":"API_TLS_CERTIFICATE"}:
                    .: {}
                    f:name: {}
                    f:value: {}
                  k:{"name":"API_TLS_ENABLED"}:
                    .: {}
                    f:name: {}
                    f:value: {}
                  k:{"name":"API_TLS_KEY"}:
                    .: {}
                    f:name: {}
                    f:value: {}
                  k:{"name":"METRICS_PORT"}:
                    .: {}
                    f:name: {}
                    f:value: {}
                  k:{"name":"POD_NAMESPACE"}:
                    .: {}
                    f:name: {}
                    f:valueFrom:
                      .: {}
                      f:fieldRef:
                        .: {}
                        f:apiVersion: {}
                        f:fieldPath: {}
                  k:{"name":"POLICY_AGENT_SEARCH_INTERVAL"}:
                    .: {}
                    f:name: {}
                    f:value: {}
                  k:{"name":"POLICY_AGENT_URL"}:
                    .: {}
                    f:name: {}
                    f:value: {}
                  k:{"name":"ROLE"}:
                    .: {}
                    f:name: {}
                    f:value: {}
                  k:{"name":"SECRET_NAME"}:
                    .: {}
                    f:name: {}
                    f:value: {}
                f:envFrom: {}
                f:image: {}
                f:imagePullPolicy: {}
                f:name: {}
                f:ports:
                  .: {}
                  k:{"containerPort":8443,"protocol":"TCP"}:
                    .: {}
                    f:containerPort: {}
                    f:name: {}
                    f:protocol: {}
                f:resources:
                  .: {}
                  f:limits:
                    .: {}
                    f:cpu: {}
                    f:memory: {}
                  f:requests:
                    .: {}
                    f:cpu: {}
                    f:memory: {}
                f:terminationMessagePath: {}
                f:terminationMessagePolicy: {}
                f:volumeMounts:
                  .: {}
                  k:{"mountPath":"/var/cache/inventory"}:
                    .: {}
                    f:mountPath: {}
                    f:name: {}
                  k:{"mountPath":"/var/cache/profiler"}:
                    .: {}
                    f:mountPath: {}
                    f:name: {}
                  k:{"mountPath":"/var/run/secrets/forklift-inventory-serving-cert"}:
                    .: {}
                    f:mountPath: {}
                    f:name: {}
            f:dnsPolicy: {}
            f:restartPolicy: {}
            f:schedulerName: {}
            f:securityContext: {}
            f:serviceAccount: {}
            f:serviceAccountName: {}
            f:terminationGracePeriodSeconds: {}
            f:volumes:
              .: {}
              k:{"name":"cert"}:
                .: {}
                f:name: {}
                f:secret:
                  .: {}
                  f:defaultMode: {}
                  f:secretName: {}
              k:{"name":"forklift-inventory-serving-cert"}:
                .: {}
                f:name: {}
                f:secret:
                  .: {}
                  f:defaultMode: {}
                  f:secretName: {}
              k:{"name":"inventory"}:
                .: {}
                f:emptyDir: {}
                f:name: {}
              k:{"name":"profiler"}:
                .: {}
                f:emptyDir: {}
                f:name: {}
    manager: OpenAPI-Generator
    operation: Update
    time: "2021-05-10T10:59:04Z"
  - apiVersion: apps/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:deployment.kubernetes.io/revision: {}
      f:status:
        f:availableReplicas: {}
        f:conditions:
          .: {}
          k:{"type":"Available"}:
            .: {}
            f:lastTransitionTime: {}
            f:lastUpdateTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"Progressing"}:
            .: {}
            f:lastTransitionTime: {}
            f:lastUpdateTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
        f:observedGeneration: {}
        f:readyReplicas: {}
        f:replicas: {}
        f:updatedReplicas: {}
    manager: kube-controller-manager
    operation: Update
    time: "2021-05-13T02:49:29Z"
  name: forklift-controller
  namespace: openshift-mtv
  ownerReferences:
  - apiVersion: forklift.konveyor.io/v1beta1
    kind: ForkliftController
    name: forklift-controller
    uid: e0334f7b-0bf5-494b-95b3-215a09222750
  resourceVersion: "120956429"
  selfLink: /apis/apps/v1/namespaces/openshift-mtv/deployments/forklift-controller
  uid: 2b3b14b0-77ab-44ff-b877-585348f7083d
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: forklift
      control-plane: controller-manager
      controller-tools.k8s.io: "1.0"
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        configHash: /var/cache/inventory
      creationTimestamp: null
      labels:
        app: forklift
        control-plane: controller-manager
        controller-tools.k8s.io: "1.0"
    spec:
      containers:
      - command:
        - /usr/local/bin/manager
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: ROLE
          value: main
        - name: API_HOST
          value: forklift-inventory.openshift-mtv.svc.cluster.local
        - name: API_PORT
          value: "8443"
        - name: API_TLS_ENABLED
          value: "true"
        - name: SECRET_NAME
          value: webhook-server-secret
        envFrom:
        - configMapRef:
            name: forklift-controller-config
        image: registry.redhat.io/mtv/mtv-controller@sha256:666e415b74f7d93e5b91faba038b191da65619bed3f1ead7ab5fdb56873c61f7
        imagePullPolicy: Always
        name: controller
        ports:
        - containerPort: 9876
          name: webhook-server
          protocol: TCP
        resources:
          limits:
            cpu: 100m
            memory: 800Mi
          requests:
            cpu: 100m
            memory: 350Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /tmp/cert
          name: cert
          readOnly: true
        - mountPath: /var/cache/profiler
          name: profiler
      - command:
        - /usr/local/bin/manager
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: ROLE
          value: inventory
        - name: SECRET_NAME
          value: webhook-server-secret
        - name: API_PORT
          value: "8443"
        - name: API_TLS_ENABLED
          value: "true"
        - name: API_TLS_CERTIFICATE
          value: /var/run/secrets/forklift-inventory-serving-cert/tls.crt
        - name: API_TLS_KEY
          value: /var/run/secrets/forklift-inventory-serving-cert/tls.key
        - name: METRICS_PORT
          value: "8081"
        - name: POLICY_AGENT_URL
          value: https://forklift-validation.openshift-mtv.svc.cluster.local:8181
        - name: POLICY_AGENT_SEARCH_INTERVAL
          value: "120"
        envFrom:
        - configMapRef:
            name: forklift-controller-config
        image: registry.redhat.io/mtv/mtv-controller@sha256:666e415b74f7d93e5b91faba038b191da65619bed3f1ead7ab5fdb56873c61f7
        imagePullPolicy: Always
        name: inventory
        ports:
        - containerPort: 8443
          name: api
          protocol: TCP
        resources:
          limits:
            cpu: 100m
            memory: 800Mi
          requests:
            cpu: 100m
            memory: 350Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/cache/inventory
          name: inventory
        - mountPath: /var/cache/profiler
          name: profiler
        - mountPath: /var/run/secrets/forklift-inventory-serving-cert
          name: forklift-inventory-serving-cert
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: forklift-controller
      serviceAccountName: forklift-controller
      terminationGracePeriodSeconds: 10
      volumes:
      - name: cert
        secret:
          defaultMode: 420
          secretName: webhook-server-secret
      - name: forklift-inventory-serving-cert
        secret:
          defaultMode: 420
          secretName: forklift-inventory-serving-cert
      - emptyDir: {}
        name: inventory
      - emptyDir: {}
        name: profiler
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2021-05-10T10:59:04Z"
    lastUpdateTime: "2021-05-10T11:00:05Z"
    message: ReplicaSet "forklift-controller-5c745fcf7c" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2021-05-13T02:49:29Z"
    lastUpdateTime: "2021-05-13T02:49:29Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
root@f01-h14-000-r640:~$
```

The fix should be part of build mtv-operator-bundle-container-2.0.0-17 / iib:76027. The next step will be to fix Open Policy Agent itself, but that is out of scope for this BZ.

Verified on cloud38: MTV 2.0.0.19, CNV 2.6.3. No OOM messages were found during migration or while the pods/nodes were idle, for 22 hours in total since the last MTV/CNV upgrade.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (MTV 2.0.0 images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2021:2381