Bug 1817057 - ContainerDisk is sometimes OOMKilled on some systems [NEEDINFO]
Summary: ContainerDisk is sometimes OOMKilled on some systems
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 2.3.0
Assignee: Petr Kotas
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-03-25 13:44 UTC by Roman Mohr
Modified: 2020-07-12 11:16 UTC
CC List: 11 users

Fixed In Version: hco-bundle-registry-container-v2.3.0-70 virt-operator-container-v2.3.0-36
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-04 19:10:58 UTC
Target Upstream Version:
ncredi: needinfo+
ncredi: needinfo? (kbidarka)




Links
System ID Private Priority Status Summary Last Updated
Github kubevirt kubevirt pull 2844 0 None closed Rewrite container-disk main on C to prevent memory overconsumption 2020-11-26 13:15:40 UTC
Github kubevirt kubevirt pull 3204 0 None closed [release-0.26] Rewrite container-disk main on C to prevent memory overconsumption 2020-11-26 13:15:40 UTC
Red Hat Product Errata RHEA-2020:2011 0 None None None 2020-05-04 19:11:10 UTC

Description Roman Mohr 2020-03-25 13:44:17 UTC
Description of problem:

On some systems we see that the containerDisk container gets OOMKilled quite regularly. The cause appears to be connected to the Go runtime's dynamic memory management and possibly to factors such as the kernel version in use. The Go memory spikes are generally harmless and not large, but they are big enough to sometimes hit the 40 MB memory limit on the containerDisk container.

We had a similar case about six months ago on Azure with KubeVirt 0.20, where users reported this issue. It was resolved at the time by bumping the limit to 40 MB.
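For context, a minimal sketch of where that limit lives in the virt-launcher pod spec; the field layout is standard Kubernetes, the container name matches the `volumecontainerdisk` container visible in the `kubectl top` output below, and the value is illustrative:

```yaml
# Illustrative fragment of a virt-launcher pod spec (not taken verbatim
# from KubeVirt): the containerDisk sidecar carries a small fixed memory
# limit, which occasional Go runtime spikes can exceed.
containers:
  - name: volumecontainerdisk
    resources:
      limits:
        memory: 40M
```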



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


VMs on some nodes get restarted frequently.

Expected results:

VMs should not be restarted.

Additional info:

It is possible to work around this by using a DataVolume and specifying the containerDisk image as its import source. The possible disadvantage is additional storage and bandwidth consumption on the distributed storage for ephemeral data that the VMs normally do not need to keep across restarts. Storage usage can be reduced by attaching the resulting PVC as an ephemeral volume on the VM, so that one PVC can back multiple VMs, but this still puts more pressure on the distributed storage than a containerDisk does.
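A sketch of that workaround, assuming the CDI DataVolume registry import source; the name, image URL, and size are illustrative, not from this bug:

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: fedora-disk-dv          # illustrative name
spec:
  source:
    registry:                   # import the containerDisk image via CDI
      url: "docker://quay.io/example/fedora-container-disk:latest"  # illustrative
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi           # illustrative size
```

The VM can then reference the resulting PVC (for example as an ephemeral volume) instead of a containerDisk, trading the OOM risk for the extra distributed-storage usage described above.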

We are proposing https://github.com/kubevirt/kubevirt/pull/2844 to fix this in KubeVirt: it rewrites the containerDisk binary in C to provide stronger guarantees about memory consumption. Alternatively we could increase the memory limit, but that would have a significant impact on overall RAM usage.
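To illustrate why a C rewrite bounds memory, here is a sketch of the underlying idea, not the actual PR code: all I/O goes through a fixed-size stack buffer and nothing is heap-allocated, so there is no garbage collector and peak RSS is a compile-time constant. The function name and buffer size are illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy a disk image through a fixed 64 KiB stack buffer. Returns 0 on
 * success, -1 on any I/O error. Peak memory use is constant regardless
 * of image size, unlike a Go program whose runtime may briefly spike. */
int copy_disk_image(const char *src, const char *dst) {
    char buf[64 * 1024];                 /* the only working storage */
    FILE *in = fopen(src, "rb");
    if (!in)
        return -1;
    FILE *out = fopen(dst, "wb");
    if (!out) {
        fclose(in);
        return -1;
    }
    int rc = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    }
    fclose(in);
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}
```

In the real binary the process would then sleep to keep the pod container alive while virt-launcher consumes the image; the point here is only that a C implementation makes the memory ceiling deterministic.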

Comment 6 Dana Safford 2020-04-02 18:57:16 UTC
As this is becoming important, I raised the Customer Escalation Flag.

Comment 7 Kedar Bidarkar 2020-04-06 14:46:24 UTC

Tested this: monitored for almost 72 hours and saw no memory increase in the containerDisk container.



[root@cnvqe-01 ~]# oc get nodes -o wide 
NAME                                      STATUS   ROLES    AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                CONTAINER-RUNTIME
master-0.testing.redhat.com   Ready    master   8d    v1.17.1   10.46.8.33    <none>        Red Hat Enterprise Linux CoreOS 44.81.202003230949-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.17.0-9.dev.rhaos4.4.gitdfc8414.el8
master-1.testing.redhat.com   Ready    master   8d    v1.17.1   10.46.8.36    <none>        Red Hat Enterprise Linux CoreOS 44.81.202003230949-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.17.0-9.dev.rhaos4.4.gitdfc8414.el8
master-2.testing.redhat.com   Ready    master   8d    v1.17.1   10.46.8.37    <none>        Red Hat Enterprise Linux CoreOS 44.81.202003230949-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.17.0-9.dev.rhaos4.4.gitdfc8414.el8
worker-0.testing.redhat.com   Ready    worker   8d    v1.17.1   10.46.8.38    <none>        Red Hat Enterprise Linux CoreOS 44.81.202003230949-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.17.0-9.dev.rhaos4.4.gitdfc8414.el8
worker-1.testing.redhat.com   Ready    worker   8d    v1.17.1   10.46.8.39    <none>        Red Hat Enterprise Linux CoreOS 44.81.202003230949-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.17.0-9.dev.rhaos4.4.gitdfc8414.el8
[root@cnvqe-01 ~]# kubectl top pod --containers --namespace=default
POD                                 NAME                  CPU(cores)   MEMORY(bytes)   
virt-launcher-rhel78-hpp-vm-zh2c9   compute               20m          863Mi           
virt-launcher-rhel78-vm-rb9r8       compute               21m          884Mi           
virt-launcher-vm-fedora-rmm4k       compute               23m          1107Mi          
virt-launcher-vm-fedora-rmm4k       volumecontainerdisk   7m           22Mi            
virt-launcher-vm-fedora1-8mxz6      volumecontainerdisk   7m           22Mi            
virt-launcher-vm-fedora1-8mxz6      compute               22m          1106Mi          
[root@cnvqe-01 ~]# oc get vmi
NAME            AGE     PHASE     IP                NODENAME
rhel78-hpp-vm   5d      Running   10.0.2.2/24       worker-0.testing.redhat.com
rhel78-vm       5d      Running   10.0.2.2/24       worker-0.testing.redhat.com
vm-fedora       5d18h   Running   10.128.2.102/23   worker-1.testing.redhat.com
vm-fedora1      5d18h   Running   10.128.2.101/23   worker-1.testing.redhat.com
[root@cnvqe-01 ~]# oc get pods
NAME                                READY   STATUS    RESTARTS   AGE
virt-launcher-rhel78-hpp-vm-zh2c9   1/1     Running   0          5d
virt-launcher-rhel78-vm-rb9r8       1/1     Running   0          5d
virt-launcher-vm-fedora-rmm4k       2/2     Running   0          5d18h
virt-launcher-vm-fedora1-8mxz6      2/2     Running   0          5d18h


Will be moving this to VERIFIED state now.

Comment 10 errata-xmlrpc 2020-05-04 19:10:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2011

