Bug 1885418 - [openshift-cnv] issues with memory overhead calculation when limits are used
Summary: [openshift-cnv] issues with memory overhead calculation when limits are used
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 2.6.0
Assignee: Vladik Romanovsky
QA Contact: vsibirsk
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-10-05 21:16 UTC by Andreas Karis
Modified: 2024-03-25 16:38 UTC
CC: 17 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-10 11:18:56 UTC
Target Upstream Version:
Embargoed:


Attachments:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2021:0799 0 None None None 2021-03-10 11:19:54 UTC

Description Andreas Karis 2020-10-05 21:16:24 UTC
Description of problem:

CNV does not calculate a sufficient overhead for qemu-kvm to avoid OOM situations when memory limits are set. 

When providing both limit and request values, CNV sets the qemu-kvm process's memory according to the limit + overhead calculation instead of according to the request (as I think it should be). This also leads to an OOM situation in my tests.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Andreas Karis 2020-10-05 21:17:00 UTC
In the following tests, I use this YAML as a baseline:
~~~
[root@openshift-jumpserver-0 ~]# cat fedora-vm.yaml 
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  annotations:
    kubevirt.io/latest-observed-api-version: v1alpha3
    kubevirt.io/storage-observed-api-version: v1alpha3
    name.os.template.kubevirt.io/fedora31: Fedora 31
  labels:
    app: fedora-test-vm
    flavor.template.kubevirt.io/Custom: "true"
    os.template.kubevirt.io/fedora31: "true"
    vm.kubevirt.io/template.namespace: openshift
    vm.kubevirt.io/template.revision: "1"
    vm.kubevirt.io/template.version: v0.9.1
    workload.template.kubevirt.io/server: "true"
  name: fedora-test-vm
  namespace: test
spec:
  running: true
  template:
    metadata:
      labels:
        flavor.template.kubevirt.io/Custom: "true"
        kubevirt.io/domain: fedora-test-vm
        kubevirt.io/size: large
        os.template.kubevirt.io/fedora31: "true"
        vm.kubevirt.io/name: fedora-test-vm
        workload.template.kubevirt.io/server: "true"
    spec:
      domain:
        cpu:
          cores: 3
          model: host-model
        devices:
          disks:
          - bootOrder: 1
            disk:
              bus: virtio
            name: rootdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
          interfaces:
          - masquerade: {}
            model: virtio
            name: nic0
          networkInterfaceMultiqueue: true
          rng: {}
        features:
          acpi:
            enabled: true
        machine:
          type: q35
        resources:
          requests:
            cpu: "3"
            memory: 8192Mi
      evictionStrategy: LiveMigrate
      hostname: fedora-test-vm
      networks:
      - name: nic0
        pod: {}
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: docker.io/kubevirt/fedora-cloud-container-disk-demo
        name: rootdisk
      - cloudInitNoCloud:
          userData: |
            #cloud-config
            name: default
            hostname: fedora-test-vm
            ssh_authorized_keys:
              - >-
                ssh-rsa
                AAAAB3NzaC(...)hT/a4jl49IU=
        name: cloudinitdisk
~~~

I then expose the VM's SSH port so that I can SSH into the VM from openshift-worker-1. This also allows me to run an SCP stress test.
~~~
virtctl expose  vmi fedora-test-vm --port=20022 --target-port=22 --name=fedora-test-vm-ssh
~~~

I then stress test the VM as follows.

All but openshift-worker-1 are tainted, so the VM will only be scheduled there.
Only one VM exists at a given time on openshift-worker-1.
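For reference, the tainting itself can be done with oc adm taint; a minimal sketch, assuming a hypothetical node name and taint key (not taken from this cluster):
~~~
# Taint a worker so that regular pods (including virt-launcher) are not scheduled there.
# "openshift-worker-0" and "dedicated=other" are placeholders.
oc adm taint nodes openshift-worker-0 dedicated=other:NoSchedule
# Remove the taint again after the test:
oc adm taint nodes openshift-worker-0 dedicated=other:NoSchedule-
~~~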

I stress the VM memory with stress-ng inside VM:
~~~
ssh -i gss-stack-tools -p 20022 fedora.54.202
yum install stress-ng -y
stress-ng --vm 1 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1
~~~

Network load - from hypervisor run:
~~~
dd if=/dev/zero of=test.txt bs=1G count=1
sudo ip route add 172.30.54.202 via 172.24.2.1 dev ovn-k8s-mp0 src 172.24.2.2  mtu 1400 # my virtual network uses jumbo frame and there is an unrelated issue with jumbo frame MTU so this route is a workaround
while true ; do scp -i gss-stack-tools -P 20022  test.txt  fedora.54.202: ; done
~~~ 

I look for memory consumption and OOM errors with:
~~~
journalctl -f | grep -i oomkilled
~~~

And:
~~~
cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do  date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit'  ; sleep 2 ; done
~~~

Comment 2 Andreas Karis 2020-10-05 21:17:21 UTC
Example where the OOM issue cannot be reproduced.

Only requests [0]:

~~~
        resources:
          requests:
            cpu: "3"
            memory: 8192Mi
~~~

Resulting pod:
~~~
[root@openshift-jumpserver-0 ~]# oc describe pod  | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:'
    Limits:
      memory:  40M
    Requests:
      memory:     1M
Containers:
  compute:
    Limits:
    Requests:
      memory:                         8579144Ki
  volumerootdisk:
    Limits:
      memory:  40M
    Requests:
      memory:     1M
~~~
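From these numbers, the overhead that was added on top of the 8192Mi request can be derived (a quick sanity check on the reported values, not an official formula):
~~~
# compute container request minus the VMI memory request (8192Mi = 8388608Ki):
echo "8579144 - 8192 * 1024" | bc          # 190536 Ki of overhead
echo "(8579144 - 8192 * 1024) / 1024" | bc # ~186 MiB
~~~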

Resulting qemu-kvm:
~~~
[root@openshift-worker-1 ~]# pgrep qemu-kvm | while read pid ; do echo === $pid === ; ps aux | grep $pid |  grep -o '\-m [0-9]*' ; pmap $pid | tail ; done
=== 1106212 ===
-m 8192
00007f87348fd000     12K rw-s-   [ anon ]
00007f8734900000      4K rw-s- zero (deleted)
00007f8734901000      4K r---- ld-2.28.so
00007f8734902000      4K rw--- ld-2.28.so
00007f8734903000      4K rw---   [ anon ]
00007fff9c57b000    132K rw---   [ stack ]
00007fff9c5ba000     12K r----   [ anon ]
00007fff9c5bd000      8K r-x--   [ anon ]
ffffffffff600000      4K r-x--   [ anon ]
 total         10523172K
~~~

Resulting limits = entire system memory:
~~~
[root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do  date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit'  ; sleep 2 ; done
Mon Oct  5 20:34:53 UTC 2020
hierarchical_memory_limit 134013546496
total_rss 786513920
total_rss_huge 721420288
~~~

~~~
[root@openshift-worker-1 ~]# echo "134013546496 / 1024 / 1024" | bc
127805
[root@openshift-worker-1 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:         128829        5463      115112          14        8253      122976
Swap:             0           0           0
~~~

Stress-ng inside VM:
~~~
ssh -i gss-stack-tools -p 20022 fedora.54.202
yum install stress-ng -y
stress-ng --vm 1 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1
~~~

Network load - from hypervisor run:
~~~
dd if=/dev/zero of=test.txt bs=1G count=1
sudo ip route add 172.30.54.202 via 172.24.2.1 dev ovn-k8s-mp0 src 172.24.2.2  mtu 1400
while true ; do scp -i gss-stack-tools -P 20022  test.txt  fedora.54.202: ; done
~~~

Issue could not be reproduced.

===================================

[0]
~~~
[root@openshift-jumpserver-0 ~]# cat fedora-vm.yaml 
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  annotations:
    kubevirt.io/latest-observed-api-version: v1alpha3
    kubevirt.io/storage-observed-api-version: v1alpha3
    name.os.template.kubevirt.io/fedora31: Fedora 31
  labels:
    app: fedora-test-vm
    flavor.template.kubevirt.io/Custom: "true"
    os.template.kubevirt.io/fedora31: "true"
    vm.kubevirt.io/template.namespace: openshift
    vm.kubevirt.io/template.revision: "1"
    vm.kubevirt.io/template.version: v0.9.1
    workload.template.kubevirt.io/server: "true"
  name: fedora-test-vm
  namespace: test
spec:
  running: true
  template:
    metadata:
      labels:
        flavor.template.kubevirt.io/Custom: "true"
        kubevirt.io/domain: fedora-test-vm
        kubevirt.io/size: large
        os.template.kubevirt.io/fedora31: "true"
        vm.kubevirt.io/name: fedora-test-vm
        workload.template.kubevirt.io/server: "true"
    spec:
      domain:
        cpu:
          cores: 3
          model: host-model
        devices:
          disks:
          - bootOrder: 1
            disk:
              bus: virtio
            name: rootdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
          interfaces:
          - masquerade: {}
            model: virtio
            name: nic0
          networkInterfaceMultiqueue: true
          rng: {}
        features:
          acpi:
            enabled: true
        machine:
          type: q35
        resources:
          requests:
            cpu: "3"
            memory: 8192Mi
      evictionStrategy: LiveMigrate
      hostname: fedora-test-vm
      networks:
      - name: nic0
        pod: {}
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: docker.io/kubevirt/fedora-cloud-container-disk-demo
        name: rootdisk
      - cloudInitNoCloud:
          userData: |
            #cloud-config
            name: default
            hostname: fedora-test-vm
            ssh_authorized_keys:
              - >-
                ssh-rsa
                AAAA(...)l49IU=
        name: cloudinitdisk
~~~

Comment 3 Andreas Karis 2020-10-05 21:17:34 UTC
Request != Limit, with nothing else changed [0]. The OOM can be reproduced. I think this might be a bug, as the qemu-kvm memory is set according to the limit, whereas as a user I'd expect it to be set according to the request.
~~~
        resources:
          requests:
            cpu: "3"
            memory: 8192Mi
          limits:
            cpu: "3"
            memory: 8704Mi
~~~

Pod description:
~~~
[root@openshift-jumpserver-0 ~]# oc describe pod  | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:'
    Limits:
      memory:  40M
    Requests:
      memory:     1M
Containers:
  compute:
    Limits:
      memory:                         9103432Ki
    Requests:
      memory:                         8579144Ki
  volumerootdisk:
    Limits:
      memory:  40M
    Requests:
      memory:     1M
~~~

Resulting limits:
~~~
[root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do  date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit'  ; sleep 2 ; done
Mon Oct  5 20:02:03 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 788336640
total_rss_huge 723517440
~~~

Resulting qemu-kvm memory configuration:
~~~
[root@openshift-worker-1 ~]# pgrep qemu-kvm | while read pid ; do echo === $pid === ; ps aux | grep $pid |  grep -o '\-m [0-9]*' ; pmap $pid | tail ; done
=== 1030316 ===
-m 8704
00007f3726efb000     12K rw-s-   [ anon ]
00007f3726efe000      4K rw-s- zero (deleted)
00007f3726eff000      4K r---- ld-2.28.so
00007f3726f00000      4K rw--- ld-2.28.so
00007f3726f01000      4K rw---   [ anon ]
00007ffe6a689000    132K rw---   [ stack ]
00007ffe6a7f2000     12K r----   [ anon ]
00007ffe6a7f5000      8K r-x--   [ anon ]
ffffffffff600000      4K r-x--   [ anon ]
 total         13949548K
~~~

We can see that the memory limit is set according to the limit plus some overhead:
~~~
[root@openshift-worker-1 ~]# echo "9321914368 / 1024 / 1024" | bc
8890
~~~
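That is, the cgroup limit matches the compute container limit of 9103432Ki, which is the VMI limit of 8704Mi plus roughly the same ~186 MiB overhead as in the requests-only case, while qemu itself is started with the full 8704 MiB (-m 8704). A quick check of the headroom left above the guest RAM:
~~~
echo "9321914368 / 1024" | bc      # 9103432 Ki = cgroup limit = compute container limit
echo "9103432 - 8704 * 1024" | bc  # 190536 Ki, i.e. ~186 MiB above the guest RAM
~~~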

Stress-ng inside VM:
~~~
ssh -i gss-stack-tools -p 20022 fedora.54.202
yum install stress-ng -y
stress-ng --vm 1 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1
~~~

Network load - from hypervisor run:
~~~
dd if=/dev/zero of=test.txt bs=1G count=1
sudo ip route add 172.30.54.202 via 172.24.2.1 dev ovn-k8s-mp0 src 172.24.2.2  mtu 1400
while true ; do scp -i gss-stack-tools -P 20022  test.txt  fedora.54.202: ; done
~~~

Reproduce the OOM killer issue:
~~~
[root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do  date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit'  ; sleep 2 ; done
Mon Oct  5 20:28:43 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 4644237312
total_rss_huge 4559208448
Mon Oct  5 20:28:45 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 6227689472
total_rss_huge 6142558208
Mon Oct  5 20:28:47 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 7760707584
total_rss_huge 7675576320
Mon Oct  5 20:28:49 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9149022208
total_rss_huge 9063890944
Mon Oct  5 20:28:51 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9224306688
total_rss_huge 9143582720
Mon Oct  5 20:28:53 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9224441856
total_rss_huge 9143582720
Mon Oct  5 20:28:55 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9226633216
total_rss_huge 9145679872
Mon Oct  5 20:28:57 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9228836864
total_rss_huge 9147777024
Mon Oct  5 20:28:59 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9228959744
total_rss_huge 9147777024
Mon Oct  5 20:29:01 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9228857344
total_rss_huge 9147777024
Mon Oct  5 20:29:03 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9228865536
total_rss_huge 9147777024
Mon Oct  5 20:29:05 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9228865536
total_rss_huge 9147777024
Mon Oct  5 20:29:07 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9228988416
total_rss_huge 9147777024
Mon Oct  5 20:29:09 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9228988416
total_rss_huge 9147777024
Mon Oct  5 20:29:11 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9228988416
total_rss_huge 9147777024
Mon Oct  5 20:29:13 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9228988416
total_rss_huge 9147777024
Mon Oct  5 20:29:15 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229012992
total_rss_huge 9147777024
Mon Oct  5 20:29:18 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229082624
total_rss_huge 9147777024
Mon Oct  5 20:29:20 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229090816
total_rss_huge 9147777024
Mon Oct  5 20:29:22 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229090816
total_rss_huge 9147777024
Mon Oct  5 20:29:24 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229094912
total_rss_huge 9147777024
Mon Oct  5 20:29:26 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229094912
total_rss_huge 9147777024
Mon Oct  5 20:29:28 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229094912
total_rss_huge 9147777024
Mon Oct  5 20:29:30 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9227079680
total_rss_huge 9145679872
Mon Oct  5 20:29:32 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9227091968
total_rss_huge 9145679872
Mon Oct  5 20:29:34 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9227100160
total_rss_huge 9145679872
Mon Oct  5 20:29:36 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9227100160
total_rss_huge 9145679872
Mon Oct  5 20:29:38 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9227100160
total_rss_huge 9145679872
Mon Oct  5 20:29:40 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9227100160
total_rss_huge 9145679872
Mon Oct  5 20:29:42 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9227100160
total_rss_huge 9145679872
Mon Oct  5 20:29:44 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229205504
total_rss_huge 9147777024
Mon Oct  5 20:29:46 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229205504
total_rss_huge 9147777024
Mon Oct  5 20:29:48 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229205504
total_rss_huge 9147777024
Mon Oct  5 20:29:50 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229205504
total_rss_huge 9147777024
Mon Oct  5 20:29:52 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229221888
total_rss_huge 9147777024
Mon Oct  5 20:29:54 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229221888
total_rss_huge 9147777024
Mon Oct  5 20:29:56 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229221888
total_rss_huge 9147777024
Mon Oct  5 20:29:58 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229221888
total_rss_huge 9147777024
Mon Oct  5 20:30:00 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229221888
total_rss_huge 9147777024
Mon Oct  5 20:30:02 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229221888
total_rss_huge 9147777024
Mon Oct  5 20:30:04 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229221888
total_rss_huge 9147777024
Mon Oct  5 20:30:06 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229221888
total_rss_huge 9147777024
Mon Oct  5 20:30:08 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229221888
total_rss_huge 9147777024
Mon Oct  5 20:30:10 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229221888
total_rss_huge 9147777024
Mon Oct  5 20:30:12 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9229221888
total_rss_huge 9147777024
Mon Oct  5 20:30:14 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9227591680
total_rss_huge 9149874176
Mon Oct  5 20:30:16 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 9227591680
total_rss_huge 9149874176
Mon Oct  5 20:30:18 UTC 2020
hierarchical_memory_limit 9321914368
total_rss 86454272
total_rss_huge 39845888
Mon Oct  5 20:30:20 UTC 2020
cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0a6d58f0_176c_4d7b_81f4_87d5f2cf46f3.slice/crio-a4b729c945d076cd8cc312cdc46c0df732821ddda1db909cb5fdce7f5493e620.scope/memory.stat: No such file or directory
Mon Oct  5 20:30:22 UTC 2020
cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0a6d58f0_176c_4d7b_81f4_87d5f2cf46f3.slice/crio-a4b729c945d076cd8cc312cdc46c0df732821ddda1db909cb5fdce7f5493e620.scope/memory.stat: No such file or directory
~~~

~~~
[root@fedora-test-vm ~]# stress-ng --vm 1 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1
stress-ng: debug: [9883] 3 processors online, 3 processors configured
stress-ng: info:  [9883] dispatching hogs: 1 vm, 1 hdd, 1 io
stress-ng: debug: [9883] cache allocate: default cache size: 16384K
stress-ng: debug: [9883] starting stressors
stress-ng: debug: [9883] 3 stressors spawned
stress-ng: debug: [9885] stress-ng-hdd: started [9885] (instance 0)
stress-ng: debug: [9886] stress-ng-io: started [9886] (instance 0)
stress-ng: debug: [9884] stress-ng-vm: started [9884] (instance 0)
stress-ng: debug: [9884] stress-ng-vm using method 'all'


~~~

~~~
[core@openshift-worker-1 ~]$ while true ; do scp -i gss-stack-tools -P 20022  test.txt  fedora.54.202: ; done
test.txt                                                                           100% 1024MB 231.2MB/s   00:04    
test.txt                                                                           100% 1024MB 259.3MB/s   00:03    
test.txt                                                                           100% 1024MB 205.7MB/s   00:04    
test.txt                                                                           100% 1024MB 250.8MB/s   00:04    
test.txt                                                                           100% 1024MB 260.7MB/s   00:03
~~~

message from the journal:
~~~
[root@openshift-worker-1 ~]# journalctl -f | grep -i oomkilled


Oct 05 20:30:19 openshift-worker-1.example.com hyperkube[2223]: I1005 20:30:19.455745    2223 status_manager.go:570] Patch status for pod "virt-launcher-fedora-test-vm-q625c_test(0a6d58f0-176c-4d7b-81f4-87d5f2cf46f3)" with "{\"metadata\":{\"uid\":\"0a6d58f0-176c-4d7b-81f4-87d5f2cf46f3\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"reason\":\"PodCompleted\",\"type\":\"Initialized\"},{\"lastTransitionTime\":\"2020-10-05T20:30:19Z\",\"reason\":\"PodCompleted\",\"status\":\"False\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2020-10-05T20:30:19Z\",\"reason\":\"PodCompleted\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://a4b729c945d076cd8cc312cdc46c0df732821ddda1db909cb5fdce7f5493e620\",\"image\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93\",\"imageID\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93\",\"lastState\":{},\"name\":\"compute\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://a4b729c945d076cd8cc312cdc46c0df732821ddda1db909cb5fdce7f5493e620\",\"exitCode\":0,\"finishedAt\":\"2020-10-05T20:30:18Z\",\"reason\":\"OOMKilled\",\"startedAt\":\"2020-10-05T20:00:07Z\"}}},{\"containerID\":\"cri-o://d4c1b2dd7ea7e089eff7615a30a140fa90c8acfa31b8db7b62b649012c733ca1\",\"image\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo:latest\",\"imageID\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331\",\"lastState\":{},\"name\":\"volumerootdisk\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://d4c1b2dd7ea7e089eff7615a30a140fa90c8acfa31b8db7b62b649012c733ca1\",\"exitCode\":0,\"finishedAt\":\"2020-10-05T20:30:17Z\",\"reason\":\"Completed\",\"startedAt\":\"2020-10-05T20:00:07Z\"}}}],\"phase\":\"Succeeded\"}}"
Oct 05 20:30:19 openshift-worker-1.example.com hyperkube[2223]: I1005 20:30:19.455810    2223 status_manager.go:578] Status for pod "virt-launcher-fedora-test-vm-q625c_test(0a6d58f0-176c-4d7b-81f4-87d5f2cf46f3)" updated successfully: (5, {Phase:Succeeded Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 20:00:07 +0000 UTC Reason:PodCompleted Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 20:30:19 +0000 UTC Reason:PodCompleted Message:} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 20:30:19 +0000 UTC Reason:PodCompleted Message:} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 20:04:16 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:192.168.123.221 PodIP:172.24.2.17 PodIPs:[{IP:172.24.2.17}] StartTime:2020-10-05 20:00:04 +0000 UTC InitContainerStatuses:[{Name:container-disk-binary State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-05 20:00:06 +0000 UTC,FinishedAt:2020-10-05 20:00:06 +0000 UTC,ContainerID:cri-o://1f77c1391a2f82692015bc07cf0ded891abbd7566c1e35c03cf2b1d401124101,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ContainerID:cri-o://1f77c1391a2f82692015bc07cf0ded891abbd7566c1e35c03cf2b1d401124101 Started:<nil>}] ContainerStatuses:[{Name:compute State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:OOMKilled,Message:,StartedAt:2020-10-05 20:00:07 +0000 UTC,FinishedAt:2020-10-05 20:30:18 +0000 UTC,ContainerID:cri-o://a4b729c945d076cd8cc312cdc46c0df732821ddda1db909cb5fdce7f5493e620,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ContainerID:cri-o://a4b729c945d076cd8cc312cdc46c0df732821ddda1db909cb5fdce7f5493e620 Started:0xc00331fe72} {Name:volumerootdisk State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-05 20:00:07 +0000 UTC,FinishedAt:2020-10-05 20:30:17 +0000 UTC,ContainerID:cri-o://d4c1b2dd7ea7e089eff7615a30a140fa90c8acfa31b8db7b62b649012c733ca1,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 ContainerID:cri-o://d4c1b2dd7ea7e089eff7615a30a140fa90c8acfa31b8db7b62b649012c733ca1 Started:0xc00331fe73}] QOSClass:Burstable EphemeralContainerStatuses:[]})
Oct 05 20:30:19 openshift-worker-1.example.com hyperkube[2223]: I1005 20:30:19.455945    2223 kubelet_pods.go:961] Pod "virt-launcher-fedora-test-vm-q625c_test(0a6d58f0-176c-4d7b-81f4-87d5f2cf46f3)" is terminated, but some containers have not been cleaned up: {ID:{Type:cri-o ID:d4c1b2dd7ea7e089eff7615a30a140fa90c8acfa31b8db7b62b649012c733ca1} Name:volumerootdisk State:exited CreatedAt:2020-10-05 20:00:07.924820053 +0000 UTC StartedAt:2020-10-05 20:00:07.972630569 +0000 UTC FinishedAt:2020-10-05 20:30:17.989063778 +0000 UTC ExitCode:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 Hash:174137120 RestartCount:0 Reason:Completed Message:} {ID:{Type:cri-o ID:a4b729c945d076cd8cc312cdc46c0df732821ddda1db909cb5fdce7f5493e620} Name:compute State:exited CreatedAt:2020-10-05 20:00:07.178925816 +0000 UTC StartedAt:2020-10-05 20:00:07.21124455 +0000 UTC FinishedAt:2020-10-05 20:30:18.525063787 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:4137454197 RestartCount:0 Reason:OOMKilled Message:} {ID:{Type:cri-o ID:1f77c1391a2f82692015bc07cf0ded891abbd7566c1e35c03cf2b1d401124101} Name:container-disk-binary State:exited CreatedAt:2020-10-05 20:00:06.936000415 +0000 UTC StartedAt:2020-10-05 20:00:06.972586276 +0000 UTC FinishedAt:2020-10-05 20:00:06.989032504 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:1161206034 RestartCount:0 Reason:Completed Message:}
Oct 05 20:30:19 openshift-worker-1.example.com hyperkube[2223]: I1005 20:30:19.461492    2223 kubelet_pods.go:961] Pod "virt-launcher-fedora-test-vm-q625c_test(0a6d58f0-176c-4d7b-81f4-87d5f2cf46f3)" is terminated, but some containers have not been cleaned up: {ID:{Type:cri-o ID:d4c1b2dd7ea7e089eff7615a30a140fa90c8acfa31b8db7b62b649012c733ca1} Name:volumerootdisk State:exited CreatedAt:2020-10-05 20:00:07.924820053 +0000 UTC StartedAt:2020-10-05 20:00:07.972630569 +0000 UTC FinishedAt:2020-10-05 20:30:17.989063778 +0000 UTC ExitCode:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 Hash:174137120 RestartCount:0 Reason:Completed Message:} {ID:{Type:cri-o ID:a4b729c945d076cd8cc312cdc46c0df732821ddda1db909cb5fdce7f5493e620} Name:compute State:exited CreatedAt:2020-10-05 20:00:07.178925816 +0000 UTC StartedAt:2020-10-05 20:00:07.21124455 +0000 UTC FinishedAt:2020-10-05 20:30:18.525063787 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:4137454197 RestartCount:0 Reason:OOMKilled Message:} {ID:{Type:cri-o ID:1f77c1391a2f82692015bc07cf0ded891abbd7566c1e35c03cf2b1d401124101} Name:container-disk-binary State:exited CreatedAt:2020-10-05 20:00:06.936000415 +0000 UTC StartedAt:2020-10-05 20:00:06.972586276 +0000 UTC FinishedAt:2020-10-05 20:00:06.989032504 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:1161206034 RestartCount:0 Reason:Completed Message:}
Oct 05 20:30:19 openshift-worker-1.example.com hyperkube[2223]: I1005 20:30:19.466722    2223 kubelet_pods.go:961] Pod "virt-launcher-fedora-test-vm-q625c_test(0a6d58f0-176c-4d7b-81f4-87d5f2cf46f3)" is terminated, but some containers have not been cleaned up: {ID:{Type:cri-o ID:d4c1b2dd7ea7e089eff7615a30a140fa90c8acfa31b8db7b62b649012c733ca1} Name:volumerootdisk State:exited CreatedAt:2020-10-05 20:00:07.924820053 +0000 UTC StartedAt:2020-10-05 20:00:07.972630569 +0000 UTC FinishedAt:2020-10-05 20:30:17.989063778 +0000 UTC ExitCode:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 Hash:174137120 RestartCount:0 Reason:Completed Message:} {ID:{Type:cri-o ID:a4b729c945d076cd8cc312cdc46c0df732821ddda1db909cb5fdce7f5493e620} Name:compute State:exited CreatedAt:2020-10-05 20:00:07.178925816 +0000 UTC StartedAt:2020-10-05 20:00:07.21124455 +0000 UTC FinishedAt:2020-10-05 20:30:18.525063787 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:4137454197 RestartCount:0 Reason:OOMKilled Message:} {ID:{Type:cri-o ID:1f77c1391a2f82692015bc07cf0ded891abbd7566c1e35c03cf2b1d401124101} Name:container-disk-binary State:exited CreatedAt:2020-10-05 20:00:06.936000415 +0000 UTC StartedAt:2020-10-05 20:00:06.972586276 +0000 UTC FinishedAt:2020-10-05 20:00:06.989032504 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:1161206034 RestartCount:0 Reason:Completed Message:}
^C
~~~





====================

[0]
~~~
[root@openshift-jumpserver-0 ~]# cat fedora-vm.yaml 
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  annotations:
    kubevirt.io/latest-observed-api-version: v1alpha3
    kubevirt.io/storage-observed-api-version: v1alpha3
    name.os.template.kubevirt.io/fedora31: Fedora 31
  labels:
    app: fedora-test-vm
    flavor.template.kubevirt.io/Custom: "true"
    os.template.kubevirt.io/fedora31: "true"
    vm.kubevirt.io/template.namespace: openshift
    vm.kubevirt.io/template.revision: "1"
    vm.kubevirt.io/template.version: v0.9.1
    workload.template.kubevirt.io/server: "true"
  name: fedora-test-vm
  namespace: test
spec:
  running: true
  template:
    metadata:
      labels:
        flavor.template.kubevirt.io/Custom: "true"
        kubevirt.io/domain: fedora-test-vm
        kubevirt.io/size: large
        os.template.kubevirt.io/fedora31: "true"
        vm.kubevirt.io/name: fedora-test-vm
        workload.template.kubevirt.io/server: "true"
    spec:
      domain:
        cpu:
          cores: 3
          model: host-model
        devices:
          disks:
          - bootOrder: 1
            disk:
              bus: virtio
            name: rootdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
          interfaces:
          - masquerade: {}
            model: virtio
            name: nic0
          networkInterfaceMultiqueue: true
          rng: {}
        features:
          acpi:
            enabled: true
        machine:
          type: q35
        resources:
          requests:
            cpu: "3"
            memory: 8192Mi
          limits:
            cpu: "3"
            memory: 8704Mi
      evictionStrategy: LiveMigrate
      hostname: fedora-test-vm
      networks:
      - name: nic0
        pod: {}
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: docker.io/kubevirt/fedora-cloud-container-disk-demo
        name: rootdisk
      - cloudInitNoCloud:
          userData: |
            #cloud-config
            name: default
            hostname: fedora-test-vm
            ssh_authorized_keys:
              - >-
                ssh-rsa
                AAAAB(...)hT/a4jl49IU=
        name: cloudinitdisk
~~~

Comment 4 Andreas Karis 2020-10-05 21:17:48 UTC
Testing with limit = request, without anything else changed, also reproduces the issue [0]:
~~~
        resources:
          limits:
            cpu: "3"
            memory: 8Gi
          requests:
            cpu: "3"
            memory: 8Gi
~~~

Pod description:
~~~
[root@openshift-jumpserver-0 ~]# oc describe pod  | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:'
    Limits:
      memory:  40M
    Requests:
      memory:     40M
Containers:
  compute:
    Limits:
      memory:                         8579144Ki
    Requests:
      memory:                         8579144Ki
  volumerootdisk:
    Limits:
      memory:  40M
    Requests:
      memory:     40M
~~~

qemu-kvm process memory:
~~~
[root@openshift-worker-1 ~]# pgrep qemu-kvm | while read pid ; do echo === $pid === ; ps aux | grep $pid |  grep -o '\-m [0-9]*' ; pmap $pid | tail ; done
=== 935914 ===
-m 8192
00007fd1f5b9a000     12K rw-s-   [ anon ]
00007fd1f5b9d000      4K rw-s- zero (deleted)
00007fd1f5b9e000      4K r---- ld-2.28.so
00007fd1f5b9f000      4K rw--- ld-2.28.so
00007fd1f5ba0000      4K rw---   [ anon ]
00007fffd8274000    132K rw---   [ stack ]
00007fffd83b3000     12K r----   [ anon ]
00007fffd83b6000      8K r-x--   [ anon ]
ffffffffff600000      4K r-x--   [ anon ]
 total         10582228K
~~~
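So here qemu is started with -m 8192 while the cgroup limit is 8579144Ki, leaving only the ~186 MiB overhead allowance between the guest RAM and the OOM threshold (quick check):
~~~
echo "8579144 - 8192 * 1024" | bc  # 190536 Ki ≈ 186 MiB of headroom above the 8 GiB guest
~~~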

OOM easily reproduced:
~~~
[root@fedora-test-vm ~]# stress-ng --vm 1 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1
stress-ng: debug: [9377] 3 processors online, 3 processors configured
stress-ng: info:  [9377] dispatching hogs: 1 vm, 1 hdd, 1 io
stress-ng: debug: [9377] cache allocate: default cache size: 16384K
stress-ng: debug: [9377] starting stressors
stress-ng: debug: [9377] 3 stressors spawned
stress-ng: debug: [9378] stress-ng-vm: started [9378] (instance 0)
stress-ng: debug: [9379] stress-ng-hdd: started [9379] (instance 0)
stress-ng: debug: [9380] stress-ng-io: started [9380] (instance 0)
stress-ng: debug: [9378] stress-ng-vm using method 'all'
client_loop: send disconnect: Broken pipe
[core@openshift-worker-1 ~]$ 
~~~

~~~
[core@openshift-worker-1 ~]$ scp -i gss-stack-tools -P 20022  test.txt  fedora.54.202:
test.txt                                                                           100% 1024MB 222.2MB/s   00:04    
[core@openshift-worker-1 ~]$ while true ; do scp -i gss-stack-tools -P 20022  test.txt  fedora.54.202: ; done
test.txt                                                                           100% 1024MB 211.0MB/s   00:04    
test.txt                                                                           100% 1024MB 129.0MB/s   00:07    
test.txt                                                                           100% 1024MB 147.7MB/s   00:06
~~~

~~~
[root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do  date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit'  ; sleep 2 ; done
Mon Oct  5 19:39:31 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 5333508096
total_rss_huge 5257560064
Mon Oct  5 19:39:33 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 6784737280
total_rss_huge 6708789248
Mon Oct  5 19:39:35 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8141832192
total_rss_huge 8065646592
Mon Oct  5 19:39:37 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693645312
total_rss_huge 8617197568
Mon Oct  5 19:39:39 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692658176
total_rss_huge 8619294720
Mon Oct  5 19:39:41 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692658176
total_rss_huge 8619294720
Mon Oct  5 19:39:43 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692682752
total_rss_huge 8619294720
Mon Oct  5 19:39:45 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8694796288
total_rss_huge 8621391872
Mon Oct  5 19:39:47 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8694796288
total_rss_huge 8621391872
Mon Oct  5 19:39:49 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8695336960
total_rss_huge 8621391872
Mon Oct  5 19:39:51 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693264384
total_rss_huge 8619294720
Mon Oct  5 19:39:53 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693284864
total_rss_huge 8619294720
Mon Oct  5 19:39:55 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693329920
total_rss_huge 8619294720
Mon Oct  5 19:39:57 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693374976
total_rss_huge 8619294720
Mon Oct  5 19:39:59 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693411840
total_rss_huge 8619294720
Mon Oct  5 19:40:01 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693411840
total_rss_huge 8619294720
Mon Oct  5 19:40:03 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693411840
total_rss_huge 8619294720
Mon Oct  5 19:40:05 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693411840
total_rss_huge 8615100416
Mon Oct  5 19:40:07 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691314688
total_rss_huge 8608808960
Mon Oct  5 19:40:09 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691445760
total_rss_huge 8608808960
Mon Oct  5 19:40:11 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691490816
total_rss_huge 8608808960
Mon Oct  5 19:40:13 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691490816
total_rss_huge 8608808960
Mon Oct  5 19:40:16 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691527680
total_rss_huge 8608808960
Mon Oct  5 19:40:18 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691539968
total_rss_huge 8608808960
Mon Oct  5 19:40:20 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691539968
total_rss_huge 8608808960
Mon Oct  5 19:40:22 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691539968
total_rss_huge 8608808960
Mon Oct  5 19:40:24 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691539968
total_rss_huge 8608808960
Mon Oct  5 19:40:26 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691539968
total_rss_huge 8608808960
Mon Oct  5 19:40:28 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691539968
total_rss_huge 8608808960
Mon Oct  5 19:40:30 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691539968
total_rss_huge 8608808960
Mon Oct  5 19:40:32 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691539968
total_rss_huge 8608808960
Mon Oct  5 19:40:34 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8689311744
total_rss_huge 8608808960
Mon Oct  5 19:40:36 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8689311744
total_rss_huge 8608808960
Mon Oct  5 19:40:38 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8689311744
total_rss_huge 8608808960
Mon Oct  5 19:40:40 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8689438720
total_rss_huge 8608808960
Mon Oct  5 19:40:42 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8689537024
total_rss_huge 8608808960
Mon Oct  5 19:40:44 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691634176
total_rss_huge 8610906112
Mon Oct  5 19:40:46 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691662848
total_rss_huge 8610906112
Mon Oct  5 19:40:48 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691671040
total_rss_huge 8610906112
Mon Oct  5 19:40:50 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691687424
total_rss_huge 8610906112
Mon Oct  5 19:40:52 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691699712
total_rss_huge 8610906112
Mon Oct  5 19:40:54 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691712000
total_rss_huge 8610906112
Mon Oct  5 19:40:56 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692711424
total_rss_huge 8613003264
Mon Oct  5 19:40:58 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692740096
total_rss_huge 8613003264
Mon Oct  5 19:41:00 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692748288
total_rss_huge 8613003264
Mon Oct  5 19:41:02 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692756480
total_rss_huge 8613003264
Mon Oct  5 19:41:04 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692756480
total_rss_huge 8613003264
Mon Oct  5 19:41:06 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692781056
total_rss_huge 8613003264
Mon Oct  5 19:41:08 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692789248
total_rss_huge 8613003264
Mon Oct  5 19:41:10 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692789248
total_rss_huge 8613003264
Mon Oct  5 19:41:12 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692789248
total_rss_huge 8613003264
Mon Oct  5 19:41:14 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692789248
total_rss_huge 8613003264
Mon Oct  5 19:41:16 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692838400
total_rss_huge 8613003264
Mon Oct  5 19:41:18 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692838400
total_rss_huge 8613003264
Mon Oct  5 19:41:20 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692850688
total_rss_huge 8613003264
Mon Oct  5 19:41:22 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693526528
total_rss_huge 8613003264
Mon Oct  5 19:41:24 UTC 2020
cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-pod3d96a3d2_ea99_43d4_859f_12fe85b89762.slice/crio-e70a1a1739bb4a41875168459a4208d4726e9d29268f6009f53c8c302d60177a.scope/memory.stat: No such file or directory
Mon Oct  5 19:41:26 UTC 2020
~~~

~~~
[root@openshift-worker-1 ~]# journalctl -f | grep -i oomkilled
Oct 05 19:41:22 openshift-worker-1.example.com hyperkube[2223]: I1005 19:41:22.830995    2223 status_manager.go:570] Patch status for pod "virt-launcher-fedora-test-vm-v8b9l_test(3d96a3d2-ea99-43d4-859f-12fe85b89762)" with "{\"metadata\":{\"uid\":\"3d96a3d2-ea99-43d4-859f-12fe85b89762\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"lastTransitionTime\":\"2020-10-05T19:41:22Z\",\"message\":\"containers with unready status: [compute]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2020-10-05T19:41:22Z\",\"message\":\"containers with unready status: [compute]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://e70a1a1739bb4a41875168459a4208d4726e9d29268f6009f53c8c302d60177a\",\"image\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93\",\"imageID\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93\",\"lastState\":{},\"name\":\"compute\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://e70a1a1739bb4a41875168459a4208d4726e9d29268f6009f53c8c302d60177a\",\"exitCode\":137,\"finishedAt\":\"2020-10-05T19:41:22Z\",\"reason\":\"OOMKilled\",\"startedAt\":\"2020-10-05T19:35:56Z\"}}},{\"containerID\":\"cri-o://66027bd3bcaf5374cde0fd9f6d4ef1bfdbf74a987cce4193e483e21b9955564e\",\"image\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo:latest\",\"imageID\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331\",\"lastState\":{},\"name\":\"volumerootdisk\",\"ready\":true,\"restartCount\":0,\"started\":true,\"state\":{\"running\":{\"startedAt\":\"2020-10-05T19:36:01Z\"}}}]}}"
Oct 05 19:41:22 openshift-worker-1.example.com hyperkube[2223]: I1005 19:41:22.831057    2223 status_manager.go:578] Status for pod "virt-launcher-fedora-test-vm-v8b9l_test(3d96a3d2-ea99-43d4-859f-12fe85b89762)" updated successfully: (4, {Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:35:56 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:41:22 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [compute]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:41:22 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [compute]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:39:59 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:192.168.123.221 PodIP:172.24.2.17 PodIPs:[{IP:172.24.2.17}] StartTime:2020-10-05 19:35:48 +0000 UTC InitContainerStatuses:[{Name:container-disk-binary State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-05 19:35:55 +0000 UTC,FinishedAt:2020-10-05 19:35:55 +0000 UTC,ContainerID:cri-o://0de49e24f5467bfb0d3131e725367a51acb74265ef629d2e39755b5b5dc91a31,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ContainerID:cri-o://0de49e24f5467bfb0d3131e725367a51acb74265ef629d2e39755b5b5dc91a31 Started:<nil>}] ContainerStatuses:[{Name:compute State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:137,Signal:0,Reason:OOMKilled,Message:,StartedAt:2020-10-05 19:35:56 +0000 UTC,FinishedAt:2020-10-05 19:41:22 +0000 UTC,ContainerID:cri-o://e70a1a1739bb4a41875168459a4208d4726e9d29268f6009f53c8c302d60177a,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ContainerID:cri-o://e70a1a1739bb4a41875168459a4208d4726e9d29268f6009f53c8c302d60177a Started:0xc001b50ef2} {Name:volumerootdisk State:{Waiting:nil Running:&ContainerStateRunning{StartedAt:2020-10-05 19:36:01 +0000 UTC,} Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 ContainerID:cri-o://66027bd3bcaf5374cde0fd9f6d4ef1bfdbf74a987cce4193e483e21b9955564e Started:0xc001b50ef3}] QOSClass:Guaranteed EphemeralContainerStatuses:[]})
^C
~~~

================================

[0]

~~~
[root@openshift-jumpserver-0 ~]# oc get -f fedora-vm.yaml  -o yaml
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kubevirt.io/v1alpha3","kind":"VirtualMachine","metadata":{"annotations":{"kubevirt.io/latest-observed-api-version":"v1alpha3","kubevirt.io/storage-observed-api-version":"v1alpha3","name.os.template.kubevirt.io/fedora31":"Fedora 31"},"labels":{"app":"fedora-test-vm","flavor.template.kubevirt.io/Custom":"true","os.template.kubevirt.io/fedora31":"true","vm.kubevirt.io/template.namespace":"openshift","vm.kubevirt.io/template.revision":"1","vm.kubevirt.io/template.version":"v0.9.1","workload.template.kubevirt.io/server":"true"},"name":"fedora-test-vm","namespace":"test"},"spec":{"running":true,"template":{"metadata":{"labels":{"flavor.template.kubevirt.io/Custom":"true","kubevirt.io/domain":"fedora-test-vm","kubevirt.io/size":"large","os.template.kubevirt.io/fedora31":"true","vm.kubevirt.io/name":"fedora-test-vm","workload.template.kubevirt.io/server":"true"}},"spec":{"domain":{"cpu":{"cores":3,"model":"host-model"},"devices":{"disks":[{"bootOrder":1,"disk":{"bus":"virtio"},"name":"rootdisk"},{"disk":{"bus":"virtio"},"name":"cloudinitdisk"}],"interfaces":[{"masquerade":{},"model":"virtio","name":"nic0"}],"networkInterfaceMultiqueue":true,"rng":{}},"features":{"acpi":{"enabled":true}},"machine":{"type":"q35"},"resources":{"limits":{"cpu":"3","memory":"8192Mi"},"requests":{"cpu":"3","memory":"8192Mi"}}},"evictionStrategy":"LiveMigrate","hostname":"fedora-test-vm","networks":[{"name":"nic0","pod":{}}],"terminationGracePeriodSeconds":0,"volumes":[{"containerDisk":{"image":"docker.io/kubevirt/fedora-cloud-container-disk-demo"},"name":"rootdisk"},{"cloudInitNoCloud":{"userData":"#cloud-config\nname: default\nhostname: fedora-test-vm\nssh_authorized_keys:\n  - \u003e-\n    ssh-rsa\n    AAAAB3NzaC1yc2EAAAADAQABAAABgQDpUE2TTSndAH40z9ccaOv4jqBwSFP6M1HGDTp5MlSjuhARj4b5MNnOawqxBLO4DUdYmZtDPkwFSq0JTV59RudOWF+RmShc5WotVBYqLnmGzq5WEVhbo5ZwHey8OQ9env0lJBHzAArzq878B6Y0aEwUV+vBqv7wHZK+J7aF04d/qQm7yPlcuCc0JCE2I2Dqgy6O9dJI/bNLvqWKCt+d9og2nmzJbCPI3w4rscZZizL7d1kk93DiDFFnjfNaChgFcpDASnZOZlceEY7xt2Kc4SwHDfg7ykACDoonleUuF0eDbpCUPo3QIvKGLEzzo4ONQMEfb7qFjQxk+YxlpnXDE5Gl30IK0u/HtmJyA1hZyhJqb8l+Bh0qnm8+fxxctsGcROyXoIYfHeMucOjr0nZitplL6bI+NTI6TqILphfEbNCUJe/Uo9CrojU9dN1pfCZ71+FRaHGDrxoZLyhduxgBrHIlQEPOiYbiGif2unMNERaPdjgB3R5bqiCrhT/a4jl49IU=\n"},"name":"cloudinitdisk"}]}}}}
    kubevirt.io/latest-observed-api-version: v1alpha3
    kubevirt.io/storage-observed-api-version: v1alpha3
    name.os.template.kubevirt.io/fedora31: Fedora 31
  creationTimestamp: "2020-10-05T19:39:59Z"
  generation: 3
  labels:
    app: fedora-test-vm
    flavor.template.kubevirt.io/Custom: "true"
    os.template.kubevirt.io/fedora31: "true"
    vm.kubevirt.io/template.namespace: openshift
    vm.kubevirt.io/template.revision: "1"
    vm.kubevirt.io/template.version: v0.9.1
    workload.template.kubevirt.io/server: "true"
  managedFields:
  - apiVersion: kubevirt.io/v1alpha3
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
          f:kubevirt.io/latest-observed-api-version: {}
          f:kubevirt.io/storage-observed-api-version: {}
          f:name.os.template.kubevirt.io/fedora31: {}
        f:labels:
          .: {}
          f:app: {}
          f:flavor.template.kubevirt.io/Custom: {}
          f:os.template.kubevirt.io/fedora31: {}
          f:vm.kubevirt.io/template.namespace: {}
          f:vm.kubevirt.io/template.revision: {}
          f:vm.kubevirt.io/template.version: {}
          f:workload.template.kubevirt.io/server: {}
      f:spec:
        .: {}
        f:running: {}
        f:template:
          .: {}
          f:metadata:
            .: {}
            f:labels:
              .: {}
              f:flavor.template.kubevirt.io/Custom: {}
              f:kubevirt.io/domain: {}
              f:kubevirt.io/size: {}
              f:os.template.kubevirt.io/fedora31: {}
              f:vm.kubevirt.io/name: {}
              f:workload.template.kubevirt.io/server: {}
          f:spec:
            .: {}
            f:domain:
              .: {}
              f:cpu:
                .: {}
                f:cores: {}
                f:model: {}
              f:devices:
                .: {}
                f:disks: {}
                f:interfaces: {}
                f:networkInterfaceMultiqueue: {}
                f:rng: {}
              f:features:
                .: {}
                f:acpi:
                  .: {}
                  f:enabled: {}
              f:machine:
                .: {}
                f:type: {}
              f:resources:
                .: {}
                f:limits:
                  .: {}
                  f:cpu: {}
                  f:memory: {}
                f:requests:
                  .: {}
                  f:cpu: {}
                  f:memory: {}
            f:evictionStrategy: {}
            f:hostname: {}
            f:networks: {}
            f:terminationGracePeriodSeconds: {}
            f:volumes: {}
    manager: oc
    operation: Update
    time: "2020-10-05T19:39:59Z"
  - apiVersion: kubevirt.io/v1alpha3
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:conditions: {}
        f:created: {}
        f:ready: {}
    manager: virt-controller
    operation: Update
    time: "2020-10-05T19:40:14Z"
  name: fedora-test-vm
  namespace: test
  resourceVersion: "252796"
  selfLink: /apis/kubevirt.io/v1alpha3/namespaces/test/virtualmachines/fedora-test-vm
  uid: a9378cdc-c270-4092-9692-4bacfb5c0e8a
spec:
  running: true
  template:
    metadata:
      creationTimestamp: null
      labels:
        flavor.template.kubevirt.io/Custom: "true"
        kubevirt.io/domain: fedora-test-vm
        kubevirt.io/size: large
        os.template.kubevirt.io/fedora31: "true"
        vm.kubevirt.io/name: fedora-test-vm
        workload.template.kubevirt.io/server: "true"
    spec:
      domain:
        cpu:
          cores: 3
          model: host-model
        devices:
          disks:
          - bootOrder: 1
            disk:
              bus: virtio
            name: rootdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
          interfaces:
          - masquerade: {}
            model: virtio
            name: nic0
          networkInterfaceMultiqueue: true
          rng: {}
        features:
          acpi:
            enabled: true
        machine:
          type: q35
        resources:
          limits:
            cpu: "3"
            memory: 8Gi
          requests:
            cpu: "3"
            memory: 8Gi
      evictionStrategy: LiveMigrate
      hostname: fedora-test-vm
      networks:
      - name: nic0
        pod: {}
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: docker.io/kubevirt/fedora-cloud-container-disk-demo
        name: rootdisk
      - cloudInitNoCloud:
          userData: |
            #cloud-config
            name: default
            hostname: fedora-test-vm
            ssh_authorized_keys:
              - >-
                ssh-rsa
                AAAAB3NzaC1yc2EAAAADAQABAAABgQDpUE2TTSndAH40z9ccaOv4jqBwSFP6M1HGDTp5MlSjuhARj4b5MNnOawqxBLO4DUdYmZtDPkwFSq0JTV59RudOWF+RmShc5WotVBYqLnmGzq5WEVhbo5ZwHey8OQ9env0lJBHzAArzq878B6Y0aEwUV+vBqv7wHZK+J7aF04d/qQm7yPlcuCc0JCE2I2Dqgy6O9dJI/bNLvqWKCt+d9og2nmzJbCPI3w4rscZZizL7d1kk93DiDFFnjfNaChgFcpDASnZOZlceEY7xt2Kc4SwHDfg7ykACDoonleUuF0eDbpCUPo3QIvKGLEzzo4ONQMEfb7qFjQxk+YxlpnXDE5Gl30IK0u/HtmJyA1hZyhJqb8l+Bh0qnm8+fxxctsGcROyXoIYfHeMucOjr0nZitplL6bI+NTI6TqILphfEbNCUJe/Uo9CrojU9dN1pfCZ71+FRaHGDrxoZLyhduxgBrHIlQEPOiYbiGif2unMNERaPdjgB3R5bqiCrhT/a4jl49IU=
        name: cloudinitdisk
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-10-05T19:36:02Z"
    status: "True"
    type: Ready
  created: true
  ready: true
~~~

Comment 5 Andreas Karis 2020-10-05 21:17:59 UTC
Repeating the same test, but undercommitting the VM [0] and without overcommitGuestOverhead - this way, the issue does not reproduce.
~~~
        resources:
          requests:
            cpu: "3"
            memory: 8704Mi
          limits:
            cpu: "3"
            memory: 8704Mi
        memory:
          guest: 8192Mi
~~~
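
For reference, a quick bit of shell arithmetic (my own sanity check on the numbers above, not part of the test) shows the headroom this configuration leaves between the pod request and the guest RAM:
~~~
# headroom between the 8704Mi request/limit and the 8192Mi guest - room for qemu-kvm/libvirt overhead
echo "8704 - 8192" | bc    # 512 (MiB)
~~~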

Stress-ng inside the VM:
~~~
ssh -i gss-stack-tools -p 20022 fedora.54.202
yum install stress-ng -y
stress-ng --vm 1 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1
~~~

Network load - run from the hypervisor:
~~~
dd if=/dev/zero of=test.txt bs=1G count=1
sudo ip route add 172.30.54.202 via 172.24.2.1 dev ovn-k8s-mp0 src 172.24.2.2  mtu 1400
while true ; do scp -i gss-stack-tools -P 20022  test.txt  fedora.54.202: ; done
~~~

~~~
[root@openshift-jumpserver-0 ~]# oc describe pod  | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:'
    Limits:
      memory:  40M
    Requests:
      memory:     40M
Containers:
  compute:
    Limits:
      memory:                         9104456Ki
    Requests:
      memory:                         9104456Ki
  volumerootdisk:
    Limits:
      memory:  40M
    Requests:
      memory:     40M
~~~
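
As a rough sanity check (plain bc on the values above, not the exact KubeVirt overhead formula), the compute container limit works out to roughly 187Mi above the 8704Mi request:
~~~
# overhead CNV added on top of the 8704Mi request, taken from the pod description above
echo "9104456 - 8704 * 1024" | bc            # 191560 (Ki)
echo "(9104456 - 8704 * 1024) / 1024" | bc   # ~187 (Mi)
~~~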

We can see that qemu-kvm stays well below the cgroup limits:
~~~
[root@openshift-worker-1 ~]# pgrep qemu-kvm | while read pid ; do echo === $pid === ; ps aux | grep $pid |  grep -o '\-m [0-9]*' ; pmap $pid | tail ; done
=== 896856 ===
-m 8192
00007ffa5715a000     12K rw-s-   [ anon ]
00007ffa5715d000      4K rw-s- zero (deleted)
00007ffa5715e000      4K r---- ld-2.28.so
00007ffa5715f000      4K rw--- ld-2.28.so
00007ffa57160000      4K rw---   [ anon ]
00007ffd48abe000    132K rw---   [ stack ]
00007ffd48adf000     12K r----   [ anon ]
00007ffd48ae2000      8K r-x--   [ anon ]
ffffffffff600000      4K r-x--   [ anon ]
 total         13127572K
~~~

~~~
[root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do  date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit'  ; sleep 2 ; done
Mon Oct  5 19:20:54 UTC 2020
hierarchical_memory_limit 9322962944
total_rss 772190208
total_rss_huge 706740224
~~~

And the OOM does not reproduce within 10 minutes of testing.

========

[0] 
~~~
[root@openshift-jumpserver-0 ~]# cat fedora-vm.yaml 
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  annotations:
    kubevirt.io/latest-observed-api-version: v1alpha3
    kubevirt.io/storage-observed-api-version: v1alpha3
    name.os.template.kubevirt.io/fedora31: Fedora 31
  labels:
    app: fedora-test-vm
    flavor.template.kubevirt.io/Custom: "true"
    os.template.kubevirt.io/fedora31: "true"
    vm.kubevirt.io/template.namespace: openshift
    vm.kubevirt.io/template.revision: "1"
    vm.kubevirt.io/template.version: v0.9.1
    workload.template.kubevirt.io/server: "true"
  name: fedora-test-vm
  namespace: test
spec:
  running: true
  template:
    metadata:
      labels:
        flavor.template.kubevirt.io/Custom: "true"
        kubevirt.io/domain: fedora-test-vm
        kubevirt.io/size: large
        os.template.kubevirt.io/fedora31: "true"
        vm.kubevirt.io/name: fedora-test-vm
        workload.template.kubevirt.io/server: "true"
    spec:
      domain:
        cpu:
          cores: 3
          model: host-model
        devices:
          disks:
          - bootOrder: 1
            disk:
              bus: virtio
            name: rootdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
          interfaces:
          - masquerade: {}
            model: virtio
            name: nic0
          networkInterfaceMultiqueue: true
          rng: {}
        features:
          acpi:
            enabled: true
        machine:
          type: q35
        resources:
          requests:
            cpu: "3"
            memory: 8704Mi
          limits:
            cpu: "3"
            memory: 8704Mi
        memory:
          guest: 8192Mi
      evictionStrategy: LiveMigrate
      hostname: fedora-test-vm
      networks:
      - name: nic0
        pod: {}
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: docker.io/kubevirt/fedora-cloud-container-disk-demo
        name: rootdisk
      - cloudInitNoCloud:
          userData: |
            #cloud-config
            name: default
            hostname: fedora-test-vm
            ssh_authorized_keys:
              - >-
                ssh-rsa
                AA(...)jl49IU=
        name: cloudinitdisk
~~~

Comment 6 Andreas Karis 2020-10-05 21:18:16 UTC
Repeating the same test with [0]
~~~
        resources:
          requests:
            cpu: "3"
            memory: 8192Mi
          limits:
            cpu: "3"
            memory: 8192Mi
        memory:
          guest: 8192Mi
~~~


But still without:
~~~
overcommitGuestOverhead: true
~~~

This reproduces the issue.

We now see:
~~~
[root@openshift-jumpserver-0 ~]# oc describe pod  | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:'
    Limits:
      memory:  40M
    Requests:
      memory:     40M
Containers:
  compute:
    Limits:
      memory:                         8579144Ki
    Requests:
      memory:                         8579144Ki
  volumerootdisk:
    Limits:
      memory:  40M
    Requests:
      memory:     40M
~~~
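
A rough check (plain bc, assuming the limit/request delta approximates the overhead CNV calculated) shows that only ~186Mi are left on top of the 8192Mi that qemu-kvm is started with, which is exactly what the stress test exhausts:
~~~
# compute container limit minus the 8192Mi request, values from the pod description above
echo "8579144 - 8192 * 1024" | bc            # 190536 (Ki)
echo "(8579144 - 8192 * 1024) / 1024" | bc   # ~186 (Mi)
~~~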

~~~
[root@openshift-worker-1 ~]# pgrep qemu-kvm | while read pid ; do echo === $pid === ; ps aux | grep $pid |  grep -o '\-m [0-9]*' ; pmap $pid | tail ; done
=== 877221 ===
-m 8192
00007f89ce717000     12K rw-s-   [ anon ]
00007f89ce71a000      4K rw-s- zero (deleted)
00007f89ce71b000      4K r---- ld-2.28.so
00007f89ce71c000      4K rw--- ld-2.28.so
00007f89ce71d000      4K rw---   [ anon ]
00007fff0b5df000    132K rw---   [ stack ]
00007fff0b61a000     12K r----   [ anon ]
00007fff0b61d000      8K r-x--   [ anon ]
ffffffffff600000      4K r-x--   [ anon ]
 total         11206356K
~~~

~~~
[root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do  date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit'  ; sleep 2 ; done
Mon Oct  5 19:13:35 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 1432940544
total_rss_huge 1354760192
Mon Oct  5 19:13:37 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 1426857984
total_rss_huge 1348468736
~~~

Converted to MiB, that cgroup limit is:
~~~
[root@openshift-worker-1 ~]# echo "8785043456 / 1024 /1024" |  bc
8378
~~~

And I can still reproduce the OOM:
~~~
[root@fedora-test-vm ~]# stress-ng --vm 1 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1
stress-ng: debug: [9372] 3 processors online, 3 processors configured
stress-ng: info:  [9372] dispatching hogs: 1 vm, 1 hdd, 1 io
stress-ng: debug: [9372] cache allocate: default cache size: 16384K
stress-ng: debug: [9372] starting stressors
stress-ng: debug: [9373] stress-ng-vm: started [9373] (instance 0)
stress-ng: debug: [9372] 3 stressors spawned
stress-ng: debug: [9374] stress-ng-hdd: started [9374] (instance 0)
stress-ng: debug: [9373] stress-ng-vm using method 'all'
stress-ng: debug: [9375] stress-ng-io: started [9375] (instance 0)

~~~

~~~
[root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do  date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit'  ; sleep 2 ; done
Mon Oct  5 19:13:35 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 1432940544
total_rss_huge 1354760192
Mon Oct  5 19:13:37 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 1426857984
total_rss_huge 1348468736
Mon Oct  5 19:13:39 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 3285221376
total_rss_huge 3206545408
Mon Oct  5 19:13:41 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 4770004992
total_rss_huge 4691329024
Mon Oct  5 19:13:43 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 6103789568
total_rss_huge 6025117696
Mon Oct  5 19:13:45 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 7917981696
total_rss_huge 7839154176
Mon Oct  5 19:13:47 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8656314368
total_rss_huge 8577351680
Mon Oct  5 19:13:49 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692006912
total_rss_huge 8613003264
Mon Oct  5 19:13:51 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692006912
total_rss_huge 8613003264
Mon Oct  5 19:13:53 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692006912
total_rss_huge 8613003264
Mon Oct  5 19:13:55 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692006912
total_rss_huge 8613003264
Mon Oct  5 19:13:57 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692006912
total_rss_huge 8613003264
Mon Oct  5 19:13:59 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8694104064
total_rss_huge 8615100416
Mon Oct  5 19:14:01 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8694104064
total_rss_huge 8615100416
Mon Oct  5 19:14:03 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8694104064
total_rss_huge 8615100416
Mon Oct  5 19:14:05 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8694104064
total_rss_huge 8615100416
Mon Oct  5 19:14:07 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8694104064
total_rss_huge 8615100416
Mon Oct  5 19:14:09 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8694104064
total_rss_huge 8615100416
Mon Oct  5 19:14:11 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8694841344
total_rss_huge 8617197568
Mon Oct  5 19:14:13 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8692932608
total_rss_huge 8615100416
Mon Oct  5 19:14:15 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693264384
total_rss_huge 8615100416
Mon Oct  5 19:14:17 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8695607296
total_rss_huge 8617197568
Mon Oct  5 19:14:19 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693518336
total_rss_huge 8615100416
Mon Oct  5 19:14:21 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693518336
total_rss_huge 8615100416
Mon Oct  5 19:14:23 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693518336
total_rss_huge 8615100416
Mon Oct  5 19:14:25 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693518336
total_rss_huge 8615100416
Mon Oct  5 19:14:27 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693518336
total_rss_huge 8615100416
Mon Oct  5 19:14:29 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693547008
total_rss_huge 8615100416
Mon Oct  5 19:14:31 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693547008
total_rss_huge 8615100416
Mon Oct  5 19:14:33 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693547008
total_rss_huge 8615100416
Mon Oct  5 19:14:35 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693547008
total_rss_huge 8615100416
Mon Oct  5 19:14:37 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693547008
total_rss_huge 8615100416
Mon Oct  5 19:14:39 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693547008
total_rss_huge 8615100416
Mon Oct  5 19:14:41 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693547008
total_rss_huge 8615100416
Mon Oct  5 19:14:43 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693547008
total_rss_huge 8615100416
Mon Oct  5 19:14:45 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693555200
total_rss_huge 8615100416
Mon Oct  5 19:14:47 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693555200
total_rss_huge 8615100416
Mon Oct  5 19:14:49 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693555200
total_rss_huge 8615100416
Mon Oct  5 19:14:51 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693559296
total_rss_huge 8615100416
Mon Oct  5 19:14:53 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693559296
total_rss_huge 8615100416
Mon Oct  5 19:14:55 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693559296
total_rss_huge 8615100416
Mon Oct  5 19:14:57 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693698560
total_rss_huge 8615100416
Mon Oct  5 19:14:59 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693907456
total_rss_huge 8613003264
Mon Oct  5 19:15:01 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693907456
total_rss_huge 8610906112
Mon Oct  5 19:15:03 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693907456
total_rss_huge 8610906112
Mon Oct  5 19:15:05 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693907456
total_rss_huge 8610906112
Mon Oct  5 19:15:07 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693907456
total_rss_huge 8610906112
Mon Oct  5 19:15:09 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8693907456
total_rss_huge 8610906112
Mon Oct  5 19:15:11 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691810304
total_rss_huge 8608808960
Mon Oct  5 19:15:13 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691810304
total_rss_huge 8608808960
Mon Oct  5 19:15:15 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691810304
total_rss_huge 8608808960
Mon Oct  5 19:15:17 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691810304
total_rss_huge 8608808960
Mon Oct  5 19:15:19 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691810304
total_rss_huge 8608808960
Mon Oct  5 19:15:21 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691810304
total_rss_huge 8608808960
Mon Oct  5 19:15:23 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8691810304
total_rss_huge 8608808960
Mon Oct  5 19:15:25 UTC 2020
cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-pod12fb4c1b_a719_42dc_92e0_bf2d3b7ff322.slice/crio-5266b88b2b7381299ff5395cd47d5684289c81969ee5cfee7e9e2a25415c3b55.scope/memory.stat: No such file or directory
Mon Oct  5 19:15:27 UTC 2020
cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-pod12fb4c1b_a719_42dc_92e0_bf2d3b7ff322.slice/crio-5266b88b2b7381299ff5395cd47d5684289c81969ee5cfee7e9e2a25415c3b55.scope/memory.stat: No such file or directory
Mon Oct  5 19:15:29 UTC 2020
cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-pod12fb4c1b_a719_42dc_92e0_bf2d3b7ff322.slice/crio-5266b88b2b7381299ff5395cd47d5684289c81969ee5cfee7e9e2a25415c3b55.scope/memory.stat: No such file or directory
Mon Oct  5 19:15:31 UTC 2020
cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-pod12fb4c1b_a719_42dc_92e0_bf2d3b7ff322.slice/crio-5266b88b2b7381299ff5395cd47d5684289c81969ee5cfee7e9e2a25415c3b55.scope/memory.stat: No such file or directory
Mon Oct  5 19:15:33 UTC 2020
cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-pod12fb4c1b_a719_42dc_92e0_bf2d3b7ff322.slice/crio-5266b88b2b7381299ff5395cd47d5684289c81969ee5cfee7e9e2a25415c3b55.scope/memory.stat: No such file or directory
^C
[root@openshift-worker-1 ~]# ^C
[root@openshift-worker-1 ~]# 
~~~

~~~
[root@openshift-worker-1 ~]# journalctl -f | grep -i oomkilled
Oct 05 19:15:24 openshift-worker-1.example.com hyperkube[2223]: I1005 19:15:24.796770    2223 status_manager.go:570] Patch status for pod "virt-launcher-fedora-test-vm-g629t_test(12fb4c1b-a719-42dc-92e0-bf2d3b7ff322)" with "{\"metadata\":{\"uid\":\"12fb4c1b-a719-42dc-92e0-bf2d3b7ff322\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"lastTransitionTime\":\"2020-10-05T19:15:24Z\",\"message\":\"containers with unready status: [compute]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2020-10-05T19:15:24Z\",\"message\":\"containers with unready status: [compute]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://5266b88b2b7381299ff5395cd47d5684289c81969ee5cfee7e9e2a25415c3b55\",\"image\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93\",\"imageID\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93\",\"lastState\":{},\"name\":\"compute\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://5266b88b2b7381299ff5395cd47d5684289c81969ee5cfee7e9e2a25415c3b55\",\"exitCode\":137,\"finishedAt\":\"2020-10-05T19:15:24Z\",\"reason\":\"OOMKilled\",\"startedAt\":\"2020-10-05T19:10:49Z\"}}},{\"containerID\":\"cri-o://c3fc850a6e652fcc04d680903945a1f4cd868e0a2e06a0085ee2449c0d77d5e3\",\"image\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo:latest\",\"imageID\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331\",\"lastState\":{},\"name\":\"volumerootdisk\",\"ready\":true,\"restartCount\":0,\"started\":true,\"state\":{\"running\":{\"startedAt\":\"2020-10-05T19:10:54Z\"}}}]}}"
Oct 05 19:15:24 openshift-worker-1.example.com hyperkube[2223]: I1005 19:15:24.796833    2223 status_manager.go:578] Status for pod "virt-launcher-fedora-test-vm-g629t_test(12fb4c1b-a719-42dc-92e0-bf2d3b7ff322)" updated successfully: (4, {Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:10:49 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:15:24 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [compute]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:15:24 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [compute]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:14:52 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:192.168.123.221 PodIP:172.24.2.17 PodIPs:[{IP:172.24.2.17}] StartTime:2020-10-05 19:10:41 +0000 UTC InitContainerStatuses:[{Name:container-disk-binary State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-05 19:10:48 +0000 UTC,FinishedAt:2020-10-05 19:10:49 +0000 UTC,ContainerID:cri-o://05d454b3e4456ca0e6a9e1789bd7c5dc1aa6a7167bf43f56e2763ff4825f06a2,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ContainerID:cri-o://05d454b3e4456ca0e6a9e1789bd7c5dc1aa6a7167bf43f56e2763ff4825f06a2 Started:<nil>}] ContainerStatuses:[{Name:compute State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:137,Signal:0,Reason:OOMKilled,Message:,StartedAt:2020-10-05 19:10:49 +0000 UTC,FinishedAt:2020-10-05 19:15:24 +0000 UTC,ContainerID:cri-o://5266b88b2b7381299ff5395cd47d5684289c81969ee5cfee7e9e2a25415c3b55,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ContainerID:cri-o://5266b88b2b7381299ff5395cd47d5684289c81969ee5cfee7e9e2a25415c3b55 Started:0xc002ac55d2} {Name:volumerootdisk State:{Waiting:nil Running:&ContainerStateRunning{StartedAt:2020-10-05 19:10:54 +0000 UTC,} Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 ContainerID:cri-o://c3fc850a6e652fcc04d680903945a1f4cd868e0a2e06a0085ee2449c0d77d5e3 Started:0xc002ac55d3}] QOSClass:Guaranteed EphemeralContainerStatuses:[]})
^C
~~~




================================================================================

[0] 
~~~
[root@openshift-jumpserver-0 ~]# cat fedora-vm.yaml 
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  annotations:
    kubevirt.io/latest-observed-api-version: v1alpha3
    kubevirt.io/storage-observed-api-version: v1alpha3
    name.os.template.kubevirt.io/fedora31: Fedora 31
  labels:
    app: fedora-test-vm
    flavor.template.kubevirt.io/Custom: "true"
    os.template.kubevirt.io/fedora31: "true"
    vm.kubevirt.io/template.namespace: openshift
    vm.kubevirt.io/template.revision: "1"
    vm.kubevirt.io/template.version: v0.9.1
    workload.template.kubevirt.io/server: "true"
  name: fedora-test-vm
  namespace: test
spec:
  running: true
  template:
    metadata:
      labels:
        flavor.template.kubevirt.io/Custom: "true"
        kubevirt.io/domain: fedora-test-vm
        kubevirt.io/size: large
        os.template.kubevirt.io/fedora31: "true"
        vm.kubevirt.io/name: fedora-test-vm
        workload.template.kubevirt.io/server: "true"
    spec:
      domain:
        cpu:
          cores: 3
          model: host-model
        devices:
          disks:
          - bootOrder: 1
            disk:
              bus: virtio
            name: rootdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
          interfaces:
          - masquerade: {}
            model: virtio
            name: nic0
          networkInterfaceMultiqueue: true
          rng: {}
        features:
          acpi:
            enabled: true
        machine:
          type: q35
        resources:
          requests:
            cpu: "3"
            memory: 8192Mi
          limits:
            cpu: "3"
            memory: 8192Mi
        memory:
          guest: 8192Mi
      evictionStrategy: LiveMigrate
      hostname: fedora-test-vm
      networks:
      - name: nic0
        pod: {}
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: docker.io/kubevirt/fedora-cloud-container-disk-demo
        name: rootdisk
      - cloudInitNoCloud:
          userData: |
            #cloud-config
            name: default
            hostname: fedora-test-vm
            ssh_authorized_keys:
              - >-
                ssh-rsa
                AAA(...)l49IU=
        name: cloudinitdisk
~~~

Comment 7 Andreas Karis 2020-10-05 21:18:31 UTC
I repeated the same test with a VM requesting 8192Mi and with overcommitGuestOverhead: true [0]:
~~~
        resources:
          requests:
            cpu: "3"
            memory: 8192Mi
          limits:
            cpu: "3"
            memory: 8192Mi
          overcommitGuestOverhead: true
        memory:
          guest: 8192Mi
~~~

The issue still reproduces.

The memory limits in the Kubernetes API and the cgroup limits on the host match:
~~~
[root@openshift-jumpserver-0 ~]# oc describe pod  | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:'
    Limits:
      memory:  40M
    Requests:
      memory:     40M
Containers:
  compute:
    Limits:
      memory:                         8579144Ki
    Requests:
      memory:                         8Gi
  volumerootdisk:
    Limits:
      memory:  40M
    Requests:
      memory:     40M
~~~
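
A rough check (plain bc on the pod values above) shows that overcommitGuestOverhead only lowered the request back to 8Gi - the limit still carries the ~186Mi overhead, so the effective cgroup ceiling is unchanged compared to the previous test:
~~~
# gap between the compute container limit and its 8Gi request, values from the pod description above
echo "8579144 - 8 * 1024 * 1024" | bc            # 190536 (Ki)
echo "(8579144 - 8 * 1024 * 1024) / 1024" | bc   # ~186 (Mi)
~~~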

~~~
[root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do  date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit'  ; sleep 2 ; done
Mon Oct  5 19:04:45 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 855515136
total_rss_huge 788529152
Mon Oct  5 19:04:47 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 855515136
~~~

~~~
[root@openshift-worker-1 ~]# echo "8579144 * 1024" | bc
8785043456
~~~

So here, we avoid https://github.com/kubevirt/kubevirt/pull/3921

Stress-ng inside the VM:
~~~
ssh -i gss-stack-tools -p 20022 fedora.54.202
yum install stress-ng -y
stress-ng --vm 1 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1
~~~

Network load - run from the hypervisor:
~~~
dd if=/dev/zero of=test.txt bs=1G count=1
sudo ip route add 172.30.54.202 via 172.24.2.1 dev ovn-k8s-mp0 src 172.24.2.2  mtu 1400
while true ; do scp -i gss-stack-tools -P 20022  test.txt  fedora.54.202: ; done
~~~

I can still hit the same OOM:
~~~
[root@openshift-worker-1 ~]# journalctl -f | grep -i oomkilled
Oct 05 19:08:33 openshift-worker-1.example.com hyperkube[2223]: I1005 19:08:33.003078    2223 status_manager.go:570] Patch status for pod "virt-launcher-fedora-test-vm-bsplz_test(668e7ae5-1830-436e-870d-8dfdbea89e0f)" with "{\"metadata\":{\"uid\":\"668e7ae5-1830-436e-870d-8dfdbea89e0f\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"reason\":\"PodCompleted\",\"type\":\"Initialized\"},{\"lastTransitionTime\":\"2020-10-05T19:08:32Z\",\"reason\":\"PodCompleted\",\"status\":\"False\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2020-10-05T19:08:32Z\",\"reason\":\"PodCompleted\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://92302e1637ad140832c49abe07c7341d1751e2fe418d27079710f85a965ee09b\",\"image\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93\",\"imageID\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93\",\"lastState\":{},\"name\":\"compute\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://92302e1637ad140832c49abe07c7341d1751e2fe418d27079710f85a965ee09b\",\"exitCode\":0,\"finishedAt\":\"2020-10-05T19:08:31Z\",\"reason\":\"OOMKilled\",\"startedAt\":\"2020-10-05T19:03:21Z\"}}},{\"containerID\":\"cri-o://5574c2e6ddca338c6d116b869a9f8718c41d3c8905eae5956d1ef15758ca764a\",\"image\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo:latest\",\"imageID\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331\",\"lastState\":{},\"name\":\"volumerootdisk\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://5574c2e6ddca338c6d116b869a9f8718c41d3c8905eae5956d1ef15758ca764a\",\"exitCode\":0,\"finishedAt\":\"2020-10-05T19:08:31Z\",\"reason\":\"Completed\",\"startedAt\":\"2020-10-05T19:03:26Z\"}}}],\"phase\":\"Succeeded\"}}"
Oct 05 19:08:33 openshift-worker-1.example.com hyperkube[2223]: I1005 19:08:33.003147    2223 status_manager.go:578] Status for pod "virt-launcher-fedora-test-vm-bsplz_test(668e7ae5-1830-436e-870d-8dfdbea89e0f)" updated successfully: (5, {Phase:Succeeded Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:03:21 +0000 UTC Reason:PodCompleted Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:08:32 +0000 UTC Reason:PodCompleted Message:} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:08:32 +0000 UTC Reason:PodCompleted Message:} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:07:25 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:192.168.123.221 PodIP:172.24.2.17 PodIPs:[{IP:172.24.2.17}] StartTime:2020-10-05 19:03:13 +0000 UTC InitContainerStatuses:[{Name:container-disk-binary State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-05 19:03:20 +0000 UTC,FinishedAt:2020-10-05 19:03:21 +0000 UTC,ContainerID:cri-o://7f05df1530eeecab0ee41b71727bfcdbda86ba08f2907f5a3879354bb02d31b5,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ContainerID:cri-o://7f05df1530eeecab0ee41b71727bfcdbda86ba08f2907f5a3879354bb02d31b5 Started:<nil>}] ContainerStatuses:[{Name:compute State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:OOMKilled,Message:,StartedAt:2020-10-05 19:03:21 +0000 UTC,FinishedAt:2020-10-05 19:08:31 +0000 UTC,ContainerID:cri-o://92302e1637ad140832c49abe07c7341d1751e2fe418d27079710f85a965ee09b,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ContainerID:cri-o://92302e1637ad140832c49abe07c7341d1751e2fe418d27079710f85a965ee09b Started:0xc00317b28a} {Name:volumerootdisk State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-05 19:03:26 +0000 UTC,FinishedAt:2020-10-05 19:08:31 +0000 UTC,ContainerID:cri-o://5574c2e6ddca338c6d116b869a9f8718c41d3c8905eae5956d1ef15758ca764a,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 ContainerID:cri-o://5574c2e6ddca338c6d116b869a9f8718c41d3c8905eae5956d1ef15758ca764a Started:0xc00317b28b}] QOSClass:Burstable EphemeralContainerStatuses:[]})
Oct 05 19:08:33 openshift-worker-1.example.com hyperkube[2223]: I1005 19:08:33.003277    2223 kubelet_pods.go:961] Pod "virt-launcher-fedora-test-vm-bsplz_test(668e7ae5-1830-436e-870d-8dfdbea89e0f)" is terminated, but some containers have not been cleaned up: {ID:{Type:cri-o ID:5574c2e6ddca338c6d116b869a9f8718c41d3c8905eae5956d1ef15758ca764a} Name:volumerootdisk State:exited CreatedAt:2020-10-05 19:03:26.178788793 +0000 UTC StartedAt:2020-10-05 19:03:26.472343354 +0000 UTC FinishedAt:2020-10-05 19:08:31.979979057 +0000 UTC ExitCode:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 Hash:1814580024 RestartCount:0 Reason:Completed Message:} {ID:{Type:cri-o ID:92302e1637ad140832c49abe07c7341d1751e2fe418d27079710f85a965ee09b} Name:compute State:exited CreatedAt:2020-10-05 19:03:21.505102416 +0000 UTC StartedAt:2020-10-05 19:03:21.535758318 +0000 UTC FinishedAt:2020-10-05 19:08:31.932979056 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:1362036640 RestartCount:0 Reason:OOMKilled Message:} {ID:{Type:cri-o ID:7f05df1530eeecab0ee41b71727bfcdbda86ba08f2907f5a3879354bb02d31b5} Name:container-disk-binary State:exited CreatedAt:2020-10-05 19:03:20.571269695 +0000 UTC StartedAt:2020-10-05 19:03:20.772646491 +0000 UTC FinishedAt:2020-10-05 19:03:21.270973691 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:793891490 RestartCount:0 Reason:Completed Message:}
Oct 05 19:08:33 openshift-worker-1.example.com hyperkube[2223]: I1005 19:08:33.006942    2223 kubelet_pods.go:961] Pod "virt-launcher-fedora-test-vm-bsplz_test(668e7ae5-1830-436e-870d-8dfdbea89e0f)" is terminated, but some containers have not been cleaned up: {ID:{Type:cri-o ID:5574c2e6ddca338c6d116b869a9f8718c41d3c8905eae5956d1ef15758ca764a} Name:volumerootdisk State:exited CreatedAt:2020-10-05 19:03:26.178788793 +0000 UTC StartedAt:2020-10-05 19:03:26.472343354 +0000 UTC FinishedAt:2020-10-05 19:08:31.979979057 +0000 UTC ExitCode:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 Hash:1814580024 RestartCount:0 Reason:Completed Message:} {ID:{Type:cri-o ID:92302e1637ad140832c49abe07c7341d1751e2fe418d27079710f85a965ee09b} Name:compute State:exited CreatedAt:2020-10-05 19:03:21.505102416 +0000 UTC StartedAt:2020-10-05 19:03:21.535758318 +0000 UTC FinishedAt:2020-10-05 19:08:31.932979056 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:1362036640 RestartCount:0 Reason:OOMKilled Message:} {ID:{Type:cri-o ID:7f05df1530eeecab0ee41b71727bfcdbda86ba08f2907f5a3879354bb02d31b5} Name:container-disk-binary State:exited CreatedAt:2020-10-05 19:03:20.571269695 +0000 UTC StartedAt:2020-10-05 19:03:20.772646491 +0000 UTC FinishedAt:2020-10-05 19:03:21.270973691 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:793891490 RestartCount:0 Reason:Completed Message:}
Oct 05 19:08:33 openshift-worker-1.example.com hyperkube[2223]: I1005 19:08:33.010683    2223 kubelet_pods.go:961] Pod "virt-launcher-fedora-test-vm-bsplz_test(668e7ae5-1830-436e-870d-8dfdbea89e0f)" is terminated, but some containers have not been cleaned up: {ID:{Type:cri-o ID:5574c2e6ddca338c6d116b869a9f8718c41d3c8905eae5956d1ef15758ca764a} Name:volumerootdisk State:exited CreatedAt:2020-10-05 19:03:26.178788793 +0000 UTC StartedAt:2020-10-05 19:03:26.472343354 +0000 UTC FinishedAt:2020-10-05 19:08:31.979979057 +0000 UTC ExitCode:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 Hash:1814580024 RestartCount:0 Reason:Completed Message:} {ID:{Type:cri-o ID:92302e1637ad140832c49abe07c7341d1751e2fe418d27079710f85a965ee09b} Name:compute State:exited CreatedAt:2020-10-05 19:03:21.505102416 +0000 UTC StartedAt:2020-10-05 19:03:21.535758318 +0000 UTC FinishedAt:2020-10-05 19:08:31.932979056 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:1362036640 RestartCount:0 Reason:OOMKilled Message:} {ID:{Type:cri-o ID:7f05df1530eeecab0ee41b71727bfcdbda86ba08f2907f5a3879354bb02d31b5} Name:container-disk-binary State:exited CreatedAt:2020-10-05 19:03:20.571269695 +0000 UTC StartedAt:2020-10-05 19:03:20.772646491 +0000 UTC FinishedAt:2020-10-05 19:03:21.270973691 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:793891490 RestartCount:0 Reason:Completed Message:}
^C
~~~

~~~
[root@fedora-test-vm ~]# stress-ng --vm 1 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1
stress-ng: debug: [1278] 3 processors online, 3 processors configured
stress-ng: info:  [1278] dispatching hogs: 1 vm, 1 hdd, 1 io
stress-ng: debug: [1278] cache allocate: default cache size: 16384K
stress-ng: debug: [1278] starting stressors
stress-ng: debug: [1278] 3 stressors spawned
stress-ng: debug: [1280] stress-ng-hdd: started [1280] (instance 0)
stress-ng: debug: [1281] stress-ng-io: started [1281] (instance 0)
stress-ng: debug: [1279] stress-ng-vm: started [1279] (instance 0)
stress-ng: debug: [1279] stress-ng-vm using method 'all'

~~~

~~~
[root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do  date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit'  ; sleep 2 ; done
Mon Oct  5 19:04:45 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 855515136
total_rss_huge 788529152
Mon Oct  5 19:04:47 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 855515136
total_rss_huge 788529152
Mon Oct  5 19:04:49 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 857620480
total_rss_huge 790626304
Mon Oct  5 19:04:51 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 878592000
total_rss_huge 811597824
Mon Oct  5 19:04:53 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 949895168
total_rss_huge 882900992
Mon Oct  5 19:04:55 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 1002323968
total_rss_huge 935329792
Mon Oct  5 19:04:57 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 1073627136
total_rss_huge 1006632960
Mon Oct  5 19:04:59 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 1075724288
total_rss_huge 1008730112
Mon Oct  5 19:05:01 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 1075724288
total_rss_huge 1008730112
Mon Oct  5 19:05:03 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 1077878784
total_rss_huge 1010827264
Mon Oct  5 19:05:05 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 1077878784
total_rss_huge 1010827264
Mon Oct  5 19:05:07 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 1126113280
total_rss_huge 1059061760
Mon Oct  5 19:05:09 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 1182736384
total_rss_huge 1115684864
Mon Oct  5 19:05:11 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 1228873728
total_rss_huge 1161822208
Mon Oct  5 19:05:13 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 1256136704
total_rss_huge 1189085184
Mon Oct  5 19:05:15 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 1719635968
total_rss_huge 1652555776
Mon Oct  5 19:05:17 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2212573184
total_rss_huge 2145386496
Mon Oct  5 19:05:19 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2424594432
total_rss_huge 2357198848
Mon Oct  5 19:05:21 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2491797504
total_rss_huge 2424307712
Mon Oct  5 19:05:23 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2506485760
total_rss_huge 2438987776
Mon Oct  5 19:05:25 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2542268416
total_rss_huge 2474639360
Mon Oct  5 19:05:27 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2544500736
total_rss_huge 2472542208
Mon Oct  5 19:05:29 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559508480
total_rss_huge 2483027968
Mon Oct  5 19:05:31 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559508480
total_rss_huge 2478833664
Mon Oct  5 19:05:33 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559508480
total_rss_huge 2478833664
Mon Oct  5 19:05:35 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559508480
total_rss_huge 2478833664
Mon Oct  5 19:05:37 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559643648
total_rss_huge 2478833664
Mon Oct  5 19:05:39 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559643648
total_rss_huge 2478833664
Mon Oct  5 19:05:41 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559643648
total_rss_huge 2478833664
Mon Oct  5 19:05:43 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559643648
total_rss_huge 2478833664
Mon Oct  5 19:05:45 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559643648
total_rss_huge 2478833664
Mon Oct  5 19:05:47 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559643648
total_rss_huge 2478833664
Mon Oct  5 19:05:49 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559643648
total_rss_huge 2478833664
Mon Oct  5 19:05:51 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559651840
total_rss_huge 2478833664
Mon Oct  5 19:05:53 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559660032
total_rss_huge 2478833664
Mon Oct  5 19:05:55 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559660032
total_rss_huge 2478833664
Mon Oct  5 19:05:57 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559737856
total_rss_huge 2478833664
Mon Oct  5 19:05:59 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559778816
total_rss_huge 2478833664
Mon Oct  5 19:06:01 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559799296
total_rss_huge 2478833664
Mon Oct  5 19:06:03 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559877120
total_rss_huge 2478833664
Mon Oct  5 19:06:05 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559877120
total_rss_huge 2478833664
Mon Oct  5 19:06:07 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559877120
total_rss_huge 2478833664
Mon Oct  5 19:06:09 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559877120
total_rss_huge 2478833664
Mon Oct  5 19:06:11 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2559877120
total_rss_huge 2478833664
Mon Oct  5 19:06:13 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2560098304
total_rss_huge 2478833664
Mon Oct  5 19:06:15 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2560098304
total_rss_huge 2478833664
Mon Oct  5 19:06:17 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2560229376
total_rss_huge 2478833664
Mon Oct  5 19:06:19 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2560229376
total_rss_huge 2478833664
Mon Oct  5 19:06:21 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2560229376
total_rss_huge 2478833664
Mon Oct  5 19:06:23 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2560229376
total_rss_huge 2478833664
Mon Oct  5 19:06:25 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2562351104
total_rss_huge 2480930816
Mon Oct  5 19:06:27 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2562498560
total_rss_huge 2480930816
Mon Oct  5 19:06:29 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 2801614848
total_rss_huge 2720006144
Mon Oct  5 19:06:31 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 4674666496
total_rss_huge 4592762880
Mon Oct  5 19:06:33 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 6388121600
total_rss_huge 6306136064
Mon Oct  5 19:06:35 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 7944278016
total_rss_huge 7862222848
Mon Oct  5 19:06:37 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8672026624
total_rss_huge 8589934592
Mon Oct  5 19:06:39 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8696958976
total_rss_huge 8617197568
Mon Oct  5 19:06:41 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697098240
total_rss_huge 8617197568
Mon Oct  5 19:06:43 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697167872
total_rss_huge 8617197568
Mon Oct  5 19:06:45 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697188352
total_rss_huge 8617197568
Mon Oct  5 19:06:47 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697188352
total_rss_huge 8617197568
Mon Oct  5 19:06:49 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697294848
total_rss_huge 8617197568
Mon Oct  5 19:06:51 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697331712
total_rss_huge 8617197568
Mon Oct  5 19:06:53 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697331712
total_rss_huge 8617197568
Mon Oct  5 19:06:55 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697356288
total_rss_huge 8617197568
Mon Oct  5 19:06:57 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697339904
total_rss_huge 8617197568
Mon Oct  5 19:06:59 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697356288
total_rss_huge 8617197568
Mon Oct  5 19:07:01 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8696741888
total_rss_huge 8617197568
Mon Oct  5 19:07:03 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8696741888
total_rss_huge 8617197568
Mon Oct  5 19:07:05 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8696741888
total_rss_huge 8617197568
Mon Oct  5 19:07:07 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8696741888
total_rss_huge 8617197568
Mon Oct  5 19:07:09 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8696741888
total_rss_huge 8617197568
Mon Oct  5 19:07:11 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8696741888
total_rss_huge 8617197568
Mon Oct  5 19:07:13 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8696741888
total_rss_huge 8617197568
Mon Oct  5 19:07:15 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8696741888
total_rss_huge 8617197568
Mon Oct  5 19:07:17 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8696819712
total_rss_huge 8617197568
Mon Oct  5 19:07:19 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8696844288
total_rss_huge 8617197568
Mon Oct  5 19:07:21 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8696844288
total_rss_huge 8617197568
Mon Oct  5 19:07:23 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8696889344
total_rss_huge 8617197568
Mon Oct  5 19:07:25 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697004032
total_rss_huge 8617197568
Mon Oct  5 19:07:27 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697180160
total_rss_huge 8617197568
Mon Oct  5 19:07:29 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697212928
total_rss_huge 8617197568
Mon Oct  5 19:07:31 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697315328
total_rss_huge 8613003264
Mon Oct  5 19:07:33 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697352192
total_rss_huge 8610906112
Mon Oct  5 19:07:35 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697352192
total_rss_huge 8610906112
Mon Oct  5 19:07:37 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697364480
total_rss_huge 8610906112
Mon Oct  5 19:07:39 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697372672
total_rss_huge 8610906112
Mon Oct  5 19:07:41 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697372672
total_rss_huge 8610906112
Mon Oct  5 19:07:43 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697372672
total_rss_huge 8610906112
Mon Oct  5 19:07:45 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8695275520
total_rss_huge 8608808960
Mon Oct  5 19:07:47 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8695304192
total_rss_huge 8608808960
Mon Oct  5 19:07:49 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8695410688
total_rss_huge 8608808960
Mon Oct  5 19:07:51 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8695410688
total_rss_huge 8608808960
Mon Oct  5 19:07:53 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8695504896
total_rss_huge 8608808960
Mon Oct  5 19:07:55 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8695504896
total_rss_huge 8608808960
Mon Oct  5 19:07:57 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8695517184
total_rss_huge 8608808960
Mon Oct  5 19:07:59 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8695566336
total_rss_huge 8608808960
Mon Oct  5 19:08:01 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8695566336
total_rss_huge 8608808960
Mon Oct  5 19:08:03 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697663488
total_rss_huge 8610906112
Mon Oct  5 19:08:05 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697663488
total_rss_huge 8610906112
Mon Oct  5 19:08:07 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697712640
total_rss_huge 8610906112
Mon Oct  5 19:08:09 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697733120
total_rss_huge 8610906112
Mon Oct  5 19:08:11 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697798656
total_rss_huge 8610906112
Mon Oct  5 19:08:13 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697798656
total_rss_huge 8610906112
Mon Oct  5 19:08:15 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697798656
total_rss_huge 8610906112
Mon Oct  5 19:08:17 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697798656
total_rss_huge 8610906112
Mon Oct  5 19:08:19 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697798656
total_rss_huge 8610906112
Mon Oct  5 19:08:21 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697798656
total_rss_huge 8610906112
Mon Oct  5 19:08:23 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697798656
total_rss_huge 8610906112
Mon Oct  5 19:08:25 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697798656
total_rss_huge 8610906112
Mon Oct  5 19:08:27 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697798656
total_rss_huge 8610906112
Mon Oct  5 19:08:29 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697798656
total_rss_huge 8610906112
Mon Oct  5 19:08:31 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 0
total_rss_huge 0
Mon Oct  5 19:08:33 UTC 2020
cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod668e7ae5_1830_436e_870d_8dfdbea89e0f.slice/crio-92302e1637ad140832c49abe07c7341d1751e2fe418d27079710f85a965ee09b.scope/memory.stat: No such file or directory
Mon Oct  5 19:08:35 UTC 2020
cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod668e7ae5_1830_436e_870d_8dfdbea89e0f.slice/crio-92302e1637ad140832c49abe07c7341d1751e2fe418d27079710f85a965ee09b.scope/memory.stat: No such file or directory
^C
~~~

~~~
[core@openshift-worker-1 ~]$ while true ; do scp -i gss-stack-tools -P 20022  test.txt  fedora.54.202: ; done
test.txt                                                                           100% 1024MB 208.9MB/s   00:04    
test.txt                                                                           100% 1024MB 233.3MB/s   00:04    
test.txt                                                                           100% 1024MB 183.4MB/s   00:05    
test.txt                                                                           100% 1024MB 233.3MB/s   00:04    
test.txt                                                                           100% 1024MB 181.2MB/s   00:05    
test.txt                                                                           100% 1024MB 213.4MB/s   00:04    
(...)
~~~  



------------------

[0]

~~~
[root@openshift-jumpserver-0 ~]# cat fedora-vm.yaml 
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  annotations:
    kubevirt.io/latest-observed-api-version: v1alpha3
    kubevirt.io/storage-observed-api-version: v1alpha3
    name.os.template.kubevirt.io/fedora31: Fedora 31
  labels:
    app: fedora-test-vm
    flavor.template.kubevirt.io/Custom: "true"
    os.template.kubevirt.io/fedora31: "true"
    vm.kubevirt.io/template.namespace: openshift
    vm.kubevirt.io/template.revision: "1"
    vm.kubevirt.io/template.version: v0.9.1
    workload.template.kubevirt.io/server: "true"
  name: fedora-test-vm
  namespace: test
spec:
  running: true
  template:
    metadata:
      labels:
        flavor.template.kubevirt.io/Custom: "true"
        kubevirt.io/domain: fedora-test-vm
        kubevirt.io/size: large
        os.template.kubevirt.io/fedora31: "true"
        vm.kubevirt.io/name: fedora-test-vm
        workload.template.kubevirt.io/server: "true"
    spec:
      domain:
        cpu:
          cores: 3
          model: host-model
        devices:
          disks:
          - bootOrder: 1
            disk:
              bus: virtio
            name: rootdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
          interfaces:
          - masquerade: {}
            model: virtio
            name: nic0
          networkInterfaceMultiqueue: true
          rng: {}
        features:
          acpi:
            enabled: true
        machine:
          type: q35
        resources:
          requests:
            cpu: "3"
            memory: 8192Mi
          limits:
            cpu: "3"
            memory: 8192Mi
          overcommitGuestOverhead: true
        memory:
          guest: 8192Mi
      evictionStrategy: LiveMigrate
      hostname: fedora-test-vm
      networks:
      - name: nic0
        pod: {}
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: docker.io/kubevirt/fedora-cloud-container-disk-demo
        name: rootdisk
      - cloudInitNoCloud:
          userData: |
            #cloud-config
            name: default
            hostname: fedora-test-vm
            ssh_authorized_keys:
              - >-
                ssh-rsa
                AAAAB(...)jl49IU=
        name: cloudinitdisk
~~~

Comment 8 Andreas Karis 2020-10-05 21:18:45 UTC
Now, I tested with MB instead of MiB and ran into https://github.com/kubevirt/kubevirt/pull/3921 on top of that [0]:
~~~
        resources:
          requests:
            cpu: "3"
            memory: 10240M
          limits:
            cpu: "3"
            memory: 10240M
          overcommitGuestOverhead: true
        memory:
          guest: 10240M
~~~
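
For reference (plain bc, my own conversion, not KubeVirt's code), 10240M in decimal megabytes is just under 9766 MiB, which matches the -m 9766 that qemu-kvm is started with further down:
~~~
echo "scale=1; 10240 * 1000000 / 1048576" | bc   # 9765.6 (MiB) -> qemu-kvm gets -m 9766
~~~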

Pod description:
~~~
[root@openshift-jumpserver-0 ~]# oc describe pod  | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:'
    Limits:
      memory:  40M
    Requests:
      memory:     40M
Containers:
  compute:
    Limits:
      memory:                         10438331648
    Requests:
      memory:                         10240M
  volumerootdisk:
    Limits:
      memory:  40M
    Requests:
      memory:     40M
~~~
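
As a rough check (plain bc on the pod values above, not the exact KubeVirt formula), the limit is about 189Mi above the 10240M request:
~~~
echo "10438331648 - 10240000000" | bc              # 198331648 (bytes)
echo "(10438331648 - 10240000000) / 1048576" | bc  # ~189 (MiB)
~~~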

But now we see a mismatch between the Kubernetes limit and the cgroup limit:
~~~
[root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do  date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit'  ; sleep 2 ; done
Mon Oct  5 21:03:03 UTC 2020
hierarchical_memory_limit 10438328320
total_rss 831369216
total_rss_huge 765460480
~~~

The qemu-kvm process of the VM:
~~~
[root@openshift-worker-1 ~]# pgrep qemu-kvm | while read pid ; do echo === $pid === ; ps aux | grep $pid |  grep -o '\-m [0-9]*' ; pmap $pid | tail ; done
=== 1172052 ===
-m 9766
00007ffb4d2d3000     12K rw-s-   [ anon ]
00007ffb4d2d6000      4K rw-s- zero (deleted)
00007ffb4d2d7000      4K r---- ld-2.28.so
00007ffb4d2d8000      4K rw--- ld-2.28.so
00007ffb4d2d9000      4K rw---   [ anon ]
00007ffde233d000    132K rw---   [ stack ]
00007ffde2368000     12K r----   [ anon ]
00007ffde236b000      8K r-x--   [ anon ]
ffffffffff600000      4K r-x--   [ anon ]
 total         15162840K
~~~

The mismatch is small in this case, only about 3 KiB, but it's there:
~~~
[root@openshift-worker-1 ~]# echo "10438328320 / 1024 / 1024" | bc
9954
[root@openshift-worker-1 ~]# echo "10438331648 / 1024 / 1024" | bc
9954
[root@openshift-worker-1 ~]# echo "10438328320 / 1024" | bc
10193680
[root@openshift-worker-1 ~]# echo "10438331648 / 1024" | bc
10193683
~~~
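
In bytes (plain bc on the two limits above), the gap is:
~~~
echo "10438331648 - 10438328320" | bc   # 3328 bytes (~3 KiB)
~~~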

The OOM can easily be reproduced:
~~~
[root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do  date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | grep total_rss ; sleep 2 ; done
Mon Oct  5 18:59:00 UTC 2020
total_rss 1483591680
total_rss_huge 1400897536
Mon Oct  5 18:59:02 UTC 2020
total_rss 1483591680
total_rss_huge 1400897536
Mon Oct  5 18:59:05 UTC 2020
total_rss 1483591680
total_rss_huge 1400897536
Mon Oct  5 18:59:07 UTC 2020
total_rss 1479483392
total_rss_huge 1396703232
Mon Oct  5 18:59:09 UTC 2020
total_rss 1479483392
total_rss_huge 1396703232
Mon Oct  5 18:59:11 UTC 2020
total_rss 1479483392
total_rss_huge 1396703232
Mon Oct  5 18:59:13 UTC 2020
total_rss 1479483392
total_rss_huge 1396703232
Mon Oct  5 18:59:15 UTC 2020
total_rss 1743835136
total_rss_huge 1660944384
Mon Oct  5 18:59:17 UTC 2020
total_rss 3922923520
total_rss_huge 3839885312
Mon Oct  5 18:59:19 UTC 2020
total_rss 5042941952
total_rss_huge 4959764480
Mon Oct  5 18:59:21 UTC 2020
total_rss 6917820416
total_rss_huge 6834618368
Mon Oct  5 18:59:23 UTC 2020
total_rss 8656658432
total_rss_huge 8573157376
Mon Oct  5 18:59:25 UTC 2020
total_rss 9718091776
total_rss_huge 9634316288
Mon Oct  5 18:59:27 UTC 2020
total_rss 10336915456
total_rss_huge 10255073280
Mon Oct  5 18:59:29 UTC 2020
total_rss 10339106816
total_rss_huge 10257170432
Mon Oct  5 18:59:31 UTC 2020
total_rss 10339176448
total_rss_huge 10257170432
Mon Oct  5 18:59:33 UTC 2020
total_rss 10337443840
total_rss_huge 10257170432
Mon Oct  5 18:59:35 UTC 2020
total_rss 10338512896
total_rss_huge 10259267584
Mon Oct  5 18:59:37 UTC 2020
total_rss 10338521088
total_rss_huge 10259267584
Mon Oct  5 18:59:39 UTC 2020
total_rss 10338705408
total_rss_huge 10259267584
Mon Oct  5 18:59:41 UTC 2020
total_rss 10339704832
total_rss_huge 10259267584
Mon Oct  5 18:59:43 UTC 2020
total_rss 10339807232
total_rss_huge 10259267584
Mon Oct  5 18:59:45 UTC 2020
total_rss 10339876864
total_rss_huge 10259267584
Mon Oct  5 18:59:47 UTC 2020
total_rss 10339921920
total_rss_huge 10259267584
Mon Oct  5 18:59:49 UTC 2020
total_rss 10339938304
total_rss_huge 10259267584
Mon Oct  5 18:59:51 UTC 2020
total_rss 10340093952
total_rss_huge 10259267584
Mon Oct  5 18:59:53 UTC 2020
total_rss 10340093952
total_rss_huge 10259267584
Mon Oct  5 18:59:55 UTC 2020
total_rss 10340171776
total_rss_huge 10259267584
Mon Oct  5 18:59:57 UTC 2020
total_rss 10340315136
total_rss_huge 10259267584
Mon Oct  5 18:59:59 UTC 2020
total_rss 10340315136
total_rss_huge 10259267584
Mon Oct  5 19:00:01 UTC 2020
total_rss 10340315136
total_rss_huge 10259267584
Mon Oct  5 19:00:03 UTC 2020
total_rss 10340315136
total_rss_huge 10259267584
Mon Oct  5 19:00:05 UTC 2020
cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podd70e792d_be77_4568_ab04_da9590844ce5.slice/crio-6ec77a2e338be7a63eb57a695452a538abd0ab41675a4156f4fd7fde0dad6264.scope/memory.stat: No such file or directory
Mon Oct  5 19:00:07 UTC 2020
cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podd70e792d_be77_4568_ab04_da9590844ce5.slice/crio-6ec77a2e338be7a63eb57a695452a538abd0ab41675a4156f4fd7fde0dad6264.scope/memory.stat: No such file or directory
Mon Oct  5 19:00:09 UTC 2020
cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podd70e792d_be77_4568_ab04_da9590844ce5.slice/crio-6ec77a2e338be7a63eb57a695452a538abd0ab41675a4156f4fd7fde0dad6264.scope/memory.stat: No such file or directory
^C
~~~

~~~
[root@openshift-worker-1 ~]# journalctl -f | grep -i oomkilled
Oct 05 19:00:05 openshift-worker-1.example.com hyperkube[2223]: I1005 19:00:05.997108    2223 status_manager.go:570] Patch status for pod "virt-launcher-fedora-test-vm-nlbfm_test(d70e792d-be77-4568-ab04-da9590844ce5)" with "{\"metadata\":{\"uid\":\"d70e792d-be77-4568-ab04-da9590844ce5\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"reason\":\"PodCompleted\",\"type\":\"Initialized\"},{\"lastTransitionTime\":\"2020-10-05T19:00:05Z\",\"reason\":\"PodCompleted\",\"status\":\"False\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2020-10-05T19:00:05Z\",\"reason\":\"PodCompleted\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://6ec77a2e338be7a63eb57a695452a538abd0ab41675a4156f4fd7fde0dad6264\",\"image\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93\",\"imageID\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93\",\"lastState\":{},\"name\":\"compute\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://6ec77a2e338be7a63eb57a695452a538abd0ab41675a4156f4fd7fde0dad6264\",\"exitCode\":0,\"finishedAt\":\"2020-10-05T19:00:05Z\",\"reason\":\"OOMKilled\",\"startedAt\":\"2020-10-05T18:55:56Z\"}}},{\"containerID\":\"cri-o://3a22d8cbe650ab7cd316f2f8ffe5066e431555a375e9dbe5cf5d6feddec00877\",\"image\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo:latest\",\"imageID\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331\",\"lastState\":{},\"name\":\"volumerootdisk\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://3a22d8cbe650ab7cd316f2f8ffe5066e431555a375e9dbe5cf5d6feddec00877\",\"exitCode\":0,\"finishedAt\":\"2020-10-05T19:00:04Z\",\"reason\":\"Completed\",\"startedAt\":\"2020-10-05T18:56:02Z\"}}}],\"phase\":\"Succeeded\"}}"
Oct 05 19:00:05 openshift-worker-1.example.com hyperkube[2223]: I1005 19:00:05.997177    2223 status_manager.go:578] Status for pod "virt-launcher-fedora-test-vm-nlbfm_test(d70e792d-be77-4568-ab04-da9590844ce5)" updated successfully: (6, {Phase:Succeeded Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 18:55:56 +0000 UTC Reason:PodCompleted Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:00:05 +0000 UTC Reason:PodCompleted Message:} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:00:05 +0000 UTC Reason:PodCompleted Message:} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 18:59:58 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:192.168.123.221 PodIP:172.24.2.17 PodIPs:[{IP:172.24.2.17}] StartTime:2020-10-05 18:55:47 +0000 UTC InitContainerStatuses:[{Name:container-disk-binary State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-05 18:55:55 +0000 UTC,FinishedAt:2020-10-05 18:55:55 +0000 UTC,ContainerID:cri-o://427a6f43f61c73abeb7b5e5a142678c11cab2413d0519c98349d60c1f3d9492c,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ContainerID:cri-o://427a6f43f61c73abeb7b5e5a142678c11cab2413d0519c98349d60c1f3d9492c Started:<nil>}] ContainerStatuses:[{Name:compute State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:OOMKilled,Message:,StartedAt:2020-10-05 18:55:56 +0000 UTC,FinishedAt:2020-10-05 19:00:05 +0000 UTC,ContainerID:cri-o://6ec77a2e338be7a63eb57a695452a538abd0ab41675a4156f4fd7fde0dad6264,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ContainerID:cri-o://6ec77a2e338be7a63eb57a695452a538abd0ab41675a4156f4fd7fde0dad6264 Started:0xc001b3317c} {Name:volumerootdisk State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-05 18:56:02 +0000 UTC,FinishedAt:2020-10-05 19:00:04 +0000 UTC,ContainerID:cri-o://3a22d8cbe650ab7cd316f2f8ffe5066e431555a375e9dbe5cf5d6feddec00877,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 ContainerID:cri-o://3a22d8cbe650ab7cd316f2f8ffe5066e431555a375e9dbe5cf5d6feddec00877 Started:0xc001b3317d}] QOSClass:Burstable EphemeralContainerStatuses:[]})
Oct 05 19:00:05 openshift-worker-1.example.com hyperkube[2223]: I1005 19:00:05.997317    2223 kubelet_pods.go:961] Pod "virt-launcher-fedora-test-vm-nlbfm_test(d70e792d-be77-4568-ab04-da9590844ce5)" is terminated, but some containers have not been cleaned up: {ID:{Type:cri-o ID:3a22d8cbe650ab7cd316f2f8ffe5066e431555a375e9dbe5cf5d6feddec00877} Name:volumerootdisk State:exited CreatedAt:2020-10-05 18:56:02.171592428 +0000 UTC StartedAt:2020-10-05 18:56:02.372442071 +0000 UTC FinishedAt:2020-10-05 19:00:04.948970301 +0000 UTC ExitCode:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 Hash:1204204984 RestartCount:0 Reason:Completed Message:} {ID:{Type:cri-o ID:6ec77a2e338be7a63eb57a695452a538abd0ab41675a4156f4fd7fde0dad6264} Name:compute State:exited CreatedAt:2020-10-05 18:55:56.669982263 +0000 UTC StartedAt:2020-10-05 18:55:56.70232152 +0000 UTC FinishedAt:2020-10-05 19:00:05.066970303 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:988105243 RestartCount:0 Reason:OOMKilled Message:} {ID:{Type:cri-o ID:427a6f43f61c73abeb7b5e5a142678c11cab2413d0519c98349d60c1f3d9492c} Name:container-disk-binary State:exited CreatedAt:2020-10-05 18:55:54.971599116 +0000 UTC StartedAt:2020-10-05 18:55:55.271974426 +0000 UTC FinishedAt:2020-10-05 18:55:55.769965998 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:793891490 RestartCount:0 Reason:Completed Message:}
^C
~~~

~~~
[root@fedora-test-vm ~]# stress-ng --vm 1 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1
stress-ng: debug: [9457] 3 processors online, 3 processors configured
stress-ng: info:  [9457] dispatching hogs: 1 vm, 1 hdd, 1 io
stress-ng: debug: [9457] cache allocate: default cache size: 16384K
stress-ng: debug: [9457] starting stressors
stress-ng: debug: [9458] stress-ng-vm: started [9458] (instance 0)
stress-ng: debug: [9457] 3 stressors spawned
stress-ng: debug: [9459] stress-ng-hdd: started [9459] (instance 0)
stress-ng: debug: [9460] stress-ng-io: started [9460] (instance 0)
stress-ng: debug: [9458] stress-ng-vm using method 'all'
~~~

~~~
[core@openshift-worker-1 ~]$ while true ; do scp -i gss-stack-tools -P 20022  test.txt  fedora.54.202: ; done
test.txt                                                                            60%  616MB 156.7MB/s   00:02 ETA
test.txt                                                                            68%  702MB 149.8MB/s   00:02 ETA
test.txt                                                                           100% 1024MB  62.8MB/s   00:16    
test.txt                                                                           100% 1024MB 156.8MB/s   00:06    
test.txt                                                                           100% 1024MB 150.7MB/s   00:06    
test.txt                                                                           100% 1024MB 166.4MB/s   00:06    
^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C
[core@openshift-worker-1 ~]$ ^C
~~~

=======================================

[0]

VM definition was:
~~~
[root@openshift-jumpserver-0 ~]# 
[root@openshift-jumpserver-0 ~]# cat fedora-vm.yaml 
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  annotations:
    kubevirt.io/latest-observed-api-version: v1alpha3
    kubevirt.io/storage-observed-api-version: v1alpha3
    name.os.template.kubevirt.io/fedora31: Fedora 31
  labels:
    app: fedora-test-vm
    flavor.template.kubevirt.io/Custom: "true"
    os.template.kubevirt.io/fedora31: "true"
    vm.kubevirt.io/template.namespace: openshift
    vm.kubevirt.io/template.revision: "1"
    vm.kubevirt.io/template.version: v0.9.1
    workload.template.kubevirt.io/server: "true"
  name: fedora-test-vm
  namespace: test
spec:
  running: true
  template:
    metadata:
      labels:
        flavor.template.kubevirt.io/Custom: "true"
        kubevirt.io/domain: fedora-test-vm
        kubevirt.io/size: large
        os.template.kubevirt.io/fedora31: "true"
        vm.kubevirt.io/name: fedora-test-vm
        workload.template.kubevirt.io/server: "true"
    spec:
      domain:
        cpu:
          cores: 3
          model: host-model
        devices:
          disks:
          - bootOrder: 1
            disk:
              bus: virtio
            name: rootdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
          interfaces:
          - masquerade: {}
            model: virtio
            name: nic0
          networkInterfaceMultiqueue: true
          rng: {}
        features:
          acpi:
            enabled: true
        machine:
          type: q35
        resources:
          requests:
            cpu: "3"
            memory: 10240M
          limits:
            cpu: "3"
            memory: 10240M
          overcommitGuestOverhead: true
        memory:
          guest: 10240M
      evictionStrategy: LiveMigrate
      hostname: fedora-test-vm
      networks:
      - name: nic0
        pod: {}
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: docker.io/kubevirt/fedora-cloud-container-disk-demo
        name: rootdisk
      - cloudInitNoCloud:
          userData: |
            #cloud-config
            name: default
            hostname: fedora-test-vm
            ssh_authorized_keys:
              - >-
                ssh-rsa
                AA(...)IU=
        name: cloudinitdisk
~~~

Comment 9 Roman Mohr 2020-10-06 11:12:05 UTC
One issue here is definitely that we had a miscalculation, which was fixed in https://github.com/kubevirt/kubevirt/pull/3921. That fix will be part of CNV 2.5. We should definitely try to reproduce these very good examples on newer kubevirt releases on OCP and k8s to see whether it works now.

One workaround, as already elaborated above by Andreas, is explicitly undercommitting the VM until it stops crashing: https://bugzilla.redhat.com/show_bug.cgi?id=1885418#c5

Comment 10 Roman Mohr 2020-10-06 11:17:23 UTC
One general note on overcommitGuestOverhead: it only makes sense to use this option when swap is enabled. CNV VMs currently have no way to give memory back once they have touched it, so when no swap space is available the VM will sooner or later operate above its request. This means one always risks the kubelet evicting the workload, even while staying below the limits. Since swap is normally disabled on OpenShift (and on most other Kubernetes clusters), this option should not be used here. Undercommitting can be done without it.
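
For illustration, undercommitting without overcommitGuestOverhead just means exposing less memory to the guest than the container requests. A minimal sketch (the 7Gi value is an arbitrary example, not a tested recommendation; the change only applies the next time the VM is restarted):
~~~
# keep resources.requests/limits at 8Gi, but only give the guest 7Gi,
# leaving ~1Gi of headroom for qemu/libvirt overhead
oc patch vm fedora-test-vm --type=merge \
  -p '{"spec":{"template":{"spec":{"domain":{"memory":{"guest":"7Gi"}}}}}}'
~~~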

Comment 11 sgott 2020-10-07 12:06:05 UTC
Adding TestOnly flag to this BZ as it should be fixed by https://github.com/kubevirt/kubevirt/pull/3921

Comment 15 Andreas Karis 2020-10-08 10:21:23 UTC
Reproduced the exact same issue on:
~~~
[root@cnvqe-01 ~]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-10-03-051134   True        False         3d22h   Cluster version is 4.6.0-0.nightly-2020-10-03-051134
~~~

~~~
[root@cnvqe-01 ~]# oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v2.5.0   OpenShift Virtualization   2.5.0     kubevirt-hyperconverged-operator.v2.4.1   Succeeded
~~~

~~~
[root@cnvqe-01 ~]# cat akaris-vm.yaml 
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  annotations:
    kubevirt.io/latest-observed-api-version: v1alpha3
    kubevirt.io/storage-observed-api-version: v1alpha3
    name.os.template.kubevirt.io/fedora31: Fedora 31
  labels:
    app: akaris-test-vm
    flavor.template.kubevirt.io/Custom: "true"
    os.template.kubevirt.io/fedora31: "true"
    vm.kubevirt.io/template.namespace: openshift
    vm.kubevirt.io/template.revision: "1"
    vm.kubevirt.io/template.version: v0.9.1
    workload.template.kubevirt.io/server: "true"
  name: akaris-test-vm
spec:
  running: true
  template:
    metadata:
      labels:
        flavor.template.kubevirt.io/Custom: "true"
        kubevirt.io/domain: akaris-test-vm
        kubevirt.io/size: large
        os.template.kubevirt.io/fedora31: "true"
        vm.kubevirt.io/name: akaris-test-vm
        workload.template.kubevirt.io/server: "true"
    spec:
      domain:
        cpu:
          cores: 3
          model: host-model
        devices:
          disks:
          - bootOrder: 1
            disk:
              bus: virtio
            name: rootdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
          interfaces:
          - masquerade: {}
            model: virtio
            name: nic0
          networkInterfaceMultiqueue: true
          rng: {}
        features:
          acpi:
            enabled: true
        machine:
          type: q35
        resources:
          requests:
            cpu: "3"
            memory: 8Gi
          limits:
            cpu: "3"
            memory: 8Gi
#          overcommitGuestOverhead: true
#        memory:
#          guest: 8Gi
      evictionStrategy: LiveMigrate
      hostname: akaris-test-vm
      networks:
      - name: nic0
        pod: {}
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: docker.io/kubevirt/fedora-cloud-container-disk-demo
        name: rootdisk
      - cloudInitNoCloud:
          userData: |
            #cloud-config
            name: default
            hostname: akaris-test-vm
            ssh_authorized_keys:
              - >-
                ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDEn6wI5YZbCbwUX8RlBYkkYGFK8vTXyeuOJWf6WrWcLi2rKfCoXWNxAw2ekLqIQRpc2xI5wCquWPeZ3aiHpqlocbFgLuGEBIqciJtGnI1IGOZtdccfEe8MRKJMjy3EnktGnHiFkxPXE7UYglujq1GN2BAzgGKigjOLu+k56tVWs+U96KtZz81P3sE4wye4rCm0R0lS4kaEX5bQ8ZAQG4KnWVuuW/PMm2g5n8FImjBz2t7kW3w4N9O4/yvUpgrsR75FXwdZ4gfBLxcKm2oqXHker31UFo4R5uuov5KO43OEOswGu/JmbcBm5T9ozyE/9RE+iuMrYIDWV9v8SgwT7Tpm24dKpcRWO968jDzaWPgGL8EN4y9uDYx7oIlYmwzJ5xFiDgQh+kwWi5p67TD2Iuxmxp5yOjv9ZKJKSk+wU8Z3407kB/AoempquSVXPU6jfZzAxFWlsQdJOd1ETmgcAjbF/WhFGR1LjJrzK6Nm3Gr2HDA2VAmOnLzyh7BBIHZSqLM= core.eng.tlv2.redhat.com
        name: cloudinitdisk
~~~

~~~
oc create -f akaris-vm.yaml
~~~

~~~
virtctl expose  vmi akaris-test-vm --port=20022 --target-port=22 --name=akaris-test-vm-ssh
~~~

Resulting VM and pod:
~~~
[root@cnvqe-01 ~]#  oc describe pod virt-launcher-akaris-test-vm-nb4fg | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:'
    Limits:
      memory:  40M
    Requests:
      memory:     40M
    Limits:
      memory:  40M
    Requests:
      memory:     40M
Containers:
  compute:
    Limits:
      memory:                         8579144Ki
    Requests:
      memory:                         8579144Ki
  volumerootdisk:
    Limits:
      memory:  40M
    Requests:
      memory:     40M
~~~
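
So with an 8Gi request and limit, the compute container gets roughly 186 MiB of overhead on top of the guest memory (arithmetic check, mine):
~~~
echo "8579144 - 8*1024*1024" | bc            # 190536 KiB above the 8Gi (8388608 KiB) request
echo "(8579144 - 8*1024*1024) / 1024" | bc   # ~186 MiB
~~~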

On the worker node:
~~~
[root@cnvqe-07 ~]# pgrep -f guest=test-bz-1885418_akaris-test-vm | while read pid ; do echo === $pid === ; ps aux | grep $pid |  grep -o '\-m [0-9]*' ; pmap $pid | tail ; done
=== 1633178 ===
-m 8192
00007f0d54e16000     12K rw-s-   [ anon ]
00007f0d54e19000      4K rw-s- zero (deleted)
00007f0d54e1a000      4K r---- ld-2.28.so
00007f0d54e1b000      4K rw--- ld-2.28.so
00007f0d54e1c000      4K rw---   [ anon ]
00007fffc86ef000    136K rw---   [ stack ]
00007fffc879b000     12K r----   [ anon ]
00007fffc879e000      8K r-x--   [ anon ]
ffffffffff600000      4K r-x--   [ anon ]
 total         13640628K
[root@cnvqe-07 ~]# cgroup=$(cat /proc/$(pgrep -f guest=test-bz-1885418_akaris-test-vm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do  date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit'  ; sleep 2 ; done
Thu Oct  8 09:58:48 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8704077824
total_rss_huge 8625586176
~~~
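
To put those numbers in perspective (my arithmetic, using the values from the describe and cgroup output above): the cgroup limit leaves only ~186 MiB of headroom above the 8 GiB of guest RAM, and total_rss is already ~108 MiB above the full guest RAM size, so less than 80 MiB remain before the OOM killer fires:
~~~
echo "(8785043456 - 8*2^30) / 2^20" | bc   # ~186 MiB headroom between cgroup limit and guest RAM
echo "(8704077824 - 8*2^30) / 2^20" | bc   # ~108 MiB of that already used by total_rss
~~~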

Compare this to https://bugzilla.redhat.com/show_bug.cgi?id=1885418#c4

It's clear at this point that the issue will reproduce, even without testing, given that the cgroup limits and VM memory are exactly the same.

Connect to VM in one terminal:
~~~
ssh -i test-key -p 20022 fedora.95.83
sudo -i # MUST BE ROOT
yum install stress-ng -y
stress-ng --vm 4 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1   # we are memory ballooning now, so I'm a bit more aggressive here
~~~

In other terminal run:
~~~
while true ; do scp -i test-key -P 20022  test.txt  fedora.95.83: ; done
~~~

Check journal and wait:
~~~
journalctl -f | grep -i oomkilled
~~~

Check in other terminal:
~~~
[root@cnvqe-07 ~]# cgroup=$(cat /proc/$(pgrep -f guest=test-bz-1885418_akaris-test-vm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do  date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit'  ; sleep 2 ; done
Thu Oct  8 09:58:48 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8704077824
total_rss_huge 8625586176
(...)
~~~

FYI: with the new version of CNV, the VM now runs with a memory balloon device:
~~~
[root@cnvqe-07 ~]# ps aux | grep akaris | grep qemu | grep ball
107      1633178 53.8  6.4 14091404 8452648 ?    Sl   09:32  20:53 /usr/libexec/qemu-kvm -name guest=test-bz-1885418_akaris-test-vm,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-test-bz-1885418_akar/master-key.aes -machine pc-q35-rhel8.2.0,accel=kvm,usb=off,dump-guest-core=off -cpu Cascadelake-Server,ss=on,vmx=on,hypervisor=on,tsc-adjust=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on -m 8192 -overcommit mem-lock=off -smp 3,sockets=1,dies=1,cores=3,threads=1 -object iothread,id=iothread1 -uuid c7ba2145-f452-55ea-93e8-bbe46f64e795 -smbios type=1,manufacturer=Red Hat,product=Container-native virtualization,version=2.5.0,uuid=c7ba2145-f452-55ea-93e8-bbe46f64e795,sku=2.5.0,family=Red Hat -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=20,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 -device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 -device virtio-serial-pci,id=virtio-serial0,bus=pci.2,addr=0x0 -blockdev {"driver":"file","filename":"/var/run/kubevirt/container-disks/disk_0.img","node-name":"libvirt-3-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-3-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-3-storage","backing":null} -blockdev {"driver":"file","filename":"/var/run/kubevirt-ephemeral-disks/disk-data/rootdisk/disk.qcow2","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-2-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-2-storage","backing":"libvirt-3-format"} -device virtio-blk-pci,scsi=off,bus=pci.3,addr=0x0,drive=libvirt-2-format,id=ua-rootdisk,bootindex=1,write-cache=on -blockdev {"driver":"file","filename":"/var/run/kubevirt-ephemeral-disks/cloud-init-data/test-bz-1885418/akaris-test-vm/noCloud.iso","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"} -device virtio-blk-pci,scsi=off,bus=pci.4,addr=0x0,drive=libvirt-1-format,id=ua-cloudinitdisk,write-cache=on -netdev tap,fds=22:23:24,id=hostua-nic0,vhost=on,vhostfds=25:26:27 -device virtio-net-pci,mq=on,vectors=8,host_mtu=1450,netdev=hostua-nic0,id=ua-nic0,mac=02:00:00:1d:9e:07,bus=pci.1,addr=0x0,romfile= -chardev socket,id=charserial0,fd=28,server,nowait -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=29,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -vnc 
vnc=unix:/var/run/kubevirt-private/403f1b15-67c9-4755-b8a9-167cc069a0c7/virt-vnc -device VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 -device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.6,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
~~~

Before (on CNV 2.4.2), this was not the case:
~~~
[root@openshift-worker-1 ~]# ps aux | grep qemu-kvm
107        22602  0.2  0.9 13820176 1305248 ?    Sl   Oct07   2:58 /usr/libexec/qemu-kvm -name guest=test_fedora-test-vm,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-test_fedora-test-vm/master-key.aes -machine pc-q35-rhel8.2.0,accel=kvm,usb=off,dump-guest-core=off -cpu Broadwell,vme=on,ss=on,vmx=on,f16c=on,rdrand=on,hypervisor=on,arat=on,tsc-adjust=on,umip=on,arch-capabilities=on,xsaveopt=on,pdpe1gb=on,abm=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,rtm=on,hle=on -m 9766 -overcommit mem-lock=off -smp 3,sockets=1,dies=1,cores=3,threads=1 -object iothread,id=iothread1 -uuid 7871df6c-d00f-5202-84d5-788821cd576c -smbios type=1,manufacturer=Red Hat,product=Container-native virtualization,version=2.4.2,uuid=7871df6c-d00f-5202-84d5-788821cd576c,sku=2.4.2,family=Red Hat -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=21,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 -device virtio-serial-pci,id=virtio-serial0,bus=pci.2,addr=0x0 -blockdev {"driver":"file","filename":"/var/run/kubevirt/container-disks/disk_0.img","node-name":"libvirt-3-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-3-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-3-storage","backing":null} -blockdev {"driver":"file","filename":"/var/run/kubevirt-ephemeral-disks/disk-data/rootdisk/disk.qcow2","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-2-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-2-storage","backing":"libvirt-3-format"} -device virtio-blk-pci,scsi=off,bus=pci.3,addr=0x0,drive=libvirt-2-format,id=ua-rootdisk,bootindex=1,write-cache=on -blockdev {"driver":"file","filename":"/var/run/kubevirt-ephemeral-disks/cloud-init-data/test/fedora-test-vm/noCloud.iso","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"} -device virtio-blk-pci,scsi=off,bus=pci.4,addr=0x0,drive=libvirt-1-format,id=ua-cloudinitdisk,write-cache=on -netdev tap,fds=23:24:25,id=hostua-nic0,vhost=on,vhostfds=26:27:28 -device virtio-net-pci,mq=on,vectors=8,host_mtu=8900,netdev=hostua-nic0,id=ua-nic0,mac=02:00:00:21:b0:6c,bus=pci.1,addr=0x0 -chardev socket,id=charserial0,fd=29,server,nowait -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=30,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -vnc vnc=unix:/var/run/kubevirt-private/9d236936-6f31-41a0-a5e4-bd374e11a580/virt-vnc -device VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 -object rng-random,id=objrng0,filename=/dev/urandom -device 
virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.5,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
~~~

After a while, I get the OOM kill. Note that this time I needed to run stress-ng as root and with --vm 4 to trigger it:
~~~
[root@akaris-test-vm ~]# stress-ng --vm 4 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1
stress-ng: debug: [11310] 3 processors online, 3 processors configured
stress-ng: info:  [11310] dispatching hogs: 4 vm, 1 hdd, 1 io
stress-ng: debug: [11310] cache allocate: default cache size: 16384K
stress-ng: debug: [11310] starting stressors
stress-ng: debug: [11311] stress-ng-vm: started [11311] (instance 0)
stress-ng: debug: [11312] stress-ng-hdd: started [11312] (instance 0)
stress-ng: debug: [11311] stress-ng-vm using method 'all'
stress-ng: debug: [11310] 6 stressors spawned
stress-ng: debug: [11317] stress-ng-vm: started [11317] (instance 3)
stress-ng: debug: [11314] stress-ng-vm: started [11314] (instance 1)
stress-ng: debug: [11317] stress-ng-vm using method 'all'
stress-ng: debug: [11315] stress-ng-vm: started [11315] (instance 2)
stress-ng: debug: [11314] stress-ng-vm using method 'all'
stress-ng: debug: [11315] stress-ng-vm using method 'all'
stress-ng: debug: [11313] stress-ng-io: started [11313] (instance 0)

~~~

~~~
(...)
Thu Oct  8 10:17:47 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8697028608
total_rss_huge 8623489024
Thu Oct  8 10:17:49 UTC 2020
hierarchical_memory_limit 8785043456
total_rss 8649449472
total_rss_huge 8613003264
Thu Oct  8 10:17:51 UTC 2020
cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-pod13874f96_492a_4672_a6d9_45df3b521b6e.slice/crio-1e7eb162cadbda84b46f3a018e1f6576fd9e9521c44d6286f69d2f12eb7509b7.scope/memory.stat: No such file or directory
Thu Oct  8 10:17:53 UTC 2020
~~~

~~~
[root@cnvqe-07 ~]# journalctl -f | grep -i oomkilled
Oct 08 10:17:49 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:17:49.861166    3275 status_manager.go:564] Patch status for pod "virt-launcher-akaris-test-vm-nb4fg_test-bz-1885418(13874f96-492a-4672-a6d9-45df3b521b6e)" with "{\"metadata\":{\"uid\":\"13874f96-492a-4672-a6d9-45df3b521b6e\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"lastTransitionTime\":\"2020-10-08T10:17:49Z\",\"message\":\"containers with unready status: [compute]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2020-10-08T10:17:49Z\",\"message\":\"containers with unready status: [compute]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://1e7eb162cadbda84b46f3a018e1f6576fd9e9521c44d6286f69d2f12eb7509b7\",\"image\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:c64a9aa0ba5c598e999929049d52afb5959e91debf38aed6937f53fe4ecbc84c\",\"imageID\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:b2ecfb71c9ff2e898f9864b56573f724f4cc7ae510c038f3642e99b351676d56\",\"lastState\":{},\"name\":\"compute\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://1e7eb162cadbda84b46f3a018e1f6576fd9e9521c44d6286f69d2f12eb7509b7\",\"exitCode\":137,\"finishedAt\":\"2020-10-08T10:17:49Z\",\"reason\":\"OOMKilled\",\"startedAt\":\"2020-10-08T09:31:58Z\"}}},{\"containerID\":\"cri-o://784e1ae476b77353011a988c4d9cc783d69595f540b31099f67ca678267c12e3\",\"image\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo:latest\",\"imageID\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331\",\"lastState\":{},\"name\":\"volumerootdisk\",\"ready\":true,\"restartCount\":0,\"started\":true,\"state\":{\"running\":{\"startedAt\":\"2020-10-08T09:32:05Z\"}}}]}}"
Oct 08 10:17:49 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:17:49.861217    3275 status_manager.go:572] Status for pod "virt-launcher-akaris-test-vm-nb4fg_test-bz-1885418(13874f96-492a-4672-a6d9-45df3b521b6e)" updated successfully: (6, {Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 09:31:58 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 10:17:49 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [compute]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 10:17:49 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [compute]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 09:31:43 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:10.46.41.22 PodIP:10.131.0.249 PodIPs:[{IP:10.131.0.249}] StartTime:2020-10-08 09:31:43 +0000 UTC InitContainerStatuses:[{Name:container-disk-binary State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-08 09:31:49 +0000 UTC,FinishedAt:2020-10-08 09:31:49 +0000 UTC,ContainerID:cri-o://72f571fc2ddf077ee3972a36adee2e4b1f5e0eed6b677219a079aa1d1f5d1053,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:c64a9aa0ba5c598e999929049d52afb5959e91debf38aed6937f53fe4ecbc84c ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:b2ecfb71c9ff2e898f9864b56573f724f4cc7ae510c038f3642e99b351676d56 ContainerID:cri-o://72f571fc2ddf077ee3972a36adee2e4b1f5e0eed6b677219a079aa1d1f5d1053 Started:<nil>} {Name:volumerootdisk-init State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-08 09:31:58 +0000 UTC,FinishedAt:2020-10-08 09:31:58 +0000 UTC,ContainerID:cri-o://7240f8f53aa37c4fb002ae4ea2b0986e8beec3e2bd51ba965925c80a9ebab8eb,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 ContainerID:cri-o://7240f8f53aa37c4fb002ae4ea2b0986e8beec3e2bd51ba965925c80a9ebab8eb Started:<nil>}] ContainerStatuses:[{Name:compute State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:137,Signal:0,Reason:OOMKilled,Message:,StartedAt:2020-10-08 09:31:58 +0000 UTC,FinishedAt:2020-10-08 10:17:49 +0000 UTC,ContainerID:cri-o://1e7eb162cadbda84b46f3a018e1f6576fd9e9521c44d6286f69d2f12eb7509b7,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:c64a9aa0ba5c598e999929049d52afb5959e91debf38aed6937f53fe4ecbc84c ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:b2ecfb71c9ff2e898f9864b56573f724f4cc7ae510c038f3642e99b351676d56 ContainerID:cri-o://1e7eb162cadbda84b46f3a018e1f6576fd9e9521c44d6286f69d2f12eb7509b7 Started:0xc004faa9ca} {Name:volumerootdisk State:{Waiting:nil Running:&ContainerStateRunning{StartedAt:2020-10-08 09:32:05 +0000 UTC,} Terminated:nil} 
LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 ContainerID:cri-o://784e1ae476b77353011a988c4d9cc783d69595f540b31099f67ca678267c12e3 Started:0xc004faa9cb}] QOSClass:Guaranteed EphemeralContainerStatuses:[]})
Oct 08 10:18:27 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:18:27.972579    3275 status_manager.go:564] Patch status for pod "virt-launcher-akaris-test-vm-nb4fg_test-bz-1885418(13874f96-492a-4672-a6d9-45df3b521b6e)" with "{\"metadata\":{\"uid\":\"13874f96-492a-4672-a6d9-45df3b521b6e\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"message\":\"containers with unready status: [compute volumerootdisk]\",\"type\":\"Ready\"},{\"message\":\"containers with unready status: [compute volumerootdisk]\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://1e7eb162cadbda84b46f3a018e1f6576fd9e9521c44d6286f69d2f12eb7509b7\",\"image\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:c64a9aa0ba5c598e999929049d52afb5959e91debf38aed6937f53fe4ecbc84c\",\"imageID\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:b2ecfb71c9ff2e898f9864b56573f724f4cc7ae510c038f3642e99b351676d56\",\"lastState\":{},\"name\":\"compute\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://1e7eb162cadbda84b46f3a018e1f6576fd9e9521c44d6286f69d2f12eb7509b7\",\"exitCode\":137,\"finishedAt\":\"2020-10-08T10:17:49Z\",\"reason\":\"OOMKilled\",\"startedAt\":\"2020-10-08T09:31:58Z\"}}},{\"image\":\"docker.io/kubevirt/fedoraA-cloud-container-disk-demo\",\"imageID\":\"\",\"lastState\":{},\"name\":\"volumerootdisk\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"waiting\":{\"reason\":\"PodInitializing\"}}}],\"phase\":\"Pending\"}}"
Oct 08 10:18:27 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:18:27.972622    3275 status_manager.go:572] Status for pod "virt-launcher-akaris-test-vm-nb4fg_test-bz-1885418(13874f96-492a-4672-a6d9-45df3b521b6e)" updated successfully: (8, {Phase:Pending Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 09:31:58 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 10:17:49 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [compute volumerootdisk]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 10:17:49 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [compute volumerootdisk]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 09:31:43 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:10.46.41.22 PodIP:10.131.0.249 PodIPs:[{IP:10.131.0.249}] StartTime:2020-10-08 09:31:43 +0000 UTC InitContainerStatuses:[{Name:container-disk-binary State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-08 09:31:49 +0000 UTC,FinishedAt:2020-10-08 09:31:49 +0000 UTC,ContainerID:cri-o://72f571fc2ddf077ee3972a36adee2e4b1f5e0eed6b677219a079aa1d1f5d1053,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:c64a9aa0ba5c598e999929049d52afb5959e91debf38aed6937f53fe4ecbc84c ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:b2ecfb71c9ff2e898f9864b56573f724f4cc7ae510c038f3642e99b351676d56 ContainerID:cri-o://72f571fc2ddf077ee3972a36adee2e4b1f5e0eed6b677219a079aa1d1f5d1053 Started:<nil>} {Name:volumerootdisk-init State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-08 09:31:58 +0000 UTC,FinishedAt:2020-10-08 09:31:58 +0000 UTC,ContainerID:cri-o://7240f8f53aa37c4fb002ae4ea2b0986e8beec3e2bd51ba965925c80a9ebab8eb,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 ContainerID:cri-o://7240f8f53aa37c4fb002ae4ea2b0986e8beec3e2bd51ba965925c80a9ebab8eb Started:<nil>}] ContainerStatuses:[{Name:compute State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:137,Signal:0,Reason:OOMKilled,Message:,StartedAt:2020-10-08 09:31:58 +0000 UTC,FinishedAt:2020-10-08 10:17:49 +0000 UTC,ContainerID:cri-o://1e7eb162cadbda84b46f3a018e1f6576fd9e9521c44d6286f69d2f12eb7509b7,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:c64a9aa0ba5c598e999929049d52afb5959e91debf38aed6937f53fe4ecbc84c ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:b2ecfb71c9ff2e898f9864b56573f724f4cc7ae510c038f3642e99b351676d56 ContainerID:cri-o://1e7eb162cadbda84b46f3a018e1f6576fd9e9521c44d6286f69d2f12eb7509b7 Started:0xc0046d47ad} {Name:volumerootdisk State:{Waiting:&ContainerStateWaiting{Reason:PodInitializing,Message:,} Running:nil 
Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo ImageID: ContainerID: Started:0xc0046d47ae}] QOSClass:Guaranteed EphemeralContainerStatuses:[]})
~~~

Comment 16 Andreas Karis 2020-10-08 10:41:15 UTC
So it has really become more difficult to trigger the OOM. I'm still trying to find the right stressor.

E.g. ...:
 stress-ng --vm 4 --vm-bytes 100% --vm-method all --vm-locked --verify -t 10m -v --hdd 1 --io 1 --tun 2

Comment 17 Andreas Karis 2020-10-08 10:45:16 UTC
[root@cnvqe-07 ~]# journalctl -f | grep -i oomkilled
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.856521    3275 status_manager.go:564] Patch status for pod "virt-launcher-akaris-test-vm-xkqdw_test-bz-1885418(64516f15-013a-4c1d-8f03-9b0f1ac98ace)" with "{\"metadata\":{\"uid\":\"64516f15-013a-4c1d-8f03-9b0f1ac98ace\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"lastTransitionTime\":\"2020-10-08T10:40:12Z\",\"message\":\"containers with unready status: [compute]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2020-10-08T10:40:12Z\",\"message\":\"containers with unready status: [compute]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c\",\"image\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:c64a9aa0ba5c598e999929049d52afb5959e91debf38aed6937f53fe4ecbc84c\",\"imageID\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:b2ecfb71c9ff2e898f9864b56573f724f4cc7ae510c038f3642e99b351676d56\",\"lastState\":{},\"name\":\"compute\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c\",\"exitCode\":137,\"finishedAt\":\"2020-10-08T10:40:12Z\",\"reason\":\"OOMKilled\",\"startedAt\":\"2020-10-08T10:18:55Z\"}}},{\"containerID\":\"cri-o://2341b1474239dc3ed0a1755dfb51b7836248edc08e1682e60fbb0ad9995cee85\",\"image\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo:latest\",\"imageID\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331\",\"lastState\":{},\"name\":\"volumerootdisk\",\"ready\":true,\"restartCount\":0,\"started\":true,\"state\":{\"running\":{\"startedAt\":\"2020-10-08T10:19:02Z\"}}}]}}"
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.856567    3275 status_manager.go:572] Status for pod "virt-launcher-akaris-test-vm-xkqdw_test-bz-1885418(64516f15-013a-4c1d-8f03-9b0f1ac98ace)" updated successfully: (6, {Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 10:18:55 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 10:40:12 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [compute]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 10:40:12 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [compute]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 10:18:38 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:10.46.41.22 PodIP:10.131.1.9 PodIPs:[{IP:10.131.1.9}] StartTime:2020-10-08 10:18:38 +0000 UTC InitContainerStatuses:[{Name:container-disk-binary State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-08 10:18:45 +0000 UTC,FinishedAt:2020-10-08 10:18:45 +0000 UTC,ContainerID:cri-o://fb3dbd220dea2b2dc5d10837774ac75071fbd33fdf9ed2d3990f7110a18e1191,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:c64a9aa0ba5c598e999929049d52afb5959e91debf38aed6937f53fe4ecbc84c ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:b2ecfb71c9ff2e898f9864b56573f724f4cc7ae510c038f3642e99b351676d56 ContainerID:cri-o://fb3dbd220dea2b2dc5d10837774ac75071fbd33fdf9ed2d3990f7110a18e1191 Started:<nil>} {Name:volumerootdisk-init State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-08 10:18:53 +0000 UTC,FinishedAt:2020-10-08 10:18:54 +0000 UTC,ContainerID:cri-o://178dc90fd4221e4b20e6ed80187acde6bd17671c94d807f606eafd51519aa5df,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 ContainerID:cri-o://178dc90fd4221e4b20e6ed80187acde6bd17671c94d807f606eafd51519aa5df Started:<nil>}] ContainerStatuses:[{Name:compute State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:137,Signal:0,Reason:OOMKilled,Message:,StartedAt:2020-10-08 10:18:55 +0000 UTC,FinishedAt:2020-10-08 10:40:12 +0000 UTC,ContainerID:cri-o://8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:c64a9aa0ba5c598e999929049d52afb5959e91debf38aed6937f53fe4ecbc84c ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:b2ecfb71c9ff2e898f9864b56573f724f4cc7ae510c038f3642e99b351676d56 ContainerID:cri-o://8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c Started:0xc0025ac75a} {Name:volumerootdisk State:{Waiting:nil Running:&ContainerStateRunning{StartedAt:2020-10-08 10:19:02 +0000 UTC,} Terminated:nil} 
LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 ContainerID:cri-o://2341b1474239dc3ed0a1755dfb51b7836248edc08e1682e60fbb0ad9995cee85 Started:0xc0025ac75b}] QOSClass:Guaranteed EphemeralContainerStatuses:[]})
^C

And here is the corresponding output from the journal:
~~~
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.028188    3275 config.go:278] Setting pods for source file
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.340260    3275 prober.go:174] HTTP-Probe Host: https://10.131.0.2, Port: 8443, Path: /healthz
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.340291    3275 prober.go:177] HTTP-Probe Headers: map[]
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.344666    3275 http.go:128] Probe succeeded for https://10.131.0.2:8443/healthz, Response: {200 OK 200 HTTP/1.1 1 1 map[Content-Length:[381] Content-Type:[application/json] Date:[Thu, 08 Oct 2020 10:40:11 GMT]] 0xc002afa200 381 [] true false map[] 0xc001596700 0xc00801fd90}
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.344706    3275 prober.go:126] Liveness probe for "virt-handler-w5slc_openshift-cnv(b1111e19-148c-41c0-bc65-8b9d7cc65bf4):virt-handler" succeeded
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.371453    3275 prober.go:159] Exec-Probe Pod: mdns-publisher-cnvqe-07.lab.eng.tlv2.redhat.com, Container: mdns-publisher, Command: [pgrep mdns-publisher]
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.406596    3275 handler.go:125] Unable to get network stats from pid 1806771: couldn't read network stats: failure opening /proc/1806771/net/dev: open /proc/1806771/net/dev: no such file or directory
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.406623    3275 handler.go:181] Unable to get Process Stats: couldn't open cpu cgroup procs file /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod668ab052_7ce8_4264_b726_a363bbf35f6a.slice/crio-3ae79fe3f553052068e9c85d724e67003bd8eb903d20dd2b46a0f15d03bf903c.scope/cgroup.procs : open /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod668ab052_7ce8_4264_b726_a363bbf35f6a.slice/crio-3ae79fe3f553052068e9c85d724e67003bd8eb903d20dd2b46a0f15d03bf903c.scope/cgroup.procs: no such file or directory
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.435575    3275 exec.go:60] Exec probe response: "1\n"
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.435598    3275 prober.go:126] Liveness probe for "mdns-publisher-cnvqe-07.lab.eng.tlv2.redhat.com_openshift-kni-infra(4c3626876428bd9e1291590fe6f28e21):mdns-publisher" succeeded
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.463212    3275 httplog.go:94] "HTTP" verb="GET" URI="/metrics" latency="12.735098ms" userAgent="Prometheus/2.21.0" srcIP="10.46.41.21:60926" resp=200
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.485477    3275 handler.go:181] Unable to get Process Stats: couldn't open cpu cgroup procs file /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod48b967a9_8729_459a_abfc_b5b5b4416480.slice/crio-5eeddbf7dce3039dfc11bf0f27e905be721eb9025d8a80c2f2f3cad9622e6ade.scope/cgroup.procs : open /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod48b967a9_8729_459a_abfc_b5b5b4416480.slice/crio-5eeddbf7dce3039dfc11bf0f27e905be721eb9025d8a80c2f2f3cad9622e6ade.scope/cgroup.procs: no such file or directory
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.490357    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: hostpath-provisioner-gnnl9, container: hostpath-provisioner, container id: 3e4e7e4097a34df1e1826bd58197ffc02b06b44461f398c06e3ffc59cc71dcd5, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: W1008 10:40:11.506689    3275 cpu_manager.go:397] [cpumanager] reconcileState: ignoring terminated container (pod: kubevirt-node-labeller-qjb28, container id: 0423170fc39fe5eb26dc807855f98177d3d399e4262cc952051a9a9444f9278f)
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: W1008 10:40:11.506731    3275 cpu_manager.go:397] [cpumanager] reconcileState: ignoring terminated container (pod: kubevirt-node-labeller-qjb28, container id: 308fe4998c826d377c54856ddfc7a8705abcd778fac873b3105f4f25c43044e7)
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.506757    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: kubevirt-node-labeller-qjb28, container: kubevirt-node-labeller-sleeper, container id: c996bcd36104cdc6ccadead0c79fc424b4f0a995fe0749e944745dfbe0a11c3b, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.522146    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: csi-cephfsplugin-provisioner-67bff7547f-b6md6, container: csi-attacher, container id: 9069342d674f6a88be93ecb59a1a9b5eeff08c6bf0d53fe237733eb8f3bc9f43, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.537146    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: csi-cephfsplugin-provisioner-67bff7547f-b6md6, container: csi-snapshotter, container id: 954749d9b47d9c5452d45ed2dcf0567b7cd5df381205b0f0f9cbfe34ebcd9bcc, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.552307    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: csi-cephfsplugin-provisioner-67bff7547f-b6md6, container: csi-resizer, container id: 5608bfd23cc9882204fb978468632ddfc56277f8e59c00764faeda191dfd084e, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.567834    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: csi-cephfsplugin-provisioner-67bff7547f-b6md6, container: csi-provisioner, container id: c04860b544555b35a5f8fe955d91eb85403c32de155cc082800cedc1ef67c2dc, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.583181    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: csi-cephfsplugin-provisioner-67bff7547f-b6md6, container: csi-cephfsplugin, container id: df791cf3d2f8393c96f55af3910218e578c689f42f6bfc56cb25aded66046894, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.598351    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: csi-cephfsplugin-provisioner-67bff7547f-b6md6, container: liveness-prometheus, container id: 62c0f7e98cd6afa0bb4577188a840bd8d0fd8f5ff25956d8aa5134fd5132fb38, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.613563    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: ocs-operator-bvgqw, container: registry-server, container id: 72f75b212f5d14b6c8d7c255ce087faa05cd201a6556c073215c926154a28694, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.628764    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: virt-handler-w5slc, container: virt-handler, container id: 3d58b7e2e240c569dfd59116b2623a0e2562b6c953237274f84672fe9ae14268, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.645032    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: machine-config-daemon-btktx, container: machine-config-daemon, container id: 9ac5976e0319c61607dca9e3274fc554590c217da294df302cf1b3362a094f32, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.661151    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: machine-config-daemon-btktx, container: oauth-proxy, container id: 5db20c6c3b2e428427a9e57791105d9597ff99c06d95fa80b5712481744b55a4, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.676577    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: dns-default-wcxvm, container: dns, container id: 5b4d29eadc10c23d68d9f664bf9f8464d4bba417396c9a97c697a7ad83e94c56, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.691408    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: dns-default-wcxvm, container: kube-rbac-proxy, container id: 8936feb35a6682d4e5ddcafdee04f281d3a0ad304e1cd86f6886b793f2516ff6, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.706884    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: dns-default-wcxvm, container: dns-node-resolver, container id: 55e088c60c37ac0602c713c17c566c9fba0077e5b9e0c09cd914fe3d6bd58fdd, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.722852    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: ocs-operator-69d556ccb6-s777s, container: ocs-operator, container id: a864ecda66a0bb0b97f9546026f8c6028d4fcdcd52f96d4b8600dc892e4268c5, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.737401    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: noobaa-operator-77b6d6445c-t65l6, container: noobaa-operator, container id: 61cd959579299c58b0820e792a71eefe093ed4867858900aec6a8c841be47d30, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.741087    3275 handler.go:181] Unable to get Process Stats: couldn't open cpu cgroup procs file /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod4c3626876428bd9e1291590fe6f28e21.slice/crio-2c199aed83f27eeedb7da8c60fb32ca8ada55f1453bcdf66a1228f57e0c62940.scope/cgroup.procs : open /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod4c3626876428bd9e1291590fe6f28e21.slice/crio-2c199aed83f27eeedb7da8c60fb32ca8ada55f1453bcdf66a1228f57e0c62940.scope/cgroup.procs: no such file or directory
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.751851    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: rook-ceph-crashcollector-cnvqe-07.lab.eng.tlv2.redhat.com-hppzj, container: ceph-crash, container id: b624c2fb9f59d3841504a52106a5471308fe0e882cdc29a441110eefac1d550f, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.767803    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: mdns-publisher-cnvqe-07.lab.eng.tlv2.redhat.com, container: mdns-publisher, container id: d1807cb8cfa4057f973ab471177c7dc4f8a4ddd299b0043288f611ca37914a24, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.782778    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: virt-launcher-vm-container-disk-rdcks, container: compute, container id: 0e0fe148383277041fdd1933388713e83ca42c7f7c6a300128313e0a6ee3c86e, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.798077    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: virt-launcher-vm-container-disk-rdcks, container: volumecontainerdisk, container id: 94ec3ad58be1fd9c3f5f0f70774e56b162315ee05e1bfaba519da4f92c115cd0, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.812621    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: ovs-xmhwn, container: openvswitch, container id: a7ff48ad7b26fa87c0949fc47d27ec1ebe904da375db46e0d738eb401844dfdd, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: W1008 10:40:11.831760    3275 cpu_manager.go:397] [cpumanager] reconcileState: ignoring terminated container (pod: multus-q2sgs, container id: a2585c1864c6c5fbc0b2c68a94e3d7b6f4c9150b0ead3db67a5389e1c3f6b2c6)
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.831816    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: multus-q2sgs, container: kube-multus, container id: 5df4e4361a321cb7b7c87e4dc15319ef6110c0f0df8f803179a951cdc61e692b, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com kernel: CPU 2/KVM invoked oom-killer: gfp_mask=0x7000c0(GFP_KERNEL_ACCOUNT), nodemask=(null), order=0, oom_score_adj=-998
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com kernel: CPU 2/KVM cpuset=crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope mems_allowed=0-1
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com kernel: CPU: 2 PID: 1985360 Comm: CPU 2/KVM Not tainted 4.18.0-193.24.1.el8_2.dt1.x86_64 #1
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.897728    3275 kubelet.go:1914] SyncLoop (housekeeping)
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.915913    3275 prober.go:159] Exec-Probe Pod: keepalived-cnvqe-07.lab.eng.tlv2.redhat.com, Container: keepalived, Command: [/bin/bash -c kill -s SIGUSR1 "$(pgrep -o keepalived)" && ! grep -q "State = FAULT" /tmp/keepalived.data
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: ]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Hardware name: Dell Inc. PowerEdge R440/04JN2K, BIOS 2.4.8 11/27/2019
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Call Trace:
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  dump_stack+0x5c/0x80
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  dump_header+0x6e/0x27a
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  oom_kill_process.cold.28+0xb/0x10
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  out_of_memory+0x1ba/0x490
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  mem_cgroup_out_of_memory+0x49/0x80
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  try_charge+0x6fa/0x780
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  __memcg_kmem_charge_memcg+0x33/0x90
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  __memcg_kmem_charge+0x80/0x190
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  __alloc_pages_nodemask+0x1eb/0x280
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  __get_free_pages+0xa/0x30
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  mmu_topup_memory_caches+0x8c/0xf0 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  tdp_page_fault+0x69/0x2b0 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  ? hrtimer_cancel+0x15/0x20
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  ? restart_apic_timer+0x9c/0x130 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  kvm_mmu_page_fault+0x75/0x5d0 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  ? vmx_set_msr+0x2f2/0xcd0 [kvm_intel]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  ? x86_virt_spec_ctrl+0x68/0xe0
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  vcpu_enter_guest+0x315/0x1680 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  ? __apic_accept_irq+0x1a1/0x2e0 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  kvm_arch_vcpu_ioctl_run+0xcf/0x540 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  kvm_vcpu_ioctl+0x232/0x5f0 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  ? __wake_up_common+0x7a/0x190
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  do_vfs_ioctl+0xa4/0x630
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  ksys_ioctl+0x60/0x90
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  __x64_sys_ioctl+0x16/0x20
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  do_syscall_64+0x5b/0x1a0
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  entry_SYSCALL_64_after_hwframe+0x65/0xca
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: RIP: 0033:0x7f1a499a787b
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Code: 0f 1e fa 48 8b 05 0d 96 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 95 2c 00 f7 d8 64 89 01 48
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: RSP: 002b:00007f1a327fb618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: RAX: ffffffffffffffda RBX: 0000563f87375200 RCX: 00007f1a499a787b
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000027
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: RBP: 0000000000000000 R08: 0000563f856b14f0 R09: 00007f1a24472890
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000001
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: R13: 0000563f8568d220 R14: 0000000000000000 R15: 00007f1a4eb44000
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Task in /kubepods.slice/kubepods-pod64516f15_013a_4c1d_8f03_9b0f1ac98ace.slice/crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope killed as a result of limit of /kubepods.slice/kubepods-pod64516f15_013a_4c1d_8f03_9b0f1ac98ace.slice/crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.046780    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: nmstate-handler-tsbgl, container: nmstate-handler, container id: c3338de77ed8ba0a40955fcd0fdfaba649b073451edae442efa385bd4d07d776, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: memory: usage 8579104kB, limit 8579144kB, failcnt 0
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: memory+swap: usage 8579104kB, limit 8579144kB, failcnt 665
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.079855    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: csi-rbdplugin-kcpbn, container: driver-registrar, container id: ff817195edfbec3867c0df1f9a379100648f55695cd6437132aa669b2836cb9b, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: kmem: usage 86108kB, limit 9007199254740988kB, failcnt 0
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.100622    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: csi-rbdplugin-kcpbn, container: csi-rbdplugin, container id: 72550366a822834701df2ea3167de02feeeed7c1bdc2ca7d9fa4cda8e9f7059d, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Memory cgroup stats for /kubepods.slice/kubepods-pod64516f15_013a_4c1d_8f03_9b0f1ac98ace.slice/crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope: cache:108KB rss:8492728KB rss_huge:8419328KB shmem:0KB mapped_file:0KB dirty:264KB writeback:132KB swap:0KB inactive_anon:0KB active_anon:8492928KB inactive_file:0KB active_file:0KB unevictable:0KB
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.136583    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: csi-rbdplugin-kcpbn, container: liveness-prometheus, container id: 14294bb7416c897001aa24d2423e2242a7d2f74cb2487cd2e2ecbe78f506028e, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [1984292]     0 1984292   245291    12973   598016        0          -998 virt-launcher
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [1984339]     0 1984339   337463    15075   651264        0          -998 virt-launcher
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.156332    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: coredns-cnvqe-07.lab.eng.tlv2.redhat.com, container: coredns, container id: b1453b5bac68a1fde489b22b969b31065ee711a442e1254df5450ea6f8990d98, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.158656    3275 handler.go:181] Unable to get Process Stats: couldn't open cpu cgroup procs file /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod004679a3_085a_4f6f_b08f_4a4c6da44a42.slice/crio-a8e507612e327d36f7a5ebc5b84f6860457261479af9f6be6be2e8bb03922ebc.scope/cgroup.procs : open /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod004679a3_085a_4f6f_b08f_4a4c6da44a42.slice/crio-a8e507612e327d36f7a5ebc5b84f6860457261479af9f6be6be2e8bb03922ebc.scope/cgroup.procs: no such file or directory
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [1984363]     0 1984363   343158     6943   544768        0          -998 libvirtd
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [1984365]     0 1984365    43479     3798   376832        0          -998 virtlogd
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.176353    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: coredns-cnvqe-07.lab.eng.tlv2.redhat.com, container: coredns-monitor, container id: 73dd8accf980daee53a78e94704242410ac16ad04f8d6a7384234394eaa955f4, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [1985348]   107 1985348  3522840  2114432 18427904        0          -998 qemu-kvm
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.195926    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: bridge-marker-964s6, container: bridge-marker, container id: ae62e771f7f74749a1d0264937d4d590cae9494a5f170cac9b1d01cb3578a3e0, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Memory cgroup out of memory: Killed process 1984292 (virt-launcher) total-vm:981164kB, anon-rss:18012kB, file-rss:33880kB, shmem-rss:0kB, UID:0
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.201850    3275 handler.go:325] Added event &{/kubepods.slice/kubepods-pod64516f15_013a_4c1d_8f03_9b0f1ac98ace.slice/crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope 2020-10-08 10:40:09.228958979 +0000 UTC m=+256864.081089745 oom {<nil>}}
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.201893    3275 manager.go:1215] Created an OOM event in container "/kubepods.slice/kubepods-pod64516f15_013a_4c1d_8f03_9b0f1ac98ace.slice/crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope" at 2020-10-08 10:40:09.228958979 +0000 UTC m=+256864.081089745
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.201911    3275 handler.go:325] Added event &{/kubepods.slice/kubepods-pod64516f15_013a_4c1d_8f03_9b0f1ac98ace.slice/crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope 2020-10-08 10:40:09.228958979 +0000 UTC m=+256864.081089745 oomKill {0xc002e91860}}
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.210805    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: router-default-568dc569b-d2t9d, container: router, container id: 8f7f51de94c044721209b6173e8da29d713e0a436d6c15273a1a3bfda3b05ce1, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: oom_reaper: reaped process 1984292 (virt-launcher), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.225396    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-84d8f468xqc5, container: rgw, container id: 070ba8479a2ccbb89ad10fed07b60f22790f9dd07a640f763bf8a0c70ae4860a, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.242433    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: rook-ceph-operator-7857679f9f-gf5sr, container: rook-ceph-operator, container id: f86aa7c908f92664ecd0a0ce60f03bf4eba7cf89549ad8fa5b019f2782b1d750, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.246656    3275 exec.go:60] Exec probe response: ""
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.246672    3275 prober.go:126] Liveness probe for "keepalived-cnvqe-07.lab.eng.tlv2.redhat.com_openshift-kni-infra(05a6eab180a118d7691d3126b4a9049f):keepalived" succeeded
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.258137    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: sdn-cdqgf, container: sdn, container id: 8af1326acb147162ee28cb0544d3e2fc267a184292f133aa037d0fadf137452b, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.274528    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: sdn-cdqgf, container: kube-rbac-proxy, container id: 33a3f2d5469f250467f6d2c8d38f1293da8c58e10a00b00c4ba2068bea596c98, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.289553    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: node-ca-4jzhx, container: node-ca, container id: 11f6d6956d4f0aaf6b932c5f69920a626b34442ee19eccc217757fcb0c61db74, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.305485    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: ovs-cni-amd64-j7qmp, container: ovs-cni-marker, container id: e23761376bdbc124731dcb0e2a36ad9aecbd66445b61a010ecbc6eddc05f781b, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.320968    3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: noobaa-core-0, container: core, container id: f680e10e10d3f2d02f6916a59f1c8a67e38021680a28d95b4a6a9be5c890dad5, cpuset: "0-1,3-31,33,35-63")
~~~

Comment 18 Andreas Karis 2020-10-08 10:48:45 UTC
[root@cnvqe-07 ~]# journalctl --since "15 minutes ago"  | grep cnvqe-07.lab.eng.tlv2.redhat.com kernel:
grep: kernel:: No such file or directory
[root@cnvqe-07 ~]# journalctl --since "15 minutes ago"  | grep "cnvqe-07.lab.eng.tlv2.redhat.com kernel:"
Oct 08 10:35:54 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Oct 08 10:35:54 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_UP): vethb5fd8ac6: link is not ready
Oct 08 10:35:54 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethb5fd8ac6: link becomes ready
Oct 08 10:35:54 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Oct 08 10:35:54 cnvqe-07.lab.eng.tlv2.redhat.com kernel: device vethb5fd8ac6 entered promiscuous mode
Oct 08 10:36:20 cnvqe-07.lab.eng.tlv2.redhat.com kernel: device vethb5fd8ac6 left promiscuous mode
Oct 08 10:37:49 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Oct 08 10:37:49 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_UP): vethe45f748b: link is not ready
Oct 08 10:37:49 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethe45f748b: link becomes ready
Oct 08 10:37:49 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Oct 08 10:37:49 cnvqe-07.lab.eng.tlv2.redhat.com kernel: device vethe45f748b entered promiscuous mode
Oct 08 10:38:22 cnvqe-07.lab.eng.tlv2.redhat.com kernel: device vethe45f748b left promiscuous mode
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com kernel: CPU 2/KVM invoked oom-killer: gfp_mask=0x7000c0(GFP_KERNEL_ACCOUNT), nodemask=(null), order=0, oom_score_adj=-998
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com kernel: CPU 2/KVM cpuset=crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope mems_allowed=0-1
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com kernel: CPU: 2 PID: 1985360 Comm: CPU 2/KVM Not tainted 4.18.0-193.24.1.el8_2.dt1.x86_64 #1
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Hardware name: Dell Inc. PowerEdge R440/04JN2K, BIOS 2.4.8 11/27/2019
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Call Trace:
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  dump_stack+0x5c/0x80
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  dump_header+0x6e/0x27a
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  oom_kill_process.cold.28+0xb/0x10
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  out_of_memory+0x1ba/0x490
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  mem_cgroup_out_of_memory+0x49/0x80
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  try_charge+0x6fa/0x780
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  __memcg_kmem_charge_memcg+0x33/0x90
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  __memcg_kmem_charge+0x80/0x190
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  __alloc_pages_nodemask+0x1eb/0x280
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  __get_free_pages+0xa/0x30
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  mmu_topup_memory_caches+0x8c/0xf0 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  tdp_page_fault+0x69/0x2b0 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  ? hrtimer_cancel+0x15/0x20
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  ? restart_apic_timer+0x9c/0x130 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  kvm_mmu_page_fault+0x75/0x5d0 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  ? vmx_set_msr+0x2f2/0xcd0 [kvm_intel]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  ? x86_virt_spec_ctrl+0x68/0xe0
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  vcpu_enter_guest+0x315/0x1680 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  ? __apic_accept_irq+0x1a1/0x2e0 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  kvm_arch_vcpu_ioctl_run+0xcf/0x540 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  kvm_vcpu_ioctl+0x232/0x5f0 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  ? __wake_up_common+0x7a/0x190
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  do_vfs_ioctl+0xa4/0x630
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  ksys_ioctl+0x60/0x90
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  __x64_sys_ioctl+0x16/0x20
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  do_syscall_64+0x5b/0x1a0
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel:  entry_SYSCALL_64_after_hwframe+0x65/0xca
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: RIP: 0033:0x7f1a499a787b
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Code: 0f 1e fa 48 8b 05 0d 96 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 95 2c 00 f7 d8 64 89 01 48
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: RSP: 002b:00007f1a327fb618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: RAX: ffffffffffffffda RBX: 0000563f87375200 RCX: 00007f1a499a787b
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000027
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: RBP: 0000000000000000 R08: 0000563f856b14f0 R09: 00007f1a24472890
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000001
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: R13: 0000563f8568d220 R14: 0000000000000000 R15: 00007f1a4eb44000
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Task in /kubepods.slice/kubepods-pod64516f15_013a_4c1d_8f03_9b0f1ac98ace.slice/crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope killed as a result of limit of /kubepods.slice/kubepods-pod64516f15_013a_4c1d_8f03_9b0f1ac98ace.slice/crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: memory: usage 8579104kB, limit 8579144kB, failcnt 0
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: memory+swap: usage 8579104kB, limit 8579144kB, failcnt 665
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: kmem: usage 86108kB, limit 9007199254740988kB, failcnt 0
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Memory cgroup stats for /kubepods.slice/kubepods-pod64516f15_013a_4c1d_8f03_9b0f1ac98ace.slice/crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope: cache:108KB rss:8492728KB rss_huge:8419328KB shmem:0KB mapped_file:0KB dirty:264KB writeback:132KB swap:0KB inactive_anon:0KB active_anon:8492928KB inactive_file:0KB active_file:0KB unevictable:0KB
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [1984292]     0 1984292   245291    12973   598016        0          -998 virt-launcher
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [1984339]     0 1984339   337463    15075   651264        0          -998 virt-launcher
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [1984363]     0 1984363   343158     6943   544768        0          -998 libvirtd
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [1984365]     0 1984365    43479     3798   376832        0          -998 virtlogd
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [1985348]   107 1985348  3522840  2114432 18427904        0          -998 qemu-kvm
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Memory cgroup out of memory: Killed process 1984292 (virt-launcher) total-vm:981164kB, anon-rss:18012kB, file-rss:33880kB, shmem-rss:0kB, UID:0
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: oom_reaper: reaped process 1984292 (virt-launcher), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: k6t-eth0: port 2(tap0) entered disabled state
Oct 08 10:40:17 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Oct 08 10:40:17 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_UP): vetheff84337: link is not ready
Oct 08 10:40:17 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vetheff84337: link becomes ready

Comment 20 Andreas Karis 2020-10-08 11:31:24 UTC
I'm going off on a tangent, but...

The memory calculation fix also only affected the request calculation: while the output of `oc describe pod` is different and indeed shows a different value for the requested memory, we see the same memory limit as in my earlier tests:
~~~
        resources:
          requests:
            cpu: "3"
            memory: 10240M
          limits:
            cpu: "3"
            memory: 10240M
#          overcommitGuestOverhead: true
        memory:
          guest: 10240M
~~~

~~~
[root@cnvqe-01 ~]# oc apply -f akaris-vm.yaml 
virtualmachine.kubevirt.io/akaris-test-vm created
[root@cnvqe-01 ~]# cat akaris-vm.yaml 
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  annotations:
    kubevirt.io/latest-observed-api-version: v1alpha3
    kubevirt.io/storage-observed-api-version: v1alpha3
    name.os.template.kubevirt.io/fedora31: Fedora 31
  labels:
    app: akaris-test-vm
    flavor.template.kubevirt.io/Custom: "true"
    os.template.kubevirt.io/fedora31: "true"
    vm.kubevirt.io/template.namespace: openshift
    vm.kubevirt.io/template.revision: "1"
    vm.kubevirt.io/template.version: v0.9.1
    workload.template.kubevirt.io/server: "true"
  name: akaris-test-vm
spec:
  running: true
  template:
    metadata:
      labels:
        flavor.template.kubevirt.io/Custom: "true"
        kubevirt.io/domain: akaris-test-vm
        kubevirt.io/size: large
        os.template.kubevirt.io/fedora31: "true"
        vm.kubevirt.io/name: akaris-test-vm
        workload.template.kubevirt.io/server: "true"
    spec:
      domain:
        cpu:
          cores: 3
          model: host-model
        devices:
          disks:
          - bootOrder: 1
            disk:
              bus: virtio
            name: rootdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
          interfaces:
          - masquerade: {}
            model: virtio
            name: nic0
          networkInterfaceMultiqueue: true
          rng: {}
        features:
          acpi:
            enabled: true
        machine:
          type: q35
        resources:
          requests:
            cpu: "3"
            memory: 10240M
          limits:
            cpu: "3"
            memory: 10240M
#          overcommitGuestOverhead: true
        memory:
          guest: 10240M
      evictionStrategy: LiveMigrate
      hostname: akaris-test-vm
      networks:
      - name: nic0
        pod: {}
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: docker.io/kubevirt/fedora-cloud-container-disk-demo
        name: rootdisk
      - cloudInitNoCloud:
          userData: |
            #cloud-config
            name: default
            hostname: akaris-test-vm
            ssh_authorized_keys:
              - >-
                ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDEn6wI5YZbCbwUX8RlBYkkYGFK8vTXyeuOJWf6WrWcLi2rKfCoXWNxAw2ekLqIQRpc2xI5wCquWPeZ3aiHpqlocbFgLuGEBIqciJtGnI1IGOZtdccfEe8MRKJMjy3EnktGnHiFkxPXE7UYglujq1GN2BAzgGKigjOLu+k56tVWs+U96KtZz81P3sE4wye4rCm0R0lS4kaEX5bQ8ZAQG4KnWVuuW/PMm2g5n8FImjBz2t7kW3w4N9O4/yvUpgrsR75FXwdZ4gfBLxcKm2oqXHker31UFo4R5uuov5KO43OEOswGu/JmbcBm5T9ozyE/9RE+iuMrYIDWV9v8SgwT7Tpm24dKpcRWO968jDzaWPgGL8EN4y9uDYx7oIlYmwzJ5xFiDgQh+kwWi5p67TD2Iuxmxp5yOjv9ZKJKSk+wU8Z3407kB/AoempquSVXPU6jfZzAxFWlsQdJOd1ETmgcAjbF/WhFGR1LjJrzK6Nm3Gr2HDA2VAmOnLzyh7BBIHZSqLM= core.eng.tlv2.redhat.com
        name: cloudinitdisk
~~~

~~~
[root@cnvqe-01 ~]# oc describe pod virt-launcher-akaris-test-vm-q8rf4  | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:|10240|overcommitGuestOverhead|10438331648|guest'
    Limits:
      memory:  40M
    Requests:
      memory:     40M
    Limits:
      memory:  40M
    Requests:
      memory:     40M
Containers:
  compute:
    Limits:
      memory:                         10438331648
    Requests:
      memory:                         10438331648
  volumerootdisk:
    Limits:
      memory:  40M
    Requests:
      memory:     40M
~~~

Compare this to: https://bugzilla.redhat.com/show_bug.cgi?id=1885418#c8 where we saw:
~~~
  compute:
    Limits:
      memory:                         10438331648
    Requests:
      memory:                         10240M
~~~

10240M should always translate to 10240000000 bytes, though.
10240Mi, on the other hand, would translate to 10737418240 bytes.

10438331648 - 10240000000 = 198331648 (189 MiB for overhead)
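A quick way to double-check the same arithmetic with bc (the byte values are the ones from the pod description above):
~~~
echo "10240*1000*1000" | bc                         # 10240M  = 10240000000 bytes
echo "10240*1024*1024" | bc                         # 10240Mi = 10737418240 bytes
echo "(10438331648 - 10240000000)/1024/1024" | bc   # 189 -> ~189 MiB of overhead on the compute container
~~~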

So it seems that the Request memory is not fixed.

However, the difference between the cgroup limit and the value in the pod output is still there, so the upstream fix isn't working quite right (?). Or is this a Kubernetes issue, or am I being overly pedantic?
~~~
[root@cnvqe-07 ~]# cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit'
hierarchical_memory_limit 10438328320
total_rss 839790592
total_rss_huge 761266176
~~~

10438328320 - 10438331648:
~~~
[akaris@linux ~]$ echo "10438328320 - 10438331648" | bc
-3328
~~~
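The -3328 gap is most likely just the kernel rounding the cgroup limit (memory.limit_in_bytes) down to a 4KiB page boundary when kubelet writes it, rather than anything KubeVirt-specific:
~~~
echo "10438331648 % 4096" | bc          # 3328 -> the pod value is not page-aligned
echo "10438331648 / 4096 * 4096" | bc   # 10438328320 -> matches hierarchical_memory_limit
~~~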

The spawned VM's memory is also still exactly the same as before:
~~~
[root@cnvqe-07 ~]# pgrep -f guest=test-bz-1885418_akaris-test-vm | while read pid ; do echo === $pid === ; ps aux | grep $pid |  grep -o '\-m [0-9]*' ; pmap $pid | tail ; done
=== 2379976 ===
-m 9766
00007fc117288000     12K rw-s-   [ anon ]
00007fc11728b000      4K rw-s- zero (deleted)
00007fc11728c000      4K r---- ld-2.28.so
00007fc11728d000      4K rw--- ld-2.28.so
00007fc11728e000      4K rw---   [ anon ]
00007fff3783a000    136K rw---   [ stack ]
00007fff37875000     12K r----   [ anon ]
00007fff37878000      8K r-x--   [ anon ]
ffffffffff600000      4K r-x--   [ anon ]
 total         15318000K
~~~

10240×1000×1000÷1024÷1024=~9766

That's indeed what was requested manually.
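For reference, qemu's -m argument is in MiB, so the 10240M request maps onto it like this:
~~~
echo "scale=3; 10240*1000*1000/1024/1024" | bc   # 9765.625 -> rounded to the -m 9766 seen above
~~~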

For comparison, I spawned the same VM with:
~~~
        resources:
          requests:
            cpu: "3"
            memory: 10240M
          limits:
            cpu: "3"
            memory: 10240M
#          overcommitGuestOverhead: true
#        memory:
#          guest: 10240M
~~~

So without an explicit guest memory setting, I get *exactly* the same values:
~~~
[root@cnvqe-07 ~]# pgrep -f guest=test-bz-1885418_akaris-test-vm | while read pid ; do echo === $pid === ; ps aux | grep $pid |  grep -o '\-m [0-9]*' ; pmap $pid | tail ; done
=== 2491260 ===
-m 9766
00007fab74dfe000     12K rw-s-   [ anon ]
00007fab74e01000      4K rw-s- zero (deleted)
00007fab74e02000      4K r---- ld-2.28.so
00007fab74e03000      4K rw--- ld-2.28.so
00007fab74e04000      4K rw---   [ anon ]
00007ffd71ca5000    136K rw---   [ stack ]
00007ffd71cd7000     12K r----   [ anon ]
00007ffd71cda000      8K r-x--   [ anon ]
ffffffffff600000      4K r-x--   [ anon ]
 total         11297228K
~~~

~~~
[root@cnvqe-07 ~]# cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit'
hierarchical_memory_limit 10438328320
total_rss 836296704
total_rss_huge 769654784
~~~

~~~
[root@cnvqe-01 ~]# oc describe pod virt-launcher-akaris-test-vm-zkksm  | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:|10240|overcommitGuestOverhead|10438331648|guest'
    Limits:
      memory:  40M
    Requests:
      memory:     40M
    Limits:
      memory:  40M
    Requests:
      memory:     40M
Containers:
  compute:
    Limits:
      memory:                         10438331648
    Requests:
      memory:                         10438331648
  volumerootdisk:
    Limits:
      memory:  40M
    Requests:
      memory:     40M
~~~

I also spawned a VM with 8G for comparison:
~~~
[root@cnvqe-01 ~]# oc describe pod virt-launcher-akaris-test-vm-m2bg2  | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:|10240|overcommitGuestOverhead|10438331648|guest'
    Limits:
      memory:  40M
    Requests:
      memory:     40M
    Limits:
      memory:  40M
    Requests:
      memory:     40M
Containers:
  compute:
    Limits:
      memory:                         8193956648
    Requests:
      memory:                         8193956648
  volumerootdisk:
    Limits:
      memory:  40M
    Requests:
      memory:     40M
~~~

~~~
[root@cnvqe-07 ~]# pgrep -f guest=test-bz-1885418_akaris-test-vm | while read pid ; do echo === $pid === ; ps aux | grep $pid |  grep -o '\-m [0-9]*' ; pmap $pid | tail ; done
=== 2516957 ===
-m 7630
00007fcf336b4000     12K rw-s-   [ anon ]
00007fcf336b7000      4K rw-s- zero (deleted)
00007fcf336b8000      4K r---- ld-2.28.so
00007fcf336b9000      4K rw--- ld-2.28.so
00007fcf336ba000      4K rw---   [ anon ]
00007ffc63046000    136K rw---   [ stack ]
00007ffc630a2000     12K r----   [ anon ]
00007ffc630a5000      8K r-x--   [ anon ]
ffffffffff600000      4K r-x--   [ anon ]
 total         13285652K
~~~

~~~
[root@cnvqe-07 ~]# cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit'
hierarchical_memory_limit 8193953792
total_rss 775507968
total_rss_huge 708837376
~~~

~~~
[root@cnvqe-07 ~]# cat /sys/fs/cgroup/memory$cgroup/memory.stat 
cache 811008
rss 758087680
rss_huge 692060160
shmem 0
mapped_file 0
dirty 0
writeback 0
swap 0
pgpgin 37686
pgpgout 25613
pgfault 46101
pgmajfault 0
inactive_anon 0
active_anon 741584896
inactive_file 17473536
active_file 73728
unevictable 0
hierarchical_memory_limit 8193953792
hierarchical_memsw_limit 8193953792
total_cache 811008
total_rss 758087680
total_rss_huge 692060160
total_shmem 0
total_mapped_file 0
total_dirty 0
total_writeback 0
total_swap 0
total_pgpgin 37686
total_pgpgout 25613
total_pgfault 46101
total_pgmajfault 0
total_inactive_anon 0
total_active_anon 741584896
total_inactive_file 17473536
total_active_file 73728
total_unevictable 0
[root@cnvqe-07 ~]# 
~~~

8193956648 - 8193953792 = 2856

Again, there's an offset here.

8193956648 − 8 × 1000 × 1000 × 1000 = 193956648
193956648 ÷ 1024 ÷ 1024 ≈ 184.97 MiB of buffer/overhead that we reserve for qemu
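The same check with bc for the 8G VM, and a comparison of the two observed overheads (all byte values come from the pod descriptions above):
~~~
echo "scale=2; (8193956648 - 8*1000*1000*1000)/1024/1024" | bc   # ~184.97 MiB of overhead for the 8G guest
echo "scale=2; (198331648 - 193956648)/1024/1024" | bc           # ~4.17 MiB more overhead for the 10240M guest
~~~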

Comment 21 Andreas Karis 2020-10-08 11:33:34 UTC
In my earlier comment, I meant to say:
~~~
So it seems that the Request memory is noW fixed.
~~~

I had written:
~~~
So it seems that the Request memory is not fixed.
~~~

Comment 23 Vladik Romanovsky 2020-10-09 17:00:57 UTC
Stu and I took a closer look into this issue.

I'm pretty confident at this point that this issue is just about adjusting our memory overhead calculation.
So far it has been just a rough estimate; more research is needed for us to converge on the numbers.

We've run multiple combinations of stressed VMs, with and without memory limits.
Looking at the cgroup memory.stat of the compute container, we can see that memory usage stays well within the limits, although there are spikes that lead to OOM.
So far we've seen that OOM does occur when adding ~5Mi overhead. 

Obviously, there can be multiple factors that may cause qemu to consume more memory (I/O threads, the emulator thread, vCPUs, network, ...); however, so far we've seen that our previous estimate wasn't off by much.

As a workaround, we can manually request a higher memory limit (resources.limits.memory) than the guest's memory.
This will effectively increase the memory overhead.

Here is an example which adds 10Mi overhead to the limits. Guest memory is 8Gi.

~~~
    spec:
      domain:
        cpu:
          cores: 16
        memory:
          guest: 8Gi
....
....
        resources:
          requests:
            cpu: "16"
            memory: 8202Mi
          limits:
            cpu: "16"
            memory: 8202Mi
~~~
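To confirm that the extra headroom actually lands on the compute container, the same kind of check used earlier in this bug can be reused (the pod name below is a placeholder):
~~~
oc describe pod virt-launcher-<vm-name>-xxxxx \
  | egrep -i '^containers:|compute:|limits:|requests:|memory:'
~~~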

Comment 24 Vladik Romanovsky 2020-10-09 17:29:18 UTC
In my previous comment, I meant to say that OOM does NOT occur when adding ~5Mi overhead.

Comment 37 Vladik Romanovsky 2020-11-18 13:32:09 UTC
We've added 10Mi to our base overhead calculation, along with some other minor improvements, in this PR: https://github.com/kubevirt/kubevirt/pull/4409

Comment 39 Adam Litke 2021-01-07 21:21:56 UTC
Vladik, can this be moved to ON_QA?

Comment 41 Vladik Romanovsky 2021-01-07 21:40:49 UTC
(In reply to Adam Litke from comment #39)
> Vladik, can this be moved to ON_QA?

Yes. I think so.

Comment 43 Israel Pinto 2021-01-10 11:02:11 UTC
Based on the recommendation, we verified that if Requests & Limits are greater than Guest Memory
by 100Mi, we don't see the OOM.
Customer approved it.
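As an illustration of the verified configuration (values assumed here: an 8Gi guest, i.e. 8192Mi, with requests and limits bumped by 100Mi):
~~~
        memory:
          guest: 8Gi
        resources:
          requests:
            cpu: "3"
            memory: 8292Mi   # 8192Mi guest + 100Mi headroom
          limits:
            cpu: "3"
            memory: 8292Mi
~~~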

Comment 46 errata-xmlrpc 2021-03-10 11:18:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0799

Comment 47 Red Hat Bugzilla 2023-09-15 00:49:13 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

