Description of problem:

CNV does not calculate a sufficient overhead for qemu-kvm to avoid OOM situations when memory limits are set. When both request and limit values are provided, CNV sets the qemu-kvm process's memory according to the limit + overhead calculation instead of according to the request (as I think it should be). This leads to an OOM situation in my tests, too.
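For orientation, the size of the calculated overhead can be recovered from the pod resources shown in the tests below. This is just arithmetic on the numbers from this report, not a claim about KubeVirt's exact overhead formula:

~~~
# compute container request 8579144Ki for a VM memory request of 8192Mi (see the tests below)
echo "8579144 - 8192 * 1024" | bc    # 190536 KiB of overhead on top of the request
# compute container limit 9103432Ki for a VM memory limit of 8704Mi
echo "9103432 - 8704 * 1024" | bc    # 190536 KiB of overhead on top of the limit
echo "190536 / 1024" | bc            # 186 -> ~186 MiB of calculated overhead in both cases
~~~

So whatever the guest size, qemu-kvm is left with roughly 186 MiB above the guest RAM in these tests, and as soon as a memory limit is set, that is all the headroom the entire compute container gets.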
In the following tests, I use this YAML as a baseline:

~~~
[root@openshift-jumpserver-0 ~]# cat fedora-vm.yaml
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  annotations:
    kubevirt.io/latest-observed-api-version: v1alpha3
    kubevirt.io/storage-observed-api-version: v1alpha3
    name.os.template.kubevirt.io/fedora31: Fedora 31
  labels:
    app: fedora-test-vm
    flavor.template.kubevirt.io/Custom: "true"
    os.template.kubevirt.io/fedora31: "true"
    vm.kubevirt.io/template.namespace: openshift
    vm.kubevirt.io/template.revision: "1"
    vm.kubevirt.io/template.version: v0.9.1
    workload.template.kubevirt.io/server: "true"
  name: fedora-test-vm
  namespace: test
spec:
  running: true
  template:
    metadata:
      labels:
        flavor.template.kubevirt.io/Custom: "true"
        kubevirt.io/domain: fedora-test-vm
        kubevirt.io/size: large
        os.template.kubevirt.io/fedora31: "true"
        vm.kubevirt.io/name: fedora-test-vm
        workload.template.kubevirt.io/server: "true"
    spec:
      domain:
        cpu:
          cores: 3
          model: host-model
        devices:
          disks:
          - bootOrder: 1
            disk:
              bus: virtio
            name: rootdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
          interfaces:
          - masquerade: {}
            model: virtio
            name: nic0
          networkInterfaceMultiqueue: true
          rng: {}
        features:
          acpi:
            enabled: true
        machine:
          type: q35
        resources:
          requests:
            cpu: "3"
            memory: 8192Mi
      evictionStrategy: LiveMigrate
      hostname: fedora-test-vm
      networks:
      - name: nic0
        pod: {}
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: docker.io/kubevirt/fedora-cloud-container-disk-demo
        name: rootdisk
      - cloudInitNoCloud:
          userData: |
            #cloud-config
            name: default
            hostname: fedora-test-vm
            ssh_authorized_keys:
              - >-
                ssh-rsa
                AAAAB3NzaC(...)hT/a4jl49IU=
        name: cloudinitdisk
~~~

I then expose the VM's SSH port so that I can SSH into the VM from openshift-worker-1. This also allows me to run an SCP stress test.

~~~
virtctl expose vmi fedora-test-vm --port=20022 --target-port=22 --name=fedora-test-vm-ssh
~~~

I then stress test the VM as follows. All but openshift-worker-1 are tainted, so the VM will only be scheduled there. Only one VM exists at a given time on openshift-worker-1.

I stress the VM memory with stress-ng inside the VM:

~~~
ssh -i gss-stack-tools -p 20022 fedora.54.202
yum install stress-ng -y
stress-ng --vm 1 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1
~~~

Network load - run from the hypervisor:

~~~
dd if=/dev/zero of=test.txt bs=1G count=1
sudo ip route add 172.30.54.202 via 172.24.2.1 dev ovn-k8s-mp0 src 172.24.2.2 mtu 1400 # my virtual network uses jumbo frame and there is an unrelated issue with jumbo frame MTU so this route is a workaround
while true ; do scp -i gss-stack-tools -P 20022 test.txt fedora.54.202: ; done
~~~

I look for memory consumption and OOM errors with:

~~~
journalctl -f | grep -i oomkilled
~~~

And:

~~~
cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit' ; sleep 2 ; done
~~~
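For reference: fedora.54.202 in the commands above and below appears to resolve to the ClusterIP of the exposed service (172.30.54.202, the same address the ip route workaround targets). If you want to reproduce this, the IP and port can be read back from the service created by virtctl expose; a sketch, assuming the test namespace from the YAML:

~~~
oc -n test get svc fedora-test-vm-ssh -o jsonpath='{.spec.clusterIP}{":"}{.spec.ports[0].port}{"\n"}'
# 172.30.54.202:20022 in my environment
~~~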
Example of how the OOM issue cannot be reproduced - only requests are set [0]:

~~~
        resources:
          requests:
            cpu: "3"
            memory: 8192Mi
~~~

Resulting pod:

~~~
[root@openshift-jumpserver-0 ~]# oc describe pod | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:'
    Limits:
      memory: 40M
    Requests:
      memory: 1M
Containers:
  compute:
    Limits:
    Requests:
      memory: 8579144Ki
  volumerootdisk:
    Limits:
      memory: 40M
    Requests:
      memory: 1M
~~~

Resulting qemu-kvm:

~~~
[root@openshift-worker-1 ~]# pgrep qemu-kvm | while read pid ; do echo === $pid === ; ps aux | grep $pid | grep -o '\-m [0-9]*' ; pmap $pid | tail ; done
=== 1106212 ===
-m 8192
00007f87348fd000     12K rw-s- [ anon ]
00007f8734900000      4K rw-s- zero (deleted)
00007f8734901000      4K r---- ld-2.28.so
00007f8734902000      4K rw--- ld-2.28.so
00007f8734903000      4K rw--- [ anon ]
00007fff9c57b000    132K rw--- [ stack ]
00007fff9c5ba000     12K r---- [ anon ]
00007fff9c5bd000      8K r-x-- [ anon ]
ffffffffff600000      4K r-x-- [ anon ]
 total         10523172K
~~~

Resulting limits = entire system memory:

~~~
[root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit' ; sleep 2 ; done
Mon Oct 5 20:34:53 UTC 2020
hierarchical_memory_limit 134013546496
total_rss 786513920
total_rss_huge 721420288
~~~

~~~
[root@openshift-worker-1 ~]# echo "134013546496 / 1024 / 1024" | bc
127805
[root@openshift-worker-1 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:         128829        5463      115112          14        8253      122976
Swap:             0           0           0
~~~

Stress-ng inside the VM:

~~~
ssh -i gss-stack-tools -p 20022 fedora.54.202
yum install stress-ng -y
stress-ng --vm 1 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1
~~~

Network load - run from the hypervisor:

~~~
dd if=/dev/zero of=test.txt bs=1G count=1
sudo ip route add 172.30.54.202 via 172.24.2.1 dev ovn-k8s-mp0 src 172.24.2.2 mtu 1400
while true ; do scp -i gss-stack-tools -P 20022 test.txt fedora.54.202: ; done
~~~

The issue could not be reproduced.
===================================

[0] The baseline fedora-vm.yaml shown above, unchanged (requests only, no limits set).
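With only requests set, the compute container has no memory limit of its own, so the pod runs in the Burstable QoS class and only the node-level cgroup limit seen above applies. A quick way to confirm the QoS class of the virt-launcher pod; the label selector assumes the VMI's kubevirt.io/domain label is propagated to the pod, which matches what the journal dumps later in this report show (QOSClass:Burstable here, QOSClass:Guaranteed once limits == requests):

~~~
oc -n test get pod -l kubevirt.io/domain=fedora-test-vm -o jsonpath='{.items[0].status.qosClass}{"\n"}'
# Burstable
~~~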
Request != limit, with nothing else set [0]. The OOM can be reproduced. I think this might be a bug: the qemu-kvm memory is set according to the limits, whereas as a user I would expect it to be set according to the requests.

~~~
        resources:
          requests:
            cpu: "3"
            memory: 8192Mi
          limits:
            cpu: "3"
            memory: 8704Mi
~~~

Pod description:

~~~
[root@openshift-jumpserver-0 ~]# oc describe pod | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:'
    Limits:
      memory: 40M
    Requests:
      memory: 1M
Containers:
  compute:
    Limits:
      memory: 9103432Ki
    Requests:
      memory: 8579144Ki
  volumerootdisk:
    Limits:
      memory: 40M
    Requests:
      memory: 1M
~~~

Resulting limits:

~~~
[root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit' ; sleep 2 ; done
Mon Oct 5 20:02:03 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 788336640 total_rss_huge 723517440
~~~

Resulting qemu-kvm memory configuration:

~~~
[root@openshift-worker-1 ~]# pgrep qemu-kvm | while read pid ; do echo === $pid === ; ps aux | grep $pid | grep -o '\-m [0-9]*' ; pmap $pid | tail ; done
=== 1030316 ===
-m 8704
00007f3726efb000     12K rw-s- [ anon ]
00007f3726efe000      4K rw-s- zero (deleted)
00007f3726eff000      4K r---- ld-2.28.so
00007f3726f00000      4K rw--- ld-2.28.so
00007f3726f01000      4K rw--- [ anon ]
00007ffe6a689000    132K rw--- [ stack ]
00007ffe6a7f2000     12K r---- [ anon ]
00007ffe6a7f5000      8K r-x-- [ anon ]
ffffffffff600000      4K r-x-- [ anon ]
 total         13949548K
~~~

We can see that the libvirt memory limit is set according to the limit plus some overhead:

~~~
[root@openshift-worker-1 ~]# echo "9321914368 / 1024 / 1024" | bc
8890
~~~

Stress-ng inside the VM:

~~~
ssh -i gss-stack-tools -p 20022 fedora.54.202
yum install stress-ng -y
stress-ng --vm 1 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1
~~~

Network load - run from the hypervisor:

~~~
dd if=/dev/zero of=test.txt bs=1G count=1
sudo ip route add 172.30.54.202 via 172.24.2.1 dev ovn-k8s-mp0 src 172.24.2.2 mtu 1400
while true ; do scp -i gss-stack-tools -P 20022 test.txt fedora.54.202: ; done
~~~

Reproduce the OOM killer issue:

~~~
[root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit' ; sleep 2 ; done
Mon Oct 5 20:28:43 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 4644237312 total_rss_huge 4559208448
Mon Oct 5 20:28:45 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 6227689472 total_rss_huge 6142558208
Mon Oct 5 20:28:47 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 7760707584 total_rss_huge 7675576320
Mon Oct 5 20:28:49 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9149022208 total_rss_huge 9063890944
Mon Oct 5 20:28:51 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9224306688 total_rss_huge 9143582720
Mon Oct 5 20:28:53 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9224441856 total_rss_huge 9143582720
Mon Oct 5 20:28:55 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9226633216 total_rss_huge 9145679872
Mon Oct 5 20:28:57 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9228836864 total_rss_huge 9147777024
Mon Oct 5 20:28:59 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9228959744 total_rss_huge 9147777024
Mon Oct 5 20:29:01 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9228857344 total_rss_huge 9147777024
Mon Oct 5 20:29:03 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9228865536 total_rss_huge 9147777024
Mon Oct 5 20:29:05 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9228865536 total_rss_huge 9147777024
Mon Oct 5 20:29:07 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9228988416 total_rss_huge 9147777024
Mon Oct 5 20:29:09 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9228988416 total_rss_huge 9147777024
Mon Oct 5 20:29:11 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9228988416 total_rss_huge 9147777024
Mon Oct 5 20:29:13 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9228988416 total_rss_huge 9147777024
Mon Oct 5 20:29:15 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229012992 total_rss_huge 9147777024
Mon Oct 5 20:29:18 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229082624 total_rss_huge 9147777024
Mon Oct 5 20:29:20 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229090816 total_rss_huge 9147777024
Mon Oct 5 20:29:22 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229090816 total_rss_huge 9147777024
Mon Oct 5 20:29:24 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229094912 total_rss_huge 9147777024
Mon Oct 5 20:29:26 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229094912 total_rss_huge 9147777024
Mon Oct 5 20:29:28 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229094912 total_rss_huge 9147777024
Mon Oct 5 20:29:30 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9227079680 total_rss_huge 9145679872
Mon Oct 5 20:29:32 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9227091968 total_rss_huge 9145679872
Mon Oct 5 20:29:34 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9227100160 total_rss_huge 9145679872
Mon Oct 5 20:29:36 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9227100160 total_rss_huge 9145679872
Mon Oct 5 20:29:38 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9227100160 total_rss_huge 9145679872
Mon Oct 5 20:29:40 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9227100160 total_rss_huge 9145679872
Mon Oct 5 20:29:42 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9227100160 total_rss_huge 9145679872
Mon Oct 5 20:29:44 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229205504 total_rss_huge 9147777024
Mon Oct 5 20:29:46 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229205504 total_rss_huge 9147777024
Mon Oct 5 20:29:48 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229205504 total_rss_huge 9147777024
Mon Oct 5 20:29:50 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229205504 total_rss_huge 9147777024
Mon Oct 5 20:29:52 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229221888 total_rss_huge 9147777024
Mon Oct 5 20:29:54 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229221888 total_rss_huge 9147777024
Mon Oct 5 20:29:56 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229221888 total_rss_huge 9147777024
Mon Oct 5 20:29:58 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229221888 total_rss_huge 9147777024
Mon Oct 5 20:30:00 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229221888 total_rss_huge 9147777024
Mon Oct 5 20:30:02 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229221888 total_rss_huge 9147777024
Mon Oct 5 20:30:04 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229221888 total_rss_huge 9147777024
Mon Oct 5 20:30:06 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229221888 total_rss_huge 9147777024
Mon Oct 5 20:30:08 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229221888 total_rss_huge 9147777024
Mon Oct 5 20:30:10 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229221888 total_rss_huge 9147777024
Mon Oct 5 20:30:12 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9229221888 total_rss_huge 9147777024
Mon Oct 5 20:30:14 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9227591680 total_rss_huge 9149874176
Mon Oct 5 20:30:16 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 9227591680 total_rss_huge 9149874176
Mon Oct 5 20:30:18 UTC 2020 hierarchical_memory_limit 9321914368 total_rss 86454272 total_rss_huge 39845888
Mon Oct 5 20:30:20 UTC 2020
cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0a6d58f0_176c_4d7b_81f4_87d5f2cf46f3.slice/crio-a4b729c945d076cd8cc312cdc46c0df732821ddda1db909cb5fdce7f5493e620.scope/memory.stat: No such file or directory
Mon Oct 5 20:30:22 UTC 2020
cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0a6d58f0_176c_4d7b_81f4_87d5f2cf46f3.slice/crio-a4b729c945d076cd8cc312cdc46c0df732821ddda1db909cb5fdce7f5493e620.scope/memory.stat: No such file or directory
~~~

~~~
[root@fedora-test-vm ~]# stress-ng --vm 1 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1
stress-ng: debug: [9883] 3 processors online, 3 processors configured
stress-ng: info: [9883] dispatching hogs: 1 vm, 1 hdd, 1 io
stress-ng: debug: [9883] cache allocate: default cache size: 16384K
stress-ng: debug: [9883] starting stressors
stress-ng: debug: [9883] 3 stressors spawned
stress-ng: debug: [9885] stress-ng-hdd: started [9885] (instance 0)
stress-ng: debug: [9886] stress-ng-io: started [9886] (instance 0)
stress-ng: debug: [9884] stress-ng-vm: started [9884] (instance 0)
stress-ng: debug: [9884] stress-ng-vm using method 'all'
~~~

~~~
[core@openshift-worker-1 ~]$ while true ; do scp -i gss-stack-tools -P 20022 test.txt fedora.54.202: ; done
test.txt                  100% 1024MB 231.2MB/s   00:04
test.txt                  100% 1024MB 259.3MB/s   00:03
test.txt                  100% 1024MB 205.7MB/s   00:04
test.txt                  100% 1024MB 250.8MB/s   00:04
test.txt                  100% 1024MB 260.7MB/s   00:03
~~~

Message from the journal:

~~~
[root@openshift-worker-1 ~]# journalctl -f | grep -i oomkilled
Oct 05 20:30:19 openshift-worker-1.example.com hyperkube[2223]: I1005 20:30:19.455745 2223 status_manager.go:570] Patch status for pod "virt-launcher-fedora-test-vm-q625c_test(0a6d58f0-176c-4d7b-81f4-87d5f2cf46f3)" with
"{\"metadata\":{\"uid\":\"0a6d58f0-176c-4d7b-81f4-87d5f2cf46f3\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"reason\":\"PodCompleted\",\"type\":\"Initialized\"},{\"lastTransitionTime\":\"2020-10-05T20:30:19Z\",\"reason\":\"PodCompleted\",\"status\":\"False\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2020-10-05T20:30:19Z\",\"reason\":\"PodCompleted\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://a4b729c945d076cd8cc312cdc46c0df732821ddda1db909cb5fdce7f5493e620\",\"image\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93\",\"imageID\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93\",\"lastState\":{},\"name\":\"compute\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://a4b729c945d076cd8cc312cdc46c0df732821ddda1db909cb5fdce7f5493e620\",\"exitCode\":0,\"finishedAt\":\"2020-10-05T20:30:18Z\",\"reason\":\"OOMKilled\",\"startedAt\":\"2020-10-05T20:00:07Z\"}}},{\"containerID\":\"cri-o://d4c1b2dd7ea7e089eff7615a30a140fa90c8acfa31b8db7b62b649012c733ca1\",\"image\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo:latest\",\"imageID\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331\",\"lastState\":{},\"name\":\"volumerootdisk\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://d4c1b2dd7ea7e089eff7615a30a140fa90c8acfa31b8db7b62b649012c733ca1\",\"exitCode\":0,\"finishedAt\":\"2020-10-05T20:30:17Z\",\"reason\":\"Completed\",\"startedAt\":\"2020-10-05T20:00:07Z\"}}}],\"phase\":\"Succeeded\"}}" Oct 05 20:30:19 openshift-worker-1.example.com hyperkube[2223]: I1005 20:30:19.455810 2223 status_manager.go:578] Status for pod "virt-launcher-fedora-test-vm-q625c_test(0a6d58f0-176c-4d7b-81f4-87d5f2cf46f3)" updated successfully: (5, {Phase:Succeeded Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 20:00:07 +0000 UTC Reason:PodCompleted Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 20:30:19 +0000 UTC Reason:PodCompleted Message:} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 20:30:19 +0000 UTC Reason:PodCompleted Message:} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 20:04:16 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:192.168.123.221 PodIP:172.24.2.17 PodIPs:[{IP:172.24.2.17}] StartTime:2020-10-05 20:00:04 +0000 UTC InitContainerStatuses:[{Name:container-disk-binary State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-05 20:00:06 +0000 UTC,FinishedAt:2020-10-05 20:00:06 +0000 UTC,ContainerID:cri-o://1f77c1391a2f82692015bc07cf0ded891abbd7566c1e35c03cf2b1d401124101,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 
ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ContainerID:cri-o://1f77c1391a2f82692015bc07cf0ded891abbd7566c1e35c03cf2b1d401124101 Started:<nil>}] ContainerStatuses:[{Name:compute State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:OOMKilled,Message:,StartedAt:2020-10-05 20:00:07 +0000 UTC,FinishedAt:2020-10-05 20:30:18 +0000 UTC,ContainerID:cri-o://a4b729c945d076cd8cc312cdc46c0df732821ddda1db909cb5fdce7f5493e620,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ContainerID:cri-o://a4b729c945d076cd8cc312cdc46c0df732821ddda1db909cb5fdce7f5493e620 Started:0xc00331fe72} {Name:volumerootdisk State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-05 20:00:07 +0000 UTC,FinishedAt:2020-10-05 20:30:17 +0000 UTC,ContainerID:cri-o://d4c1b2dd7ea7e089eff7615a30a140fa90c8acfa31b8db7b62b649012c733ca1,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 ContainerID:cri-o://d4c1b2dd7ea7e089eff7615a30a140fa90c8acfa31b8db7b62b649012c733ca1 Started:0xc00331fe73}] QOSClass:Burstable EphemeralContainerStatuses:[]}) Oct 05 20:30:19 openshift-worker-1.example.com hyperkube[2223]: I1005 20:30:19.455945 2223 kubelet_pods.go:961] Pod "virt-launcher-fedora-test-vm-q625c_test(0a6d58f0-176c-4d7b-81f4-87d5f2cf46f3)" is terminated, but some containers have not been cleaned up: {ID:{Type:cri-o ID:d4c1b2dd7ea7e089eff7615a30a140fa90c8acfa31b8db7b62b649012c733ca1} Name:volumerootdisk State:exited CreatedAt:2020-10-05 20:00:07.924820053 +0000 UTC StartedAt:2020-10-05 20:00:07.972630569 +0000 UTC FinishedAt:2020-10-05 20:30:17.989063778 +0000 UTC ExitCode:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 Hash:174137120 RestartCount:0 Reason:Completed Message:} {ID:{Type:cri-o ID:a4b729c945d076cd8cc312cdc46c0df732821ddda1db909cb5fdce7f5493e620} Name:compute State:exited CreatedAt:2020-10-05 20:00:07.178925816 +0000 UTC StartedAt:2020-10-05 20:00:07.21124455 +0000 UTC FinishedAt:2020-10-05 20:30:18.525063787 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:4137454197 RestartCount:0 Reason:OOMKilled Message:} {ID:{Type:cri-o ID:1f77c1391a2f82692015bc07cf0ded891abbd7566c1e35c03cf2b1d401124101} Name:container-disk-binary State:exited CreatedAt:2020-10-05 20:00:06.936000415 +0000 UTC StartedAt:2020-10-05 20:00:06.972586276 +0000 UTC FinishedAt:2020-10-05 20:00:06.989032504 +0000 UTC ExitCode:0 
Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:1161206034 RestartCount:0 Reason:Completed Message:} Oct 05 20:30:19 openshift-worker-1.example.com hyperkube[2223]: I1005 20:30:19.461492 2223 kubelet_pods.go:961] Pod "virt-launcher-fedora-test-vm-q625c_test(0a6d58f0-176c-4d7b-81f4-87d5f2cf46f3)" is terminated, but some containers have not been cleaned up: {ID:{Type:cri-o ID:d4c1b2dd7ea7e089eff7615a30a140fa90c8acfa31b8db7b62b649012c733ca1} Name:volumerootdisk State:exited CreatedAt:2020-10-05 20:00:07.924820053 +0000 UTC StartedAt:2020-10-05 20:00:07.972630569 +0000 UTC FinishedAt:2020-10-05 20:30:17.989063778 +0000 UTC ExitCode:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 Hash:174137120 RestartCount:0 Reason:Completed Message:} {ID:{Type:cri-o ID:a4b729c945d076cd8cc312cdc46c0df732821ddda1db909cb5fdce7f5493e620} Name:compute State:exited CreatedAt:2020-10-05 20:00:07.178925816 +0000 UTC StartedAt:2020-10-05 20:00:07.21124455 +0000 UTC FinishedAt:2020-10-05 20:30:18.525063787 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:4137454197 RestartCount:0 Reason:OOMKilled Message:} {ID:{Type:cri-o ID:1f77c1391a2f82692015bc07cf0ded891abbd7566c1e35c03cf2b1d401124101} Name:container-disk-binary State:exited CreatedAt:2020-10-05 20:00:06.936000415 +0000 UTC StartedAt:2020-10-05 20:00:06.972586276 +0000 UTC FinishedAt:2020-10-05 20:00:06.989032504 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:1161206034 RestartCount:0 Reason:Completed Message:} Oct 05 20:30:19 openshift-worker-1.example.com hyperkube[2223]: I1005 20:30:19.466722 2223 kubelet_pods.go:961] Pod "virt-launcher-fedora-test-vm-q625c_test(0a6d58f0-176c-4d7b-81f4-87d5f2cf46f3)" is terminated, but some containers have not been cleaned up: {ID:{Type:cri-o ID:d4c1b2dd7ea7e089eff7615a30a140fa90c8acfa31b8db7b62b649012c733ca1} Name:volumerootdisk State:exited CreatedAt:2020-10-05 20:00:07.924820053 +0000 UTC StartedAt:2020-10-05 20:00:07.972630569 +0000 UTC FinishedAt:2020-10-05 20:30:17.989063778 +0000 UTC ExitCode:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 Hash:174137120 RestartCount:0 Reason:Completed Message:} {ID:{Type:cri-o ID:a4b729c945d076cd8cc312cdc46c0df732821ddda1db909cb5fdce7f5493e620} Name:compute State:exited CreatedAt:2020-10-05 20:00:07.178925816 +0000 UTC StartedAt:2020-10-05 20:00:07.21124455 +0000 UTC FinishedAt:2020-10-05 20:30:18.525063787 +0000 UTC ExitCode:0 
Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:4137454197 RestartCount:0 Reason:OOMKilled Message:} {ID:{Type:cri-o ID:1f77c1391a2f82692015bc07cf0ded891abbd7566c1e35c03cf2b1d401124101} Name:container-disk-binary State:exited CreatedAt:2020-10-05 20:00:06.936000415 +0000 UTC StartedAt:2020-10-05 20:00:06.972586276 +0000 UTC FinishedAt:2020-10-05 20:00:06.989032504 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:1161206034 RestartCount:0 Reason:Completed Message:}
^C
~~~

====================

[0] The baseline fedora-vm.yaml shown above, with the resources section replaced by the requests/limits block quoted at the start of this test (requests 8192Mi, limits 8704Mi).
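To make the failure mode explicit: the cgroup limit in this test is exactly the compute container's limit, and qemu-kvm's -m is sized from the VM limit rather than the request, so the guest alone can consume almost the entire container allowance. Again just arithmetic on the values above:

~~~
echo "9103432 * 1024" | bc                    # 9321914368 -> matches hierarchical_memory_limit
echo "(9103432 - 8704 * 1024) / 1024" | bc    # 186 -> MiB of headroom above the 8704 MiB guest (-m 8704)
~~~

total_rss plateaus just under the limit once stress-ng fills the guest, and the kernel then OOM-kills the compute container.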
Testing with limit = request, without anything else [0] - this reproduces the issue, too.

~~~
        resources:
          limits:
            cpu: "3"
            memory: 8Gi
          requests:
            cpu: "3"
            memory: 8Gi
~~~

Pod description:

~~~
[root@openshift-jumpserver-0 ~]# oc describe pod | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:'
    Limits:
      memory: 40M
    Requests:
      memory: 40M
Containers:
  compute:
    Limits:
      memory: 8579144Ki
    Requests:
      memory: 8579144Ki
  volumerootdisk:
    Limits:
      memory: 40M
    Requests:
      memory: 40M
~~~

qemu-kvm process memory:

~~~
[root@openshift-worker-1 ~]# pgrep qemu-kvm | while read pid ; do echo === $pid === ; ps aux | grep $pid | grep -o '\-m [0-9]*' ; pmap $pid | tail ; done
=== 935914 ===
-m 8192
00007fd1f5b9a000     12K rw-s- [ anon ]
00007fd1f5b9d000      4K rw-s- zero (deleted)
00007fd1f5b9e000      4K r---- ld-2.28.so
00007fd1f5b9f000      4K rw--- ld-2.28.so
00007fd1f5ba0000      4K rw--- [ anon ]
00007fffd8274000    132K rw--- [ stack ]
00007fffd83b3000     12K r---- [ anon ]
00007fffd83b6000      8K r-x-- [ anon ]
ffffffffff600000      4K r-x-- [ anon ]
 total         10582228K
~~~

OOM easily reproduced:

~~~
[root@fedora-test-vm ~]# stress-ng --vm 1 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1
stress-ng: debug: [9377] 3 processors online, 3 processors configured
stress-ng: info: [9377] dispatching hogs: 1 vm, 1 hdd, 1 io
stress-ng: debug: [9377] cache allocate: default cache size: 16384K
stress-ng: debug: [9377] starting stressors
stress-ng: debug: [9377] 3 stressors spawned
stress-ng: debug: [9378] stress-ng-vm: started [9378] (instance 0)
stress-ng: debug: [9379] stress-ng-hdd: started [9379] (instance 0)
stress-ng: debug: [9380] stress-ng-io: started [9380] (instance 0)
stress-ng: debug: [9378] stress-ng-vm using method 'all'
client_loop: send disconnect: Broken pipe
[core@openshift-worker-1 ~]$
~~~

~~~
[core@openshift-worker-1 ~]$ scp -i gss-stack-tools -P 20022 test.txt fedora.54.202:
test.txt                  100% 1024MB 222.2MB/s   00:04
[core@openshift-worker-1 ~]$ while true ; do scp -i gss-stack-tools -P 20022 test.txt fedora.54.202: ; done
test.txt                  100% 1024MB 211.0MB/s   00:04
test.txt                  100% 1024MB 129.0MB/s   00:07
test.txt                  100% 1024MB 147.7MB/s   00:06
~~~

~~~
[root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit' ; sleep 2 ; done
Mon Oct 5 19:39:31 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 5333508096 total_rss_huge 5257560064
Mon Oct 5 19:39:33 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 6784737280 total_rss_huge 6708789248
Mon Oct 5 19:39:35 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8141832192 total_rss_huge 8065646592
Mon Oct 5 19:39:37 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693645312 total_rss_huge 8617197568
Mon Oct 5 19:39:39 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692658176 total_rss_huge 8619294720
Mon Oct 5 19:39:41 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692658176 total_rss_huge 8619294720
Mon Oct 5 19:39:43 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692682752 total_rss_huge 8619294720
Mon Oct 5 19:39:45 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8694796288 total_rss_huge 8621391872
Mon Oct 5 19:39:47 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8694796288 total_rss_huge 8621391872
Mon Oct 5 19:39:49 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8695336960 total_rss_huge 8621391872
Mon Oct 5 19:39:51 UTC 2020
hierarchical_memory_limit 8785043456 total_rss 8693264384 total_rss_huge 8619294720 Mon Oct 5 19:39:53 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693284864 total_rss_huge 8619294720 Mon Oct 5 19:39:55 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693329920 total_rss_huge 8619294720 Mon Oct 5 19:39:57 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693374976 total_rss_huge 8619294720 Mon Oct 5 19:39:59 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693411840 total_rss_huge 8619294720 Mon Oct 5 19:40:01 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693411840 total_rss_huge 8619294720 Mon Oct 5 19:40:03 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693411840 total_rss_huge 8619294720 Mon Oct 5 19:40:05 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693411840 total_rss_huge 8615100416 Mon Oct 5 19:40:07 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691314688 total_rss_huge 8608808960 Mon Oct 5 19:40:09 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691445760 total_rss_huge 8608808960 Mon Oct 5 19:40:11 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691490816 total_rss_huge 8608808960 Mon Oct 5 19:40:13 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691490816 total_rss_huge 8608808960 Mon Oct 5 19:40:16 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691527680 total_rss_huge 8608808960 Mon Oct 5 19:40:18 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691539968 total_rss_huge 8608808960 Mon Oct 5 19:40:20 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691539968 total_rss_huge 8608808960 Mon Oct 5 19:40:22 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691539968 total_rss_huge 8608808960 Mon Oct 5 19:40:24 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691539968 total_rss_huge 8608808960 Mon Oct 5 19:40:26 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691539968 total_rss_huge 8608808960 Mon Oct 5 19:40:28 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691539968 total_rss_huge 8608808960 Mon Oct 5 19:40:30 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691539968 total_rss_huge 8608808960 Mon Oct 5 19:40:32 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691539968 total_rss_huge 8608808960 Mon Oct 5 19:40:34 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8689311744 total_rss_huge 8608808960 Mon Oct 5 19:40:36 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8689311744 total_rss_huge 8608808960 Mon Oct 5 19:40:38 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8689311744 total_rss_huge 8608808960 Mon Oct 5 19:40:40 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8689438720 total_rss_huge 8608808960 Mon Oct 5 19:40:42 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8689537024 total_rss_huge 8608808960 Mon Oct 5 19:40:44 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691634176 total_rss_huge 8610906112 Mon Oct 5 19:40:46 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691662848 total_rss_huge 8610906112 Mon Oct 5 19:40:48 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691671040 total_rss_huge 8610906112 Mon Oct 5 19:40:50 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691687424 total_rss_huge 8610906112 Mon Oct 5 19:40:52 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691699712 total_rss_huge 8610906112 Mon Oct 5 19:40:54 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691712000 total_rss_huge 
8610906112 Mon Oct 5 19:40:56 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692711424 total_rss_huge 8613003264 Mon Oct 5 19:40:58 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692740096 total_rss_huge 8613003264 Mon Oct 5 19:41:00 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692748288 total_rss_huge 8613003264 Mon Oct 5 19:41:02 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692756480 total_rss_huge 8613003264 Mon Oct 5 19:41:04 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692756480 total_rss_huge 8613003264 Mon Oct 5 19:41:06 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692781056 total_rss_huge 8613003264 Mon Oct 5 19:41:08 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692789248 total_rss_huge 8613003264 Mon Oct 5 19:41:10 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692789248 total_rss_huge 8613003264 Mon Oct 5 19:41:12 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692789248 total_rss_huge 8613003264 Mon Oct 5 19:41:14 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692789248 total_rss_huge 8613003264 Mon Oct 5 19:41:16 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692838400 total_rss_huge 8613003264 Mon Oct 5 19:41:18 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692838400 total_rss_huge 8613003264 Mon Oct 5 19:41:20 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692850688 total_rss_huge 8613003264 Mon Oct 5 19:41:22 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693526528 total_rss_huge 8613003264 Mon Oct 5 19:41:24 UTC 2020 cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-pod3d96a3d2_ea99_43d4_859f_12fe85b89762.slice/crio-e70a1a1739bb4a41875168459a4208d4726e9d29268f6009f53c8c302d60177a.scope/memory.stat: No such file or directory Mon Oct 5 19:41:26 UTC 2020 ~~~ ~~~ [root@openshift-worker-1 ~]# journalctl -f | grep -i oomkilled Oct 05 19:41:22 openshift-worker-1.example.com hyperkube[2223]: I1005 19:41:22.830995 2223 status_manager.go:570] Patch status for pod "virt-launcher-fedora-test-vm-v8b9l_test(3d96a3d2-ea99-43d4-859f-12fe85b89762)" with "{\"metadata\":{\"uid\":\"3d96a3d2-ea99-43d4-859f-12fe85b89762\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"lastTransitionTime\":\"2020-10-05T19:41:22Z\",\"message\":\"containers with unready status: [compute]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2020-10-05T19:41:22Z\",\"message\":\"containers with unready status: 
[compute]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://e70a1a1739bb4a41875168459a4208d4726e9d29268f6009f53c8c302d60177a\",\"image\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93\",\"imageID\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93\",\"lastState\":{},\"name\":\"compute\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://e70a1a1739bb4a41875168459a4208d4726e9d29268f6009f53c8c302d60177a\",\"exitCode\":137,\"finishedAt\":\"2020-10-05T19:41:22Z\",\"reason\":\"OOMKilled\",\"startedAt\":\"2020-10-05T19:35:56Z\"}}},{\"containerID\":\"cri-o://66027bd3bcaf5374cde0fd9f6d4ef1bfdbf74a987cce4193e483e21b9955564e\",\"image\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo:latest\",\"imageID\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331\",\"lastState\":{},\"name\":\"volumerootdisk\",\"ready\":true,\"restartCount\":0,\"started\":true,\"state\":{\"running\":{\"startedAt\":\"2020-10-05T19:36:01Z\"}}}]}}" Oct 05 19:41:22 openshift-worker-1.example.com hyperkube[2223]: I1005 19:41:22.831057 2223 status_manager.go:578] Status for pod "virt-launcher-fedora-test-vm-v8b9l_test(3d96a3d2-ea99-43d4-859f-12fe85b89762)" updated successfully: (4, {Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:35:56 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:41:22 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [compute]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:41:22 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [compute]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:39:59 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:192.168.123.221 PodIP:172.24.2.17 PodIPs:[{IP:172.24.2.17}] StartTime:2020-10-05 19:35:48 +0000 UTC InitContainerStatuses:[{Name:container-disk-binary State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-05 19:35:55 +0000 UTC,FinishedAt:2020-10-05 19:35:55 +0000 UTC,ContainerID:cri-o://0de49e24f5467bfb0d3131e725367a51acb74265ef629d2e39755b5b5dc91a31,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ContainerID:cri-o://0de49e24f5467bfb0d3131e725367a51acb74265ef629d2e39755b5b5dc91a31 Started:<nil>}] ContainerStatuses:[{Name:compute State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:137,Signal:0,Reason:OOMKilled,Message:,StartedAt:2020-10-05 19:35:56 +0000 UTC,FinishedAt:2020-10-05 19:41:22 +0000 UTC,ContainerID:cri-o://e70a1a1739bb4a41875168459a4208d4726e9d29268f6009f53c8c302d60177a,}} 
LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ContainerID:cri-o://e70a1a1739bb4a41875168459a4208d4726e9d29268f6009f53c8c302d60177a Started:0xc001b50ef2} {Name:volumerootdisk State:{Waiting:nil Running:&ContainerStateRunning{StartedAt:2020-10-05 19:36:01 +0000 UTC,} Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 ContainerID:cri-o://66027bd3bcaf5374cde0fd9f6d4ef1bfdbf74a987cce4193e483e21b9955564e Started:0xc001b50ef3}] QOSClass:Guaranteed EphemeralContainerStatuses:[]}) ^C ~~~ ================================ [0] ~~~ [root@openshift-jumpserver-0 ~]# oc get -f fedora-vm.yaml -o yaml apiVersion: kubevirt.io/v1alpha3 kind: VirtualMachine metadata: annotations: kubectl.kubernetes.io/last-applied-configuration: | {"apiVersion":"kubevirt.io/v1alpha3","kind":"VirtualMachine","metadata":{"annotations":{"kubevirt.io/latest-observed-api-version":"v1alpha3","kubevirt.io/storage-observed-api-version":"v1alpha3","name.os.template.kubevirt.io/fedora31":"Fedora 31"},"labels":{"app":"fedora-test-vm","flavor.template.kubevirt.io/Custom":"true","os.template.kubevirt.io/fedora31":"true","vm.kubevirt.io/template.namespace":"openshift","vm.kubevirt.io/template.revision":"1","vm.kubevirt.io/template.version":"v0.9.1","workload.template.kubevirt.io/server":"true"},"name":"fedora-test-vm","namespace":"test"},"spec":{"running":true,"template":{"metadata":{"labels":{"flavor.template.kubevirt.io/Custom":"true","kubevirt.io/domain":"fedora-test-vm","kubevirt.io/size":"large","os.template.kubevirt.io/fedora31":"true","vm.kubevirt.io/name":"fedora-test-vm","workload.template.kubevirt.io/server":"true"}},"spec":{"domain":{"cpu":{"cores":3,"model":"host-model"},"devices":{"disks":[{"bootOrder":1,"disk":{"bus":"virtio"},"name":"rootdisk"},{"disk":{"bus":"virtio"},"name":"cloudinitdisk"}],"interfaces":[{"masquerade":{},"model":"virtio","name":"nic0"}],"networkInterfaceMultiqueue":true,"rng":{}},"features":{"acpi":{"enabled":true}},"machine":{"type":"q35"},"resources":{"limits":{"cpu":"3","memory":"8192Mi"},"requests":{"cpu":"3","memory":"8192Mi"}}},"evictionStrategy":"LiveMigrate","hostname":"fedora-test-vm","networks":[{"name":"nic0","pod":{}}],"terminationGracePeriodSeconds":0,"volumes":[{"containerDisk":{"image":"docker.io/kubevirt/fedora-cloud-container-disk-demo"},"name":"rootdisk"},{"cloudInitNoCloud":{"userData":"#cloud-config\nname: default\nhostname: fedora-test-vm\nssh_authorized_keys:\n - \u003e-\n ssh-rsa\n 
AAAAB3NzaC1yc2EAAAADAQABAAABgQDpUE2TTSndAH40z9ccaOv4jqBwSFP6M1HGDTp5MlSjuhARj4b5MNnOawqxBLO4DUdYmZtDPkwFSq0JTV59RudOWF+RmShc5WotVBYqLnmGzq5WEVhbo5ZwHey8OQ9env0lJBHzAArzq878B6Y0aEwUV+vBqv7wHZK+J7aF04d/qQm7yPlcuCc0JCE2I2Dqgy6O9dJI/bNLvqWKCt+d9og2nmzJbCPI3w4rscZZizL7d1kk93DiDFFnjfNaChgFcpDASnZOZlceEY7xt2Kc4SwHDfg7ykACDoonleUuF0eDbpCUPo3QIvKGLEzzo4ONQMEfb7qFjQxk+YxlpnXDE5Gl30IK0u/HtmJyA1hZyhJqb8l+Bh0qnm8+fxxctsGcROyXoIYfHeMucOjr0nZitplL6bI+NTI6TqILphfEbNCUJe/Uo9CrojU9dN1pfCZ71+FRaHGDrxoZLyhduxgBrHIlQEPOiYbiGif2unMNERaPdjgB3R5bqiCrhT/a4jl49IU=\n"},"name":"cloudinitdisk"}]}}}} kubevirt.io/latest-observed-api-version: v1alpha3 kubevirt.io/storage-observed-api-version: v1alpha3 name.os.template.kubevirt.io/fedora31: Fedora 31 creationTimestamp: "2020-10-05T19:39:59Z" generation: 3 labels: app: fedora-test-vm flavor.template.kubevirt.io/Custom: "true" os.template.kubevirt.io/fedora31: "true" vm.kubevirt.io/template.namespace: openshift vm.kubevirt.io/template.revision: "1" vm.kubevirt.io/template.version: v0.9.1 workload.template.kubevirt.io/server: "true" managedFields: - apiVersion: kubevirt.io/v1alpha3 fieldsType: FieldsV1 fieldsV1: f:metadata: f:annotations: .: {} f:kubectl.kubernetes.io/last-applied-configuration: {} f:kubevirt.io/latest-observed-api-version: {} f:kubevirt.io/storage-observed-api-version: {} f:name.os.template.kubevirt.io/fedora31: {} f:labels: .: {} f:app: {} f:flavor.template.kubevirt.io/Custom: {} f:os.template.kubevirt.io/fedora31: {} f:vm.kubevirt.io/template.namespace: {} f:vm.kubevirt.io/template.revision: {} f:vm.kubevirt.io/template.version: {} f:workload.template.kubevirt.io/server: {} f:spec: .: {} f:running: {} f:template: .: {} f:metadata: .: {} f:labels: .: {} f:flavor.template.kubevirt.io/Custom: {} f:kubevirt.io/domain: {} f:kubevirt.io/size: {} f:os.template.kubevirt.io/fedora31: {} f:vm.kubevirt.io/name: {} f:workload.template.kubevirt.io/server: {} f:spec: .: {} f:domain: .: {} f:cpu: .: {} f:cores: {} f:model: {} f:devices: .: {} f:disks: {} f:interfaces: {} f:networkInterfaceMultiqueue: {} f:rng: {} f:features: .: {} f:acpi: .: {} f:enabled: {} f:machine: .: {} f:type: {} f:resources: .: {} f:limits: .: {} f:cpu: {} f:memory: {} f:requests: .: {} f:cpu: {} f:memory: {} f:evictionStrategy: {} f:hostname: {} f:networks: {} f:terminationGracePeriodSeconds: {} f:volumes: {} manager: oc operation: Update time: "2020-10-05T19:39:59Z" - apiVersion: kubevirt.io/v1alpha3 fieldsType: FieldsV1 fieldsV1: f:status: .: {} f:conditions: {} f:created: {} f:ready: {} manager: virt-controller operation: Update time: "2020-10-05T19:40:14Z" name: fedora-test-vm namespace: test resourceVersion: "252796" selfLink: /apis/kubevirt.io/v1alpha3/namespaces/test/virtualmachines/fedora-test-vm uid: a9378cdc-c270-4092-9692-4bacfb5c0e8a spec: running: true template: metadata: creationTimestamp: null labels: flavor.template.kubevirt.io/Custom: "true" kubevirt.io/domain: fedora-test-vm kubevirt.io/size: large os.template.kubevirt.io/fedora31: "true" vm.kubevirt.io/name: fedora-test-vm workload.template.kubevirt.io/server: "true" spec: domain: cpu: cores: 3 model: host-model devices: disks: - bootOrder: 1 disk: bus: virtio name: rootdisk - disk: bus: virtio name: cloudinitdisk interfaces: - masquerade: {} model: virtio name: nic0 networkInterfaceMultiqueue: true rng: {} features: acpi: enabled: true machine: type: q35 resources: limits: cpu: "3" memory: 8Gi requests: cpu: "3" memory: 8Gi evictionStrategy: LiveMigrate hostname: fedora-test-vm networks: - name: nic0 pod: {} 
terminationGracePeriodSeconds: 0 volumes: - containerDisk: image: docker.io/kubevirt/fedora-cloud-container-disk-demo name: rootdisk - cloudInitNoCloud: userData: | #cloud-config name: default hostname: fedora-test-vm ssh_authorized_keys: - >- ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDpUE2TTSndAH40z9ccaOv4jqBwSFP6M1HGDTp5MlSjuhARj4b5MNnOawqxBLO4DUdYmZtDPkwFSq0JTV59RudOWF+RmShc5WotVBYqLnmGzq5WEVhbo5ZwHey8OQ9env0lJBHzAArzq878B6Y0aEwUV+vBqv7wHZK+J7aF04d/qQm7yPlcuCc0JCE2I2Dqgy6O9dJI/bNLvqWKCt+d9og2nmzJbCPI3w4rscZZizL7d1kk93DiDFFnjfNaChgFcpDASnZOZlceEY7xt2Kc4SwHDfg7ykACDoonleUuF0eDbpCUPo3QIvKGLEzzo4ONQMEfb7qFjQxk+YxlpnXDE5Gl30IK0u/HtmJyA1hZyhJqb8l+Bh0qnm8+fxxctsGcROyXoIYfHeMucOjr0nZitplL6bI+NTI6TqILphfEbNCUJe/Uo9CrojU9dN1pfCZ71+FRaHGDrxoZLyhduxgBrHIlQEPOiYbiGif2unMNERaPdjgB3R5bqiCrhT/a4jl49IU= name: cloudinitdisk status: conditions: - lastProbeTime: null lastTransitionTime: "2020-10-05T19:36:02Z" status: "True" type: Ready created: true ready: true ~~~
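The limits == requests case behaves the same way, only now the pod is in the Guaranteed QoS class (visible in the journal above), and the exit code 137 is 128 + SIGKILL, i.e. the kernel's OOM killer. The headroom arithmetic for this test:

~~~
echo "8579144 * 1024" | bc                    # 8785043456 -> matches hierarchical_memory_limit
echo "(8579144 - 8192 * 1024) / 1024" | bc    # 186 -> MiB of headroom above the 8192 MiB guest (-m 8192)
~~~

stress-ng drives total_rss from ~5.3 GB to ~8.69 GB within seconds against the 8.78 GB limit, so qemu-kvm's own allocations no longer fit.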
Repeating the same test, but undercommitting the VM [0] and without overcommitGuestOverhead - this way, the issue does not reproduce.

~~~
        resources:
          requests:
            cpu: "3"
            memory: 8704Mi
          limits:
            cpu: "3"
            memory: 8704Mi
        memory:
          guest: 8192Mi
~~~

Stress-ng inside the VM:

~~~
ssh -i gss-stack-tools -p 20022 fedora.54.202
yum install stress-ng -y
stress-ng --vm 1 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1
~~~

Network load - run from the hypervisor:

~~~
dd if=/dev/zero of=test.txt bs=1G count=1
sudo ip route add 172.30.54.202 via 172.24.2.1 dev ovn-k8s-mp0 src 172.24.2.2 mtu 1400
while true ; do scp -i gss-stack-tools -P 20022 test.txt fedora.54.202: ; done
~~~

~~~
[root@openshift-jumpserver-0 ~]# oc describe pod | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:'
    Limits:
      memory: 40M
    Requests:
      memory: 40M
Containers:
  compute:
    Limits:
      memory: 9104456Ki
    Requests:
      memory: 9104456Ki
  volumerootdisk:
    Limits:
      memory: 40M
    Requests:
      memory: 40M
~~~

We can see that qemu-kvm is way below the cgroup limits:

~~~
[root@openshift-worker-1 ~]# pgrep qemu-kvm | while read pid ; do echo === $pid === ; ps aux | grep $pid | grep -o '\-m [0-9]*' ; pmap $pid | tail ; done
=== 896856 ===
-m 8192
00007ffa5715a000     12K rw-s- [ anon ]
00007ffa5715d000      4K rw-s- zero (deleted)
00007ffa5715e000      4K r---- ld-2.28.so
00007ffa5715f000      4K rw--- ld-2.28.so
00007ffa57160000      4K rw--- [ anon ]
00007ffd48abe000    132K rw--- [ stack ]
00007ffd48adf000     12K r---- [ anon ]
00007ffd48ae2000      8K r-x-- [ anon ]
ffffffffff600000      4K r-x-- [ anon ]
 total         13127572K
~~~

~~~
[root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit' ; sleep 2 ; done
Mon Oct 5 19:20:54 UTC 2020 hierarchical_memory_limit 9322962944 total_rss 772190208 total_rss_huge 706740224
~~~

And the OOM does not reproduce within 10 minutes of testing.
========

[0] The baseline fedora-vm.yaml shown above, with the resources section replaced by the requests/limits block quoted at the start of this test (8704Mi each) and memory.guest: 8192Mi added under the domain.
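The reason this configuration survives is visible in the numbers: the guest is 512 MiB smaller than the request/limit, so qemu-kvm keeps the undercommitted 512 MiB on top of the ~186 MiB calculated overhead as headroom. Arithmetic on the values above:

~~~
echo "9104456 * 1024" | bc                    # 9322962944 -> matches hierarchical_memory_limit
echo "(9104456 - 8192 * 1024) / 1024" | bc    # 699 -> MiB of headroom above the 8192 MiB guest (-m 8192)
~~~

That is roughly 512 MiB of undercommit plus the ~186 MiB overhead, and the same stress test no longer triggers the OOM killer.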
Repeating the same test with [0]:

~~~
        resources:
          requests:
            cpu: "3"
            memory: 8192Mi
          limits:
            cpu: "3"
            memory: 8192Mi
        memory:
          guest: 8192Mi
~~~

But still without:

~~~
overcommitGuestOverhead: true
~~~

This reproduces the issue. We now see:

~~~
[root@openshift-jumpserver-0 ~]# oc describe pod | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:'
    Limits:
      memory: 40M
    Requests:
      memory: 40M
Containers:
  compute:
    Limits:
      memory: 8579144Ki
    Requests:
      memory: 8579144Ki
  volumerootdisk:
    Limits:
      memory: 40M
    Requests:
      memory: 40M
~~~

~~~
[root@openshift-worker-1 ~]# pgrep qemu-kvm | while read pid ; do echo === $pid === ; ps aux | grep $pid | grep -o '\-m [0-9]*' ; pmap $pid | tail ; done
=== 877221 ===
-m 8192
00007f89ce717000     12K rw-s- [ anon ]
00007f89ce71a000      4K rw-s- zero (deleted)
00007f89ce71b000      4K r---- ld-2.28.so
00007f89ce71c000      4K rw--- ld-2.28.so
00007f89ce71d000      4K rw--- [ anon ]
00007fff0b5df000    132K rw--- [ stack ]
00007fff0b61a000     12K r---- [ anon ]
00007fff0b61d000      8K r-x-- [ anon ]
ffffffffff600000      4K r-x-- [ anon ]
 total         11206356K
~~~

~~~
[root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit' ; sleep 2 ; done
Mon Oct 5 19:13:35 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 1432940544 total_rss_huge 1354760192
Mon Oct 5 19:13:37 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 1426857984 total_rss_huge 1348468736
~~~

That's in MiB:

~~~
[root@openshift-worker-1 ~]# echo "8785043456 / 1024 /1024" | bc
8378
~~~

And I can still reproduce the OOM:

~~~
[root@fedora-test-vm ~]# stress-ng --vm 1 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1
stress-ng: debug: [9372] 3 processors online, 3 processors configured
stress-ng: info: [9372] dispatching hogs: 1 vm, 1 hdd, 1 io
stress-ng: debug: [9372] cache allocate: default cache size: 16384K
stress-ng: debug: [9372] starting stressors
stress-ng: debug: [9373] stress-ng-vm: started [9373] (instance 0)
stress-ng: debug: [9372] 3 stressors spawned
stress-ng: debug: [9374] stress-ng-hdd: started [9374] (instance 0)
stress-ng: debug: [9373] stress-ng-vm using method 'all'
stress-ng: debug: [9375] stress-ng-io: started [9375] (instance 0)
~~~

~~~
[root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit' ; sleep 2 ; done
Mon Oct 5 19:13:35 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 1432940544 total_rss_huge 1354760192
Mon Oct 5 19:13:37 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 1426857984 total_rss_huge 1348468736
Mon Oct 5 19:13:39 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 3285221376 total_rss_huge 3206545408
Mon Oct 5 19:13:41 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 4770004992 total_rss_huge 4691329024
Mon Oct 5 19:13:43 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 6103789568 total_rss_huge 6025117696
Mon Oct 5 19:13:45 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 7917981696 total_rss_huge 7839154176
Mon Oct 5 19:13:47 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8656314368 total_rss_huge 8577351680
Mon Oct 5 19:13:49 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692006912 total_rss_huge 8613003264
Mon Oct 5 19:13:51 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692006912
total_rss_huge 8613003264 Mon Oct 5 19:13:53 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692006912 total_rss_huge 8613003264 Mon Oct 5 19:13:55 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692006912 total_rss_huge 8613003264 Mon Oct 5 19:13:57 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692006912 total_rss_huge 8613003264 Mon Oct 5 19:13:59 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8694104064 total_rss_huge 8615100416 Mon Oct 5 19:14:01 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8694104064 total_rss_huge 8615100416 Mon Oct 5 19:14:03 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8694104064 total_rss_huge 8615100416 Mon Oct 5 19:14:05 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8694104064 total_rss_huge 8615100416 Mon Oct 5 19:14:07 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8694104064 total_rss_huge 8615100416 Mon Oct 5 19:14:09 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8694104064 total_rss_huge 8615100416 Mon Oct 5 19:14:11 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8694841344 total_rss_huge 8617197568 Mon Oct 5 19:14:13 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8692932608 total_rss_huge 8615100416 Mon Oct 5 19:14:15 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693264384 total_rss_huge 8615100416 Mon Oct 5 19:14:17 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8695607296 total_rss_huge 8617197568 Mon Oct 5 19:14:19 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693518336 total_rss_huge 8615100416 Mon Oct 5 19:14:21 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693518336 total_rss_huge 8615100416 Mon Oct 5 19:14:23 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693518336 total_rss_huge 8615100416 Mon Oct 5 19:14:25 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693518336 total_rss_huge 8615100416 Mon Oct 5 19:14:27 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693518336 total_rss_huge 8615100416 Mon Oct 5 19:14:29 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693547008 total_rss_huge 8615100416 Mon Oct 5 19:14:31 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693547008 total_rss_huge 8615100416 Mon Oct 5 19:14:33 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693547008 total_rss_huge 8615100416 Mon Oct 5 19:14:35 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693547008 total_rss_huge 8615100416 Mon Oct 5 19:14:37 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693547008 total_rss_huge 8615100416 Mon Oct 5 19:14:39 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693547008 total_rss_huge 8615100416 Mon Oct 5 19:14:41 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693547008 total_rss_huge 8615100416 Mon Oct 5 19:14:43 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693547008 total_rss_huge 8615100416 Mon Oct 5 19:14:45 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693555200 total_rss_huge 8615100416 Mon Oct 5 19:14:47 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693555200 total_rss_huge 8615100416 Mon Oct 5 19:14:49 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693555200 total_rss_huge 8615100416 Mon Oct 5 19:14:51 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693559296 total_rss_huge 8615100416 Mon Oct 5 19:14:53 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693559296 total_rss_huge 8615100416 Mon Oct 5 19:14:55 UTC 2020 hierarchical_memory_limit 
8785043456 total_rss 8693559296 total_rss_huge 8615100416 Mon Oct 5 19:14:57 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693698560 total_rss_huge 8615100416 Mon Oct 5 19:14:59 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693907456 total_rss_huge 8613003264 Mon Oct 5 19:15:01 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693907456 total_rss_huge 8610906112 Mon Oct 5 19:15:03 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693907456 total_rss_huge 8610906112 Mon Oct 5 19:15:05 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693907456 total_rss_huge 8610906112 Mon Oct 5 19:15:07 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693907456 total_rss_huge 8610906112 Mon Oct 5 19:15:09 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8693907456 total_rss_huge 8610906112 Mon Oct 5 19:15:11 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691810304 total_rss_huge 8608808960 Mon Oct 5 19:15:13 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691810304 total_rss_huge 8608808960 Mon Oct 5 19:15:15 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691810304 total_rss_huge 8608808960 Mon Oct 5 19:15:17 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691810304 total_rss_huge 8608808960 Mon Oct 5 19:15:19 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691810304 total_rss_huge 8608808960 Mon Oct 5 19:15:21 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691810304 total_rss_huge 8608808960 Mon Oct 5 19:15:23 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8691810304 total_rss_huge 8608808960 Mon Oct 5 19:15:25 UTC 2020 cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-pod12fb4c1b_a719_42dc_92e0_bf2d3b7ff322.slice/crio-5266b88b2b7381299ff5395cd47d5684289c81969ee5cfee7e9e2a25415c3b55.scope/memory.stat: No such file or directory Mon Oct 5 19:15:27 UTC 2020 cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-pod12fb4c1b_a719_42dc_92e0_bf2d3b7ff322.slice/crio-5266b88b2b7381299ff5395cd47d5684289c81969ee5cfee7e9e2a25415c3b55.scope/memory.stat: No such file or directory Mon Oct 5 19:15:29 UTC 2020 cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-pod12fb4c1b_a719_42dc_92e0_bf2d3b7ff322.slice/crio-5266b88b2b7381299ff5395cd47d5684289c81969ee5cfee7e9e2a25415c3b55.scope/memory.stat: No such file or directory Mon Oct 5 19:15:31 UTC 2020 cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-pod12fb4c1b_a719_42dc_92e0_bf2d3b7ff322.slice/crio-5266b88b2b7381299ff5395cd47d5684289c81969ee5cfee7e9e2a25415c3b55.scope/memory.stat: No such file or directory Mon Oct 5 19:15:33 UTC 2020 cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-pod12fb4c1b_a719_42dc_92e0_bf2d3b7ff322.slice/crio-5266b88b2b7381299ff5395cd47d5684289c81969ee5cfee7e9e2a25415c3b55.scope/memory.stat: No such file or directory ^C [root@openshift-worker-1 ~]# ^C [root@openshift-worker-1 ~]# ~~~ ~~~ [root@openshift-worker-1 ~]# journalctl -f | grep -i oomkilled Oct 05 19:15:24 openshift-worker-1.example.com hyperkube[2223]: I1005 19:15:24.796770 2223 status_manager.go:570] Patch status for pod "virt-launcher-fedora-test-vm-g629t_test(12fb4c1b-a719-42dc-92e0-bf2d3b7ff322)" with "{\"metadata\":{\"uid\":\"12fb4c1b-a719-42dc-92e0-bf2d3b7ff322\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"lastTransitionTime\":\"2020-10-05T19:15:24Z\",\"message\":\"containers with unready status: 
[compute]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2020-10-05T19:15:24Z\",\"message\":\"containers with unready status: [compute]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://5266b88b2b7381299ff5395cd47d5684289c81969ee5cfee7e9e2a25415c3b55\",\"image\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93\",\"imageID\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93\",\"lastState\":{},\"name\":\"compute\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://5266b88b2b7381299ff5395cd47d5684289c81969ee5cfee7e9e2a25415c3b55\",\"exitCode\":137,\"finishedAt\":\"2020-10-05T19:15:24Z\",\"reason\":\"OOMKilled\",\"startedAt\":\"2020-10-05T19:10:49Z\"}}},{\"containerID\":\"cri-o://c3fc850a6e652fcc04d680903945a1f4cd868e0a2e06a0085ee2449c0d77d5e3\",\"image\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo:latest\",\"imageID\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331\",\"lastState\":{},\"name\":\"volumerootdisk\",\"ready\":true,\"restartCount\":0,\"started\":true,\"state\":{\"running\":{\"startedAt\":\"2020-10-05T19:10:54Z\"}}}]}}" Oct 05 19:15:24 openshift-worker-1.example.com hyperkube[2223]: I1005 19:15:24.796833 2223 status_manager.go:578] Status for pod "virt-launcher-fedora-test-vm-g629t_test(12fb4c1b-a719-42dc-92e0-bf2d3b7ff322)" updated successfully: (4, {Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:10:49 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:15:24 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [compute]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:15:24 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [compute]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:14:52 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:192.168.123.221 PodIP:172.24.2.17 PodIPs:[{IP:172.24.2.17}] StartTime:2020-10-05 19:10:41 +0000 UTC InitContainerStatuses:[{Name:container-disk-binary State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-05 19:10:48 +0000 UTC,FinishedAt:2020-10-05 19:10:49 +0000 UTC,ContainerID:cri-o://05d454b3e4456ca0e6a9e1789bd7c5dc1aa6a7167bf43f56e2763ff4825f06a2,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ContainerID:cri-o://05d454b3e4456ca0e6a9e1789bd7c5dc1aa6a7167bf43f56e2763ff4825f06a2 Started:<nil>}] ContainerStatuses:[{Name:compute State:{Waiting:nil Running:nil 
Terminated:&ContainerStateTerminated{ExitCode:137,Signal:0,Reason:OOMKilled,Message:,StartedAt:2020-10-05 19:10:49 +0000 UTC,FinishedAt:2020-10-05 19:15:24 +0000 UTC,ContainerID:cri-o://5266b88b2b7381299ff5395cd47d5684289c81969ee5cfee7e9e2a25415c3b55,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ContainerID:cri-o://5266b88b2b7381299ff5395cd47d5684289c81969ee5cfee7e9e2a25415c3b55 Started:0xc002ac55d2} {Name:volumerootdisk State:{Waiting:nil Running:&ContainerStateRunning{StartedAt:2020-10-05 19:10:54 +0000 UTC,} Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 ContainerID:cri-o://c3fc850a6e652fcc04d680903945a1f4cd868e0a2e06a0085ee2449c0d77d5e3 Started:0xc002ac55d3}] QOSClass:Guaranteed EphemeralContainerStatuses:[]}) ^C ~~~ ================================================================================ [0] ~~~ [root@openshift-jumpserver-0 ~]# cat fedora-vm.yaml apiVersion: kubevirt.io/v1alpha3 kind: VirtualMachine metadata: annotations: kubevirt.io/latest-observed-api-version: v1alpha3 kubevirt.io/storage-observed-api-version: v1alpha3 name.os.template.kubevirt.io/fedora31: Fedora 31 labels: app: fedora-test-vm flavor.template.kubevirt.io/Custom: "true" os.template.kubevirt.io/fedora31: "true" vm.kubevirt.io/template.namespace: openshift vm.kubevirt.io/template.revision: "1" vm.kubevirt.io/template.version: v0.9.1 workload.template.kubevirt.io/server: "true" name: fedora-test-vm namespace: test spec: running: true template: metadata: labels: flavor.template.kubevirt.io/Custom: "true" kubevirt.io/domain: fedora-test-vm kubevirt.io/size: large os.template.kubevirt.io/fedora31: "true" vm.kubevirt.io/name: fedora-test-vm workload.template.kubevirt.io/server: "true" spec: domain: cpu: cores: 3 model: host-model devices: disks: - bootOrder: 1 disk: bus: virtio name: rootdisk - disk: bus: virtio name: cloudinitdisk interfaces: - masquerade: {} model: virtio name: nic0 networkInterfaceMultiqueue: true rng: {} features: acpi: enabled: true machine: type: q35 resources: requests: cpu: "3" memory: 8192Mi limits: cpu: "3" memory: 8192Mi memory: guest: 8192Mi evictionStrategy: LiveMigrate hostname: fedora-test-vm networks: - name: nic0 pod: {} terminationGracePeriodSeconds: 0 volumes: - containerDisk: image: docker.io/kubevirt/fedora-cloud-container-disk-demo name: rootdisk - cloudInitNoCloud: userData: | #cloud-config name: default hostname: fedora-test-vm ssh_authorized_keys: - >- ssh-rsa AAA(...)l49IU= name: cloudinitdisk ~~~
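To put rough numbers on this failure mode (a back-of-the-envelope check in the same bc style as the rest of this report; the overhead figure is derived from the values above, not taken from CNV's code):
~~~
echo "8579144 * 1024" | bc                        # pod/cgroup limit: 8785043456 bytes
echo "8192 * 1024 * 1024" | bc                    # guest RAM (-m 8192): 8589934592 bytes
echo "(8785043456 - 8589934592) / 1048576" | bc   # implied overhead budget: ~186 MiB
~~~
Once stress-ng makes the guest touch all of its 8192 MiB, total_rss climbs to about 8.7 GB (just under the 8785043456-byte limit), leaving only that ~186 MiB for everything else qemu-kvm needs (its virtual footprint in the pmap output above is roughly 10.7 GiB), and the compute container is OOMKilled.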
I repeated the same test with a VM with 8192Mi and overcommitGuestOverhead: true [0] ~~~ resources: requests: cpu: "3" memory: 8192Mi limits: cpu: "3" memory: 8192Mi overcommitGuestOverhead: true memory: guest: 8192Mi ~~~ The issue still reproduces. The memory limits inside kubernetes DB and on the host match: ~~~ [root@openshift-jumpserver-0 ~]# oc describe pod | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:' Limits: memory: 40M Requests: memory: 40M Containers: compute: Limits: memory: 8579144Ki Requests: memory: 8Gi volumerootdisk: Limits: memory: 40M Requests: memory: 40M ~~~ ~~~ [root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit' ; sleep 2 ; done Mon Oct 5 19:04:45 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 855515136 total_rss_huge 788529152 Mon Oct 5 19:04:47 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 855515136 ~~~ ~~~ [root@openshift-worker-1 ~]# echo "8579144 * 1024" | bc 8785043456 ~~~ So here, we avoid https://github.com/kubevirt/kubevirt/pull/3921 Stress-ng inside VM: ~~~ ssh -i gss-stack-tools -p 20022 fedora.54.202 yum install stress-ng -y stress-ng --vm 1 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1 ~~~ Network load - from hypervisor run: ~~~ dd if=/dev/zero of=test.txt bs=1G count=1 sudo ip route add 172.30.54.202 via 172.24.2.1 dev ovn-k8s-mp0 src 172.24.2.2 mtu 1400 while true ; do scp -i gss-stack-tools -P 20022 test.txt fedora.54.202: ; done ~~~ I can still hit the same OOM: ~~~ [root@openshift-worker-1 ~]# journalctl -f | grep -i oomkilled Oct 05 19:08:33 openshift-worker-1.example.com hyperkube[2223]: I1005 19:08:33.003078 2223 status_manager.go:570] Patch status for pod "virt-launcher-fedora-test-vm-bsplz_test(668e7ae5-1830-436e-870d-8dfdbea89e0f)" with 
"{\"metadata\":{\"uid\":\"668e7ae5-1830-436e-870d-8dfdbea89e0f\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"reason\":\"PodCompleted\",\"type\":\"Initialized\"},{\"lastTransitionTime\":\"2020-10-05T19:08:32Z\",\"reason\":\"PodCompleted\",\"status\":\"False\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2020-10-05T19:08:32Z\",\"reason\":\"PodCompleted\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://92302e1637ad140832c49abe07c7341d1751e2fe418d27079710f85a965ee09b\",\"image\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93\",\"imageID\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93\",\"lastState\":{},\"name\":\"compute\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://92302e1637ad140832c49abe07c7341d1751e2fe418d27079710f85a965ee09b\",\"exitCode\":0,\"finishedAt\":\"2020-10-05T19:08:31Z\",\"reason\":\"OOMKilled\",\"startedAt\":\"2020-10-05T19:03:21Z\"}}},{\"containerID\":\"cri-o://5574c2e6ddca338c6d116b869a9f8718c41d3c8905eae5956d1ef15758ca764a\",\"image\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo:latest\",\"imageID\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331\",\"lastState\":{},\"name\":\"volumerootdisk\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://5574c2e6ddca338c6d116b869a9f8718c41d3c8905eae5956d1ef15758ca764a\",\"exitCode\":0,\"finishedAt\":\"2020-10-05T19:08:31Z\",\"reason\":\"Completed\",\"startedAt\":\"2020-10-05T19:03:26Z\"}}}],\"phase\":\"Succeeded\"}}" Oct 05 19:08:33 openshift-worker-1.example.com hyperkube[2223]: I1005 19:08:33.003147 2223 status_manager.go:578] Status for pod "virt-launcher-fedora-test-vm-bsplz_test(668e7ae5-1830-436e-870d-8dfdbea89e0f)" updated successfully: (5, {Phase:Succeeded Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:03:21 +0000 UTC Reason:PodCompleted Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:08:32 +0000 UTC Reason:PodCompleted Message:} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:08:32 +0000 UTC Reason:PodCompleted Message:} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:07:25 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:192.168.123.221 PodIP:172.24.2.17 PodIPs:[{IP:172.24.2.17}] StartTime:2020-10-05 19:03:13 +0000 UTC InitContainerStatuses:[{Name:container-disk-binary State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-05 19:03:20 +0000 UTC,FinishedAt:2020-10-05 19:03:21 +0000 UTC,ContainerID:cri-o://7f05df1530eeecab0ee41b71727bfcdbda86ba08f2907f5a3879354bb02d31b5,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 
ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ContainerID:cri-o://7f05df1530eeecab0ee41b71727bfcdbda86ba08f2907f5a3879354bb02d31b5 Started:<nil>}] ContainerStatuses:[{Name:compute State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:OOMKilled,Message:,StartedAt:2020-10-05 19:03:21 +0000 UTC,FinishedAt:2020-10-05 19:08:31 +0000 UTC,ContainerID:cri-o://92302e1637ad140832c49abe07c7341d1751e2fe418d27079710f85a965ee09b,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ContainerID:cri-o://92302e1637ad140832c49abe07c7341d1751e2fe418d27079710f85a965ee09b Started:0xc00317b28a} {Name:volumerootdisk State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-05 19:03:26 +0000 UTC,FinishedAt:2020-10-05 19:08:31 +0000 UTC,ContainerID:cri-o://5574c2e6ddca338c6d116b869a9f8718c41d3c8905eae5956d1ef15758ca764a,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 ContainerID:cri-o://5574c2e6ddca338c6d116b869a9f8718c41d3c8905eae5956d1ef15758ca764a Started:0xc00317b28b}] QOSClass:Burstable EphemeralContainerStatuses:[]}) Oct 05 19:08:33 openshift-worker-1.example.com hyperkube[2223]: I1005 19:08:33.003277 2223 kubelet_pods.go:961] Pod "virt-launcher-fedora-test-vm-bsplz_test(668e7ae5-1830-436e-870d-8dfdbea89e0f)" is terminated, but some containers have not been cleaned up: {ID:{Type:cri-o ID:5574c2e6ddca338c6d116b869a9f8718c41d3c8905eae5956d1ef15758ca764a} Name:volumerootdisk State:exited CreatedAt:2020-10-05 19:03:26.178788793 +0000 UTC StartedAt:2020-10-05 19:03:26.472343354 +0000 UTC FinishedAt:2020-10-05 19:08:31.979979057 +0000 UTC ExitCode:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 Hash:1814580024 RestartCount:0 Reason:Completed Message:} {ID:{Type:cri-o ID:92302e1637ad140832c49abe07c7341d1751e2fe418d27079710f85a965ee09b} Name:compute State:exited CreatedAt:2020-10-05 19:03:21.505102416 +0000 UTC StartedAt:2020-10-05 19:03:21.535758318 +0000 UTC FinishedAt:2020-10-05 19:08:31.932979056 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:1362036640 RestartCount:0 Reason:OOMKilled Message:} {ID:{Type:cri-o ID:7f05df1530eeecab0ee41b71727bfcdbda86ba08f2907f5a3879354bb02d31b5} Name:container-disk-binary State:exited CreatedAt:2020-10-05 19:03:20.571269695 +0000 UTC StartedAt:2020-10-05 19:03:20.772646491 +0000 UTC FinishedAt:2020-10-05 19:03:21.270973691 +0000 UTC ExitCode:0 
Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:793891490 RestartCount:0 Reason:Completed Message:} Oct 05 19:08:33 openshift-worker-1.example.com hyperkube[2223]: I1005 19:08:33.006942 2223 kubelet_pods.go:961] Pod "virt-launcher-fedora-test-vm-bsplz_test(668e7ae5-1830-436e-870d-8dfdbea89e0f)" is terminated, but some containers have not been cleaned up: {ID:{Type:cri-o ID:5574c2e6ddca338c6d116b869a9f8718c41d3c8905eae5956d1ef15758ca764a} Name:volumerootdisk State:exited CreatedAt:2020-10-05 19:03:26.178788793 +0000 UTC StartedAt:2020-10-05 19:03:26.472343354 +0000 UTC FinishedAt:2020-10-05 19:08:31.979979057 +0000 UTC ExitCode:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 Hash:1814580024 RestartCount:0 Reason:Completed Message:} {ID:{Type:cri-o ID:92302e1637ad140832c49abe07c7341d1751e2fe418d27079710f85a965ee09b} Name:compute State:exited CreatedAt:2020-10-05 19:03:21.505102416 +0000 UTC StartedAt:2020-10-05 19:03:21.535758318 +0000 UTC FinishedAt:2020-10-05 19:08:31.932979056 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:1362036640 RestartCount:0 Reason:OOMKilled Message:} {ID:{Type:cri-o ID:7f05df1530eeecab0ee41b71727bfcdbda86ba08f2907f5a3879354bb02d31b5} Name:container-disk-binary State:exited CreatedAt:2020-10-05 19:03:20.571269695 +0000 UTC StartedAt:2020-10-05 19:03:20.772646491 +0000 UTC FinishedAt:2020-10-05 19:03:21.270973691 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:793891490 RestartCount:0 Reason:Completed Message:} Oct 05 19:08:33 openshift-worker-1.example.com hyperkube[2223]: I1005 19:08:33.010683 2223 kubelet_pods.go:961] Pod "virt-launcher-fedora-test-vm-bsplz_test(668e7ae5-1830-436e-870d-8dfdbea89e0f)" is terminated, but some containers have not been cleaned up: {ID:{Type:cri-o ID:5574c2e6ddca338c6d116b869a9f8718c41d3c8905eae5956d1ef15758ca764a} Name:volumerootdisk State:exited CreatedAt:2020-10-05 19:03:26.178788793 +0000 UTC StartedAt:2020-10-05 19:03:26.472343354 +0000 UTC FinishedAt:2020-10-05 19:08:31.979979057 +0000 UTC ExitCode:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 Hash:1814580024 RestartCount:0 Reason:Completed Message:} {ID:{Type:cri-o ID:92302e1637ad140832c49abe07c7341d1751e2fe418d27079710f85a965ee09b} Name:compute State:exited CreatedAt:2020-10-05 19:03:21.505102416 +0000 UTC StartedAt:2020-10-05 19:03:21.535758318 +0000 UTC FinishedAt:2020-10-05 19:08:31.932979056 +0000 UTC ExitCode:0 
Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:1362036640 RestartCount:0 Reason:OOMKilled Message:} {ID:{Type:cri-o ID:7f05df1530eeecab0ee41b71727bfcdbda86ba08f2907f5a3879354bb02d31b5} Name:container-disk-binary State:exited CreatedAt:2020-10-05 19:03:20.571269695 +0000 UTC StartedAt:2020-10-05 19:03:20.772646491 +0000 UTC FinishedAt:2020-10-05 19:03:21.270973691 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:793891490 RestartCount:0 Reason:Completed Message:} ^C ~~~ ~~~ [root@fedora-test-vm ~]# stress-ng --vm 1 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1 stress-ng: debug: [1278] 3 processors online, 3 processors configured stress-ng: info: [1278] dispatching hogs: 1 vm, 1 hdd, 1 io stress-ng: debug: [1278] cache allocate: default cache size: 16384K stress-ng: debug: [1278] starting stressors stress-ng: debug: [1278] 3 stressors spawned stress-ng: debug: [1280] stress-ng-hdd: started [1280] (instance 0) stress-ng: debug: [1281] stress-ng-io: started [1281] (instance 0) stress-ng: debug: [1279] stress-ng-vm: started [1279] (instance 0) stress-ng: debug: [1279] stress-ng-vm using method 'all' ~~~ ~~~ [root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit' ; sleep 2 ; done Mon Oct 5 19:04:45 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 855515136 total_rss_huge 788529152 Mon Oct 5 19:04:47 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 855515136 total_rss_huge 788529152 Mon Oct 5 19:04:49 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 857620480 total_rss_huge 790626304 Mon Oct 5 19:04:51 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 878592000 total_rss_huge 811597824 Mon Oct 5 19:04:53 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 949895168 total_rss_huge 882900992 Mon Oct 5 19:04:55 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 1002323968 total_rss_huge 935329792 Mon Oct 5 19:04:57 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 1073627136 total_rss_huge 1006632960 Mon Oct 5 19:04:59 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 1075724288 total_rss_huge 1008730112 Mon Oct 5 19:05:01 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 1075724288 total_rss_huge 1008730112 Mon Oct 5 19:05:03 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 1077878784 total_rss_huge 1010827264 Mon Oct 5 19:05:05 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 1077878784 total_rss_huge 1010827264 Mon Oct 5 19:05:07 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 1126113280 total_rss_huge 1059061760 Mon Oct 5 19:05:09 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 1182736384 total_rss_huge 1115684864 Mon Oct 5 19:05:11 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 1228873728 total_rss_huge 1161822208 Mon Oct 5 19:05:13 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 1256136704 total_rss_huge 1189085184 
Mon Oct 5 19:05:15 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 1719635968 total_rss_huge 1652555776 Mon Oct 5 19:05:17 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2212573184 total_rss_huge 2145386496 Mon Oct 5 19:05:19 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2424594432 total_rss_huge 2357198848 Mon Oct 5 19:05:21 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2491797504 total_rss_huge 2424307712 Mon Oct 5 19:05:23 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2506485760 total_rss_huge 2438987776 Mon Oct 5 19:05:25 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2542268416 total_rss_huge 2474639360 Mon Oct 5 19:05:27 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2544500736 total_rss_huge 2472542208 Mon Oct 5 19:05:29 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559508480 total_rss_huge 2483027968 Mon Oct 5 19:05:31 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559508480 total_rss_huge 2478833664 Mon Oct 5 19:05:33 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559508480 total_rss_huge 2478833664 Mon Oct 5 19:05:35 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559508480 total_rss_huge 2478833664 Mon Oct 5 19:05:37 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559643648 total_rss_huge 2478833664 Mon Oct 5 19:05:39 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559643648 total_rss_huge 2478833664 Mon Oct 5 19:05:41 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559643648 total_rss_huge 2478833664 Mon Oct 5 19:05:43 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559643648 total_rss_huge 2478833664 Mon Oct 5 19:05:45 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559643648 total_rss_huge 2478833664 Mon Oct 5 19:05:47 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559643648 total_rss_huge 2478833664 Mon Oct 5 19:05:49 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559643648 total_rss_huge 2478833664 Mon Oct 5 19:05:51 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559651840 total_rss_huge 2478833664 Mon Oct 5 19:05:53 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559660032 total_rss_huge 2478833664 Mon Oct 5 19:05:55 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559660032 total_rss_huge 2478833664 Mon Oct 5 19:05:57 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559737856 total_rss_huge 2478833664 Mon Oct 5 19:05:59 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559778816 total_rss_huge 2478833664 Mon Oct 5 19:06:01 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559799296 total_rss_huge 2478833664 Mon Oct 5 19:06:03 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559877120 total_rss_huge 2478833664 Mon Oct 5 19:06:05 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559877120 total_rss_huge 2478833664 Mon Oct 5 19:06:07 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559877120 total_rss_huge 2478833664 Mon Oct 5 19:06:09 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559877120 total_rss_huge 2478833664 Mon Oct 5 19:06:11 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2559877120 total_rss_huge 2478833664 Mon Oct 5 19:06:13 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2560098304 total_rss_huge 2478833664 Mon Oct 5 19:06:15 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2560098304 total_rss_huge 2478833664 Mon Oct 5 19:06:17 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 
2560229376 total_rss_huge 2478833664 Mon Oct 5 19:06:19 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2560229376 total_rss_huge 2478833664 Mon Oct 5 19:06:21 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2560229376 total_rss_huge 2478833664 Mon Oct 5 19:06:23 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2560229376 total_rss_huge 2478833664 Mon Oct 5 19:06:25 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2562351104 total_rss_huge 2480930816 Mon Oct 5 19:06:27 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2562498560 total_rss_huge 2480930816 Mon Oct 5 19:06:29 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 2801614848 total_rss_huge 2720006144 Mon Oct 5 19:06:31 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 4674666496 total_rss_huge 4592762880 Mon Oct 5 19:06:33 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 6388121600 total_rss_huge 6306136064 Mon Oct 5 19:06:35 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 7944278016 total_rss_huge 7862222848 Mon Oct 5 19:06:37 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8672026624 total_rss_huge 8589934592 Mon Oct 5 19:06:39 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8696958976 total_rss_huge 8617197568 Mon Oct 5 19:06:41 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697098240 total_rss_huge 8617197568 Mon Oct 5 19:06:43 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697167872 total_rss_huge 8617197568 Mon Oct 5 19:06:45 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697188352 total_rss_huge 8617197568 Mon Oct 5 19:06:47 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697188352 total_rss_huge 8617197568 Mon Oct 5 19:06:49 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697294848 total_rss_huge 8617197568 Mon Oct 5 19:06:51 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697331712 total_rss_huge 8617197568 Mon Oct 5 19:06:53 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697331712 total_rss_huge 8617197568 Mon Oct 5 19:06:55 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697356288 total_rss_huge 8617197568 Mon Oct 5 19:06:57 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697339904 total_rss_huge 8617197568 Mon Oct 5 19:06:59 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697356288 total_rss_huge 8617197568 Mon Oct 5 19:07:01 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8696741888 total_rss_huge 8617197568 Mon Oct 5 19:07:03 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8696741888 total_rss_huge 8617197568 Mon Oct 5 19:07:05 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8696741888 total_rss_huge 8617197568 Mon Oct 5 19:07:07 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8696741888 total_rss_huge 8617197568 Mon Oct 5 19:07:09 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8696741888 total_rss_huge 8617197568 Mon Oct 5 19:07:11 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8696741888 total_rss_huge 8617197568 Mon Oct 5 19:07:13 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8696741888 total_rss_huge 8617197568 Mon Oct 5 19:07:15 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8696741888 total_rss_huge 8617197568 Mon Oct 5 19:07:17 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8696819712 total_rss_huge 8617197568 Mon Oct 5 19:07:19 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8696844288 total_rss_huge 8617197568 Mon Oct 5 19:07:21 UTC 2020 
hierarchical_memory_limit 8785043456 total_rss 8696844288 total_rss_huge 8617197568 Mon Oct 5 19:07:23 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8696889344 total_rss_huge 8617197568 Mon Oct 5 19:07:25 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697004032 total_rss_huge 8617197568 Mon Oct 5 19:07:27 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697180160 total_rss_huge 8617197568 Mon Oct 5 19:07:29 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697212928 total_rss_huge 8617197568 Mon Oct 5 19:07:31 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697315328 total_rss_huge 8613003264 Mon Oct 5 19:07:33 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697352192 total_rss_huge 8610906112 Mon Oct 5 19:07:35 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697352192 total_rss_huge 8610906112 Mon Oct 5 19:07:37 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697364480 total_rss_huge 8610906112 Mon Oct 5 19:07:39 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697372672 total_rss_huge 8610906112 Mon Oct 5 19:07:41 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697372672 total_rss_huge 8610906112 Mon Oct 5 19:07:43 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697372672 total_rss_huge 8610906112 Mon Oct 5 19:07:45 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8695275520 total_rss_huge 8608808960 Mon Oct 5 19:07:47 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8695304192 total_rss_huge 8608808960 Mon Oct 5 19:07:49 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8695410688 total_rss_huge 8608808960 Mon Oct 5 19:07:51 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8695410688 total_rss_huge 8608808960 Mon Oct 5 19:07:53 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8695504896 total_rss_huge 8608808960 Mon Oct 5 19:07:55 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8695504896 total_rss_huge 8608808960 Mon Oct 5 19:07:57 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8695517184 total_rss_huge 8608808960 Mon Oct 5 19:07:59 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8695566336 total_rss_huge 8608808960 Mon Oct 5 19:08:01 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8695566336 total_rss_huge 8608808960 Mon Oct 5 19:08:03 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697663488 total_rss_huge 8610906112 Mon Oct 5 19:08:05 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697663488 total_rss_huge 8610906112 Mon Oct 5 19:08:07 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697712640 total_rss_huge 8610906112 Mon Oct 5 19:08:09 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697733120 total_rss_huge 8610906112 Mon Oct 5 19:08:11 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697798656 total_rss_huge 8610906112 Mon Oct 5 19:08:13 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697798656 total_rss_huge 8610906112 Mon Oct 5 19:08:15 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697798656 total_rss_huge 8610906112 Mon Oct 5 19:08:17 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697798656 total_rss_huge 8610906112 Mon Oct 5 19:08:19 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697798656 total_rss_huge 8610906112 Mon Oct 5 19:08:21 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697798656 total_rss_huge 8610906112 Mon Oct 5 19:08:23 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697798656 total_rss_huge 
8610906112 Mon Oct 5 19:08:25 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697798656 total_rss_huge 8610906112 Mon Oct 5 19:08:27 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697798656 total_rss_huge 8610906112 Mon Oct 5 19:08:29 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697798656 total_rss_huge 8610906112 Mon Oct 5 19:08:31 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 0 total_rss_huge 0 Mon Oct 5 19:08:33 UTC 2020 cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod668e7ae5_1830_436e_870d_8dfdbea89e0f.slice/crio-92302e1637ad140832c49abe07c7341d1751e2fe418d27079710f85a965ee09b.scope/memory.stat: No such file or directory Mon Oct 5 19:08:35 UTC 2020 cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod668e7ae5_1830_436e_870d_8dfdbea89e0f.slice/crio-92302e1637ad140832c49abe07c7341d1751e2fe418d27079710f85a965ee09b.scope/memory.stat: No such file or directory ^C ~~~ ~~~ [core@openshift-worker-1 ~]$ while true ; do scp -i gss-stack-tools -P 20022 test.txt fedora.54.202: ; done test.txt 100% 1024MB 208.9MB/s 00:04 test.txt 100% 1024MB 233.3MB/s 00:04 test.txt 100% 1024MB 183.4MB/s 00:05 test.txt 100% 1024MB 233.3MB/s 00:04 test.txt 100% 1024MB 181.2MB/s 00:05 test.txt 100% 1024MB 213.4MB/s 00:04 (...) ~~~ ------------------ [0] ~~~ [root@openshift-jumpserver-0 ~]# cat fedora-vm.yaml apiVersion: kubevirt.io/v1alpha3 kind: VirtualMachine metadata: annotations: kubevirt.io/latest-observed-api-version: v1alpha3 kubevirt.io/storage-observed-api-version: v1alpha3 name.os.template.kubevirt.io/fedora31: Fedora 31 labels: app: fedora-test-vm flavor.template.kubevirt.io/Custom: "true" os.template.kubevirt.io/fedora31: "true" vm.kubevirt.io/template.namespace: openshift vm.kubevirt.io/template.revision: "1" vm.kubevirt.io/template.version: v0.9.1 workload.template.kubevirt.io/server: "true" name: fedora-test-vm namespace: test spec: running: true template: metadata: labels: flavor.template.kubevirt.io/Custom: "true" kubevirt.io/domain: fedora-test-vm kubevirt.io/size: large os.template.kubevirt.io/fedora31: "true" vm.kubevirt.io/name: fedora-test-vm workload.template.kubevirt.io/server: "true" spec: domain: cpu: cores: 3 model: host-model devices: disks: - bootOrder: 1 disk: bus: virtio name: rootdisk - disk: bus: virtio name: cloudinitdisk interfaces: - masquerade: {} model: virtio name: nic0 networkInterfaceMultiqueue: true rng: {} features: acpi: enabled: true machine: type: q35 resources: requests: cpu: "3" memory: 8192Mi limits: cpu: "3" memory: 8192Mi overcommitGuestOverhead: true memory: guest: 8192Mi evictionStrategy: LiveMigrate hostname: fedora-test-vm networks: - name: nic0 pod: {} terminationGracePeriodSeconds: 0 volumes: - containerDisk: image: docker.io/kubevirt/fedora-cloud-container-disk-demo name: rootdisk - cloudInitNoCloud: userData: | #cloud-config name: default hostname: fedora-test-vm ssh_authorized_keys: - >- ssh-rsa AAAAB(...)jl49IU= name: cloudinitdisk ~~~
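Note what overcommitGuestOverhead: true actually changed: the calculated overhead is no longer added to the request (which stays at 8Gi) but is still added to the limit (8579144Ki). Request and limit therefore no longer match, which is also why the kubelet output above reports QOSClass:Burstable instead of the Guaranteed class seen in the previous test. In the same bc style:
~~~
echo "8 * 1024 * 1024 * 1024" | bc                # request (8Gi): 8589934592 bytes
echo "8579144 * 1024" | bc                        # limit: 8785043456 bytes
echo "(8785043456 - 8589934592) / 1048576" | bc   # request-to-limit gap: ~186 MiB
~~~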
Now, I tested with MB instead of MiB and ran into https://github.com/kubevirt/kubevirt/pull/3921 on top of that [0] ~~~ resources: requests: cpu: "3" memory: 10240M limits: cpu: "3" memory: 10240M overcommitGuestOverhead: true memory: guest: 10240M ~~~ Pod description: ~~~ [root@openshift-jumpserver-0 ~]# oc describe pod | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:' Limits: memory: 40M Requests: memory: 40M Containers: compute: Limits: memory: 10438331648 Requests: memory: 10240M volumerootdisk: Limits: memory: 40M Requests: memory: 40M ~~~ But now we see a mismatch between Kubernetes' limits and the cgroup limits: ~~~ [root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit' ; sleep 2 ; done Mon Oct 5 21:03:03 UTC 2020 hierarchical_memory_limit 10438328320 total_rss 831369216 total_rss_huge 765460480 ~~~ The VM: ~~~ [root@openshift-worker-1 ~]# pgrep qemu-kvm | while read pid ; do echo === $pid === ; ps aux | grep $pid | grep -o '\-m [0-9]*' ; pmap $pid | tail ; done === 1172052 === -m 9766 00007ffb4d2d3000 12K rw-s- [ anon ] 00007ffb4d2d6000 4K rw-s- zero (deleted) 00007ffb4d2d7000 4K r---- ld-2.28.so 00007ffb4d2d8000 4K rw--- ld-2.28.so 00007ffb4d2d9000 4K rw--- [ anon ] 00007ffde233d000 132K rw--- [ stack ] 00007ffde2368000 12K r---- [ anon ] 00007ffde236b000 8K r-x-- [ anon ] ffffffffff600000 4K r-x-- [ anon ] total 15162840K ~~~ The mismatch is small in this case, only about 3 KiB, but it's there: ~~~ [root@openshift-worker-1 ~]# echo "10438328320 / 1024 / 1024" | bc 9954 [root@openshift-worker-1 ~]# echo "10438331648 / 1024 / 1024" | bc 9954 [root@openshift-worker-1 ~]# echo "10438328320 / 1024" | bc 10193680 [root@openshift-worker-1 ~]# echo "10438331648 / 1024" | bc 10193683 ~~~ The OOM can easily be reproduced: ~~~ [root@openshift-worker-1 ~]# cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | grep total_rss ; sleep 2 ; done Mon Oct 5 18:59:00 UTC 2020 total_rss 1483591680 total_rss_huge 1400897536 Mon Oct 5 18:59:02 UTC 2020 total_rss 1483591680 total_rss_huge 1400897536 Mon Oct 5 18:59:05 UTC 2020 total_rss 1483591680 total_rss_huge 1400897536 Mon Oct 5 18:59:07 UTC 2020 total_rss 1479483392 total_rss_huge 1396703232 Mon Oct 5 18:59:09 UTC 2020 total_rss 1479483392 total_rss_huge 1396703232 Mon Oct 5 18:59:11 UTC 2020 total_rss 1479483392 total_rss_huge 1396703232 Mon Oct 5 18:59:13 UTC 2020 total_rss 1479483392 total_rss_huge 1396703232 Mon Oct 5 18:59:15 UTC 2020 total_rss 1743835136 total_rss_huge 1660944384 Mon Oct 5 18:59:17 UTC 2020 total_rss 3922923520 total_rss_huge 3839885312 Mon Oct 5 18:59:19 UTC 2020 total_rss 5042941952 total_rss_huge 4959764480 Mon Oct 5 18:59:21 UTC 2020 total_rss 6917820416 total_rss_huge 6834618368 Mon Oct 5 18:59:23 UTC 2020 total_rss 8656658432 total_rss_huge 8573157376 Mon Oct 5 18:59:25 UTC 2020 total_rss 9718091776 total_rss_huge 9634316288 Mon Oct 5 18:59:27 UTC 2020 total_rss 10336915456 total_rss_huge 10255073280 Mon Oct 5 18:59:29 UTC 2020 total_rss 10339106816 total_rss_huge 10257170432 Mon Oct 5 18:59:31 UTC 2020 total_rss 10339176448 total_rss_huge 10257170432 Mon Oct 5 18:59:33 UTC 2020 total_rss 10337443840 total_rss_huge 10257170432 Mon Oct 5 18:59:35 UTC 2020 total_rss 10338512896 total_rss_huge 10259267584 Mon Oct 5 18:59:37 UTC 2020 total_rss
10338521088 total_rss_huge 10259267584 Mon Oct 5 18:59:39 UTC 2020 total_rss 10338705408 total_rss_huge 10259267584 Mon Oct 5 18:59:41 UTC 2020 total_rss 10339704832 total_rss_huge 10259267584 Mon Oct 5 18:59:43 UTC 2020 total_rss 10339807232 total_rss_huge 10259267584 Mon Oct 5 18:59:45 UTC 2020 total_rss 10339876864 total_rss_huge 10259267584 Mon Oct 5 18:59:47 UTC 2020 total_rss 10339921920 total_rss_huge 10259267584 Mon Oct 5 18:59:49 UTC 2020 total_rss 10339938304 total_rss_huge 10259267584 Mon Oct 5 18:59:51 UTC 2020 total_rss 10340093952 total_rss_huge 10259267584 Mon Oct 5 18:59:53 UTC 2020 total_rss 10340093952 total_rss_huge 10259267584 Mon Oct 5 18:59:55 UTC 2020 total_rss 10340171776 total_rss_huge 10259267584 Mon Oct 5 18:59:57 UTC 2020 total_rss 10340315136 total_rss_huge 10259267584 Mon Oct 5 18:59:59 UTC 2020 total_rss 10340315136 total_rss_huge 10259267584 Mon Oct 5 19:00:01 UTC 2020 total_rss 10340315136 total_rss_huge 10259267584 Mon Oct 5 19:00:03 UTC 2020 total_rss 10340315136 total_rss_huge 10259267584 Mon Oct 5 19:00:05 UTC 2020 cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podd70e792d_be77_4568_ab04_da9590844ce5.slice/crio-6ec77a2e338be7a63eb57a695452a538abd0ab41675a4156f4fd7fde0dad6264.scope/memory.stat: No such file or directory Mon Oct 5 19:00:07 UTC 2020 cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podd70e792d_be77_4568_ab04_da9590844ce5.slice/crio-6ec77a2e338be7a63eb57a695452a538abd0ab41675a4156f4fd7fde0dad6264.scope/memory.stat: No such file or directory Mon Oct 5 19:00:09 UTC 2020 cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podd70e792d_be77_4568_ab04_da9590844ce5.slice/crio-6ec77a2e338be7a63eb57a695452a538abd0ab41675a4156f4fd7fde0dad6264.scope/memory.stat: No such file or directory ^C ~~~ ~~~ [root@openshift-worker-1 ~]# journalctl -f | grep -i oomkilled Oct 05 19:00:05 openshift-worker-1.example.com hyperkube[2223]: I1005 19:00:05.997108 2223 status_manager.go:570] Patch status for pod "virt-launcher-fedora-test-vm-nlbfm_test(d70e792d-be77-4568-ab04-da9590844ce5)" with 
"{\"metadata\":{\"uid\":\"d70e792d-be77-4568-ab04-da9590844ce5\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"reason\":\"PodCompleted\",\"type\":\"Initialized\"},{\"lastTransitionTime\":\"2020-10-05T19:00:05Z\",\"reason\":\"PodCompleted\",\"status\":\"False\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2020-10-05T19:00:05Z\",\"reason\":\"PodCompleted\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://6ec77a2e338be7a63eb57a695452a538abd0ab41675a4156f4fd7fde0dad6264\",\"image\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93\",\"imageID\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93\",\"lastState\":{},\"name\":\"compute\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://6ec77a2e338be7a63eb57a695452a538abd0ab41675a4156f4fd7fde0dad6264\",\"exitCode\":0,\"finishedAt\":\"2020-10-05T19:00:05Z\",\"reason\":\"OOMKilled\",\"startedAt\":\"2020-10-05T18:55:56Z\"}}},{\"containerID\":\"cri-o://3a22d8cbe650ab7cd316f2f8ffe5066e431555a375e9dbe5cf5d6feddec00877\",\"image\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo:latest\",\"imageID\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331\",\"lastState\":{},\"name\":\"volumerootdisk\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://3a22d8cbe650ab7cd316f2f8ffe5066e431555a375e9dbe5cf5d6feddec00877\",\"exitCode\":0,\"finishedAt\":\"2020-10-05T19:00:04Z\",\"reason\":\"Completed\",\"startedAt\":\"2020-10-05T18:56:02Z\"}}}],\"phase\":\"Succeeded\"}}" Oct 05 19:00:05 openshift-worker-1.example.com hyperkube[2223]: I1005 19:00:05.997177 2223 status_manager.go:578] Status for pod "virt-launcher-fedora-test-vm-nlbfm_test(d70e792d-be77-4568-ab04-da9590844ce5)" updated successfully: (6, {Phase:Succeeded Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 18:55:56 +0000 UTC Reason:PodCompleted Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:00:05 +0000 UTC Reason:PodCompleted Message:} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 19:00:05 +0000 UTC Reason:PodCompleted Message:} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-05 18:59:58 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:192.168.123.221 PodIP:172.24.2.17 PodIPs:[{IP:172.24.2.17}] StartTime:2020-10-05 18:55:47 +0000 UTC InitContainerStatuses:[{Name:container-disk-binary State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-05 18:55:55 +0000 UTC,FinishedAt:2020-10-05 18:55:55 +0000 UTC,ContainerID:cri-o://427a6f43f61c73abeb7b5e5a142678c11cab2413d0519c98349d60c1f3d9492c,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 
ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ContainerID:cri-o://427a6f43f61c73abeb7b5e5a142678c11cab2413d0519c98349d60c1f3d9492c Started:<nil>}] ContainerStatuses:[{Name:compute State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:OOMKilled,Message:,StartedAt:2020-10-05 18:55:56 +0000 UTC,FinishedAt:2020-10-05 19:00:05 +0000 UTC,ContainerID:cri-o://6ec77a2e338be7a63eb57a695452a538abd0ab41675a4156f4fd7fde0dad6264,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ContainerID:cri-o://6ec77a2e338be7a63eb57a695452a538abd0ab41675a4156f4fd7fde0dad6264 Started:0xc001b3317c} {Name:volumerootdisk State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-05 18:56:02 +0000 UTC,FinishedAt:2020-10-05 19:00:04 +0000 UTC,ContainerID:cri-o://3a22d8cbe650ab7cd316f2f8ffe5066e431555a375e9dbe5cf5d6feddec00877,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 ContainerID:cri-o://3a22d8cbe650ab7cd316f2f8ffe5066e431555a375e9dbe5cf5d6feddec00877 Started:0xc001b3317d}] QOSClass:Burstable EphemeralContainerStatuses:[]}) Oct 05 19:00:05 openshift-worker-1.example.com hyperkube[2223]: I1005 19:00:05.997317 2223 kubelet_pods.go:961] Pod "virt-launcher-fedora-test-vm-nlbfm_test(d70e792d-be77-4568-ab04-da9590844ce5)" is terminated, but some containers have not been cleaned up: {ID:{Type:cri-o ID:3a22d8cbe650ab7cd316f2f8ffe5066e431555a375e9dbe5cf5d6feddec00877} Name:volumerootdisk State:exited CreatedAt:2020-10-05 18:56:02.171592428 +0000 UTC StartedAt:2020-10-05 18:56:02.372442071 +0000 UTC FinishedAt:2020-10-05 19:00:04.948970301 +0000 UTC ExitCode:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 Hash:1204204984 RestartCount:0 Reason:Completed Message:} {ID:{Type:cri-o ID:6ec77a2e338be7a63eb57a695452a538abd0ab41675a4156f4fd7fde0dad6264} Name:compute State:exited CreatedAt:2020-10-05 18:55:56.669982263 +0000 UTC StartedAt:2020-10-05 18:55:56.70232152 +0000 UTC FinishedAt:2020-10-05 19:00:05.066970303 +0000 UTC ExitCode:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:988105243 RestartCount:0 Reason:OOMKilled Message:} {ID:{Type:cri-o ID:427a6f43f61c73abeb7b5e5a142678c11cab2413d0519c98349d60c1f3d9492c} Name:container-disk-binary State:exited CreatedAt:2020-10-05 18:55:54.971599116 +0000 UTC StartedAt:2020-10-05 18:55:55.271974426 +0000 UTC FinishedAt:2020-10-05 18:55:55.769965998 +0000 UTC ExitCode:0 
Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:51ad4131442b52b79397d10e2654aad7abdea13899a2b490b3684c18620cdf93 Hash:793891490 RestartCount:0 Reason:Completed Message:} ^C ~~~ ~~~ [root@fedora-test-vm ~]# stress-ng --vm 1 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1 stress-ng: debug: [9457] 3 processors online, 3 processors configured stress-ng: info: [9457] dispatching hogs: 1 vm, 1 hdd, 1 io stress-ng: debug: [9457] cache allocate: default cache size: 16384K stress-ng: debug: [9457] starting stressors stress-ng: debug: [9458] stress-ng-vm: started [9458] (instance 0) stress-ng: debug: [9457] 3 stressors spawned stress-ng: debug: [9459] stress-ng-hdd: started [9459] (instance 0) stress-ng: debug: [9460] stress-ng-io: started [9460] (instance 0) stress-ng: debug: [9458] stress-ng-vm using method 'all' ~~~ ~~~ [core@openshift-worker-1 ~]$ while true ; do scp -i gss-stack-tools -P 20022 test.txt fedora.54.202: ; done test.txt 60% 616MB 156.7MB/s 00:02 ETA test.txt 68% 702MB 149.8MB/s 00:02 ETA test.txt 100% 1024MB 62.8MB/s 00:16 test.txt 100% 1024MB 156.8MB/s 00:06 test.txt 100% 1024MB 150.7MB/s 00:06 test.txt 100% 1024MB 166.4MB/s 00:06 ^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C [core@openshift-worker-1 ~]$ ^C ~~~ ======================================= [0] VM definition was: ~~~ [root@openshift-jumpserver-0 ~]# [root@openshift-jumpserver-0 ~]# cat fedora-vm.yaml apiVersion: kubevirt.io/v1alpha3 kind: VirtualMachine metadata: annotations: kubevirt.io/latest-observed-api-version: v1alpha3 kubevirt.io/storage-observed-api-version: v1alpha3 name.os.template.kubevirt.io/fedora31: Fedora 31 labels: app: fedora-test-vm flavor.template.kubevirt.io/Custom: "true" os.template.kubevirt.io/fedora31: "true" vm.kubevirt.io/template.namespace: openshift vm.kubevirt.io/template.revision: "1" vm.kubevirt.io/template.version: v0.9.1 workload.template.kubevirt.io/server: "true" name: fedora-test-vm namespace: test spec: running: true template: metadata: labels: flavor.template.kubevirt.io/Custom: "true" kubevirt.io/domain: fedora-test-vm kubevirt.io/size: large os.template.kubevirt.io/fedora31: "true" vm.kubevirt.io/name: fedora-test-vm workload.template.kubevirt.io/server: "true" spec: domain: cpu: cores: 3 model: host-model devices: disks: - bootOrder: 1 disk: bus: virtio name: rootdisk - disk: bus: virtio name: cloudinitdisk interfaces: - masquerade: {} model: virtio name: nic0 networkInterfaceMultiqueue: true rng: {} features: acpi: enabled: true machine: type: q35 resources: requests: cpu: "3" memory: 10240M limits: cpu: "3" memory: 10240M overcommitGuestOverhead: true memory: guest: 10240M evictionStrategy: LiveMigrate hostname: fedora-test-vm networks: - name: nic0 pod: {} terminationGracePeriodSeconds: 0 volumes: - containerDisk: image: docker.io/kubevirt/fedora-cloud-container-disk-demo name: rootdisk - cloudInitNoCloud: userData: | #cloud-config name: default hostname: fedora-test-vm ssh_authorized_keys: - >- ssh-rsa AA(...)IU= name: cloudinitdisk ~~~
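The 3 KiB mismatch above is most likely just page-size rounding (an assumption based on the numbers, not verified against kernel code): the kernel rounds a cgroup's memory limit down to a 4 KiB page boundary, and the values here fit that exactly:
~~~
echo "10438331648 % 4096" | bc            # 3328 bytes above the last 4 KiB page boundary
echo "(10438331648 / 4096) * 4096" | bc   # 10438328320, exactly the cgroup limit seen above
~~~
Separately, 10240M is 10240000000 bytes, about 9765.6 MiB, yet qemu-kvm was started with -m 9766 (9766 MiB = 10240393216 bytes, slightly more than the VM asked for), which is plausibly the MB/MiB rounding issue that https://github.com/kubevirt/kubevirt/pull/3921 addresses.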
One issue here is definitely that we had a miscalculation, which was fixed in https://github.com/kubevirt/kubevirt/pull/3921. That fix will be part of CNV 2.5. We should definitely try to reproduce these very good examples on newer kubevirt releases on OCP and k8s to see whether it works now. One workaround, as already elaborated above by Andreas, is to explicitly undercommit the VM until it stops crashing: https://bugzilla.redhat.com/show_bug.cgi?id=1885418#c5
One general note on overcommitGuestOverhead: it only makes sense to use this option when swap is enabled. CNV VMs currently have no way to give memory back once they have touched it, so when no swap space is available the VM will sooner or later operate above its request. This means one always risks the kubelet evicting the workload, even while staying below the limits. Since swap is normally disabled on OpenShift (and on most other k8s clusters), this option should not be used here. Undercommitting can be done without it, as sketched below.
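To make the undercommitting suggestion concrete, here is a minimal sketch (illustrative values, not a configuration tested in this BZ) of advertising less memory to the guest than the pod requests, so the qemu-kvm overhead has headroom below the limit, without setting overcommitGuestOverhead:
~~~
# Hedged sketch -- excerpt of a VirtualMachine spec; values are illustrative.
spec:
  template:
    spec:
      domain:
        memory:
          guest: 7Gi          # guest sees 7Gi ...
        resources:
          requests:
            memory: 8Gi       # ... while the pod keeps 8Gi requested
          limits:
            memory: 8Gi       # and limited, leaving ~1Gi for qemu overhead
~~~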
Adding the TestOnly flag to this BZ, as it should be fixed by https://github.com/kubevirt/kubevirt/pull/3921
Repeated the exact same issue on: ~~~ [root@cnvqe-01 ~]# oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.6.0-0.nightly-2020-10-03-051134 True False 3d22h Cluster version is 4.6.0-0.nightly-2020-10-03-051134 ~~~ ~~~ [root@cnvqe-01 ~]# oc get csv -n openshift-cnv NAME DISPLAY VERSION REPLACES PHASE kubevirt-hyperconverged-operator.v2.5.0 OpenShift Virtualization 2.5.0 kubevirt-hyperconverged-operator.v2.4.1 Succeeded ~~~ ~~~ [root@cnvqe-01 ~]# cat akaris-vm.yaml apiVersion: kubevirt.io/v1alpha3 kind: VirtualMachine metadata: annotations: kubevirt.io/latest-observed-api-version: v1alpha3 kubevirt.io/storage-observed-api-version: v1alpha3 name.os.template.kubevirt.io/fedora31: Fedora 31 labels: app: akaris-test-vm flavor.template.kubevirt.io/Custom: "true" os.template.kubevirt.io/fedora31: "true" vm.kubevirt.io/template.namespace: openshift vm.kubevirt.io/template.revision: "1" vm.kubevirt.io/template.version: v0.9.1 workload.template.kubevirt.io/server: "true" name: akaris-test-vm spec: running: true template: metadata: labels: flavor.template.kubevirt.io/Custom: "true" kubevirt.io/domain: akaris-test-vm kubevirt.io/size: large os.template.kubevirt.io/fedora31: "true" vm.kubevirt.io/name: akaris-test-vm workload.template.kubevirt.io/server: "true" spec: domain: cpu: cores: 3 model: host-model devices: disks: - bootOrder: 1 disk: bus: virtio name: rootdisk - disk: bus: virtio name: cloudinitdisk interfaces: - masquerade: {} model: virtio name: nic0 networkInterfaceMultiqueue: true rng: {} features: acpi: enabled: true machine: type: q35 resources: requests: cpu: "3" memory: 8Gi limits: cpu: "3" memory: 8Gi # overcommitGuestOverhead: true # memory: # guest: 8Gi evictionStrategy: LiveMigrate hostname: akaris-test-vm networks: - name: nic0 pod: {} terminationGracePeriodSeconds: 0 volumes: - containerDisk: image: docker.io/kubevirt/fedora-cloud-container-disk-demo name: rootdisk - cloudInitNoCloud: userData: | #cloud-config name: default hostname: akaris-test-vm ssh_authorized_keys: - >- ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDEn6wI5YZbCbwUX8RlBYkkYGFK8vTXyeuOJWf6WrWcLi2rKfCoXWNxAw2ekLqIQRpc2xI5wCquWPeZ3aiHpqlocbFgLuGEBIqciJtGnI1IGOZtdccfEe8MRKJMjy3EnktGnHiFkxPXE7UYglujq1GN2BAzgGKigjOLu+k56tVWs+U96KtZz81P3sE4wye4rCm0R0lS4kaEX5bQ8ZAQG4KnWVuuW/PMm2g5n8FImjBz2t7kW3w4N9O4/yvUpgrsR75FXwdZ4gfBLxcKm2oqXHker31UFo4R5uuov5KO43OEOswGu/JmbcBm5T9ozyE/9RE+iuMrYIDWV9v8SgwT7Tpm24dKpcRWO968jDzaWPgGL8EN4y9uDYx7oIlYmwzJ5xFiDgQh+kwWi5p67TD2Iuxmxp5yOjv9ZKJKSk+wU8Z3407kB/AoempquSVXPU6jfZzAxFWlsQdJOd1ETmgcAjbF/WhFGR1LjJrzK6Nm3Gr2HDA2VAmOnLzyh7BBIHZSqLM= core.eng.tlv2.redhat.com name: cloudinitdisk ~~~ ~~~ oc create -f akaris-vm.yaml ~~~ ~~~ virtctl expose vmi akaris-test-vm --port=20022 --target-port=22 --name=akaris-test-vm-ssh ~~~ Resulting VM and pod: ~~~ [root@cnvqe-01 ~]# oc describe pod virt-launcher-akaris-test-vm-nb4fg | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:' Limits: memory: 40M Requests: memory: 40M Limits: memory: 40M Requests: memory: 40M Containers: compute: Limits: memory: 8579144Ki Requests: memory: 8579144Ki volumerootdisk: Limits: memory: 40M Requests: memory: 40M ~~~ On the worker node: ~~~ [root@cnvqe-07 ~]# pgrep -f guest=test-bz-1885418_akaris-test-vm | while read pid ; do echo === $pid === ; ps aux | grep $pid | grep -o '\-m [0-9]*' ; pmap $pid | tail ; done === 1633178 === -m 8192 00007f0d54e16000 12K rw-s- [ anon ] 00007f0d54e19000 4K rw-s- zero (deleted) 00007f0d54e1a000 4K r---- ld-2.28.so 00007f0d54e1b000 4K rw--- ld-2.28.so 
00007f0d54e1c000 4K rw--- [ anon ] 00007fffc86ef000 136K rw--- [ stack ] 00007fffc879b000 12K r---- [ anon ] 00007fffc879e000 8K r-x-- [ anon ] ffffffffff600000 4K r-x-- [ anon ] total 13640628K [root@cnvqe-07 ~]# cgroup=$(cat /proc/$(pgrep -f guest=test-bz-1885418_akaris-test-vm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit' ; sleep 2 ; done Thu Oct 8 09:58:48 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8704077824 total_rss_huge 8625586176 ~~~ Compare this to https://bugzilla.redhat.com/show_bug.cgi?id=1885418#c4 It's clear at this point that the issue will reproduce, even without testing, given that the cgroup limits and VM memory are exactly the same. Connect to the VM in one terminal: ~~~ ssh -i test-key -p 20022 fedora.95.83 sudo -i # MUST BE ROOT yum install stress-ng -y stress-ng --vm 4 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1 # we are memory ballooning now, so I'm a bit more aggressive here ~~~ In another terminal, run: ~~~ while true ; do scp -i test-key -P 20022 test.txt fedora.95.83: ; done ~~~ Check the journal and wait: ~~~ journalctl -f | grep -i oomkilled ~~~ Check in another terminal: ~~~ [root@cnvqe-07 ~]# cgroup=$(cat /proc/$(pgrep -f guest=test-bz-1885418_akaris-test-vm)/cgroup | grep memory | awk -F ':' '{print $NF}') ; while true; do date ; cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit' ; sleep 2 ; done Thu Oct 8 09:58:48 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8704077824 total_rss_huge 8625586176 (...) ~~~ FYI: With the new version of CNV, we are ballooning now: ~~~ [root@cnvqe-07 ~]# ps aux | grep akaris | grep qemu | grep ball 107 1633178 53.8 6.4 14091404 8452648 ? Sl 09:32 20:53 /usr/libexec/qemu-kvm -name guest=test-bz-1885418_akaris-test-vm,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-test-bz-1885418_akar/master-key.aes -machine pc-q35-rhel8.2.0,accel=kvm,usb=off,dump-guest-core=off -cpu Cascadelake-Server,ss=on,vmx=on,hypervisor=on,tsc-adjust=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on -m 8192 -overcommit mem-lock=off -smp 3,sockets=1,dies=1,cores=3,threads=1 -object iothread,id=iothread1 -uuid c7ba2145-f452-55ea-93e8-bbe46f64e795 -smbios type=1,manufacturer=Red Hat,product=Container-native virtualization,version=2.5.0,uuid=c7ba2145-f452-55ea-93e8-bbe46f64e795,sku=2.5.0,family=Red Hat -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=20,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 -device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 -device virtio-serial-pci,id=virtio-serial0,bus=pci.2,addr=0x0 -blockdev 
{"driver":"file","filename":"/var/run/kubevirt/container-disks/disk_0.img","node-name":"libvirt-3-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-3-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-3-storage","backing":null} -blockdev {"driver":"file","filename":"/var/run/kubevirt-ephemeral-disks/disk-data/rootdisk/disk.qcow2","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-2-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-2-storage","backing":"libvirt-3-format"} -device virtio-blk-pci,scsi=off,bus=pci.3,addr=0x0,drive=libvirt-2-format,id=ua-rootdisk,bootindex=1,write-cache=on -blockdev {"driver":"file","filename":"/var/run/kubevirt-ephemeral-disks/cloud-init-data/test-bz-1885418/akaris-test-vm/noCloud.iso","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"} -device virtio-blk-pci,scsi=off,bus=pci.4,addr=0x0,drive=libvirt-1-format,id=ua-cloudinitdisk,write-cache=on -netdev tap,fds=22:23:24,id=hostua-nic0,vhost=on,vhostfds=25:26:27 -device virtio-net-pci,mq=on,vectors=8,host_mtu=1450,netdev=hostua-nic0,id=ua-nic0,mac=02:00:00:1d:9e:07,bus=pci.1,addr=0x0,romfile= -chardev socket,id=charserial0,fd=28,server,nowait -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=29,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -vnc vnc=unix:/var/run/kubevirt-private/403f1b15-67c9-4755-b8a9-167cc069a0c7/virt-vnc -device VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 -device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.6,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on ~~~ Before, we were not doing this: ~~~ [root@openshift-worker-1 ~]# ps aux | grep qemu-kvm 107 22602 0.2 0.9 13820176 1305248 ? 
Sl Oct07 2:58 /usr/libexec/qemu-kvm -name guest=test_fedora-test-vm,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-test_fedora-test-vm/master-key.aes -machine pc-q35-rhel8.2.0,accel=kvm,usb=off,dump-guest-core=off -cpu Broadwell,vme=on,ss=on,vmx=on,f16c=on,rdrand=on,hypervisor=on,arat=on,tsc-adjust=on,umip=on,arch-capabilities=on,xsaveopt=on,pdpe1gb=on,abm=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,rtm=on,hle=on -m 9766 -overcommit mem-lock=off -smp 3,sockets=1,dies=1,cores=3,threads=1 -object iothread,id=iothread1 -uuid 7871df6c-d00f-5202-84d5-788821cd576c -smbios type=1,manufacturer=Red Hat,product=Container-native virtualization,version=2.4.2,uuid=7871df6c-d00f-5202-84d5-788821cd576c,sku=2.4.2,family=Red Hat -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=21,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 -device virtio-serial-pci,id=virtio-serial0,bus=pci.2,addr=0x0 -blockdev {"driver":"file","filename":"/var/run/kubevirt/container-disks/disk_0.img","node-name":"libvirt-3-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-3-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-3-storage","backing":null} -blockdev {"driver":"file","filename":"/var/run/kubevirt-ephemeral-disks/disk-data/rootdisk/disk.qcow2","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-2-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-2-storage","backing":"libvirt-3-format"} -device virtio-blk-pci,scsi=off,bus=pci.3,addr=0x0,drive=libvirt-2-format,id=ua-rootdisk,bootindex=1,write-cache=on -blockdev {"driver":"file","filename":"/var/run/kubevirt-ephemeral-disks/cloud-init-data/test/fedora-test-vm/noCloud.iso","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"} -device virtio-blk-pci,scsi=off,bus=pci.4,addr=0x0,drive=libvirt-1-format,id=ua-cloudinitdisk,write-cache=on -netdev tap,fds=23:24:25,id=hostua-nic0,vhost=on,vhostfds=26:27:28 -device virtio-net-pci,mq=on,vectors=8,host_mtu=8900,netdev=hostua-nic0,id=ua-nic0,mac=02:00:00:21:b0:6c,bus=pci.1,addr=0x0 -chardev socket,id=charserial0,fd=29,server,nowait -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=30,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -vnc vnc=unix:/var/run/kubevirt-private/9d236936-6f31-41a0-a5e4-bd374e11a580/virt-vnc -device VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.5,addr=0x0 -sandbox 
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on ~~~ After a while, I get the OOM kill. Note that I needed to run stress-ng as root and with --vm 4 to achieve this, this time: ~~~ [root@akaris-test-vm ~]# stress-ng --vm 4 --vm-bytes 100% --vm-method all --verify -t 10m -v --hdd 1 --io 1 stress-ng: debug: [11310] 3 processors online, 3 processors configured stress-ng: info: [11310] dispatching hogs: 4 vm, 1 hdd, 1 io stress-ng: debug: [11310] cache allocate: default cache size: 16384K stress-ng: debug: [11310] starting stressors stress-ng: debug: [11311] stress-ng-vm: started [11311] (instance 0) stress-ng: debug: [11312] stress-ng-hdd: started [11312] (instance 0) stress-ng: debug: [11311] stress-ng-vm using method 'all' stress-ng: debug: [11310] 6 stressors spawned stress-ng: debug: [11317] stress-ng-vm: started [11317] (instance 3) stress-ng: debug: [11314] stress-ng-vm: started [11314] (instance 1) stress-ng: debug: [11317] stress-ng-vm using method 'all' stress-ng: debug: [11315] stress-ng-vm: started [11315] (instance 2) stress-ng: debug: [11314] stress-ng-vm using method 'all' stress-ng: debug: [11315] stress-ng-vm using method 'all' stress-ng: debug: [11313] stress-ng-io: started [11313] (instance 0) ~~~ ~~~ (...) Thu Oct 8 10:17:47 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8697028608 total_rss_huge 8623489024 Thu Oct 8 10:17:49 UTC 2020 hierarchical_memory_limit 8785043456 total_rss 8649449472 total_rss_huge 8613003264 Thu Oct 8 10:17:51 UTC 2020 cat: /sys/fs/cgroup/memory/kubepods.slice/kubepods-pod13874f96_492a_4672_a6d9_45df3b521b6e.slice/crio-1e7eb162cadbda84b46f3a018e1f6576fd9e9521c44d6286f69d2f12eb7509b7.scope/memory.stat: No such file or directory Thu Oct 8 10:17:53 UTC 2020 ~~~ ~~~ [root@cnvqe-07 ~]# journalctl -f | grep -i oomkilled Oct 08 10:17:49 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:17:49.861166 3275 status_manager.go:564] Patch status for pod "virt-launcher-akaris-test-vm-nb4fg_test-bz-1885418(13874f96-492a-4672-a6d9-45df3b521b6e)" with "{\"metadata\":{\"uid\":\"13874f96-492a-4672-a6d9-45df3b521b6e\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"lastTransitionTime\":\"2020-10-08T10:17:49Z\",\"message\":\"containers with unready status: [compute]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2020-10-08T10:17:49Z\",\"message\":\"containers with unready status: 
[compute]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://1e7eb162cadbda84b46f3a018e1f6576fd9e9521c44d6286f69d2f12eb7509b7\",\"image\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:c64a9aa0ba5c598e999929049d52afb5959e91debf38aed6937f53fe4ecbc84c\",\"imageID\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:b2ecfb71c9ff2e898f9864b56573f724f4cc7ae510c038f3642e99b351676d56\",\"lastState\":{},\"name\":\"compute\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://1e7eb162cadbda84b46f3a018e1f6576fd9e9521c44d6286f69d2f12eb7509b7\",\"exitCode\":137,\"finishedAt\":\"2020-10-08T10:17:49Z\",\"reason\":\"OOMKilled\",\"startedAt\":\"2020-10-08T09:31:58Z\"}}},{\"containerID\":\"cri-o://784e1ae476b77353011a988c4d9cc783d69595f540b31099f67ca678267c12e3\",\"image\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo:latest\",\"imageID\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331\",\"lastState\":{},\"name\":\"volumerootdisk\",\"ready\":true,\"restartCount\":0,\"started\":true,\"state\":{\"running\":{\"startedAt\":\"2020-10-08T09:32:05Z\"}}}]}}" Oct 08 10:17:49 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:17:49.861217 3275 status_manager.go:572] Status for pod "virt-launcher-akaris-test-vm-nb4fg_test-bz-1885418(13874f96-492a-4672-a6d9-45df3b521b6e)" updated successfully: (6, {Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 09:31:58 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 10:17:49 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [compute]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 10:17:49 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [compute]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 09:31:43 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:10.46.41.22 PodIP:10.131.0.249 PodIPs:[{IP:10.131.0.249}] StartTime:2020-10-08 09:31:43 +0000 UTC InitContainerStatuses:[{Name:container-disk-binary State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-08 09:31:49 +0000 UTC,FinishedAt:2020-10-08 09:31:49 +0000 UTC,ContainerID:cri-o://72f571fc2ddf077ee3972a36adee2e4b1f5e0eed6b677219a079aa1d1f5d1053,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:c64a9aa0ba5c598e999929049d52afb5959e91debf38aed6937f53fe4ecbc84c ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:b2ecfb71c9ff2e898f9864b56573f724f4cc7ae510c038f3642e99b351676d56 ContainerID:cri-o://72f571fc2ddf077ee3972a36adee2e4b1f5e0eed6b677219a079aa1d1f5d1053 Started:<nil>} {Name:volumerootdisk-init State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-08 09:31:58 +0000 UTC,FinishedAt:2020-10-08 09:31:58 +0000 UTC,ContainerID:cri-o://7240f8f53aa37c4fb002ae4ea2b0986e8beec3e2bd51ba965925c80a9ebab8eb,}} 
LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 ContainerID:cri-o://7240f8f53aa37c4fb002ae4ea2b0986e8beec3e2bd51ba965925c80a9ebab8eb Started:<nil>}] ContainerStatuses:[{Name:compute State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:137,Signal:0,Reason:OOMKilled,Message:,StartedAt:2020-10-08 09:31:58 +0000 UTC,FinishedAt:2020-10-08 10:17:49 +0000 UTC,ContainerID:cri-o://1e7eb162cadbda84b46f3a018e1f6576fd9e9521c44d6286f69d2f12eb7509b7,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:c64a9aa0ba5c598e999929049d52afb5959e91debf38aed6937f53fe4ecbc84c ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:b2ecfb71c9ff2e898f9864b56573f724f4cc7ae510c038f3642e99b351676d56 ContainerID:cri-o://1e7eb162cadbda84b46f3a018e1f6576fd9e9521c44d6286f69d2f12eb7509b7 Started:0xc004faa9ca} {Name:volumerootdisk State:{Waiting:nil Running:&ContainerStateRunning{StartedAt:2020-10-08 09:32:05 +0000 UTC,} Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 ContainerID:cri-o://784e1ae476b77353011a988c4d9cc783d69595f540b31099f67ca678267c12e3 Started:0xc004faa9cb}] QOSClass:Guaranteed EphemeralContainerStatuses:[]}) Oct 08 10:18:27 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:18:27.972579 3275 status_manager.go:564] Patch status for pod "virt-launcher-akaris-test-vm-nb4fg_test-bz-1885418(13874f96-492a-4672-a6d9-45df3b521b6e)" with "{\"metadata\":{\"uid\":\"13874f96-492a-4672-a6d9-45df3b521b6e\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"message\":\"containers with unready status: [compute volumerootdisk]\",\"type\":\"Ready\"},{\"message\":\"containers with unready status: [compute volumerootdisk]\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://1e7eb162cadbda84b46f3a018e1f6576fd9e9521c44d6286f69d2f12eb7509b7\",\"image\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:c64a9aa0ba5c598e999929049d52afb5959e91debf38aed6937f53fe4ecbc84c\",\"imageID\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:b2ecfb71c9ff2e898f9864b56573f724f4cc7ae510c038f3642e99b351676d56\",\"lastState\":{},\"name\":\"compute\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://1e7eb162cadbda84b46f3a018e1f6576fd9e9521c44d6286f69d2f12eb7509b7\",\"exitCode\":137,\"finishedAt\":\"2020-10-08T10:17:49Z\",\"reason\":\"OOMKilled\",\"startedAt\":\"2020-10-08T09:31:58Z\"}}},{\"image\":\"docker.io/kubevirt/fedoraA-cloud-container-disk-demo\",\"imageID\":\"\",\"lastState\":{},\"name\":\"volumerootdisk\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"waiting\":{\"reason\":\"PodInitializing\"}}}],\"phase\":\"Pending\"}}" Oct 08 10:18:27 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:18:27.972622 3275 
status_manager.go:572] Status for pod "virt-launcher-akaris-test-vm-nb4fg_test-bz-1885418(13874f96-492a-4672-a6d9-45df3b521b6e)" updated successfully: (8, {Phase:Pending Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 09:31:58 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 10:17:49 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [compute volumerootdisk]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 10:17:49 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [compute volumerootdisk]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 09:31:43 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:10.46.41.22 PodIP:10.131.0.249 PodIPs:[{IP:10.131.0.249}] StartTime:2020-10-08 09:31:43 +0000 UTC InitContainerStatuses:[{Name:container-disk-binary State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-08 09:31:49 +0000 UTC,FinishedAt:2020-10-08 09:31:49 +0000 UTC,ContainerID:cri-o://72f571fc2ddf077ee3972a36adee2e4b1f5e0eed6b677219a079aa1d1f5d1053,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:c64a9aa0ba5c598e999929049d52afb5959e91debf38aed6937f53fe4ecbc84c ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:b2ecfb71c9ff2e898f9864b56573f724f4cc7ae510c038f3642e99b351676d56 ContainerID:cri-o://72f571fc2ddf077ee3972a36adee2e4b1f5e0eed6b677219a079aa1d1f5d1053 Started:<nil>} {Name:volumerootdisk-init State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-08 09:31:58 +0000 UTC,FinishedAt:2020-10-08 09:31:58 +0000 UTC,ContainerID:cri-o://7240f8f53aa37c4fb002ae4ea2b0986e8beec3e2bd51ba965925c80a9ebab8eb,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 ContainerID:cri-o://7240f8f53aa37c4fb002ae4ea2b0986e8beec3e2bd51ba965925c80a9ebab8eb Started:<nil>}] ContainerStatuses:[{Name:compute State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:137,Signal:0,Reason:OOMKilled,Message:,StartedAt:2020-10-08 09:31:58 +0000 UTC,FinishedAt:2020-10-08 10:17:49 +0000 UTC,ContainerID:cri-o://1e7eb162cadbda84b46f3a018e1f6576fd9e9521c44d6286f69d2f12eb7509b7,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:c64a9aa0ba5c598e999929049d52afb5959e91debf38aed6937f53fe4ecbc84c ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:b2ecfb71c9ff2e898f9864b56573f724f4cc7ae510c038f3642e99b351676d56 ContainerID:cri-o://1e7eb162cadbda84b46f3a018e1f6576fd9e9521c44d6286f69d2f12eb7509b7 Started:0xc0046d47ad} {Name:volumerootdisk State:{Waiting:&ContainerStateWaiting{Reason:PodInitializing,Message:,} Running:nil Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 
Image:docker.io/kubevirt/fedora-cloud-container-disk-demo ImageID: ContainerID: Started:0xc0046d47ae}] QOSClass:Guaranteed EphemeralContainerStatuses:[]}) ~~~
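Putting the figures from this reproduction side by side shows how thin the margin is (a hedged back-of-the-envelope calculation using only numbers already quoted above):
~~~
# compute container / cgroup limit: 8785043456 bytes = 8579144 KiB
# qemu-kvm guest RAM:               -m 8192 -> 8192 MiB = 8388608 KiB
echo "8785043456 / 1024 - 8192 * 1024" | bc
# => 190536 KiB (~186 MiB) left for everything else in the cgroup
# (virt-launcher x2, libvirtd, virtlogd, plus qemu-kvm's own non-guest
# allocations) -- which the stress test exhausts, hence the OOM kill.
~~~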
So it has really gotten more difficult to trigger the OOM kill here. I'm still trying to find the correct stressor, e.g.: stress-ng --vm 4 --vm-bytes 100% --vm-method all --vm-locked --verify -t 10m -v --hdd 1 --io 1 --tun 2
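While hunting for the right stressor, the cgroup-v1 counters can confirm whether the container limit itself is being hit rather than node-level pressure (a small sketch assuming the same $cgroup variable as in the monitoring loops above):
~~~
# Hedged sketch: both files are standard cgroup-v1 memory controller knobs.
cat /sys/fs/cgroup/memory$cgroup/memory.failcnt     # times the limit was hit
cat /sys/fs/cgroup/memory$cgroup/memory.oom_control # oom_kill_disable/under_oom
~~~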
[root@cnvqe-07 ~]# journalctl -f | grep -i oomkilled Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.856521 3275 status_manager.go:564] Patch status for pod "virt-launcher-akaris-test-vm-xkqdw_test-bz-1885418(64516f15-013a-4c1d-8f03-9b0f1ac98ace)" with "{\"metadata\":{\"uid\":\"64516f15-013a-4c1d-8f03-9b0f1ac98ace\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"lastTransitionTime\":\"2020-10-08T10:40:12Z\",\"message\":\"containers with unready status: [compute]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2020-10-08T10:40:12Z\",\"message\":\"containers with unready status: [compute]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c\",\"image\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:c64a9aa0ba5c598e999929049d52afb5959e91debf38aed6937f53fe4ecbc84c\",\"imageID\":\"registry.redhat.io/container-native-virtualization/virt-launcher@sha256:b2ecfb71c9ff2e898f9864b56573f724f4cc7ae510c038f3642e99b351676d56\",\"lastState\":{},\"name\":\"compute\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c\",\"exitCode\":137,\"finishedAt\":\"2020-10-08T10:40:12Z\",\"reason\":\"OOMKilled\",\"startedAt\":\"2020-10-08T10:18:55Z\"}}},{\"containerID\":\"cri-o://2341b1474239dc3ed0a1755dfb51b7836248edc08e1682e60fbb0ad9995cee85\",\"image\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo:latest\",\"imageID\":\"docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331\",\"lastState\":{},\"name\":\"volumerootdisk\",\"ready\":true,\"restartCount\":0,\"started\":true,\"state\":{\"running\":{\"startedAt\":\"2020-10-08T10:19:02Z\"}}}]}}" Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.856567 3275 status_manager.go:572] Status for pod "virt-launcher-akaris-test-vm-xkqdw_test-bz-1885418(64516f15-013a-4c1d-8f03-9b0f1ac98ace)" updated successfully: (6, {Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 10:18:55 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 10:40:12 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [compute]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 10:40:12 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [compute]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-10-08 10:18:38 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:10.46.41.22 PodIP:10.131.1.9 PodIPs:[{IP:10.131.1.9}] StartTime:2020-10-08 10:18:38 +0000 UTC InitContainerStatuses:[{Name:container-disk-binary State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-08 10:18:45 +0000 UTC,FinishedAt:2020-10-08 10:18:45 +0000 UTC,ContainerID:cri-o://fb3dbd220dea2b2dc5d10837774ac75071fbd33fdf9ed2d3990f7110a18e1191,}} 
LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:c64a9aa0ba5c598e999929049d52afb5959e91debf38aed6937f53fe4ecbc84c ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:b2ecfb71c9ff2e898f9864b56573f724f4cc7ae510c038f3642e99b351676d56 ContainerID:cri-o://fb3dbd220dea2b2dc5d10837774ac75071fbd33fdf9ed2d3990f7110a18e1191 Started:<nil>} {Name:volumerootdisk-init State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2020-10-08 10:18:53 +0000 UTC,FinishedAt:2020-10-08 10:18:54 +0000 UTC,ContainerID:cri-o://178dc90fd4221e4b20e6ed80187acde6bd17671c94d807f606eafd51519aa5df,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 ContainerID:cri-o://178dc90fd4221e4b20e6ed80187acde6bd17671c94d807f606eafd51519aa5df Started:<nil>}] ContainerStatuses:[{Name:compute State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:137,Signal:0,Reason:OOMKilled,Message:,StartedAt:2020-10-08 10:18:55 +0000 UTC,FinishedAt:2020-10-08 10:40:12 +0000 UTC,ContainerID:cri-o://8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:c64a9aa0ba5c598e999929049d52afb5959e91debf38aed6937f53fe4ecbc84c ImageID:registry.redhat.io/container-native-virtualization/virt-launcher@sha256:b2ecfb71c9ff2e898f9864b56573f724f4cc7ae510c038f3642e99b351676d56 ContainerID:cri-o://8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c Started:0xc0025ac75a} {Name:volumerootdisk State:{Waiting:nil Running:&ContainerStateRunning{StartedAt:2020-10-08 10:19:02 +0000 UTC,} Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:docker.io/kubevirt/fedora-cloud-container-disk-demo:latest ImageID:docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4afe57d5a843ecde64baf0ab58ca78f9ac4aebfb04d1c9533c23c133bb040331 ContainerID:cri-o://2341b1474239dc3ed0a1755dfb51b7836248edc08e1682e60fbb0ad9995cee85 Started:0xc0025ac75b}] QOSClass:Guaranteed EphemeralContainerStatuses:[]}) ^C And here from the journal: ~~~ Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.028188 3275 config.go:278] Setting pods for source file Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.340260 3275 prober.go:174] HTTP-Probe Host: https://10.131.0.2, Port: 8443, Path: /healthz Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.340291 3275 prober.go:177] HTTP-Probe Headers: map[] Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.344666 3275 http.go:128] Probe succeeded for https://10.131.0.2:8443/healthz, Response: {200 OK 200 HTTP/1.1 1 1 map[Content-Length:[381] Content-Type:[application/json] Date:[Thu, 08 Oct 2020 10:40:11 GMT]] 0xc002afa200 381 [] true false map[] 0xc001596700 0xc00801fd90} Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.344706 3275 prober.go:126] Liveness probe for 
"virt-handler-w5slc_openshift-cnv(b1111e19-148c-41c0-bc65-8b9d7cc65bf4):virt-handler" succeeded Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.371453 3275 prober.go:159] Exec-Probe Pod: mdns-publisher-cnvqe-07.lab.eng.tlv2.redhat.com, Container: mdns-publisher, Command: [pgrep mdns-publisher] Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.406596 3275 handler.go:125] Unable to get network stats from pid 1806771: couldn't read network stats: failure opening /proc/1806771/net/dev: open /proc/1806771/net/dev: no such file or directory Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.406623 3275 handler.go:181] Unable to get Process Stats: couldn't open cpu cgroup procs file /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod668ab052_7ce8_4264_b726_a363bbf35f6a.slice/crio-3ae79fe3f553052068e9c85d724e67003bd8eb903d20dd2b46a0f15d03bf903c.scope/cgroup.procs : open /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod668ab052_7ce8_4264_b726_a363bbf35f6a.slice/crio-3ae79fe3f553052068e9c85d724e67003bd8eb903d20dd2b46a0f15d03bf903c.scope/cgroup.procs: no such file or directory Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.435575 3275 exec.go:60] Exec probe response: "1\n" Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.435598 3275 prober.go:126] Liveness probe for "mdns-publisher-cnvqe-07.lab.eng.tlv2.redhat.com_openshift-kni-infra(4c3626876428bd9e1291590fe6f28e21):mdns-publisher" succeeded Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.463212 3275 httplog.go:94] "HTTP" verb="GET" URI="/metrics" latency="12.735098ms" userAgent="Prometheus/2.21.0" srcIP="10.46.41.21:60926" resp=200 Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.485477 3275 handler.go:181] Unable to get Process Stats: couldn't open cpu cgroup procs file /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod48b967a9_8729_459a_abfc_b5b5b4416480.slice/crio-5eeddbf7dce3039dfc11bf0f27e905be721eb9025d8a80c2f2f3cad9622e6ade.scope/cgroup.procs : open /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod48b967a9_8729_459a_abfc_b5b5b4416480.slice/crio-5eeddbf7dce3039dfc11bf0f27e905be721eb9025d8a80c2f2f3cad9622e6ade.scope/cgroup.procs: no such file or directory Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.490357 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: hostpath-provisioner-gnnl9, container: hostpath-provisioner, container id: 3e4e7e4097a34df1e1826bd58197ffc02b06b44461f398c06e3ffc59cc71dcd5, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: W1008 10:40:11.506689 3275 cpu_manager.go:397] [cpumanager] reconcileState: ignoring terminated container (pod: kubevirt-node-labeller-qjb28, container id: 0423170fc39fe5eb26dc807855f98177d3d399e4262cc952051a9a9444f9278f) Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: W1008 10:40:11.506731 3275 cpu_manager.go:397] [cpumanager] reconcileState: ignoring terminated container (pod: kubevirt-node-labeller-qjb28, container id: 308fe4998c826d377c54856ddfc7a8705abcd778fac873b3105f4f25c43044e7) Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.506757 3275 cpu_manager.go:415] 
[cpumanager] reconcileState: updating container (pod: kubevirt-node-labeller-qjb28, container: kubevirt-node-labeller-sleeper, container id: c996bcd36104cdc6ccadead0c79fc424b4f0a995fe0749e944745dfbe0a11c3b, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.522146 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: csi-cephfsplugin-provisioner-67bff7547f-b6md6, container: csi-attacher, container id: 9069342d674f6a88be93ecb59a1a9b5eeff08c6bf0d53fe237733eb8f3bc9f43, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.537146 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: csi-cephfsplugin-provisioner-67bff7547f-b6md6, container: csi-snapshotter, container id: 954749d9b47d9c5452d45ed2dcf0567b7cd5df381205b0f0f9cbfe34ebcd9bcc, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.552307 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: csi-cephfsplugin-provisioner-67bff7547f-b6md6, container: csi-resizer, container id: 5608bfd23cc9882204fb978468632ddfc56277f8e59c00764faeda191dfd084e, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.567834 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: csi-cephfsplugin-provisioner-67bff7547f-b6md6, container: csi-provisioner, container id: c04860b544555b35a5f8fe955d91eb85403c32de155cc082800cedc1ef67c2dc, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.583181 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: csi-cephfsplugin-provisioner-67bff7547f-b6md6, container: csi-cephfsplugin, container id: df791cf3d2f8393c96f55af3910218e578c689f42f6bfc56cb25aded66046894, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.598351 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: csi-cephfsplugin-provisioner-67bff7547f-b6md6, container: liveness-prometheus, container id: 62c0f7e98cd6afa0bb4577188a840bd8d0fd8f5ff25956d8aa5134fd5132fb38, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.613563 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: ocs-operator-bvgqw, container: registry-server, container id: 72f75b212f5d14b6c8d7c255ce087faa05cd201a6556c073215c926154a28694, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.628764 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: virt-handler-w5slc, container: virt-handler, container id: 3d58b7e2e240c569dfd59116b2623a0e2562b6c953237274f84672fe9ae14268, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.645032 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: machine-config-daemon-btktx, container: machine-config-daemon, container id: 9ac5976e0319c61607dca9e3274fc554590c217da294df302cf1b3362a094f32, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.661151 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: machine-config-daemon-btktx, container: oauth-proxy, container 
id: 5db20c6c3b2e428427a9e57791105d9597ff99c06d95fa80b5712481744b55a4, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.676577 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: dns-default-wcxvm, container: dns, container id: 5b4d29eadc10c23d68d9f664bf9f8464d4bba417396c9a97c697a7ad83e94c56, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.691408 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: dns-default-wcxvm, container: kube-rbac-proxy, container id: 8936feb35a6682d4e5ddcafdee04f281d3a0ad304e1cd86f6886b793f2516ff6, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.706884 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: dns-default-wcxvm, container: dns-node-resolver, container id: 55e088c60c37ac0602c713c17c566c9fba0077e5b9e0c09cd914fe3d6bd58fdd, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.722852 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: ocs-operator-69d556ccb6-s777s, container: ocs-operator, container id: a864ecda66a0bb0b97f9546026f8c6028d4fcdcd52f96d4b8600dc892e4268c5, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.737401 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: noobaa-operator-77b6d6445c-t65l6, container: noobaa-operator, container id: 61cd959579299c58b0820e792a71eefe093ed4867858900aec6a8c841be47d30, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.741087 3275 handler.go:181] Unable to get Process Stats: couldn't open cpu cgroup procs file /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod4c3626876428bd9e1291590fe6f28e21.slice/crio-2c199aed83f27eeedb7da8c60fb32ca8ada55f1453bcdf66a1228f57e0c62940.scope/cgroup.procs : open /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod4c3626876428bd9e1291590fe6f28e21.slice/crio-2c199aed83f27eeedb7da8c60fb32ca8ada55f1453bcdf66a1228f57e0c62940.scope/cgroup.procs: no such file or directory Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.751851 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: rook-ceph-crashcollector-cnvqe-07.lab.eng.tlv2.redhat.com-hppzj, container: ceph-crash, container id: b624c2fb9f59d3841504a52106a5471308fe0e882cdc29a441110eefac1d550f, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.767803 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: mdns-publisher-cnvqe-07.lab.eng.tlv2.redhat.com, container: mdns-publisher, container id: d1807cb8cfa4057f973ab471177c7dc4f8a4ddd299b0043288f611ca37914a24, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.782778 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: virt-launcher-vm-container-disk-rdcks, container: compute, container id: 0e0fe148383277041fdd1933388713e83ca42c7f7c6a300128313e0a6ee3c86e, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.798077 3275 cpu_manager.go:415] [cpumanager] 
reconcileState: updating container (pod: virt-launcher-vm-container-disk-rdcks, container: volumecontainerdisk, container id: 94ec3ad58be1fd9c3f5f0f70774e56b162315ee05e1bfaba519da4f92c115cd0, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.812621 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: ovs-xmhwn, container: openvswitch, container id: a7ff48ad7b26fa87c0949fc47d27ec1ebe904da375db46e0d738eb401844dfdd, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: W1008 10:40:11.831760 3275 cpu_manager.go:397] [cpumanager] reconcileState: ignoring terminated container (pod: multus-q2sgs, container id: a2585c1864c6c5fbc0b2c68a94e3d7b6f4c9150b0ead3db67a5389e1c3f6b2c6) Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.831816 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: multus-q2sgs, container: kube-multus, container id: 5df4e4361a321cb7b7c87e4dc15319ef6110c0f0df8f803179a951cdc61e692b, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com kernel: CPU 2/KVM invoked oom-killer: gfp_mask=0x7000c0(GFP_KERNEL_ACCOUNT), nodemask=(null), order=0, oom_score_adj=-998 Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com kernel: CPU 2/KVM cpuset=crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope mems_allowed=0-1 Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com kernel: CPU: 2 PID: 1985360 Comm: CPU 2/KVM Not tainted 4.18.0-193.24.1.el8_2.dt1.x86_64 #1 Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.897728 3275 kubelet.go:1914] SyncLoop (housekeeping) Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:11.915913 3275 prober.go:159] Exec-Probe Pod: keepalived-cnvqe-07.lab.eng.tlv2.redhat.com, Container: keepalived, Command: [/bin/bash -c kill -s SIGUSR1 "$(pgrep -o keepalived)" && ! grep -q "State = FAULT" /tmp/keepalived.data Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: ] Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Hardware name: Dell Inc. PowerEdge R440/04JN2K, BIOS 2.4.8 11/27/2019 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Call Trace: Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: dump_stack+0x5c/0x80 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: dump_header+0x6e/0x27a Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: oom_kill_process.cold.28+0xb/0x10 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: out_of_memory+0x1ba/0x490 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: mem_cgroup_out_of_memory+0x49/0x80 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: try_charge+0x6fa/0x780 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: __memcg_kmem_charge_memcg+0x33/0x90 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: __memcg_kmem_charge+0x80/0x190 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: __alloc_pages_nodemask+0x1eb/0x280 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: __get_free_pages+0xa/0x30 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: mmu_topup_memory_caches+0x8c/0xf0 [kvm] Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: tdp_page_fault+0x69/0x2b0 [kvm] Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: ? hrtimer_cancel+0x15/0x20 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: ? 
restart_apic_timer+0x9c/0x130 [kvm] Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: kvm_mmu_page_fault+0x75/0x5d0 [kvm] Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: ? vmx_set_msr+0x2f2/0xcd0 [kvm_intel] Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: ? x86_virt_spec_ctrl+0x68/0xe0 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: vcpu_enter_guest+0x315/0x1680 [kvm] Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: ? __apic_accept_irq+0x1a1/0x2e0 [kvm] Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: kvm_arch_vcpu_ioctl_run+0xcf/0x540 [kvm] Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: kvm_vcpu_ioctl+0x232/0x5f0 [kvm] Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: ? __wake_up_common+0x7a/0x190 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: do_vfs_ioctl+0xa4/0x630 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: ksys_ioctl+0x60/0x90 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: __x64_sys_ioctl+0x16/0x20 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: do_syscall_64+0x5b/0x1a0 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: entry_SYSCALL_64_after_hwframe+0x65/0xca Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: RIP: 0033:0x7f1a499a787b Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Code: 0f 1e fa 48 8b 05 0d 96 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 95 2c 00 f7 d8 64 89 01 48 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: RSP: 002b:00007f1a327fb618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: RAX: ffffffffffffffda RBX: 0000563f87375200 RCX: 00007f1a499a787b Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000027 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: RBP: 0000000000000000 R08: 0000563f856b14f0 R09: 00007f1a24472890 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000001 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: R13: 0000563f8568d220 R14: 0000000000000000 R15: 00007f1a4eb44000 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Task in /kubepods.slice/kubepods-pod64516f15_013a_4c1d_8f03_9b0f1ac98ace.slice/crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope killed as a result of limit of /kubepods.slice/kubepods-pod64516f15_013a_4c1d_8f03_9b0f1ac98ace.slice/crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.046780 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: nmstate-handler-tsbgl, container: nmstate-handler, container id: c3338de77ed8ba0a40955fcd0fdfaba649b073451edae442efa385bd4d07d776, cpuset: "0-1,3-31,33,35-63") Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: memory: usage 8579104kB, limit 8579144kB, failcnt 0 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: memory+swap: usage 8579104kB, limit 8579144kB, failcnt 665 Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.079855 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: csi-rbdplugin-kcpbn, container: driver-registrar, container id: ff817195edfbec3867c0df1f9a379100648f55695cd6437132aa669b2836cb9b, cpuset: "0-1,3-31,33,35-63") Oct 08 
10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: kmem: usage 86108kB, limit 9007199254740988kB, failcnt 0
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.100622 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: csi-rbdplugin-kcpbn, container: csi-rbdplugin, container id: 72550366a822834701df2ea3167de02feeeed7c1bdc2ca7d9fa4cda8e9f7059d, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Memory cgroup stats for /kubepods.slice/kubepods-pod64516f15_013a_4c1d_8f03_9b0f1ac98ace.slice/crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope: cache:108KB rss:8492728KB rss_huge:8419328KB shmem:0KB mapped_file:0KB dirty:264KB writeback:132KB swap:0KB inactive_anon:0KB active_anon:8492928KB inactive_file:0KB active_file:0KB unevictable:0KB
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.136583 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: csi-rbdplugin-kcpbn, container: liveness-prometheus, container id: 14294bb7416c897001aa24d2423e2242a7d2f74cb2487cd2e2ecbe78f506028e, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [1984292]     0 1984292   245291    12973   598016        0          -998 virt-launcher
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [1984339]     0 1984339   337463    15075   651264        0          -998 virt-launcher
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.156332 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: coredns-cnvqe-07.lab.eng.tlv2.redhat.com, container: coredns, container id: b1453b5bac68a1fde489b22b969b31065ee711a442e1254df5450ea6f8990d98, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.158656 3275 handler.go:181] Unable to get Process Stats: couldn't open cpu cgroup procs file /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod004679a3_085a_4f6f_b08f_4a4c6da44a42.slice/crio-a8e507612e327d36f7a5ebc5b84f6860457261479af9f6be6be2e8bb03922ebc.scope/cgroup.procs : open /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod004679a3_085a_4f6f_b08f_4a4c6da44a42.slice/crio-a8e507612e327d36f7a5ebc5b84f6860457261479af9f6be6be2e8bb03922ebc.scope/cgroup.procs: no such file or directory
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [1984363]     0 1984363   343158     6943   544768        0          -998 libvirtd
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [1984365]     0 1984365    43479     3798   376832        0          -998 virtlogd
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.176353 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: coredns-cnvqe-07.lab.eng.tlv2.redhat.com, container: coredns-monitor, container id: 73dd8accf980daee53a78e94704242410ac16ad04f8d6a7384234394eaa955f4, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [1985348]   107 1985348  3522840  2114432 18427904        0          -998 qemu-kvm
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.195926 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: bridge-marker-964s6, container: bridge-marker, container id: ae62e771f7f74749a1d0264937d4d590cae9494a5f170cac9b1d01cb3578a3e0, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Memory cgroup out of memory: Killed process 1984292 (virt-launcher) total-vm:981164kB, anon-rss:18012kB, file-rss:33880kB, shmem-rss:0kB, UID:0
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.201850 3275 handler.go:325] Added event &{/kubepods.slice/kubepods-pod64516f15_013a_4c1d_8f03_9b0f1ac98ace.slice/crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope 2020-10-08 10:40:09.228958979 +0000 UTC m=+256864.081089745 oom {<nil>}}
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.201893 3275 manager.go:1215] Created an OOM event in container "/kubepods.slice/kubepods-pod64516f15_013a_4c1d_8f03_9b0f1ac98ace.slice/crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope" at 2020-10-08 10:40:09.228958979 +0000 UTC m=+256864.081089745
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.201911 3275 handler.go:325] Added event &{/kubepods.slice/kubepods-pod64516f15_013a_4c1d_8f03_9b0f1ac98ace.slice/crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope 2020-10-08 10:40:09.228958979 +0000 UTC m=+256864.081089745 oomKill {0xc002e91860}}
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.210805 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: router-default-568dc569b-d2t9d, container: router, container id: 8f7f51de94c044721209b6173e8da29d713e0a436d6c15273a1a3bfda3b05ce1, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: oom_reaper: reaped process 1984292 (virt-launcher), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.225396 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-84d8f468xqc5, container: rgw, container id: 070ba8479a2ccbb89ad10fed07b60f22790f9dd07a640f763bf8a0c70ae4860a, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.242433 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: rook-ceph-operator-7857679f9f-gf5sr, container: rook-ceph-operator, container id: f86aa7c908f92664ecd0a0ce60f03bf4eba7cf89549ad8fa5b019f2782b1d750, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.246656 3275 exec.go:60] Exec probe response: ""
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.246672 3275 prober.go:126] Liveness probe for "keepalived-cnvqe-07.lab.eng.tlv2.redhat.com_openshift-kni-infra(05a6eab180a118d7691d3126b4a9049f):keepalived" succeeded
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.258137 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: sdn-cdqgf, container: sdn, container id: 8af1326acb147162ee28cb0544d3e2fc267a184292f133aa037d0fadf137452b, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.274528 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: sdn-cdqgf, container: kube-rbac-proxy, container id: 33a3f2d5469f250467f6d2c8d38f1293da8c58e10a00b00c4ba2068bea596c98, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.289553 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: node-ca-4jzhx, container: node-ca, container id: 11f6d6956d4f0aaf6b932c5f69920a626b34442ee19eccc217757fcb0c61db74, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.305485 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: ovs-cni-amd64-j7qmp, container: ovs-cni-marker, container id: e23761376bdbc124731dcb0e2a36ad9aecbd66445b61a010ecbc6eddc05f781b, cpuset: "0-1,3-31,33,35-63")
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com hyperkube[3275]: I1008 10:40:12.320968 3275 cpu_manager.go:415] [cpumanager] reconcileState: updating container (pod: noobaa-core-0, container: core, container id: f680e10e10d3f2d02f6916a59f1c8a67e38021680a28d95b4a6a9be5c890dad5, cpuset: "0-1,3-31,33,35-63")
~~~
~~~
[root@cnvqe-07 ~]# journalctl --since "15 minutes ago" | grep cnvqe-07.lab.eng.tlv2.redhat.com kernel:
grep: kernel:: No such file or directory
[root@cnvqe-07 ~]# journalctl --since "15 minutes ago" | grep "cnvqe-07.lab.eng.tlv2.redhat.com kernel:"
Oct 08 10:35:54 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Oct 08 10:35:54 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_UP): vethb5fd8ac6: link is not ready
Oct 08 10:35:54 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethb5fd8ac6: link becomes ready
Oct 08 10:35:54 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Oct 08 10:35:54 cnvqe-07.lab.eng.tlv2.redhat.com kernel: device vethb5fd8ac6 entered promiscuous mode
Oct 08 10:36:20 cnvqe-07.lab.eng.tlv2.redhat.com kernel: device vethb5fd8ac6 left promiscuous mode
Oct 08 10:37:49 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Oct 08 10:37:49 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_UP): vethe45f748b: link is not ready
Oct 08 10:37:49 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethe45f748b: link becomes ready
Oct 08 10:37:49 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Oct 08 10:37:49 cnvqe-07.lab.eng.tlv2.redhat.com kernel: device vethe45f748b entered promiscuous mode
Oct 08 10:38:22 cnvqe-07.lab.eng.tlv2.redhat.com kernel: device vethe45f748b left promiscuous mode
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com kernel: CPU 2/KVM invoked oom-killer: gfp_mask=0x7000c0(GFP_KERNEL_ACCOUNT), nodemask=(null), order=0, oom_score_adj=-998
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com kernel: CPU 2/KVM cpuset=crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope mems_allowed=0-1
Oct 08 10:40:11 cnvqe-07.lab.eng.tlv2.redhat.com kernel: CPU: 2 PID: 1985360 Comm: CPU 2/KVM Not tainted 4.18.0-193.24.1.el8_2.dt1.x86_64 #1
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Hardware name: Dell Inc. PowerEdge R440/04JN2K, BIOS 2.4.8 11/27/2019
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Call Trace:
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: dump_stack+0x5c/0x80
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: dump_header+0x6e/0x27a
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: oom_kill_process.cold.28+0xb/0x10
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: out_of_memory+0x1ba/0x490
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: mem_cgroup_out_of_memory+0x49/0x80
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: try_charge+0x6fa/0x780
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: __memcg_kmem_charge_memcg+0x33/0x90
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: __memcg_kmem_charge+0x80/0x190
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: __alloc_pages_nodemask+0x1eb/0x280
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: __get_free_pages+0xa/0x30
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: mmu_topup_memory_caches+0x8c/0xf0 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: tdp_page_fault+0x69/0x2b0 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: ? hrtimer_cancel+0x15/0x20
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: ? restart_apic_timer+0x9c/0x130 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: kvm_mmu_page_fault+0x75/0x5d0 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: ? vmx_set_msr+0x2f2/0xcd0 [kvm_intel]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: ? x86_virt_spec_ctrl+0x68/0xe0
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: vcpu_enter_guest+0x315/0x1680 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: ? __apic_accept_irq+0x1a1/0x2e0 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: kvm_arch_vcpu_ioctl_run+0xcf/0x540 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: kvm_vcpu_ioctl+0x232/0x5f0 [kvm]
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: ? __wake_up_common+0x7a/0x190
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: do_vfs_ioctl+0xa4/0x630
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: ksys_ioctl+0x60/0x90
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: __x64_sys_ioctl+0x16/0x20
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: do_syscall_64+0x5b/0x1a0
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: entry_SYSCALL_64_after_hwframe+0x65/0xca
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: RIP: 0033:0x7f1a499a787b
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Code: 0f 1e fa 48 8b 05 0d 96 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 95 2c 00 f7 d8 64 89 01 48
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: RSP: 002b:00007f1a327fb618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: RAX: ffffffffffffffda RBX: 0000563f87375200 RCX: 00007f1a499a787b
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000027
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: RBP: 0000000000000000 R08: 0000563f856b14f0 R09: 00007f1a24472890
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000001
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: R13: 0000563f8568d220 R14: 0000000000000000 R15: 00007f1a4eb44000
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Task in /kubepods.slice/kubepods-pod64516f15_013a_4c1d_8f03_9b0f1ac98ace.slice/crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope killed as a result of limit of /kubepods.slice/kubepods-pod64516f15_013a_4c1d_8f03_9b0f1ac98ace.slice/crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: memory: usage 8579104kB, limit 8579144kB, failcnt 0
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: memory+swap: usage 8579104kB, limit 8579144kB, failcnt 665
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: kmem: usage 86108kB, limit 9007199254740988kB, failcnt 0
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Memory cgroup stats for /kubepods.slice/kubepods-pod64516f15_013a_4c1d_8f03_9b0f1ac98ace.slice/crio-8d9cdbdf36dc3a030a7aabdb1259b1a241e82cd71e6a84764ed34a3b7fcab40c.scope: cache:108KB rss:8492728KB rss_huge:8419328KB shmem:0KB mapped_file:0KB dirty:264KB writeback:132KB swap:0KB inactive_anon:0KB active_anon:8492928KB inactive_file:0KB active_file:0KB unevictable:0KB
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [1984292]     0 1984292   245291    12973   598016        0          -998 virt-launcher
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [1984339]     0 1984339   337463    15075   651264        0          -998 virt-launcher
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [1984363]     0 1984363   343158     6943   544768        0          -998 libvirtd
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [1984365]     0 1984365    43479     3798   376832        0          -998 virtlogd
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: [1985348]   107 1985348  3522840  2114432 18427904        0          -998 qemu-kvm
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: Memory cgroup out of memory: Killed process 1984292 (virt-launcher) total-vm:981164kB, anon-rss:18012kB, file-rss:33880kB, shmem-rss:0kB, UID:0
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: oom_reaper: reaped process 1984292 (virt-launcher), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Oct 08 10:40:12 cnvqe-07.lab.eng.tlv2.redhat.com kernel: k6t-eth0: port 2(tap0) entered disabled state
Oct 08 10:40:17 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Oct 08 10:40:17 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_UP): vetheff84337: link is not ready
Oct 08 10:40:17 cnvqe-07.lab.eng.tlv2.redhat.com kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vetheff84337: link becomes ready
~~~
I'm going off on a tangent but ... the memory calculation issue, it seems, affected only the request calculation. While the output of `oc describe pod` is different and indeed calculates a different value for the requested memory, we see the same memory limit as in my earlier tests:
~~~
    resources:
      requests:
        cpu: "3"
        memory: 10240M
      limits:
        cpu: "3"
        memory: 10240M
#     overcommitGuestOverhead: true
    memory:
      guest: 10240M
~~~
~~~
[root@cnvqe-01 ~]# oc apply -f akaris-vm.yaml
virtualmachine.kubevirt.io/akaris-test-vm created
[root@cnvqe-01 ~]# cat akaris-vm.yaml
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  annotations:
    kubevirt.io/latest-observed-api-version: v1alpha3
    kubevirt.io/storage-observed-api-version: v1alpha3
    name.os.template.kubevirt.io/fedora31: Fedora 31
  labels:
    app: akaris-test-vm
    flavor.template.kubevirt.io/Custom: "true"
    os.template.kubevirt.io/fedora31: "true"
    vm.kubevirt.io/template.namespace: openshift
    vm.kubevirt.io/template.revision: "1"
    vm.kubevirt.io/template.version: v0.9.1
    workload.template.kubevirt.io/server: "true"
  name: akaris-test-vm
spec:
  running: true
  template:
    metadata:
      labels:
        flavor.template.kubevirt.io/Custom: "true"
        kubevirt.io/domain: akaris-test-vm
        kubevirt.io/size: large
        os.template.kubevirt.io/fedora31: "true"
        vm.kubevirt.io/name: akaris-test-vm
        workload.template.kubevirt.io/server: "true"
    spec:
      domain:
        cpu:
          cores: 3
          model: host-model
        devices:
          disks:
          - bootOrder: 1
            disk:
              bus: virtio
            name: rootdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
          interfaces:
          - masquerade: {}
            model: virtio
            name: nic0
          networkInterfaceMultiqueue: true
          rng: {}
        features:
          acpi:
            enabled: true
        machine:
          type: q35
        resources:
          requests:
            cpu: "3"
            memory: 10240M
          limits:
            cpu: "3"
            memory: 10240M
#         overcommitGuestOverhead: true
        memory:
          guest: 10240M
      evictionStrategy: LiveMigrate
      hostname: akaris-test-vm
      networks:
      - name: nic0
        pod: {}
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: docker.io/kubevirt/fedora-cloud-container-disk-demo
        name: rootdisk
      - cloudInitNoCloud:
          userData: |
            #cloud-config
            name: default
            hostname: akaris-test-vm
            ssh_authorized_keys:
              - >-
                ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDEn6wI5YZbCbwUX8RlBYkkYGFK8vTXyeuOJWf6WrWcLi2rKfCoXWNxAw2ekLqIQRpc2xI5wCquWPeZ3aiHpqlocbFgLuGEBIqciJtGnI1IGOZtdccfEe8MRKJMjy3EnktGnHiFkxPXE7UYglujq1GN2BAzgGKigjOLu+k56tVWs+U96KtZz81P3sE4wye4rCm0R0lS4kaEX5bQ8ZAQG4KnWVuuW/PMm2g5n8FImjBz2t7kW3w4N9O4/yvUpgrsR75FXwdZ4gfBLxcKm2oqXHker31UFo4R5uuov5KO43OEOswGu/JmbcBm5T9ozyE/9RE+iuMrYIDWV9v8SgwT7Tpm24dKpcRWO968jDzaWPgGL8EN4y9uDYx7oIlYmwzJ5xFiDgQh+kwWi5p67TD2Iuxmxp5yOjv9ZKJKSk+wU8Z3407kB/AoempquSVXPU6jfZzAxFWlsQdJOd1ETmgcAjbF/WhFGR1LjJrzK6Nm3Gr2HDA2VAmOnLzyh7BBIHZSqLM= core.eng.tlv2.redhat.com
        name: cloudinitdisk
~~~
~~~
[root@cnvqe-01 ~]# oc describe pod virt-launcher-akaris-test-vm-q8rf4 | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:|10240|overcommitGuestOverhead|10438331648|guest'
Limits:
  memory: 40M
Requests:
  memory: 40M
Limits:
  memory: 40M
Requests:
  memory: 40M
Containers:
  compute:
    Limits:
      memory: 10438331648
    Requests:
      memory: 10438331648
  volumerootdisk:
    Limits:
      memory: 40M
    Requests:
      memory: 40M
~~~
Compare this to https://bugzilla.redhat.com/show_bug.cgi?id=1885418#c8, where we saw:
~~~
  compute:
    Limits:
      memory: 10438331648
    Requests:
      memory: 10240M
~~~
10240M should always translate to 10240000000 bytes. 10240Mi, on the other hand, should translate to 10737418240 bytes.
10438331648 - 10240000000 = 198331648 (~189 MiB of overhead)
So it seems that the Request memory is now fixed.
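To double-check the unit math, the same conversions can be reproduced with `bc` (a quick sketch; the byte values are the ones quoted from the pod output above):
~~~
# 10240M (decimal megabytes) vs. 10240Mi (binary mebibytes), in bytes:
echo "10240 * 1000 * 1000" | bc          # 10240000000
echo "10240 * 1024 * 1024" | bc          # 10737418240
# Overhead that CNV added on top of the 10240M request:
echo "10438331648 - 10240000000" | bc                  # 198331648
echo "(10438331648 - 10240000000) / 1024 / 1024" | bc  # ~189 (MiB)
~~~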
However, the actual difference between the cgroup limit and the pod output is still there, so the upstream fix isn't working right (?). Or is this a kubernetes issue, or am I being overly pedantic?
~~~
[root@cnvqe-07 ~]# cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit'
hierarchical_memory_limit 10438328320
total_rss 839790592
total_rss_huge 761266176
~~~
~~~
[akaris@linux ~]$ echo "10438328320 - 10438331648" | bc
-3328
~~~
The spawned VM memory is also still exactly the same amount as before:
~~~
[root@cnvqe-07 ~]# pgrep -f guest=test-bz-1885418_akaris-test-vm | while read pid ; do echo === $pid === ; ps aux | grep $pid | grep -o '\-m [0-9]*' ; pmap $pid | tail ; done
=== 2379976 ===
-m 9766
00007fc117288000     12K rw-s- [ anon ]
00007fc11728b000      4K rw-s- zero (deleted)
00007fc11728c000      4K r---- ld-2.28.so
00007fc11728d000      4K rw--- ld-2.28.so
00007fc11728e000      4K rw--- [ anon ]
00007fff3783a000    136K rw--- [ stack ]
00007fff37875000     12K r---- [ anon ]
00007fff37878000      8K r-x-- [ anon ]
ffffffffff600000      4K r-x-- [ anon ]
 total         15318000K
~~~
10240 * 1000 * 1000 / 1024 / 1024 = ~9766, which is indeed what was requested manually.

For comparison, I spawned the same VM with:
~~~
    resources:
      requests:
        cpu: "3"
        memory: 10240M
      limits:
        cpu: "3"
        memory: 10240M
#     overcommitGuestOverhead: true
#   memory:
#     guest: 10240M
~~~
So without an explicit guest memory setting, I get *exactly* the same values:
~~~
[root@cnvqe-07 ~]# pgrep -f guest=test-bz-1885418_akaris-test-vm | while read pid ; do echo === $pid === ; ps aux | grep $pid | grep -o '\-m [0-9]*' ; pmap $pid | tail ; done
=== 2491260 ===
-m 9766
00007fab74dfe000     12K rw-s- [ anon ]
00007fab74e01000      4K rw-s- zero (deleted)
00007fab74e02000      4K r---- ld-2.28.so
00007fab74e03000      4K rw--- ld-2.28.so
00007fab74e04000      4K rw--- [ anon ]
00007ffd71ca5000    136K rw--- [ stack ]
00007ffd71cd7000     12K r---- [ anon ]
00007ffd71cda000      8K r-x-- [ anon ]
ffffffffff600000      4K r-x-- [ anon ]
 total         11297228K
~~~
~~~
[root@cnvqe-07 ~]# cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit'
hierarchical_memory_limit 10438328320
total_rss 836296704
total_rss_huge 769654784
~~~
~~~
[root@cnvqe-01 ~]# oc describe pod virt-launcher-akaris-test-vm-zkksm | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:|10240|overcommitGuestOverhead|10438331648|guest'
Limits:
  memory: 40M
Requests:
  memory: 40M
Limits:
  memory: 40M
Requests:
  memory: 40M
Containers:
  compute:
    Limits:
      memory: 10438331648
    Requests:
      memory: 10438331648
  volumerootdisk:
    Limits:
      memory: 40M
    Requests:
      memory: 40M
~~~
I also spawned a VM with 8G for comparison:
~~~
[root@cnvqe-01 ~]# oc describe pod virt-launcher-akaris-test-vm-m2bg2 | egrep -i '^containers:|volumerootdisk:|compute:|limits:|requests:|memory:|10240|overcommitGuestOverhead|10438331648|guest'
Limits:
  memory: 40M
Requests:
  memory: 40M
Limits:
  memory: 40M
Requests:
  memory: 40M
Containers:
  compute:
    Limits:
      memory: 8193956648
    Requests:
      memory: 8193956648
  volumerootdisk:
    Limits:
      memory: 40M
    Requests:
      memory: 40M
~~~
~~~
[root@cnvqe-07 ~]# pgrep -f guest=test-bz-1885418_akaris-test-vm | while read pid ; do echo === $pid === ; ps aux | grep $pid | grep -o '\-m [0-9]*' ; pmap $pid | tail ; done
=== 2516957 ===
-m 7630
00007fcf336b4000     12K rw-s- [ anon ]
00007fcf336b7000      4K rw-s- zero (deleted)
00007fcf336b8000      4K r---- ld-2.28.so
00007fcf336b9000      4K rw--- ld-2.28.so
00007fcf336ba000      4K rw--- [ anon ]
00007ffc63046000    136K rw--- [ stack ]
00007ffc630a2000     12K r---- [ anon ]
00007ffc630a5000      8K r-x-- [ anon ]
ffffffffff600000      4K r-x-- [ anon ]
 total         13285652K
~~~
~~~
[root@cnvqe-07 ~]# cat /sys/fs/cgroup/memory$cgroup/memory.stat | egrep 'total_rss|memory_limit'
hierarchical_memory_limit 8193953792
total_rss 775507968
total_rss_huge 708837376
~~~
~~~
[root@cnvqe-07 ~]# cat /sys/fs/cgroup/memory$cgroup/memory.stat
cache 811008
rss 758087680
rss_huge 692060160
shmem 0
mapped_file 0
dirty 0
writeback 0
swap 0
pgpgin 37686
pgpgout 25613
pgfault 46101
pgmajfault 0
inactive_anon 0
active_anon 741584896
inactive_file 17473536
active_file 73728
unevictable 0
hierarchical_memory_limit 8193953792
hierarchical_memsw_limit 8193953792
total_cache 811008
total_rss 758087680
total_rss_huge 692060160
total_shmem 0
total_mapped_file 0
total_dirty 0
total_writeback 0
total_swap 0
total_pgpgin 37686
total_pgpgout 25613
total_pgfault 46101
total_pgmajfault 0
total_inactive_anon 0
total_active_anon 741584896
total_inactive_file 17473536
total_active_file 73728
total_unevictable 0
[root@cnvqe-07 ~]#
~~~
8193956648 - 8193953792 = 2856. Again, there's a small offset between pod and cgroup limit here.
8 * 1000 * 1000 * 1000 - 8193956648 = -193956648, and 193956648 / 1024 / 1024 = ~184.97 MiB of buffer/overhead that we reserve for qemu.
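Putting both measurements side by side with `bc` (using the pod limits and guest sizes quoted above) shows that the reserved overhead is roughly constant rather than proportional to guest size:
~~~
# 10240M guest: pod memory limit minus requested guest memory
echo "(10438331648 - 10240000000) / 1024 / 1024" | bc   # ~189 (MiB)
# 8G guest: pod memory limit minus requested guest memory
echo "(8193956648 - 8000000000) / 1024 / 1024" | bc     # ~184 (MiB)
~~~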
Stu and I took a closer look at this issue. I'm pretty confident at this point that this issue is now just about adjusting our memory overhead calculation; so far it has only been a rough estimate. More research is needed for us to converge on the numbers.

We've run multiple combinations of stressed VMs, with and without memory limits. Looking at the cgroup memory.stat of the compute container, we can see that the memory stays well within range of the limits, although there are spikes that lead to OOM. So far we've seen that OOM does not occur when adding ~5Mi of overhead. Obviously, there can be multiple factors that may cause qemu to consume more memory (I/O threads, the emulator thread, vCPUs, network...); however, so far we've seen that our previous estimate wasn't off by much.

As a workaround, we can manually request a higher memory limit (resources.limits.memory) than the guest's memory. This will effectively increase the memory overhead. Here is an example which adds 10Mi of overhead to the limits; guest memory is 8Gi (= 8192Mi):
~~~
spec:
  domain:
    cpu:
      cores: 16
    memory:
      guest: 8Gi
    ....
    ....
    resources:
      requests:
        cpu: "16"
        memory: 8202Mi
      limits:
        cpu: "16"
        memory: 8202Mi
~~~
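After applying such a workaround, the extra headroom can be confirmed on the node with the same cgroup check used earlier in this report (a sketch; run it on the worker hosting the VM while only that one qemu-kvm process exists):
~~~
# Effective memory limit of the qemu-kvm cgroup:
cgroup=$(cat /proc/$(pgrep qemu-kvm)/cgroup | grep memory | awk -F ':' '{print $NF}')
grep hierarchical_memory_limit /sys/fs/cgroup/memory$cgroup/memory.stat
# Compare this value against the guest memory (8Gi = 8589934592 bytes)
# to see the actual headroom above the guest RAM.
~~~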
We've added 10Mi to our base overhead calculation, along with some other minor improvements, in this PR: https://github.com/kubevirt/kubevirt/pull/4409
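For context, an overhead estimate of this kind is a sum of several components. The sketch below shows the general shape of such a calculation; the constants are illustrative assumptions, not the values from the PR (see the PR diff for those). Only the flat 10Mi addition is taken from the comment above.
~~~
#!/bin/bash
# Illustrative sketch of a per-VM memory overhead estimate.
# All constants except the 10Mi addition are assumptions for the example.
guest_mib=8192                        # guest RAM in MiB
vcpus=16
base_mib=180                          # assumed fixed base (virt-launcher, libvirtd, qemu)
pagetable_mib=$(( guest_mib / 512 ))  # page tables scale with guest RAM
pervcpu_mib=$(( vcpus * 8 ))          # assumed per-vCPU cost
extra_mib=10                          # flat addition from the PR mentioned above
echo "estimated overhead: $(( base_mib + pagetable_mib + pervcpu_mib + extra_mib ))Mi"
~~~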
Vladik, can this be moved to ON_QA?
(In reply to Adam Litke from comment #39)
> Vladik, can this be moved to ON_QA?

Yes, I think so.
Based on the recommendation, we verified that if Requests & Limits exceed the Guest Memory by 100Mi, we don't see the OOM. The customer approved it.
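In other words, a configuration along these lines passed the verification (a sketch: 8Gi guest, so 8192Mi + 100Mi = 8292Mi for both requests and limits):
~~~
spec:
  domain:
    memory:
      guest: 8Gi
    resources:
      requests:
        memory: 8292Mi
      limits:
        memory: 8292Mi
~~~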
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:0799