Created attachment 1661546 [details]
causes an OOMkill / eviction for memory

Creating a memory-hogger pod, which should simply be evicted or OOM killed and otherwise handled safely by the node, instead causes the node to become unreachable for more than 10 minutes. On the node, the kubelet appears to be running but cannot heartbeat the apiserver. The node also appears to think that the apiserver deleted all the pods (DELETE("api") in the logs), which is not correct: no pods except the OOM-killed one should be evicted or deleted.

Recreate:
1. Create the attached kill-node.yaml on the cluster (oc create -f kill-node.yaml).
2. Wait 2-3 minutes while memory fills up on the worker.

Expected:
1. The memory-hog pod is OOM killed and/or evicted (either would be acceptable).
2. The node remains Ready.

Actual:
1. The node is tainted as unreachable, heartbeats stop, and it takes more than 10 minutes for the node to recover.
2. After recovery, the events are delivered.

As part of fixing this, we need to add an e2e test to the origin disruptive suite that triggers this (and add eviction tests, because this does not seem to evict anything).
Once this is fixed we need to test against 4.3 and 4.2 and backport if it happens - this can DoS a node.
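The attached kill-node.yaml is not inlined in this report. For anyone trying to recreate this without the attachment, a minimal sketch of the kind of memory-hogger pod described above might look like the following; the image and stress arguments here are illustrative assumptions, not the attachment's actual contents (the SystemOOM event later in this bug names `stress` as the victim process, so the real manifest is presumably similar):

apiVersion: v1
kind: Pod
metadata:
  name: memory-hog-pod          # name matches the pod seen in the reproduction below
  namespace: minmli             # hypothetical; any test namespace works
spec:
  restartPolicy: Never
  containers:
  - name: memory-hog
    image: polinux/stress       # assumed public image that ships the `stress` tool
    command: ["stress"]
    # allocate and hold more anonymous memory than the worker can supply, with no
    # resource requests/limits, so the kernel OOM killer / kubelet eviction must react
    args: ["--vm", "2", "--vm-bytes", "4G", "--vm-hang", "0"]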
The OOM killer on Z is so active in the cases we tested on z/VM clusters that it fills the 4 GB z/VM console message spool with OOM-killer messages, which requires manual intervention at the z/VM x3270 console and sometimes a complete re-installation. It might be worth testing https://bugzilla.redhat.com/show_bug.cgi?id=1800319 with long-running forked multiprocess memory hogs such as `stress-ng --mmap 64 &`. On Z, this memory hog not only completely disables the node, the node's monitoring cluster operator, and the dns/routing operator, but does so over the better part of a day, only becoming completely unresponsive late at night. The eviction appears to start, but the OOM killer is killing so many processes at such a rate that it seems to kill even the service conducting the eviction. We are trying to convince the owners of OCP 4.3 test clusters on the Power architecture at IBM to try these workloads too.
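If someone wants to run that workload as a cluster pod rather than directly on the node, a rough sketch follows; the image and names are assumptions, and the arguments simply mirror the `stress-ng --mmap 64 &` command above:

apiVersion: v1
kind: Pod
metadata:
  name: stress-ng-mmap                     # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: stress-ng
    image: docker.io/alexeiled/stress-ng   # assumed public image containing stress-ng; substitute as needed
    command: ["stress-ng"]
    # fork 64 mmap workers and leave them running, as in the long-running hog described above
    args: ["--mmap", "64"]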
Going to reopen while we engage the kernel team; we believe this is a fundamental issue with OOM kill that would manifest regardless of our default reservation if the kubelet's memory usage were high enough.
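For context, the "default reservation" above refers to the memory the kubelet holds back for system daemons. On OpenShift that can be tuned with a KubeletConfig custom resource roughly like the one below; the label and sizes are illustrative only, and as noted above a large enough hog overwhelms any fixed reservation:

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: worker-memory-reservation          # hypothetical name
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: memory-reservation   # assumes the worker MachineConfigPool has been given this label
  kubeletConfig:
    systemReserved:
      memory: 1Gi                          # illustrative sizes, not a recommendation from this bug
      cpu: 500m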
Kernel issue is https://bugzilla.redhat.com/show_bug.cgi?id=1803217
*** Bug 1803239 has been marked as a duplicate of this bug. ***
On version 4.4.0-0.nightly-2020-02-19-213909, this bug is reproduced.

Timing starts from creating the memory-hog pod: 3 minutes later, the node goes to Unknown status and is tainted unreachable; 8 minutes later, the pod goes to Terminating status; 15 minutes later, the pod disappears (OOM killed) and the node becomes Ready again. So there are at least 12 minutes during which the node stays Unknown and the heartbeat is stopped. After the node recovers, the events are delivered.

[lyman@localhost env]$ oc describe node ip-10-0-153-107.us-east-2.compute.internal
Name:               ip-10-0-153-107.us-east-2.compute.internal
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m4.large
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-2
                    failure-domain.beta.kubernetes.io/zone=us-east-2b
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-153-107
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/instance-type=m4.large
                    node.openshift.io/os_id=rhcos
                    topology.kubernetes.io/region=us-east-2
                    topology.kubernetes.io/zone=us-east-2b
Annotations:        machine.openshift.io/machine: openshift-machine-api/minmli-0220-zgsmn-worker-us-east-2b-zwwcf
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-7438f0d51b46b0f81add0bf8ec2fbe1a
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-7438f0d51b46b0f81add0bf8ec2fbe1a
                    machineconfiguration.openshift.io/reason:
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 20 Feb 2020 11:19:31 +0800
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
Conditions:
  Type            Status   LastHeartbeatTime                 LastTransitionTime                Reason             Message
  ----            ------   -----------------                 ------------------                ------             -------
  MemoryPressure  Unknown  Fri, 21 Feb 2020 14:40:41 +0800   Fri, 21 Feb 2020 14:43:31 +0800   NodeStatusUnknown  Kubelet stopped posting node status.
  DiskPressure    Unknown  Fri, 21 Feb 2020 14:40:41 +0800   Fri, 21 Feb 2020 14:43:31 +0800   NodeStatusUnknown  Kubelet stopped posting node status.
  PIDPressure     Unknown  Fri, 21 Feb 2020 14:40:41 +0800   Fri, 21 Feb 2020 14:43:31 +0800   NodeStatusUnknown  Kubelet stopped posting node status.
  Ready           Unknown  Fri, 21 Feb 2020 14:40:41 +0800   Fri, 21 Feb 2020 14:43:31 +0800   NodeStatusUnknown  Kubelet stopped posting node status.
Addresses:
  InternalIP:   10.0.153.107
  Hostname:     ip-10-0-153-107.us-east-2.compute.internal
  InternalDNS:  ip-10-0-153-107.us-east-2.compute.internal
Capacity:
  attachable-volumes-aws-ebs:  39
  cpu:                         2
  ephemeral-storage:           125277164Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      8161840Ki
  pods:                        250
Allocatable:
  attachable-volumes-aws-ebs:  39
  cpu:                         1500m
  ephemeral-storage:           114381692328
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      7010864Ki
  pods:                        250
System Info:
  Machine ID:                 8efca36c2ad941a7a0e84909f882c4ed
  System UUID:                ec2d4765-d316-147c-37a8-a4fc04bd9239
  Boot ID:                    029d4d02-8598-4bcd-aaa0-f23bb12e88c5
  Kernel Version:             4.18.0-147.5.1.el8_1.x86_64
  OS Image:                   Red Hat Enterprise Linux CoreOS 44.81.202002191330-0 (Ootpa)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  cri-o://1.17.0-4.dev.rhaos4.4.gitc3436cc.el8
  Kubelet Version:            v1.17.1
  Kube-Proxy Version:         v1.17.1
ProviderID:  aws:///us-east-2b/i-042d34751c5d914e5
Non-terminated Pods:  (14 in total)
  Namespace                               Name                                CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                               ----                                ------------  ----------  ---------------  -------------  ---
  minmli                                  memory-hog-pod                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         13m
  openshift-cluster-node-tuning-operator  tuned-b7jdj                         10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         27h
  openshift-dns                           dns-default-b7pxh                   110m (7%)     0 (0%)      70Mi (1%)        512Mi (7%)     27h
  openshift-image-registry                node-ca-bvcfz                       10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         27h
  openshift-ingress                       router-default-5cd6c75986-vchdp     100m (6%)     0 (0%)      256Mi (3%)       0 (0%)         27h
  openshift-machine-config-operator       machine-config-daemon-mlp2c         40m (2%)      0 (0%)      100Mi (1%)       0 (0%)         27h
  openshift-monitoring                    alertmanager-main-0                 110m (7%)     100m (6%)   245Mi (3%)       25Mi (0%)      27h
  openshift-monitoring                    grafana-755b7df4f9-tph2m            110m (7%)     0 (0%)      120Mi (1%)       0 (0%)         27h
  openshift-monitoring                    node-exporter-khpq5                 112m (7%)     0 (0%)      200Mi (2%)       0 (0%)         27h
  openshift-monitoring                    prometheus-adapter-d64c8db56-c2ww8  10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         3h49m
  openshift-monitoring                    prometheus-k8s-1                    480m (32%)    200m (13%)  1234Mi (18%)     50Mi (0%)      27h
  openshift-multus                        multus-bfhxb                        10m (0%)      0 (0%)      150Mi (2%)       0 (0%)         27h
  openshift-sdn                           ovs-m8zhh                           200m (13%)    0 (0%)      400Mi (5%)       0 (0%)         27h
  openshift-sdn                           sdn-pxlfz                           100m (6%)     0 (0%)      200Mi (2%)       0 (0%)         27h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         1402m (93%)   300m (20%)
  memory                      3055Mi (44%)  587Mi (8%)
  ephemeral-storage           0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0
Events:  <none>

After the node recovers:

[lyman@localhost env]$ oc describe node ip-10-0-153-107.us-east-2.compute.internal
Name:               ip-10-0-153-107.us-east-2.compute.internal
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m4.large
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-2
                    failure-domain.beta.kubernetes.io/zone=us-east-2b
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-153-107
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/instance-type=m4.large
                    node.openshift.io/os_id=rhcos
                    topology.kubernetes.io/region=us-east-2
                    topology.kubernetes.io/zone=us-east-2b
Annotations:        machine.openshift.io/machine: openshift-machine-api/minmli-0220-zgsmn-worker-us-east-2b-zwwcf
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-7438f0d51b46b0f81add0bf8ec2fbe1a
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-7438f0d51b46b0f81add0bf8ec2fbe1a
                    machineconfiguration.openshift.io/reason:
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 20 Feb 2020 11:19:31 +0800
Taints:             <none>
Unschedulable:      false
Conditions:
  Type            Status  LastHeartbeatTime                 LastTransitionTime                Reason                      Message
  ----            ------  -----------------                 ------------------                ------                      -------
  MemoryPressure  False   Fri, 21 Feb 2020 14:56:47 +0800   Fri, 21 Feb 2020 14:56:37 +0800   KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure    False   Fri, 21 Feb 2020 14:56:47 +0800   Fri, 21 Feb 2020 14:56:37 +0800   KubeletHasNoDiskPressure    kubelet has no disk pressure
  PIDPressure     False   Fri, 21 Feb 2020 14:56:47 +0800   Fri, 21 Feb 2020 14:56:37 +0800   KubeletHasSufficientPID     kubelet has sufficient PID available
  Ready           True    Fri, 21 Feb 2020 14:56:47 +0800   Fri, 21 Feb 2020 14:56:47 +0800   KubeletReady                kubelet is posting ready status
Addresses:
  InternalIP:   10.0.153.107
  Hostname:     ip-10-0-153-107.us-east-2.compute.internal
  InternalDNS:  ip-10-0-153-107.us-east-2.compute.internal
Capacity:
  attachable-volumes-aws-ebs:  39
  cpu:                         2
  ephemeral-storage:           125277164Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      8161840Ki
  pods:                        250
Allocatable:
  attachable-volumes-aws-ebs:  39
  cpu:                         1500m
  ephemeral-storage:           114381692328
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      7010864Ki
  pods:                        250
System Info:
  Machine ID:                 8efca36c2ad941a7a0e84909f882c4ed
  System UUID:                ec2d4765-d316-147c-37a8-a4fc04bd9239
  Boot ID:                    029d4d02-8598-4bcd-aaa0-f23bb12e88c5
  Kernel Version:             4.18.0-147.5.1.el8_1.x86_64
  OS Image:                   Red Hat Enterprise Linux CoreOS 44.81.202002191330-0 (Ootpa)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  cri-o://1.17.0-4.dev.rhaos4.4.gitc3436cc.el8
  Kubelet Version:            v1.17.1
  Kube-Proxy Version:         v1.17.1
ProviderID:  aws:///us-east-2b/i-042d34751c5d914e5
Non-terminated Pods:  (10 in total)
  Namespace                               Name                          CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                               ----                          ------------  ----------  ---------------  -------------  ---
  openshift-cluster-node-tuning-operator  tuned-b7jdj                   10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         27h
  openshift-dns                           dns-default-b7pxh             110m (7%)     0 (0%)      70Mi (1%)        512Mi (7%)     27h
  openshift-image-registry                node-ca-bvcfz                 10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         27h
  openshift-machine-config-operator       machine-config-daemon-mlp2c   40m (2%)      0 (0%)      100Mi (1%)       0 (0%)         27h
  openshift-monitoring                    alertmanager-main-0           110m (7%)     100m (6%)   245Mi (3%)       25Mi (0%)      2m21s
  openshift-monitoring                    node-exporter-khpq5           112m (7%)     0 (0%)      200Mi (2%)       0 (0%)         27h
  openshift-monitoring                    prometheus-k8s-1              480m (32%)    200m (13%)  1234Mi (18%)     50Mi (0%)      2m21s
  openshift-multus                        multus-bfhxb                  10m (0%)      0 (0%)      150Mi (2%)       0 (0%)         27h
  openshift-sdn                           ovs-m8zhh                     200m (13%)    0 (0%)      400Mi (5%)       0 (0%)         27h
  openshift-sdn                           sdn-pxlfz                     100m (6%)     0 (0%)      200Mi (2%)       0 (0%)         27h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         1182m (78%)   300m (20%)
  memory                      2659Mi (38%)  587Mi (8%)
  ephemeral-storage           0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0
Events:
  Type     Reason                   Age                  From                                                  Message
  ----     ------                   ----                 ----                                                  -------
  Warning  SystemOOM                2m44s                kubelet, ip-10-0-153-107.us-east-2.compute.internal  System OOM encountered, victim process: stress, pid: 3170956
  Normal   NodeHasSufficientMemory  2m43s (x9 over 27h)  kubelet, ip-10-0-153-107.us-east-2.compute.internal  Node ip-10-0-153-107.us-east-2.compute.internal status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    2m43s (x9 over 27h)  kubelet, ip-10-0-153-107.us-east-2.compute.internal  Node ip-10-0-153-107.us-east-2.compute.internal status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     2m43s (x9 over 27h)  kubelet, ip-10-0-153-107.us-east-2.compute.internal  Node ip-10-0-153-107.us-east-2.compute.internal status is now: NodeHasSufficientPID
  Normal   NodeNotReady             2m43s                kubelet, ip-10-0-153-107.us-east-2.compute.internal  Node ip-10-0-153-107.us-east-2.compute.internal status is now: NodeNotReady
  Normal   NodeReady                2m33s (x2 over 27h)  kubelet, ip-10-0-153-107.us-east-2.compute.internal  Node ip-10-0-153-107.us-east-2.compute.internal status is now: NodeReady
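For anyone re-running the reproduction above, the transitions can be watched with plain oc commands (node, pod, and namespace names are the ones from this cluster; substitute your own):

# watch the node flip Ready -> NotReady/Unknown -> Ready while the hog runs
oc get node ip-10-0-153-107.us-east-2.compute.internal -w

# watch the memory-hog pod's phase in a second terminal
oc get pod memory-hog-pod -n minmli -w

# after recovery, confirm the SystemOOM / NodeNotReady / NodeReady events
oc describe node ip-10-0-153-107.us-east-2.compute.internal | grep -A 15 '^Events:'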
*** Bug 1801771 has been marked as a duplicate of this bug. ***
QE: This patch is in 4.5. I'm not sure of a great way of testing because the kubelet gets injected into RHCOS.
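One low-effort check that at least confirms which kubelet build an RHCOS node ended up with (a sketch; substitute a real node name, and the exact package name is an assumption that may differ between releases):

# open a debug shell on a worker and switch into the host filesystem
oc debug node/<node-name>
chroot /host

# inside the host: see which kubelet was injected into RHCOS
rpm -qa | grep -E 'hyperkube|kubelet'   # package naming varies by release
kubelet --version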
*** Bug 1802944 has been marked as a duplicate of this bug. ***
*** Bug 1766237 has been marked as a duplicate of this bug. ***
*** Bug 1767284 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581