Bug 1857446 - ARO/Azure: excessive pod memory allocation causes node lockup [NEEDINFO]
Summary: ARO/Azure: excessive pod memory allocation causes node lockup
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Harshal Patil
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Duplicates: 1873816 1877059 1889734 1890684 1892909 1904051 1910086 1910801 1915023
Depends On: 1896327
Blocks: 1908661 1860031 1873114 1882116 1904051 1909062
Reported: 2020-07-15 21:04 UTC by Jim Minter
Modified: 2021-08-30 12:27 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1860031
Environment:
Last Closed: 2021-02-24 15:13:58 UTC
Target Upstream Version:
miabbott: needinfo? (decarr)


Links
System                             ID              Private  Priority  Status  Summary  Last Updated
Red Hat Knowledge Base (Solution)  5853471         0        None      None    None     2021-04-19 16:01:25 UTC
Red Hat Product Errata             RHSA-2020:5633  0        None      None    None     2021-02-24 15:15:02 UTC

Comment 4 Mike Gahagan 2020-07-16 18:18:08 UTC
>1. Is this issue specific to ARO, or has a similar test been run successfully on other cloud providers?
I have reproduced the same behavior on OCP 4.3.27 in Azure (not ARO). I was not able to reproduce it on AWS, although the AWS cluster I used had 3 worker nodes rather than 2. Worker nodes in both cloud providers have 8GiB of RAM.
>2. The container is growing an array in a tight loop; is this a realistic scenario? I don't know how quickly the operating system on ARO can detect and kill a process for OOM. My concern is that, depending on capacity and other load on the node, this script may leave very little CPU for other activities.
In my OCP on Azure cluster I can see the python process being oom-killed, but I also saw a number of hung task warnings from the kernel. I believe this is impacting ARO customers running more realistic workloads, but Jim can speak to that better than I can.

Comment 5 Mike Gahagan 2020-07-16 18:50:54 UTC
For both my test clusters:
system-reserved:
  cpu: 500m
  memory: 1Gi
  ephemeral-storage: 1Gi

I don't see kube-reserved defined anywhere on either cluster but I'll keep looking.
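For reference, the effect of this reservation on schedulable memory can be checked with simple arithmetic. The sketch below applies the usual Node Allocatable relation (capacity minus system-reserved, kube-reserved, and the hard eviction threshold) to the memory capacity reported by `oc describe node` in comment 69. It assumes that cluster also reserves 1Gi, defines no kube-reserved, and uses the kubelet's default hard eviction threshold of 100Mi; the function name is illustrative only.

```python
# Node Allocatable = capacity - system-reserved - kube-reserved - hard eviction threshold.
# All values are in Ki to match the units printed by `oc describe node`.
KI_PER_MI = 1024
KI_PER_GI = 1024 * 1024


def allocatable_ki(capacity_ki, system_reserved_ki, kube_reserved_ki=0,
                   eviction_hard_ki=100 * KI_PER_MI):
    """Return the memory (in Ki) left for pods after all reservations.

    Assumption: eviction_hard_ki defaults to 100Mi, the kubelet default for
    memory.available; kube_reserved_ki defaults to 0 because no kube-reserved
    value was found on the test clusters.
    """
    return capacity_ki - system_reserved_ki - kube_reserved_ki - eviction_hard_ki


capacity = 7934684  # "Capacity: memory" in the comment 69 node description
print(allocatable_ki(capacity, system_reserved_ki=1 * KI_PER_GI))  # 6783708
```

The result equals the `Allocatable: memory` value of 6783708Ki shown in that output, which is consistent with the assumptions above.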

Comment 6 Mike Gahagan 2020-07-16 19:34:14 UTC
One difference I'm seeing on the Azure cluster during the oom-kill events is several rcu_sched stall warnings, which might point to why the nodes are going NotReady (possibly due to I/O performance differences):

[ 7315.479017] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 7315.541774] Memory cgroup stats for 
[ 7315.541655] rcu:     0-....: (144 ticks this GP) idle=7f2/1/0x4000000000000000 softirq=1385795/1385810 fqs=14822 
[ 7315.541655] rcu:     (detected by 1, t=60002 jiffies, g=2694213, q=38010)
[ 7315.541655] Sending NMI from CPU 1 to CPUs 0:
[ 7315.542014] NMI backtrace for cpu 0
[ 7315.542014] CPU: 0 PID: 2195 Comm: ovs-vswitchd Not tainted 4.18.0-147.13.2.el8_1.x86_64 #1
[ 7315.542014] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
[ 7315.542014] RIP: 0010:cpuacct_account_field+0x27/0x50
[ 7315.542014] Code: 00 00 00 0f 1f 44 00 00 48 8b 87 08 0d 00 00 48 8b 48 10 48 81 f9 20 93 25 91 74 2a 48 63 f6 48 c1 e6 03 48 8b 81 38 01 00 00 <65>48 03 05 61 97 f0 6f 48 01 14 30 48 8b 89 28 01 00 00 48 81 f9
[ 7315.542014] RSP: 0018:ffff961cf7a03ea8 EFLAGS: 00000083
[ 7315.542014] RAX: 0000326e88221278 RBX: ffff961cb699af80 RCX: ffff961cf614e600
[ 7315.542014] RDX: 000000000003b9bd RSI: 0000000000000010 RDI: ffff961cb699af80
[ 7315.542014] RBP: 000000000003b9bd R08: 0000000000000002 R09: 011d94d851f61f2c
[ 7315.542014] R10: 00000f47e7c5b1a8 R11: 0000000000000000 R12: 0000000000000002
[ 7315.542014] R13: ffff961cf7a1cf80 R14: ffffffff90146710 R15: ffff961cf7a1d0b8
[ 7315.542014] FS:  00007f197bbb7d00(0000) GS:ffff961cf7a00000(0000) knlGS:0000000000000000
[ 7315.542014] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7315.542014] CR2: 00000000004bad80 CR3: 0000000276e0c003 CR4: 00000000003606f0
[ 7315.542014] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7315.542014] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7315.542014] Call Trace:
[ 7315.542014]  <IRQ>
[ 7315.542014]  account_system_index_time+0x63/0x90
[ 7315.542014]  update_process_times+0x1c/0x60
[ 7315.542014]  tick_sched_handle+0x22/0x60
[ 7315.542014]  tick_sched_timer+0x37/0x70
[ 7315.542014]  __hrtimer_run_queues+0x100/0x280
[ 7315.542014]  hrtimer_interrupt+0x100/0x220
[ 7315.542014]  ? sched_clock+0x5/0x10
[ 7315.542014]  hv_stimer0_isr+0x20/0x30 [hv_vmbus]
[ 7315.542014]  hv_stimer0_vector_handler+0x3b/0x70
[ 7315.542014]  hv_stimer0_callback_vector+0xf/0x20
[ 7315.542014]  </IRQ>
[ 7315.542014] RIP: 0010:vprintk_emit+0x3a4/0x450
[ 7315.542014] Code: 90 84 d2 74 6d 0f b6 15 da 38 92 01 48 c7 c0 20 a7 a3 91 84 d2 74 09 f3 90 0f b6 10 84 d2 75 f7 e8 31 0b 00 00 48 89 df 57 9d <0f>1f 44 00 00 e8 f2 e3 ff ff e9 28 fe ff ff 80 3d aa e9 2e 01 00
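
When comparing nodes, it may help to pull the stall reports out of a saved dmesg capture instead of reading the whole log. This is a minimal sketch, assuming the log uses the bracketed seconds-since-boot timestamps seen above; `find_rcu_stalls` is a hypothetical helper, not part of any existing tool.

```python
import re

# Matches kernel log lines such as
# "[ 7315.479017] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:"
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s?(?P<msg>.*)$")


def find_rcu_stalls(dmesg_text):
    """Return the timestamps (seconds since boot) of RCU stall reports."""
    stalls = []
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line)
        if match and "detected stalls on CPUs/tasks" in match.group("msg"):
            stalls.append(float(match.group("ts")))
    return stalls


# Three lines copied from the trace above.
sample = """\
[ 7315.479017] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 7315.541655] rcu:     (detected by 1, t=60002 jiffies, g=2694213, q=38010)
[ 7315.541655] Sending NMI from CPU 1 to CPUs 0:
"""
print(find_rcu_stalls(sample))
```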

Comment 8 Mike Gahagan 2020-07-17 14:53:11 UTC
This is also happening on 4.5.2 on Azure:

mgahagan-cr9wt-worker-northcentralus-c57nw   NotReady   worker   99m    v1.18.3+b74c5ed   10.0.32.4     <none>        Red Hat Enterprise Linux CoreOS 45.82.202007141718-0 (Ootpa)   4.18.0-193.13.2.el8_2.x86_64   cri-o://1.18.2-18.rhaos4.5.git754d46b.el8

Comment 9 Mike Gahagan 2020-07-17 16:11:42 UTC
Also tried a possible workaround of setting system-reserved.memory to 1250Gi on my 4.5.2 cluster, and it didn't help.

Comment 17 Harshal Patil 2020-07-21 13:54:01 UTC
Moving to RHCOS to take a closer look at the dmesg output.

Comment 19 Colin Walters 2020-07-21 15:56:04 UTC
OOM is an OpenShift wide topic that impacts multiple teams, among them RHCOS and Node.  At the present time you must configure pod limits:
https://docs.openshift.com/container-platform/4.5/nodes/clusters/nodes-cluster-resource-configure.html

Some clusters may want a mutating admission webhook to enforce this.

As I understand things, if you're not applying limits, then none of the system reserved bits come into effect.

If this bug stays against RHCOS, since it's not actionable we'll close it.
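
To illustrate the mutating admission webhook idea mentioned above, here is a minimal sketch of the patch-building step only. It is not an existing OpenShift component: the function name and the 512Mi default are hypothetical, and a real webhook would still need to serve HTTPS and return the patch base64-encoded in an AdmissionReview response with patchType JSONPatch.

```python
import json

DEFAULT_MEMORY_LIMIT = "512Mi"  # hypothetical default; pick a value that suits the workload


def memory_limit_patch(pod, default=DEFAULT_MEMORY_LIMIT):
    """Build JSON Patch (RFC 6902) operations that add a memory limit to
    every container in the pod spec that does not already have one."""
    ops = []
    for i, container in enumerate(pod.get("spec", {}).get("containers", [])):
        base = f"/spec/containers/{i}/resources"
        resources = container.get("resources")
        if not resources:
            # No resources stanza (or an empty one): create it with the limit.
            ops.append({"op": "add", "path": base,
                        "value": {"limits": {"memory": default}}})
        elif not resources.get("limits"):
            # Requests only: add a limits object next to them.
            ops.append({"op": "add", "path": f"{base}/limits",
                        "value": {"memory": default}})
        elif "memory" not in resources["limits"]:
            # Other limits exist (for example cpu): add only the memory key.
            ops.append({"op": "add", "path": f"{base}/limits/memory",
                        "value": default})
    return ops


# A pod shaped like the badmem reproducer: one container, no requests or limits.
pod = {"spec": {"containers": [{"name": "badmem",
                                "image": "registry.redhat.io/rhel7:latest"}]}}
print(json.dumps(memory_limit_patch(pod)))
```

The three branches exist because a JSON Patch "add" fails when the parent object of the target path is missing, so the patch has to create whichever level is absent.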

Comment 47 Seth Jennings 2020-09-21 19:51:02 UTC
Cross-reference: bug for adding an alert when we go over the system memory reservation:
https://bugzilla.redhat.com/show_bug.cgi?id=1881208

Comment 49 Clayton Coleman 2020-10-15 18:15:52 UTC
Summarizing because someone was confused:

No amount of manual tuning can prevent this problem from happening in all cases. Identifying and eliminating the underlying hang is the key outcome now that we have an alert to identify the minimal and acceptable short-term workaround.

Comment 53 Ryan Phillips 2020-11-03 19:11:36 UTC
*** Bug 1877059 has been marked as a duplicate of this bug. ***

Comment 55 Ryan Phillips 2020-11-10 15:09:01 UTC
*** Bug 1889734 has been marked as a duplicate of this bug. ***

Comment 56 Ryan Phillips 2020-11-10 15:21:42 UTC
*** Bug 1890684 has been marked as a duplicate of this bug. ***

Comment 67 Ryan Phillips 2021-01-04 23:09:51 UTC
*** Bug 1910086 has been marked as a duplicate of this bug. ***

Comment 68 Harshal Patil 2021-01-05 03:01:34 UTC
*** Bug 1892909 has been marked as a duplicate of this bug. ***

Comment 69 Sunil Choudhary 2021-01-05 10:18:25 UTC
Verified on 4.7.0-0.nightly-2021-01-04-215816.

Tested this on 2 nodes. I cordoned the node, created the RC, and could see the pod being evicted due to System OOM and then the node trying to reclaim memory without going into the NotReady state.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-01-04-215816   True        False         45m     Cluster version is 4.7.0-0.nightly-2021-01-04-215816


$ oc get nodes -o wide
NAME                                         STATUS                     ROLES    AGE   VERSION           INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-139-84.us-east-2.compute.internal    Ready                      master   68m   v1.20.0+87544c5   10.0.139.84    <none>        Red Hat Enterprise Linux CoreOS 47.83.202101041743-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-146-230.us-east-2.compute.internal   Ready                      worker   63m   v1.20.0+87544c5   10.0.146.230   <none>        Red Hat Enterprise Linux CoreOS 47.83.202101041743-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-164-104.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   62m   v1.20.0+87544c5   10.0.164.104   <none>        Red Hat Enterprise Linux CoreOS 47.83.202101041743-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-169-245.us-east-2.compute.internal   Ready                      master   68m   v1.20.0+87544c5   10.0.169.245   <none>        Red Hat Enterprise Linux CoreOS 47.83.202101041743-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-192-137.us-east-2.compute.internal   Ready                      master   68m   v1.20.0+87544c5   10.0.192.137   <none>        Red Hat Enterprise Linux CoreOS 47.83.202101041743-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-218-188.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   62m   v1.20.0+87544c5   10.0.218.188   <none>        Red Hat Enterprise Linux CoreOS 47.83.202101041743-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39


$ oc create -f rc.yaml 
replicationcontroller/badmem created

$ oc get rc
NAME     DESIRED   CURRENT   READY   AGE
badmem   1         1         0       3s

$ oc get pods
NAME           READY   STATUS              RESTARTS   AGE
badmem-tjhjd   0/1     ContainerCreating   0          6s

$ oc get pods -o wide
NAME           READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
badmem-tjhjd   1/1     Running   0          20s   10.131.0.42   ip-10-0-146-230.us-east-2.compute.internal   <none>           <none>

$ oc get pods -o wide
NAME           READY   STATUS             RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
badmem-fk9x5   0/1     CrashLoopBackOff   7          22m   10.131.0.51   ip-10-0-146-230.us-east-2.compute.internal   <none>           <none>
badmem-tjhjd   0/1     Evicted            0          23m   <none>        ip-10-0-146-230.us-east-2.compute.internal   <none>           <none>

$ oc describe pod badmem-fk9x5
Name:         badmem-fk9x5
Namespace:    app
Priority:     0
Node:         ip-10-0-146-230.us-east-2.compute.internal/10.0.146.230
Start Time:   Tue, 05 Jan 2021 13:09:05 +0530
Labels:       app=badmem
Annotations:  k8s.v1.cni.cncf.io/network-status:
                [{
                    "name": "",
                    "interface": "eth0",
                    "ips": [
                        "10.131.0.51"
                    ],
                    "default": true,
                    "dns": {}
                }]
              k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "",
                    "interface": "eth0",
                    "ips": [
                        "10.131.0.51"
                    ],
                    "default": true,
                    "dns": {}
                }]
              openshift.io/scc: restricted
Status:       Running
IP:           10.131.0.51
IPs:
  IP:           10.131.0.51
Controlled By:  ReplicationController/badmem
Containers:
  badmem:
    Container ID:  cri-o://6474e9157e9ee59730590413eebbcf2316fae85d3de6237ebd5221f54e77bd33
    Image:         registry.redhat.io/rhel7:latest
    Image ID:      registry.redhat.io/rhel7@sha256:110e61d28c1bfa1aad79e0413b98a70679a070baafb70e122fda4d105651599e
    Port:          <none>
    Host Port:     <none>
    Args:
      python
      -c
      x = []
      while True:
        x.append("x" * 1048576)
      
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Tue, 05 Jan 2021 13:21:26 +0530
      Finished:     Tue, 05 Jan 2021 13:21:34 +0530
    Ready:          False
    Restart Count:  7
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-gz2d7 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-gz2d7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-gz2d7
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                   From                                                 Message
  ----     ------            ----                  ----                                                 -------
  Warning  FailedScheduling  <unknown>                                                                  0/6 nodes are available: 1 node(s) had taint {node.kubernetes.io/memory-pressure: }, that the pod didn't tolerate, 2 node(s) were unschedulable, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
  Warning  FailedScheduling  <unknown>                                                                  0/6 nodes are available: 1 node(s) had taint {node.kubernetes.io/memory-pressure: }, that the pod didn't tolerate, 2 node(s) were unschedulable, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
  Normal   Scheduled         <unknown>                                                                  Successfully assigned app/badmem-fk9x5 to ip-10-0-146-230.us-east-2.compute.internal
  Normal   AddedInterface    17m                   multus                                               Add eth0 [10.131.0.51/23]
  Normal   Pulled            17m                   kubelet, ip-10-0-146-230.us-east-2.compute.internal  Successfully pulled image "registry.redhat.io/rhel7:latest" in 889.354606ms
  Normal   Pulled            17m                   kubelet, ip-10-0-146-230.us-east-2.compute.internal  Successfully pulled image "registry.redhat.io/rhel7:latest" in 935.541782ms
  Normal   Pulled            16m                   kubelet, ip-10-0-146-230.us-east-2.compute.internal  Successfully pulled image "registry.redhat.io/rhel7:latest" in 878.485939ms
  Normal   Created           16m (x4 over 17m)     kubelet, ip-10-0-146-230.us-east-2.compute.internal  Created container badmem
  Normal   Started           16m (x4 over 17m)     kubelet, ip-10-0-146-230.us-east-2.compute.internal  Started container badmem
  Normal   Pulled            16m                   kubelet, ip-10-0-146-230.us-east-2.compute.internal  Successfully pulled image "registry.redhat.io/rhel7:latest" in 819.24038ms
  Normal   Pulling           15m (x5 over 17m)     kubelet, ip-10-0-146-230.us-east-2.compute.internal  Pulling image "registry.redhat.io/rhel7:latest"
  Normal   Pulled            15m                   kubelet, ip-10-0-146-230.us-east-2.compute.internal  Successfully pulled image "registry.redhat.io/rhel7:latest" in 1.221594963s
  Warning  BackOff           2m25s (x61 over 17m)  kubelet, ip-10-0-146-230.us-east-2.compute.internal  Back-off restarting failed container


$ oc get nodes
NAME                                         STATUS                     ROLES    AGE   VERSION
ip-10-0-139-84.us-east-2.compute.internal    Ready                      master   94m   v1.20.0+87544c5
ip-10-0-146-230.us-east-2.compute.internal   Ready                      worker   88m   v1.20.0+87544c5
ip-10-0-164-104.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   88m   v1.20.0+87544c5
ip-10-0-169-245.us-east-2.compute.internal   Ready                      master   94m   v1.20.0+87544c5
ip-10-0-192-137.us-east-2.compute.internal   Ready                      master   94m   v1.20.0+87544c5
ip-10-0-218-188.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   88m   v1.20.0+87544c5


$ oc describe node ip-10-0-146-230.us-east-2.compute.internal
Name:               ip-10-0-146-230.us-east-2.compute.internal
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5.large
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-2
                    failure-domain.beta.kubernetes.io/zone=us-east-2a
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-146-230
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/instance-type=m5.large
                    node.openshift.io/os_id=rhcos
                    topology.ebs.csi.aws.com/zone=us-east-2a
                    topology.kubernetes.io/region=us-east-2
                    topology.kubernetes.io/zone=us-east-2a
Annotations:        csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-0b76f16895b9950f4"}
                    machine.openshift.io/machine: openshift-machine-api/sunilc0501-q8g7n-worker-us-east-2a-s8sh9
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-3b5bd44448e8d9aa6de4000b0f64c1d7
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-3b5bd44448e8d9aa6de4000b0f64c1d7
                    machineconfiguration.openshift.io/reason: 
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 05 Jan 2021 11:58:58 +0530
Taints:             node.kubernetes.io/memory-pressure:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  ip-10-0-146-230.us-east-2.compute.internal
  AcquireTime:     <unset>
  RenewTime:       Tue, 05 Jan 2021 13:28:00 +0530
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                         Message
  ----             ------  -----------------                 ------------------                ------                         -------
  MemoryPressure   True    Tue, 05 Jan 2021 13:26:56 +0530   Tue, 05 Jan 2021 13:26:56 +0530   KubeletHasInsufficientMemory   kubelet has insufficient memory available
  DiskPressure     False   Tue, 05 Jan 2021 13:26:56 +0530   Tue, 05 Jan 2021 11:58:58 +0530   KubeletHasNoDiskPressure       kubelet has no disk pressure
  PIDPressure      False   Tue, 05 Jan 2021 13:26:56 +0530   Tue, 05 Jan 2021 11:58:58 +0530   KubeletHasSufficientPID        kubelet has sufficient PID available
  Ready            True    Tue, 05 Jan 2021 13:26:56 +0530   Tue, 05 Jan 2021 11:59:48 +0530   KubeletReady                   kubelet is posting ready status
Addresses:
  InternalIP:   10.0.146.230
  Hostname:     ip-10-0-146-230.us-east-2.compute.internal
  InternalDNS:  ip-10-0-146-230.us-east-2.compute.internal
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         2
  ephemeral-storage:           125293548Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      7934684Ki
  pods:                        250
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         1500m
  ephemeral-storage:           114396791822
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      6783708Ki
  pods:                        250
System Info:
  Machine ID:                              ec29e18d242aa4cd9260b6285abe896e
  System UUID:                             ec29e18d-242a-a4cd-9260-b6285abe896e
  Boot ID:                                 793835d0-a758-4fba-9c1f-9a82685497f1
  Kernel Version:                          4.18.0-240.10.1.el8_3.x86_64
  OS Image:                                Red Hat Enterprise Linux CoreOS 47.83.202101041743-0 (Ootpa)
  Operating System:                        linux
  Architecture:                            amd64
  Container Runtime Version:               cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
  Kubelet Version:                         v1.20.0+87544c5
  Kube-Proxy Version:                      v1.20.0+87544c5
ProviderID:                                aws:///us-east-2a/i-0b76f16895b9950f4
Non-terminated Pods:                       (28 in total)
  Namespace                                Name                                       CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                                ----                                       ------------  ----------  ---------------  -------------  ---
  app                                      badmem-fk9x5                               0 (0%)        0 (0%)      0 (0%)           0 (0%)         24m
  openshift-cluster-csi-drivers            aws-ebs-csi-driver-node-8cvqs              30m (2%)      0 (0%)      150Mi (2%)       0 (0%)         84m
  openshift-cluster-node-tuning-operator   tuned-vchjb                                10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         84m
  openshift-console                        downloads-6d7bb8f56d-zw8fl                 10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         68m
  openshift-dns                            dns-default-ts8jw                          65m (4%)      0 (0%)      110Mi (1%)       0 (0%)         88m
  openshift-image-registry                 image-registry-59b74c4947-ld2ql            100m (6%)     0 (0%)      256Mi (3%)       0 (0%)         68m
  openshift-image-registry                 node-ca-7n5rh                              10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         84m
  openshift-ingress-canary                 ingress-canary-rl4rx                       10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         84m
  openshift-ingress                        router-default-7854b58d84-p64n9            100m (6%)     0 (0%)      256Mi (3%)       0 (0%)         68m
  openshift-kube-storage-version-migrator  migrator-777f85c94f-spws6                  100m (6%)     0 (0%)      200Mi (3%)       0 (0%)         68m
  openshift-machine-config-operator        machine-config-daemon-54jtc                40m (2%)      0 (0%)      100Mi (1%)       0 (0%)         89m
  openshift-marketplace                    certified-operators-9l674                  10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         68m
  openshift-marketplace                    community-operators-77rzq                  10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         80s
  openshift-marketplace                    qe-app-registry-55vfw                      10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         68m
  openshift-marketplace                    redhat-marketplace-qm5g4                   10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         14m
  openshift-marketplace                    redhat-operators-7bb2s                     10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         68m
  openshift-monitoring                     alertmanager-main-0                        8m (0%)       0 (0%)      270Mi (4%)       0 (0%)         68m
  openshift-monitoring                     grafana-56f75b4dfd-8l9k9                   5m (0%)       0 (0%)      120Mi (1%)       0 (0%)         68m
  openshift-monitoring                     node-exporter-wqb52                        9m (0%)       0 (0%)      210Mi (3%)       0 (0%)         84m
  openshift-monitoring                     openshift-state-metrics-8dcd45497-6x7zq    3m (0%)       0 (0%)      190Mi (2%)       0 (0%)         68m
  openshift-monitoring                     prometheus-adapter-8649fb987f-k9jt4        1m (0%)       0 (0%)      25Mi (0%)        0 (0%)         68m
  openshift-monitoring                     prometheus-k8s-1                           76m (5%)      0 (0%)      1204Mi (18%)     0 (0%)         68m
  openshift-monitoring                     thanos-querier-89cbbf9b8-6s987             9m (0%)       0 (0%)      92Mi (1%)        0 (0%)         68m
  openshift-multus                         multus-dld6h                               10m (0%)      0 (0%)      150Mi (2%)       0 (0%)         89m
  openshift-multus                         network-metrics-daemon-bjd6m               20m (1%)      0 (0%)      120Mi (1%)       0 (0%)         88m
  openshift-network-diagnostics            network-check-target-hfqmw                 10m (0%)      0 (0%)      150Mi (2%)       0 (0%)         84m
  openshift-sdn                            ovs-4m9ng                                  100m (6%)     0 (0%)      400Mi (6%)       0 (0%)         88m
  openshift-sdn                            sdn-vrt5v                                  110m (7%)     0 (0%)      220Mi (3%)       0 (0%)         88m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         886m (59%)    0 (0%)
  memory                      4603Mi (69%)  0 (0%)
  ephemeral-storage           0 (0%)        0 (0%)
  hugepages-1Gi               0 (0%)        0 (0%)
  hugepages-2Mi               0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0
Events:
  Type     Reason                     Age                   From                                                 Message
  ----     ------                     ----                  ----                                                 -------
  Normal   NodeHasNoDiskPressure      89m (x7 over 89m)     kubelet, ip-10-0-146-230.us-east-2.compute.internal  Node ip-10-0-146-230.us-east-2.compute.internal status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID       89m (x7 over 89m)     kubelet, ip-10-0-146-230.us-east-2.compute.internal  Node ip-10-0-146-230.us-east-2.compute.internal status is now: NodeHasSufficientPID
  Normal   NodeReady                  88m                   kubelet, ip-10-0-146-230.us-east-2.compute.internal  Node ip-10-0-146-230.us-east-2.compute.internal status is now: NodeReady
  Normal   NodeNotSchedulable         69m                   kubelet, ip-10-0-146-230.us-east-2.compute.internal  Node ip-10-0-146-230.us-east-2.compute.internal status is now: NodeNotSchedulable
  Normal   NodeSchedulable            68m                   kubelet, ip-10-0-146-230.us-east-2.compute.internal  Node ip-10-0-146-230.us-east-2.compute.internal status is now: NodeSchedulable
  Warning  SystemOOM                  25m                   kubelet, ip-10-0-146-230.us-east-2.compute.internal  System OOM encountered, victim process: python, pid: 153197
  Warning  SystemOOM                  25m                   kubelet, ip-10-0-146-230.us-east-2.compute.internal  System OOM encountered, victim process: opm, pid: 47565
  Warning  SystemOOM                  25m                   kubelet, ip-10-0-146-230.us-east-2.compute.internal  System OOM encountered, victim process: opm, pid: 47871
  Warning  SystemOOM                  25m                   kubelet, ip-10-0-146-230.us-east-2.compute.internal  System OOM encountered, victim process: opm, pid: 49284
  Warning  SystemOOM                  24m                   kubelet, ip-10-0-146-230.us-east-2.compute.internal  System OOM encountered, victim process: python, pid: 154084
  Normal   NodeHasInsufficientMemory  24m                   kubelet, ip-10-0-146-230.us-east-2.compute.internal  Node ip-10-0-146-230.us-east-2.compute.internal status is now: NodeHasInsufficientMemory
  Warning  SystemOOM                  24m                   kubelet, ip-10-0-146-230.us-east-2.compute.internal  System OOM encountered, victim process: opm, pid: 47671
  Warning  SystemOOM                  24m                   kubelet, ip-10-0-146-230.us-east-2.compute.internal  System OOM encountered, victim process: opm, pid: 155463
  Warning  SystemOOM                  24m                   kubelet, ip-10-0-146-230.us-east-2.compute.internal  System OOM encountered, victim process: opm, pid: 154420
  Normal   NodeHasSufficientMemory    19m (x8 over 89m)     kubelet, ip-10-0-146-230.us-east-2.compute.internal  Node ip-10-0-146-230.us-east-2.compute.internal status is now: NodeHasSufficientMemory
  Warning  SystemOOM                  18m                   kubelet, ip-10-0-146-230.us-east-2.compute.internal  System OOM encountered, victim process: python, pid: 171167
  Warning  SystemOOM                  6m34s (x15 over 18m)  kubelet, ip-10-0-146-230.us-east-2.compute.internal  (combined from similar events): System OOM encountered, victim process: python, pid: 200649
  Warning  EvictionThresholdMet       80s (x5 over 25m)     kubelet, ip-10-0-146-230.us-east-2.compute.internal  Attempting to reclaim memory

$ oc get pods
NAME           READY   STATUS             RESTARTS   AGE
badmem-fk9x5   0/1     CrashLoopBackOff   14         55m
badmem-tjhjd   0/1     Evicted            0          56m

$ oc get pods -o wide
NAME           READY   STATUS      RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
badmem-r2rfx   0/1     OOMKilled   1          29s   10.129.2.33   ip-10-0-164-104.us-east-2.compute.internal   <none>           <none>
badmem-tjhjd   0/1     Evicted     0          57m   <none>        ip-10-0-146-230.us-east-2.compute.internal   <none>           <none>

$ oc get nodes -o wide
NAME                                         STATUS                     ROLES    AGE    VERSION           INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-139-84.us-east-2.compute.internal    Ready                      master   126m   v1.20.0+87544c5   10.0.139.84    <none>        Red Hat Enterprise Linux CoreOS 47.83.202101041743-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-146-230.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   121m   v1.20.0+87544c5   10.0.146.230   <none>        Red Hat Enterprise Linux CoreOS 47.83.202101041743-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-164-104.us-east-2.compute.internal   Ready                      worker   120m   v1.20.0+87544c5   10.0.164.104   <none>        Red Hat Enterprise Linux CoreOS 47.83.202101041743-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-169-245.us-east-2.compute.internal   Ready                      master   126m   v1.20.0+87544c5   10.0.169.245   <none>        Red Hat Enterprise Linux CoreOS 47.83.202101041743-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-192-137.us-east-2.compute.internal   Ready                      master   126m   v1.20.0+87544c5   10.0.192.137   <none>        Red Hat Enterprise Linux CoreOS 47.83.202101041743-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-218-188.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   120m   v1.20.0+87544c5   10.0.218.188   <none>        Red Hat Enterprise Linux CoreOS 47.83.202101041743-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39

$ oc get pods -o wide
NAME           READY   STATUS             RESTARTS   AGE    IP            NODE                                         NOMINATED NODE   READINESS GATES
badmem-4c9jt   0/1     Evicted            0          91m    <none>        ip-10-0-164-104.us-east-2.compute.internal   <none>           <none>
badmem-65pk6   0/1     Evicted            0          91m    <none>        ip-10-0-164-104.us-east-2.compute.internal   <none>           <none>
badmem-8g5vk   0/1     Evicted            0          91m    <none>        ip-10-0-164-104.us-east-2.compute.internal   <none>           <none>
badmem-b8zvb   0/1     Evicted            0          91m    <none>        ip-10-0-164-104.us-east-2.compute.internal   <none>           <none>
badmem-cq7hh   0/1     Evicted            0          91m    <none>        ip-10-0-164-104.us-east-2.compute.internal   <none>           <none>
badmem-d8mcg   0/1     Evicted            0          91m    <none>        ip-10-0-164-104.us-east-2.compute.internal   <none>           <none>
badmem-gmkdr   0/1     Evicted            0          91m    <none>        ip-10-0-164-104.us-east-2.compute.internal   <none>           <none>
badmem-hx9k7   0/1     Evicted            0          91m    <none>        ip-10-0-164-104.us-east-2.compute.internal   <none>           <none>
badmem-j2lqx   0/1     CrashLoopBackOff   21         91m    10.129.2.37   ip-10-0-164-104.us-east-2.compute.internal   <none>           <none>
badmem-llqjc   0/1     Evicted            0          91m    <none>        ip-10-0-164-104.us-east-2.compute.internal   <none>           <none>
badmem-r2rfx   0/1     Evicted            0          94m    <none>        ip-10-0-164-104.us-east-2.compute.internal   <none>           <none>
badmem-tjhjd   0/1     Evicted            0          151m   <none>        ip-10-0-146-230.us-east-2.compute.internal   <none>           <none>
badmem-wkbj4   0/1     Evicted            0          91m    <none>        ip-10-0-164-104.us-east-2.compute.internal   <none>           <none>
badmem-wtmxh   0/1     Evicted            0          91m    <none>        ip-10-0-164-104.us-east-2.compute.internal   <none>           <none>
badmem-x95vs   0/1     Evicted            0          91m    <none>        ip-10-0-164-104.us-east-2.compute.internal   <none>           <none>
badmem-z7hkj   0/1     Evicted            0          91m    <none>        ip-10-0-164-104.us-east-2.compute.internal   <none>           <none>

$ oc describe pod badmem-j2lqx
Name:         badmem-j2lqx
Namespace:    app
Priority:     0
Node:         ip-10-0-164-104.us-east-2.compute.internal/10.0.164.104
Start Time:   Tue, 05 Jan 2021 14:07:14 +0530
Labels:       app=badmem
Annotations:  k8s.v1.cni.cncf.io/network-status:
                [{
                    "name": "",
                    "interface": "eth0",
                    "ips": [
                        "10.129.2.37"
                    ],
                    "default": true,
                    "dns": {}
                }]
              k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "",
                    "interface": "eth0",
                    "ips": [
                        "10.129.2.37"
                    ],
                    "default": true,
                    "dns": {}
                }]
              openshift.io/scc: restricted
Status:       Running
IP:           10.129.2.37
IPs:
  IP:           10.129.2.37
Controlled By:  ReplicationController/badmem
Containers:
  badmem:
    Container ID:  cri-o://4c435138458dc16988352f95fe9653bf0315c263cf709266b601a667fc21c832
    Image:         registry.redhat.io/rhel7:latest
    Image ID:      registry.redhat.io/rhel7@sha256:110e61d28c1bfa1aad79e0413b98a70679a070baafb70e122fda4d105651599e
    Port:          <none>
    Host Port:     <none>
    Args:
      python
      -c
      x = []
      while True:
        x.append("x" * 1048576)
      
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Tue, 05 Jan 2021 15:31:56 +0530
      Finished:     Tue, 05 Jan 2021 15:32:02 +0530
    Ready:          False
    Restart Count:  21
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-gz2d7 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-gz2d7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-gz2d7
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                   From                                                 Message
  ----     ------            ----                  ----                                                 -------
  Warning  FailedScheduling  <unknown>                                                                  0/6 nodes are available: 1 node(s) had taint {node.kubernetes.io/memory-pressure: }, that the pod didn't tolerate, 2 node(s) were unschedulable, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
  Warning  FailedScheduling  <unknown>                                                                  0/6 nodes are available: 1 node(s) had taint {node.kubernetes.io/memory-pressure: }, that the pod didn't tolerate, 2 node(s) were unschedulable, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
  Normal   Scheduled         <unknown>                                                                  Successfully assigned app/badmem-j2lqx to ip-10-0-164-104.us-east-2.compute.internal
  Normal   AddedInterface    86m                   multus                                               Add eth0 [10.129.2.37/23]
  Normal   Pulled            86m                   kubelet, ip-10-0-164-104.us-east-2.compute.internal  Successfully pulled image "registry.redhat.io/rhel7:latest" in 964.725482ms
  Normal   Pulled            86m                   kubelet, ip-10-0-164-104.us-east-2.compute.internal  Successfully pulled image "registry.redhat.io/rhel7:latest" in 1.759608201s
  Normal   Pulled            86m                   kubelet, ip-10-0-164-104.us-east-2.compute.internal  Successfully pulled image "registry.redhat.io/rhel7:latest" in 939.182997ms
  Normal   Pulled            85m                   kubelet, ip-10-0-164-104.us-east-2.compute.internal  Successfully pulled image "registry.redhat.io/rhel7:latest" in 919.173303ms
  Normal   Started           85m (x4 over 86m)     kubelet, ip-10-0-164-104.us-east-2.compute.internal  Started container badmem
  Normal   Pulled            85m                   kubelet, ip-10-0-164-104.us-east-2.compute.internal  Successfully pulled image "registry.redhat.io/rhel7:latest" in 934.860053ms
  Normal   Pulling           85m (x5 over 86m)     kubelet, ip-10-0-164-104.us-east-2.compute.internal  Pulling image "registry.redhat.io/rhel7:latest"
  Normal   Created           85m (x5 over 86m)     kubelet, ip-10-0-164-104.us-east-2.compute.internal  Created container badmem
  Warning  BackOff           100s (x382 over 86m)  kubelet, ip-10-0-164-104.us-east-2.compute.internal  Back-off restarting failed container

$ oc describe node ip-10-0-164-104.us-east-2.compute.internal
Name:               ip-10-0-164-104.us-east-2.compute.internal
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5.large
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-2
                    failure-domain.beta.kubernetes.io/zone=us-east-2b
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-164-104
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/instance-type=m5.large
                    node.openshift.io/os_id=rhcos
                    topology.ebs.csi.aws.com/zone=us-east-2b
                    topology.kubernetes.io/region=us-east-2
                    topology.kubernetes.io/zone=us-east-2b
Annotations:        csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-048ee2a82b51c1319"}
                    machine.openshift.io/machine: openshift-machine-api/sunilc0501-q8g7n-worker-us-east-2b-l6skb
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-3b5bd44448e8d9aa6de4000b0f64c1d7
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-3b5bd44448e8d9aa6de4000b0f64c1d7
                    machineconfiguration.openshift.io/reason: 
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 05 Jan 2021 11:59:42 +0530
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ip-10-0-164-104.us-east-2.compute.internal
  AcquireTime:     <unset>
  RenewTime:       Tue, 05 Jan 2021 15:34:20 +0530
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Tue, 05 Jan 2021 15:33:28 +0530   Tue, 05 Jan 2021 15:00:33 +0530   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Tue, 05 Jan 2021 15:33:28 +0530   Tue, 05 Jan 2021 11:59:42 +0530   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Tue, 05 Jan 2021 15:33:28 +0530   Tue, 05 Jan 2021 11:59:42 +0530   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Tue, 05 Jan 2021 15:33:28 +0530   Tue, 05 Jan 2021 12:00:53 +0530   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   10.0.164.104
  Hostname:     ip-10-0-164-104.us-east-2.compute.internal
  InternalDNS:  ip-10-0-164-104.us-east-2.compute.internal
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         2
  ephemeral-storage:           125293548Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      7934700Ki
  pods:                        250
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         1500m
  ephemeral-storage:           114396791822
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      6783724Ki
  pods:                        250
System Info:
  Machine ID:                             ec216bdfbcb53e43cf9f0bacd7069f16
  System UUID:                            ec216bdf-bcb5-3e43-cf9f-0bacd7069f16
  Boot ID:                                2006871b-5342-405e-b92d-23edb79081b6
  Kernel Version:                         4.18.0-240.10.1.el8_3.x86_64
  OS Image:                               Red Hat Enterprise Linux CoreOS 47.83.202101041743-0 (Ootpa)
  Operating System:                       linux
  Architecture:                           amd64
  Container Runtime Version:              cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
  Kubelet Version:                        v1.20.0+87544c5
  Kube-Proxy Version:                     v1.20.0+87544c5
ProviderID:                               aws:///us-east-2b/i-048ee2a82b51c1319
Non-terminated Pods:                      (16 in total)
  Namespace                               Name                             CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                               ----                             ------------  ----------  ---------------  -------------  ---
  app                                     badmem-j2lqx                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         92m
  openshift-cluster-csi-drivers           aws-ebs-csi-driver-node-r6msv    30m (2%)      0 (0%)      150Mi (2%)       0 (0%)         3h30m
  openshift-cluster-node-tuning-operator  tuned-kqtmn                      10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         3h30m
  openshift-dns                           dns-default-5bkv9                65m (4%)      0 (0%)      110Mi (1%)       0 (0%)         3h34m
  openshift-image-registry                node-ca-2wgqb                    10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         3h30m
  openshift-ingress-canary                ingress-canary-7kwz6             10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         3h30m
  openshift-machine-config-operator       machine-config-daemon-2vxmt      40m (2%)      0 (0%)      100Mi (1%)       0 (0%)         3h34m
  openshift-marketplace                   certified-operators-b9dvl        10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         34m
  openshift-marketplace                   qe-app-registry-wfmf7            10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         89m
  openshift-marketplace                   redhat-marketplace-tnx8p         10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         31m
  openshift-monitoring                    node-exporter-hq6ff              9m (0%)       0 (0%)      210Mi (3%)       0 (0%)         3h30m
  openshift-multus                        multus-pccck                     10m (0%)      0 (0%)      150Mi (2%)       0 (0%)         3h34m
  openshift-multus                        network-metrics-daemon-wgvrg     20m (1%)      0 (0%)      120Mi (1%)       0 (0%)         3h34m
  openshift-network-diagnostics           network-check-target-nzq88       10m (0%)      0 (0%)      150Mi (2%)       0 (0%)         3h30m
  openshift-sdn                           ovs-stf64                        100m (6%)     0 (0%)      400Mi (6%)       0 (0%)         3h34m
  openshift-sdn                           sdn-z7pvs                        110m (7%)     0 (0%)      220Mi (3%)       0 (0%)         3h34m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         454m (30%)    0 (0%)
  memory                      1840Mi (27%)  0 (0%)
  ephemeral-storage           0 (0%)        0 (0%)
  hugepages-1Gi               0 (0%)        0 (0%)
  hugepages-2Mi               0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0
Events:
  Type     Reason                     Age                   From                                                 Message
  ----     ------                     ----                  ----                                                 -------
  Normal   NodeNotSchedulable         152m (x2 over 3h13m)  kubelet, ip-10-0-164-104.us-east-2.compute.internal  Node ip-10-0-164-104.us-east-2.compute.internal status is now: NodeNotSchedulable
  Normal   NodeSchedulable            95m (x2 over 3h12m)   kubelet, ip-10-0-164-104.us-east-2.compute.internal  Node ip-10-0-164-104.us-east-2.compute.internal status is now: NodeSchedulable
  Warning  SystemOOM                  94m                   kubelet, ip-10-0-164-104.us-east-2.compute.internal  System OOM encountered, victim process: python, pid: 93966
  Warning  SystemOOM                  94m                   kubelet, ip-10-0-164-104.us-east-2.compute.internal  System OOM encountered, victim process: python, pid: 94071
  Warning  SystemOOM                  94m                   kubelet, ip-10-0-164-104.us-east-2.compute.internal  System OOM encountered, victim process: kube-rbac-proxy, pid: 3462
  Warning  SystemOOM                  93m                   kubelet, ip-10-0-164-104.us-east-2.compute.internal  System OOM encountered, victim process: python, pid: 94343
  Warning  SystemOOM                  93m                   kubelet, ip-10-0-164-104.us-east-2.compute.internal  System OOM encountered, victim process: kube-rbac-proxy, pid: 94208
  Warning  SystemOOM                  93m                   kubelet, ip-10-0-164-104.us-east-2.compute.internal  System OOM encountered, victim process: python, pid: 94776
  Warning  SystemOOM                  93m                   kubelet, ip-10-0-164-104.us-east-2.compute.internal  System OOM encountered, victim process: kube-rbac-proxy, pid: 94594
  Normal   NodeHasInsufficientMemory  92m                   kubelet, ip-10-0-164-104.us-east-2.compute.internal  Node ip-10-0-164-104.us-east-2.compute.internal status is now: NodeHasInsufficientMemory
  Warning  SystemOOM                  92m                   kubelet, ip-10-0-164-104.us-east-2.compute.internal  System OOM encountered, victim process: python, pid: 95543
  Warning  SystemOOM                  92m                   kubelet, ip-10-0-164-104.us-east-2.compute.internal  System OOM encountered, victim process: kube-rbac-proxy, pid: 95146
  Warning  EvictionThresholdMet       38m (x2 over 92m)     kubelet, ip-10-0-164-104.us-east-2.compute.internal  Attempting to reclaim memory
  Normal   NodeHasSufficientMemory    33m (x11 over 3h35m)  kubelet, ip-10-0-164-104.us-east-2.compute.internal  Node ip-10-0-164-104.us-east-2.compute.internal status is now: NodeHasSufficientMemory
  Warning  SystemOOM                  2m22s (x49 over 87m)  kubelet, ip-10-0-164-104.us-east-2.compute.internal  (combined from similar events): System OOM encountered, victim process: python, pid: 198213
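
As a reading aid for the node output above, the memory figures can be cross-checked with a little arithmetic. This is only a sketch using the numbers printed by `oc describe node`. The variable names are made up for illustration, and reading the 1124Mi gap as a 1Gi system reservation plus a 100Mi hard eviction threshold is an assumption, not something stated in this bug.

```python
# Values copied from the "Capacity", "Allocatable" and "Allocated resources"
# sections of the `oc describe node` output above.
KI_PER_MI = 1024

capacity_ki = 7934700      # Capacity:    memory: 7934700Ki
allocatable_ki = 6783724   # Allocatable: memory: 6783724Ki
requests_mi = 1840         # Allocated resources: memory requests 1840Mi

# Memory withheld from pods (reservations plus eviction threshold).
reserved_ki = capacity_ki - allocatable_ki
reserved_mi = reserved_ki / KI_PER_MI

# Requests as a share of allocatable, truncated to a whole percent.
requests_pct = (requests_mi * KI_PER_MI * 100) // allocatable_ki

print(f"withheld from pods: {reserved_ki}Ki ({reserved_mi:.0f}Mi)")
print(f"memory requests: {requests_mi}Mi ({requests_pct}% of allocatable)")
```

The badmem pod itself requests nothing (QoS class BestEffort), so it does not appear in the 27% figure even though it is the process driving the SystemOOM events.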

Comment 70 Ryan Phillips 2021-01-08 15:38:48 UTC
*** Bug 1873816 has been marked as a duplicate of this bug. ***

Comment 71 Ryan Phillips 2021-01-08 15:43:17 UTC
*** Bug 1910801 has been marked as a duplicate of this bug. ***

Comment 76 Harshal Patil 2021-01-20 10:26:18 UTC
*** Bug 1915023 has been marked as a duplicate of this bug. ***

Comment 77 Neil Girard 2021-01-20 18:23:24 UTC
Hello,

Are there any plans to backport this fix to older 4.x releases?

Thanks,
Neil Girard

Comment 78 guy chen 2021-02-01 11:22:16 UTC
*** Bug 1904051 has been marked as a duplicate of this bug. ***

Comment 83 errata-xmlrpc 2021-02-24 15:13:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0
security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

Comment 84 Harshal Patil 2021-03-02 07:09:43 UTC
*** Bug 1931467 has been marked as a duplicate of this bug. ***

Comment 85 Lucas López Montero 2021-03-04 11:33:09 UTC
The following KCS article about this bug has been written: https://access.redhat.com/solutions/5853471. It links to a second article: https://access.redhat.com/solutions/5843241.

