Verified on 4.6.0-0.nightly-2021-01-05-062422. Cordoned two of the three worker nodes so the badmem pod would land on the remaining one, then created the badmem rc (a sketch of the rc is included after the pod describe output below). The pod was evicted when the node came under memory pressure due to a system OOM; the node briefly went NotReady while reclaiming memory and then recovered on its own.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-01-05-062422   True        False         18m     Cluster version is 4.6.0-0.nightly-2021-01-05-062422

$ oc adm cordon sunilc0501462-d9q8r-worker-northcentralus-bdn7j
node/sunilc0501462-d9q8r-worker-northcentralus-bdn7j cordoned

$ oc adm cordon sunilc0501462-d9q8r-worker-northcentralus-tfwvv
node/sunilc0501462-d9q8r-worker-northcentralus-tfwvv cordoned

$ oc get nodes -o wide
NAME                                              STATUS                     ROLES    AGE   VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
sunilc0501462-d9q8r-master-0                      Ready                      master   38m   v1.19.0+9c69bdc   10.0.0.5      <none>        Red Hat Enterprise Linux CoreOS 46.82.202101042340-0 (Ootpa)   4.18.0-193.37.1.el8_2.x86_64   cri-o://1.19.1-2.rhaos4.6.git2af9ecf.el8
sunilc0501462-d9q8r-master-1                      Ready                      master   39m   v1.19.0+9c69bdc   10.0.0.8      <none>        Red Hat Enterprise Linux CoreOS 46.82.202101042340-0 (Ootpa)   4.18.0-193.37.1.el8_2.x86_64   cri-o://1.19.1-2.rhaos4.6.git2af9ecf.el8
sunilc0501462-d9q8r-master-2                      Ready                      master   39m   v1.19.0+9c69bdc   10.0.0.6      <none>        Red Hat Enterprise Linux CoreOS 46.82.202101042340-0 (Ootpa)   4.18.0-193.37.1.el8_2.x86_64   cri-o://1.19.1-2.rhaos4.6.git2af9ecf.el8
sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Ready                      worker   28m   v1.19.0+9c69bdc   10.0.32.4     <none>        Red Hat Enterprise Linux CoreOS 46.82.202101042340-0 (Ootpa)   4.18.0-193.37.1.el8_2.x86_64   cri-o://1.19.1-2.rhaos4.6.git2af9ecf.el8
sunilc0501462-d9q8r-worker-northcentralus-bdn7j   Ready,SchedulingDisabled   worker   30m   v1.19.0+9c69bdc   10.0.32.5    <none>        Red Hat Enterprise Linux CoreOS 46.82.202101042340-0 (Ootpa)   4.18.0-193.37.1.el8_2.x86_64   cri-o://1.19.1-2.rhaos4.6.git2af9ecf.el8
sunilc0501462-d9q8r-worker-northcentralus-tfwvv   Ready,SchedulingDisabled   worker   30m   v1.19.0+9c69bdc   10.0.32.6    <none>        Red Hat Enterprise Linux CoreOS 46.82.202101042340-0 (Ootpa)   4.18.0-193.37.1.el8_2.x86_64   cri-o://1.19.1-2.rhaos4.6.git2af9ecf.el8

$ oc create -f rc.yaml
replicationcontroller/badmem created

$ oc get rc
NAME     DESIRED   CURRENT   READY   AGE
badmem   1         1         0       4s

$ oc get pods
NAME           READY   STATUS              RESTARTS   AGE
badmem-kzcds   0/1     ContainerCreating   0          8s

$ oc get pods -o wide
NAME           READY   STATUS              RESTARTS   AGE   IP       NODE                                              NOMINATED NODE   READINESS GATES
badmem-kzcds   0/1     ContainerCreating   0          11s   <none>   sunilc0501462-d9q8r-worker-northcentralus-6tlp8   <none>           <none>

$ oc get nodes
NAME                                              STATUS                     ROLES    AGE   VERSION
sunilc0501462-d9q8r-master-0                      Ready                      master   42m   v1.19.0+9c69bdc
sunilc0501462-d9q8r-master-1                      Ready                      master   42m   v1.19.0+9c69bdc
sunilc0501462-d9q8r-master-2                      Ready                      master   42m   v1.19.0+9c69bdc
sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Ready                      worker   32m   v1.19.0+9c69bdc
sunilc0501462-d9q8r-worker-northcentralus-bdn7j   Ready,SchedulingDisabled   worker   34m   v1.19.0+9c69bdc
sunilc0501462-d9q8r-worker-northcentralus-tfwvv   Ready,SchedulingDisabled   worker   34m   v1.19.0+9c69bdc

$ oc get pods -o wide
NAME           READY   STATUS    RESTARTS   AGE     IP       NODE                                              NOMINATED NODE   READINESS GATES
badmem-kzcds   0/1     Evicted   0          3m50s   <none>   sunilc0501462-d9q8r-worker-northcentralus-6tlp8   <none>           <none>
badmem-tn2df   0/1     Pending   0          54s     <none>   sunilc0501462-d9q8r-worker-northcentralus-6tlp8   <none>           <none>

$ oc describe pod badmem-kzcds
Name:           badmem-kzcds
Namespace:      app
Priority:       0
Node:           sunilc0501462-d9q8r-worker-northcentralus-6tlp8/
Start Time:     Tue, 05 Jan 2021 18:57:14 +0530
Labels:         app=badmem
Annotations:    k8s.v1.cni.cncf.io/network-status:
                  [{
                      "name": "",
                      "interface": "eth0",
                      "ips": [
                          "10.129.2.20"
                      ],
                      "default": true,
                      "dns": {}
                  }]
                k8s.v1.cni.cncf.io/networks-status:
                  [{
                      "name": "",
                      "interface": "eth0",
                      "ips": [
                          "10.129.2.20"
                      ],
                      "default": true,
                      "dns": {}
                  }]
                openshift.io/scc: restricted
Status:         Failed
Reason:         Evicted
Message:        The node was low on resource: memory. Container badmem was using 4824476Ki, which exceeds its request of 0.
IP:
IPs:            <none>
Controlled By:  ReplicationController/badmem
Containers:
  badmem:
    Image:      registry.redhat.io/rhel7:latest
    Port:       <none>
    Host Port:  <none>
    Args:
      python
      -c
      x = []
      while True:
        x.append("x" * 1048576)
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-bsglx (ro)
Volumes:
  default-token-bsglx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-bsglx
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason          Age        From                                                       Message
  ----     ------          ----       ----                                                       -------
  Normal   Scheduled       <unknown>                                                             Successfully assigned app/badmem-kzcds to sunilc0501462-d9q8r-worker-northcentralus-6tlp8
  Normal   AddedInterface  4m2s       multus                                                     Add eth0 [10.129.2.20/23]
  Normal   Pulling         4m2s       kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Pulling image "registry.redhat.io/rhel7:latest"
  Normal   Pulled          3m49s      kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Successfully pulled image "registry.redhat.io/rhel7:latest" in 12.167769654s
  Normal   Created         3m49s      kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Created container badmem
  Normal   Started         3m49s      kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Started container badmem
  Warning  Evicted         69s        kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   The node was low on resource: memory. Container badmem was using 4824476Ki, which exceeds its request of 0.
  Normal   Killing         69s        kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Stopping container badmem
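For reference, based on the container spec visible in the describe output above, the rc.yaml used was presumably along these lines. This is a sketch: only the name, namespace, labels, image, and Python loop are taken from the output; the selector layout and exact args quoting are assumptions.

apiVersion: v1
kind: ReplicationController
metadata:
  name: badmem
  namespace: app
spec:
  replicas: 1
  selector:
    app: badmem
  template:
    metadata:
      labels:
        app: badmem
    spec:
      containers:
      - name: badmem
        image: registry.redhat.io/rhel7:latest
        # No resources stanza, so the pod is BestEffort and is the
        # first candidate when the node comes under memory pressure.
        args:
        - python
        - -c
        - |
          x = []
          while True:
              x.append("x" * 1048576)

The Python loop appends 1MiB strings forever, so the container's memory use grows until the node hits its eviction threshold.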
$ oc describe node sunilc0501462-d9q8r-worker-northcentralus-6tlp8
Name:               sunilc0501462-d9q8r-worker-northcentralus-6tlp8
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=Standard_D2s_v3
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=northcentralus
                    failure-domain.beta.kubernetes.io/zone=0
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=sunilc0501462-d9q8r-worker-northcentralus-6tlp8
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/instance-type=Standard_D2s_v3
                    node.openshift.io/os_id=rhcos
                    topology.kubernetes.io/region=northcentralus
                    topology.kubernetes.io/zone=0
Annotations:        machine.openshift.io/machine: openshift-machine-api/sunilc0501462-d9q8r-worker-northcentralus-6tlp8
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-a4ac29112a52ba7576de658d93eee318
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-a4ac29112a52ba7576de658d93eee318
                    machineconfiguration.openshift.io/reason:
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 05 Jan 2021 18:28:18 +0530
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  sunilc0501462-d9q8r-worker-northcentralus-6tlp8
  AcquireTime:     <unset>
  RenewTime:       Tue, 05 Jan 2021 19:00:02 +0530
Conditions:
  Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason             Message
  ----             ------    -----------------                 ------------------                ------             -------
  MemoryPressure   Unknown   Tue, 05 Jan 2021 19:00:10 +0530   Tue, 05 Jan 2021 19:00:58 +0530   NodeStatusUnknown  Kubelet stopped posting node status.
  DiskPressure     Unknown   Tue, 05 Jan 2021 19:00:10 +0530   Tue, 05 Jan 2021 19:00:58 +0530   NodeStatusUnknown  Kubelet stopped posting node status.
  PIDPressure      Unknown   Tue, 05 Jan 2021 19:00:10 +0530   Tue, 05 Jan 2021 19:00:58 +0530   NodeStatusUnknown  Kubelet stopped posting node status.
  Ready            Unknown   Tue, 05 Jan 2021 19:00:10 +0530   Tue, 05 Jan 2021 19:00:58 +0530   NodeStatusUnknown  Kubelet stopped posting node status.
Addresses:
  Hostname:    sunilc0501462-d9q8r-worker-northcentralus-6tlp8
  InternalIP:  10.0.32.4
Capacity:
  attachable-volumes-azure-disk:  4
  cpu:                            2
  ephemeral-storage:              133665772Ki
  hugepages-1Gi:                  0
  hugepages-2Mi:                  0
  memory:                         8162108Ki
  pods:                           250
Allocatable:
  attachable-volumes-azure-disk:  4
  cpu:                            1500m
  ephemeral-storage:              122112633448
  hugepages-1Gi:                  0
  hugepages-2Mi:                  0
  memory:                         7011132Ki
  pods:                           250
System Info:
  Machine ID:                 b59f0d0db5384d05a678d0ae880ddee7
  System UUID:                22d41a7c-c740-cb4a-bc5b-69f063a99f3d
  Boot ID:                    3d352a0e-b38b-43f5-a0d7-feaabb935cc4
  Kernel Version:             4.18.0-193.37.1.el8_2.x86_64
  OS Image:                   Red Hat Enterprise Linux CoreOS 46.82.202101042340-0 (Ootpa)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  cri-o://1.19.1-2.rhaos4.6.git2af9ecf.el8
  Kubelet Version:            v1.19.0+9c69bdc
  Kube-Proxy Version:         v1.19.0+9c69bdc
ProviderID:                   azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/sunilc0501462-d9q8r-rg/providers/Microsoft.Compute/virtualMachines/sunilc0501462-d9q8r-worker-northcentralus-6tlp8
Non-terminated Pods:          (26 in total)
  Namespace                                Name                                      CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                                ----                                      ------------  ----------  ---------------  -------------  ---
  app                                      badmem-tn2df                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         87s
  openshift-cluster-node-tuning-operator   tuned-c9fgb                               10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         33m
  openshift-dns                            dns-default-4m2p8                         65m (4%)      0 (0%)      110Mi (1%)       512Mi (7%)     33m
  openshift-image-registry                 image-registry-6874c7f96d-lfjd6           100m (6%)     0 (0%)      256Mi (3%)       0 (0%)         13m
  openshift-image-registry                 node-ca-qtc8f                             10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         33m
  openshift-ingress                        router-default-7fb5458d59-nd7jx           100m (6%)     0 (0%)      256Mi (3%)       0 (0%)         13m
  openshift-kube-storage-version-migrator  migrator-6696f8b898-ct5l7                 100m (6%)     0 (0%)      200Mi (2%)       0 (0%)         13m
  openshift-machine-config-operator        machine-config-daemon-bw7h7               40m (2%)      0 (0%)      100Mi (1%)       0 (0%)         33m
  openshift-marketplace                    certified-operators-c7447                 10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         2m33s
  openshift-marketplace                    certified-operators-j4chw                 10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         13m
  openshift-marketplace                    qe-app-registry-pwbgd                     10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         3m46s
  openshift-marketplace                    redhat-marketplace-2h6ls                  10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         3m4s
  openshift-marketplace                    redhat-marketplace-kg6qz                  10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         13m
  openshift-marketplace                    redhat-operators-cvpv2                    10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         13m
  openshift-monitoring                     alertmanager-main-1                       8m (0%)       0 (0%)      270Mi (3%)       0 (0%)         12m
  openshift-monitoring                     kube-state-metrics-666bbccbb5-xxsd9       4m (0%)       0 (0%)      120Mi (1%)       0 (0%)         13m
  openshift-monitoring                     node-exporter-rj2wj                       9m (0%)       0 (0%)      210Mi (3%)       0 (0%)         33m
  openshift-monitoring                     openshift-state-metrics-57c88f4499-g6m2h  3m (0%)       0 (0%)      190Mi (2%)       0 (0%)         13m
  openshift-monitoring                     prometheus-adapter-8cbf5bd6f-6skbb        1m (0%)       0 (0%)      25Mi (0%)        0 (0%)         13m
  openshift-monitoring                     prometheus-k8s-1                          75m (5%)      0 (0%)      1194Mi (17%)     0 (0%)         13m
  openshift-monitoring                     telemeter-client-6654c98cdf-fjpvw         3m (0%)       0 (0%)      20Mi (0%)        0 (0%)         13m
  openshift-monitoring                     thanos-querier-7dc858d774-rcnsb           9m (0%)       0 (0%)      92Mi (1%)        0 (0%)         13m
  openshift-multus                         multus-4zbns                              10m (0%)      0 (0%)      150Mi (2%)       0 (0%)         33m
  openshift-multus                         network-metrics-daemon-zzn62              20m (1%)      0 (0%)      120Mi (1%)       0 (0%)         33m
  openshift-sdn                            ovs-wjjl6                                 100m (6%)     0 (0%)      400Mi (5%)       0 (0%)         33m
  openshift-sdn                            sdn-7x45m                                 110m (7%)     0 (0%)      220Mi (3%)       0 (0%)         33m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                       Requests      Limits
  --------                       --------      ------
  cpu                            837m (55%)    0 (0%)
  memory                         4293Mi (62%)  512Mi (7%)
  ephemeral-storage              0 (0%)        0 (0%)
  hugepages-1Gi                  0 (0%)        0 (0%)
  hugepages-2Mi                  0 (0%)        0 (0%)
  attachable-volumes-azure-disk  0             0
Events:
  Type     Reason                     Age                From                                                       Message
  ----     ------                     ----               ----                                                       -------
  Normal   Starting                   33m                kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Starting kubelet.
  Normal   NodeHasSufficientMemory    33m (x2 over 33m)  kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Node sunilc0501462-d9q8r-worker-northcentralus-6tlp8 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure      33m (x2 over 33m)  kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Node sunilc0501462-d9q8r-worker-northcentralus-6tlp8 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID       33m (x2 over 33m)  kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Node sunilc0501462-d9q8r-worker-northcentralus-6tlp8 status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced    33m                kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Updated Node Allocatable limit across pods
  Normal   NodeReady                  32m                kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Node sunilc0501462-d9q8r-worker-northcentralus-6tlp8 status is now: NodeReady
  Normal   NodeNotSchedulable         19m                kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Node sunilc0501462-d9q8r-worker-northcentralus-6tlp8 status is now: NodeNotSchedulable
  Normal   NodeAllocatableEnforced    13m                kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory    13m (x2 over 13m)  kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Node sunilc0501462-d9q8r-worker-northcentralus-6tlp8 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure      13m (x2 over 13m)  kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Node sunilc0501462-d9q8r-worker-northcentralus-6tlp8 status is now: NodeHasNoDiskPressure
  Normal   Starting                   13m                kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Starting kubelet.
  Warning  Rebooted                   13m                kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Node sunilc0501462-d9q8r-worker-northcentralus-6tlp8 has been rebooted, boot id: 3d352a0e-b38b-43f5-a0d7-feaabb935cc4
  Normal   NodeNotReady               13m                kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Node sunilc0501462-d9q8r-worker-northcentralus-6tlp8 status is now: NodeNotReady
  Normal   NodeNotSchedulable         13m                kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Node sunilc0501462-d9q8r-worker-northcentralus-6tlp8 status is now: NodeNotSchedulable
  Normal   NodeHasSufficientPID       13m (x2 over 13m)  kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Node sunilc0501462-d9q8r-worker-northcentralus-6tlp8 status is now: NodeHasSufficientPID
  Normal   NodeReady                  13m                kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Node sunilc0501462-d9q8r-worker-northcentralus-6tlp8 status is now: NodeReady
  Normal   NodeSchedulable            13m                kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Node sunilc0501462-d9q8r-worker-northcentralus-6tlp8 status is now: NodeSchedulable
  Warning  EvictionThresholdMet       87s                kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Attempting to reclaim memory
  Warning  ContainerGCFailed          87s                kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   NodeHasInsufficientMemory  86s                kubelet, sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Node sunilc0501462-d9q8r-worker-northcentralus-6tlp8 status is now: NodeHasInsufficientMemory
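The EvictionThresholdMet event above is the kubelet crossing its memory eviction threshold and starting to reclaim; the gap between capacity (8162108Ki) and allocatable (7011132Ki) memory, roughly 1124Mi, is what the node holds back for system reservations and, presumably, the eviction threshold. If anyone needs to adjust these thresholds to reproduce this more or less aggressively, that can presumably be done with a KubeletConfig CR along these lines. This is a sketch, not what this cluster used: the name, the 500Mi/1Gi values, and the grace period are illustrative assumptions; only evictionHard/evictionSoft/evictionSoftGracePeriod being valid kubelet config fields and the worker pool's default label are standard.

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: worker-eviction-tuning        # hypothetical name
spec:
  machineConfigPoolSelector:
    matchLabels:
      # default label carried by the worker MachineConfigPool
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    evictionHard:
      memory.available: "500Mi"       # illustrative value
    evictionSoft:
      memory.available: "1Gi"         # illustrative value
    evictionSoftGracePeriod:
      memory.available: "1m30s"       # illustrative value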
$ oc get nodes
NAME                                              STATUS                     ROLES    AGE   VERSION
sunilc0501462-d9q8r-master-0                      Ready                      master   44m   v1.19.0+9c69bdc
sunilc0501462-d9q8r-master-1                      Ready                      master   44m   v1.19.0+9c69bdc
sunilc0501462-d9q8r-master-2                      Ready                      master   44m   v1.19.0+9c69bdc
sunilc0501462-d9q8r-worker-northcentralus-6tlp8   NotReady                   worker   34m   v1.19.0+9c69bdc
sunilc0501462-d9q8r-worker-northcentralus-bdn7j   Ready,SchedulingDisabled   worker   36m   v1.19.0+9c69bdc
sunilc0501462-d9q8r-worker-northcentralus-tfwvv   Ready,SchedulingDisabled   worker   36m   v1.19.0+9c69bdc

$ oc get nodes
NAME                                              STATUS                     ROLES    AGE   VERSION
sunilc0501462-d9q8r-master-0                      Ready                      master   50m   v1.19.0+9c69bdc
sunilc0501462-d9q8r-master-1                      Ready                      master   50m   v1.19.0+9c69bdc
sunilc0501462-d9q8r-master-2                      Ready                      master   51m   v1.19.0+9c69bdc
sunilc0501462-d9q8r-worker-northcentralus-6tlp8   Ready                      worker   40m   v1.19.0+9c69bdc
sunilc0501462-d9q8r-worker-northcentralus-bdn7j   Ready,SchedulingDisabled   worker   42m   v1.19.0+9c69bdc
sunilc0501462-d9q8r-worker-northcentralus-tfwvv   Ready,SchedulingDisabled   worker   42m   v1.19.0+9c69bdc
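One note on the eviction message ("exceeds its request of 0"): badmem carries no resources stanza, so it is BestEffort and deliberately the first candidate under node memory pressure, which is exactly what this test needs. A workload that should be contained rather than used to stress the node would set memory requests and limits; with a limit, the kernel OOM-kills the container at its cgroup boundary instead of driving the whole node toward NotReady. A sketch with illustrative values (the pod name and the 512Mi/1Gi figures are assumptions):

apiVersion: v1
kind: Pod
metadata:
  name: badmem-limited                # hypothetical name
  namespace: app
spec:
  containers:
  - name: badmem
    image: registry.redhat.io/rhel7:latest
    args:
    - python
    - -c
    - |
      x = []
      while True:
          x.append("x" * 1048576)
    resources:
      requests:
        memory: "512Mi"               # illustrative value
      limits:
        memory: "1Gi"                 # container is OOM-killed at this cgroup limit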
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.6.12 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:0037
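For anyone landing here from the advisory, the running cluster version can be checked against the fixed release (4.6.12) with, for example:

$ oc get clusterversion -o jsonpath='{.status.desired.version}{"\n"}'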