During an upgrade from 4.7.19 to 4.8.0-0.nightly-2021-07-04-112043, a RHEL worker node stopped posting kubelet status. On further checking, the node is not accessible over SSH. I manually rebooted the node from the AWS console a couple of times, but it is still not accessible. The status check in the AWS console reports "Instance reachability check failed".

Profile: private-templates/functionality-testing/aos-4_7/ipi-on-aws/versioned-installer-customer_vpc-http_proxy-fips-ovn-etcd_encryption-sts-ci

version   4.7.19   True   True   3h1m   Unable to apply 4.8.0-0.nightly-2021-07-04-112043: the cluster operator monitoring has not yet successfully rolled out

#oc get node:
NAME                                        STATUS                        ROLES    AGE     VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-55-38.us-east-2.compute.internal    NotReady,SchedulingDisabled   worker   3h3m    v1.20.0+87cc9a4   10.0.55.38    <none>        Red Hat Enterprise Linux Server 7.9 (Maipo)                    3.10.0-1160.31.1.el7.x86_64    cri-o://1.20.3-4.rhaos4.7.gitbaade70.el7
ip-10-0-55-40.us-east-2.compute.internal    Ready                         worker   4h13m   v1.20.0+87cc9a4   10.0.55.40    <none>        Red Hat Enterprise Linux CoreOS 47.83.202106252242-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-6.rhaos4.7.git0d0f863.el8
ip-10-0-60-221.us-east-2.compute.internal   Ready                         master   4h22m   v1.21.1+f36aa36   10.0.60.221   <none>        Red Hat Enterprise Linux CoreOS 48.84.202107040900-0 (Ootpa)   4.18.0-305.7.1.el8_4.x86_64    cri-o://1.21.1-12.rhaos4.8.git30ca719.el8
ip-10-0-61-63.us-east-2.compute.internal    Ready                         worker   3h3m    v1.20.0+87cc9a4   10.0.61.63    <none>        Red Hat Enterprise Linux Server 7.9 (Maipo)                    3.10.0-1160.31.1.el7.x86_64    cri-o://1.20.3-4.rhaos4.7.gitbaade70.el7
ip-10-0-66-219.us-east-2.compute.internal   Ready                         worker   4h13m   v1.20.0+87cc9a4   10.0.66.219   <none>        Red Hat Enterprise Linux CoreOS 47.83.202106252242-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-6.rhaos4.7.git0d0f863.el8
ip-10-0-67-121.us-east-2.compute.internal   Ready                         worker   4h13m   v1.20.0+87cc9a4   10.0.67.121   <none>        Red Hat Enterprise Linux CoreOS 47.83.202106252242-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-6.rhaos4.7.git0d0f863.el8
ip-10-0-69-180.us-east-2.compute.internal   Ready                         master   4h22m   v1.21.1+f36aa36   10.0.69.180   <none>        Red Hat Enterprise Linux CoreOS 48.84.202107040900-0 (Ootpa)   4.18.0-305.7.1.el8_4.x86_64    cri-o://1.21.1-12.rhaos4.8.git30ca719.el8
ip-10-0-78-3.us-east-2.compute.internal     Ready                         master   4h23m   v1.21.1+f36aa36   10.0.78.3     <none>        Red Hat Enterprise Linux CoreOS 48.84.202107040900-0 (Ootpa)   4.18.0-305.7.1.el8_4.x86_64    cri-o://1.21.1-12.rhaos4.8.git30ca719.el8

#oc get co:
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.8.0-0.nightly-2021-07-04-112043   True        False         False      60m
baremetal                                  4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h20m
cloud-credential                           4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h19m
cluster-autoscaler                         4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h19m
config-operator                            4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h20m
console                                    4.8.0-0.nightly-2021-07-04-112043   True        False         False      63m
csi-snapshot-controller                    4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h19m
dns                                        4.8.0-0.nightly-2021-07-04-112043   True        True          False      92m
etcd                                       4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h18m
image-registry                             4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h11m
ingress                                    4.8.0-0.nightly-2021-07-04-112043   True        False         False      111m
insights                                   4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h13m
kube-apiserver                             4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h16m
kube-controller-manager                    4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h17m
kube-scheduler                             4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h17m
kube-storage-version-migrator              4.8.0-0.nightly-2021-07-04-112043   True        False         False      63m
machine-api                                4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h16m
machine-approver                           4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h19m
machine-config                             4.7.19                              False       True          True       73m
marketplace                                4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h18m
monitoring                                 4.8.0-0.nightly-2021-07-04-112043   False       True          True       81m
network                                    4.8.0-0.nightly-2021-07-04-112043   True        True          True       4h19m
node-tuning                                4.8.0-0.nightly-2021-07-04-112043   True        False         False      111m
openshift-apiserver                        4.8.0-0.nightly-2021-07-04-112043   True        False         False      60m
openshift-controller-manager               4.8.0-0.nightly-2021-07-04-112043   True        False         False      111m
openshift-samples                          4.8.0-0.nightly-2021-07-04-112043   True        False         False      112m
operator-lifecycle-manager                 4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h19m
operator-lifecycle-manager-catalog         4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h19m
operator-lifecycle-manager-packageserver   4.8.0-0.nightly-2021-07-04-112043   True        False         False      74m
service-ca                                 4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h20m
storage                                    4.8.0-0.nightly-2021-07-04-112043   True        True          False      63m

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Abnormal node details~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Name:               ip-10-0-55-38.us-east-2.compute.internal
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m4.xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-2
                    failure-domain.beta.kubernetes.io/zone=us-east-2a
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-55-38.us-east-2.compute.internal
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/instance-type=m4.xlarge
                    node.openshift.io/os_id=rhel
                    topology.ebs.csi.aws.com/zone=us-east-2a
                    topology.kubernetes.io/region=us-east-2
                    topology.kubernetes.io/zone=us-east-2a
Annotations:        csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-0bd63277720bf17c8"}
                    k8s.ovn.org/host-addresses: ["10.0.55.38"]
                    k8s.ovn.org/l3-gateway-config: {"default":{"mode":"shared","interface-id":"br-ex_ip-10-0-55-38.us-east-2.compute.internal","mac-address":"02:62:c6:14:36:00","ip-addresse...
                    k8s.ovn.org/node-chassis-id: 1e1d2533-1425-41d2-beba-09fc44daa288
                    k8s.ovn.org/node-local-nat-ip: {"default":["169.254.1.242"]}
                    k8s.ovn.org/node-mgmt-port-mac-address: f6:0b:48:15:27:f3
                    k8s.ovn.org/node-primary-ifaddr: {"ipv4":"10.0.55.38/20"}
                    k8s.ovn.org/node-subnets: {"default":"10.130.2.0/23"}
                    machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-dd6dc5722454034750a66be6346d26bf
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-09b84bce73d691779ce3c6688af6ab41
                    machineconfiguration.openshift.io/ssh: accessed
                    machineconfiguration.openshift.io/state: Working
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 06 Jul 2021 13:09:08 +0000
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule
                    node.kubernetes.io/unschedulable:NoSchedule
Unschedulable:      true
Lease:
  HolderIdentity:  ip-10-0-55-38.us-east-2.compute.internal
  AcquireTime:     <unset>
  RenewTime:       Tue, 06 Jul 2021 14:50:18 +0000
Conditions:
  Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----             ------    -----------------                 ------------------                ------              -------
  MemoryPressure   Unknown   Tue, 06 Jul 2021 14:45:48 +0000   Tue, 06 Jul 2021 14:51:03 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure     Unknown   Tue, 06 Jul 2021 14:45:48 +0000   Tue, 06 Jul 2021 14:51:03 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure      Unknown   Tue, 06 Jul 2021 14:45:48 +0000   Tue, 06 Jul 2021 14:51:03 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready            Unknown   Tue, 06 Jul 2021 14:45:48 +0000   Tue, 06 Jul 2021 14:51:03 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
Addresses:
  InternalIP:   10.0.55.38
  Hostname:     ip-10-0-55-38.us-east-2.compute.internal
  InternalDNS:  ip-10-0-55-38.us-east-2.compute.internal
Capacity:
  attachable-volumes-aws-ebs:  39
  cpu:                         4
  ephemeral-storage:           31444972Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      16264952Ki
  pods:                        250
Allocatable:
  attachable-volumes-aws-ebs:  39
  cpu:                         3500m
  ephemeral-storage:           27905944324
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      15113976Ki
  pods:                        250
System Info:
  Machine ID:                 a863266299dd48eda1b0e80a7195ae55
  System UUID:                EC205C3D-AC93-C19F-188B-5E9C81A18EF4
  Boot ID:                    ed1d919e-44a8-4cac-a361-96194d835d31
  Kernel Version:             3.10.0-1160.31.1.el7.x86_64
  OS Image:                   Red Hat Enterprise Linux Server 7.9 (Maipo)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  cri-o://1.20.3-4.rhaos4.7.gitbaade70.el7
  Kubelet Version:            v1.20.0+87cc9a4
  Kube-Proxy Version:         v1.20.0+87cc9a4
ProviderID:                   aws:///us-east-2a/i-0bd63277720bf17c8
Non-terminated Pods:          (13 in total)
  Namespace                                 Name                                   CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                                 ----                                   ------------  ----------  ---------------  -------------  ---
  openshift-cluster-csi-drivers             aws-ebs-csi-driver-node-z9ftj          30m (0%)      0 (0%)      150Mi (1%)       0 (0%)         111m
  openshift-cluster-node-tuning-operator    tuned-wvfzq                            10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         110m
  openshift-dns                             dns-default-pdwck                      60m (1%)      0 (0%)      110Mi (0%)       0 (0%)         93m
  openshift-dns                             node-resolver-zltcv                    5m (0%)       0 (0%)      21Mi (0%)        0 (0%)         94m
  openshift-image-registry                  node-ca-p4fzn                          10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         111m
  openshift-ingress-canary                  ingress-canary-p4qfx                   10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         112m
  openshift-machine-config-operator         machine-config-daemon-pcrr5            40m (1%)      0 (0%)      100Mi (0%)       0 (0%)         88m
  openshift-monitoring                      node-exporter-fbjf5                    9m (0%)       0 (0%)      47Mi (0%)        0 (0%)         113m
  openshift-multus                          multus-additional-cni-plugins-2fc85    10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         109m
  openshift-multus                          multus-q6s6q                           10m (0%)      0 (0%)      65Mi (0%)        0 (0%)         109m
  openshift-multus                          network-metrics-daemon-2bpbc           20m (0%)      0 (0%)      120Mi (0%)       0 (0%)         108m
  openshift-network-diagnostics             network-check-target-9shzw             10m (0%)      0 (0%)      15Mi (0%)        0 (0%)         108m
  openshift-ovn-kubernetes                  ovnkube-node-lq8cb                     40m (1%)      0 (0%)      640Mi (4%)       0 (0%)         109m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests     Limits
  --------                    --------     ------
  cpu                         264m (7%)    0 (0%)
  memory                      1358Mi (9%)  0 (0%)
  ephemeral-storage           0 (0%)       0 (0%)
  hugepages-1Gi               0 (0%)       0 (0%)
  hugepages-2Mi               0 (0%)       0 (0%)
  attachable-volumes-aws-ebs  0            0
Events:
  Type    Reason                   Age                  From     Message
  ----    ------                   ----                 ----     -------
  Normal  Starting                 3h5m                 kubelet  Starting kubelet.
  Normal  NodeHasSufficientMemory  3h5m (x2 over 3h5m)  kubelet  Node ip-10-0-55-38.us-east-2.compute.internal status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    3h5m (x2 over 3h5m)  kubelet  Node ip-10-0-55-38.us-east-2.compute.internal status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     3h5m (x2 over 3h5m)  kubelet  Node ip-10-0-55-38.us-east-2.compute.internal status is now: NodeHasSufficientPID
  Normal  NodeAllocatableEnforced  3h5m                 kubelet  Updated Node Allocatable limit across pods
  Normal  NodeReady                3h3m                 kubelet  Node ip-10-0-55-38.us-east-2.compute.internal status is now: NodeReady
  Normal  NodeNotSchedulable       84m                  kubelet  Node ip-10-0-55-38.us-east-2.compute.internal status is now: NodeNotSchedulable