Bug 1979891 - RHEL Node in NotReady state after upgrade from 4.7.19 to 4.8 nightly.
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Yu Qi Zhang
QA Contact: Rio Liu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-07-07 11:03 UTC by Sunil Choudhary
Modified: 2021-11-08 17:34 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-08 17:34:40 UTC
Target Upstream Version:
Embargoed:



Description Sunil Choudhary 2021-07-07 11:03:58 UTC
During the upgrade from 4.7.19 to 4.8.0-0.nightly-2021-07-04-112043, a RHEL node stopped posting kubelet status.

On further checking, the node is not accessible over SSH. I manually rebooted the node from the AWS console a couple of times, but it is still not accessible.
The status checks in the AWS console show "Instance reachability check failed".

Profile: private-templates/functionality-testing/aos-4_7/ipi-on-aws/versioned-installer-customer_vpc-http_proxy-fips-ovn-etcd_encryption-sts-ci

version   4.7.19    True        True          3h1m    Unable to apply 4.8.0-0.nightly-2021-07-04-112043: the cluster operator monitoring has not yet successfully rolled out
 
#oc get node:
 NAME                                        STATUS                        ROLES    AGE     VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
 ip-10-0-55-38.us-east-2.compute.internal    NotReady,SchedulingDisabled   worker   3h3m    v1.20.0+87cc9a4   10.0.55.38    <none>        Red Hat Enterprise Linux Server 7.9 (Maipo)                    3.10.0-1160.31.1.el7.x86_64    cri-o://1.20.3-4.rhaos4.7.gitbaade70.el7
 ip-10-0-55-40.us-east-2.compute.internal    Ready                         worker   4h13m   v1.20.0+87cc9a4   10.0.55.40    <none>        Red Hat Enterprise Linux CoreOS 47.83.202106252242-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-6.rhaos4.7.git0d0f863.el8
 ip-10-0-60-221.us-east-2.compute.internal   Ready                         master   4h22m   v1.21.1+f36aa36   10.0.60.221   <none>        Red Hat Enterprise Linux CoreOS 48.84.202107040900-0 (Ootpa)   4.18.0-305.7.1.el8_4.x86_64    cri-o://1.21.1-12.rhaos4.8.git30ca719.el8
 ip-10-0-61-63.us-east-2.compute.internal    Ready                         worker   3h3m    v1.20.0+87cc9a4   10.0.61.63    <none>        Red Hat Enterprise Linux Server 7.9 (Maipo)                    3.10.0-1160.31.1.el7.x86_64    cri-o://1.20.3-4.rhaos4.7.gitbaade70.el7
 ip-10-0-66-219.us-east-2.compute.internal   Ready                         worker   4h13m   v1.20.0+87cc9a4   10.0.66.219   <none>        Red Hat Enterprise Linux CoreOS 47.83.202106252242-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-6.rhaos4.7.git0d0f863.el8
 ip-10-0-67-121.us-east-2.compute.internal   Ready                         worker   4h13m   v1.20.0+87cc9a4   10.0.67.121   <none>        Red Hat Enterprise Linux CoreOS 47.83.202106252242-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-6.rhaos4.7.git0d0f863.el8
 ip-10-0-69-180.us-east-2.compute.internal   Ready                         master   4h22m   v1.21.1+f36aa36   10.0.69.180   <none>        Red Hat Enterprise Linux CoreOS 48.84.202107040900-0 (Ootpa)   4.18.0-305.7.1.el8_4.x86_64    cri-o://1.21.1-12.rhaos4.8.git30ca719.el8
 ip-10-0-78-3.us-east-2.compute.internal     Ready                         master   4h23m   v1.21.1+f36aa36   10.0.78.3     <none>        Red Hat Enterprise Linux CoreOS 48.84.202107040900-0 (Ootpa)   4.18.0-305.7.1.el8_4.x86_64    cri-o://1.21.1-12.rhaos4.8.git30ca719.el8
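
The affected worker can also be picked out of the node listing programmatically. The following is a minimal Python sketch, not part of the original report: it assumes the `oc get node` text has been captured as a string, and the helper name `not_ready_nodes` and the abbreviated `NODE_SAMPLE` are hypothetical.

```python
def not_ready_nodes(listing: str) -> list[tuple[str, str]]:
    """Return (name, status) for rows whose STATUS column includes NotReady."""
    result = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 2 or fields[0] == "NAME":
            continue
        name, status = fields[0], fields[1]
        # STATUS may be a comma-separated list, e.g. NotReady,SchedulingDisabled.
        if "NotReady" in status.split(","):
            result.append((name, status))
    return result


# Two rows abbreviated from the report (columns after ROLES dropped).
NODE_SAMPLE = """\
NAME                                        STATUS                        ROLES
ip-10-0-55-38.us-east-2.compute.internal    NotReady,SchedulingDisabled   worker
ip-10-0-61-63.us-east-2.compute.internal    Ready                         worker
"""

for name, status in not_ready_nodes(NODE_SAMPLE):
    print(name, status)
```

Both RHEL 7.9 workers are still on kubelet v1.20.0, but only ip-10-0-55-38 is reported as NotReady.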
 
 
#oc get co:
 NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
 authentication                             4.8.0-0.nightly-2021-07-04-112043   True        False         False      60m
 baremetal                                  4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h20m
 cloud-credential                           4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h19m
 cluster-autoscaler                         4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h19m
 config-operator                            4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h20m
 console                                    4.8.0-0.nightly-2021-07-04-112043   True        False         False      63m
 csi-snapshot-controller                    4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h19m
 dns                                        4.8.0-0.nightly-2021-07-04-112043   True        True          False      92m
 etcd                                       4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h18m
 image-registry                             4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h11m
 ingress                                    4.8.0-0.nightly-2021-07-04-112043   True        False         False      111m
 insights                                   4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h13m
 kube-apiserver                             4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h16m
 kube-controller-manager                    4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h17m
 kube-scheduler                             4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h17m
 kube-storage-version-migrator              4.8.0-0.nightly-2021-07-04-112043   True        False         False      63m
 machine-api                                4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h16m
 machine-approver                           4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h19m
 machine-config                             4.7.19                              False       True          True       73m
 marketplace                                4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h18m
 monitoring                                 4.8.0-0.nightly-2021-07-04-112043   False       True          True       81m
 network                                    4.8.0-0.nightly-2021-07-04-112043   True        True          True       4h19m
 node-tuning                                4.8.0-0.nightly-2021-07-04-112043   True        False         False      111m
 openshift-apiserver                        4.8.0-0.nightly-2021-07-04-112043   True        False         False      60m
 openshift-controller-manager               4.8.0-0.nightly-2021-07-04-112043   True        False         False      111m
 openshift-samples                          4.8.0-0.nightly-2021-07-04-112043   True        False         False      112m
 operator-lifecycle-manager                 4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h19m
 operator-lifecycle-manager-catalog         4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h19m
 operator-lifecycle-manager-packageserver   4.8.0-0.nightly-2021-07-04-112043   True        False         False      74m
 service-ca                                 4.8.0-0.nightly-2021-07-04-112043   True        False         False      4h20m
 storage                                    4.8.0-0.nightly-2021-07-04-112043   True        True          False      63m
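
To narrow the operator listing down to the ones that are blocking the upgrade, the rows can be filtered on the AVAILABLE and DEGRADED columns. This is a minimal Python sketch under the assumption that the `oc get co` text is available as a string; `unhealthy_operators` and the abbreviated `CO_SAMPLE` are hypothetical names, not part of the original report.

```python
def unhealthy_operators(listing: str) -> list[str]:
    """Names of cluster operators that are not Available or are Degraded."""
    names = []
    for line in listing.splitlines():
        fields = line.split()
        # Expected columns: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
        if len(fields) < 6 or fields[0] == "NAME":
            continue
        name, _version, available, _progressing, degraded = fields[:5]
        if available != "True" or degraded == "True":
            names.append(name)
    return names


# Four rows copied from the report; the remaining operators are omitted.
CO_SAMPLE = """\
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
dns              4.8.0-0.nightly-2021-07-04-112043   True        True          False      92m
machine-config   4.7.19                              False       True          True       73m
monitoring       4.8.0-0.nightly-2021-07-04-112043   False       True          True       81m
network          4.8.0-0.nightly-2021-07-04-112043   True        True          True       4h19m
"""

print(unhealthy_operators(CO_SAMPLE))
```

On the data above this selects machine-config, monitoring, and network; dns and storage are only Progressing, so they are not flagged.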
 
 
 
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Abnormal node details~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 
 Name:               ip-10-0-55-38.us-east-2.compute.internal
 Roles:              worker
 Labels:             beta.kubernetes.io/arch=amd64
                     beta.kubernetes.io/instance-type=m4.xlarge
                     beta.kubernetes.io/os=linux
                     failure-domain.beta.kubernetes.io/region=us-east-2
                     failure-domain.beta.kubernetes.io/zone=us-east-2a
                     kubernetes.io/arch=amd64
                     kubernetes.io/hostname=ip-10-0-55-38.us-east-2.compute.internal
                     kubernetes.io/os=linux
                     node-role.kubernetes.io/worker=
                     node.kubernetes.io/instance-type=m4.xlarge
                     node.openshift.io/os_id=rhel
                     topology.ebs.csi.aws.com/zone=us-east-2a
                     topology.kubernetes.io/region=us-east-2
                     topology.kubernetes.io/zone=us-east-2a
 Annotations:        csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-0bd63277720bf17c8"}
                     k8s.ovn.org/host-addresses: ["10.0.55.38"]
                     k8s.ovn.org/l3-gateway-config:
                       {"default":{"mode":"shared","interface-id":"br-ex_ip-10-0-55-38.us-east-2.compute.internal","mac-address":"02:62:c6:14:36:00","ip-addresse...
                     k8s.ovn.org/node-chassis-id: 1e1d2533-1425-41d2-beba-09fc44daa288
                     k8s.ovn.org/node-local-nat-ip: {"default":["169.254.1.242"]}
                     k8s.ovn.org/node-mgmt-port-mac-address: f6:0b:48:15:27:f3
                     k8s.ovn.org/node-primary-ifaddr: {"ipv4":"10.0.55.38/20"}
                     k8s.ovn.org/node-subnets: {"default":"10.130.2.0/23"}
                     machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
                     machineconfiguration.openshift.io/currentConfig: rendered-worker-dd6dc5722454034750a66be6346d26bf
                     machineconfiguration.openshift.io/desiredConfig: rendered-worker-09b84bce73d691779ce3c6688af6ab41
                     machineconfiguration.openshift.io/ssh: accessed
                     machineconfiguration.openshift.io/state: Working
                     volumes.kubernetes.io/controller-managed-attach-detach: true
 CreationTimestamp:  Tue, 06 Jul 2021 13:09:08 +0000
 Taints:             node.kubernetes.io/unreachable:NoExecute
                     node.kubernetes.io/unreachable:NoSchedule
                     node.kubernetes.io/unschedulable:NoSchedule
 Unschedulable:      true
 Lease:
   HolderIdentity:  ip-10-0-55-38.us-east-2.compute.internal
   AcquireTime:     <unset>
   RenewTime:       Tue, 06 Jul 2021 14:50:18 +0000
 Conditions:
   Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
   ----             ------    -----------------                 ------------------                ------              -------
   MemoryPressure   Unknown   Tue, 06 Jul 2021 14:45:48 +0000   Tue, 06 Jul 2021 14:51:03 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
   DiskPressure     Unknown   Tue, 06 Jul 2021 14:45:48 +0000   Tue, 06 Jul 2021 14:51:03 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
   PIDPressure      Unknown   Tue, 06 Jul 2021 14:45:48 +0000   Tue, 06 Jul 2021 14:51:03 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
   Ready            Unknown   Tue, 06 Jul 2021 14:45:48 +0000   Tue, 06 Jul 2021 14:51:03 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
 Addresses:
   InternalIP:   10.0.55.38
   Hostname:     ip-10-0-55-38.us-east-2.compute.internal
   InternalDNS:  ip-10-0-55-38.us-east-2.compute.internal
 Capacity:
   attachable-volumes-aws-ebs:  39
   cpu:                         4
   ephemeral-storage:           31444972Ki
   hugepages-1Gi:               0
   hugepages-2Mi:               0
   memory:                      16264952Ki
   pods:                        250
 Allocatable:
   attachable-volumes-aws-ebs:  39
   cpu:                         3500m
   ephemeral-storage:           27905944324
   hugepages-1Gi:               0
   hugepages-2Mi:               0
   memory:                      15113976Ki
   pods:                        250
 System Info:
   Machine ID:                             a863266299dd48eda1b0e80a7195ae55
   System UUID:                            EC205C3D-AC93-C19F-188B-5E9C81A18EF4
   Boot ID:                                ed1d919e-44a8-4cac-a361-96194d835d31
   Kernel Version:                         3.10.0-1160.31.1.el7.x86_64
   OS Image:                               Red Hat Enterprise Linux Server 7.9 (Maipo)
   Operating System:                       linux
   Architecture:                           amd64
   Container Runtime Version:              cri-o://1.20.3-4.rhaos4.7.gitbaade70.el7
   Kubelet Version:                        v1.20.0+87cc9a4
   Kube-Proxy Version:                     v1.20.0+87cc9a4
 ProviderID:                               aws:///us-east-2a/i-0bd63277720bf17c8
 Non-terminated Pods:                      (13 in total)
   Namespace                               Name                                   CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
   ---------                               ----                                   ------------  ----------  ---------------  -------------  ---
   openshift-cluster-csi-drivers           aws-ebs-csi-driver-node-z9ftj          30m (0%)      0 (0%)      150Mi (1%)       0 (0%)         111m
   openshift-cluster-node-tuning-operator  tuned-wvfzq                            10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         110m
   openshift-dns                           dns-default-pdwck                      60m (1%)      0 (0%)      110Mi (0%)       0 (0%)         93m
   openshift-dns                           node-resolver-zltcv                    5m (0%)       0 (0%)      21Mi (0%)        0 (0%)         94m
   openshift-image-registry                node-ca-p4fzn                          10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         111m
   openshift-ingress-canary                ingress-canary-p4qfx                   10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         112m
   openshift-machine-config-operator       machine-config-daemon-pcrr5            40m (1%)      0 (0%)      100Mi (0%)       0 (0%)         88m
   openshift-monitoring                    node-exporter-fbjf5                    9m (0%)       0 (0%)      47Mi (0%)        0 (0%)         113m
   openshift-multus                        multus-additional-cni-plugins-2fc85    10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         109m
   openshift-multus                        multus-q6s6q                           10m (0%)      0 (0%)      65Mi (0%)        0 (0%)         109m
   openshift-multus                        network-metrics-daemon-2bpbc           20m (0%)      0 (0%)      120Mi (0%)       0 (0%)         108m
   openshift-network-diagnostics           network-check-target-9shzw             10m (0%)      0 (0%)      15Mi (0%)        0 (0%)         108m
   openshift-ovn-kubernetes                ovnkube-node-lq8cb                     40m (1%)      0 (0%)      640Mi (4%)       0 (0%)         109m
 Allocated resources:
   (Total limits may be over 100 percent, i.e., overcommitted.)
   Resource                    Requests     Limits
   --------                    --------     ------
   cpu                         264m (7%)    0 (0%)
   memory                      1358Mi (9%)  0 (0%)
   ephemeral-storage           0 (0%)       0 (0%)
   hugepages-1Gi               0 (0%)       0 (0%)
   hugepages-2Mi               0 (0%)       0 (0%)
   attachable-volumes-aws-ebs  0            0
 Events:
   Type    Reason                   Age                  From     Message
   ----    ------                   ----                 ----     -------
   Normal  Starting                 3h5m                 kubelet  Starting kubelet.
   Normal  NodeHasSufficientMemory  3h5m (x2 over 3h5m)  kubelet  Node ip-10-0-55-38.us-east-2.compute.internal status is now: NodeHasSufficientMemory
   Normal  NodeHasNoDiskPressure    3h5m (x2 over 3h5m)  kubelet  Node ip-10-0-55-38.us-east-2.compute.internal status is now: NodeHasNoDiskPressure
   Normal  NodeHasSufficientPID     3h5m (x2 over 3h5m)  kubelet  Node ip-10-0-55-38.us-east-2.compute.internal status is now: NodeHasSufficientPID
   Normal  NodeAllocatableEnforced  3h5m                 kubelet  Updated Node Allocatable limit across pods
   Normal  NodeReady                3h3m                 kubelet  Node ip-10-0-55-38.us-east-2.compute.internal status is now: NodeReady
   Normal  NodeNotSchedulable       84m                  kubelet  Node ip-10-0-55-38.us-east-2.compute.internal status is now: NodeNotSchedulable
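
Two details in the node description above are easy to miss: the currentConfig and desiredConfig annotations differ while the state is Working, which suggests the node went silent partway through applying the new rendered config, and the lease was last renewed shortly before the conditions flipped to Unknown. The following Python sketch checks both using values copied from the output; the helper names `update_pending` and `seconds_between` are hypothetical and the interpretation is an assumption, not a confirmed root cause.

```python
from datetime import datetime

# Timestamp layout used by `oc describe node`, e.g. "Tue, 06 Jul 2021 14:50:18 +0000".
FMT = "%a, %d %b %Y %H:%M:%S %z"
MC_PREFIX = "machineconfiguration.openshift.io/"


def update_pending(annotations: dict[str, str]) -> bool:
    """True when the node has not yet reached its desired rendered config."""
    return annotations[MC_PREFIX + "currentConfig"] != annotations[MC_PREFIX + "desiredConfig"]


def seconds_between(earlier: str, later: str) -> float:
    """Elapsed seconds between two timestamps in the describe-output format."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return delta.total_seconds()


# Values copied from the node description in this report.
ANNOTATIONS = {
    MC_PREFIX + "currentConfig": "rendered-worker-dd6dc5722454034750a66be6346d26bf",
    MC_PREFIX + "desiredConfig": "rendered-worker-09b84bce73d691779ce3c6688af6ab41",
    MC_PREFIX + "state": "Working",
}
LEASE_RENEW = "Tue, 06 Jul 2021 14:50:18 +0000"
LAST_HEARTBEAT = "Tue, 06 Jul 2021 14:45:48 +0000"
LAST_TRANSITION = "Tue, 06 Jul 2021 14:51:03 +0000"

print("update pending:", update_pending(ANNOTATIONS))
print("lease renew -> Unknown (s):", seconds_between(LEASE_RENEW, LAST_TRANSITION))
print("last heartbeat -> Unknown (s):", seconds_between(LAST_HEARTBEAT, LAST_TRANSITION))
```

The config update is still pending, the conditions turned Unknown 45 seconds after the last lease renewal, and the last full status heartbeat was 315 seconds before that transition.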

