Bug 2014683 - Node in SchedulingDisabled state after upgrade to 4.6 nightly
Summary: Node in SchedulingDisabled state after upgrade to 4.6 nightly
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: MCO Team
QA Contact: Rio Liu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-10-15 19:54 UTC by Simon
Modified: 2021-11-22 17:33 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-22 17:33:07 UTC
Target Upstream Version:
Embargoed:



Description Simon 2021-10-15 19:54:13 UTC
Description of problem:
A node remains in SchedulingDisabled state after upgrading to the latest 4.6 nightly.

Version-Release number of selected component (if applicable):
upgrade from 4.5.0-0.nightly-2021-09-07-164108 to 4.6.0-0.nightly-2021-10-14-030206

How reproducible:
100%

Steps to Reproduce:
1. Install version 4.2.36-x86_64 (profile: 13_UPI on GCP with RHCOS & RHEL7.7 (FIPS off) & http_proxy)
2. Upgrade path: 4.2.36 -> 4.3.0-0.nightly-2021-02-23-060813 -> 4.4.0-0.nightly-2021-03-19-022315 -> 4.5.0-0.nightly-2021-09-07-164108 -> 4.6.0-0.nightly-2021-10-14-030206
3. Check the nodes after the upgrade.
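
For reference, a sketch of how each hop in such an upgrade chain is typically driven against a nightly payload; the exact commands used by the QE profile are not in this report, and the release-image pullspec is a placeholder:

# upgrade to an explicit (unsigned) nightly release image; repeat once per hop in step 2
oc adm upgrade --to-image=<release-image-pullspec> --allow-explicit-upgrade --force

# watch the upgrade progress and the node status
oc get clusterversion
oc get nodes -w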

Actual results:
One of the RHEL worker nodes remains in SchedulingDisabled state after the upgrade:

# oc get node
 NAME                                               STATUS                     ROLES    AGE     VERSION            INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
 ugd-18617-10140918-m-0.c.openshift-qe.internal     Ready                      master   6h44m   v1.19.14+fcff70a   10.0.0.5      <none>        Red Hat Enterprise Linux CoreOS 46.82.202110131857-0 (Ootpa)   4.18.0-193.65.2.el8_2.x86_64   cri-o://1.19.4-3.rhaos4.6.git7d25e5d.el8
 ugd-18617-10140918-m-1.c.openshift-qe.internal     Ready                      master   6h44m   v1.19.14+fcff70a   10.0.0.4      <none>        Red Hat Enterprise Linux CoreOS 46.82.202110131857-0 (Ootpa)   4.18.0-193.65.2.el8_2.x86_64   cri-o://1.19.4-3.rhaos4.6.git7d25e5d.el8
 ugd-18617-10140918-m-2.c.openshift-qe.internal     Ready                      master   6h44m   v1.19.14+fcff70a   10.0.0.6      <none>        Red Hat Enterprise Linux CoreOS 46.82.202110131857-0 (Ootpa)   4.18.0-193.65.2.el8_2.x86_64   cri-o://1.19.4-3.rhaos4.6.git7d25e5d.el8
 ugd-18617-10140918-w-a-0.c.openshift-qe.internal   Ready                      worker   6h30m   v1.19.14+fcff70a   10.0.32.2     <none>        Red Hat Enterprise Linux CoreOS 46.82.202110131857-0 (Ootpa)   4.18.0-193.65.2.el8_2.x86_64   cri-o://1.19.4-3.rhaos4.6.git7d25e5d.el8
 ugd-18617-10140918-w-a-l-0                         Ready                      worker   5h43m   v1.19.14+fcff70a   10.0.32.5     <none>        Red Hat Enterprise Linux Server 7.9 (Maipo)                    3.10.0-1160.45.1.el7.x86_64    cri-o://1.19.4-3.rhaos4.6.git7d25e5d.el7
 ugd-18617-10140918-w-a-l-1                         Ready,SchedulingDisabled   worker   5h43m   v1.18.3+d8ef5ad    10.0.32.6                   Red Hat Enterprise Linux Server 7.9 (Maipo)                    3.10.0-1160.45.1.el7.x86_64    cri-o://1.18.4-11.rhaos4.5.gitfa57051.el7
 ugd-18617-10140918-w-b-1.c.openshift-qe.internal   Ready                      worker   6h30m   v1.19.14+fcff70a   10.0.32.3     <none>        Red Hat Enterprise Linux CoreOS 46.82.202110131857-0 (Ootpa)   4.18.0-193.65.2.el8_2.x86_64   cri-o://1.19.4-3.rhaos4.6.git7d25e5d.el8
 ugd-18617-10140918-w-c-2.c.openshift-qe.internal   Ready                      worker   6h29m   v1.19.14+fcff70a   10.0.32.4     <none>        Red Hat Enterprise Linux CoreOS 46.82.202110131857-0 (Ootpa)   4.18.0-193.65.2.el8_2.x86_64   cri-o://1.19.4-3.rhaos4.6.git7d25e5d.el8

# oc describe node ugd-18617-10140918-w-a-l-1
Name:               ugd-18617-10140918-w-a-l-1
 Roles:              worker
 Labels:             beta.kubernetes.io/arch=amd64
                     beta.kubernetes.io/instance-type=n1-standard-4
                     beta.kubernetes.io/os=linux
                     failure-domain.beta.kubernetes.io/region=us-central1
                     failure-domain.beta.kubernetes.io/zone=us-central1-a
                     kubernetes.io/arch=amd64
                     kubernetes.io/hostname=ugd-18617-10140918-w-a-l-1
                     kubernetes.io/os=linux
                     node-role.kubernetes.io/worker=
                     node.kubernetes.io/instance-type=n1-standard-4
                     node.openshift.io/os_id=rhel
                     topology.kubernetes.io/region=us-central1
                     topology.kubernetes.io/zone=us-central1-a
 Annotations:        machineconfiguration.openshift.io/currentConfig: rendered-worker-65ceefb33fc43bfe40f97de7ceaa30f7
                     machineconfiguration.openshift.io/desiredConfig: rendered-worker-65ceefb33fc43bfe40f97de7ceaa30f7
                     machineconfiguration.openshift.io/reason: 
                     machineconfiguration.openshift.io/ssh: accessed
                     machineconfiguration.openshift.io/state: Done
                     volumes.kubernetes.io/controller-managed-attach-detach: true
 CreationTimestamp:  Thu, 14 Oct 2021 10:32:25 +0000
 Taints:             node.kubernetes.io/unschedulable:NoSchedule
 Unschedulable:      true
 Lease:
   HolderIdentity:  ugd-18617-10140918-w-a-l-1
   AcquireTime:     <unset>
   RenewTime:       Thu, 14 Oct 2021 16:17:08 +0000
 Conditions:
   Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
   ----                 ------  -----------------                 ------------------                ------                       -------
   NetworkUnavailable   False   Mon, 01 Jan 0001 00:00:00 +0000   Thu, 14 Oct 2021 10:32:25 +0000   RouteCreated                 openshift-sdn cleared kubelet-set NoRouteCreated
   MemoryPressure       False   Thu, 14 Oct 2021 16:15:19 +0000   Thu, 14 Oct 2021 15:47:24 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
   DiskPressure         False   Thu, 14 Oct 2021 16:15:19 +0000   Thu, 14 Oct 2021 15:47:24 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
   PIDPressure          False   Thu, 14 Oct 2021 16:15:19 +0000   Thu, 14 Oct 2021 15:47:24 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
   Ready                True    Thu, 14 Oct 2021 16:15:19 +0000   Thu, 14 Oct 2021 15:47:34 +0000   KubeletReady                 kubelet is posting ready status
 Addresses:
   InternalIP:   10.0.32.6
   ExternalIP:   
   InternalDNS:  ugd-18617-10140918-w-a-l-1.c.openshift-qe.internal
   Hostname:     ugd-18617-10140918-w-a-l-1.c.openshift-qe.internal
 Capacity:
   attachable-volumes-gce-pd:  127
   cpu:                        4
   ephemeral-storage:          62899276Ki
   hugepages-1Gi:              0
   hugepages-2Mi:              0
   memory:                     15234104Ki
   pods:                       250
 Allocatable:
   attachable-volumes-gce-pd:  127
   cpu:                        3500m
   ephemeral-storage:          56894230842
   hugepages-1Gi:              0
   hugepages-2Mi:              0
   memory:                     14083128Ki
   pods:                       250
 System Info:
   Machine ID:                             989c8e6037e59b85635b9c5969a6b513
   System UUID:                            76B8B45F-BE2E-AFA9-3855-597F2EC7C0D2
   Boot ID:                                a3f28517-179d-4abd-879d-5e794b37990e
   Kernel Version:                         3.10.0-1160.45.1.el7.x86_64
   OS Image:                               Red Hat Enterprise Linux Server 7.9 (Maipo)
   Operating System:                       linux
   Architecture:                           amd64
   Container Runtime Version:              cri-o://1.18.4-11.rhaos4.5.gitfa57051.el7
   Kubelet Version:                        v1.18.3+d8ef5ad
   Kube-Proxy Version:                     v1.18.3+d8ef5ad
 PodCIDR:                                  10.128.5.0/24
 PodCIDRs:                                 10.128.5.0/24
 ProviderID:                               gce://openshift-qe/us-central1-a/ugd-18617-10140918-w-a-l-1
 Non-terminated Pods:                      (11 in total)
   Namespace                               Name                            CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
   ---------                               ----                            ------------  ----------  ---------------  -------------  ---
   node-upgrade                            hello-daemonset-nrc8w           0 (0%)        0 (0%)      0 (0%)           0 (0%)         5h40m
   openshift-cluster-node-tuning-operator  tuned-qjg92                     10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         56m
   openshift-dns                           dns-default-p889z               65m (1%)      0 (0%)      110Mi (0%)       512Mi (3%)     38m
   openshift-image-registry                node-ca-zjpr2                   10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         55m
   openshift-machine-config-operator       machine-config-daemon-ssh5b     20m (0%)      0 (0%)      50Mi (0%)        0 (0%)         35m
   openshift-monitoring                    node-exporter-mb9b5             9m (0%)       0 (0%)      210Mi (1%)       0 (0%)         57m
   openshift-multus                        multus-dflkp                    10m (0%)      0 (0%)      150Mi (1%)       0 (0%)         48m
   openshift-multus                        network-metrics-daemon-kxjb6    20m (0%)      0 (0%)      120Mi (0%)       0 (0%)         49m
   openshift-sdn                           ovs-s6cjq                       100m (2%)     0 (0%)      400Mi (2%)       0 (0%)         45m
   openshift-sdn                           sdn-hwdf6                       110m (3%)     0 (0%)      220Mi (1%)       0 (0%)         49m
   ui-upgrade                              hello-daemonset-mksjl           0 (0%)        0 (0%)      0 (0%)           0 (0%)         5h32m
 Allocated resources:
   (Total limits may be over 100 percent, i.e., overcommitted.)
   Resource                   Requests     Limits
   --------                   --------     ------
   cpu                        354m (10%)   0 (0%)
   memory                     1320Mi (9%)  512Mi (3%)
   ephemeral-storage          0 (0%)       0 (0%)
   hugepages-1Gi              0 (0%)       0 (0%)
   hugepages-2Mi              0 (0%)       0 (0%)
   attachable-volumes-gce-pd  0            0
 Events:
   Type     Reason                   Age                    From     Message
   ----     ------                   ----                   ----     -------
   Normal   NodeNotReady             5h13m (x2 over 5h16m)  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeNotReady
   Normal   NodeReady                5h11m (x3 over 5h44m)  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeReady
   Normal   NodeNotSchedulable       4h40m                  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeNotSchedulable
   Normal   Starting                 4h24m                  kubelet  Starting kubelet.
   Normal   NodeNotReady             4h24m                  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeNotReady
   Normal   NodeHasNoDiskPressure    4h24m (x2 over 4h24m)  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasNoDiskPressure
   Normal   NodeHasSufficientPID     4h24m (x2 over 4h24m)  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasSufficientPID
   Warning  Rebooted                 4h24m                  kubelet  Node ugd-18617-10140918-w-a-l-1 has been rebooted, boot id: ac3e208a-ce39-4ed9-9c65-696e7290199d
   Normal   NodeHasSufficientMemory  4h24m (x2 over 4h24m)  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasSufficientMemory
   Normal   NodeNotSchedulable       4h24m                  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeNotSchedulable
   Normal   NodeAllocatableEnforced  4h24m                  kubelet  Updated Node Allocatable limit across pods
   Normal   NodeReady                4h24m                  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeReady
   Normal   Starting                 4h23m                  kubelet  Starting kubelet.
   Normal   NodeAllocatableEnforced  4h23m                  kubelet  Updated Node Allocatable limit across pods
   Normal   NodeHasNoDiskPressure    4h23m (x2 over 4h23m)  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasNoDiskPressure
   Normal   NodeHasSufficientMemory  4h23m (x2 over 4h23m)  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasSufficientMemory
   Warning  Rebooted                 4h23m                  kubelet  Node ugd-18617-10140918-w-a-l-1 has been rebooted, boot id: 4657a474-d21e-4980-bc85-21ccae8a0978
   Normal   NodeNotReady             4h23m                  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeNotReady
   Normal   NodeHasSufficientPID     4h23m (x2 over 4h23m)  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasSufficientPID
   Normal   NodeReady                4h23m                  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeReady
   Normal   NodeNotSchedulable       3h21m                  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeNotSchedulable
   Normal   Starting                 3h20m                  kubelet  Starting kubelet.
   Normal   NodeHasSufficientMemory  3h20m                  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasSufficientMemory
   Normal   NodeHasNoDiskPressure    3h20m                  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasNoDiskPressure
   Normal   NodeHasSufficientPID     3h20m                  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasSufficientPID
   Warning  Rebooted                 3h20m                  kubelet  Node ugd-18617-10140918-w-a-l-1 has been rebooted, boot id: 611cee01-7458-4f11-a043-496d5804acee
   Normal   NodeNotReady             3h20m                  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeNotReady
   Normal   NodeAllocatableEnforced  3h20m                  kubelet  Updated Node Allocatable limit across pods
   Normal   NodeReady                3h19m                  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeReady
   Normal   NodeSchedulable          3h19m                  kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeSchedulable
   Normal   NodeNotSchedulable       3h4m (x2 over 3h20m)   kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeNotSchedulable
   Normal   NodeAllocatableEnforced  176m                   kubelet  Updated Node Allocatable limit across pods
   Normal   NodeHasSufficientMemory  176m (x2 over 176m)    kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasSufficientMemory
   Normal   NodeHasNoDiskPressure    176m (x2 over 176m)    kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasNoDiskPressure
   Normal   NodeHasSufficientPID     176m (x2 over 176m)    kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasSufficientPID
   Normal   Starting                 176m                   kubelet  Starting kubelet.
   Normal   NodeNotReady             176m                   kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeNotReady
   Warning  Rebooted                 176m                   kubelet  Node ugd-18617-10140918-w-a-l-1 has been rebooted, boot id: 5e7d697c-85e4-4122-a15a-88c51425a055
   Normal   NodeReady                176m                   kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeReady
   Normal   NodeSchedulable          176m                   kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeSchedulable
   Normal   NodeNotSchedulable       131m (x2 over 176m)    kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeNotSchedulable
   Normal   NodeHasSufficientPID     128m                   kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasSufficientPID
   Normal   NodeHasSufficientMemory  128m                   kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasSufficientMemory
   Normal   NodeHasNoDiskPressure    128m                   kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasNoDiskPressure
   Normal   Starting                 128m                   kubelet  Starting kubelet.
   Warning  Rebooted                 128m                   kubelet  Node ugd-18617-10140918-w-a-l-1 has been rebooted, boot id: 07b37ee5-9db7-4ef4-a6f6-befc4b6885d9
   Normal   NodeNotReady             128m                   kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeNotReady
   Normal   NodeAllocatableEnforced  128m                   kubelet  Updated Node Allocatable limit across pods
   Normal   NodeReady                128m                   kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeReady
   Normal   NodeSchedulable          128m                   kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeSchedulable
   Normal   NodeNotSchedulable       100m (x2 over 128m)    kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeNotSchedulable
   Normal   NodeHasSufficientPID     89m (x2 over 89m)      kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasSufficientPID
   Normal   NodeHasSufficientMemory  89m (x2 over 89m)      kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasSufficientMemory
   Normal   NodeHasNoDiskPressure    89m (x2 over 89m)      kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasNoDiskPressure
   Normal   Starting                 89m                    kubelet  Starting kubelet.
   Warning  Rebooted                 89m                    kubelet  Node ugd-18617-10140918-w-a-l-1 has been rebooted, boot id: 55a20a02-02a3-41c7-976f-488e233f96d5
   Normal   NodeNotReady             89m                    kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeNotReady
   Normal   NodeAllocatableEnforced  89m                    kubelet  Updated Node Allocatable limit across pods
   Normal   NodeReady                89m                    kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeReady
   Normal   NodeSchedulable          89m                    kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeSchedulable
   Normal   NodeNotSchedulable       32m (x2 over 89m)      kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeNotSchedulable
   Normal   NodeHasSufficientMemory  29m (x2 over 29m)      kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasSufficientMemory
   Normal   Starting                 29m                    kubelet  Starting kubelet.
   Normal   NodeHasNoDiskPressure    29m (x2 over 29m)      kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasNoDiskPressure
   Normal   NodeHasSufficientPID     29m (x2 over 29m)      kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeHasSufficientPID
   Warning  Rebooted                 29m                    kubelet  Node ugd-18617-10140918-w-a-l-1 has been rebooted, boot id: a3f28517-179d-4abd-879d-5e794b37990e
   Normal   NodeNotReady             29m                    kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeNotReady
   Normal   NodeAllocatableEnforced  29m                    kubelet  Updated Node Allocatable limit across pods
   Normal   NodeReady                29m                    kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeReady
   Normal   NodeSchedulable          29m                    kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeSchedulable
   Normal   NodeNotSchedulable       2m1s (x2 over 29m)     kubelet  Node ugd-18617-10140918-w-a-l-1 status is now: NodeNotSchedulable
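
Note: in the annotations above, currentConfig equals desiredConfig and machineconfiguration.openshift.io/state is Done, so the machine-config daemon considers the update finished; the node simply was never uncordoned. A minimal recovery sketch, assuming cluster-admin access and that nothing else (e.g. an in-progress drain) still holds the cordon; the MCD pod name is taken from the pod list above:

# confirm the node spec is still marked unschedulable
oc get node ugd-18617-10140918-w-a-l-1 -o jsonpath='{.spec.unschedulable}{"\n"}'

# check whether the MCO still reports the worker pool as updating or degraded
oc get mcp worker

# inspect the machine-config daemon on this node for the missing uncordon
oc logs -n openshift-machine-config-operator machine-config-daemon-ssh5b -c machine-config-daemon

# if the pool is settled, uncordon the node manually
oc adm uncordon ugd-18617-10140918-w-a-l-1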

Comment 3 Sinny Kumari 2021-11-22 17:33:07 UTC
Closing this bug because this upgrade chain started from a very old cluster (4.2), and 4.2 and 4.3 clusters are no longer supported.

Please open a new bug with a must-gather attached if you see this issue on a recent, supported cluster.
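
For reference, a sketch of the standard collection commands for that follow-up report (generic oc usage, not specific to this bug; <node-name> is a placeholder):

# collect a must-gather from the affected cluster
oc adm must-gather --dest-dir=./must-gather

# also capture the stuck node's description alongside it
oc describe node <node-name> > node-describe.txt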

