Bug 1982971 - Node stuck in Scheduling disabled state after upgrade from 4.7.20 to 4.8 nightly
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Yu Qi Zhang
QA Contact: Rio Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-07-16 07:09 UTC by Sunil Choudhary
Modified: 2021-11-08 17:37 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-08 17:37:47 UTC
Target Upstream Version:
Embargoed:


Description Sunil Choudhary 2021-07-16 07:09:51 UTC
A worker node (worker-01) is stuck in SchedulingDisabled state after an upgrade from 4.7.20 to 4.8.0-0.nightly-2021-07-13-115744.
The worker MCP is stuck with UPDATING=True and never reaches UPDATED=True.

Profile: upi-on-baremetal/versioned-installer-packet-disk_encryption-etcd_encryption-ci

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.20    True        True          22h     Unable to apply 4.8.0-0.nightly-2021-07-13-115744: wait has exceeded 40 minutes for these operators: ingress


$ oc get nodes -o wide
NAME                                                     STATUS                     ROLES    AGE   VERSION           INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
master-00.schoudha15131907.qe.devcluster.openshift.com   Ready                      master   23h   v1.21.1+f36aa36   147.75.35.185   <none>        Red Hat Enterprise Linux CoreOS 48.84.202107091120-0 (Ootpa)   4.18.0-305.7.1.el8_4.x86_64    cri-o://1.21.1-13.rhaos4.8.git8d20153.el8
master-01.schoudha15131907.qe.devcluster.openshift.com   Ready                      master   23h   v1.21.1+f36aa36   147.75.55.139   <none>        Red Hat Enterprise Linux CoreOS 48.84.202107091120-0 (Ootpa)   4.18.0-305.7.1.el8_4.x86_64    cri-o://1.21.1-13.rhaos4.8.git8d20153.el8
master-02.schoudha15131907.qe.devcluster.openshift.com   Ready                      master   23h   v1.21.1+f36aa36   147.75.35.209   <none>        Red Hat Enterprise Linux CoreOS 48.84.202107091120-0 (Ootpa)   4.18.0-305.7.1.el8_4.x86_64    cri-o://1.21.1-13.rhaos4.8.git8d20153.el8
worker-00.schoudha15131907.qe.devcluster.openshift.com   Ready                      worker   23h   v1.21.1+f36aa36   147.75.55.137   <none>        Red Hat Enterprise Linux CoreOS 48.84.202107091120-0 (Ootpa)   4.18.0-305.7.1.el8_4.x86_64    cri-o://1.21.1-13.rhaos4.8.git8d20153.el8
worker-01.schoudha15131907.qe.devcluster.openshift.com   Ready,SchedulingDisabled   worker   23h   v1.20.0+01c9f3f   147.75.35.193   <none>        Red Hat Enterprise Linux CoreOS 47.83.202107070542-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64   cri-o://1.20.3-7.rhaos4.7.git41925ef.el8
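
For triage, the cordoned node can be picked out of a listing like the one above mechanically. This is a minimal sketch, assuming the default `oc get nodes -o wide` column order shown here (STATUS second, VERSION fifth). The function name `cordoned_nodes` is made up for illustration and is not part of `oc`.

```shell
# Hypothetical helper (not part of oc): reads `oc get nodes` table output on
# stdin and prints the name and kubelet version of every node whose STATUS
# column includes SchedulingDisabled. Assumes the column order in this report.
cordoned_nodes() {
  awk 'NR > 1 && $2 ~ /SchedulingDisabled/ { print $1, $5 }'
}
```

Fed the output above (for example `oc get nodes -o wide | cordoned_nodes`), this should list only worker-01, which is also the only node still reporting the 4.7 kubelet v1.20.0+01c9f3f.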

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.8.0-0.nightly-2021-07-13-115744   True        False         False      20h
baremetal                                  4.8.0-0.nightly-2021-07-13-115744   True        False         False      23h
cloud-credential                           4.8.0-0.nightly-2021-07-13-115744   True        False         False      23h
cluster-autoscaler                         4.8.0-0.nightly-2021-07-13-115744   True        False         False      23h
config-operator                            4.8.0-0.nightly-2021-07-13-115744   True        False         False      23h
console                                    4.8.0-0.nightly-2021-07-13-115744   True        False         False      20h
csi-snapshot-controller                    4.8.0-0.nightly-2021-07-13-115744   True        False         False      23h
dns                                        4.8.0-0.nightly-2021-07-13-115744   True        False         False      21h
etcd                                       4.8.0-0.nightly-2021-07-13-115744   True        False         False      23h
image-registry                             4.8.0-0.nightly-2021-07-13-115744   True        False         False      20h
ingress                                    4.8.0-0.nightly-2021-07-13-115744   True        False         True       20h
insights                                   4.8.0-0.nightly-2021-07-13-115744   True        False         False      23h
kube-apiserver                             4.8.0-0.nightly-2021-07-13-115744   True        False         False      23h
kube-controller-manager                    4.8.0-0.nightly-2021-07-13-115744   True        False         False      23h
kube-scheduler                             4.8.0-0.nightly-2021-07-13-115744   True        False         False      23h
kube-storage-version-migrator              4.8.0-0.nightly-2021-07-13-115744   True        False         False      20h
machine-api                                4.8.0-0.nightly-2021-07-13-115744   True        False         False      23h
machine-approver                           4.8.0-0.nightly-2021-07-13-115744   True        False         False      23h
machine-config                             4.8.0-0.nightly-2021-07-13-115744   True        False         False      20h
marketplace                                4.8.0-0.nightly-2021-07-13-115744   True        False         False      23h
monitoring                                 4.8.0-0.nightly-2021-07-13-115744   False       True          True       20h
network                                    4.8.0-0.nightly-2021-07-13-115744   True        True          True       23h
node-tuning                                4.8.0-0.nightly-2021-07-13-115744   True        False         False      21h
openshift-apiserver                        4.8.0-0.nightly-2021-07-13-115744   True        False         False      20h
openshift-controller-manager               4.8.0-0.nightly-2021-07-13-115744   True        False         False      23h
openshift-samples                          4.8.0-0.nightly-2021-07-13-115744   True        False         False      21h
operator-lifecycle-manager                 4.8.0-0.nightly-2021-07-13-115744   True        False         False      23h
operator-lifecycle-manager-catalog         4.8.0-0.nightly-2021-07-13-115744   True        False         False      23h
operator-lifecycle-manager-packageserver   4.8.0-0.nightly-2021-07-13-115744   True        False         False      20h
service-ca                                 4.8.0-0.nightly-2021-07-13-115744   True        False         False      23h
storage                                    4.8.0-0.nightly-2021-07-13-115744   True        False         False      23h


$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             23h
00-worker                                          29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             23h
01-master-container-runtime                        29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             23h
01-master-kubelet                                  29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             23h
01-worker-container-runtime                        29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             23h
01-worker-kubelet                                  29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             23h
99-master-generated-registries                     29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             23h
99-master-ssh                                                                                 3.2.0             23h
99-worker-generated-registries                     29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             23h
99-worker-ssh                                                                                 3.2.0             23h
master-tpm                                                                                    3.2.0             23h
rendered-master-78bfa53d9319513afae8d58982ddcac6   8eadb800abc91dd9759edc4be57235eb80ad695d   3.2.0             23h
rendered-master-d1abfc7ea3a93023a4228b96d4fe2164   8eadb800abc91dd9759edc4be57235eb80ad695d   3.2.0             23h
rendered-master-e9319c8dd734fd13564c4771d412cee2   29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             20h
rendered-worker-c503bf2b57a1b70532312aa9759be2e4   8eadb800abc91dd9759edc4be57235eb80ad695d   3.2.0             23h
rendered-worker-c88ed22f35718a35b52591ba32324713   8eadb800abc91dd9759edc4be57235eb80ad695d   3.2.0             23h
rendered-worker-dc3d2b0d956fcf88a339b88ca07680bb   29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             20h
worker-tpm                                                                                    3.2.0             23h

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-e9319c8dd734fd13564c4771d412cee2   True      False      False      3              3                   3                     0                      23h
worker   rendered-worker-c503bf2b57a1b70532312aa9759be2e4   False     True       False      2              1                   1                     0                      23h
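
The pool state above can be checked the same way. The sketch below assumes the `oc get mcp` column layout shown in this report (UPDATED third, MACHINECOUNT sixth, READYMACHINECOUNT seventh). `pools_not_updated` is a hypothetical name chosen here.

```shell
# Hypothetical helper (not part of oc): reads `oc get mcp` table output on
# stdin and prints each pool whose UPDATED column is not True, together with
# its ready/total machine counts. Assumes the column order in this report.
pools_not_updated() {
  awk 'NR > 1 && $3 != "True" { print $1, $7 "/" $6, "ready" }'
}
```

With the table above as input (`oc get mcp | pools_not_updated`), the master pool is skipped and the worker pool is reported as `worker 1/2 ready`.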

$ oc describe mcp worker
Name:         worker
Namespace:    
Labels:       machineconfiguration.openshift.io/mco-built-in=
              pools.operator.machineconfiguration.openshift.io/worker=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool
Metadata:
  Creation Timestamp:  2021-07-15T05:40:28Z
  Generation:          4
  Managed Fields:
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:machineconfiguration.openshift.io/mco-built-in:
          f:pools.operator.machineconfiguration.openshift.io/worker:
      f:spec:
        .:
        f:configuration:
          .:
          f:source:
        f:machineConfigSelector:
          .:
          f:matchLabels:
            .:
            f:machineconfiguration.openshift.io/role:
        f:nodeSelector:
          .:
          f:matchLabels:
            .:
            f:node-role.kubernetes.io/worker:
        f:paused:
    Manager:      machine-config-operator
    Operation:    Update
    Time:         2021-07-15T05:40:28Z
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:configuration:
          f:name:
          f:source:
      f:status:
        .:
        f:conditions:
        f:configuration:
          .:
          f:name:
          f:source:
        f:degradedMachineCount:
        f:machineCount:
        f:observedGeneration:
        f:readyMachineCount:
        f:unavailableMachineCount:
        f:updatedMachineCount:
    Manager:         machine-config-controller
    Operation:       Update
    Time:            2021-07-15T05:41:26Z
  Resource Version:  100837
  UID:               d1d06ebf-5649-4a33-9b70-4aba8487d8d6
Spec:
  Configuration:
    Name:  rendered-worker-dc3d2b0d956fcf88a339b88ca07680bb
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-worker
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-generated-registries
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-ssh
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         worker-tpm
  Machine Config Selector:
    Match Labels:
      machineconfiguration.openshift.io/role:  worker
  Node Selector:
    Match Labels:
      node-role.kubernetes.io/worker:  
  Paused:                              false
Status:
  Conditions:
    Last Transition Time:  2021-07-15T05:41:21Z
    Message:               
    Reason:                
    Status:                False
    Type:                  RenderDegraded
    Last Transition Time:  2021-07-15T05:41:26Z
    Message:               
    Reason:                
    Status:                False
    Type:                  NodeDegraded
    Last Transition Time:  2021-07-15T05:41:26Z
    Message:               
    Reason:                
    Status:                False
    Type:                  Degraded
    Last Transition Time:  2021-07-15T08:26:16Z
    Message:               
    Reason:                
    Status:                False
    Type:                  Updated
    Last Transition Time:  2021-07-15T08:26:16Z
    Message:               All nodes are updating to rendered-worker-dc3d2b0d956fcf88a339b88ca07680bb
    Reason:                
    Status:                True
    Type:                  Updating
  Configuration:
    Name:  rendered-worker-c503bf2b57a1b70532312aa9759be2e4
    Source:
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   00-worker
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-worker-container-runtime
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-worker-kubelet
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-generated-registries
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-ssh
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   worker-tpm
  Degraded Machine Count:     0
  Machine Count:              2
  Observed Generation:        4
  Ready Machine Count:        1
  Unavailable Machine Count:  1
  Updated Machine Count:      1
Events:                       <none>

$ oc describe node worker-01.schoudha15131907.qe.devcluster.openshift.com
Name:               worker-01.schoudha15131907.qe.devcluster.openshift.com
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=worker-01.schoudha15131907.qe.devcluster.openshift.com
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.openshift.io/os_id=rhcos
Annotations:        machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-c503bf2b57a1b70532312aa9759be2e4
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-dc3d2b0d956fcf88a339b88ca07680bb
                    machineconfiguration.openshift.io/reason: 
                    machineconfiguration.openshift.io/state: Working
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 15 Jul 2021 11:20:12 +0530
Taints:             node.kubernetes.io/unschedulable:NoSchedule
Unschedulable:      true
Lease:
  HolderIdentity:  worker-01.schoudha15131907.qe.devcluster.openshift.com
  AcquireTime:     <unset>
  RenewTime:       Fri, 16 Jul 2021 10:38:29 +0530
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Fri, 16 Jul 2021 10:36:09 +0530   Thu, 15 Jul 2021 11:20:12 +0530   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 16 Jul 2021 10:36:09 +0530   Thu, 15 Jul 2021 11:20:12 +0530   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 16 Jul 2021 10:36:09 +0530   Thu, 15 Jul 2021 11:20:12 +0530   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Fri, 16 Jul 2021 10:36:09 +0530   Thu, 15 Jul 2021 11:20:53 +0530   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  147.75.35.193
  Hostname:    worker-01.schoudha15131907.qe.devcluster.openshift.com
Capacity:
  cpu:                56
  ephemeral-storage:  233879108Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             394653344Ki
  pods:               250
Allocatable:
  cpu:                55500m
  ephemeral-storage:  214469243752
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             393502368Ki
  pods:               250
System Info:
  Machine ID:                             7f105a9ca9884d14a1bebe76516144cc
  System UUID:                            4c4c4544-0059-3410-8046-b8c04f445632
  Boot ID:                                4c0a68ee-f98b-4683-8c07-d9e75701c6b6
  Kernel Version:                         4.18.0-240.22.1.el8_3.x86_64
  OS Image:                               Red Hat Enterprise Linux CoreOS 47.83.202107070542-0 (Ootpa)
  Operating System:                       linux
  Architecture:                           amd64
  Container Runtime Version:              cri-o://1.20.3-7.rhaos4.7.git41925ef.el8
  Kubelet Version:                        v1.20.0+01c9f3f
  Kube-Proxy Version:                     v1.20.0+01c9f3f
Non-terminated Pods:                      (12 in total)
  Namespace                               Name                                   CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                               ----                                   ------------  ----------  ---------------  -------------  ---
  openshift-cluster-node-tuning-operator  tuned-dvk96                            10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         21h
  openshift-dns                           dns-default-rkdpg                      60m (0%)      0 (0%)      110Mi (0%)       0 (0%)         21h
  openshift-dns                           node-resolver-8j9n5                    5m (0%)       0 (0%)      21Mi (0%)        0 (0%)         21h
  openshift-image-registry                node-ca-l5wc5                          10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         21h
  openshift-ingress-canary                ingress-canary-t6zrl                   10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         21h
  openshift-machine-config-operator       machine-config-daemon-9lwxv            40m (0%)      0 (0%)      100Mi (0%)       0 (0%)         21h
  openshift-monitoring                    node-exporter-6m2w5                    9m (0%)       0 (0%)      47Mi (0%)        0 (0%)         21h
  openshift-multus                        multus-additional-cni-plugins-xgg89    10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         21h
  openshift-multus                        multus-bgn2p                           10m (0%)      0 (0%)      65Mi (0%)        0 (0%)         21h
  openshift-multus                        network-metrics-daemon-jqcqg           20m (0%)      0 (0%)      120Mi (0%)       0 (0%)         21h
  openshift-network-diagnostics           network-check-target-lpmcm             10m (0%)      0 (0%)      15Mi (0%)        0 (0%)         21h
  openshift-sdn                           sdn-f6g89                              110m (0%)     0 (0%)      220Mi (0%)       0 (0%)         21h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                304m (0%)   0 (0%)
  memory             788Mi (0%)  0 (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:              <none>
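
The key detail in the node description above is that the `currentConfig` and `desiredConfig` annotations differ while `state` is `Working`. A rough sketch for extracting that from `oc describe node` text follows. The helper name `config_drift` is an assumption for illustration, and it only inspects the two annotations, not the daemon state.

```shell
# Hypothetical helper (not part of oc): reads `oc describe node` output on
# stdin and compares the machine-config currentConfig and desiredConfig
# annotations. Prints "pending <current> -> <desired>" when they differ,
# "in-sync <config>" when they match, "unknown" if either is missing.
config_drift() {
  awk '
    {
      # Scan every field so the key is found even on the "Annotations:" line.
      for (i = 1; i < NF; i++) {
        if ($i == "machineconfiguration.openshift.io/currentConfig:") cur = $(i + 1)
        if ($i == "machineconfiguration.openshift.io/desiredConfig:") des = $(i + 1)
      }
    }
    END {
      if (cur == "" || des == "") print "unknown"
      else if (cur == des)        print "in-sync " cur
      else                        print "pending " cur " -> " des
    }'
}
```

For worker-01 as described above, this would report the node as pending from rendered-worker-c503bf2b57a1b70532312aa9759be2e4 to rendered-worker-dc3d2b0d956fcf88a339b88ca07680bb, which matches the pool's spec and status configurations.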

