Bug 1828031 - one master node becomes SchedulingDisabled status after upgrade
Summary: one master node becomes SchedulingDisabled status after upgrade
Keywords:
Status: CLOSED DUPLICATE of bug 1826329
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Ryan Phillips
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-04-26 10:38 UTC by Yadan Pei
Modified: 2020-04-26 13:56 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-26 12:17:39 UTC
Target Upstream Version:
Embargoed:



Description Yadan Pei 2020-04-26 10:38:49 UTC
Description of problem:
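As in the title: after the upgrade, one master node (ugd-ci-vbrqz-m-1) remains in Ready,SchedulingDisabled status.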

Version-Release number of selected component (if applicable):
Upgrade from 4.3.17-x86_64 to 4.4.0-0.nightly-2020-04-25-061259

How reproducible:


Steps to Reproduce:
1. Upgrade the cluster to 4.4.0-0.nightly-2020-04-25-061259:
# oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-04-25-061259 --force=true --allow-explicit-upgrade=true
2. Post action, check node status:
# oc get node
NAME                                             STATUS                     ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                CONTAINER-RUNTIME
ugd-ci-vbrqz-m-0.c.openshift-qe.internal         Ready                      master   3h10m   v1.17.1   10.0.0.5                    Red Hat Enterprise Linux CoreOS 44.81.202004250133-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.17.4-6.dev.rhaos4.4.gitb5c490c.el8
ugd-ci-vbrqz-m-1.c.openshift-qe.internal         Ready,SchedulingDisabled   master   3h10m   v1.16.2   10.0.0.4                    Red Hat Enterprise Linux CoreOS 43.81.202004211653.0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.16.6-5.dev.rhaos4.3.git5fb6738.el8
ugd-ci-vbrqz-m-2.c.openshift-qe.internal         Ready                      master   3h10m   v1.17.1   10.0.0.6                    Red Hat Enterprise Linux CoreOS 44.81.202004250133-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.17.4-6.dev.rhaos4.4.gitb5c490c.el8
ugd-ci-vbrqz-w-a-29525.c.openshift-qe.internal   Ready                      worker   175m    v1.17.1   10.0.32.4                   Red Hat Enterprise Linux CoreOS 44.81.202004250133-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.17.4-6.dev.rhaos4.4.gitb5c490c.el8
ugd-ci-vbrqz-w-b-kf8vg.c.openshift-qe.internal   Ready                      worker   175m    v1.17.1   10.0.32.3                   Red Hat Enterprise Linux CoreOS 44.81.202004250133-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.17.4-6.dev.rhaos4.4.gitb5c490c.el8
ugd-ci-vbrqz-w-c-65vgn.c.openshift-qe.internal   Ready                      worker   176m    v1.17.1   10.0.32.2                   Red Hat Enterprise Linux CoreOS 44.81.202004250133-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.17.4-6.dev.rhaos4.4.gitb5c490c.el8
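To pick the affected node out of a table like the one above, filtering on the STATUS column is enough. This is a hypothetical sketch: `unschedulable_nodes` is an illustrative name, not an `oc` feature, and it assumes the `oc get node` table is piped in on stdin (with or without the NAME header row).

```shell
# Hypothetical helper: print the NAME of every node whose STATUS column
# contains "SchedulingDisabled" (e.g. "Ready,SchedulingDisabled").
# Reads `oc get node` table output on stdin.
unschedulable_nodes() {
  # Skip the header row; column 1 is NAME, column 2 is STATUS.
  awk '$1 != "NAME" && $2 ~ /SchedulingDisabled/ { print $1 }'
}
```

Usage: `oc get node | unschedulable_nodes`. On the table above it should print only ugd-ci-vbrqz-m-1.c.openshift-qe.internal.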


3. Post action, check ClusterOperator status:
# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.4.0-0.nightly-2020-04-25-061259   True        False         False      161m
cloud-credential                           4.4.0-0.nightly-2020-04-25-061259   True        False         False      3h10m
cluster-autoscaler                         4.4.0-0.nightly-2020-04-25-061259   True        False         False      174m
console                                    4.4.0-0.nightly-2020-04-25-061259   True        False         False      84m
csi-snapshot-controller                    4.4.0-0.nightly-2020-04-25-061259   True        False         False      91m
dns                                        4.4.0-0.nightly-2020-04-25-061259   True        False         False      3h4m
etcd                                       4.4.0-0.nightly-2020-04-25-061259   True        False         False      85m
image-registry                             4.4.0-0.nightly-2020-04-25-061259   True        False         False      88m
ingress                                    4.4.0-0.nightly-2020-04-25-061259   True        False         False      87m
insights                                   4.4.0-0.nightly-2020-04-25-061259   True        False         False      3h1m
kube-apiserver                             4.4.0-0.nightly-2020-04-25-061259   True        False         False      117m
kube-controller-manager                    4.4.0-0.nightly-2020-04-25-061259   True        False         False      3h2m
kube-scheduler                             4.4.0-0.nightly-2020-04-25-061259   True        False         False      115m
kube-storage-version-migrator              4.4.0-0.nightly-2020-04-25-061259   True        False         False      93m
machine-api                                4.4.0-0.nightly-2020-04-25-061259   True        False         False      3h5m
machine-config                             4.3.17                              False       True          True       76m
marketplace                                4.4.0-0.nightly-2020-04-25-061259   True        False         False      88m
monitoring                                 4.4.0-0.nightly-2020-04-25-061259   True        False         False      173m
network                                    4.4.0-0.nightly-2020-04-25-061259   True        False         False      3h5m
node-tuning                                4.4.0-0.nightly-2020-04-25-061259   False       True          False      91m
openshift-apiserver                        4.4.0-0.nightly-2020-04-25-061259   True        False         True       104m
openshift-controller-manager               4.4.0-0.nightly-2020-04-25-061259   True        False         False      3h2m
openshift-samples                          4.4.0-0.nightly-2020-04-25-061259   True        False         False      5m41s
operator-lifecycle-manager                 4.4.0-0.nightly-2020-04-25-061259   True        False         False      3h1m
operator-lifecycle-manager-catalog         4.4.0-0.nightly-2020-04-25-061259   True        False         False      3h1m
operator-lifecycle-manager-packageserver   4.4.0-0.nightly-2020-04-25-061259   True        False         False      83m
service-ca                                 4.4.0-0.nightly-2020-04-25-061259   True        False         False      3h5m
service-catalog-apiserver                  4.4.0-0.nightly-2020-04-25-061259   True        False         False      91m
service-catalog-controller-manager         4.4.0-0.nightly-2020-04-25-061259   True        False         False      160m
storage                                    4.4.0-0.nightly-2020-04-25-061259   True        False         False      108m
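The abnormal-operator check used in this report (AVAILABLE != True, PROGRESSING != False, or VERSION != target version) can be approximated with awk over the table above. A hypothetical sketch: `abnormal_cos` is an illustrative name, and it assumes every row has a VERSION value, as here; an operator with an empty VERSION column would shift the fields.

```shell
# Hypothetical helper: list ClusterOperators failing the upgrade check
# "AVAILABLE != True or PROGRESSING != False or VERSION != target".
# Reads `oc get co` table output on stdin; $1 is the target version.
abnormal_cos() {
  awk -v target="$1" '
    $1 == "NAME" { next }   # skip the header row
    # Columns: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
    $3 != "True" || $4 != "False" || $2 != target { print $1 }
  '
}
```

Usage: `oc get co | abnormal_cos 4.4.0-0.nightly-2020-04-25-061259`. For this table it flags machine-config and node-tuning. The check ignores DEGRADED, so openshift-apiserver (DEGRADED=True) is not listed.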


print detail msg for node(SchedulingDisabled) if exist:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Abnormal node details~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Name:               ugd-ci-vbrqz-m-1.c.openshift-qe.internal
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=n1-standard-4
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-central1
                    failure-domain.beta.kubernetes.io/zone=us-central1-b
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ugd-ci-vbrqz-m-1.c.openshift-qe.internal
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=
                    node.openshift.io/os_id=rhcos
Annotations:        machineconfiguration.openshift.io/currentConfig: rendered-master-3d8c18521e251ab2db1d44560d4bfe31
                    machineconfiguration.openshift.io/desiredConfig: rendered-master-aadcfe0319ef167c1ef1def893932fa0
                    machineconfiguration.openshift.io/reason: 
                    machineconfiguration.openshift.io/state: Working
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sat, 25 Apr 2020 12:37:28 -0400
Taints:             node-role.kubernetes.io/master:NoSchedule
                    node.kubernetes.io/unschedulable:NoSchedule
Unschedulable:      true
Lease:
  HolderIdentity:  ugd-ci-vbrqz-m-1.c.openshift-qe.internal
  AcquireTime:     <unset>
  RenewTime:       Sat, 25 Apr 2020 15:47:35 -0400
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Mon, 01 Jan 0001 00:00:00 +0000   Sat, 25 Apr 2020 12:38:23 -0400   RouteCreated                 openshift-sdn cleared kubelet-set NoRouteCreated
  MemoryPressure       False   Sat, 25 Apr 2020 15:46:47 -0400   Sat, 25 Apr 2020 12:37:28 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Sat, 25 Apr 2020 15:46:47 -0400   Sat, 25 Apr 2020 12:37:28 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Sat, 25 Apr 2020 15:46:47 -0400   Sat, 25 Apr 2020 12:37:28 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Sat, 25 Apr 2020 15:46:47 -0400   Sat, 25 Apr 2020 12:41:18 -0400   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   10.0.0.4
  ExternalIP:   
  InternalDNS:  ugd-ci-vbrqz-m-1.c.openshift-qe.internal
  Hostname:     ugd-ci-vbrqz-m-1.c.openshift-qe.internal
Capacity:
  attachable-volumes-gce-pd:  127
  cpu:                        4
  ephemeral-storage:          133665772Ki
  hugepages-1Gi:              0
  hugepages-2Mi:              0
  memory:                     15389448Ki
  pods:                       250
Allocatable:
  attachable-volumes-gce-pd:  127
  cpu:                        3500m
  ephemeral-storage:          123186375272
  hugepages-1Gi:              0
  hugepages-2Mi:              0
  memory:                     14775048Ki
  pods:                       250
System Info:
  Machine ID:                                   db4155615604d0e837f34816ea03c29f
  System UUID:                                  db415561-5604-d0e8-37f3-4816ea03c29f
  Boot ID:                                      ddc6a121-1841-4d8e-ae8d-09ba61aac5f3
  Kernel Version:                               4.18.0-147.8.1.el8_1.x86_64
  OS Image:                                     Red Hat Enterprise Linux CoreOS 43.81.202004211653.0 (Ootpa)
  Operating System:                             linux
  Architecture:                                 amd64
  Container Runtime Version:                    cri-o://1.16.6-5.dev.rhaos4.3.git5fb6738.el8
  Kubelet Version:                              v1.16.2
  Kube-Proxy Version:                           v1.16.2
ProviderID:                                     gce://openshift-qe/us-central1-b/ugd-ci-vbrqz-m-1
Non-terminated Pods:                            (19 in total)
  Namespace                                     Name                                                                 CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                                     ----                                                                 ------------  ----------  ---------------  -------------  ---
  kube-system                                   gcp-routes-controller-ugd-ci-vbrqz-m-1.c.openshift-qe.internal       20m (0%)      0 (0%)      50Mi (0%)        0 (0%)         3h9m
  openshift-apiserver                           apiserver-dzhzt                                                      150m (4%)     0 (0%)      200Mi (1%)       0 (0%)         3h1m
  openshift-controller-manager                  controller-manager-md4qz                                             100m (2%)     0 (0%)      100Mi (0%)       0 (0%)         107m
  openshift-dns                                 dns-default-j7ddq                                                    110m (3%)     0 (0%)      70Mi (0%)        512Mi (3%)     98m
  openshift-etcd                                etcd-ugd-ci-vbrqz-m-1.c.openshift-qe.internal                        430m (12%)    0 (0%)      860Mi (5%)       0 (0%)         118m
  openshift-image-registry                      node-ca-z7t66                                                        10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         107m
  openshift-kube-apiserver                      kube-apiserver-ugd-ci-vbrqz-m-1.c.openshift-qe.internal              330m (9%)     0 (0%)      1174Mi (8%)      0 (0%)         112m
  openshift-kube-controller-manager             kube-controller-manager-ugd-ci-vbrqz-m-1.c.openshift-qe.internal     100m (2%)     0 (0%)      500Mi (3%)       0 (0%)         113m
  openshift-kube-scheduler                      openshift-kube-scheduler-ugd-ci-vbrqz-m-1.c.openshift-qe.internal    20m (0%)      0 (0%)      100Mi (0%)       0 (0%)         110m
  openshift-machine-config-operator             machine-config-daemon-9nc8n                                          40m (1%)      0 (0%)      100Mi (0%)       0 (0%)         97m
  openshift-machine-config-operator             machine-config-server-r4wfp                                          20m (0%)      0 (0%)      50Mi (0%)        0 (0%)         93m
  openshift-monitoring                          node-exporter-xrl8m                                                  9m (0%)       0 (0%)      210Mi (1%)       0 (0%)         107m
  openshift-multus                              multus-admission-controller-f28b7                                    20m (0%)      0 (0%)      20Mi (0%)        0 (0%)         103m
  openshift-multus                              multus-j5ddj                                                         10m (0%)      0 (0%)      150Mi (1%)       0 (0%)         100m
  openshift-sdn                                 ovs-tx5nt                                                            100m (2%)     0 (0%)      400Mi (2%)       0 (0%)         104m
  openshift-sdn                                 sdn-controller-4rhr8                                                 10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         104m
  openshift-sdn                                 sdn-qwtdv                                                            100m (2%)     0 (0%)      200Mi (1%)       0 (0%)         105m
  openshift-service-catalog-apiserver           apiserver-b5qcl                                                      0 (0%)        0 (0%)      200Mi (1%)       0 (0%)         106m
  openshift-service-catalog-controller-manager  controller-manager-2n2cb                                             100m (2%)     0 (0%)      100Mi (0%)       0 (0%)         106m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests      Limits
  --------                   --------      ------
  cpu                        1679m (47%)   0 (0%)
  memory                     4544Mi (31%)  512Mi (3%)
  ephemeral-storage          0 (0%)        0 (0%)
  attachable-volumes-gce-pd  0             0
Events:
  Type    Reason              Age   From                                               Message
  ----    ------              ----  ----                                               -------
  Normal  NodeReady           3h6m  kubelet, ugd-ci-vbrqz-m-1.c.openshift-qe.internal  Node ugd-ci-vbrqz-m-1.c.openshift-qe.internal status is now: NodeReady
  Normal  NodeNotSchedulable  84m   kubelet, ugd-ci-vbrqz-m-1.c.openshift-qe.internal  Node ugd-ci-vbrqz-m-1.c.openshift-qe.internal status is now: NodeNotSchedulable
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
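The node details above show the update state on m-1: the currentConfig annotation (rendered-master-3d8c18521e251ab2db1d44560d4bfe31) still differs from desiredConfig (rendered-master-aadcfe0319ef167c1ef1def893932fa0), with state Working, and the node still runs RHCOS 43.81 with kubelet v1.16.2. A hypothetical awk sketch of that comparison on `oc describe node` text (`mco_update_pending` is an illustrative name; reading the annotations with a jsonpath query would be sturdier than parsing describe output):

```shell
# Hypothetical helper: read `oc describe node` text on stdin and report
# whether the machineconfiguration currentConfig differs from desiredConfig.
mco_update_pending() {
  awk '
    # The first annotation shares its line with the "Annotations:" label,
    # so match the annotation key anywhere and take the last field as value.
    /machineconfiguration\.openshift\.io\/currentConfig:/ { cur = $NF }
    /machineconfiguration\.openshift\.io\/desiredConfig:/ { des = $NF }
    /machineconfiguration\.openshift\.io\/state:/        { st = $NF }
    END {
      if (cur != des) {
        printf "pending: %s -> %s (state=%s)\n", cur, des, st
      } else {
        printf "up to date: %s\n", cur
      }
    }
  '
}
```

Usage: `oc describe node ugd-ci-vbrqz-m-1.c.openshift-qe.internal | mco_update_pending`. For the node above it should report the update from rendered-master-3d8c18521e251ab2db1d44560d4bfe31 to rendered-master-aadcfe0319ef167c1ef1def893932fa0 as pending, with state=Working.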


print detail msg for co(AVAILABLE != True or PROGRESSING!=False or version != target_version) if exist:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Abnormal co details==~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Name:         machine-config
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2020-04-25T16:41:21Z
  Generation:          1
  Resource Version:    122423
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/machine-config
  UID:                 3d94e739-5596-48eb-8f55-430dc87521a2
Spec:
Status:
  Conditions:
    Last Transition Time:  2020-04-25T18:31:01Z
    Message:               Cluster not available for 4.4.0-0.nightly-2020-04-25-061259
    Status:                False
    Type:                  Available
    Last Transition Time:  2020-04-25T18:10:13Z
    Message:               Working towards 4.4.0-0.nightly-2020-04-25-061259
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2020-04-25T18:31:01Z
    Message:               Unable to apply 4.4.0-0.nightly-2020-04-25-061259: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: controller version mismatch for rendered-master-3d8c18521e251ab2db1d44560d4bfe31 expected 5803275f05729eef2a5affc1ad437235c6981f68 has f6d1fe753cbcecb3aa1c2d3d3edd4a5d04ffca54, retrying
    Reason:                RequiredPoolsFailed
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2020-04-25T16:42:33Z
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:
  Related Objects:
    Group:     
    Name:      openshift-machine-config-operator
    Resource:  namespaces
    Group:     machineconfiguration.openshift.io
    Name:      master
    Resource:  machineconfigpools
    Group:     machineconfiguration.openshift.io
    Name:      worker
    Resource:  machineconfigpools
    Group:     machineconfiguration.openshift.io
    Name:      machine-config-controller
    Resource:  controllerconfigs
  Versions:
    Name:     operator
    Version:  4.3.17
Events:       <none>
Name:         node-tuning
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2020-04-25T16:46:29Z
  Generation:          1
  Resource Version:    71777
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/node-tuning
  UID:                 06f0b18e-2356-4b72-920c-5e8889b7e20a
Spec:
Status:
  Conditions:
    Last Transition Time:  2020-04-25T18:16:08Z
    Message:               DaemonSet "tuned" has no available Pod(s).
    Reason:                TunedUnavailable
    Status:                False
    Type:                  Available
    Last Transition Time:  2020-04-25T18:16:08Z
    Message:               Working towards "4.4.0-0.nightly-2020-04-25-061259"
    Reason:                Reconciling
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2020-04-25T16:46:30Z
    Message:               DaemonSet "tuned" available
    Reason:                AsExpected
    Status:                False
    Type:                  Degraded
  Extension:               <nil>
  Related Objects:
    Group:      
    Name:       openshift-cluster-node-tuning-operator
    Resource:   namespaces
    Group:      tuned.openshift.io
    Name:       default
    Namespace:  openshift-cluster-node-tuning-operator
    Resource:   Tuned
    Group:      apps
    Name:       tuned
    Namespace:  openshift-cluster-node-tuning-operator
    Resource:   DaemonSet
  Versions:
    Name:     operator
    Version:  4.4.0-0.nightly-2020-04-25-061259
Events:       <none>

Actual results:
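1. Master node ugd-ci-vbrqz-m-1 stays in Ready,SchedulingDisabled and is still on RHCOS 43.81 with kubelet v1.16.2; the machine-config ClusterOperator is Degraded (still at 4.3.17), and openshift-apiserver has not upgraded successfully (DEGRADED=True).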

Expected results:
1. The upgrade completes successfully.

Additional info:

