Description: MCO updates are stalled during CNI migration from SDN to OVN-Kubernetes, and 2 out of 3 master nodes never receive the MCO updates.

How reproducible: First encountered with an older build last week; reproducible again with the latest build.

Steps to Reproduce:
1) Deploy an OCP 4.7 cluster with the latest build.
```
# oc version
Client Version: 4.7.0-0.nightly-ppc64le-2021-01-18-024748
Server Version: 4.7.0-0.nightly-ppc64le-2021-01-18-024748
Kubernetes Version: v1.20.0+d9c52cc
```
2) Follow the steps in https://docs.openshift.com/container-platform/4.6/networking/ovn_kubernetes_network_provider/migrate-from-openshift-sdn.html to migrate the CNI from SDN to OVN-Kubernetes.
3) All commands completed fine up to and including step 7, i.e., the manual reboot of all nodes in the cluster.
4) Re-enable the machine config pool updates as described in step 8. The machine config pool update for the worker nodes completed fine.

Issue 1:
=======
The master-0 node remains in SchedulingDisabled state due to the following error.
```
[root@arc-npv-ovn-bastion ~]# oc get nodes
NAME       STATUS                     ROLES    AGE     VERSION
master-0   Ready,SchedulingDisabled   master   3h11m   v1.20.0+d9c52cc
master-1   Ready                      master   3h11m   v1.20.0+d9c52cc
master-2   Ready                      master   3h11m   v1.20.0+d9c52cc
worker-0   Ready                      worker   175m    v1.20.0+d9c52cc
worker-1   Ready                      worker   174m    v1.20.0+d9c52cc
[root@arc-npv-ovn-bastion ~]# oc describe node master-0
Name:               master-0
Roles:              master
Labels:             beta.kubernetes.io/arch=ppc64le
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=ppc64le
                    kubernetes.io/hostname=master-0
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=
                    node.openshift.io/os_id=rhcos
Annotations:        k8s.ovn.org/l3-gateway-config: {"default":{"mode":"local","mac-address":"0a:58:09:72:62:8c","ip-addresses":["9.114.98.140/22"],"ip-address":"9.114.98.140/22","next-hops"...
k8s.ovn.org/node-chassis-id: 7661ce35-d4c2-4508-8ccd-2b17d26ce29d k8s.ovn.org/node-local-nat-ip: {"default":["169.254.9.224"]} k8s.ovn.org/node-mgmt-port-mac-address: d6:0e:04:f0:c3:d1 k8s.ovn.org/node-subnets: {"default":"10.131.0.0/23"} machineconfiguration.openshift.io/currentConfig: rendered-master-7fd61bf26aa8bc5527e461c69134d6e4 machineconfiguration.openshift.io/desiredConfig: rendered-master-8b39ed05a2ffacc7c762c92801ed688d machineconfiguration.openshift.io/reason: failed to drain node (5 tries): timed out waiting for the condition: error when evicting pod "etcd-quorum-guard-7db666dcff-p4sf7": global ... machineconfiguration.openshift.io/state: Degraded nfd.node.kubernetes.io/master.version: 1.15 volumes.kubernetes.io/controller-managed-attach-detach: true CreationTimestamp: Mon, 18 Jan 2021 03:15:54 -0500 Taints: node-role.kubernetes.io/master:NoSchedule node.kubernetes.io/unschedulable:NoSchedule Unschedulable: true Lease: HolderIdentity: master-0 AcquireTime: <unset> RenewTime: Mon, 18 Jan 2021 07:34:04 -0500 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- MemoryPressure False Mon, 18 Jan 2021 07:32:12 -0500 Mon, 18 Jan 2021 05:31:28 -0500 KubeletHasSufficientMemory kubelet has sufficient memory available DiskPressure False Mon, 18 Jan 2021 07:32:12 -0500 Mon, 18 Jan 2021 05:31:28 -0500 KubeletHasNoDiskPressure kubelet has no disk pressure PIDPressure False Mon, 18 Jan 2021 07:32:12 -0500 Mon, 18 Jan 2021 05:31:28 -0500 KubeletHasSufficientPID kubelet has sufficient PID available Ready True Mon, 18 Jan 2021 07:32:12 -0500 Mon, 18 Jan 2021 05:31:38 -0500 KubeletReady kubelet is posting ready status Addresses: InternalIP: 9.114.98.140 Hostname: master-0 Capacity: cpu: 8 ephemeral-storage: 125420524Ki hugepages-16Gi: 0 hugepages-16Mi: 0 memory: 33454272Ki pods: 250 Allocatable: cpu: 7500m ephemeral-storage: 114513812904 hugepages-16Gi: 0 hugepages-16Mi: 0 memory: 
32303296Ki pods: 250 System Info: Machine ID: ebfd53285ff44128b92bf199f18b43f8 System UUID: IBM,0213C5C9W Boot ID: ee20a5b0-1730-4c1f-998c-9b814bebcb72 Kernel Version: 4.18.0-240.10.1.el8_3.ppc64le OS Image: Red Hat Enterprise Linux CoreOS 47.83.202101180112-0 (Ootpa) Operating System: linux Architecture: ppc64le Container Runtime Version: cri-o://1.20.0-0.rhaos4.7.gitd9f17c8.el8.42 Kubelet Version: v1.20.0+d9c52cc Kube-Proxy Version: v1.20.0+d9c52cc Non-terminated Pods: (21 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE --------- ---- ------------ ---------- --------------- ------------- --- openshift-cluster-node-tuning-operator tuned-756zl 10m (0%) 0 (0%) 50Mi (0%) 0 (0%) 4h15m openshift-controller-manager controller-manager-xblcw 100m (1%) 0 (0%) 100Mi (0%) 0 (0%) 3h48m openshift-dns dns-default-nlrz8 65m (0%) 0 (0%) 131Mi (0%) 0 (0%) 4h15m openshift-etcd etcd-master-0 430m (5%) 0 (0%) 860Mi (2%) 0 (0%) 4h7m openshift-etcd etcd-quorum-guard-7db666dcff-p4sf7 10m (0%) 0 (0%) 5Mi (0%) 0 (0%) 164m openshift-image-registry node-ca-d2zw8 10m (0%) 0 (0%) 10Mi (0%) 0 (0%) 4h10m openshift-kube-apiserver kube-apiserver-master-0 340m (4%) 0 (0%) 1224Mi (3%) 0 (0%) 3h40m openshift-kube-controller-manager kube-controller-manager-master-0 100m (1%) 0 (0%) 500Mi (1%) 0 (0%) 4h4m openshift-kube-scheduler openshift-kube-scheduler-master-0 25m (0%) 0 (0%) 150Mi (0%) 0 (0%) 4h8m openshift-machine-config-operator machine-config-daemon-5r99g 40m (0%) 0 (0%) 100Mi (0%) 0 (0%) 4h16m openshift-machine-config-operator machine-config-server-8bxgm 20m (0%) 0 (0%) 50Mi (0%) 0 (0%) 4h16m openshift-monitoring node-exporter-fxrzj 9m (0%) 0 (0%) 210Mi (0%) 0 (0%) 4h16m openshift-multus multus-admission-controller-79s2p 20m (0%) 0 (0%) 20Mi (0%) 0 (0%) 4h17m openshift-multus multus-tvrgp 10m (0%) 0 (0%) 150Mi (0%) 0 (0%) 133m openshift-multus network-metrics-daemon-qbxp6 20m (0%) 0 (0%) 120Mi (0%) 0 (0%) 4h18m openshift-network-diagnostics 
network-check-target-pb6zb 10m (0%) 0 (0%) 150Mi (0%) 0 (0%) 4h17m openshift-operators nfd-master-dc8wt 0 (0%) 0 (0%) 0 (0%) 0 (0%) 149m openshift-ovn-kubernetes ovnkube-master-49xgj 50m (0%) 0 (0%) 1220Mi (3%) 0 (0%) 133m openshift-ovn-kubernetes ovnkube-node-mc6q5 30m (0%) 0 (0%) 620Mi (1%) 0 (0%) 133m openshift-ovn-kubernetes ovs-node-nltmd 100m (1%) 0 (0%) 300Mi (0%) 0 (0%) 133m powervm-rmc powervm-rmc-pjl2z 100m (1%) 0 (0%) 500Mi (1%) 1Gi (3%) 3h48m Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) Resource Requests Limits -------- -------- ------ cpu 1499m (19%) 0 (0%) memory 6470Mi (20%) 1Gi (3%) ephemeral-storage 0 (0%) 0 (0%) hugepages-16Gi 0 (0%) 0 (0%) hugepages-16Mi 0 (0%) 0 (0%) Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal NodeHasNoDiskPressure 4h10m (x17 over 4h18m) kubelet Node master-0 status is now: NodeHasNoDiskPressure Normal NodeHasSufficientMemory 4h8m (x18 over 4h18m) kubelet Node master-0 status is now: NodeHasSufficientMemory Normal NodeNotSchedulable 164m (x2 over 3h1m) kubelet Node master-0 status is now: NodeNotSchedulable Normal NodeSchedulable 163m (x2 over 3h) kubelet Node master-0 status is now: NodeSchedulable Normal NodeHasNoDiskPressure 122m (x2 over 122m) kubelet Node master-0 status is now: NodeHasNoDiskPressure Normal NodeHasSufficientMemory 122m (x2 over 122m) kubelet Node master-0 status is now: NodeHasSufficientMemory Normal Starting 122m kubelet Starting kubelet. 
Normal NodeHasSufficientPID 122m (x2 over 122m) kubelet Node master-0 status is now: NodeHasSufficientPID
Warning Rebooted 122m kubelet Node master-0 has been rebooted, boot id: ee20a5b0-1730-4c1f-998c-9b814bebcb72
Normal NodeNotReady 122m kubelet Node master-0 status is now: NodeNotReady
Normal NodeAllocatableEnforced 122m kubelet Updated Node Allocatable limit across pods
Normal NodeReady 122m kubelet Node master-0 status is now: NodeReady
Normal NodeNotSchedulable 78m kubelet Node master-0 status is now: NodeNotSchedulable
[root@arc-npv-ovn-bastion ~]# oc get pods -n openshift-etcd
NAME                                 READY   STATUS              RESTARTS   AGE
etcd-master-0                        3/3     Running             0          4h7m
etcd-master-1                        0/3     Init:0/2            0          4h9m
etcd-master-2                        3/3     Running             0          4h6m
etcd-quorum-guard-7db666dcff-m6kfd   1/1     Running             0          163m
etcd-quorum-guard-7db666dcff-p4sf7   1/1     Running             0          165m
etcd-quorum-guard-7db666dcff-tpmmx   0/1     ContainerCreating   0          162m
revision-pruner-3-master-1           0/1     Completed           0          126m
revision-pruner-3-master-2           0/1     Completed           0          162m
```
Workaround tried to recover master-0: manually delete the pod that failed to be evicted. After that, master-0 returned to Ready state and the master machine config pool pushed the MCO update to master-0.
```
[root@arc-npv-ovn-bastion ~]# oc get nodes
NAME       STATUS   ROLES    AGE     VERSION
master-0   Ready    master   4h35m   v1.20.0+d9c52cc
master-1   Ready    master   4h35m   v1.20.0+d9c52cc
master-2   Ready    master   4h34m   v1.20.0+d9c52cc
worker-0   Ready    worker   4h18m   v1.20.0+d9c52cc
worker-1   Ready    worker   4h17m   v1.20.0+d9c52cc
[root@arc-npv-ovn-bastion ~]# oc delete pod etcd-quorum-guard-7db666dcff-p4sf7 -n openshift-etcd
pod "etcd-quorum-guard-7db666dcff-p4sf7" deleted
```
Expected results:
-----------------
The etcd-quorum-guard pod should have been evicted and master-0 should have returned to Ready state.

Issue 2:
======
After waiting a few hours, the master machine config pool updates for master-1 and master-2 never got triggered at all.
``` [root@arc-npv-ovn-bastion ~]# oc get machineconfigpool -n openshift-machine-config-operator NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE master rendered-master-7fd61bf26aa8bc5527e461c69134d6e4 False True False 3 1 1 0 4h51m worker rendered-worker-d048988f4915c29580fcd159da4c91bf True False False 2 2 2 0 4h51m [root@arc-npv-ovn-bastion ~]# oc get pods -n openshift-machine-config-operator -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES machine-config-controller-76f78dc4bd-vq6wr 1/1 Running 2 3h22m 10.128.2.18 master-2 <none> <none> machine-config-daemon-29hmr 2/2 Running 0 4h42m 9.114.98.155 worker-0 <none> <none> machine-config-daemon-4x7fr 2/2 Running 0 4h57m 9.114.98.159 master-2 <none> <none> machine-config-daemon-5r99g 2/2 Running 0 4h57m 9.114.98.140 master-0 <none> <none> machine-config-daemon-6lqwq 2/2 Running 0 4h41m 9.114.98.151 worker-1 <none> <none> machine-config-daemon-vjl48 0/2 ContainerCreating 0 4h57m 9.114.98.146 master-1 <none> <none> machine-config-operator-869ffdf466-ktpfp 0/1 ContainerCreating 0 118m <none> master-1 <none> <none> machine-config-server-8bxgm 1/1 Running 0 4h57m 9.114.98.140 master-0 <none> <none> machine-config-server-rqqn2 0/1 ContainerCreating 0 4h57m 9.114.98.146 master-1 <none> <none> machine-config-server-wfmlk 1/1 Running 0 4h57m 9.114.98.159 master-2 <none> <none> [root@arc-npv-ovn-bastion ~]# oc describe pod machine-config-daemon-vjl48 -n openshift-machine-config-operator Name: machine-config-daemon-vjl48 Namespace: openshift-machine-config-operator Priority: 2000001000 Priority Class Name: system-node-critical Node: master-1/9.114.98.146 Start Time: Mon, 18 Jan 2021 03:17:12 -0500 Labels: controller-revision-hash=797bd6c55 k8s-app=machine-config-daemon pod-template-generation=1 Annotations: <none> Status: Pending IP: 9.114.98.146 IPs: IP: 9.114.98.146 Controlled By: DaemonSet/machine-config-daemon Containers: 
machine-config-daemon: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f063c5233971c0d46ec5f9af49b7329d7def6f5caee5fcea86de85e5aaba0a10 Image ID: Port: <none> Host Port: <none> Command: /usr/bin/machine-config-daemon Args: start State: Waiting Reason: ContainerCreating Ready: False Restart Count: 0 Requests: cpu: 20m memory: 50Mi Environment: NODE_NAME: (v1:spec.nodeName) Mounts: /rootfs from rootfs (rw) /var/run/secrets/kubernetes.io/serviceaccount from machine-config-daemon-token-59f2j (ro) oauth-proxy: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:075ce49038d88ef3507d7212be71f57382ff15ad6d8fac4292556d89ad4626e2 Image ID: Port: 9001/TCP Host Port: 9001/TCP Args: --https-address=:9001 --provider=openshift --openshift-service-account=machine-config-daemon --upstream=http://127.0.0.1:8797 --tls-cert=/etc/tls/private/tls.crt --tls-key=/etc/tls/private/tls.key --cookie-secret-file=/etc/tls/cookie-secret/cookie-secret --openshift-sar={"resource": "namespaces", "verb": "get"} --openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}} State: Waiting Reason: ContainerCreating Ready: False Restart Count: 0 Requests: cpu: 20m memory: 50Mi Environment: <none> Mounts: /etc/tls/cookie-secret from cookie-secret (rw) /etc/tls/private from proxy-tls (rw) /var/run/secrets/kubernetes.io/serviceaccount from machine-config-daemon-token-59f2j (ro) Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True Volumes: rootfs: Type: HostPath (bare host directory volume) Path: / HostPathType: proxy-tls: Type: Secret (a volume populated by a Secret) SecretName: proxy-tls Optional: false cookie-secret: Type: Secret (a volume populated by a Secret) SecretName: cookie-secret Optional: false machine-config-daemon-token-59f2j: Type: Secret (a volume populated by a Secret) SecretName: machine-config-daemon-token-59f2j Optional: false QoS Class: Burstable Node-Selectors: kubernetes.io/os=linux 
Tolerations: op=Exists Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 4h58m default-scheduler Successfully assigned openshift-machine-config-operator/machine-config-daemon-vjl48 to master-1 Warning FailedMount 4h58m (x6 over 4h58m) kubelet MountVolume.SetUp failed for volume "proxy-tls" : secret "proxy-tls" not found Normal Pulled 4h57m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f063c5233971c0d46ec5f9af49b7329d7def6f5caee5fcea86de85e5aaba0a10" already present on machine Normal Created 4h57m kubelet Created container machine-config-daemon Normal Started 4h57m kubelet Started container machine-config-daemon Normal Pulling 4h57m kubelet Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:075ce49038d88ef3507d7212be71f57382ff15ad6d8fac4292556d89ad4626e2" Normal Pulled 4h57m kubelet Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:075ce49038d88ef3507d7212be71f57382ff15ad6d8fac4292556d89ad4626e2" in 2.504461363s Normal Created 4h57m kubelet Created container oauth-proxy Normal Started 4h57m kubelet Started container oauth-proxy Warning NodeNotReady 4h50m node-controller Node is not ready Warning NodeNotReady 159m node-controller Node is not ready Warning FailedCreatePodSandBox 157m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to mount container k8s_POD_machine-config-daemon-vjl48_openshift-machine-config-operator_47baeded-e186-488b-9136-70e1c8306c44_0 in pod sandbox k8s_machine-config-daemon-vjl48_openshift-machine-config-operator_47baeded-e186-488b-9136-70e1c8306c44_0(be255fa971dec31b3490e38f3748432cc1b42bc694836e16b8d915fa52a39b9c): error recreating the missing symlinks: error reading name of symlink for &{"f67273f1e7595755a0261036e8a8368fc5dd6ed95c239e40d46f87f0413044dc" '\x14' %!q(os.FileMode=2147484096) {%!q(uint64=117472805) %!q(int64=63746554216) %!q(*time.Location=&{Local [{UTC 0 false}] [{-576460752303423488 0 
false false}] UTC0 9223372036854775807 9223372036854775807 0xc00030ac20})} {'࠴' %!q(uint64=119537899) '\x03' '䇀' '\x00' '\x00' '\x00' '\x00' '\x14' '𐀀' '\x00' {%!q(int64=1610957416) %!q(int64=217472806)} {%!q(int64=1610957416) %!q(int64=117472805)} {%!q(int64=1610957416) %!q(int64=117472805)} '\x00' '\x00' '\x00'}}: open /var/lib/containers/storage/overlay/f67273f1e7595755a0261036e8a8368fc5dd6ed95c239e40d46f87f0413044dc/link: no such file or directory Warning FailedCreatePodSandBox 157m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to mount container k8s_POD_machine-config-daemon-vjl48_openshift-machine-config-operator_47baeded-e186-488b-9136-70e1c8306c44_0 in pod sandbox k8s_machine-config-daemon-vjl48_openshift-machine-config-operator_47baeded-e186-488b-9136-70e1c8306c44_0(bc82bc6526fad3aaa245b7cb2680d8d199b6179575769b7e5726683ba678eeef): error recreating the missing symlinks: error reading name of symlink for &{"f67273f1e7595755a0261036e8a8368fc5dd6ed95c239e40d46f87f0413044dc" '\x14' %!q(os.FileMode=2147484096) {%!q(uint64=117472805) %!q(int64=63746554216) %!q(*time.Location=&{Local [{UTC 0 false}] [{-576460752303423488 0 false false}] UTC0 9223372036854775807 9223372036854775807 0xc00030ac20})} {'࠴' %!q(uint64=119537899) '\x03' '䇀' '\x00' '\x00' '\x00' '\x00' '\x14' '𐀀' '\x00' {%!q(int64=1610957416) %!q(int64=217472806)} {%!q(int64=1610957416) %!q(int64=117472805)} {%!q(int64=1610957416) %!q(int64=117472805)} '\x00' '\x00' '\x00'}}: open /var/lib/containers/storage/overlay/f67273f1e7595755a0261036e8a8368fc5dd6ed95c239e40d46f87f0413044dc/link: no such file or directory Warning FailedCreatePodSandBox 156m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to mount container k8s_POD_machine-config-daemon-vjl48_openshift-machine-config-operator_47baeded-e186-488b-9136-70e1c8306c44_0 in pod sandbox 
k8s_machine-config-daemon-vjl48_openshift-machine-config-operator_47baeded-e186-488b-9136-70e1c8306c44_0(b6224ba02e630d5f77cd3075166c27633714f05159dcbd2e62a7f80b972b26c6): error recreating the missing symlinks: error reading name of symlink for &{"f67273f1e7595755a0261036e8a8368fc5dd6ed95c239e40d46f87f0413044dc" '\x14' %!q(os.FileMode=2147484096) {%!q(uint64=117472805) %!q(int64=63746554216) %!q(*time.Location=&{Local [{UTC 0 false}] [{-576460752303423488 0 false false}] UTC0 9223372036854775807 9223372036854775807 0xc00030ac20})} {'࠴' %!q(uint64=119537899) '\x03' '䇀' '\x00' '\x00' '\x00' '\x00' '\x14' '𐀀' '\x00' {%!q(int64=1610957416) %!q(int64=217472806)} {%!q(int64=1610957416) %!q(int64=117472805)} {%!q(int64=1610957416) %!q(int64=117472805)} '\x00' '\x00' '\x00'}}: open /var/lib/containers/storage/overlay/f67273f1e7595755a0261036e8a8368fc5dd6ed95c239e40d46f87f0413044dc/link: no such file or directory Warning FailedCreatePodSandBox 156m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to mount container k8s_POD_machine-config-daemon-vjl48_openshift-machine-config-operator_47baeded-e186-488b-9136-70e1c8306c44_0 in pod sandbox k8s_machine-config-daemon-vjl48_openshift-machine-config-operator_47baeded-&{"f67273f1e7595755a0261036e8a8368fc5dd6ed95c239e40d46f87f0413044dc" '\x14' %!q(os.FileMode=2147484096) {%!q(uint64=117472805) %!q(int64=63746554216) %!q(*time.Location=&{Local [{UTC 0 false}] [{-576460752303423488 0 false false}] UTC0 9223372036854775807 9223372036854775807 0xc00030ac20})} {'࠴' %!q(uint64=119537899) '\x03' '䇀' '\x00' '\x00' '\x00' '\x00' '\x14' '𐀀' '\x00' {%!q(int64=1610957416) %!q(int64=217472806)} {%!q(int64=1610957416) %!q(int64=117472805)} {%!q(int64=1610957416) %!q(int64=117472805)} '\x00' '\x00' '\x00'}}: open /var/lib/containers/storage/overlay/f67273f1e7595755a0261036e8a8368fc5dd6ed95c239e40d46f87f0413044dc/link: no such file or directory
```
Expected results:
-----------------
Even if the MCO update for one node is stuck, the pool should still proceed with the other nodes until the fault on the stuck node is fixed.

Workaround tried: manually reboot node master-1. This ran into issue 3, described below.

Issue 3:
=======
The master-1 node remains in NotReady state, and the crio and kubelet services are not running. Even when started manually, the services hang and never come up.
```
[core@master-1 ~]$ sudo su
[root@master-1 core]# systemctl start crio
^C
[core@master-1 ~]$ sudo su
[root@master-1 core]# systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-mco-default-env.conf, 20-logging.conf, 20-nodenet.conf
   Active: inactive (dead)
[root@master-1 core]# systemctl start kubelet
^C
```
Expected result:
---------------
The master-1 node should return to Ready state after a manual reboot.
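For context, step 8 of the linked migration procedure resumes the machine config pools that were paused earlier in the migration, after which the MCO rolls the new rendered config out node by node. A hedged sketch of the commands involved (adapted from the linked migration doc; verify the exact payloads against the doc for your release) and of how the per-node MCO state seen in the annotations above can be inspected:
```
# Resume MCO updates after the manual node reboots (step 8 of the migration doc)
oc patch MachineConfigPool master --type='merge' --patch '{ "spec": { "paused": false } }'
oc patch MachineConfigPool worker --type='merge' --patch '{ "spec": { "paused": false } }'

# Watch pool progress and the per-node MCO state annotations
oc get machineconfigpool
oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.machineconfiguration\.openshift\.io/state}{"\n"}{end}'
```
In this reproduction the pools were resumed, but only master-0 ever moved past the Working state; the annotation check above is a quick way to see which masters never left their old rendered config.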
Additional info:
Adding node and MCO related events output. It is clear from this data that the node drain never happens on master-1 and master-2, so the machine config pool updates for them never get started.
```
# oc get events
5h9m Normal OperatorVersionChanged /machine-config clusteroperator/machine-config-operator started a version change from [] to [{operator 4.7.0-0.nightly-ppc64le-2021-01-18-024748}] 5h7m Normal OperatorVersionChanged /machine-config clusteroperator/machine-config-operator version changed from [] to [{operator 4.7.0-0.nightly-ppc64le-2021-01-18-024748}] 5h1m Normal NodeHasSufficientMemory node/master-0 Node master-0 status is now: NodeHasSufficientMemory 5h2m Normal NodeHasNoDiskPressure node/master-0 Node master-0 status is now: NodeHasNoDiskPressure 5h10m Normal RegisteredNode node/master-0 Node master-0 event: Registered Node master-0 in Controller 5h9m Normal Starting node/master-0 openshift-sdn done initializing node networking. 5h8m Normal NodeDone node/master-0 Setting node master-0, currentConfig rendered-master-0b5c44dc253e57776c4ead0f3bf7fc43 to Done 5h6m Normal RegisteredNode node/master-0 Node master-0 event: Registered Node master-0 in Controller 5h4m Normal RegisteredNode node/master-0 Node master-0 event: Registered Node master-0 in Controller 5h4m Normal NodeNotReady node/master-0 Node master-0 status is now: NodeNotReady 5h1m Normal RegisteredNode node/master-0 Node master-0 event: Registered Node master-0 in Controller 5h1m Normal NodeNotReady node/master-0 Node master-0 status is now: NodeNotReady 5h Normal RegisteredNode node/master-0 Node master-0 event: Registered Node master-0 in Controller 4h58m Normal RegisteredNode node/master-0 Node master-0 event: Registered Node master-0 in Controller 4h57m Normal RegisteredNode node/master-0 Node master-0 event: Registered Node master-0 in Controller 3h37m Normal Drain node/master-0 Draining node to update config. 
3h37m Normal NodeNotSchedulable node/master-0 Node master-0 status is now: NodeNotSchedulable 3h35m Normal OSUpdateStarted node/master-0 3h35m Normal OSUpdateStaged node/master-0 Changes to OS staged 3h52m Normal PendingConfig node/master-0 Written pending config rendered-master-fbe0b855c426cca34f099204f8149f73 3h52m Normal SkipReboot node/master-0 Config changes do not require reboot. 3h52m Normal NodeDone node/master-0 Setting node master-0, currentConfig rendered-master-fbe0b855c426cca34f099204f8149f73 to Done 3h35m Normal NodeSchedulable node/master-0 Node master-0 status is now: NodeSchedulable 3h35m Normal PendingConfig node/master-0 Written pending config rendered-master-7fd61bf26aa8bc5527e461c69134d6e4 3h35m Normal SkipReboot node/master-0 Config changes do not require reboot. Service crio was reloaded. 3h35m Normal NodeDone node/master-0 Setting node master-0, currentConfig rendered-master-7fd61bf26aa8bc5527e461c69134d6e4 to Done 178m Normal RegisteredNode node/master-0 Node master-0 event: Registered Node master-0 in Controller 177m Normal NodeNotReady node/master-0 Node master-0 status is now: NodeNotReady 175m Normal Starting node/master-0 Starting kubelet. 
175m Normal NodeHasSufficientMemory node/master-0 Node master-0 status is now: NodeHasSufficientMemory 175m Normal NodeHasNoDiskPressure node/master-0 Node master-0 status is now: NodeHasNoDiskPressure 175m Normal NodeHasSufficientPID node/master-0 Node master-0 status is now: NodeHasSufficientPID 175m Warning Rebooted node/master-0 Node master-0 has been rebooted, boot id: ee20a5b0-1730-4c1f-998c-9b814bebcb72 175m Normal NodeNotReady node/master-0 Node master-0 status is now: NodeNotReady 175m Normal NodeAllocatableEnforced node/master-0 Updated Node Allocatable limit across pods 174m Normal NodeReady node/master-0 Node master-0 status is now: NodeReady 170m Normal RegisteredNode node/master-0 Node master-0 event: Registered Node master-0 in Controller 160m Normal RegisteredNode node/master-0 Node master-0 event: Registered Node master-0 in Controller 59m Normal Drain node/master-0 Draining node to update config. 130m Normal NodeNotSchedulable node/master-0 Node master-0 status is now: NodeNotSchedulable 59m Warning FailedToDrain node/master-0 5 tries: error when evicting pod "etcd-quorum-guard-7db666dcff-p4sf7": global timeout reached: 1m30s 49m Normal OSUpdateStarted node/master-0 49m Normal OSUpdateStaged node/master-0 Changes to OS staged 49m Normal PendingConfig node/master-0 Written pending config rendered-master-8b39ed05a2ffacc7c762c92801ed688d 49m Normal Reboot node/master-0 Node will reboot into config rendered-master-8b39ed05a2ffacc7c762c92801ed688d 44m Normal Starting node/master-0 Starting kubelet. 
44m Normal NodeHasSufficientMemory node/master-0 Node master-0 status is now: NodeHasSufficientMemory 44m Normal NodeHasNoDiskPressure node/master-0 Node master-0 status is now: NodeHasNoDiskPressure 44m Normal NodeHasSufficientPID node/master-0 Node master-0 status is now: NodeHasSufficientPID 44m Normal NodeAllocatableEnforced node/master-0 Updated Node Allocatable limit across pods 43m Normal RegisteredNode node/master-0 Node master-0 event: Registered Node master-0 in Controller 42m Normal NodeDone node/master-0 Setting node master-0, currentConfig rendered-master-8b39ed05a2ffacc7c762c92801ed688d to Done 40m Normal RegisteredNode node/master-0 Node master-0 event: Registered Node master-0 in Controller 5h1m Normal NodeHasSufficientMemory node/master-1 Node master-1 status is now: NodeHasSufficientMemory 5h1m Normal NodeHasNoDiskPressure node/master-1 Node master-1 status is now: NodeHasNoDiskPressure 5h10m Normal RegisteredNode node/master-1 Node master-1 event: Registered Node master-1 in Controller 5h9m Normal Starting node/master-1 openshift-sdn done initializing node networking. 5h8m Normal NodeDone node/master-1 Setting node master-1, currentConfig rendered-master-0b5c44dc253e57776c4ead0f3bf7fc43 to Done 5h6m Normal RegisteredNode node/master-1 Node master-1 event: Registered Node master-1 in Controller 5h4m Normal RegisteredNode node/master-1 Node master-1 event: Registered Node master-1 in Controller 5h1m Normal RegisteredNode node/master-1 Node master-1 event: Registered Node master-1 in Controller 5h1m Normal NodeNotReady node/master-1 Node master-1 status is now: NodeNotReady 5h Normal RegisteredNode node/master-1 Node master-1 event: Registered Node master-1 in Controller 4h58m Normal RegisteredNode node/master-1 Node master-1 event: Registered Node master-1 in Controller 4h57m Normal RegisteredNode node/master-1 Node master-1 event: Registered Node master-1 in Controller 3h34m Normal Drain node/master-1 Draining node to update config. 
3h34m Normal NodeNotSchedulable node/master-1 Node master-1 status is now: NodeNotSchedulable 3h30m Normal OSUpdateStarted node/master-1 3h30m Normal OSUpdateStaged node/master-1 Changes to OS staged 3h46m Normal PendingConfig node/master-1 Written pending config rendered-master-fbe0b855c426cca34f099204f8149f73 3h46m Normal SkipReboot node/master-1 Config changes do not require reboot. 3h46m Normal NodeDone node/master-1 Setting node master-1, currentConfig rendered-master-fbe0b855c426cca34f099204f8149f73 to Done 3h30m Normal NodeSchedulable node/master-1 Node master-1 status is now: NodeSchedulable 3h30m Normal PendingConfig node/master-1 Written pending config rendered-master-7fd61bf26aa8bc5527e461c69134d6e4 3h30m Normal SkipReboot node/master-1 Config changes do not require reboot. Service crio was reloaded. 3h30m Normal NodeDone node/master-1 Setting node master-1, currentConfig rendered-master-7fd61bf26aa8bc5527e461c69134d6e4 to Done 178m Normal RegisteredNode node/master-1 Node master-1 event: Registered Node master-1 in Controller 170m Normal RegisteredNode node/master-1 Node master-1 event: Registered Node master-1 in Controller 170m Normal NodeNotReady node/master-1 Node master-1 status is now: NodeNotReady 168m Normal Starting node/master-1 Starting kubelet. 
168m Normal NodeHasSufficientMemory node/master-1 Node master-1 status is now: NodeHasSufficientMemory 168m Normal NodeHasNoDiskPressure node/master-1 Node master-1 status is now: NodeHasNoDiskPressure 168m Normal NodeHasSufficientPID node/master-1 Node master-1 status is now: NodeHasSufficientPID 168m Warning Rebooted node/master-1 Node master-1 has been rebooted, boot id: 4315fec9-3dc0-4bae-bcb7-8e882e3ad211 168m Normal NodeNotReady node/master-1 Node master-1 status is now: NodeNotReady 168m Normal NodeAllocatableEnforced node/master-1 Updated Node Allocatable limit across pods 167m Normal NodeReady node/master-1 Node master-1 status is now: NodeReady 160m Normal RegisteredNode node/master-1 Node master-1 event: Registered Node master-1 in Controller 43m Normal RegisteredNode node/master-1 Node master-1 event: Registered Node master-1 in Controller 40m Normal RegisteredNode node/master-1 Node master-1 event: Registered Node master-1 in Controller 5h9m Normal RegisteredNode node/master-2 Node master-2 event: Registered Node master-2 in Controller 5h9m Normal Starting node/master-2 openshift-sdn done initializing node networking. 5h8m Normal NodeDone node/master-2 Setting node master-2, currentConfig rendered-master-0b5c44dc253e57776c4ead0f3bf7fc43 to Done 5h6m Normal RegisteredNode node/master-2 Node master-2 event: Registered Node master-2 in Controller 5h4m Normal RegisteredNode node/master-2 Node master-2 event: Registered Node master-2 in Controller 5h1m Normal RegisteredNode node/master-2 Node master-2 event: Registered Node master-2 in Controller 5h Normal RegisteredNode node/master-2 Node master-2 event: Registered Node master-2 in Controller 4h58m Normal RegisteredNode node/master-2 Node master-2 event: Registered Node master-2 in Controller 4h57m Normal RegisteredNode node/master-2 Node master-2 event: Registered Node master-2 in Controller 3h35m Normal Drain node/master-2 Draining node to update config. 
3h35m Normal NodeNotSchedulable node/master-2 Node master-2 status is now: NodeNotSchedulable
3h34m Normal OSUpdateStarted node/master-2
3h34m Normal OSUpdateStaged node/master-2 Changes to OS staged
3h45m Normal PendingConfig node/master-2 Written pending config rendered-master-fbe0b855c426cca34f099204f8149f73
3h45m Normal SkipReboot node/master-2 Config changes do not require reboot.
3h45m Normal NodeDone node/master-2 Setting node master-2, currentConfig rendered-master-fbe0b855c426cca34f099204f8149f73 to Done
3h34m Normal NodeSchedulable node/master-2 Node master-2 status is now: NodeSchedulable
3h34m Normal PendingConfig node/master-2 Written pending config rendered-master-7fd61bf26aa8bc5527e461c69134d6e4
3h34m Normal SkipReboot node/master-2 Config changes do not require reboot. Service crio was reloaded.
3h34m Normal NodeDone node/master-2 Setting node master-2, currentConfig rendered-master-7fd61bf26aa8bc5527e461c69134d6e4 to Done
178m Normal RegisteredNode node/master-2 Node master-2 event: Registered Node master-2 in Controller
170m Normal RegisteredNode node/master-2 Node master-2 event: Registered Node master-2 in Controller
160m Normal Starting node/master-2 Starting kubelet.
160m Normal NodeHasSufficientMemory node/master-2 Node master-2 status is now: NodeHasSufficientMemory
160m Normal NodeHasNoDiskPressure node/master-2 Node master-2 status is now: NodeHasNoDiskPressure
160m Normal NodeHasSufficientPID node/master-2 Node master-2 status is now: NodeHasSufficientPID
160m Normal NodeAllocatableEnforced node/master-2 Updated Node Allocatable limit across pods
160m Normal RegisteredNode node/master-2 Node master-2 event: Registered Node master-2 in Controller
43m Normal RegisteredNode node/master-2 Node master-2 event: Registered Node master-2 in Controller
40m Normal RegisteredNode node/master-2 Node master-2 event: Registered Node master-2 in Controller
5h8m Normal AnnotationChange machineconfigpool/master Node master-2 now has machineconfiguration.openshift.io/state=Degraded
5h8m Normal AnnotationChange machineconfigpool/master Node master-1 now has machineconfiguration.openshift.io/state=Done
5h8m Normal AnnotationChange machineconfigpool/master Node master-2 now has machineconfiguration.openshift.io/state=Done
5h8m Normal AnnotationChange machineconfigpool/master Node master-0 now has machineconfiguration.openshift.io/state=Done
3h53m Normal SetDesiredConfig machineconfigpool/master Targeted node master-0 to config rendered-master-fbe0b855c426cca34f099204f8149f73
3h53m Normal AnnotationChange machineconfigpool/master Node master-0 now has machineconfiguration.openshift.io/desiredConfig=rendered-master-fbe0b855c426cca34f099204f8149f73
3h53m Normal AnnotationChange machineconfigpool/master Node master-0 now has machineconfiguration.openshift.io/state=Working
3h52m Normal SetDesiredConfig machineconfigpool/master Targeted node master-1 to config rendered-master-fbe0b855c426cca34f099204f8149f73
3h52m Normal AnnotationChange machineconfigpool/master Node master-1 now has machineconfiguration.openshift.io/desiredConfig=rendered-master-fbe0b855c426cca34f099204f8149f73
3h52m Normal AnnotationChange machineconfigpool/master Node master-1 now has machineconfiguration.openshift.io/state=Working
3h46m Normal SetDesiredConfig machineconfigpool/master Targeted node master-2 to config rendered-master-fbe0b855c426cca34f099204f8149f73
3h46m Normal AnnotationChange machineconfigpool/master Node master-2 now has machineconfiguration.openshift.io/desiredConfig=rendered-master-fbe0b855c426cca34f099204f8149f73
3h46m Normal AnnotationChange machineconfigpool/master Node master-2 now has machineconfiguration.openshift.io/state=Working
3h37m Normal SetDesiredConfig machineconfigpool/master Targeted node master-0 to config rendered-master-7fd61bf26aa8bc5527e461c69134d6e4
3h37m Normal AnnotationChange machineconfigpool/master Node master-0 now has machineconfiguration.openshift.io/desiredConfig=rendered-master-7fd61bf26aa8bc5527e461c69134d6e4
3h37m Normal AnnotationChange machineconfigpool/master Node master-0 now has machineconfiguration.openshift.io/state=Working
3h35m Normal SetDesiredConfig machineconfigpool/master Targeted node master-2 to config rendered-master-7fd61bf26aa8bc5527e461c69134d6e4
3h35m Normal AnnotationChange machineconfigpool/master Node master-2 now has machineconfiguration.openshift.io/desiredConfig=rendered-master-7fd61bf26aa8bc5527e461c69134d6e4
3h35m Normal AnnotationChange machineconfigpool/master Node master-2 now has machineconfiguration.openshift.io/state=Working
3h34m Normal SetDesiredConfig machineconfigpool/master Targeted node master-1 to config rendered-master-7fd61bf26aa8bc5527e461c69134d6e4
3h34m Normal AnnotationChange machineconfigpool/master Node master-1 now has machineconfiguration.openshift.io/desiredConfig=rendered-master-7fd61bf26aa8bc5527e461c69134d6e4
3h34m Normal AnnotationChange machineconfigpool/master Node master-1 now has machineconfiguration.openshift.io/state=Working
130m Normal SetDesiredConfig machineconfigpool/master Targeted node master-0 to config rendered-master-8b39ed05a2ffacc7c762c92801ed688d
130m Normal AnnotationChange machineconfigpool/master Node master-0 now has machineconfiguration.openshift.io/desiredConfig=rendered-master-8b39ed05a2ffacc7c762c92801ed688d
130m Normal AnnotationChange machineconfigpool/master Node master-0 now has machineconfiguration.openshift.io/state=Working
120m Normal AnnotationChange machineconfigpool/master Node master-0 now has machineconfiguration.openshift.io/state=Degraded
36m Normal SetDesiredConfig machineconfigpool/master Targeted node master-1 to config rendered-master-8b39ed05a2ffacc7c762c92801ed688d
36m Normal AnnotationChange machineconfigpool/master Node master-1 now has machineconfiguration.openshift.io/desiredConfig=rendered-master-8b39ed05a2ffacc7c762c92801ed688d
4h54m Normal RegisteredNode node/worker-0 Node worker-0 event: Registered Node worker-0 in Controller
4h53m Normal Starting node/worker-0 openshift-sdn done initializing node networking.
4h53m Normal NodeDone node/worker-0 Setting node worker-0, currentConfig rendered-worker-aa2999cca8e237b2a24cf4c1d5123a72 to Done
3h37m Normal Drain node/worker-0 Draining node to update config.
3h37m Normal NodeNotSchedulable node/worker-0 Node worker-0 status is now: NodeNotSchedulable
3h35m Normal OSUpdateStarted node/worker-0
3h35m Normal OSUpdateStaged node/worker-0 Changes to OS staged
3h52m Normal PendingConfig node/worker-0 Written pending config rendered-worker-53036b57fbb35b65691bd0423ad209ef
3h52m Normal SkipReboot node/worker-0 Config changes do not require reboot.
3h52m Normal NodeDone node/worker-0 Setting node worker-0, currentConfig rendered-worker-53036b57fbb35b65691bd0423ad209ef to Done
3h35m Normal NodeSchedulable node/worker-0 Node worker-0 status is now: NodeSchedulable
3h35m Normal PendingConfig node/worker-0 Written pending config rendered-worker-b9fd2121252e22fbaa0bcdd29b67f5eb
3h35m Normal SkipReboot node/worker-0 Config changes do not require reboot. Service crio was reloaded.
3h35m Normal NodeDone node/worker-0 Setting node worker-0, currentConfig rendered-worker-b9fd2121252e22fbaa0bcdd29b67f5eb to Done
178m Normal RegisteredNode node/worker-0 Node worker-0 event: Registered Node worker-0 in Controller
170m Normal RegisteredNode node/worker-0 Node worker-0 event: Registered Node worker-0 in Controller
160m Normal RegisteredNode node/worker-0 Node worker-0 event: Registered Node worker-0 in Controller
156m Normal NodeNotReady node/worker-0 Node worker-0 status is now: NodeNotReady
153m Normal Starting node/worker-0 Starting kubelet.
153m Normal NodeHasSufficientMemory node/worker-0 Node worker-0 status is now: NodeHasSufficientMemory
153m Normal NodeHasNoDiskPressure node/worker-0 Node worker-0 status is now: NodeHasNoDiskPressure
153m Normal NodeHasSufficientPID node/worker-0 Node worker-0 status is now: NodeHasSufficientPID
153m Warning Rebooted node/worker-0 Node worker-0 has been rebooted, boot id: 447c8830-4cb8-485f-a958-fc64f3ad36e6
153m Normal NodeNotReady node/worker-0 Node worker-0 status is now: NodeNotReady
153m Normal NodeAllocatableEnforced node/worker-0 Updated Node Allocatable limit across pods
153m Normal NodeReady node/worker-0 Node worker-0 status is now: NodeReady
43m Normal RegisteredNode node/worker-0 Node worker-0 event: Registered Node worker-0 in Controller
40m Normal RegisteredNode node/worker-0 Node worker-0 event: Registered Node worker-0 in Controller
36m Normal Drain node/worker-0 Draining node to update config.
36m Normal NodeNotSchedulable node/worker-0 Node worker-0 status is now: NodeNotSchedulable
34m Normal OSUpdateStarted node/worker-0
34m Normal OSUpdateStaged node/worker-0 Changes to OS staged
34m Normal PendingConfig node/worker-0 Written pending config rendered-worker-d048988f4915c29580fcd159da4c91bf
34m Normal Reboot node/worker-0 Node will reboot into config rendered-worker-d048988f4915c29580fcd159da4c91bf
33m Normal NodeNotReady node/worker-0 Node worker-0 status is now: NodeNotReady
28m Normal Starting node/worker-0 Starting kubelet.
28m Normal NodeHasSufficientMemory node/worker-0 Node worker-0 status is now: NodeHasSufficientMemory
28m Normal NodeHasNoDiskPressure node/worker-0 Node worker-0 status is now: NodeHasNoDiskPressure
28m Normal NodeHasSufficientPID node/worker-0 Node worker-0 status is now: NodeHasSufficientPID
28m Warning Rebooted node/worker-0 Node worker-0 has been rebooted, boot id: becd0304-0c9a-4028-80ca-b655f21b960c
28m Normal NodeNotReady node/worker-0 Node worker-0 status is now: NodeNotReady
28m Normal NodeNotSchedulable node/worker-0 Node worker-0 status is now: NodeNotSchedulable
28m Normal NodeAllocatableEnforced node/worker-0 Updated Node Allocatable limit across pods
28m Normal NodeReady node/worker-0 Node worker-0 status is now: NodeReady
27m Normal NodeDone node/worker-0 Setting node worker-0, currentConfig rendered-worker-d048988f4915c29580fcd159da4c91bf to Done
27m Normal NodeSchedulable node/worker-0 Node worker-0 status is now: NodeSchedulable
4h53m Normal RegisteredNode node/worker-1 Node worker-1 event: Registered Node worker-1 in Controller
4h52m Normal Starting node/worker-1 openshift-sdn done initializing node networking.
4h52m Normal NodeDone node/worker-1 Setting node worker-1, currentConfig rendered-worker-aa2999cca8e237b2a24cf4c1d5123a72 to Done
3h35m Normal Drain node/worker-1 Draining node to update config.
3h35m Normal NodeNotSchedulable node/worker-1 Node worker-1 status is now: NodeNotSchedulable
3h29m Normal OSUpdateStarted node/worker-1
3h29m Normal OSUpdateStaged node/worker-1 Changes to OS staged
3h46m Normal PendingConfig node/worker-1 Written pending config rendered-worker-53036b57fbb35b65691bd0423ad209ef
3h46m Normal SkipReboot node/worker-1 Config changes do not require reboot.
3h46m Normal NodeDone node/worker-1 Setting node worker-1, currentConfig rendered-worker-53036b57fbb35b65691bd0423ad209ef to Done
3h29m Normal NodeSchedulable node/worker-1 Node worker-1 status is now: NodeSchedulable
3h29m Normal PendingConfig node/worker-1 Written pending config rendered-worker-b9fd2121252e22fbaa0bcdd29b67f5eb
3h29m Normal SkipReboot node/worker-1 Config changes do not require reboot. Service crio was reloaded.
3h29m Normal NodeDone node/worker-1 Setting node worker-1, currentConfig rendered-worker-b9fd2121252e22fbaa0bcdd29b67f5eb to Done
178m Normal RegisteredNode node/worker-1 Node worker-1 event: Registered Node worker-1 in Controller
170m Normal RegisteredNode node/worker-1 Node worker-1 event: Registered Node worker-1 in Controller
160m Normal RegisteredNode node/worker-1 Node worker-1 event: Registered Node worker-1 in Controller
149m Normal NodeNotReady node/worker-1 Node worker-1 status is now: NodeNotReady
146m Normal Starting node/worker-1 Starting kubelet.
146m Normal NodeHasSufficientMemory node/worker-1 Node worker-1 status is now: NodeHasSufficientMemory
146m Normal NodeHasNoDiskPressure node/worker-1 Node worker-1 status is now: NodeHasNoDiskPressure
146m Normal NodeHasSufficientPID node/worker-1 Node worker-1 status is now: NodeHasSufficientPID
146m Warning Rebooted node/worker-1 Node worker-1 has been rebooted, boot id: 385523b8-f1c6-4c98-aff7-6016ab9d6bc0
146m Normal NodeNotReady node/worker-1 Node worker-1 status is now: NodeNotReady
146m Normal NodeAllocatableEnforced node/worker-1 Updated Node Allocatable limit across pods
146m Normal NodeReady node/worker-1 Node worker-1 status is now: NodeReady
43m Normal RegisteredNode node/worker-1 Node worker-1 event: Registered Node worker-1 in Controller
40m Normal RegisteredNode node/worker-1 Node worker-1 event: Registered Node worker-1 in Controller
27m Normal Drain node/worker-1 Draining node to update config.
27m Normal NodeNotSchedulable node/worker-1 Node worker-1 status is now: NodeNotSchedulable
25m Normal OSUpdateStarted node/worker-1
25m Normal OSUpdateStaged node/worker-1 Changes to OS staged
25m Normal PendingConfig node/worker-1 Written pending config rendered-worker-d048988f4915c29580fcd159da4c91bf
25m Normal Reboot node/worker-1 Node will reboot into config rendered-worker-d048988f4915c29580fcd159da4c91bf
25m Normal NodeNotReady node/worker-1 Node worker-1 status is now: NodeNotReady
20m Normal Starting node/worker-1 Starting kubelet.
20m Normal NodeHasSufficientMemory node/worker-1 Node worker-1 status is now: NodeHasSufficientMemory
20m Normal NodeHasNoDiskPressure node/worker-1 Node worker-1 status is now: NodeHasNoDiskPressure
20m Normal NodeHasSufficientPID node/worker-1 Node worker-1 status is now: NodeHasSufficientPID
20m Warning Rebooted node/worker-1 Node worker-1 has been rebooted, boot id: 16df475e-a341-4e96-ae65-af47294868ac
20m Normal NodeNotReady node/worker-1 Node worker-1 status is now: NodeNotReady
20m Normal NodeNotSchedulable node/worker-1 Node worker-1 status is now: NodeNotSchedulable
20m Normal NodeAllocatableEnforced node/worker-1 Updated Node Allocatable limit across pods
20m Normal NodeReady node/worker-1 Node worker-1 status is now: NodeReady
19m Normal NodeDone node/worker-1 Setting node worker-1, currentConfig rendered-worker-d048988f4915c29580fcd159da4c91bf to Done
19m Normal NodeSchedulable node/worker-1 Node worker-1 status is now: NodeSchedulable
3h53m Normal SetDesiredConfig machineconfigpool/worker Targeted node worker-0 to config rendered-worker-53036b57fbb35b65691bd0423ad209ef
3h52m Normal SetDesiredConfig machineconfigpool/worker Targeted node worker-1 to config rendered-worker-53036b57fbb35b65691bd0423ad209ef
3h37m Normal SetDesiredConfig machineconfigpool/worker Targeted node worker-0 to config rendered-worker-b9fd2121252e22fbaa0bcdd29b67f5eb
3h35m Normal SetDesiredConfig machineconfigpool/worker Targeted node worker-1 to config rendered-worker-b9fd2121252e22fbaa0bcdd29b67f5eb
36m Normal SetDesiredConfig machineconfigpool/worker Targeted node worker-0 to config rendered-worker-d048988f4915c29580fcd159da4c91bf
27m Normal SetDesiredConfig machineconfigpool/worker Targeted node worker-1 to config
```
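The three `machineconfiguration.openshift.io/*` annotations in the `oc describe node` output above are the quickest way to see where the MCO stands per node: `currentConfig` differing from `desiredConfig` while `state` is `Degraded` is exactly the stuck signature on master-0. A minimal sketch of extracting them, using an illustrative sample JSON shaped like the master-0 output (on a live cluster you would pipe `oc get node master-0 -o json` instead):

```shell
# Sample node object modeled on the master-0 output above; NOT a live
# capture. On a real cluster, replace the heredoc with:
#   oc get node master-0 -o json > /tmp/node-sample.json
cat > /tmp/node-sample.json <<'EOF'
{
  "metadata": {
    "name": "master-0",
    "annotations": {
      "machineconfiguration.openshift.io/currentConfig": "rendered-master-7fd61bf26aa8bc5527e461c69134d6e4",
      "machineconfiguration.openshift.io/desiredConfig": "rendered-master-8b39ed05a2ffacc7c762c92801ed688d",
      "machineconfiguration.openshift.io/state": "Degraded"
    }
  }
}
EOF

# currentConfig != desiredConfig plus state=Degraded is the stuck-update
# signature: the MCD wrote a new desired config but never converged.
grep -o '"machineconfiguration.openshift.io/[^"]*": "[^"]*"' /tmp/node-sample.json
```

Running the same `grep` over `oc get nodes -o json` shows at a glance which nodes the pool has targeted and which have actually converged.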
Please add a must-gather from this cluster.
> Even if MCO update for 1 node is stuck, it should pick up other nodes until the error in the faulty node is fixed.

This is false.
Sorry, I hit send too soon. We don't keep rolling out to other nodes, for safety reasons. But I'd like to see a must-gather to get to the bottom of what's happening: there seem to be a few errors in the info pasted above, and the more detailed logs in the must-gather will help us figure it out.
Pushed to 4.8, as we will support UPI clusters then.
Since the MCO updates got stuck midway, the cluster is unhealthy and must-gather fails to complete, as shown below.
```
[root@arc-npv-ovn-bastion ~]# oc adm must-gather
[must-gather      ] OUT Using must-gather plug-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b990e9178c45dd579115ef7f51b4bbfb79f1fa8c6bde525c1ed0b9718fdf39f7
[must-gather      ] OUT namespace/openshift-must-gather-f4vqb created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-bjt7q created
[must-gather      ] OUT pod for plug-in image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b990e9178c45dd579115ef7f51b4bbfb79f1fa8c6bde525c1ed0b9718fdf39f7 created
[must-gather-mgsrd] OUT gather logs unavailable: Get "https://9.114.98.140:10250/containerLogs/openshift-must-gather-f4vqb/must-gather-mgsrd/gather?follow=true": x509: certificate signed by unknown authority
[must-gather-mgsrd] OUT waiting for gather to complete
[must-gather-mgsrd] OUT downloading gather output
WARNING: cannot use rsync: rsync not available in container
WARNING: cannot use tar: tar not available in container
[must-gather-mgsrd] OUT gather output not downloaded: No available strategies to copy.
[must-gather-mgsrd] OUT
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-bjt7q deleted
[must-gather      ] OUT namespace/openshift-must-gather-f4vqb deleted
error: unable to download output from pod must-gather-mgsrd: No available strategies to copy.
```
Please let me know if there is any specific log or command output you want me to pick up.
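When must-gather's copy step fails like this, the relevant data can often still be pulled one command at a time over the API. A hedged sketch of a fallback collection loop; the command list is a suggestion approximating what the MCO team usually needs, not the official must-gather set, and the cluster commands only run when `oc` is actually on PATH:

```shell
#!/usr/bin/env bash
# Fallback collection when must-gather cannot copy its output.
# Command list is illustrative; trim or extend as the triager requests.
set -u

cmds=(
  "oc get nodes -o yaml"
  "oc get mcp -o yaml"
  "oc get co machine-config -o yaml"
  "oc -n openshift-machine-config-operator get pods -o wide"
  "oc adm node-logs --role=master -u kubelet"
)

outdir="mco-debug-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$outdir"

for c in "${cmds[@]}"; do
  # Only talk to a live cluster; elsewhere this loop is a no-op.
  if command -v oc >/dev/null 2>&1; then
    f="$outdir/$(echo "$c" | tr ' /' '__').txt"
    $c >"$f" 2>&1 || true   # keep going even if one command fails
  fi
done
echo "collected into $outdir (${#cmds[@]} commands)"
```

The per-command output files can then be tarred up from the bastion and attached to the bug, sidestepping the in-container rsync/tar that must-gather relies on.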
@pliu regarding your comment above, "pushed to 4.8, as we will support UPI clusters then.", is this now planned for 4.8?
(In reply to lmcfadde from comment #8)
> @pliu regarding your comment above "pushed to 4.8, as we will
> support UPI clusters then.", is this now in plan for 4.8?

Yes, I'm working on it.
Hi @Peng, since the target release for this bug is 4.8, and this bug together with BZ 1937594 is blocking completion of OVNKube regression testing, should we set the "Blocker?" flag to "Blocker+" for 4.8?
@pliu will the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1937594 also fix this BZ, and is this still considered a priority?
Yes, I think both BZs can be fixed once the PR is merged.
Please follow the new migration procedure: https://github.com/openshift/openshift-docs/pull/31089
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438