Description of problem:

The openshift-kube-storage-version-migrator-operator does not create a migrator deployment that includes a toleration for master nodes:
https://github.com/openshift/cluster-kube-storage-version-migrator-operator/blob/release-4.5/bindata/kube-storage-version-migrator/deployment.yaml

The operator itself does have such a toleration:
https://github.com/openshift/cluster-kube-storage-version-migrator-operator/blob/51972754a030b5e9ed9df617de276f5deaad5066/manifests/0000_40_kube-storage-version-migrator-operator_07_deployment.yaml#L63-L65

This means that if a customer applies taints to the worker nodes, the migrator pods fail to schedule.
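To illustrate why the missing toleration matters, here is a minimal sketch of how a pod's tolerations are matched against node taints. It is a simplified model of the Kubernetes matching rules, not the scheduler's actual code. The `Taint` and `Toleration` classes, the function names, the node names, and the customer taint `dedicated=infra:NoSchedule` are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: Optional[str] = None      # empty key with operator Exists matches every key
    operator: str = "Equal"        # "Equal" or "Exists"
    value: str = ""
    effect: Optional[str] = None   # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if this single toleration matches this single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    # "Equal" needs a key, and the values must be identical.
    return bool(tol.key) and tol.value == taint.value


def can_schedule(tolerations: List[Toleration], taints: List[Taint]) -> bool:
    """A pod fits a node only if every NoSchedule/NoExecute taint is tolerated."""
    blocking = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in blocking)


def schedulable_nodes(tolerations: List[Toleration],
                      nodes: Dict[str, List[Taint]]) -> List[str]:
    return sorted(name for name, taints in nodes.items()
                  if can_schedule(tolerations, taints))


MASTER_TAINT = Taint("node-role.kubernetes.io/master", "NoSchedule")
# Hypothetical taint a customer might put on worker nodes.
WORKER_TAINT = Taint("dedicated", "NoSchedule", "infra")

# Hypothetical two-node cluster: one tainted master, one tainted worker.
nodes = {"master-0": [MASTER_TAINT], "worker-0": [WORKER_TAINT]}

without_fix: List[Toleration] = []
with_fix = [Toleration(key="node-role.kubernetes.io/master",
                       operator="Exists", effect="NoSchedule")]

print(schedulable_nodes(without_fix, nodes))  # []
print(schedulable_nodes(with_fix, nodes))     # ['master-0']
```

In this model, the migrator pod without the master toleration has nowhere to go once the workers are tainted, while the same pod with the toleration can still land on a master.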
*** Bug 1935347 has been marked as a duplicate of this bug. ***
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-03-16-221720   True        False         151m    Cluster version is 4.8.0-0.nightly-2021-03-16-221720

Check which master node the kube-storage pods are running on:

$ oc get pod -A -o wide | grep kube-storage
openshift-kube-storage-version-migrator-operator   kube-storage-version-migrator-operator-564cdcc96c-xzprc   1/1   Running   0   4h7m   10.130.0.63   ip-10-0-176-115.us-east-2.compute.internal   <none>   <none>
openshift-kube-storage-version-migrator            migrator-8bdb5f65f-22prn                                  1/1   Running   0   4h7m   10.130.0.62   ip-10-0-176-115.us-east-2.compute.internal   <none>   <none>

The new deployment has been applied to the pods:

$ oc describe pod -n openshift-kube-storage-version-migrator-operator kube-storage-version-migrator-operator-564cdcc96c-xzprc
Name:            kube-storage-version-migrator-operator-564cdcc96c-xzprc
Namespace:       openshift-kube-storage-version-migrator-operator
...
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 120s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 120s
Events:          <none>
----------
$ oc describe pod -n openshift-kube-storage-version-migrator migrator-8bdb5f65f-22prn
Name:            migrator-8bdb5f65f-22prn
Namespace:       openshift-kube-storage-version-migrator
...
Node-Selectors:  <none>
Tolerations:     node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 120s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 120s
Events:          <none>

Stop the kubelet service on the master node where the kube-storage pods are located:

$ oc debug node/ip-10-0-176-115.us-east-2.compute.internal
Starting pod/ip-10-0-176-115us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.176.115
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# systemctl stop kubelet

Removing debug pod ...

About 2 minutes after the master with the stopped kubelet service became NotReady, the kube-storage pods running on that master changed to status Terminating and replacement pods were scheduled on other nodes.

$ oc get no
NAME                                         STATUS     ROLES    AGE     VERSION
ip-10-0-140-229.us-east-2.compute.internal   Ready      master   5h11m   v1.20.0+e1bc274
ip-10-0-157-114.us-east-2.compute.internal   Ready      worker   5h2m    v1.20.0+e1bc274
ip-10-0-176-115.us-east-2.compute.internal   NotReady   master   5h6m    v1.20.0+e1bc274
ip-10-0-185-50.us-east-2.compute.internal    Ready      worker   5h2m    v1.20.0+e1bc274
ip-10-0-221-102.us-east-2.compute.internal   Ready      master   5h7m    v1.20.0+e1bc274

$ date;echo;oc get pod -A -o wide | grep kube-storage
Wed Mar 17 05:19:24 EDT 2021

openshift-kube-storage-version-migrator-operator   kube-storage-version-migrator-operator-564cdcc96c-8hv7z   1/1   Running       0   31s     10.129.0.89    ip-10-0-221-102.us-east-2.compute.internal   <none>   <none>
openshift-kube-storage-version-migrator-operator   kube-storage-version-migrator-operator-564cdcc96c-xzprc   1/1   Terminating   0   4h42m   10.130.0.63    ip-10-0-176-115.us-east-2.compute.internal   <none>   <none>
openshift-kube-storage-version-migrator            migrator-8bdb5f65f-22prn                                  1/1   Terminating   0   4h42m   10.130.0.62    ip-10-0-176-115.us-east-2.compute.internal   <none>   <none>
openshift-kube-storage-version-migrator            migrator-8bdb5f65f-cq26s                                  1/1   Running       0   31s     10.128.2.155   ip-10-0-157-114.us-east-2.compute.internal   <none>   <none>

From the above, the results are as expected, so moving the bug to VERIFIED.
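The roughly two-minute delay before the pods on the NotReady master moved to Terminating matches the `for 120s` shown on the not-ready and unreachable tolerations in the `oc describe` output. Below is a minimal sketch of that timing, assuming the simplified rule that a pod is evicted `tolerationSeconds` after a matching NoExecute taint is added to its node. The function name and the example timestamp are hypothetical and were not taken from this cluster.

```python
from datetime import datetime, timedelta
from typing import Optional


def eviction_deadline(taint_added_at: datetime,
                      toleration_seconds: Optional[int]) -> Optional[datetime]:
    """Return when a pod tolerating a NoExecute taint would be evicted.

    None means the toleration has no time limit, so the pod stays bound.
    Zero or negative values are treated as 0 (evict immediately).
    """
    if toleration_seconds is None:
        return None
    return taint_added_at + timedelta(seconds=max(toleration_seconds, 0))


# Hypothetical time at which the unreachable:NoExecute taint was added.
taint_added = datetime(2021, 3, 17, 5, 16, 50)

print(eviction_deadline(taint_added, 120))   # 2021-03-17 05:18:50
print(eviction_deadline(taint_added, None))  # None
```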
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438