Bug 1881938 - migrator deployment doesn't tolerate masters
Summary: migrator deployment doesn't tolerate masters
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-storage-version-migrator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Luis Sanchez
QA Contact: Ke Wang
URL:
Whiteboard:
Duplicates: 1935347
Depends On:
Blocks:
 
Reported: 2020-09-23 12:38 UTC by Raif Ahmed
Modified: 2024-12-20 19:16 UTC
CC List: 5 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:33:30 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-kube-storage-version-migrator-operator pull 31 0 None open Bug 1881938: migrator deployment doesn't tolerate masters 2021-02-10 20:49:07 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:34:13 UTC

Description Raif Ahmed 2020-09-23 12:38:30 UTC
Description of problem:

The openshift-kube-storage-version-migrator-operator creates the migrator deployment without a toleration for master nodes:

https://github.com/openshift/cluster-kube-storage-version-migrator-operator/blob/release-4.5/bindata/kube-storage-version-migrator/deployment.yaml

although the operator itself does have such a toleration:

https://github.com/openshift/cluster-kube-storage-version-migrator-operator/blob/51972754a030b5e9ed9df617de276f5deaad5066/manifests/0000_40_kube-storage-version-migrator-operator_07_deployment.yaml#L63-L65

This means that if a customer applies taints to the worker nodes, the migrator pods fail to schedule.
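For illustration, the fix amounts to adding a master toleration to the migrator deployment's pod spec, mirroring the one the operator deployment already carries. This is a sketch of what such a toleration typically looks like; the exact change is in the linked PR:

```yaml
# Sketch: pod-spec toleration the migrator deployment was missing.
# The operator deployment already has an equivalent entry.
spec:
  template:
    spec:
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
```

With this in place, the scheduler may place the migrator pod on master nodes even when all worker nodes carry taints.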

Comment 2 W. Trevor King 2021-03-15 22:28:01 UTC
*** Bug 1935347 has been marked as a duplicate of this bug. ***


Comment 4 Ke Wang 2021-03-17 10:46:47 UTC
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-03-16-221720   True        False         151m    Cluster version is 4.8.0-0.nightly-2021-03-16-221720

Check which master node the kube-storage pods are running on:
$ oc get pod -A -o wide | grep kube-storage
openshift-kube-storage-version-migrator-operator   kube-storage-version-migrator-operator-564cdcc96c-xzprc               1/1     Running       0          4h7m    10.130.0.63    ip-10-0-176-115.us-east-2.compute.internal   <none>           <none>
openshift-kube-storage-version-migrator            migrator-8bdb5f65f-22prn                                              1/1     Running       0          4h7m    10.130.0.62    ip-10-0-176-115.us-east-2.compute.internal   <none>           <none>

The new deployment spec was applied to the pods:
$ oc describe pod -n openshift-kube-storage-version-migrator-operator kube-storage-version-migrator-operator-564cdcc96c-xzprc
Name:                 kube-storage-version-migrator-operator-564cdcc96c-xzprc
Namespace:            openshift-kube-storage-version-migrator-operator
...
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 120s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 120s
Events:          <none>


----------

$ oc describe pod -n openshift-kube-storage-version-migrator migrator-8bdb5f65f-22prn
Name:         migrator-8bdb5f65f-22prn
Namespace:    openshift-kube-storage-version-migrator
...
Node-Selectors:  <none>
Tolerations:     node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 120s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 120s
Events:          <none>

Stop the kubelet service on the master node where the kube-storage pods are located:
$ oc debug node/ip-10-0-176-115.us-east-2.compute.internal
Starting pod/ip-10-0-176-115us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.176.115
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# systemctl stop kubelet

Removing debug pod ...

About two minutes after the master with the stopped kubelet went NotReady, the kube-storage pods running on that master changed to Terminating and were rescheduled onto other nodes.
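The roughly two-minute delay before eviction corresponds to the `tolerationSeconds: 120` on the default not-ready/unreachable tolerations visible in the pod descriptions above. As a YAML fragment, those tolerations look like this:

```yaml
# Tolerations shown in the pod descriptions above; tolerationSeconds
# bounds how long a pod stays bound to a NotReady/unreachable node
# before the node controller evicts it.
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 120
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 120
```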

$ oc get no
NAME                                         STATUS     ROLES    AGE     VERSION
ip-10-0-140-229.us-east-2.compute.internal   Ready      master   5h11m   v1.20.0+e1bc274
ip-10-0-157-114.us-east-2.compute.internal   Ready      worker   5h2m    v1.20.0+e1bc274
ip-10-0-176-115.us-east-2.compute.internal   NotReady   master   5h6m    v1.20.0+e1bc274
ip-10-0-185-50.us-east-2.compute.internal    Ready      worker   5h2m    v1.20.0+e1bc274
ip-10-0-221-102.us-east-2.compute.internal   Ready      master   5h7m    v1.20.0+e1bc274

$ date;echo;oc get pod -A -o wide | grep kube-storage
Wed Mar 17 05:19:24 EDT 2021

openshift-kube-storage-version-migrator-operator   kube-storage-version-migrator-operator-564cdcc96c-8hv7z               1/1     Running       0          31s     10.129.0.89    ip-10-0-221-102.us-east-2.compute.internal   <none>           <none>
openshift-kube-storage-version-migrator-operator   kube-storage-version-migrator-operator-564cdcc96c-xzprc               1/1     Terminating   0          4h42m   10.130.0.63    ip-10-0-176-115.us-east-2.compute.internal   <none>           <none>
openshift-kube-storage-version-migrator            migrator-8bdb5f65f-22prn                                              1/1     Terminating   0          4h42m   10.130.0.62    ip-10-0-176-115.us-east-2.compute.internal   <none>           <none>
openshift-kube-storage-version-migrator            migrator-8bdb5f65f-cq26s                                              1/1     Running       0          31s     10.128.2.155   ip-10-0-157-114.us-east-2.compute.internal   <none>           <none>

From the above, the results are as expected, so moving the bug to VERIFIED.

Comment 7 errata-xmlrpc 2021-07-27 22:33:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

