1. Proposed title of this feature request
Fix missing NoExecute taint on NotReady nodes for Kubernetes

3. What is the nature and description of the request?
Bugfix

4. Why would you like this alteration? (List the business requirements here)
Failover of application pods is not happening, which impacts resiliency and high availability.

5. How would you like to achieve this? (List the functional requirements here)
The NoExecute taint must be added to NotReady nodes so that pods running on a failed node can restart on a healthy node.

6. For each functional requirement listed in question 5, specify how Red Hat can test to confirm the requirement is successfully implemented.
The fix is already available in Kubernetes upstream (https://github.com/kubernetes/kubernetes/pull/98168). Red Hat needs to make it available downstream for OCP 4.7. A verification sketch is given after this list.

7. Is there already an existing RFE upstream or in Red Hat Bugzilla that you know of?
No

8. Do you have any specific timeline for this update?
17th Aug (as soon as possible, since our beta release is scheduled on OCP 4.7)

10. List any affected packages or components.
Kubernetes

11. Would you be willing and able to assist in testing this functionality if implemented?
Yes
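A minimal verification sketch for requirement 5, assuming cluster-admin access on a disposable test cluster. The node name is a placeholder, and stopping the kubelet is just one possible way to drive a node NotReady:

  # Make a test worker node NotReady (illustrative node name; any method
  # that simulates a node failure works).
  oc debug node/<worker-node> -- chroot /host systemctl stop kubelet

  # Once the node controller marks the node NotReady, both the NoSchedule
  # and the NoExecute taints should be present on it.
  oc get node <worker-node>
  oc describe node/<worker-node> | grep -A 2 Taints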
I will be working on bringing in the latest version of k8s 1.20 into openshift at the beginning of September.
Maciej Szulik: I've checked on OCP 4.7; when a node is NotReady I can't reproduce this issue:

[root@localhost ~]# oc get node
NAME                                        STATUS     ROLES    AGE   VERSION
ip-10-0-48-228.us-east-2.compute.internal   Ready      worker   28m   v1.20.0+4593a24
ip-10-0-49-143.us-east-2.compute.internal   Ready      master   37m   v1.20.0+4593a24
ip-10-0-61-211.us-east-2.compute.internal   Ready      master   37m   v1.20.0+4593a24
ip-10-0-69-178.us-east-2.compute.internal   Ready      master   37m   v1.20.0+4593a24
ip-10-0-75-179.us-east-2.compute.internal   NotReady   worker   27m   v1.20.0+4593a24

[root@localhost ~]# oc describe node/ip-10-0-75-179.us-east-2.compute.internal
Name:               ip-10-0-75-179.us-east-2.compute.internal
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5.xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-2
                    failure-domain.beta.kubernetes.io/zone=us-east-2b
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-75-179
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/instance-type=m5.xlarge
                    node.openshift.io/os_id=rhcos
                    topology.ebs.csi.aws.com/zone=us-east-2b
                    topology.kubernetes.io/region=us-east-2
                    topology.kubernetes.io/zone=us-east-2b
Annotations:        csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-0a2ea9aeb5ee2e144"}
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-01dafe43db2faeda0184971ba016d9c0
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-01dafe43db2faeda0184971ba016d9c0
                    machineconfiguration.openshift.io/reason:
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 26 Aug 2021 16:06:28 +0800
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule

Does this still need an update?
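As a possible follow-up check of the failover behaviour the RFE asks for (pod and namespace names below are placeholders; the 300s value is the default injected by the DefaultTolerationSeconds admission plugin), one could confirm that pods on the tainted node are actually evicted:

  # Pods without explicit tolerations get the default not-ready/unreachable
  # tolerations with tolerationSeconds: 300, so they should be evicted about
  # 5 minutes after the NoExecute taint is applied.
  oc get pod <pod-name> -n <namespace> -o jsonpath='{.spec.tolerations}'

  # Watch pods on the NotReady node get evicted and recreated on healthy nodes.
  oc get pods -A -o wide --field-selector spec.nodeName=ip-10-0-75-179.us-east-2.compute.internal -w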
(In reply to zhou ying from comment #3)
> Does this still need an update?

The update to the latest k8s patch release is still going to happen.
The current PR bringing in Kubernetes v1.20.10 is https://github.com/openshift/kubernetes/pull/935. It brings in the following changes:

Changelog for v1.20.10: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1209
Changelog for v1.20.9: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1208
Changelog for v1.20.8: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1207
Changelog for v1.20.7: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1206
Changelog for v1.20.6: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1206
Changelog for v1.20.5: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1205
Changelog for v1.20.4: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1204
Changelog for v1.20.3: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1203
Changelog for v1.20.2: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1202
Changelog for v1.20.1: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1201
Changelog for v1.20.0: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1200
This should be fixed by now via https://bugzilla.redhat.com/show_bug.cgi?id=2003027, specifically https://github.com/openshift/kubernetes/pull/935, which brings in k8s 1.20.10. Moving to MODIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.7.33 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3686