Bug 1994111 - Fix missing NoExecute taint on NotReady nodes for Kubernetes
Summary: Fix missing NoExecute taint on NotReady nodes for Kubernetes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-controller-manager
Version: 4.7
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.z
Assignee: Maciej Szulik
QA Contact: zhou ying
URL:
Whiteboard:
Depends On: 2003027
Blocks: 2004836
Reported: 2021-08-16 18:46 UTC by blpowers@redhat.com
Modified: 2024-12-20 20:43 UTC (History)
CC List: 7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-12 19:51:42 UTC
Target Upstream Version:
Embargoed:



Links
System                             ID                             Private  Priority  Status  Summary  Last Updated
Github                             openshift kubernetes pull 935  0        None      None    None     2021-09-07 13:22:02 UTC
Red Hat Knowledge Base (Solution)  6369741                        0        None      None    None     2021-09-28 19:58:01 UTC
Red Hat Product Errata             RHBA-2021:3686                 0        None      None    None     2021-10-12 19:52:10 UTC

Description blpowers@redhat.com 2021-08-16 18:46:40 UTC
1. Proposed title of this feature request

Fix missing NoExecute taint on NotReady nodes for Kubernetes

3. What is the nature and description of the request?

Bugfix
 
4. Why would you like this alteration? (List the business requirements here)

Failover of application pods is not happening.
Impact on resiliency and high availability.
 
5. How would you like to achieve this? (List the functional requirements here)

The NoExecute taint must be added to NotReady nodes, so that pods running on a failed node are evicted and can restart on a healthy node.
 
6. For each functional requirement listed in question 5, specify how Red Hat can test to confirm the requirement is successfully implemented.

The fix is already available in Kubernetes upstream (https://github.com/kubernetes/kubernetes/pull/98168). Red Hat needs to make it available downstream for OCP 4.7.
 
7. Is there already an existing RFE upstream or in Red Hat Bugzilla that you know of?

No
 
8. Do you have any specific timeline for this update?

17th Aug (as soon as possible, since our beta release is scheduled on OCP 4.7)
 
10. List any affected packages or components.

Kubernetes
 
11. Would you be willing and able to assist in testing this functionality if implemented?

yes
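
To illustrate the requirement in item 5, here is a minimal sketch of the check a tester could script. It assumes a node object in the shape returned by `oc get node <name> -o json`, and it assumes the usual upstream convention that a Ready condition of "False" corresponds to the node.kubernetes.io/not-ready taint and "Unknown" to node.kubernetes.io/unreachable. The function names are hypothetical and this is not the controller's actual code.

```python
# Standard Kubernetes taint keys; the helper names below are hypothetical.
NOT_READY_TAINT = "node.kubernetes.io/not-ready"
UNREACHABLE_TAINT = "node.kubernetes.io/unreachable"


def expected_taint_key(status):
    """Return the taint key expected for a given Ready condition status."""
    if status == "False":
        return NOT_READY_TAINT
    if status == "Unknown":
        return UNREACHABLE_TAINT
    return None  # Ready is "True" (or absent): no taint expected


def ready_status(node):
    """Extract the status of the Ready condition from a node dict."""
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status")
    return None  # assumption: no Ready condition means nothing to check


def is_missing_noexecute(node):
    """True if the node is not Ready but lacks the matching NoExecute taint."""
    key = expected_taint_key(ready_status(node))
    if key is None:
        return False
    taints = node.get("spec", {}).get("taints") or []
    return not any(
        t.get("key") == key and t.get("effect") == "NoExecute" for t in taints
    )
```

A node for which `is_missing_noexecute` returns True would show the symptom described in this report: it is not Ready, yet nothing triggers eviction of its pods.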

Comment 2 Maciej Szulik 2021-08-19 11:59:58 UTC
I will be working on bringing in the latest version of k8s 1.20 into openshift at the beginning of September.

Comment 3 zhou ying 2021-08-26 08:41:17 UTC
Maciej Szulik:

I checked OCP 4.7. With a node in NotReady state, I can't reproduce this issue:

[root@localhost ~]# oc get node
NAME                                        STATUS     ROLES    AGE   VERSION
ip-10-0-48-228.us-east-2.compute.internal   Ready      worker   28m   v1.20.0+4593a24
ip-10-0-49-143.us-east-2.compute.internal   Ready      master   37m   v1.20.0+4593a24
ip-10-0-61-211.us-east-2.compute.internal   Ready      master   37m   v1.20.0+4593a24
ip-10-0-69-178.us-east-2.compute.internal   Ready      master   37m   v1.20.0+4593a24
ip-10-0-75-179.us-east-2.compute.internal   NotReady   worker   27m   v1.20.0+4593a24
[root@localhost ~]# oc describe node/ip-10-0-75-179.us-east-2.compute.internal
Name:               ip-10-0-75-179.us-east-2.compute.internal
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5.xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-2
                    failure-domain.beta.kubernetes.io/zone=us-east-2b
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-75-179
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/instance-type=m5.xlarge
                    node.openshift.io/os_id=rhcos
                    topology.ebs.csi.aws.com/zone=us-east-2b
                    topology.kubernetes.io/region=us-east-2
                    topology.kubernetes.io/zone=us-east-2b
Annotations:        csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-0a2ea9aeb5ee2e144"}
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-01dafe43db2faeda0184971ba016d9c0
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-01dafe43db2faeda0184971ba016d9c0
                    machineconfiguration.openshift.io/reason: 
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 26 Aug 2021 16:06:28 +0800
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule

Does this still need an update?
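
The manual check above (reading the Taints: field of the describe output) could be scripted roughly as follows. This is only a sketch: it assumes the text layout shown in this comment, where continuation lines of the Taints: field are indented, and the function names are hypothetical.

```python
def parse_taints(describe_output):
    """Collect (key, effect) pairs from the Taints: field of describe text."""
    taints = []
    in_taints = False
    for line in describe_output.splitlines():
        if line.startswith("Taints:"):
            in_taints = True
            entry = line[len("Taints:"):].strip()
        elif in_taints and line[:1].isspace() and line.strip():
            entry = line.strip()  # indented continuation of the Taints: field
        else:
            in_taints = False  # any other line ends the field
            continue
        if entry and entry != "<none>":
            # Entries look like key:Effect or key=value:Effect.
            key_value, _, effect = entry.rpartition(":")
            taints.append((key_value.split("=", 1)[0], effect))
    return taints


def has_noexecute(describe_output):
    """True if any taint in the describe text has the NoExecute effect."""
    return any(effect == "NoExecute" for _, effect in parse_taints(describe_output))
```

Fed the describe output pasted above, `has_noexecute` would report that the NotReady worker carries a NoExecute taint, which matches the observation that the issue did not reproduce in this run.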

Comment 5 Maciej Szulik 2021-09-01 10:49:05 UTC
(In reply to zhou ying from comment #3)
> Does this still need an update?

The update to the latest k8s patch release is still going to happen.

Comment 6 Maciej Szulik 2021-09-08 12:21:23 UTC
The current PR bringing in Kubernetes v1.20.10 is https://github.com/openshift/kubernetes/pull/935

This is bringing in the following changes:

Changelog for v1.20.10: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1209
Changelog for v1.20.9: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1208
Changelog for v1.20.8: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1207
Changelog for v1.20.7: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1206
Changelog for v1.20.6: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1206
Changelog for v1.20.5: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1205
Changelog for v1.20.4: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1204
Changelog for v1.20.3: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1203
Changelog for v1.20.2: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1202
Changelog for v1.20.1: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1201
Changelog for v1.20.0: https://github.com/kubernetes/kubernetes/blob/release-1.20/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1200

Comment 8 Maciej Szulik 2021-09-30 14:03:20 UTC
This should be fixed by now with https://bugzilla.redhat.com/show_bug.cgi?id=2003027, and specifically https://github.com/openshift/kubernetes/pull/935, which brings in k8s 1.20.10.
Moving to MODIFIED.

Comment 15 errata-xmlrpc 2021-10-12 19:51:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.33 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3686

