Bug 1832641
| Summary: | Cordoning a node takes it out of service load balancers' rotation | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Miciah Dashiel Butler Masters <mmasters> |
| Component: | Networking | Assignee: | Miciah Dashiel Butler Masters <mmasters> |
| Networking sub component: | router | QA Contact: | Hongan Li <hongli> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | aaleman, aos-bugs, bbennett, hongkliu, wking |
| Version: | 4.5 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-07-13 17:35:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Doc Text:

Cause: The service controller had logic that removed unschedulable Nodes from cloud load balancers' rotations.

Consequence: Cordoning a Node (which marks it as unschedulable) would prevent it from being used to handle requests for "LoadBalancer"-type Services. If an IngressController used a cloud load balancer (as is the case when the IngressController specifies the "LoadBalancerService" endpoint publishing strategy type), cordoning all the Nodes running that IngressController's pod replicas would cause a service outage for the IngressController.

Fix: The service controller was modified not to remove unschedulable Nodes from cloud load balancers.

Result: Cordoning a Node no longer disrupts traffic to "LoadBalancer"-type Services.
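To make the consequence concrete, the behavior can be checked from the CLI. The following is a minimal sketch, assuming a 4.x cluster whose default IngressController is published through a cloud load balancer; the node names and the route host are placeholders, not values from this bug.

```
# Find the nodes hosting the default IngressController's router replicas.
oc -n openshift-ingress get pods -o wide

# Cordon those nodes; cordoning sets .spec.unschedulable=true on each Node.
oc adm cordon <router-node-1> <router-node-2>

# Probe a route through the cloud load balancer. Before the fix, the
# service controller would drop the cordoned nodes from the load balancer
# and the probe would begin to fail; with the fix it keeps returning 200.
curl -s -o /dev/null -w '%{http_code}\n' https://console-openshift-console.apps.<cluster-domain>

# Restore the nodes when done.
oc adm uncordon <router-node-1> <router-node-2>
```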
Description
Miciah Dashiel Butler Masters
2020-05-07 01:15:50 UTC
verified with 4.5.0-0.nightly-2020-05-24-223848 on both the AWS and GCP platforms; the issue has been fixed.

```
$ oc get node
NAME                                         STATUS                     ROLES    AGE     VERSION
ip-10-0-133-218.us-east-2.compute.internal   Ready                      master   3h23m   v1.18.2
ip-10-0-134-228.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   3h13m   v1.18.2
ip-10-0-173-28.us-east-2.compute.internal    Ready                      worker   3h13m   v1.18.2
ip-10-0-185-179.us-east-2.compute.internal   Ready                      master   3h23m   v1.18.2
ip-10-0-196-108.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   3h13m   v1.18.2
ip-10-0-198-240.us-east-2.compute.internal   Ready                      master   3h23m   v1.18.2

$ oc -n openshift-ingress get pod -owide
NAME                              READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
router-default-67759d5dbf-cmg8m   1/1     Running   0          3h18m   10.131.0.16   ip-10-0-196-108.us-east-2.compute.internal   <none>           <none>
router-default-67759d5dbf-mqn27   1/1     Running   0          3h18m   10.128.2.3    ip-10-0-134-228.us-east-2.compute.internal   <none>           <none>

Checking console route...
console-openshift-console.apps.hongli-pl848.qe.devcluster.openshift.com 200
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
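For reference, the cordon-and-probe steps implied by the verification output above would look roughly as follows. This is a reconstruction, not the literal QA session; the node names and the console route host are taken from the output above.

```
# Cordon the two workers running the router-default replicas; they then
# report Ready,SchedulingDisabled in `oc get node`.
oc adm cordon ip-10-0-134-228.us-east-2.compute.internal \
              ip-10-0-196-108.us-east-2.compute.internal

# Probe the console route through the cloud load balancer; HTTP 200
# confirms the cordoned nodes remain in the load balancer's rotation.
curl -s -o /dev/null -w '%{http_code}\n' \
  https://console-openshift-console.apps.hongli-pl848.qe.devcluster.openshift.com
```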