Description of problem:

The service controller removes a node from a service load balancer's rotation when the node is cordoned. In particular, this breaks the ingress controller when all of the nodes to which its pod replicas are scheduled are cordoned. (A sketch of the node-filtering logic responsible appears at the end of this report, after "Additional info".)

Version-Release number of selected component (if applicable):

Since OpenShift 3.2 (upstream commit https://github.com/kubernetes/kubernetes/pull/16443/commits/604203595ad7fb77709715de3b9494f8d3550fc7, picked up in https://github.com/openshift/origin/pull/6320/commits/1ca5c508ed534f940615b7de789a2d4fcf2eec1a).

How reproducible:

• I was able to reproduce the issue on the first try on AWS.
• I was not able to reproduce the issue on Azure; although the service controller logs showed that the worker nodes had been removed from the load balancer, the curl command continued to work.

Steps to Reproduce:

1. Launch a new cluster.

2. Create a new application and route on the cluster from Step 1:

  oc adm new-project hello-openshift
  oc -n hello-openshift create -f ~/src/github.com/openshift/origin/examples/hello-openshift/hello-pod.json
  oc -n hello-openshift expose pod/hello-openshift
  oc -n hello-openshift expose svc/hello-openshift

3. Get the host name of the route from Step 2:

  host=$(oc -n hello-openshift get routes/hello-openshift --output='jsonpath={.spec.host}')

4. Curl the host name from Step 3:

  curl -s -o /dev/null -w $'%{http_code}\n' "$host"

5. Cordon the cluster's worker nodes:

  oc adm cordon -l node-role.kubernetes.io/worker

6. Verify that the worker nodes have scheduling disabled:

  oc get nodes -l node-role.kubernetes.io/worker -o wide

7. Verify that all ingress controller pods are on cordoned nodes:

  oc -n openshift-ingress get pods -o wide

8. Check the service controller's logs:

  while read pod container; do
    echo "pod $pod:"
    oc -n openshift-kube-controller-manager logs "$pod" -c kube-controller-manager
  done < <(oc -n openshift-kube-controller-manager get pods -l app=kube-controller-manager -o name)

9. Repeat Step 4.

Actual results:

• Step 4 outputs "200".

• Step 6 shows that the worker nodes have scheduling disabled:

NAME                                         STATUS                     ROLES    AGE   VERSION        INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION          CONTAINER-RUNTIME
ip-10-0-128-163.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   10m   v1.18.0-rc.1   10.0.128.163   <none>        Red Hat Enterprise Linux CoreOS 45.81.202005022156-0 (Ootpa)   4.18.0-193.el8.x86_64   cri-o://1.18.0-13.dev.rhaos4.5.git87f42de.el8
ip-10-0-137-119.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   10m   v1.18.0-rc.1   10.0.137.119   <none>        Red Hat Enterprise Linux CoreOS 45.81.202005022156-0 (Ootpa)   4.18.0-193.el8.x86_64   cri-o://1.18.0-13.dev.rhaos4.5.git87f42de.el8
ip-10-0-149-204.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   10m   v1.18.0-rc.1   10.0.149.204   <none>        Red Hat Enterprise Linux CoreOS 45.81.202005022156-0 (Ootpa)   4.18.0-193.el8.x86_64   cri-o://1.18.0-13.dev.rhaos4.5.git87f42de.el8

• Step 7 shows that the ingress controller pods are running on cordoned worker nodes:

NAME                              READY   STATUS    RESTARTS   AGE   IP           NODE                                         NOMINATED NODE   READINESS GATES
router-default-7f75575b65-ckr57   1/1     Running   0          14m   10.131.0.6   ip-10-0-149-204.us-east-2.compute.internal   <none>           <none>
router-default-7f75575b65-mggrc   1/1     Running   0          14m   10.128.2.3   ip-10-0-128-163.us-east-2.compute.internal   <none>           <none>

• Step 8 shows that the service controller has taken the 3 worker nodes out of rotation (leaving only the 3 master nodes):

I0507 00:11:28.304088       1 controller.go:661] Detected change in list of current cluster nodes. New node set: map[ip-10-0-129-180.us-east-2.compute.internal:{} ip-10-0-139-101.us-east-2.compute.internal:{} ip-10-0-156-150.us-east-2.compute.internal:{}]

• Step 9 outputs "000".

Expected results:

• Step 8 should show that the 3 worker nodes (as well as the 3 master nodes) are in the load balancer's rotation:

Detected change in list of current cluster nodes. New node set: map[ip-10-0-128-163.us-east-2.compute.internal:{} ip-10-0-129-180.us-east-2.compute.internal:{} ip-10-0-137-119.us-east-2.compute.internal:{} ip-10-0-139-101.us-east-2.compute.internal:{} ip-10-0-149-204.us-east-2.compute.internal:{} ip-10-0-156-150.us-east-2.compute.internal:{}]

• Step 9 should output "200".

• The other steps should have the actual results reported above.

Additional info:

• It may be necessary to wait a couple of minutes between Step 5 (cordoning the worker nodes) and Step 8 (seeing the service controller take the worker nodes out of rotation).
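For context on the root cause: the upstream service controller filters the cluster's node list through a predicate before syncing load balancer backends, and that predicate (introduced by the commit linked above) rejects any node whose spec.unschedulable field is set, which is exactly what "oc adm cordon" sets. The following Go sketch paraphrases that filtering logic; the names are simplified for illustration and this is not the verbatim upstream source.

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// includeNodeForLoadBalancer reports whether the service controller
// would keep a node in a service load balancer's rotation
// (paraphrased sketch, not the verbatim upstream predicate).
func includeNodeForLoadBalancer(node *v1.Node) bool {
	// "oc adm cordon" sets spec.unschedulable=true, so this check is
	// what drops cordoned nodes from rotation -- the behavior this
	// bug reports.
	if node.Spec.Unschedulable {
		return false
	}

	// A node with no recorded conditions is not accepted.
	if len(node.Status.Conditions) == 0 {
		return false
	}

	// Only nodes whose Ready condition is True stay in rotation.
	for _, cond := range node.Status.Conditions {
		if cond.Type == v1.NodeReady && cond.Status != v1.ConditionTrue {
			return false
		}
	}
	return true
}

func main() {
	cordoned := &v1.Node{
		Spec: v1.NodeSpec{Unschedulable: true}, // a cordoned worker
		Status: v1.NodeStatus{Conditions: []v1.NodeCondition{
			{Type: v1.NodeReady, Status: v1.ConditionTrue},
		}},
	}
	// Prints "false": a Ready but cordoned node is taken out of rotation.
	fmt.Println(includeNodeForLoadBalancer(cordoned))
}

With a predicate of this shape, cordoning every node that hosts an ingress controller pod empties those nodes out of the service load balancer's rotation even though the router pods on them are still Ready and serving, which is why Step 9 returns "000".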
Verified with 4.5.0-0.nightly-2020-05-24-223848 on both the AWS and GCP platforms; the issue has been fixed.

$ oc get node
NAME                                         STATUS                     ROLES    AGE     VERSION
ip-10-0-133-218.us-east-2.compute.internal   Ready                      master   3h23m   v1.18.2
ip-10-0-134-228.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   3h13m   v1.18.2
ip-10-0-173-28.us-east-2.compute.internal    Ready                      worker   3h13m   v1.18.2
ip-10-0-185-179.us-east-2.compute.internal   Ready                      master   3h23m   v1.18.2
ip-10-0-196-108.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   3h13m   v1.18.2
ip-10-0-198-240.us-east-2.compute.internal   Ready                      master   3h23m   v1.18.2

$ oc -n openshift-ingress get pod -owide
NAME                              READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
router-default-67759d5dbf-cmg8m   1/1     Running   0          3h18m   10.131.0.16   ip-10-0-196-108.us-east-2.compute.internal   <none>           <none>
router-default-67759d5dbf-mqn27   1/1     Running   0          3h18m   10.128.2.3    ip-10-0-134-228.us-east-2.compute.internal   <none>           <none>

Checking console route...
console-openshift-console.apps.hongli-pl848.qe.devcluster.openshift.com 200
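For completeness: in the verification above, both router pods sit on SchedulingDisabled nodes, yet the route still returns 200, which implies the fixed controller no longer treats spec.unschedulable as an exclusion criterion. Below is a minimal Go sketch of a predicate consistent with that behavior, assuming the explicit opt-out label that upstream later adopted for this purpose; this is illustrative only, not the actual 4.5 patch, and the label string is used directly rather than through an API constant.

package sketch

import v1 "k8s.io/api/core/v1"

// excludeFromLBLabel is the well-known opt-out label (assumption: used
// here for illustration; the actual patch may differ).
const excludeFromLBLabel = "node.kubernetes.io/exclude-from-external-load-balancers"

// includeNodeForLoadBalancer: note the absence of any
// node.Spec.Unschedulable check -- a cordoned node stays in the load
// balancer's node set as long as it is Ready.
func includeNodeForLoadBalancer(node *v1.Node) bool {
	// Exclusion now requires an explicit label instead of being
	// inferred from cordoning.
	if _, excluded := node.Labels[excludeFromLBLabel]; excluded {
		return false
	}
	for _, cond := range node.Status.Conditions {
		if cond.Type == v1.NodeReady && cond.Status == v1.ConditionTrue {
			return true
		}
	}
	return false
}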
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409