Bug 1832641
| Summary: | Cordoning a node takes it out of service load balancers' rotation | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Miciah Dashiel Butler Masters <mmasters> |
| Component: | Networking | Assignee: | Miciah Dashiel Butler Masters <mmasters> |
| Networking sub component: | router | QA Contact: | Hongan Li <hongli> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | aaleman, aos-bugs, bbennett, hongkliu, wking |
| Version: | 4.5 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-07-13 17:35:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Doc Text:

Cause: The service controller had logic that removed unschedulable Nodes from cloud load balancers' rotations.

Consequence: Cordoning a Node (which marks it as unschedulable) would prevent it from being used to handle requests for "LoadBalancer"-type Services. If an IngressController used a cloud load balancer (as is the case when the IngressController specifies the "LoadBalancerService" endpoint publishing strategy type), cordoning all the Nodes running that IngressController's pod replicas would cause a service outage for the IngressController.

Fix: The service controller was modified not to remove unschedulable Nodes from cloud load balancers.

Result: Cordoning a Node no longer disrupts traffic to "LoadBalancer"-type Services.
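To make the consequence concrete, the behavior can be checked from the CLI. The following is a minimal sketch, assuming a 4.x cluster whose default IngressController is published through a cloud load balancer; the node names and the route host are placeholders, not values from this bug.

```
# Find the nodes hosting the default IngressController's router replicas.
oc -n openshift-ingress get pods -o wide

# Cordon those nodes; cordoning sets .spec.unschedulable=true on each Node.
oc adm cordon <router-node-1> <router-node-2>

# Probe a route through the cloud load balancer. Before the fix, the
# service controller would drop the cordoned nodes from the load balancer
# and the probe would begin to fail; with the fix it keeps returning 200.
curl -s -o /dev/null -w '%{http_code}\n' https://console-openshift-console.apps.<cluster-domain>

# Restore the nodes when done.
oc adm uncordon <router-node-1> <router-node-2>
```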
Description
Miciah Dashiel Butler Masters
2020-05-07 01:15:50 UTC
verified with 4.5.0-0.nightly-2020-05-24-223848 on both the AWS and GCP platforms; the issue has been fixed.

```
$ oc get node
NAME                                         STATUS                     ROLES    AGE     VERSION
ip-10-0-133-218.us-east-2.compute.internal   Ready                      master   3h23m   v1.18.2
ip-10-0-134-228.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   3h13m   v1.18.2
ip-10-0-173-28.us-east-2.compute.internal    Ready                      worker   3h13m   v1.18.2
ip-10-0-185-179.us-east-2.compute.internal   Ready                      master   3h23m   v1.18.2
ip-10-0-196-108.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   3h13m   v1.18.2
ip-10-0-198-240.us-east-2.compute.internal   Ready                      master   3h23m   v1.18.2

$ oc -n openshift-ingress get pod -owide
NAME                              READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
router-default-67759d5dbf-cmg8m   1/1     Running   0          3h18m   10.131.0.16   ip-10-0-196-108.us-east-2.compute.internal   <none>           <none>
router-default-67759d5dbf-mqn27   1/1     Running   0          3h18m   10.128.2.3    ip-10-0-134-228.us-east-2.compute.internal   <none>           <none>

Checking console route...
console-openshift-console.apps.hongli-pl848.qe.devcluster.openshift.com 200
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
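For reference, the cordon-and-probe steps implied by the verification output above would look roughly as follows. This is a reconstruction, not the literal QA session; the node names and the console route host are taken from the output above.

```
# Cordon the two workers running the router-default replicas; they then
# report Ready,SchedulingDisabled in `oc get node`.
oc adm cordon ip-10-0-134-228.us-east-2.compute.internal \
              ip-10-0-196-108.us-east-2.compute.internal

# Probe the console route through the cloud load balancer; HTTP 200
# confirms the cordoned nodes remain in the load balancer's rotation.
curl -s -o /dev/null -w '%{http_code}\n' \
  https://console-openshift-console.apps.hongli-pl848.qe.devcluster.openshift.com
```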