Cause:
Failure of the self-hosted load balancer that distributes OCP API traffic on the master node acting as the API LB frontend (the node that owns the API VIP).
Consequence:
That master node continues to own the API VIP address even though the local LB is unhealthy, and as a result the OCP API becomes unreachable for ~10 seconds.
Fix:
The Keepalived check script for the API VIP will also monitor the health of the self-hosted load balancer.
Result:
If the local self-hosted load balancer fails on the master node holding the API VIP, the VIP will fail over to another master node, so there should be no OCP API service downtime.
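The fix could take a shape along the lines of the following Keepalived configuration fragment. This is only an illustrative sketch: the check script path, interface name, VRRP parameters, and VIP address are placeholders, not taken from the shipped implementation.

```
# /etc/keepalived/keepalived.conf (illustrative fragment, not the shipped config)
vrrp_script chk_ocp_api {
    # Hypothetical check script: succeeds only if BOTH the local
    # kube-apiserver AND the self-hosted HAProxy LB are healthy.
    script "/usr/local/bin/chk_ocp_api.sh"
    interval 2
    fall 2       # declare failure after 2 consecutive failed checks
    rise 2       # declare recovery after 2 consecutive successful checks
}

vrrp_instance API {
    interface ens3             # assumed NIC name
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        192.0.2.5              # placeholder for the API VIP
    }
    track_script {
        chk_ocp_api            # check failure triggers VIP failover
    }
}
```

With the LB health folded into the tracked script, a failed local HAProxy lowers this node's VRRP state and another master takes over the VIP.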
Description of problem:
In the current implementation, the API VIP fails over to another master based only on the local kube-apiserver pod status.
With this approach, the API VIP can be owned by a master node whose local LB is unhealthy.
Version-Release number of the following components:
4.5.0-0.ci-2020-05-14-170026
How reproducible:
Steps to Reproduce:
1. SSH to the master node that holds the API VIP.
2. Run a script that repeatedly removes the HAProxy LB container, e.g.:
while true; do
  sleep 1
  sudo crictl rm -f $(sudo crictl ps --name haproxy | awk 'FNR==2{ print $1}')
done
Actual results:
1. The API VIP is still owned by this master node even though the local LB is unhealthy or not running.
2. When the OCP API is not reachable through the LB, the haproxy-monitor container deletes (after ~10 seconds) the firewall rule that redirects API traffic to the LB, so API traffic is sent directly to the local kube-apiserver and is no longer distributed across the masters.
Expected results:
If local LB is not healthy, API VIP should failover to another master node.
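One way to observe whether the failover actually happens is to check VIP ownership on each master before and after killing the local HAProxy. A minimal sketch; 192.0.2.5 is a placeholder that must be replaced with the cluster's real API VIP:

```shell
#!/bin/bash
# Report whether this node currently owns the API VIP.
# API_VIP is a placeholder; substitute the cluster's actual API VIP.
API_VIP="${API_VIP:-192.0.2.5}"
if ip -brief addr show 2>/dev/null | grep -qw "$API_VIP"; then
  echo "this node owns the API VIP"
else
  echo "API VIP not on this node"
fi
```

Run on each master while the HAProxy container is being removed on the VIP holder; with the fix, the "owns the API VIP" line should move to a different master instead of staying put.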
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409