Currently, HAProxy is configured in an active/passive way to load balance OSP API services, as captured in this document: https://github.com/beekhof/osp-ha-deploy/blob/master/HA-keepalived.md This limits throughput to what a single node can deliver, leaving scale-up as the only option. A superior design would be to assign a router VIP to each HAProxy instance and place the API VIP on the loopback interface. Quagga could then be used on each OSP controller to advertise a route to the API VIP, via the distinct router VIP on that controller, using OSPF/BGP. The upstream router would peer with the OSP controller nodes, see multiple routes to the API VIP (one via each router VIP), and balance flows across all HAProxy instances using 5-tuple ECMP hashing. If HAProxy fails a heartbeat, Quagga should withdraw its route so that the upstream router redistributes flows across the surviving routes. If an OSP controller crashes completely, the upstream peer reconverges onto the surviving routes.
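To make the proposal more concrete, here is a minimal sketch of what the per-controller Quagga configuration could look like when using BGP (one of the two options mentioned above). All addresses and AS numbers below (198.51.100.1 for the API VIP, 192.0.2.x for the router VIP and upstream peer, 64511/64512) are made-up examples, not values from any actual deployment:

  ! bgpd.conf fragment on one OSP controller (illustrative values only)
  ! 198.51.100.1/32 is assumed to be the API VIP configured on lo;
  ! 192.0.2.11 is this controller's router VIP; 192.0.2.1 is the upstream router.
  router bgp 64512
   bgp router-id 192.0.2.11
   network 198.51.100.1/32
   neighbor 192.0.2.1 remote-as 64511
  !
  ! The upstream router would need BGP multipath enabled so that the routes
  ! learned from each controller are installed together, e.g.:
  ! router bgp 64511
  !  maximum-paths 3

A local health-check watchdog could then remove the network statement (for example via vtysh) or simply stop bgpd when HAProxy fails its heartbeat, so that the prefix is withdrawn and the upstream router reconverges onto the remaining paths.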
Assuming single-digit millisecond latency between sites, by means of DWDM or other private transit (for the Galera cluster), this could also enable a robust mechanism to provide a single API endpoint for a control plane spanning multiple sites.
Can someone add me to 1261979?
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
Engineering has been evaluating the request, and while the proposed architecture might work, several aspects are problematic. First, there are multiple projects covering routing protocols, all forks of Zebra or Quagga, but none of them has established itself as the de-facto standard in the industry. This is problematic from a support perspective and might require considerable resources just to support this single solution. Second, the added complexity of this architecture could be a significant adoption barrier for customers and would make any issue harder to debug; routing protocols are not simple. Third, while in theory this solution might solve a bottleneck problem, the real question is: how often do we actually hit that limit? So far, we have never heard of any customer complaining about this specific issue. We have therefore agreed to close this request as WONTFIX, as Red Hat will not implement this feature.