Bug 1309533

Summary: [RFE] Heat template to configure quagga active/active/active HAproxy for OSP controllers
Product: Red Hat OpenStack
Reporter: Kyle Bader <kbader>
Component: puppet-haproxy
Assignee: Fabio Massimo Di Nitto <fdinitto>
Status: CLOSED WONTFIX
QA Contact: Udi Shkalim <ushkalim>
Severity: medium
Docs Contact:
Priority: medium
Version: 12.0 (Pike)
CC: alan_bishop, arkady_kanevsky, bperkins, cdevine, christopher_dearborn, dbecker, gael_rehault, ipilcher, jcoufal, jdonohue, jjoyce, joherr, John_walsh, jschluet, j_t_williams, kbader, kschinck, kurt_hey, mburns, morazi, nlevine, rajini.karthik, randy_perryman, rhel-osp-director-maint, royoung, rsussman, slinaber, smerrow, sreichar, tvignaud, ushkalim, wayne_allen
Target Milestone: ---
Keywords: FutureFeature
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-10-19 15:25:22 UTC
Type: Feature Request
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1419948, 1458798

Description Kyle Bader 2016-02-18 04:04:00 UTC
Currently, HAproxy is configured in an active/passive way to load balance OSP API services. This is captured in this document:

https://github.com/beekhof/osp-ha-deploy/blob/master/HA-keepalived.md

This limits throughput to what a single node can deliver, leaving scale-up as the only option. A superior design would be to assign a router VIP to each HAproxy instance and place the API VIP on the loopback interface. Quagga could then be used on each OSP controller to advertise a route to the API VIP, via that controller's distinct router VIP, using OSPF/BGP. The upstream router would peer with the OSP controller nodes, resulting in multiple routes to the API VIP, one via each router VIP, and would balance flows across all HAproxy instances using 5-tuple ECMP hashing. If HAproxy fails a heartbeat, quagga should withdraw its route, so that the upstream router redistributes flows across the surviving routes. If an OSP controller crashes completely, the upstream peer will reconverge on the surviving routes.
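The flow distribution described above can be sketched in a few lines. This is a toy illustration, not router firmware: real ECMP implementations use vendor-specific hash functions, and the addresses below are hypothetical documentation-range examples. It shows the two properties the proposal relies on: a given 5-tuple always maps to the same next hop, and withdrawing one route simply re-hashes affected flows across the survivors.

```python
import hashlib


def ecmp_next_hop(src_ip, src_port, dst_ip, dst_port, proto, next_hops):
    """Pick a next hop for a flow by hashing its 5-tuple (ECMP-style).

    MD5 is just a stand-in hash here; the point is that the choice is
    deterministic per flow and spreads flows across all next hops.
    """
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    digest = hashlib.md5(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical router VIPs, one advertised by each controller's quagga.
routes = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]

# A given client flow always hashes to the same HAproxy instance.
hop = ecmp_next_hop("198.51.100.7", 43210, "203.0.113.5", 443, "tcp", routes)

# If that controller's route is withdrawn (failed heartbeat or crash),
# the upstream router re-hashes the flow across the surviving routes.
survivors = [r for r in routes if r != hop]
rehashed = ecmp_next_hop("198.51.100.7", 43210, "203.0.113.5", 443, "tcp",
                         survivors)
```

Note that plain hash-modulo ECMP, as sketched here, remaps some unaffected flows when the route set changes; production deployments would want consistent or resilient hashing to keep long-lived API connections pinned during reconvergence.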

Comment 2 Kyle Bader 2016-02-18 04:10:52 UTC
Assuming single-digit millisecond latency between sites by means of DWDM or other private transit (a requirement for Galera cluster replication), this could also provide a robust mechanism for exposing a single API endpoint for a control plane spanning multiple sites.

Comment 3 Kyle Bader 2016-02-18 05:27:08 UTC
Can someone add me to 1261979?

Comment 4 Mike Burns 2016-04-07 21:11:06 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 7 Fabio Massimo Di Nitto 2017-10-19 15:25:22 UTC
Engineering has evaluated the request, and while the proposed architecture might work, it has several problematic aspects.

First of all, there are several different projects covering routing protocols. All are forks of zebra or quagga, but none of them has established itself as the de facto standard in the industry. This is problematic from a support perspective and might require plenty of resources just to support this single solution.

Second, the added complexity of this architecture could be a significant adoption barrier for customers and would make any issue harder to debug. Routing protocols are not simple.

Third, while this solution might theoretically solve a throughput bottleneck, the question really becomes: how often is that limit actually hit? So far, we have never heard of any customer complaining about this specific issue.

Therefore, we agreed to close this request as WONTFIX, as Red Hat will not implement this feature.