Description of problem:
The head gear haproxy configs hard-code the IP address of the sub gears. For example:

server gear-51a5133903ef64c23f000122-REDACTED 10.154.132.226:36121 check fall 2 rise 3 inter 2000 cookie 51a5133903ef64c23f000122-REDACTED

The IP address 10.154.132.226 should really be a DNS entry for the gear. In other words, that line should look like this:

server gear-51a5133903ef64c23f000122-REDACTED 51a5133903ef64c23f000122-REDACTED.rhcloud.com:36121 check fall 2 rise 3 inter 2000 cookie 51a5133903ef64c23f000122-REDACTED

Hard-coding the IP address is bad for a number of reasons, but primarily it's bad when the IP address of the gear changes. When this happens, the haproxy config is broken and won't work. The IP address of a gear can change for a couple of reasons:

1. The ex-node has problems and must be stopped and started so that it's placed on a different physical VM host.
2. We move the gear from one ex-node to another.

Both of these happen quite often in PROD. Once they happen, the haproxy configs are wrong until some event causes the haproxy config file to be rewritten (like a scale-up or scale-down event). However, many users lock the number of gears they run to a certain number, which means they will never receive the config file updates.

Version-Release number of selected component (if applicable):
openshift-origin-cartridge-haproxy-0.4.7-1.el6oso.noarch
openshift-origin-cartridge-haproxy-1.4-1.9.3-1.el6oso.noarch

How reproducible:
Very

Steps to Reproduce:
1. Create a scaled app.
2. Look at the head gear's haproxy config file, which is located here: /var/lib/openshift/$UUID/haproxy/conf/haproxy.cfg
3. Notice that it has IP addresses in it.

Actual results:
The haproxy config file uses IP addresses.

Expected results:
The haproxy config file should use the gear's DNS name, for exactly the same reason that we tell end users not to use IP addresses and to use the gear DNS instead.
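A quick way to confirm step 3 of the reproduction is to grep the config for dotted-quad addresses on "server" lines. This is just an illustrative check, not part of the product; the default path and the $UUID placeholder come from the reproduction steps above.

```shell
#!/bin/sh
# Sketch: flag "server" lines in a haproxy.cfg that point at a raw IPv4
# address instead of a DNS name. Pass the config path as $1, or rely on
# the default path from the reproduction steps ($UUID is a placeholder).
CFG="${1:-/var/lib/openshift/$UUID/haproxy/conf/haproxy.cfg}"

if grep -E 'server[[:space:]]+[^ ]+[[:space:]]+([0-9]{1,3}\.){3}[0-9]{1,3}:' "$CFG"; then
    echo "WARNING: $CFG contains hard-coded IP addresses"
else
    echo "OK: $CFG uses hostnames"
fi
```

A healthy config (after the fix described below) should print the "OK" line, since every backend is addressed by hostname rather than IP.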
The reason we went with IPs was that DNS took time to resolve. Also, AFAIK the connection hooks are run after a move, and they should fix up the haproxy configuration. If they aren't being run, they should be.
I see, OK, but that still doesn't address the problem of stopping and starting an ex-node in AWS. Unfortunately, for a number of reasons, stopping and starting instances in AWS is something we have to be able to do without breaking a bunch of apps/gears. According to our internal monitoring, Dyn now takes 10.15 seconds on average between when we register a new DNS record and when it propagates to AWS. It does sometimes take much longer (minutes), but that's pretty rare these days.
It might be possible to use the validate-configuration feature to fix the haproxy config.
We've talked about this a few times, and per-gear DNS records may be the only way to do this across all the use cases we are going to need to support.
In 2.0.35 we now use the public_hostname instead of the IP address.
Checked on latest STG (devenv-stage_552); the issue has been fixed.

$ cat haproxy.cfg
<--->
server gear-52788aaadbd93cc3ab00004b-bmeng1stg ex-std-node1.stg.rhcloud.com:43401 check fall 2 rise 3 inter 2000 cookie 52788aaadbd93cc3ab00004b-bmeng1stg

Moving bug to VERIFIED.