Bug 1155677 - HA apps add fqdn entries in frontend http configuration causing oo-accept-node to FAIL
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 1.x
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 2.x
Assignee: Luke Meyer
QA Contact: libra bugs
URL:
Whiteboard:
Duplicates: 1135653
Depends On:
Blocks: 1122141
Reported: 2014-10-22 15:21 UTC by Luke Meyer
Modified: 2015-10-23 16:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1122141
Environment:
Last Closed: 2015-02-18 16:52:17 UTC
Target Upstream Version:
Embargoed:


Description Luke Meyer 2014-10-22 15:21:58 UTC
+++ This bug was initially created as a clone of Bug #1122141 +++

Description of problem:
When making an application HA, the node where the app's second gear is created ends up with entries in the frontend httpd configuration for both the app's fqdn and the newly created gear.

This causes oo-accept-node to fail with:
FAIL: httpd config references DNS name without associated gear: 'myapp-mydomain.example.com'

Moreover, when the app is deleted these entries are not cleaned up.

Steps to Reproduce:
1. Configure env for HA apps (external LB routing plugin etc)
2. Create an app (ID 53ce7982e3c9c39293000001 in the example below)
3. Make the app HA (make-ha event)
4. See how oo-accept-node fails
5. Delete app
6. oo-accept-node still fails

Actual results:

In the node where the app was first created we have:
[root@node1 ~]# grep 53ce7982e3c9c39293000001 /etc/httpd/conf.d/openshift/nodes.txt
myapp-mydomain.example.com 127.10.95.2:8080|53ce7982e3c9c39293000001|53ce7982e3c9c39293000001
myapp-mydomain.example.com/health HEALTH|53ce7982e3c9c39293000001|53ce7982e3c9c39293000001
myapp-mydomain.example.com/haproxy-status 127.10.95.3:8080/|53ce7982e3c9c39293000001|53ce7982e3c9c39293000001

After step 3, in the node where the additional gear is created, these entries are created:
[root@node2 ~]# grep 53ce7982e3c9c39293000001 /etc/httpd/conf.d/openshift/nodes.txt
53ce79d3e3c9c39293000020-mydomain.example.com 127.6.237.2:8080|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020
53ce79d3e3c9c39293000020-mydomain.example.com/health HEALTH|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020
myapp-mydomain.example.com 127.6.237.2:8080|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020
myapp-mydomain.example.com/health HEALTH|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020
53ce79d3e3c9c39293000020-mydomain.example.com/haproxy-status 127.6.237.3:8080/|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020
myapp-mydomain.example.com/haproxy-status 127.6.237.3:8080/|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020

which results in:
[root@node2 ~]# oo-accept-node
FAIL: httpd config references DNS name without associated gear: 'myapp-mydomain.example.com'

After deleting the app, this remains in node2's frontend configuration:
[root@node2 ~]# grep 53ce7982e3c9c39293000001 /etc/httpd/conf.d/openshift/nodes.txt
myapp-mydomain.example.com 127.6.237.2:8080|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020
myapp-mydomain.example.com/health HEALTH|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020
myapp-mydomain.example.com/haproxy-status 127.6.237.3:8080/|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020

and oo-accept-node keeps complaining about it.
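
To make the listings above easier to read, here is a minimal Python sketch that splits such lines into fields. It assumes each line has the shape `<fqdn>[/<path>] <target>|<app uuid>|<gear uuid>`, which is inferred only from the sample output in this report, not from a format specification. `Route` and `parse_route` are hypothetical names.

```python
from typing import NamedTuple


class Route(NamedTuple):
    fqdn: str       # host part of the key, e.g. myapp-mydomain.example.com
    path: str       # path part of the key, "" for the root route
    target: str     # backend address, or a keyword such as HEALTH
    app_uuid: str   # assumed: second "|"-separated field is the app ID
    gear_uuid: str  # assumed: third "|"-separated field is the gear UUID


def parse_route(line: str) -> Route:
    """Split one nodes.txt-style line into its (assumed) fields."""
    key, value = line.split(None, 1)
    fqdn, _, path = key.partition("/")
    target, app_uuid, gear_uuid = value.strip().split("|")
    return Route(fqdn, "/" + path if path else "", target, app_uuid, gear_uuid)


# Three of the node2 lines quoted in this report.
SAMPLE = """\
53ce79d3e3c9c39293000020-mydomain.example.com 127.6.237.2:8080|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020
myapp-mydomain.example.com 127.6.237.2:8080|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020
myapp-mydomain.example.com/haproxy-status 127.6.237.3:8080/|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020
"""

routes = [parse_route(line) for line in SAMPLE.splitlines() if line.strip()]

# Group the host names served on behalf of each gear UUID.
fqdns_by_gear = {}
for route in routes:
    fqdns_by_gear.setdefault(route.gear_uuid, set()).add(route.fqdn)
```

Grouped this way, the single gear on node2 is associated with two host names: its own gear fqdn and the app fqdn.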

Expected results:
I believe the entries added with the FQDN are intended, in which case oo-accept-node should not complain, and the entries should be cleaned up on app deletion.

Additional info:
In OSE 2.0 the FQDN entries in the frontend are not created (and oo-accept-node does not complain).

--- Additional comment from Josep 'Pep' Turro Mauri on 2014-10-17 10:56:44 EDT ---


[*] I believe the change was introduced here:

commit 1aa365fb74f5b0f34517a078c5e0dec15d773f2b

    add app-dns to secondary haproxy gears

https://github.com/openshift/origin-server/pull/4631

--- Additional comment from Luke Meyer on 2014-10-22 11:06:05 EDT ---

I agree that httpd frontend entries added for secondary haproxy gears should be there, shouldn't be flagged by oo-accept-node, and should go away when the app does. I don't think they cause any real trouble other than oo-accept-node complaining, but that's bug enough.

Comment 1 Josep 'Pep' Turro Mauri 2014-11-21 19:00:54 UTC
I think this was meant to be an upstream bug; moving to Origin.

Comment 2 Luke Meyer 2014-11-21 19:45:39 UTC
If we can address bug 1131642 at the same time as this, that would be perfect. It might also be viable to just add the ha-myapp-mydomain route to the frontend instead of the myapp-mydomain one - it makes sense to add routes for any HA alias of the app to the frontend for haproxy gears, but what is going to ask for the app name at the secondary haproxy gear?

Comment 3 Luke Meyer 2014-11-26 20:19:25 UTC
A little further analysis:

Seems the problem is that the addition of the app_dns route is sort of hacked in:
https://github.com/openshift/origin-server/pull/4631/files#diff-5eab8fd386170dcad042cfabe9f0d082R1072
I.e. the route is added as if it were an fqdn for the app, but not stored with the gear in the geardb.json. So oo-accept-node is complaining because none of the gears in geardb.json claim this route. BTW, if we store the frontends with oo-frontend-plugin-modify --save, this fqdn won't be included.
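
The failing check can be modeled roughly as follows. This is an illustrative Python sketch, not the oo-accept-node implementation, which is not shown in this report. The geardb layout (gear UUID mapped to a record with an "fqdn" key) and the function name `unclaimed_fqdns` are assumptions made for the example.

```python
def unclaimed_fqdns(route_fqdns, geardb):
    """Return host names present in the routes that no gear record claims."""
    claimed = {record["fqdn"] for record in geardb.values()}
    return sorted(set(route_fqdns) - claimed)


# Hypothetical geardb content for node2: only the gear's own fqdn is stored.
geardb = {
    "53ce79d3e3c9c39293000020": {
        "fqdn": "53ce79d3e3c9c39293000020-mydomain.example.com",
    },
}

# Host names that appear in node2's nodes.txt in this report.
route_fqdns = [
    "53ce79d3e3c9c39293000020-mydomain.example.com",
    "myapp-mydomain.example.com",
]

unclaimed = unclaimed_fqdns(route_fqdns, geardb)
messages = [
    "FAIL: httpd config references DNS name without associated gear: '%s'" % name
    for name in unclaimed
]
```

Because the app fqdn is written to the route map but never recorded against the gear, it is the only name left over.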

Then later on, when it is time to remove the gear, the frontend.destroy calls purge_by_fqdn and purge_by_uuid:
https://github.com/openshift/origin-server/blob/4d6c7ffabdb33dd0de03f051d632ecdd2a55cdae/plugins/frontend/apache-mod-rewrite/lib/openshift/runtime/frontend/http/plugins/apache-mod-rewrite.rb#L48-L70

Since the app_dns isn't a fqdn for the gear, purge_by_fqdn doesn't remove it. And since purge_by_uuid only removes aliases, that doesn't remove it either.
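
A toy model of that teardown sequence shows why the entries survive. This is a simplified Python sketch under stated assumptions: `FrontendModel` is hypothetical, the two dictionaries merely stand in for the route map and the alias DB, and the real Ruby methods linked above may do more than is modeled here.

```python
class FrontendModel:
    """Toy stand-in for the mod-rewrite frontend databases (hypothetical)."""

    def __init__(self):
        self.nodes = {}    # route key -> owning gear uuid (models nodes.txt)
        self.aliases = {}  # alias name -> owning gear uuid (models the alias DB)

    def purge_by_fqdn(self, fqdn):
        # Drop every route whose host part equals the gear's own fqdn.
        self.nodes = {key: uuid for key, uuid in self.nodes.items()
                      if key.split("/", 1)[0] != fqdn}

    def purge_by_uuid(self, uuid):
        # Modeled on the description above: only aliases are removed by uuid.
        self.aliases = {name: owner for name, owner in self.aliases.items()
                        if owner != uuid}

    def destroy(self, gear_fqdn, gear_uuid):
        self.purge_by_fqdn(gear_fqdn)
        self.purge_by_uuid(gear_uuid)


GEAR_UUID = "53ce79d3e3c9c39293000020"
GEAR_FQDN = GEAR_UUID + "-mydomain.example.com"
APP_FQDN = "myapp-mydomain.example.com"

fe = FrontendModel()
for host in (GEAR_FQDN, APP_FQDN):
    for path in ("", "/health", "/haproxy-status"):
        fe.nodes[host + path] = GEAR_UUID

fe.destroy(GEAR_FQDN, GEAR_UUID)
leftover = sorted(fe.nodes)  # routes still present after the gear is gone
```

In this model the three app-fqdn routes are left behind, matching the node2 listing taken after the app was deleted.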

We could update purge_by_uuid to remove entries from all the DBs, and we could update oo-accept-node to also search for owner gears by UUID. Or, what would probably be cleaner, we could revert the change that makes the app_dns a route on secondary haproxy gears. If we need one name that is routable for all haproxy gears on HA apps, I would suggest treating the ha-name-domain.<cloud_domain> fqdn as an alias (at least for the purposes of the frontend - bug 1131642).

Comment 4 Mrunal Patel 2014-12-17 19:25:42 UTC
Rajat and I discussed this. Suggestions:
1. We should update purge_by_uuid so that fqdns are removed (and not revert the change). 
2. ha-name-domain should be added as an alias.

Comment 6 Luke Meyer 2014-12-18 15:30:23 UTC
Did a little more discovery on this. Here are the problems created by the current approach of creating the secondary haproxy gear route as an unofficial app-fqdn entry under mod-rewrite. These would all need to be fixed:

1. oo-accept-node does not recognize the route in nodes.txt as belonging to any gear.
2. The route does not get cleaned up when the gear is removed.
3. The route is not saved by oo-frontend-plugin-modify --save, so will not survive a frontend migration.
4. The route is lost during a gear move.
5. If the gear idles, access via the route gets a 503 rather than unidling the gear.

I believe, however, that all of these inconsistencies would be addressed by one small change: adding the app-fqdn entry to the frontend as an alias instead. I can't think of any downsides to doing so.
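
The idea behind the alias approach can be sketched in Python as follows. This is a conceptual model only, not the origin-server code: `Gear`, `Node`, and their methods are hypothetical, and the sketch covers only the ownership check and cleanup (items 1 and 2), not save, move, or unidling.

```python
class Gear:
    def __init__(self, uuid, fqdn):
        self.uuid = uuid
        self.fqdn = fqdn
        self.aliases = set()  # extra names recorded with the gear itself


class Node:
    def __init__(self):
        self.gears = {}   # gear uuid -> Gear
        self.routes = {}  # served host name -> gear uuid

    def create(self, gear):
        self.gears[gear.uuid] = gear
        self.routes[gear.fqdn] = gear.uuid

    def add_alias(self, uuid, name):
        # The name is stored with the gear AND added to the routes.
        self.gears[uuid].aliases.add(name)
        self.routes[name] = uuid

    def unclaimed(self):
        # Names in the routes that no gear claims as fqdn or alias.
        claimed = set()
        for gear in self.gears.values():
            claimed |= {gear.fqdn} | gear.aliases
        return sorted(set(self.routes) - claimed)

    def destroy(self, uuid):
        # Removing the gear removes its fqdn and every recorded alias.
        gear = self.gears.pop(uuid)
        for name in {gear.fqdn} | gear.aliases:
            self.routes.pop(name, None)


node = Node()
gear = Gear("53ce79d3e3c9c39293000020",
            "53ce79d3e3c9c39293000020-mydomain.example.com")
node.create(gear)
node.add_alias(gear.uuid, "myapp-mydomain.example.com")

unclaimed_before = node.unclaimed()   # nothing is orphaned while the gear exists
node.destroy(gear.uuid)
routes_after = dict(node.routes)      # nothing is left once the gear is removed
```

Since the app fqdn is recorded with the gear, the same bookkeeping that owns the route can also remove it.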

Comment 7 Luke Meyer 2014-12-18 19:03:19 UTC
I mistakenly thought this was not a problem under the vhost frontend. Actually it's pretty much the same (with nodes.txt instead represented by routes.json under vhost) except that #5 does not apply (i.e. gear unidling works with the app route).

Comment 8 openshift-github-bot 2015-01-08 19:09:50 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/4e6cb009757fecf19cf26520645c3e0e839f38fa
node: fix secondary haproxy app fqdn

The secondary HAproxy gear in a HA application should be accessible at
the node httpd proxy via the app fqdn. Previously this extra route was
added as a sort of additional unofficial fqdn for the gear frontend that
resulted in some inconsistencies. This change updates the method used
such that the app fqdn is added as an alias on the gear frontend instead.

Bug 1155677 - HA apps add fqdn entries in frontend http configuration
causing oo-accept-node to FAIL
https://bugzilla.redhat.com/show_bug.cgi?id=1155677

Comment 9 Meng Bo 2015-01-09 07:27:26 UTC
Checked on devenv_5383, issue has been fixed.

In a multi-node env, created a scalable app and made it HA; the two gears were placed on different nodes.

Ran oo-accept-node -v on the 2nd gear's node; there is no error about the gear DNS.

After deleting the app, all the records about the gear are cleaned from routes.json.

After moving the gear to another node, the route for the app fqdn moves with it, and the output of oo-frontend-plugin-modify --save contains the app fqdn info.

And with the mod-rewrite plugin, the gear is unidled when the app fqdn is accessed, with no 503 page.


Moving bug to verified.

Comment 10 Vu Dinh 2015-10-23 16:07:13 UTC
*** Bug 1135653 has been marked as a duplicate of this bug. ***

