Bug 1122141
Summary: | HA apps add fqdn entries in frontend http configuration causing oo-accept-node to FAIL | | | |
---|---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Josep 'Pep' Turro Mauri <pep> | |
Component: | Containers | Assignee: | Luke Meyer <lmeyer> | |
Status: | CLOSED ERRATA | QA Contact: | libra bugs <libra-bugs> | |
Severity: | medium | Docs Contact: | ||
Priority: | high | |||
Version: | 2.1.0 | CC: | bleanhar, erich, gpei, jialiu, jkeck, jokerman, libra-onpremise-devel, lmeyer, ludovic.meurillon, miguel, mmccomas, pruan | |
Target Milestone: | --- | Keywords: | Upstream | |
Target Release: | --- | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | rubygem-openshift-origin-node-1.34.1.1-1 | Doc Type: | Bug Fix | |
Doc Text: |
Cause: HA Applications erroneously created frontend entries for the application FQDN.
Consequence: oo-accept-node fails and entries are not cleaned up on application deletion.
Fix: Rather than creating a frontend entry, an alias to the second HA frontend is now created.
Result: oo-accept-node no longer fails and there's no need to clean up additional frontend entries on application deletion.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1155677 (view as bug list) | Environment: | ||
Last Closed: | 2015-02-12 13:09:18 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1155677 | |||
Bug Blocks: |
Description
Josep 'Pep' Turro Mauri
2014-07-22 15:53:47 UTC
I agree that httpd frontend entries added for secondary haproxy gears should be there, shouldn't be flagged by oo-accept-node, and should go away when the app does. I don't think they cause any real trouble other than oo-accept-node complaining, but that's bug enough. If we can address bug 1131642 at the same time as this, that would be perfect. It might also be viable to just add the ha-myapp-mydomain route to the frontend instead of the myapp-mydomain one: it makes sense to add routes for any HA alias of the app to the frontend for haproxy gears, but what is going to ask for the app name at the secondary haproxy gear?

It's worth noting that this is only a problem for the mod-rewrite frontend. For vhost (the default frontend for OSE 2.2 and Online) I believe there is no complaint. Of course, mod-rewrite was the default in 2.1 and is where most existing HA deployments probably still sit.

The present plan is to fix (not revert) the existing implementation, with the ha-alias from bug 1131642 being a secondary concern (it requires the broker to actually inform the node about this alias and the node to store it as an alias, so it is a bit more invasive).

(In reply to Luke Meyer from comment #8)
> It's worth noting that this is only a problem for the mod-rewrite frontend.
> For vhost (default frontend for OSE 2.2 and Online) I believe there is no
> complaint.
It also happens with the vhost frontend:

```
$ rhc app enable-ha test
RESULT: test is now highly available

root@dhcp198 ~ # oo-accept-node
FAIL: httpd config references DNS name without associated gear: 'test-pep.ose22.example.com'
1 ERRORS

root@dhcp198 ~ # cat /etc/httpd/conf.d/openshift/routes.json | python -m json.tool
{
    "5491f7f72fa4576d0c00076c-pep.ose22.example.com": {
        "endpoints": [
            "127.10.40.130:8080"
        ],
        "limits": {
            "bandwidth": 100,
            "connections": 5
        }
    },
    "test-pep.ose22.example.com": {
        "endpoints": [
            "127.10.40.130:8080"
        ],
        "limits": {
            "bandwidth": 100,
            "connections": -1
        }
    }
}

root@dhcp198 ~ # rpm -q rubygem-openshift-origin-node
rubygem-openshift-origin-node-1.32.3.1-1.el6op.noarch
```

You're right, Pep, sorry: I made some bad assumptions. The problem manifests essentially the same with the vhost frontend.

It seems that there are two separate problems:

1. oo-accept-node errors on deployed HA apps (on secondary haproxy gears).
2. oo-accept-node errors on deleted HA apps (on nodes that were hosting former secondary haproxy gears).

For problem 1 (deployed HA apps): on our side we fixed the check_system_httpd_configs method of oo-accept-node to compare the routes it finds with OPENSHIFT_APP_DNS and OPENSHIFT_GEAR_DNS when a cartridge with the 'web_proxy' category is deployed into a gear. It works very well on deployed HA apps.

For problem 2 (deleted HA apps): it seems to be located in the v2_cart_model Ruby file. The connect_frontend method explicitly creates a route for <app>-<namespace>.<cloud_domain> on every secondary haproxy gear (a gear with a cartridge from the 'web_proxy' category and a name different from <app>-<namespace>), and disconnect_frontend (which performs the deletion) never seems to be called on "rhc app delete <app>" (no log found).

There are actually more problems than even that (see upstream bug https://bugzilla.redhat.com/show_bug.cgi?id=1155677#c6).
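The comparison described for problem 1 can be sketched as follows. This is an illustrative reconstruction of the check, not the actual oo-accept-node code; the function name `check_httpd_routes` and the gear data structure are hypothetical, but the rule matches the comment above: a route is legitimate if it matches some gear's OPENSHIFT_GEAR_DNS, or matches OPENSHIFT_APP_DNS on a gear carrying a 'web_proxy' cartridge.

```python
# Hypothetical sketch of the oo-accept-node route check described above.
# Function and data-structure names are illustrative, not the real code.

def check_httpd_routes(routes, gears):
    """Return frontend route names that no gear on this node accounts for.

    routes: dict keyed by FQDN, as in /etc/httpd/conf.d/openshift/routes.json
    gears:  list of dicts with the gear's DNS environment and cartridge
            categories (a gear carrying a 'web_proxy' cartridge may also
            legitimately answer for the application FQDN).
    """
    allowed = set()
    for gear in gears:
        allowed.add(gear["OPENSHIFT_GEAR_DNS"])
        if "web_proxy" in gear["categories"]:
            allowed.add(gear["OPENSHIFT_APP_DNS"])
    return sorted(name for name in routes if name not in allowed)


# Reproducing the failure shown above: the node hosting the secondary
# haproxy gear has a route for the app FQDN, but no gear named after
# the app lives on that node.
routes = {
    "5491f7f72fa4576d0c00076c-pep.ose22.example.com": {},
    "test-pep.ose22.example.com": {},
}
gear = {
    "OPENSHIFT_GEAR_DNS": "5491f7f72fa4576d0c00076c-pep.ose22.example.com",
    "OPENSHIFT_APP_DNS": "test-pep.ose22.example.com",
    "categories": [],  # web_proxy category ignored: the check fails
}
print(check_httpd_routes(routes, [gear]))
# -> ['test-pep.ose22.example.com']

fixed_gear = dict(gear, categories=["web_proxy"])
print(check_httpd_routes(routes, [fixed_gear]))
# -> []
```

With the 'web_proxy' category taken into account, the app-FQDN route on the secondary haproxy gear's node is no longer reported as orphaned.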
I have a different approach in the works that so far is looking good: https://github.com/openshift/origin-server/pull/6027

Verified this bug on puddle 2.2/2015-02-02.1:

1. Create a scalable app and make it HA; the two gears are placed on different nodes. Log into the node where the second haproxy gear is created and check the entries in the http frontend file. It should only have the entries for the new gear.

   For mod_rewrite:

   ```
   [root@node2 .httpd.d]# cat nodes.txt
   yes-myruby-2-yes.ose22-auto.com.cn 127.3.67.2:8080|54d0519682611d9690000056|yes-myruby-2
   yes-myruby-2-yes.ose22-auto.com.cn/health HEALTH|54d0519682611d9690000056|yes-myruby-2
   yes-myruby-2-yes.ose22-auto.com.cn/haproxy-status 127.3.67.3:8080/|54d0519682611d9690000056|yes-myruby-2
   ```

   For vhost:

   ```
   [root@node2 .httpd.d]# cat routes.json
   {"yes-myruby-2-yes.ose22-auto.com.cn":{"endpoints":["127.12.127.2:8080"],"limits":{"connections":5,"bandwidth":100}},"myruby-yes.ose22-auto.com.cn":{"endpoints":["127.12.127.2:8080"],"limits":{"connections":5,"bandwidth":100},"alias":"yes-myruby-2-yes.ose22-auto.com.cn"}}
   ```

   And 'oo-accept-node' passed on this node.

2. With the mod_rewrite frontend, idle the second gear and access the gear via the route. The gear starts, though it takes a little while; we have Bug 1170040 opened for this.

3. After moving the second gear to another node, the route for the gear is moved along with it, and oo-frontend-plugin-modify --save saves the route info for the gear.

4. Delete the app: all records about this gear in the http frontend file are cleaned from nodes.txt or routes.json, and 'oo-accept-node' passed on the nodes.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0220.html
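The routes.json shown in the verification above illustrates the fixed behavior: the application FQDN entry now carries an "alias" field pointing at the second haproxy gear's own FQDN, rather than existing as an independent frontend entry that oo-accept-node would flag and deletion would miss. A minimal sketch of how a frontend could resolve such an entry follows; it is an illustration of the data shape only, and the `resolve` helper is hypothetical, not the actual vhost plugin code.

```python
import json

# Sketch of resolving a routes.json entry through its "alias" field.
# The data is the verified routes.json content quoted above; the
# resolution logic itself is an assumption for illustration.

ROUTES_JSON = """
{"yes-myruby-2-yes.ose22-auto.com.cn":
   {"endpoints": ["127.12.127.2:8080"],
    "limits": {"connections": 5, "bandwidth": 100}},
 "myruby-yes.ose22-auto.com.cn":
   {"endpoints": ["127.12.127.2:8080"],
    "limits": {"connections": 5, "bandwidth": 100},
    "alias": "yes-myruby-2-yes.ose22-auto.com.cn"}}
"""

def resolve(routes, fqdn):
    """Follow "alias" references until a concrete entry is reached,
    then return that entry's endpoints."""
    seen = set()
    while fqdn in routes and fqdn not in seen:
        seen.add(fqdn)
        entry = routes[fqdn]
        if "alias" in entry:
            fqdn = entry["alias"]  # hop to the aliased frontend entry
        else:
            return entry["endpoints"]
    # Alias loop, dangling alias, or unknown FQDN: nothing to serve
    return []

routes = json.loads(ROUTES_JSON)
# The app FQDN resolves via its alias to the gear's endpoints:
print(resolve(routes, "myruby-yes.ose22-auto.com.cn"))
# -> ['127.12.127.2:8080']
```

Because the app FQDN is only an alias, deleting the gear's own entry on app deletion leaves nothing behind for oo-accept-node to complain about.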