Description of problem:

When making an application HA, the node where the app's second gear is created ends up with entries in the frontend httpd configuration for both the app's FQDN and the newly created gear. This causes oo-accept-node to fail with:

FAIL: httpd config references DNS name without associated gear: 'myapp-mydomain.example.com'

Moreover, when the app is deleted these entries are not cleaned up.

Version-Release number of selected component (if applicable):
rubygem-openshift-origin-node-1.23.9.11-1.el6op.noarch
openshift-origin-node-util-1.22.9.1-1.el6op.noarch
rubygem-openshift-origin-frontend-apache-mod-rewrite-0.5.2.1-1.el6op.noarch

How reproducible:
Always

Steps to Reproduce:
1. Configure the environment for HA apps (external LB routing plugin, etc.)
2. Create an app (ID 53ce7982e3c9c39293000001 in the example below)
3. Make the app HA (make-ha event)
4. See how oo-accept-node fails
5. Delete the app
6. oo-accept-node still fails

Actual results:

On the node where the app was first created we have:

[root@node1 ~]# grep 53ce7982e3c9c39293000001 /etc/httpd/conf.d/openshift/nodes.txt
myapp-mydomain.example.com 127.10.95.2:8080|53ce7982e3c9c39293000001|53ce7982e3c9c39293000001
myapp-mydomain.example.com/health HEALTH|53ce7982e3c9c39293000001|53ce7982e3c9c39293000001
myapp-mydomain.example.com/haproxy-status 127.10.95.3:8080/|53ce7982e3c9c39293000001|53ce7982e3c9c39293000001

After step 3, on the node where the additional gear is created, these entries are created:

[root@node2 ~]# grep 53ce7982e3c9c39293000001 /etc/httpd/conf.d/openshift/nodes.txt
53ce79d3e3c9c39293000020-mydomain.example.com 127.6.237.2:8080|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020
53ce79d3e3c9c39293000020-mydomain.example.com/health HEALTH|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020
myapp-mydomain.example.com 127.6.237.2:8080|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020
myapp-mydomain.example.com/health HEALTH|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020
53ce79d3e3c9c39293000020-mydomain.example.com/haproxy-status 127.6.237.3:8080/|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020
myapp-mydomain.example.com/haproxy-status 127.6.237.3:8080/|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020

and they result in:

[root@node2 ~]# oo-accept-node
FAIL: httpd config references DNS name without associated gear: 'myapp-mydomain.example.com'

After deleting the app, this remains in node2's frontend configuration:

[root@node2 ~]# grep 53ce7982e3c9c39293000001 /etc/httpd/conf.d/openshift/nodes.txt
myapp-mydomain.example.com 127.6.237.2:8080|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020
myapp-mydomain.example.com/health HEALTH|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020
myapp-mydomain.example.com/haproxy-status 127.6.237.3:8080/|53ce7982e3c9c39293000001|53ce79d3e3c9c39293000020

and oo-accept-node keeps complaining about it.

Expected results:

I believe the entries added with the FQDN are intended, in which case oo-accept-node should not complain, and the entries should be cleaned up on app deletion.

Additional info:

In OSE 2.0 the FQDN entries in the frontend are not created (and oo-accept-node does not complain).
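The nodes.txt entries above follow a simple `<fqdn>[/path] <target>|<app-uuid>|<gear-uuid>` shape, and the failure is essentially a route whose FQDN matches no DNS name the node knows about. A minimal sketch of that kind of check, assuming only the file format shown above — the helper name and inputs are illustrative, not the actual oo-accept-node code, which derives the expected DNS names from the gears present on the node:

```ruby
# Flag frontend routes whose FQDN is not among the DNS names known
# for the gears on this node. Input is the raw nodes.txt lines and a
# list of expected DNS names (illustrative; oo-accept-node computes
# the expected names itself).
def unexpected_routes(nodes_txt_lines, known_dns_names)
  nodes_txt_lines.map { |line| line.split(' ', 2).first.split('/').first }
                 .uniq
                 .reject { |fqdn| known_dns_names.include?(fqdn) }
end
```

With node2's entries and only the gear's own DNS name considered known, this reports `myapp-mydomain.example.com` — the same route the FAIL message points at.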
I agree that httpd frontend entries added for secondary haproxy gears should be there, shouldn't be flagged by oo-accept-node, and should go away when the app does. I don't think they cause any real trouble other than oo-accept-node complaining, but that's bug enough.
If we can address bug 1131642 at the same time as this, that would be perfect. It might also be viable to just add the ha-myapp-mydomain route to the frontend instead of the myapp-mydomain one - it makes sense to add routes for any HA alias of the app to the frontend for haproxy gears, but what is going to ask for the app name at the secondary haproxy gear?
It's worth noting that this is only a problem for the mod-rewrite frontend. For vhost (default frontend for OSE 2.2 and Online) I believe there is no complaint. Of course mod-rewrite was the default in 2.1 and is where most existing HA deployments probably still sit. The present plan is to fix (not revert) the existing implementation, with the ha-alias from bug 1131642 being a secondary concern (it requires the broker to actually inform the node about this alias and the node to store it as an alias, so a bit more invasive).
(In reply to Luke Meyer from comment #8)
> It's worth noting that this is only a problem for the mod-rewrite frontend.
> For vhost (default frontend for OSE 2.2 and Online) I believe there is no
> complaint.

It also happens with the vhost frontend:

$ rhc app enable-ha test
RESULT: test is now highly available

root@dhcp198 ~ # oo-accept-node
FAIL: httpd config references DNS name without associated gear: 'test-pep.ose22.example.com'
1 ERRORS

root@dhcp198 ~ # cat /etc/httpd/conf.d/openshift/routes.json | python -m json.tool
{
    "5491f7f72fa4576d0c00076c-pep.ose22.example.com": {
        "endpoints": [
            "127.10.40.130:8080"
        ],
        "limits": {
            "bandwidth": 100,
            "connections": 5
        }
    },
    "test-pep.ose22.example.com": {
        "endpoints": [
            "127.10.40.130:8080"
        ],
        "limits": {
            "bandwidth": 100,
            "connections": -1
        }
    }
}

root@dhcp198 ~ # rpm -q rubygem-openshift-origin-node
rubygem-openshift-origin-node-1.32.3.1-1.el6op.noarch
You're right, Pep, sorry - I made some bad assumptions. The problem manifests essentially the same with the vhost frontend.
It seems there are two separate problems:

1. oo-accept-node errors on deployed HA apps (on secondary haproxy gears).
2. oo-accept-node errors on deleted HA apps (on nodes that were hosting former secondary haproxy gears).

For problem 1 (deployed HA apps): on our side we fixed the check_system_httpd_configs method of oo-accept-node to compare the routes it finds against both OPENSHIFT_APP_DNS and OPENSHIFT_GEAR_DNS when a cartridge with the 'web_proxy' category is deployed in the gear. It works very well on deployed HA apps.

For problem 2 (deleted HA apps): it seems to be located in the v2_cart_model Ruby file. The connect_frontend method explicitly creates a route for <app>-<namespace>.<cloud_domain> on every secondary haproxy gear (gears with a cartridge from the 'web_proxy' category and a name different from <app>-<namespace>), and disconnect_frontend (which does the deletion) never seems to be called on "rhc app delete <app>" (no log found).
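The rule described for problem 1 can be sketched as below. The helper name and the env-hash input are hypothetical — the real change lives in check_system_httpd_configs — but the logic is the one stated above: for a gear carrying a 'web_proxy' cartridge, a route matching OPENSHIFT_APP_DNS is acceptable in addition to OPENSHIFT_GEAR_DNS.

```ruby
# Hypothetical helper mirroring the described fix: accept a frontend
# route when it matches the gear's own DNS name, or the app DNS name
# if the gear hosts a 'web_proxy' cartridge (i.e. an haproxy gear).
def route_allowed?(fqdn, gear_env, has_web_proxy)
  allowed = [gear_env['OPENSHIFT_GEAR_DNS']]
  allowed << gear_env['OPENSHIFT_APP_DNS'] if has_web_proxy
  allowed.compact.include?(fqdn)
end
```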
There are actually more problems than even that (see upstream bug https://bugzilla.redhat.com/show_bug.cgi?id=1155677#c6). I have a different approach in the works that so far is looking good: https://github.com/openshift/origin-server/pull/6027
Verified this bug on puddle 2.2/2015-02-02.1.

1. Create a scalable app and make it HA; the two gears are placed on different nodes. Log into the node where the second haproxy gear is created and check the entries in the httpd frontend file. It should only have the entries for the new gear.

For mod_rewrite:

[root@node2 .httpd.d]# cat nodes.txt
yes-myruby-2-yes.ose22-auto.com.cn 127.3.67.2:8080|54d0519682611d9690000056|yes-myruby-2
yes-myruby-2-yes.ose22-auto.com.cn/health HEALTH|54d0519682611d9690000056|yes-myruby-2
yes-myruby-2-yes.ose22-auto.com.cn/haproxy-status 127.3.67.3:8080/|54d0519682611d9690000056|yes-myruby-2

For vhost:

[root@node2 .httpd.d]# cat routes.json
{"yes-myruby-2-yes.ose22-auto.com.cn":{"endpoints":["127.12.127.2:8080"],"limits":{"connections":5,"bandwidth":100}},"myruby-yes.ose22-auto.com.cn":{"endpoints":["127.12.127.2:8080"],"limits":{"connections":5,"bandwidth":100},"alias":"yes-myruby-2-yes.ose22-auto.com.cn"}}

And 'oo-accept-node' passed on this node.

2. With the mod_rewrite frontend, idle the second gear and access the gear via the route. The gear gets started, though it takes a little long; we have Bug 1170040 opened for this.

3. After moving the second gear to another node, the route for the gear is moved along with it, and oo-frontend-plugin-modify --save saves the route info for the gear.

4. Delete the app: all records about this gear in the httpd frontend file are cleaned out of nodes.txt or routes.json, and 'oo-accept-node' passed on the nodes.
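Note how the fixed vhost routes.json above gives the app-FQDN entry an "alias" key pointing at the gear's own route, so every route can be resolved back to a gear on the node. A minimal sketch of a check that honors that shape — the helper name and inputs are illustrative, not the actual verification code:

```ruby
require 'json'

# Return the routes in a vhost routes.json that resolve to a local
# gear, either directly by FQDN or via their "alias" key (the shape
# shown in the verification output above). Illustrative helper only.
def resolvable_routes(routes_json, gear_dns_names)
  JSON.parse(routes_json).select do |fqdn, data|
    gear_dns_names.include?(fqdn) || gear_dns_names.include?(data['alias'])
  end.keys
end
```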
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0220.html