Description of problem:
Customize the Gear Placement Algorithm so that it returns an invalid node name. App creation should fail and the application should be rolled back.

Version-Release number of selected component (if applicable):
Puddle-2014-04-03.2

How reproducible:
Always

Steps to Reproduce:
1. Customize the Gear Placement Algorithm according to http://docbuilder.usersys.redhat.com/20822/#Customizing_the_Gear_Placement_Algorithm
2. Modify NodeSelectionPluginTest to return an invalid node name.
3. Restart openshift-broker and run oo-admin-broker-cache -c
4. rhc app create unkonwnode php-5.3
5. rhc app show unkonwnode

Actual results:
For step 4, it prints an error message:

[ose215@dhcp-9-237 ~]$ rhc app create unkonwnode php-5.4
Application Options
-------------------
Domain:     hanli2dom
Cartridges: php-5.4
Gear Size:  default
Scaling:    no

Creating application 'unkonwnode' ... An error occurred while communicating with the server. This problem may only be temporary. Check that you have correctly specified your OpenShift server 'https://br215.ose-201403281.com.cn/broker/rest/domain/hanli2dom/applications'.

For step 5, the application can still be shown:

[ose215@dhcp-9-237 ~]$ rhc app show unkonwnode
unkonwnode @ http://unkonwnode-hanli2dom.ose-201403281.com.cn/ (uuid: 533e0feb307b9babb3000013)
--------------------------------------------------------------
  Domain:     hanli2dom
  Created:    9:50 AM
  Gears:      1 (defaults to small)
  Git URL:    ssh://533e0feb307b9babb3000013.com.cn/~/git/unkonwnode.git/
  SSH:        533e0feb307b9babb3000013.com.cn
  Deployment: auto (on git push)

  php-5.4 (PHP 5.4)
  -----------------
    Gears: 1 small

Expected results:
In step 4, app creation fails because the node is invalid. The application should be rolled back.

Additional info:
OpenShift failed to create the application because the node was unavailable (the plugin supplied a nonexistent node name). The app record stored in MongoDB should be cleared by the rollback process.
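The expected rollback behaviour can be illustrated with a small, self-contained Ruby sketch. It is not the broker's code: AppStore, PlacementError, place_gear and create_app are hypothetical names, and the in-memory hash only stands in for the application collection in MongoDB. The point is that a record written before gear placement is removed again when placement fails.

```ruby
# Hypothetical in-memory stand-in for the application records in MongoDB.
class AppStore
  def initialize
    @apps = {}
  end

  def insert(name)
    @apps[name] = { name: name }
  end

  def delete(name)
    @apps.delete(name)
  end

  def exists?(name)
    @apps.key?(name)
  end
end

# Hypothetical error raised when the selected node cannot host the gear.
class PlacementError < StandardError; end

# Fails when the selected node is not one of the nodes the broker knows about.
def place_gear(known_nodes, selected_node)
  return selected_node if known_nodes.include?(selected_node)

  raise PlacementError, "node #{selected_node} is not available"
end

# Store the record, then place the gear. If placement fails, delete the
# record before re-raising so that no orphaned application is left behind.
def create_app(store, name, known_nodes, selected_node)
  store.insert(name)
  begin
    place_gear(known_nodes, selected_node)
  rescue PlacementError
    store.delete(name)
    raise
  end
  name
end

store = AppStore.new
begin
  create_app(store, "unkonwnode", ["nd216.ose-201403281.com"], "no-such-node")
rescue PlacementError => e
  puts "create failed: #{e.message}"
end
puts "record left behind: #{store.exists?('unkonwnode')}"
```

With this ordering, "rhc app show" would have nothing to display after a failed create, which is the behaviour the report asks for.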
Abhishek, any idea what needs to be fixed? I agree we don't want applications in mongo if they weren't successfully deployed.
Ideally, the plugin implementation by any customer should not have this bug. However, if this issue does happen, one option is to use the oo-admin-repair --removed-node command to detect any missing nodes and get rid of gears on those missing/removed nodes.
The correct flag is "removed-nodes":

oo-admin-repair --removed-nodes
I guess this isn't a bug in gear placement itself. Should the rollback feature cover it? By the way, oo-admin-repair --removed-nodes can't remove this type of app. I get the error message below:

[root@br215 openshift]# oo-admin-repair --removed-nodes
Started at: 2014-04-08 02:13:09 UTC
Total gears found in mongo: 22
Servers that are unresponsive:
        Server: nd216.ose-201403281.com (district: NONE), Confirm [yes/no]: yes
Some servers are unresponsive: nd216.ose-201403281.com
Found 1 unresponsive unscalable apps:
lessnode (id: 534359df307b9b0d13000001)
These apps can not be recovered. Do you want to delete all of them [yes/no]: yes
Finished at: 2014-04-08 02:14:35 UTC
Total time: 86.694s
Unable to delete application with id: 534359df307b9b0d13000001, error: Unable to perform action on app object. Another operation is already running.
FAILED
Regarding comment 6: oo-admin-repair --removed-nodes did eventually succeed.
(In reply to Anping Li from comment #6)
> Unable to delete application with id: 534359df307b9b0d13000001, error:
> Unable to perform action on app object. Another operation is already running.

That indicates there is a lock on the application or domain, which is in Mongo. It expires after (I think) half an hour. Would be nice if oo-admin-repair could knock that out too.
This may be a reasonable fix to prevent this issue. https://github.com/openshift/origin-server/pull/5366/files
https://github.com/openshift/enterprise-server/pull/283
Cherry-picked from origin-server:

commit c2264b5c4adad5a8ac91102492484674efba6000
Author: Abhishek Gupta <abhgupta>
Date:   Thu May 1 14:17:51 2014 -0700

    Bug 1093804: Validating the node returned by the gear-placement plugin
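The idea named in the commit title, validating the node returned by the gear-placement plugin, can be sketched as follows. This is a minimal illustration and not the code from the pull request: the NodeProperties struct here is a simplified stand-in for the broker's class of the same name, and InvalidNodeError and validate_selected_node are hypothetical. The check only accepts a node whose name matches one of the candidates the broker offered to the plugin.

```ruby
# Simplified stand-in for the broker's node description; only the name is modelled.
NodeProperties = Struct.new(:name)

# Hypothetical error type for a plugin result that names no known node.
class InvalidNodeError < StandardError; end

# Return the plugin's choice only if it names one of the candidate nodes;
# otherwise raise so the caller can abort the create and roll back.
def validate_selected_node(selected, candidates)
  known = selected.respond_to?(:name) &&
          candidates.any? { |candidate| candidate.name == selected.name }
  raise InvalidNodeError, "Invalid node selected" unless known

  selected
end

candidates = [
  NodeProperties.new("node1.example.com"),
  NodeProperties.new("node2.example.com")
]

# A well-behaved plugin returns one of the candidates.
puts validate_selected_node(NodeProperties.new("node2.example.com"), candidates).name

# A buggy plugin returns a made-up host name, as in the reproduction steps.
begin
  validate_selected_node(NodeProperties.new("Hostname"), candidates)
rescue InvalidNodeError => e
  puts e.message
end
```

Rejecting the node before any gear is created means the failure surfaces as an ordinary error to the client instead of a communication failure part-way through the create.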
Verified; passes on OSE-2.1.z-2014-06-12.2.

1) Customize the Gear Placement Algorithm and create one app.

[hanli1@broker ~]$ rhc apps|grep '@ h'
php @ http://php-hanli1dom.example.com/ (uuid: 539a8e19be1f289f88000009)

2) Modify gear_placement_plugin.rb to return an invalid node name.

cat /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-gear-placement-0.1/lib/openshift/gear_placement_plugin.rb|grep return
      # return server_infos.first
      return NodeProperties.new("Hostname")

3) Run service openshift-broker restart and oo-admin-broker-cache -c

4) Create a new app; creation fails because the node is invalid.

[hanli1@broker ~]$ rhc app create php54 php-5.4
Application Options
-------------------
Domain:     hanli1dom
Cartridges: php-5.4
Gear Size:  default
Scaling:    no

Creating application 'php54' ... Unable to complete the requested operation due to: Invalid node selected
Reference ID: 05a2c8b1bc4d359ce3f393d5da98a6ad

5) No residual data is left in MongoDB or on the DNS server.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0781.html