+++ This bug was initially created as a clone of Bug #1084292 +++

Description of problem:
Customize the Gear Placement Algorithm so that it returns an invalid node name. Application creation should fail and the application should be rolled back.

Version-Release number of selected component (if applicable):
Puddle-2014-04-03.2

How reproducible:
Always

Steps to Reproduce:
1. Customize the Gear Placement Algorithm according to http://docbuilder.usersys.redhat.com/20822/#Customizing_the_Gear_Placement_Algorithm
2. Modify NodeSelectionPluginTest to return an invalid node name.
3. Restart openshift-broker and run oo-admin-broker-cache -c
4. rhc app create unkonwnode php-5.3
5. rhc app show unkonwnode

Actual results:
Step 4 prints an error message:

[ose215@dhcp-9-237 ~]$ rhc app create unkonwnode php-5.4
Application Options
-------------------
Domain: hanli2dom
Cartridges: php-5.4
Gear Size: default
Scaling: no

Creating application 'unkonwnode' ... An error occurred while communicating with the server. This problem may only be temporary. Check that you have correctly specified your OpenShift server 'https://br215.ose-201403281.com.cn/broker/rest/domain/hanli2dom/applications'.

In step 5, the application can still be shown:

[ose215@dhcp-9-237 ~]$ rhc app show unkonwnode
unkonwnode @ http://unkonwnode-hanli2dom.ose-201403281.com.cn/ (uuid: 533e0feb307b9babb3000013)
--------------------------------------------------------------
Domain: hanli2dom
Created: 9:50 AM
Gears: 1 (defaults to small)
Git URL: ssh://533e0feb307b9babb3000013.com.cn/~/git/unkonwnode.git/
SSH: 533e0feb307b9babb3000013.com.cn
Deployment: auto (on git push)

php-5.4 (PHP 5.4)
-----------------
Gears: 1 small

Expected results:
Step 4 fails because the node is invalid, and the application is rolled back.

Additional info:

--- Additional comment from RHEL Product and Program Management on 2014-04-04 02:08:56 EDT ---
Since this issue was entered in bugzilla, the release flag has been set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from Anping Li on 2014-04-04 02:44:14 EDT ---
OpenShift failed to create the application because the node was unavailable (the nonexistent node name is provided by the plugin). The app record stored in MongoDB should be cleared by the rollback process.

--- Additional comment from Brenton Leanhardt on 2014-04-04 07:49:41 EDT ---
Abhishek, any idea what needs to be fixed? I agree we don't want applications in mongo if they weren't successfully deployed.

--- Additional comment from Abhishek Gupta on 2014-04-04 16:46:33 EDT ---
Ideally, the plugin implementation by any customer should not have this bug. However, if this issue does happen, one option is to use the oo-admin-repair --removed-node command to detect any missing nodes and get rid of gears on those missing/removed nodes.

--- Additional comment from Abhishek Gupta on 2014-04-04 16:47:38 EDT ---
The correct flag is "removed-nodes":
oo-admin-repair --removed-nodes

--- Additional comment from Anping Li on 2014-04-07 22:22:42 EDT ---
I guess it isn't a bug in gear placement. Should the rollback feature cover it?
By the way, oo-admin-repair --removed-nodes can't remove this type of app. I get the error message below:

[root@br215 openshift]# oo-admin-repair --removed-nodes
Started at: 2014-04-08 02:13:09 UTC
Total gears found in mongo: 22
Servers that are unresponsive:
Server: nd216.ose-201403281.com (district: NONE), Confirm [yes/no]: yes
Some servers are unresponsive: nd216.ose-201403281.com
Found 1 unresponsive unscalable apps:
lessnode (id: 534359df307b9b0d13000001)
These apps can not be recovered. Do you want to delete all of them [yes/no]: yes
Finished at: 2014-04-08 02:14:35 UTC
Total time: 86.694s
Unable to delete application with id: 534359df307b9b0d13000001, error: Unable to perform action on app object. Another operation is already running.
FAILED

--- Additional comment from Anping Li on 2014-04-07 22:28:27 EDT ---
Regarding comment 6: removed-nodes eventually succeeded.

--- Additional comment from Luke Meyer on 2014-04-08 15:22:03 EDT ---
(In reply to Anping Li from comment #6)
> Unable to delete application with id: 534359df307b9b0d13000001, error:
> Unable to perform action on app object. Another operation is already running.

That indicates there is a lock on the application or domain, which is stored in Mongo. It expires after (I think) half an hour. It would be nice if oo-admin-repair could clear that lock as well.

--- Additional comment from Abhishek Gupta on 2014-05-01 17:30:26 EDT ---
This may be a reasonable fix to prevent this issue:
https://github.com/openshift/origin-server/pull/5366/files
Fixed with --> https://github.com/openshift/origin-server/pull/5366
Commit pushed to master at https://github.com/openshift/origin-server https://github.com/openshift/origin-server/commit/c2264b5c4adad5a8ac91102492484674efba6000 Bug 1093804: Validating the node returned by the gear-placement plugin
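The commit title describes the fix as validating the node returned by the gear-placement plugin before a gear is created on it. The actual origin-server change is not reproduced here; the following is only a minimal Ruby sketch of that idea, assuming the broker hands the plugin a list of candidate server names and rejects any returned name that is not in that list. InvalidNodeError, select_node, validate_selection, the two plugin lambdas and the host names are all hypothetical.

```ruby
# Hypothetical sketch: reject a node name that a gear-placement plugin
# returns if it is not one of the candidates the broker offered.
# None of these names come from the origin-server code base.

class InvalidNodeError < StandardError; end

# Raise unless the plugin's choice is one of the offered candidates.
def validate_selection(selected, candidates)
  raise InvalidNodeError, "Invalid node selected" unless candidates.include?(selected)
  selected
end

# Ask the plugin (any callable) for a node, then validate its answer.
def select_node(candidates, plugin)
  validate_selection(plugin.call(candidates), candidates)
end

# A plugin that behaves, and one that mimics the modified test plugin
# from the report by returning a host that does not exist.
good_plugin  = ->(candidates) { candidates.first }
buggy_plugin = ->(_candidates) { "no-such-node.example.com" }

candidates = ["nd216.example.com", "nd217.example.com"]
puts select_node(candidates, good_plugin) # prints nd216.example.com

begin
  select_node(candidates, buggy_plugin)
rescue InvalidNodeError => e
  # Creation would be aborted here and the pending ops rolled back.
  puts "creation aborted: #{e.message}" # prints creation aborted: Invalid node selected
end
```

Failing at selection time means the error surfaces before any gear is placed on a nonexistent host, which matches the "Invalid node selected" message seen during verification.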
Verified on devenv_4760.

The steps are a little different in Online: the gem has to be installed in the container and added to the broker Gemfile. These steps are done in oo-broker-modify.

When application creation fails, the application is rolled back and no app record remains in mongo.

Application Options
-------------------
Domain: jhou
Cartridges: diy-0.1
Gear Size: default
Scaling: no

Creating application 'd1' ... Unable to complete the requested operation due to: Invalid node selected
Reference ID: 62e8dc2c5027744a98d526f3186d1820

development.log:
2014-05-06 03:05:32.894 [DEBUG] Rollback ReserveGearUidOp gear_id=536889b7ebdea14202000001 (pid:79)
2014-05-06 03:05:32.950 [DEBUG] Rollback NotifyAppCreateOp (pid:79)
2014-05-06 03:05:32.950 [DEBUG] Rollback not implemented: NotifyAppCreateOp (pid:79)
2014-05-06 03:05:32.952 [DEBUG] Rollback InitGearOp comp_specs=[component:diy-0.1/diy-0.1/53686e2ffbe932749b000019] gear_id=536889b7ebdea14202000001 group_instance_id=536889b7ebdea14202000003 (pid:79)
2014-05-06 03:05:33.231 [DEBUG] FAILURE ACTION=ADD_APPLICATION USER_ID=5368841a8774ec36c3000001 LOGIN=jhou APP_UUID=536889b7ebdea14202000001 DOMAIN=jhou Unable to complete the requested operation due to: Invalid node selected Unable to complete the requested operation due to: Invalid node selected Reference ID: 62e8dc2c5027744a98d526f3186d1820 (pid:79)
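The development.log excerpt shows the broker undoing ReserveGearUidOp, then NotifyAppCreateOp (which has no rollback and is only logged), then InitGearOp, after which no app record is left. The following Ruby sketch illustrates that pattern under stated assumptions: completed ops are undone newest-first, an op without an undo step is logged and skipped, and the app record is deleted last. The Op class, rollback_all and the in-memory app store are hypothetical; only the op names are taken from the log.

```ruby
# Hypothetical rollback pass. The Op class and rollback_all are
# illustrative only; op names are copied from development.log.

class Op
  attr_reader :name

  # rollback: a callable that undoes the op, or nil if none exists.
  def initialize(name, rollback: nil)
    @name = name
    @rollback = rollback
  end

  # Undo this op if possible and record what happened in the log.
  def rollback!(log)
    if @rollback
      log << "Rollback #{name}"
      @rollback.call
    else
      log << "Rollback not implemented: #{name}"
    end
  end
end

# Undo completed ops in reverse order, then drop the app record so no
# half-created application is left behind in the store.
def rollback_all(completed_ops, app_store, app_name)
  log = []
  completed_ops.reverse_each { |op| op.rollback!(log) }
  app_store.delete(app_name)
  log
end

apps = { "d1" => { domain: "jhou" } }
undone = []
ops = [
  Op.new("InitGearOp", rollback: -> { undone << :gear }),
  Op.new("NotifyAppCreateOp"),
  Op.new("ReserveGearUidOp", rollback: -> { undone << :uid })
]

puts rollback_all(ops, apps, "d1")
# prints:
# Rollback ReserveGearUidOp
# Rollback not implemented: NotifyAppCreateOp
# Rollback InitGearOp
puts apps.empty? # prints true
```

Reverse order is used so that each undo step runs while the state it depends on still exists; whether the real broker orders its rollbacks exactly this way is an assumption based on the log above.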