Bug 1093804 - The app should be rolled back when creation fails with an unknown node name
Summary: The app should be rolled back when creation fails with an unknown node name
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Pod
Version: 2.x
Hardware: Unspecified
OS: Unspecified
high
low
Target Milestone: ---
: ---
Assignee: Abhishek Gupta
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks: 1084292
Reported: 2014-05-02 17:32 UTC by Luke Meyer
Modified: 2015-05-15 00:28 UTC
5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1084292
Environment:
Last Closed: 2014-07-15 10:28:39 UTC
Target Upstream Version:
Embargoed:


Description Luke Meyer 2014-05-02 17:32:52 UTC
+++ This bug was initially created as a clone of Bug #1084292 +++

Description of problem:
When the Gear Placement Algorithm is customized to return an invalid node name, application creation should fail and the application should be rolled back.

Version-Release number of selected component (if applicable):
Puddle-2014-04-03.2

How reproducible:
Always

Steps to Reproduce:
1. Customize the Gear Placement Algorithm as described in http://docbuilder.usersys.redhat.com/20822/#Customizing_the_Gear_Placement_Algorithm
2. Modify NodeSelectionPluginTest to return an invalid node name.
3. Restart openshift-broker and run oo-admin-broker-cache -c.
4. rhc app create unkonwnode php-5.3
5. rhc app show unkonwnode

Actual results:
Step 4 prints an error message:
[ose215@dhcp-9-237 ~]$ rhc app create unkonwnode php-5.4
Application Options
-------------------
Domain:     hanli2dom
Cartridges: php-5.4
Gear Size:  default
Scaling:    no
Creating application 'unkonwnode' ... 
An error occurred while communicating with the server. This problem may only be
temporary. Check that you have correctly specified your OpenShift server
'https://br215.ose-201403281.com.cn/broker/rest/domain/hanli2dom/applications'.
Step 5 shows that the application still exists:
[ose215@dhcp-9-237 ~]$ rhc app show unkonwnode
unkonwnode @ http://unkonwnode-hanli2dom.ose-201403281.com.cn/
  (uuid: 533e0feb307b9babb3000013)
--------------------------------------------------------------
  Domain:     hanli2dom
  Created:    9:50 AM
  Gears:      1 (defaults to small)
  Git URL:    ssh://533e0feb307b9babb3000013.com.cn/~/git/unkonwnode.git/
  SSH:        533e0feb307b9babb3000013.com.cn
  Deployment: auto (on git push)
  php-5.4 (PHP 5.4)
  -----------------
    Gears: 1 small

Expected results:
In step 4, app creation fails because the node is invalid, and the application should be rolled back.

Additional info:

--- Additional comment from RHEL Product and Program Management on 2014-04-04 02:08:56 EDT ---

Since this issue was entered in bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from Anping Li on 2014-04-04 02:44:14 EDT ---

OpenShift failed to create the application because the node was unavailable (the plugin returned a nonexistent node name). The app record stored in MongoDB should be cleared by the rollback process.
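A minimal Python sketch of the rollback behavior being asked for here: each creation step knows how to undo itself, and if a later step fails, the completed steps are undone in reverse order so no app record is left behind. The names (Op, run_with_rollback, SaveAppRecord, PlaceGear) and the dict standing in for MongoDB are hypothetical; the broker's real pending ops (such as the ReserveGearUidOp and InitGearOp in the verification log) are more involved.

```python
# Hypothetical sketch: Op, run_with_rollback and the in-memory db dict are
# illustrative stand-ins, not origin-server APIs.


class Op:
    """One step of application creation that knows how to undo itself."""

    def __init__(self, name, do, undo):
        self.name = name
        self.do = do
        self.undo = undo


def run_with_rollback(ops):
    """Run ops in order; if one raises, undo the completed ops in reverse order."""
    completed = []
    try:
        for op in ops:
            op.do()
            completed.append(op)
    except Exception:
        for op in reversed(completed):
            op.undo()
        raise  # re-raise so the caller still sees the original error


# Usage: the app record is written first, then gear placement fails.
db = {"applications": {}}


def save_app():
    db["applications"]["unkonwnode"] = {"gears": 1}


def delete_app():
    db["applications"].pop("unkonwnode", None)


def place_gear():
    raise RuntimeError("Invalid node selected")


ops = [Op("SaveAppRecord", save_app, delete_app),
       Op("PlaceGear", place_gear, lambda: None)]

try:
    run_with_rollback(ops)
except RuntimeError as err:
    print(err)  # Invalid node selected

print(db["applications"])  # {}
```

Undoing in reverse order matters because later steps may depend on earlier ones (a gear cannot outlive its app record).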

--- Additional comment from Brenton Leanhardt on 2014-04-04 07:49:41 EDT ---

Abhishek, any idea what needs to be fixed?  I agree we don't want applications in mongo if they weren't successfully deployed.

--- Additional comment from Abhishek Gupta on 2014-04-04 16:46:33 EDT ---

Ideally, no customer's plugin implementation should have this bug. If this issue does happen, however, one option is to use the oo-admin-repair --removed-node command to detect any missing nodes and get rid of gears on those missing/removed nodes.

--- Additional comment from Abhishek Gupta on 2014-04-04 16:47:38 EDT ---

The correct flag is "removed-nodes"

oo-admin-repair --removed-nodes
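As a simplified illustration of what a removed-node check has to work out, the Python sketch below groups gear records by application and reports the servers outside a given responsive set, plus the apps with no gear on any responsive server (for an unscalable, single-gear app this means it cannot be recovered). The function name and input shapes are assumptions; this is not how oo-admin-repair is implemented.

```python
from collections import defaultdict


def find_unrecoverable_apps(gears, responsive_servers):
    """Hypothetical helper.

    gears: iterable of (app_id, server) pairs, e.g. as loaded from mongo.
    Returns (unresponsive_servers, app_ids), where app_ids are apps that
    have no gear on any responsive server.
    """
    by_app = defaultdict(list)
    unresponsive = set()
    for app_id, server in gears:
        by_app[app_id].append(server)
        if server not in responsive_servers:
            unresponsive.add(server)
    lost = sorted(app_id for app_id, servers in by_app.items()
                  if all(s not in responsive_servers for s in servers))
    return sorted(unresponsive), lost


# Usage: one gear points at the unreachable node, one at a healthy node.
gears = [
    ("534359df307b9b0d13000001", "nd216.ose-201403281.com"),
    ("hypothetical-app", "node1.example.com"),
]
servers, apps = find_unrecoverable_apps(gears, {"node1.example.com"})
print(servers)  # ['nd216.ose-201403281.com']
print(apps)     # ['534359df307b9b0d13000001']
```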

--- Additional comment from Anping Li on 2014-04-07 22:22:42 EDT ---

I guess this isn't a gear placement bug. Shouldn't the rollback feature cover it?

By the way, oo-admin-repair --removed-nodes can't remove this type of app. It fails with the following error:

[root@br215 openshift]# oo-admin-repair --removed-nodes
Started at: 2014-04-08 02:13:09 UTC
Total gears found in mongo: 22
Servers that are unresponsive:
	Server: nd216.ose-201403281.com (district: NONE), Confirm [yes/no]: yes

Some servers are unresponsive: nd216.ose-201403281.com

Found 1 unresponsive unscalable apps:
lessnode (id: 534359df307b9b0d13000001)

These apps can not be recovered. Do you want to delete all of them [yes/no]: yes

Finished at: 2014-04-08 02:14:35 UTC
Total time: 86.694s
Unable to delete application with id: 534359df307b9b0d13000001, error: Unable to perform action on app object. Another operation is already running.
FAILED

--- Additional comment from Anping Li on 2014-04-07 22:28:27 EDT ---

Regarding comment 6: --removed-nodes eventually succeeded.

--- Additional comment from Luke Meyer on 2014-04-08 15:22:03 EDT ---

(In reply to Anping Li from comment #6)

> Unable to delete application with id: 534359df307b9b0d13000001, error:
> Unable to perform action on app object. Another operation is already running.

That indicates a lock on the application or domain, which is stored in Mongo. It expires after (I think) half an hour. It would be nice if oo-admin-repair could clear that too.
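A minimal Python sketch of the kind of stale-lock cleanup suggested here, assuming each lock is stored with the time it was taken. The 30-minute timeout only mirrors the guess in the comment, and is_stale / clear_stale_locks are hypothetical helpers, not part of oo-admin-repair.

```python
from datetime import datetime, timedelta, timezone

# Assumed timeout: the comment only guesses "half an hour".
LOCK_TIMEOUT = timedelta(minutes=30)


def is_stale(locked_at, now, timeout=LOCK_TIMEOUT):
    """True if a lock taken at locked_at should be treated as expired at now."""
    return now - locked_at >= timeout


def clear_stale_locks(locks, now, timeout=LOCK_TIMEOUT):
    """locks: dict of app_id -> lock timestamp (hypothetical layout).

    Removes stale entries in place and returns the ids that were cleared.
    """
    stale = [app_id for app_id, ts in locks.items() if is_stale(ts, now, timeout)]
    for app_id in stale:
        del locks[app_id]
    return stale


# Usage: one lock left over from the failed create, one recent lock.
now = datetime(2014, 4, 8, 2, 14, 35, tzinfo=timezone.utc)
locks = {
    "534359df307b9b0d13000001": now - timedelta(minutes=45),
    "hypothetical-app": now - timedelta(minutes=5),
}
print(clear_stale_locks(locks, now))  # ['534359df307b9b0d13000001']
print(sorted(locks))                   # ['hypothetical-app']
```

Leaving recent locks alone avoids breaking an operation that is genuinely still running.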

--- Additional comment from Abhishek Gupta on 2014-05-01 17:30:26 EDT ---

This may be a reasonable fix to prevent this issue. 

https://github.com/openshift/origin-server/pull/5366/files

Comment 1 Abhishek Gupta 2014-05-05 17:54:38 UTC
Fixed with --> https://github.com/openshift/origin-server/pull/5366

Comment 2 openshift-github-bot 2014-05-05 19:01:30 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/c2264b5c4adad5a8ac91102492484674efba6000
Bug 1093804: Validating the node returned by the gear-placement plugin
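The actual fix is Ruby code in origin-server; the Python sketch below only illustrates the idea in the commit message: check the node name returned by the placement plugin against the nodes the broker knows about, and fail with "Invalid node selected" (the message seen in the verification below) before any gear is created. validate_selected_node, InvalidNodeError, buggy_plugin and the node names are hypothetical.

```python
class InvalidNodeError(Exception):
    """Raised when the placement plugin picks a node the broker does not know."""


def validate_selected_node(selected, available_nodes):
    """Reject a plugin-selected node name that is not among the available nodes."""
    if selected not in available_nodes:
        raise InvalidNodeError("Invalid node selected")
    return selected


def buggy_plugin(candidates):
    # Mimics the modified test plugin from step 2: ignores the candidates.
    return "no-such-node.example.com"


# Usage
available = ["node1.example.com", "node2.example.com"]
try:
    validate_selected_node(buggy_plugin(available), available)
except InvalidNodeError as err:
    print(err)  # Invalid node selected
```

Failing at selection time means the error surfaces before any gear is placed, and the normal rollback path then cleans up the app record.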

Comment 3 Jianwei Hou 2014-05-06 03:08:51 UTC
Verified on devenv_4760

The steps are a little different in Online: the gem has to be installed in the container and added to the broker Gemfile. These steps are done in oo-broker-modify.

When the application failed creation, it is rolled back. There is no app record in mongo.

Application Options
-------------------
Domain:     jhou
Cartridges: diy-0.1
Gear Size:  default
Scaling:    no

Creating application 'd1' ... 
Unable to complete the requested operation due to: Invalid node selected
Reference ID: 62e8dc2c5027744a98d526f3186d1820

development.log:
2014-05-06 03:05:32.894 [DEBUG] Rollback ReserveGearUidOp gear_id=536889b7ebdea14202000001 (pid:79)
2014-05-06 03:05:32.950 [DEBUG] Rollback NotifyAppCreateOp (pid:79)
2014-05-06 03:05:32.950 [DEBUG] Rollback not implemented: NotifyAppCreateOp (pid:79)
2014-05-06 03:05:32.952 [DEBUG] Rollback InitGearOp comp_specs=[component:diy-0.1/diy-0.1/53686e2ffbe932749b000019] gear_id=536889b7ebdea14202000001 group_instance_id=536889b7ebdea14202000003 (pid:79)
2014-05-06 03:05:33.231 [DEBUG] FAILURE ACTION=ADD_APPLICATION USER_ID=5368841a8774ec36c3000001 LOGIN=jhou APP_UUID=536889b7ebdea14202000001 DOMAIN=jhou Unable to complete the requested operation due to: Invalid node selected Unable to complete the requested operation due to: Invalid node selected
Reference ID: 62e8dc2c5027744a98d526f3186d1820 (pid:79)

