Bug 1084292 - The app should be rolled back when creation fails with an unknown node name
Summary: The app should be rolled back when creation fails with an unknown node name
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 2.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Luke Meyer
QA Contact: libra bugs
URL:
Whiteboard:
Depends On: 1093804
Blocks:
Reported: 2014-04-04 05:38 UTC by Anping Li
Modified: 2014-06-23 07:37 UTC

Fixed In Version: rubygem-openshift-origin-controller-1.23.10.2-1.el6op
Doc Type: Bug Fix
Doc Text:
If a customized gear placement plug-in was incorrectly configured and returned an invalid node host name, creating a new application reported a communication error when it could not find the node on which to place gears. However, a record for the failed application was created in the MongoDB datastore, even though related gears did not exist on any nodes. This bug fix adds logic to validate the node host name returned by the gear placement plug-in. If the validation fails, the application creation is rolled back completely and datastore records for failed applications are no longer created.
Clone Of:
: 1093804
Environment:
Last Closed: 2014-06-23 07:37:36 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2014:0781 | 0 | normal | SHIPPED_LIVE | Red Hat OpenShift Enterprise 2.1.2 bug fix update | 2014-06-23 11:36:38 UTC

Description Anping Li 2014-04-04 05:38:10 UTC
Description of problem:
Customize the Gear Placement Algorithm so that it returns an invalid node name. App creation should fail and the application should be rolled back.

Version-Release number of selected component (if applicable):
Puddle-2014-04-03.2

How reproducible:
Always

Steps to Reproduce:
1. Customize the Gear Placement Algorithm according to http://docbuilder.usersys.redhat.com/20822/#Customizing_the_Gear_Placement_Algorithm
2. Modify NodeSelectionPluginTest to return an invalid node name.
3. Restart openshift-broker and run oo-admin-broker-cache -c
4. rhc app create unkonwnode php-5.4
5. rhc app show unkonwnode
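
A minimal sketch of the misconfiguration in step 2, assuming a simplified plug-in interface. The NodeProperties struct, the FaultyPlacementPlugin module, and the select_best_fit_node method are hypothetical stand-ins for illustration; the real plug-in method takes more arguments and the real NodeProperties class has more fields.

```ruby
# Hypothetical stand-in for the broker's node descriptor (assumption:
# only the host name matters for this illustration).
NodeProperties = Struct.new(:name)

# Hypothetical, simplified plug-in that ignores the candidates it is given.
module FaultyPlacementPlugin
  def self.select_best_fit_node(server_infos)
    # A correct plug-in would pick from the candidates, e.g. server_infos.first
    NodeProperties.new("Hostname") # host name that matches no real node
  end
end

candidates = [NodeProperties.new("node1.example.com"),
              NodeProperties.new("node2.example.com")]
selected = FaultyPlacementPlugin.select_best_fit_node(candidates)

# The returned name is not among the candidate node names.
puts candidates.map(&:name).include?(selected.name) # prints "false"
```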

Actual results:
For step 4, an error message is printed:
[ose215@dhcp-9-237 ~]$ rhc app create unkonwnode php-5.4
Application Options
-------------------
Domain:     hanli2dom
Cartridges: php-5.4
Gear Size:  default
Scaling:    no
Creating application 'unkonwnode' ... 
An error occurred while communicating with the server. This problem may only be
temporary. Check that you have correctly specified your OpenShift server
'https://br215.ose-201403281.com.cn/broker/rest/domain/hanli2dom/applications'.
For step 5, the application can still be shown:
[ose215@dhcp-9-237 ~]$ rhc app show unkonwnode
unkonwnode @ http://unkonwnode-hanli2dom.ose-201403281.com.cn/
  (uuid: 533e0feb307b9babb3000013)
--------------------------------------------------------------
  Domain:     hanli2dom
  Created:    9:50 AM
  Gears:      1 (defaults to small)
  Git URL:    ssh://533e0feb307b9babb3000013.com.cn/~/git/unkonwnode.git/
  SSH:        533e0feb307b9babb3000013.com.cn
  Deployment: auto (on git push)
  php-5.4 (PHP 5.4)
  -----------------
    Gears: 1 small

Expected results:
In step 4, app creation fails because the node is invalid. The application should be rolled back.

Additional info:

Comment 2 Anping Li 2014-04-04 06:44:14 UTC
OpenShift failed to create the application because the node is unavailable (the nonexistent node name was provided by the plugin). The app record stored in MongoDB should be cleared by the rollback process.

Comment 3 Brenton Leanhardt 2014-04-04 11:49:41 UTC
Abhishek, any idea what needs to be fixed?  I agree we don't want applications in mongo if they weren't successfully deployed.

Comment 4 Abhishek Gupta 2014-04-04 20:46:33 UTC
Ideally, the plugin implementation by any customer should not have this bug. However, if this issue does happen, one option is to use the oo-admin-repair --removed-node command to detect any missing nodes and get rid of gears on those missing/removed nodes.

Comment 5 Abhishek Gupta 2014-04-04 20:47:38 UTC
The correct flag is "removed-nodes"

oo-admin-repair --removed-nodes

Comment 6 Anping Li 2014-04-08 02:22:42 UTC
I guess it isn't a bug in gear placement. Should the rollback feature cover it?

By the way, oo-admin-repair --removed-nodes can't remove this type of app. I get the error message below:

[root@br215 openshift]# oo-admin-repair --removed-nodes
Started at: 2014-04-08 02:13:09 UTC
Total gears found in mongo: 22
Servers that are unresponsive:
	Server: nd216.ose-201403281.com (district: NONE), Confirm [yes/no]: yes

Some servers are unresponsive: nd216.ose-201403281.com

Found 1 unresponsive unscalable apps:
lessnode (id: 534359df307b9b0d13000001)

These apps can not be recovered. Do you want to delete all of them [yes/no]: yes

Finished at: 2014-04-08 02:14:35 UTC
Total time: 86.694s
Unable to delete application with id: 534359df307b9b0d13000001, error: Unable to perform action on app object. Another operation is already running.
FAILED

Comment 7 Anping Li 2014-04-08 02:28:27 UTC
For comment 6, oo-admin-repair --removed-nodes succeeded in the end.

Comment 8 Luke Meyer 2014-04-08 19:22:03 UTC
(In reply to Anping Li from comment #6)

> Unable to delete application with id: 534359df307b9b0d13000001, error:
> Unable to perform action on app object. Another operation is already running.

That indicates there is a lock on the application or domain, which is in Mongo. It expires after (I think) half an hour. Would be nice if oo-admin-repair could knock that out too.

Comment 9 Abhishek Gupta 2014-05-01 21:30:26 UTC
This may be a reasonable fix to prevent this issue. 

https://github.com/openshift/origin-server/pull/5366/files

Comment 12 Luke Meyer 2014-06-12 17:02:16 UTC
Cherry-picked from origin-server:

    commit c2264b5c4adad5a8ac91102492484674efba6000
    Author: Abhishek Gupta <abhgupta>
    Date:   Thu May 1 14:17:51 2014 -0700

        Bug 1093804: Validating the node returned by the gear-placement plugin
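
A rough sketch of the idea behind the fix (validate the plug-in's choice, then roll back on failure), not the actual code from the commit. Every name here (NodeProperties, InvalidNodeError, validate_selected_node, create_app) is hypothetical, and an in-memory Hash stands in for the MongoDB datastore.

```ruby
# Hypothetical stand-ins; none of these names are taken from origin-server.
NodeProperties = Struct.new(:name)
class InvalidNodeError < StandardError; end

# Accept the plug-in's choice only if it names one of the candidate nodes.
def validate_selected_node(selected, candidates)
  valid_names = candidates.map(&:name)
  unless selected.respond_to?(:name) && valid_names.include?(selected.name)
    raise InvalidNodeError, "Invalid node selected"
  end
  selected
end

# Create an application record, place its gear, and delete the record again
# if placement fails, so no orphaned entry stays in the datastore.
def create_app(datastore, app_name, candidates, plugin)
  datastore[app_name] = { node: nil }          # record created up front
  begin
    node = validate_selected_node(plugin.call(candidates), candidates)
    datastore[app_name][:node] = node.name
  rescue InvalidNodeError
    datastore.delete(app_name)                 # roll back the record
    raise
  end
  datastore[app_name]
end

candidates = [NodeProperties.new("node1.example.com")]
datastore = {} # in-memory Hash standing in for MongoDB (assumption)
bad_plugin = ->(_nodes) { NodeProperties.new("Hostname") }

begin
  create_app(datastore, "php54", candidates, bad_plugin)
rescue InvalidNodeError => e
  puts "#{e.message}; records left: #{datastore.size}"
end
```

With the faulty plug-in, the sketch prints "Invalid node selected; records left: 0", which mirrors the behavior verified in comment 13: the request fails and no application record remains.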

Comment 13 Anping Li 2014-06-13 06:00:57 UTC
Verified; passes on OSE-2.1.z-2014-06-12.2

1) Customize the Gear Placement Algorithm and create one app.
[hanli1@broker ~]$ rhc apps|grep '@ h'
php @ http://php-hanli1dom.example.com/ (uuid: 539a8e19be1f289f88000009)

2) Modify gear_placement_plugin.rb to return an invalid node name
cat /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-gear-placement-0.1/lib/openshift/gear_placement_plugin.rb|grep return
#      return server_infos.first
       return NodeProperties.new("Hostname")
3) Run service openshift-broker restart and oo-admin-broker-cache -c

4) Create a new app; creation fails due to the invalid node.
[hanli1@broker ~]$ rhc app create  php54 php-5.4
Application Options
-------------------
Domain:     hanli1dom
Cartridges: php-5.4
Gear Size:  default
Scaling:    no

Creating application 'php54' ... 
Unable to complete the requested operation due to: Invalid node selected
Reference ID: 05a2c8b1bc4d359ce3f393d5da98a6ad

5) No residual data is left in MongoDB or on the DNS server.

Comment 15 errata-xmlrpc 2014-06-23 07:37:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0781.html

