Bug 1041628

Summary: SSH keys for gears are removed even when the gear deletion fails and is rolled back
Product: OpenShift Online
Reporter: Abhishek Gupta <abhgupta>
Component: Pod
Assignee: Abhishek Gupta <abhgupta>
Status: CLOSED CURRENTRELEASE
QA Contact: libra bugs <libra-bugs>
Severity: medium
Priority: unspecified
Version: 2.x
CC: jhou
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2014-01-30 00:53:14 UTC
Type: Bug

Description Abhishek Gupta 2013-12-12 18:20:47 UTC
Description of problem:
If a gear deletion is requested (either via a scale-down or by removing a db cartridge from a scalable application) but the operation fails to delete the gear and is rolled back, the ssh key belonging to that gear is still removed from mongo, and an op_group is added to the application to remove the key from all the other gears in the application as well.

Version-Release number of selected component (if applicable):


How reproducible:
Whenever a gear deletion operation fails before the gear is deleted and is rolled back.

Steps to Reproduce:
1. Create a scalable application and add a db cartridge to it
2. Modify the code to introduce an error, using 2a, 2b, or some other method
2a. Raise an exception in the run_jobs method of application.rb in the broker code
2b. Raise an exception in the oo_app_destroy method of the mcollective plugin (agent) code
3. Remove the db cartridge
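
For step 2a, one possible alternative to editing the method body is to wrap it with Module#prepend (Ruby 2.0 or later), so that the failure can be switched on and off without touching the source again. This is only a sketch: the Application class below is a stand-in, and InducedFailure and the INDUCE_RUN_JOBS_FAILURE variable are hypothetical names. Only run_jobs, its result_io parameter and the error message come from this report.

```ruby
# Stand-in for the broker class; the real application.rb is not reproduced here.
class Application
  def run_jobs(result_io = nil)
    :jobs_ran
  end
end

# Hypothetical wrapper: raises only while the environment variable is set to "1",
# otherwise falls through to the original run_jobs via super.
module InducedFailure
  def run_jobs(result_io = nil)
    raise "Testing bug 1041628" if ENV["INDUCE_RUN_JOBS_FAILURE"] == "1"
    super
  end
end

Application.prepend(InducedFailure)

# Usage: switch the failure on, observe it, then switch it off again.
ENV["INDUCE_RUN_JOBS_FAILURE"] = "1"
begin
  Application.new.run_jobs
rescue RuntimeError => e
  puts "induced: #{e.message}"
end
ENV.delete("INDUCE_RUN_JOBS_FAILURE")
puts Application.new.run_jobs.inspect
```

In a real environment the variable would have to be visible to the broker process, so a service restart would presumably still be needed after changing it.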

Actual results:
After step 3, the operation fails and the cartridge is not removed. However, the ssh key pertaining to the db gear is removed from mongo and an op_group is added to remove the ssh key from the remaining application gears.

Expected results:
The ssh key for the db gear should not be removed and the op_group to remove it from the other application gears should also not be added.
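
The expected ordering can be illustrated with a minimal Ruby sketch. It is not the broker's implementation: FakeApp, GearDeletionFailed, remove_gear and the :remove_key_from_other_gears marker are all hypothetical names. It only shows the key removal and the follow-up op_group being reached after the gear deletion has succeeded, and being skipped when the deletion fails.

```ruby
# Hypothetical error type used to signal a failed gear deletion.
class GearDeletionFailed < StandardError; end

# Hypothetical in-memory model of an application with two gears.
class FakeApp
  attr_reader :gears, :ssh_keys, :pending_op_groups

  def initialize
    @gears = ["web", "db"]
    @ssh_keys = { "web" => "key-web", "db" => "key-db" }
    @pending_op_groups = []
  end

  # Deletes a gear. Key cleanup happens only after the deletion succeeded.
  def remove_gear(name, fail_destroy: false)
    destroy_gear(name, fail_destroy)
    # Reached only when destroy_gear did not raise.
    @ssh_keys.delete(name)
    @pending_op_groups << { type: :remove_key_from_other_gears, gear: name }
    true
  rescue GearDeletionFailed
    # Rollback path: the gear still exists, so its key and the list of
    # pending op groups are left exactly as they were.
    false
  end

  private

  def destroy_gear(name, fail_destroy)
    raise GearDeletionFailed, "could not destroy #{name}" if fail_destroy
    @gears.delete(name)
  end
end

app = FakeApp.new
app.remove_gear("db", fail_destroy: true)
puts app.ssh_keys.key?("db")          # key is still present after the failed attempt
puts app.pending_op_groups.empty?     # no key-removal op group was queued
```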

Additional info:

Comment 1 Abhishek Gupta 2013-12-12 18:21:57 UTC
Fixed with --> https://github.com/openshift/origin-server/pull/4328

Comment 2 Jianwei Hou 2013-12-13 06:40:09 UTC
Verified on devenv-stage_614. Will mark this bug as verified once it's on qa

1. rhc create-app php1s php-5.3 mysql-5.1 -s
2. Induce an error in the run_jobs method of application.rb
   def run_jobs(result_io=nil)
     raise "Testing bug 1041628"
     ...
3. Restart broker, mcollective and clear broker cache
4. Remove the mysql gear from app
Got result:
Removing mysql-5.1 from 'php1s' ... 
Unable to complete the requested operation due to: Testing Bug 1041628.
Reference ID: 915ca06ca8418642c6093cb023c7c41c

5. Fix the induced error, restart the services, and clear the cache
6. Perform any operation against the app, e.g. restart
App is restarted
7. On the broker, verify that pending_ops is cleared and that the application ssh key of the mysql gear has been removed from both mongo and .ssh/authorized_keys

Result:
After step 7, the mysql gear's app ssh key is removed from mongo and .ssh/authorized_keys

libra_rs:PRIMARY> db.applications.findOne({},{app_ssh_keys:1})
{
        "_id" : ObjectId("52aaa8eb2932ee10c1000006"),
        "app_ssh_keys" : [
                {
                        "_id" : ObjectId("52aaa90e2932ee10c1000026"),
                        "_type" : "ApplicationSshKey",
                        "component_id" : ObjectId("52aaa8eb2932ee10c1000006"),
                        "content" : "AAAAB3NzaC1yc2EAAAABIwAAAQEA7votxIkXKNUX4R62fhmllKDrWPsnelxqjOWqS4ag0zibhD0CpaWwk2hZLqZ3peODzVO6AoRzQfZv55xqFcrpXp9RoW9lZcaOAa0319mty3UeKKou3176A9hvSA74GhqebxXQSa86Z1Gfv7a094hpy5n80WoYgP1pKyAXh2Gy6uuKdtkJ82ogRrM5DtNznf5qNIWy+iwmpA51/HaZo8lB2eRYwjeYq3h4mUsqekIZTekOWgrRANP9g4XeERY9R/BY1wUfmww2c7oqWdRvbxuatErBVIwNU5WAcEKk3KjaCx8eFHz2AnURBQFEE6i8oZ100/2obs9/PKueYnbKZcpLNQ==",
                        "name" : "application-52aaa8eb2932ee10c1000006",
                        "type" : "ssh-rsa"
                }
        ]
}


[root@ip-10-170-10-241 php1s-jhou]# cat .ssh/authorized_keys 
command="/usr/bin/oo-trap-user",no-X11-forwarding ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA7votxIkXKNUX4R62fhmllKDrWPsnelxqjOWqS4ag0zibhD0CpaWwk2hZLqZ3peODzVO6AoRzQfZv55xqFcrpXp9RoW9lZcaOAa0319mty3UeKKou3176A9hvSA74GhqebxXQSa86Z1Gfv7a094hpy5n80WoYgP1pKyAXh2Gy6uuKdtkJ82ogRrM5DtNznf5qNIWy+iwmpA51/HaZo8lB2eRYwjeYq3h4mUsqekIZTekOWgrRANP9g4XeERY9R/BY1wUfmww2c7oqWdRvbxuatErBVIwNU5WAcEKk3KjaCx8eFHz2AnURBQFEE6i8oZ100/2obs9/PKueYnbKZcpLNQ== OPENSHIFT-52aaa8eb2932ee10c1000006-application-52aaa8eb2932ee10c1000006
command="/usr/bin/oo-trap-user",no-X11-forwarding ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDh43XWg7LERxupWx/Ym9hswfA6loRkpMi5JOfX5C49RW6M6JnpyF53u8/VFYYADN8YN+9wwUEPrUTi6LsZAheAfgw5nxc3VTqFaiHuwFP9oc8yPCgclm+vC0Fn3S2foAjHqO1+fRDYDztAciD0uBT+pWqMsMrqdTdpHvgc6U6tYdKRqKCwNjo4K2bueq3MUMOPdxJp1SvrVJK0I+BNhG4iSaGFt++2dYr2X389kQ+6MNYpJpOuqXC734wWm27CQLvYtWRr3QXfKSV3HnybdYZPc/ffV7xE6MLMkbznBnCgdM8b8AvB41xMz5OvnyGf8+tGDrtKN+qL2+pLBPgZbxoF OPENSHIFT-52aaa8eb2932ee10c1000006-52aaa7f42932ee10c1000001-default
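
To compare the two outputs above without reading the keys by eye, something like the following Ruby sketch could be used. It assumes each authorized_keys line has the four space-separated fields seen above (options, key type, key content, comment) with no spaces inside the options field. parse_authorized_key and comments_for_keys are hypothetical helper names, and the keys in the usage lines are shortened placeholders, not real key material.

```ruby
# Splits one authorized_keys line into its four fields.
# Assumption: the options field contains no spaces, as in the output above.
def parse_authorized_key(line)
  options, type, content, comment = line.strip.split(" ", 4)
  { options: options, type: type, content: content, comment: comment }
end

# Returns the comments of the authorized_keys entries whose key content
# matches one of the "content" values taken from app_ssh_keys in mongo.
def comments_for_keys(lines, mongo_contents)
  lines.map { |line| parse_authorized_key(line) }
       .select { |entry| mongo_contents.include?(entry[:content]) }
       .map { |entry| entry[:comment] }
end

# Usage with placeholder keys.
lines = [
  'command="/usr/bin/oo-trap-user",no-X11-forwarding ssh-rsa AAAAplaceholder1 OPENSHIFT-app-application-app',
  'command="/usr/bin/oo-trap-user",no-X11-forwarding ssh-rsa AAAAplaceholder2 OPENSHIFT-app-user-default'
]
puts comments_for_keys(lines, ["AAAAplaceholder1"]).inspect
```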

Comment 3 Jianwei Hou 2013-12-13 08:36:41 UTC
Correction to my previous comment on this bug:
The fix has not been merged into devenv-stage_614. I deployed the pull request to my own environment and tested the bug there.

1. rhc create-app php1s php-5.3 mysql-5.1 -s
2. Induce an error in the run_jobs method of application.rb
   def run_jobs(result_io=nil)
     raise "Testing bug 1041628"
     ...
3. Restart broker, mcollective and clear broker cache
4. Remove the mysql gear from app
Got result:
Removing mysql-5.1 from 'php1s' ... 
Unable to complete the requested operation due to: Testing Bug 1041628.
Reference ID: 915ca06ca8418642c6093cb023c7c41c
5. Query mongo: the mysql ssh key is still present and there is no op_group to remove the ssh key from the other gears. The following was returned after step 4:
<-------------------->
"pending_op_groups" : [
                {
                        "_id" : ObjectId("52aac4b32932eef5e6000001"),
                        "num_gears_added" : 0,
                        "num_gears_removed" : 0,
                        "num_gears_created" : 0,
                        "num_gears_destroyed" : 0,
                        "num_gears_rolled_back" : 0,
                        "user_agent" : null,
                        "features" : [
                                "mysql-5.1"
                        ],
                        "group_overrides" : [ ],
                        "remove_all_features" : false,
                        "_type" : "RemoveFeaturesOpGroup",
                        "parent_op_id" : null,
                        "updated_at" : ISODate("2013-12-13T08:26:27.414Z"),
                        "created_at" : ISODate("2013-12-13T08:26:27.414Z")
                }
        ],

<-------------------->
6. Fix the induced error, restart the services, and clear the cache
7. Perform any operation against the app, e.g. restart
App is restarted
8. On the broker, verify that pending_ops is cleared and that the application ssh key of the mysql gear has been removed from both mongo and .ssh/authorized_keys

According to the results above, this bug has been fixed. Waiting for the merge to STG.

Comment 4 Abhishek Gupta 2013-12-13 18:22:05 UTC
The proposed fix for this bug is undergoing review and changes.

Comment 5 Abhishek Gupta 2013-12-17 21:06:01 UTC
The fix is now merged into master

Comment 6 Jianwei Hou 2013-12-18 06:07:01 UTC
Verified on devenv-stage_619. Steps are described in comment 3

Comment 7 Abhishek Gupta 2013-12-18 19:19:10 UTC
How was this verified in stage? This fix never made it to stage. This fix will be part of the regular release and not a hotfix anymore. Please verify this on a regular devenv.

Comment 8 Jianwei Hou 2013-12-19 02:53:47 UTC
Sorry, verified on devenv_4154

1. rhc create-app php1s php-5.3 mysql-5.1 -s
2. Induce an error in the run_jobs method of application.rb
   def run_jobs(result_io=nil)
     raise "Testing bug 1041628"
     ...
3. Restart broker, mcollective and clear broker cache
4. Remove the mysql gear from app
5. Query mongo: the mysql ssh key is still present and there is no op_group to remove the ssh key from the other gears.
<--------------------------------->
> db.applications.findOne({},{app_ssh_keys:1,pending_op_groups:1})
{
  "_id": ObjectId("52b255afe551c776b5000006"),
  "app_ssh_keys": [
    {
      "_id": ObjectId("52b255b4e551c776b5000020"),
      "_type": "ApplicationSshKey",
      "component_id": ObjectId("52b255afe551c776b5000006"),
      "content": "AAAAB3NzaC1yc2EAAAABIwAAAQEAveDG6Js6ScgJuv7t6YbaOI6Erk7i0rTFA1FIhioc0AvgMaFBzb0XyKikJy6SHmNu4jUo+yC6YMSFQvg8vPaId4MetkjH5LkFPdXhViPxwodrOQodT8q/NhgmqKln3vJacIHya2kFRJra7B+/EgznqLnn5fgr7v4vzN5mwWi/iux8kroetCC7MhvVhuWYe/P08EjMsJ4tX3ohO8xQcuL4tZkf7P8u1w7MQTVBauCpeufkhA7lGrIXcwzSYx0iVN+JUa/SHhLkmrQwYKqJ7oPxB5BqFWjz/0+iTiJna21NSDRyLzfJlQfHv16+59EwpW/IpiOvPD6DZpZGJwf8ya8NAw==",
      "created_at": ISODate("2013-12-19T02:11:00.972Z"),
      "name": "application-52b255afe551c776b5000006",
      "type": "ssh-rsa"
    },
    {
      "_id": ObjectId("52b255e1e551c776b500003b"),
      "_type": "ApplicationSshKey",
      "component_id": ObjectId("52b255dce551c776b5000028"),
      "content": "AAAAB3NzaC1yc2EAAAABIwAAAQEA1CGPYaEp5Z56rVPEVQTyE5Sey5W/RwPv59o2cPXj0bbPmw+O8ptGQT19QPuWW8BbFKbIi+k/B3g3yEsSrjpetDvaoYV/X12iFYLsmxxOdCnQHKaBQcVvE7/wileBadNWroZio2GkbIsjnSZ6u3r00yVSDt1fc94kXz+0I//cOqW6l8pNe2EVHtSPL558d055xFvOzrL7ng1ky6edkZ5yl5phnoaqtZv/srDPr+KfPscs88O+4jmk1V55iz3dzo7bj+M9FYKwjOutY276Ek2laedKqTrx/+k5rFRMpEziVjY9IIY+f2M7feiIt2R3RXLvfypXPTW1wSKTpXJLQwDk9w==",
      "created_at": ISODate("2013-12-19T02:11:45.508Z"),
      "name": "application-e606c314685211e386b712313907ba57",
      "type": "ssh-rsa"
    }
  ],
  "pending_op_groups": [
    {
      "_id": ObjectId("52b25bdce551c7c808000001"),
      "num_gears_added": 0,
      "num_gears_removed": 0,
      "num_gears_created": 0,
      "num_gears_destroyed": 0,
      "num_gears_rolled_back": 0,
      "user_agent": null,
      "features": [
        "mysql-5.1"
      ],
      "group_overrides": [ ],
      "remove_all_features": false,
      "_type": "RemoveFeaturesOpGroup",
      "parent_op_id": null,
      "updated_at": ISODate("2013-12-19T02:37:16.754Z"),
      "created_at": ISODate("2013-12-19T02:37:16.754Z")
    }
  ]
}
<--------------------------------->
6. Fix the induced error, restart the services, and clear the cache
7. Perform any operation against the app, e.g. restart
App is restarted
8. On the broker, verify that pending_ops is cleared and that the application ssh key of the mysql gear has been removed from both mongo and .ssh/authorized_keys
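
The check in step 5 could be scripted roughly as follows. This is a sketch under assumptions: the document is taken to be already loaded into a Ruby Hash with string keys shaped like the findOne output above, SAMPLE_DOC is a trimmed copy of that output (key contents and timestamps omitted), and key_names and unexpected_op_groups are hypothetical helper names. It treats any pending op group other than the RemoveFeaturesOpGroup shown above as unexpected.

```ruby
# Trimmed copy of the findOne output from step 5.
SAMPLE_DOC = {
  "app_ssh_keys" => [
    { "_type" => "ApplicationSshKey", "name" => "application-52b255afe551c776b5000006" },
    { "_type" => "ApplicationSshKey", "name" => "application-e606c314685211e386b712313907ba57" }
  ],
  "pending_op_groups" => [
    { "_type" => "RemoveFeaturesOpGroup", "features" => ["mysql-5.1"] }
  ]
}.freeze

# Names of all application ssh keys stored on the document.
def key_names(doc)
  doc.fetch("app_ssh_keys", []).map { |key| key["name"] }
end

# Pending op groups whose _type is not in the expected list.
def unexpected_op_groups(doc, expected_types = ["RemoveFeaturesOpGroup"])
  doc.fetch("pending_op_groups", []).reject do |group|
    expected_types.include?(group["_type"])
  end
end

puts key_names(SAMPLE_DOC).length               # both keys are still present
puts unexpected_op_groups(SAMPLE_DOC).empty?    # only the original op group is pending
```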