Bug 1039151

Summary: SSH keys for new gears are left behind when gear creation/configuration fails and is rolled back
Product: OpenShift Online
Reporter: Abhishek Gupta <abhgupta>
Component: Pod
Assignee: Abhishek Gupta <abhgupta>
Status: CLOSED CURRENTRELEASE
QA Contact: libra bugs <libra-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 2.x
CC: abhgupta, jhou, xtian
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-30 00:52:33 UTC
Type: Bug

Description Abhishek Gupta 2013-12-06 19:13:30 UTC
Description of problem:
When a new gear is created, an SSH key is added for it. This key is stored in the application's mongo document and sent to the application's other gears. However, if the gear creation/configuration fails and is rolled back, the key persists both in mongo and on the other gears of the application.

Version-Release number of selected component (if applicable):


How reproducible:
Always, whenever a new gear creation fails and is rolled back

Steps to Reproduce:
1. Create a scalable application
2. Add a db cartridge or scale up the web_framework cartridge
3. Insert an error to induce a failure in the configure/post-configure step.


Actual results:
After step #3, the operation is rolled back and the new gear is removed. However, the ssh key added for this gear is still present in mongo and even added to the other gears. 

Expected results:
The gear's ssh key should be removed from mongo and from all the other gears in case of a rollback in the gear creation/configuration process.

Additional info:

The operation that adds the gear's SSH key to the other gears is managed through a separate op_group. In case of a rollback, this new op_group is not executed right away; it stays pending and is executed the next time the user performs an action on the application.
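
The expected behavior can be sketched with a small in-memory model. This is only an illustration, not the origin-server implementation: the class AppModel, its methods, and the gear IDs and key contents are all hypothetical, and the block passed to add_gear stands in for the configure/post-configure step.

```ruby
require "set"

# Hypothetical stand-in for an application document and its gears.
# None of these names come from origin-server.
class AppModel
  attr_reader :app_ssh_keys, :gears

  def initialize
    @app_ssh_keys = {} # key name => key content (the "mongo" side)
    @gears = {}        # gear id  => Set of app key names in authorized_keys
  end

  # Adds a gear and distributes its key; the optional block plays the role
  # of the configure step. Any failure triggers a full rollback.
  def add_gear(gear_id, key_content)
    key_name = "application-#{gear_id}"
    @gears[gear_id] = Set.new
    @app_ssh_keys[key_name] = key_content
    @gears.each { |id, keys| keys << key_name unless id == gear_id }
    yield gear_id if block_given?
    gear_id
  rescue StandardError
    rollback_gear(gear_id, key_name)
    nil
  end

  private

  # Removes the gear AND its key from the document and from every other gear.
  def rollback_gear(gear_id, key_name)
    @gears.delete(gear_id)
    @app_ssh_keys.delete(key_name)
    @gears.each_value { |keys| keys.delete(key_name) }
  end
end

app = AppModel.new
app.add_gear("gear1", "key-content-1")
app.add_gear("gear2", "key-content-2") { raise "configure failed" }
p app.app_ssh_keys.keys
p app.gears["gear1"].to_a
```

After the failed second call, only gear1's key should remain in the model, and gear1 should hold no key belonging to the removed gear. The bug described here corresponds to a rollback that performs only the first of the three deletions.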

Comment 1 Abhishek Gupta 2013-12-06 19:16:31 UTC
Fixed with --> https://github.com/openshift/origin-server/pull/4294

Comment 3 Jianwei Hou 2013-12-09 05:51:43 UTC
Tested on devenv-stage_604

1. Create a scalable application
2. Add mysql to the application
3. Induce an error in the configure method in /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.18.2/lib/openshift-origin-node/model/v2_cart_model.rb
eg:
def configure(cartridge_name, template_git_url = nil, manifest = nil)
  return -1
  exit
....
4. Restart broker/mcollective, clear broker cache
5. Query mongo; there are 2 application SSH keys present
libra_rs:PRIMARY> db.applications.findOne({name:"php1s"},{app_ssh_keys:1})
{
	"_id" : ObjectId("52a55699109f4d2029000004"),
	"app_ssh_keys" : [
		{
			"_id" : ObjectId("52a556bd109f4d2029000024"),
			"_type" : "ApplicationSshKey",
			"component_id" : ObjectId("52a55699109f4d2029000004"),
			"content" : "AAAAB3NzaC1yc2EAAAABIwAAAQEA4dyeqAqSmSY8o6/K36fv36mrsuHD6WPoa22GamFdHINDFyALygZb8m0kO2EUOeIylw0SUtHC/SVR0Ix79yy+MA+wbeIff/ptpVMHj1i6dkxN22cSKjX6VBW/M2K0HR/fNL3QUBHR2ozTlpKzShPnUuFFvUfosqXSO/92W1opYXINEUBBzmaxTB0Iv3KT0pEN+DerG/23nZyj7svMzlYf63sVCcwM7ar5By+3uwYj1qVm2Vz37weHOTPDfFT5IiGfPRkOkxRVZ6LgaedTYH3PFimzRxAfFM3IXREHiBP+8COODn2aVDTwtHrTxBRJpjP4O0RjhNX9C0b7IllEyrWW0Q==",
			"name" : "application-52a55699109f4d2029000004",
			"type" : "ssh-rsa"
		},
		{
			"_id" : ObjectId("52a556ed109f4d2029000040"),
			"_type" : "ApplicationSshKey",
			"component_id" : ObjectId("52a556cd109f4d2029000027"),
			"content" : "AAAAB3NzaC1yc2EAAAABIwAAAQEA/QZtE5/1wuWLk0/+gF2+Wwvpieg/7NG08M89esFjZ/+0BBoiUHJxCjFrDDFfNRpNhWlSgqlq2ALdy84uUKXZZZlLAUnInk5Zw/lMKEhMotpyQGIE/HfivG7wkYOPXsJxnYyrSgTobdhy8HTWyOXnnsc8UNccB7B9PzM9wU2rycih9NstDZoGzNSYwoiCOA/0uZIbae+K0lXNJevNJFkYjfgiENbZZnHMvgfoxEYPdUDBXg8tw4HkX4BVesm4r33eU6+OE8HZUFki9zJpUYtg+e84YtxLCJJMWTaLyP467Mm/GCgWDOzNIaeIVmYUdGeMY11mvibNBYy2hFzrTF225w==",
			"name" : "application-d0df9a2c609311e3a44022000a9704fb",
			"type" : "ssh-rsa"
		}
	]
}


6. Scale up the application:
Gear creation failed. Querying mongo again shows 3 application SSH keys, and the pending ops cannot be cleared.

I'm not sure how to induce the error correctly, but according to my testing, the error I induced has a destructive impact on the application: the pending ops remain and can't be cleared.

@abhgupta could you please give me more information on how to induce the error? Thanks!

Comment 4 Abhishek Gupta 2013-12-10 21:26:15 UTC
Subsequently, just remove the error that you induced and try to perform any operation on the application. That should clear the stuck pending_op and let you test this issue.

Comment 5 Abhishek Gupta 2013-12-10 21:26:49 UTC
You'll have to restart the broker after you change the code to remove the error.

Comment 6 Jianwei Hou 2013-12-11 06:02:16 UTC
Thanks, tested on devenv-stage_609. The pending_ops were rolled back. However, after the rollback, the new gear's application SSH key was still in mongo and could still be added to new gears of the app. The test steps were the same as in comment 3.

The app now has 3 gears, but the authorized_keys of the gear has 4 application ssh keys and 1 user ssh key:
1 master % rhc app-show  php1s --gears
ID                              State   Cartridges          Size  SSH URL
------------------------------- ------- ------------------- ----- ----------------------------------------------------------------------------------
52a7fc136e8f04b70b000026        started php-5.3 haproxy-1.4 small 52a7fc136e8f04b70b000026.rhcloud.com
52a7fc416e8f042f03000002        started mysql-5.1           small 52a7fc416e8f042f03000002.rhcloud.com
13c0b466622911e3a8ed1231390e8c89 started mongodb-2.2         small 13c0b466622911e3a8ed1231390e8c89.rhcloud.com

[root@domU-12-31-39-0E-8C-89 .ssh]# pwd
/var/lib/openshift/13c0b466622911e3a8ed1231390e8c89/.ssh
[root@domU-12-31-39-0E-8C-89 .ssh]# cat authorized_keys 
command="/usr/bin/oo-trap-user",no-X11-forwarding ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAp85i/mFPDGVpKAEDGwAD+qCKqJ8C0UP5tom/TIVlYAWhSyCooxN1mdXXFgzrce7Jw8DdoGB9EsXLIjxDJ7ZHzXHulsdXW9XZTGhMbZN3W3Z+Z1a71qhY9Vd9TimHX2CwI47e/4KLzp/LLL672hchmHRwgQwAIunMy3IKp14/YZptSsPtf5x0XZnN+7PVGB1L6D2amGbOluexaAP7DHxfw0R2AiieoijTvfaLmChN0KckGD4ri3HlmvPhUHYH7qVw7DRSB3ZptL1omWwgGCp0mLU7SC/xXlJK6RW4hhyl3Q6+QHbhGLjGNDTCVaGQ0Cwgz3/HCGya3tEQwUUNrfu1iQ== OPENSHIFT-13c0b466622911e3a8ed1231390e8c89-application-52a7fc136e8f04b70b000026
command="/usr/bin/oo-trap-user",no-X11-forwarding ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA8AYc52f6eDiwcsM/opOpHFjqbltvd8eCzw6alGjZ1TbR0rLW411t5dgx0gXLLrbrratJG4lhlEurKPdGxAOa/SAnyAQ8qGI7aF3zOY5jFhP7Yp+SmWVAR7k6eTWkjhTaFtWX6z8chicDWbnLFOIWucjg6Xst3brzNQrEmuFdp9LEY2+KI2oqGWx2mmiKXBTdDEnpHaHFv/zIGNxkhVzyyIGFBpQBdOeYkDgffcXVpLVfXWZfBhQyI09uvBT/hWseBpcs10S7spA8P5uh0ivGW2PMPDh02wh9VDPlOR5t7zhXb+cxZx9MQN98XIf73jBY0e3RBUYNFD4pbWTtr8ZkAQ== OPENSHIFT-13c0b466622911e3a8ed1231390e8c89-application-52a7fc416e8f042f03000002
command="/usr/bin/oo-trap-user",no-X11-forwarding ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEArtPPE20apufUH7xbjPDJR+fV5b1F1k2J8AOLpm5O/SGC0TnPc0ZqGFzp5+OQ2HnZmlb4HeJag/BHjjpfHarUSgX16SdAvRJ1Xzux4yY3pLWNM5v1DDMUMFUzmEJRCh89itzhtiZoBK/OiWog6Cr8Z28F3FAEeEX3sc2Nnqjb2bc402ODqB2GdeuWTmUWkzGinn5yDT41Pu/hm6/8W8Arh9+bSeGwtktY3jkkJEv9A/taqRulxhVWOzzaoYR/DF94EyXrIwXWtmsHWwSoDDDw01Q4JMKu5/Flb+ktPZK+VaG54KtZL4infWbE/Qk6DaFd9PD/KdKJhUJz8Wia7DUO5w== OPENSHIFT-13c0b466622911e3a8ed1231390e8c89-application-52a7fd086e8f0442dc000001
command="/usr/bin/oo-trap-user",no-X11-forwarding ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAzwG1cSCnAQ7E7ufZ1+W6jhrc9yQWjteoub76/LuGH9kIMljqOZxZPzjDL3z/4TLzhxjv8zAbYrh0Rw/wUsURbe1qVzRSyAmD9LQwPkKzg2rsXMsIfjSiHWHzZ573jY6Q04cVTW3x/Wmf5x4OXIjz9CIYdWKsWGBrd690TdjT7tfbLzUfKnjYgTLPZgPYqKnM1w0Z6v/3JX6NpUmfcdHPAX857kZAdVUf7hZPBxMI2l3/rglyQsj0xYDElFqaAi6jbNSv89YIKfHIEEzjMpWIn3SomoBEEUsu96qtnjQ0MuHZdUTfwf/tUpBuJOXiN4M7eYeFL50PxEjyOsaRsWBAxQ== OPENSHIFT-13c0b466622911e3a8ed1231390e8c89-application-13c0b466622911e3a8ed1231390e8c89
command="/usr/bin/oo-trap-user",no-X11-forwarding ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDh43XWg7LERxupWx/Ym9hswfA6loRkpMi5JOfX5C49RW6M6JnpyF53u8/VFYYADN8YN+9wwUEPrUTi6LsZAheAfgw5nxc3VTqFaiHuwFP9oc8yPCgclm+vC0Fn3S2foAjHqO1+fRDYDztAciD0uBT+pWqMsMrqdTdpHvgc6U6tYdKRqKCwNjo4K2bueq3MUMOPdxJp1SvrVJK0I+BNhG4iSaGFt++2dYr2X389kQ+6MNYpJpOuqXC734wWm27CQLvYtWRr3QXfKSV3HnybdYZPc/ffV7xE6MLMkbznBnCgdM8b8AvB41xMz5OvnyGf8+tGDrtKN+qL2+pLBPgZbxoF OPENSHIFT-13c0b466622911e3a8ed1231390e8c89-52a7d7b96e8f046b24000001-default
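
The leftover key can be picked out by comparing the listing above with the gear list. The following is a rough sketch only: the helper name orphaned_key_owners is made up, the key material is omitted, and it assumes (based purely on the listing above) that application keys carry a comment of the form OPENSHIFT-<gear>-application-<owning gear id>.

```ruby
# Gear IDs reported by `rhc app-show php1s --gears` above.
GEAR_IDS = %w[
  52a7fc136e8f04b70b000026
  52a7fc416e8f042f03000002
  13c0b466622911e3a8ed1231390e8c89
].freeze

# Comment fields of the authorized_keys entries above (key material omitted).
KEY_COMMENTS = %w[
  OPENSHIFT-13c0b466622911e3a8ed1231390e8c89-application-52a7fc136e8f04b70b000026
  OPENSHIFT-13c0b466622911e3a8ed1231390e8c89-application-52a7fc416e8f042f03000002
  OPENSHIFT-13c0b466622911e3a8ed1231390e8c89-application-52a7fd086e8f0442dc000001
  OPENSHIFT-13c0b466622911e3a8ed1231390e8c89-application-13c0b466622911e3a8ed1231390e8c89
  OPENSHIFT-13c0b466622911e3a8ed1231390e8c89-52a7d7b96e8f046b24000001-default
].freeze

# Returns the gear IDs named by application keys that match no existing gear.
# Entries without an "-application-<hex id>" suffix (user keys) are skipped.
def orphaned_key_owners(comments, gear_ids)
  comments
    .map { |comment| comment[/-application-(\h+)\z/, 1] }
    .compact
    .reject { |id| gear_ids.include?(id) }
end

puts orphaned_key_owners(KEY_COMMENTS, GEAR_IDS)
# prints 52a7fd086e8f0442dc000001
```

The one ID reported belongs to no gear in the app-show output, which is consistent with it being the key of the gear that was rolled back.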

Comment 7 Abhishek Gupta 2013-12-12 18:21:59 UTC
Fixed with --> https://github.com/openshift/origin-server/pull/4328

Comment 8 Jianwei Hou 2013-12-13 08:53:26 UTC
The above PR isn't merged in devenv-stage_614, so I tested it by deploying the fix to my env.

1. Create a scalable application with a db cart
rhc create-app php1s php-5.3 mysql-5.1 -s
2. Induce an error in the configure method in /openshift-origin-node/model/v2_cart_model.rb
eg:
def configure(cartridge_name, template_git_url = nil, manifest = nil)
  return -1
  exit
....
3. Restart broker/mcollective, clear broker cache
4. Scale up this app: rhc cartridge-scale php-5.3 -a php1s --min 2
This step fails because of the induced error.
5. Query the application document from the mongo shell: there are 2 app SSH keys, and the new gear's SSH key has not been added
6. Check .ssh/authorized_keys: there is 1 user SSH key and 2 app SSH keys. The result is as expected.
7. Fix the error, and restart services, clear broker cache
8. Do any operation against this app, eg: restart
9. Query the application document from mongo shell again
10. Check the contents of .ssh/authorized_keys
11. oo-admin-chk -l 1

Result:
After steps 9, 10, and 11, the new gear's SSH key is removed from mongo and the node, and oo-admin-chk passes. This bug is fixed by the above pull request. Waiting for the merge before marking it verified.

Comment 9 Abhishek Gupta 2013-12-13 18:22:02 UTC
The proposed fix for this bug is undergoing review and changes.

Comment 10 Abhishek Gupta 2013-12-17 21:06:03 UTC
The fix is now merged into master

Comment 11 Jianwei Hou 2013-12-18 06:40:42 UTC
Tested on devenv_4147 and this issue is reproduced.

I'm afraid this has not been merged into devenv_4147 or devenv-stage_619 yet. I compared the fix in https://github.com/openshift/origin-server/pull/4328 with the actual code on the instance, and they differ, so I am assigning this back.

Comment 12 Abhishek Gupta 2013-12-18 19:20:24 UTC
I have just verified that the fix is part of the latest devenv. Please verify.

Comment 13 Jianwei Hou 2013-12-19 03:25:38 UTC
Verified on devenv_4154

1. Create a scalable application with a db cart
rhc create-app php1s php-5.3 mysql-5.1 -s
2. Induce an error in the configure method in /openshift-origin-node/model/v2_cart_model.rb
eg:
def configure(cartridge_name, template_git_url = nil, manifest = nil)
  return -1
  exit
....
3. Restart broker/mcollective, clear broker cache
4. Scale up this app: rhc cartridge-scale php-5.3 -a php1s --min 2
This step fails because of the induced error.
5. Query the application document from the mongo shell: there are 2 app SSH keys, and the new gear's SSH key has not been added
domU-12-31-39-07-BA-57(mongod-2.4.6)[PRIMARY] openshift_broker_dev> db.applications.findOne({},{app_ssh_keys:1})
{
  "_id": ObjectId("52b264a6e551c7b047000023"),
  "app_ssh_keys": [
    {
      "_id": ObjectId("52b264aae551c7b04700003d"),
      "_type": "ApplicationSshKey",
      "component_id": ObjectId("52b264a6e551c7b047000023"),
      "content": "AAAAB3NzaC1yc2EAAAABIwAAAQEAwpAtQ5eshd8Ae1zykLtz+RwuI1BiXoWRKTX/qwdpQad3XOACnfWez+F6U3qgIxE5ed5MZPo4X6HzPR20IkwxUvG6Ag8ERV8I/dui3r4XH5Zv9LmrFuv8Q6OroRMQ95pZMMbAk0pV/zhTbRGKAWoCb36ECYxmfTZudgKfmxZAkJKRfU+M6bC/nbMH7v5Ca2OmpRiOFbh4snKP2ZLZB0quGHb+EF+rlWZIcYph96fr2KkR2WHUwbxLCW+UwhjAFlnUhJIUai5gj1/dzrqHOq59n64Sy1xEY6IkNCOD2LYahvZt6Euw9HZAbDSD7mtUQQBw6RlXrahHfBODWkAYv8XURw==",
      "created_at": ISODate("2013-12-19T03:14:50.708Z"),
      "name": "application-52b264a6e551c7b047000023",
      "type": "ssh-rsa"
    },
    {
      "_id": ObjectId("52b264d4e551c7b047000058"),
      "_type": "ApplicationSshKey",
      "component_id": ObjectId("52b264cfe551c7b047000045"),
      "content": "AAAAB3NzaC1yc2EAAAABIwAAAQEA1qGw2Cf8sisacsV0+TWqMVUlVULrJTEai7/3qZtctReVeR9lJ63Ye7MXnN/AlnNHtrFEflzAgShHcfcECP3CleeX437WLtdt9GK3CM0UaeFT7d3YEJL1h7WGX4ggEQP12BjBTcVwDF3P1EvGE8xSZlSVg+wydPzMRn3XNl4ROeoPhd2/4r41W1cdFsh5erlgDl40V8OU3LFxVkqc+IIByRhere36YxAvGhAITdRdC72P1jX3TxqKJLuRjmZXRQGfy+Z9sx0kw3Hlhswt9tRNyq9iOjYOOSMB69GGRjFVSt+VRqksKu61Ntr6LP8jjy8P+5fICz7ghjLQdig97/XB8w==",
      "created_at": ISODate("2013-12-19T03:15:32.092Z"),
      "name": "application-52b264cfe551c75766000004",
      "type": "ssh-rsa"
    }
  ]
}

6. Check .ssh/authorized_keys: there is 1 user SSH key and 2 app SSH keys. The result is as expected.
[root@domU-12-31-39-07-BA-57 ~]# cat /var/lib/openshift/php1s-jhou/.ssh/authorized_keys 
command="/usr/bin/oo-trap-user",no-X11-forwarding ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAwpAtQ5eshd8Ae1zykLtz+RwuI1BiXoWRKTX/qwdpQad3XOACnfWez+F6U3qgIxE5ed5MZPo4X6HzPR20IkwxUvG6Ag8ERV8I/dui3r4XH5Zv9LmrFuv8Q6OroRMQ95pZMMbAk0pV/zhTbRGKAWoCb36ECYxmfTZudgKfmxZAkJKRfU+M6bC/nbMH7v5Ca2OmpRiOFbh4snKP2ZLZB0quGHb+EF+rlWZIcYph96fr2KkR2WHUwbxLCW+UwhjAFlnUhJIUai5gj1/dzrqHOq59n64Sy1xEY6IkNCOD2LYahvZt6Euw9HZAbDSD7mtUQQBw6RlXrahHfBODWkAYv8XURw== OPENSHIFT-52b264a6e551c7b047000023-application-52b264a6e551c7b047000023
command="/usr/bin/oo-trap-user",no-X11-forwarding ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDh43XWg7LERxupWx/Ym9hswfA6loRkpMi5JOfX5C49RW6M6JnpyF53u8/VFYYADN8YN+9wwUEPrUTi6LsZAheAfgw5nxc3VTqFaiHuwFP9oc8yPCgclm+vC0Fn3S2foAjHqO1+fRDYDztAciD0uBT+pWqMsMrqdTdpHvgc6U6tYdKRqKCwNjo4K2bueq3MUMOPdxJp1SvrVJK0I+BNhG4iSaGFt++2dYr2X389kQ+6MNYpJpOuqXC734wWm27CQLvYtWRr3QXfKSV3HnybdYZPc/ffV7xE6MLMkbznBnCgdM8b8AvB41xMz5OvnyGf8+tGDrtKN+qL2+pLBPgZbxoF OPENSHIFT-52b264a6e551c7b047000023-52b25443e551c776b5000001-default
command="/usr/bin/oo-trap-user",no-X11-forwarding ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA1qGw2Cf8sisacsV0+TWqMVUlVULrJTEai7/3qZtctReVeR9lJ63Ye7MXnN/AlnNHtrFEflzAgShHcfcECP3CleeX437WLtdt9GK3CM0UaeFT7d3YEJL1h7WGX4ggEQP12BjBTcVwDF3P1EvGE8xSZlSVg+wydPzMRn3XNl4ROeoPhd2/4r41W1cdFsh5erlgDl40V8OU3LFxVkqc+IIByRhere36YxAvGhAITdRdC72P1jX3TxqKJLuRjmZXRQGfy+Z9sx0kw3Hlhswt9tRNyq9iOjYOOSMB69GGRjFVSt+VRqksKu61Ntr6LP8jjy8P+5fICz7ghjLQdig97/XB8w== OPENSHIFT-52b264a6e551c7b047000023-application-52b264cfe551c75766000004

7. Fix the error, and restart services, clear broker cache
8. Do any operation against this app, eg: restart
9. Query the application document from mongo shell again
10. Check the contents of .ssh/authorized_keys
11. oo-admin-chk -l 1
[root@domU-12-31-39-07-BA-57 ~]# oo-admin-chk -l 1
Started at: 2013-12-19 03:20:20 UTC

Total gears found in mongo: 2
Total gears found on the nodes: 2
Total nodes that responded: 1

Finished at: 2013-12-19 03:21:01 UTC
Total time: 40.854s
SUCCESS


This bug is verified.
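
The comparison made in steps 5 and 6 can also be written down as a small check. This is only a sketch: key_mismatch is a made-up helper, the key material is omitted, and the match on a trailing "application-<hex id>" assumes the comment format seen in the authorized_keys output above.

```ruby
require "set"

# "name" fields from the app_ssh_keys array in the mongo output of step 5.
MONGO_KEY_NAMES = %w[
  application-52b264a6e551c7b047000023
  application-52b264cfe551c75766000004
].freeze

# Comment fields of the authorized_keys entries from step 6.
NODE_KEY_COMMENTS = %w[
  OPENSHIFT-52b264a6e551c7b047000023-application-52b264a6e551c7b047000023
  OPENSHIFT-52b264a6e551c7b047000023-52b25443e551c776b5000001-default
  OPENSHIFT-52b264a6e551c7b047000023-application-52b264cfe551c75766000004
].freeze

# Reports application key names present on only one side.
# User keys (no "application-<hex id>" suffix) are ignored.
def key_mismatch(mongo_names, node_comments)
  node_names = node_comments.map { |c| c[/application-\h+\z/] }.compact.to_set
  mongo = mongo_names.to_set
  {
    only_in_mongo: (mongo - node_names).to_a,
    only_on_node: (node_names - mongo).to_a
  }
end

p key_mismatch(MONGO_KEY_NAMES, NODE_KEY_COMMENTS)
```

For the data from steps 5 and 6, both lists come back empty, meaning mongo and the node agree on the set of application keys.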