Bug 969165 - Deleting an application while the application creation is underway can leave gears behind on node
Summary: Deleting an application while the application creation is underway can leave gears behind on node
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Pod
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Abhishek Gupta
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-05-30 19:37 UTC by Abhishek Gupta
Modified: 2015-05-15 00:17 UTC

Fixed In Version: devenv_3295
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-06-11 04:14:22 UTC
Target Upstream Version:
Embargoed:



Description Abhishek Gupta 2013-05-30 19:37:25 UTC
Description of problem:
If you try to delete an application right after issuing the application create request, both the create and the delete operations go through without any errors, but the application gear can be left behind on the node.


Version-Release number of selected component (if applicable):


How reproducible:
This is a timing-related bug, though it is easy enough to reproduce manually.


Steps to Reproduce:
1. Use the CLI to create a non-scalable application with 2 or more cartridges.
2. Shortly after making the first call, make another CLI call to delete the application (see the sketch below).
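
A minimal reproduction sketch, assuming a configured rhc client; the cartridge list, the 2-second delay, and the exact rhc invocation are illustrative and may need tuning for your environment:

import subprocess
import time

# Kick off the create without waiting for it to finish.
create = subprocess.Popen(["rhc", "app-create", "app1", "php-5.3", "mysql-5.1"])

# Give the broker just enough time to accept the create request;
# tune this delay until the race triggers (e.g. once the gear home
# dir shows up on the node).
time.sleep(2)

# Fire the delete while the create is still in flight;
# --confirm skips rhc's interactive confirmation prompt.
subprocess.run(["rhc", "app-delete", "app1", "--confirm"])
create.wait()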


Actual results:
Both the create and delete operations complete without errors, but the application gear is left behind on the node.

Expected results:
The application should be deleted on the node as well.


Additional info:
This happens because, even though we use application-level locks to perform operations on the application, the set of components to remove is calculated at the time the application delete request is received, before the lock is acquired. So, if at the time the delete request came in there were either no components for the application in mongo, or just the web_framework component, then that is all that will be removed. Subsequently, after acquiring the lock, removing only the web_framework cartridge and not the others means the web_framework cartridge is simply removed from mongo and no calls are made to the node/gear.
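
For illustration, a minimal sketch of the ordering problem, in Python rather than the broker's actual code, with mongo reduced to an in-memory list: the removal set is snapshotted before the lock is taken, so a delete that arrives mid-create operates on a stale view. delete_app_fixed shows the assumed shape of the fix (recomputing the removal set only after the lock is held):

import threading
import time

app_lock = threading.Lock()
components = []  # stand-in for the application's component list in mongo

def create_app():
    # create holds the application lock while it adds components one by one
    with app_lock:
        for cart in ["web_framework", "mysql-5.1", "mongodb-2.2"]:
            components.append(cart)
            time.sleep(0.1)  # simulate per-cartridge configure work on the node

def delete_app_buggy():
    # BUG: the removal set is computed when the delete request arrives,
    # i.e. BEFORE the lock is acquired -- here it sees only web_framework
    to_remove = list(components)
    with app_lock:  # by the time the lock is held, create has added more
        for cart in to_remove:
            components.remove(cart)
    # anything create added while we waited is never removed -> orphaned gear

def delete_app_fixed():
    # assumed shape of the fix: recompute the removal set after the lock
    # is held, so it reflects everything the completed create added
    with app_lock:
        for cart in list(components):
            components.remove(cart)

creator = threading.Thread(target=create_app)
deleter = threading.Thread(target=delete_app_buggy)
creator.start()
time.sleep(0.05)  # let create append web_framework first
deleter.start()
creator.join()
deleter.join()
print("left behind on the node:", components)  # non-empty with the buggy delete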

Comment 1 Abhishek Gupta 2013-05-30 20:59:39 UTC
Fixed with --> https://github.com/openshift/origin-server/pull/2697

Comment 3 Abhishek Gupta 2013-05-31 01:36:54 UTC
Fix will be pushed to stage tomorrow.

Comment 4 Jianwei Hou 2013-05-31 10:05:45 UTC
Tested on devenv_3297; it looks like this can still be reproduced (reproduced twice out of 5 tries).

Steps:
1. rhc app-create app1 php-5.3 mysql-5.1 mongodb-2.2 phpmyadmin-3.4 -px
2. Open another tab, delete this app while it's being created
rhc app-delete app1
(send the delete request right after the gear home dir is seen on the node)
3. Client shows deletion was successful
4. Check node, the gear is still there
[root@ip-10-137-58-187 openshift]# ls
809073081673512109735936  app1-jhou  last_access.log

Comment 5 Abhishek Gupta 2013-05-31 16:50:33 UTC
I have been unable to reproduce this on devenv_3298.

When trying to reproduce this on the latest devenv, please truncate the broker development log before each attempt --> /var/log/openshift/broker/development.log

Then once you are able to reproduce it, please attach the development.log to this bug. This will help me debug the issue, if it still exists, and make sure that the log entries are specific to the failed app create/delete attempt.
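
A minimal sketch of that pre-test step, assuming it runs on the broker host with write permission on the log file:

# Truncate the broker log in place before each reproduction attempt, so any
# new entries belong only to that create/delete attempt.
LOG = "/var/log/openshift/broker/development.log"
with open(LOG, "w"):
    pass  # opening in "w" mode truncates the file to zero length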

Comment 6 Jianwei Hou 2013-06-03 12:14:48 UTC
Tried this again and couldn't reproduce it either. I have added a test case for this bug and will pay more attention in future tests.

Comment 7 Abhishek Gupta 2013-06-03 17:13:02 UTC
Marking this ON_QA for now so that another quick test can be run to verify that this issue is indeed fixed.

While testing, please follow the steps described in Comment 5 above and provide the logs if the issue is reproduced.

Comment 8 zhaozhanqi 2013-06-04 02:03:29 UTC
Tested this issue on devenv_3312 many times; it did not reproduce, so I am changing it to VERIFIED.

