Red Hat Bugzilla – Bug 820457
Deleting a scaling app left gears behind
Last modified: 2015-05-14 21:53:28 EDT
Description of problem:
While exploring another bug, a scalable application was created through the REST API and scaled up 30 times. It was then deleted through the REST API.
Deletion failed and reported:
<text>Failed to delete application pscale due to:Application gears already at zero for 'rmillner'</text>
According to "rhc domain show", my application is still alive. The primary gear is gone but one of the scale gears remains:
# ls -l /var/lib/stickshift/
drwxr-x---. 7 root e8b2ef55168440ef864a7334853e6338 4096 May 9 20:21 e8b2ef55168440ef864a7334853e6338
lrwxrwxrwx. 1 root root 52 May 9 20:21 e8b2ef5516-rmillner0140 -> /var/lib/stickshift/e8b2ef55168440ef864a7334853e6338
Neither broker nor mcollective logs show any attempt to deconfigure on the gear.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Upgrade your account to 100 gears on a dev instance.
2. Create a scalable php application.
3. Issue 30 scale-up events.
4. Delete the application.
Actual results:
Deletion fails and gears remain. The application can no longer be deleted.
Expected results:
Nice clean deletion.
Severity set as medium but I'm going to tag this as a future feature since we don't expect to offer high scaling apps this (or even next?) sprint.
This behaviour does not reproduce when the REST API is accessed directly via curl; it occurs only when the rhc tools are used.
Created attachment 585078
Run this on your dev instance to reproduce the problem
Was able to reproduce this behaviour via the REST API again.
A scalable jboss app was created and scaled up 30 times. After deletion, two gears remained.
Setting severity low since this is well outside our offered scaling.
Was able to show that both lingering gears in one instance of this test were created back to back, and that both creations failed with the same error in the broker:
[REQ_ID=2ef6a5210b80474e990b0c14ee582dc6] ACTION=SCALE_UP_APPLICATION Application event 'scale-up' failed: Query condition failed to update application 'phtest' for 'firstname.lastname@example.org'
Completed 422 Unprocessable Entity in 9986ms (Views: 2.6ms)
Neither gear is being destroyed afterwards.
The error is raised at line 389 of StickShift::MongoDataStore.put_app, where find_and_modify returns nil.
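A minimal sketch of the failure mode, not the StickShift code: a conditional update that returns nothing when its query condition no longer matches, and a caller that destroys the gear it just created instead of leaving it behind. Every name here (InMemoryStore, add_gear_if_version, scale_up, the version field) is hypothetical and exists only for illustration.

```python
class QueryConditionFailed(Exception):
    """Raised when the conditional update matched no document."""


class InMemoryStore:
    """Hypothetical stand-in for a document store with a conditional update."""

    def __init__(self):
        self.apps = {}

    def add_gear_if_version(self, name, expected_version, gear_id):
        # Mimics a find-and-modify call: returns None when the query
        # condition (here, the expected version) does not match.
        doc = self.apps.get(name)
        if doc is None or doc["version"] != expected_version:
            return None
        doc["gears"].append(gear_id)
        doc["version"] += 1
        return doc


def scale_up(store, node_gears, app_name, seen_version, gear_id):
    # The gear is created on the node before the app record is updated.
    node_gears.add(gear_id)
    updated = store.add_gear_if_version(app_name, seen_version, gear_id)
    if updated is None:
        # Roll back: destroy the gear so it is not orphaned on the node.
        node_gears.discard(gear_id)
        raise QueryConditionFailed(
            "Query condition failed to update application '%s'" % app_name
        )
    return updated
```

In this sketch a stale seen_version makes the update return None, and the rollback keeps the node and the app record consistent.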
The problem seems to have changed in this sprint: running the test in a loop hit a temporary DNS failure 9 create/destroy cycles in.
We should probably handle any exceptions from DYN by finishing gear deletion and then trying again.
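One way to read that suggestion, sketched under assumptions: destroy every gear regardless of DNS errors, remember which DNS deregistrations failed, and retry only those afterwards. The names DnsError, delete_application, destroy_gear and deregister_dns are hypothetical; this is not the broker's actual API, and "try again" is interpreted here as retrying the DNS step.

```python
class DnsError(Exception):
    """Hypothetical error raised when the DNS provider call fails."""


def delete_application(gears, destroy_gear, deregister_dns, retries=3):
    """Destroy all gears, then retry any DNS deregistrations that failed.

    Returns the list of gears whose DNS entries could not be removed.
    """
    pending_dns = []
    for gear in gears:
        # Gear destruction is never skipped because of a DNS failure.
        destroy_gear(gear)
        try:
            deregister_dns(gear)
        except DnsError:
            pending_dns.append(gear)

    for _ in range(retries):
        if not pending_dns:
            break
        still_failing = []
        for gear in pending_dns:
            try:
                deregister_dns(gear)
            except DnsError:
                still_failing.append(gear)
        pending_dns = still_failing

    return pending_dns
```

A non-empty return value means the gears are gone but some DNS entries still need cleanup, which the caller can report or queue rather than aborting the deletion halfway.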
From the delete failure:
<text>Failed to delete application rlmtmp14508 due to:Error communicating with DNS system. If the problem persists please contact Red Hat support.</text>
Fragments of gears were left behind:
# ls -l /var/lib/stickshift/
drwxr-x---. 10 root 05642e020d1148559ff3d923fb061c57 4096 Jun 21 23:26 05642e020d1148559ff3d923fb061c57
lrwxrwxrwx. 1 root root 52 Jun 21 23:26 05642e020d-rlmtmp14508 -> /var/lib/stickshift/05642e020d1148559ff3d923fb061c57
drwxr-x---. 5 root 07bdd02189434c0884a1f99f7fa31a71 4096 Jun 21 23:30 07bdd02189434c0884a1f99f7fa31a71
drwxr-x---. 5 root 1abbbe24f3b1480b968c24385a8939f2 4096 Jun 21 23:29 1abbbe24f3b1480b968c24385a8939f2
drwxr-x---. 5 root 2c35596b893a4fdf94096118925587c5 4096 Jun 21 23:15 2c35596b893a4fdf94096118925587c5
drwxr-x---. 5 root 3321f48dcaed44c6bf742115edce799b 4096 Jun 21 23:20 3321f48dcaed44c6bf742115edce799b
drwxr-x---. 5 root a9841cfb626b42ef87e963b04ea037cf 4096 Jun 21 23:12 a9841cfb626b42ef87e963b04ea037cf
drwxr-x---. 5 root b3c942e6d7c044e19571336fc593b525 4096 Jun 21 23:16 b3c942e6d7c044e19571336fc593b525
drwxr-x---. 5 root b986e06e495849bb98c292c8d1df5674 4096 Jun 21 23:32 b986e06e495849bb98c292c8d1df5674
drwxr-x---. 5 root bfc18ce087dd486b89c3012add46ade2 4096 Jun 21 23:31 bfc18ce087dd486b89c3012add46ade2
# ls -l /var/lib/stickshift/bfc18ce087dd486b89c3012add46ade2
drwxr-xr-x. 4 root bfc18ce087dd486b89c3012add46ade2 4096 Jun 21 23:31 app-root
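In the listing above, only the first gear directory has a symlink pointing at it; the fragments do not. A rough sketch for spotting such directories follows. It assumes that gear directories have 32-character hex names and that a missing symlink marks a fragment. That heuristic is drawn only from this one listing and will not catch every leftover gear. The function name gears_without_symlink is hypothetical.

```python
import os
import re

# Assumption: gear directories are named with a 32-character hex UUID.
GEAR_DIR = re.compile(r"^[0-9a-f]{32}$")


def gears_without_symlink(base):
    """Return gear directory names under base that no symlink in base targets."""
    entries = os.listdir(base)

    # Collect the resolved targets of every symlink in the directory.
    linked = set()
    for name in entries:
        path = os.path.join(base, name)
        if os.path.islink(path):
            linked.add(os.path.realpath(path))

    fragments = []
    for name in sorted(entries):
        path = os.path.join(base, name)
        if os.path.islink(path) or not os.path.isdir(path):
            continue
        if GEAR_DIR.match(name) and os.path.realpath(path) not in linked:
            fragments.append(name)
    return fragments
```

Calling gears_without_symlink("/var/lib/stickshift") on a node would list candidates for manual inspection; it does not delete anything.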
After more than 100 experiments, the only source of failure I ran into was Bug 834663.
*** This bug has been marked as a duplicate of bug 834663 ***