Bug 972514

Summary: Application under load, scaling up fails to return gears
Product: OpenShift Online
Reporter: Matt Hicks <mhicks>
Component: Pod
Assignee: Rajat Chopra <rchopra>
Status: CLOSED CURRENTRELEASE
QA Contact: libra bugs <libra-bugs>
Severity: medium
Priority: medium
Version: 1.x
CC: jhou, rmillner, xtian
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Last Closed: 2013-06-24 14:52:43 UTC
Type: Bug

Description Matt Hicks 2013-06-09 19:50:42 UTC
Description of problem:
When I tried to run 'rhc app-show scaledemo --gears' during a GEAR_UP event, the call failed and returned:

"Failed to get gear groups for application scaledemo due to: undefined method `get_show_state_job' for nil:NilClass"

Version-Release number of selected component (if applicable):
In OpenShift Online production as of 6/9/2013

How reproducible:
Only saw it once in production, but during the scale-up I was able to get the error on each call. About 5 minutes later, the command returned the results successfully.


Additional info:

At this point, the application itself was under load as well - around 200-400 hits per second.  The application was: https://github.com/matthicksj/scaling-demo.

Comment 1 Rob Millner 2013-06-13 01:56:24 UTC
Was not able to reproduce on devenv with ab using a concurrency of 400, and apps allowed to scale to 3 gears and to 8 gears.

I did observe that app-show will pause while the app is scaling up or down and only return once it completes.  Wonder if there was some kind of timeout hit that caused a nil to be returned instead.

Comment 2 Rob Millner 2013-06-14 23:31:52 UTC
Still unable to reproduce, but it's definitely a broker bug.

Rajat, is there any way that gear.get_proxy can temporarily show up as nil during a lot of simultaneous operations?

controller/app/models/gear.rb,  line 208

Comment 3 Rajat Chopra 2013-06-14 23:44:27 UTC
Quite possible that the gear was in the middle of being created. We have two steps: init the gear (in the broker) and then create the gear (on the node). In between these two steps we still save the data in mongo, so any GET operation on the application (through another REST call) will result in this kind of error. Basically, we have not found a home for the gear yet, and the GET call on the application is trying to report the current gears and their states.

We had another similar issue reported when a GET during app creation resulted in a broken skeleton of the app being returned (basically there was no cartridge in it). Bug - https://bugzilla.redhat.com/show_bug.cgi?id=973718

We have two options: either block half-baked data, or report back saying the app is being modified. Will try to do as much of the first option as possible.

Comment 4 Rajat Chopra 2013-06-18 19:20:21 UTC
Fixed this particular case. Half created gears will show the state as 'unknown'.
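
The behaviour described above can be illustrated with a minimal sketch. This is not the actual origin-server change. Only the names get_proxy and get_show_state_job come from this report; the classes, the server_identity field, and the other method names are hypothetical stand-ins, and the sketch assumes a gear has no proxy until it has been placed on a node.

```ruby
# Hypothetical stand-ins for illustration only; not the origin-server code.
class FakeProxy
  # Stands in for the node proxy that builds a "show state" job.
  def get_show_state_job(gear)
    "show-state job for gear #{gear.uuid}"
  end
end

class SketchGear
  attr_reader :uuid, :server_identity

  def initialize(uuid, server_identity = nil)
    @uuid = uuid
    @server_identity = server_identity # assumed nil until a node is chosen
  end

  # Returns nil while the gear exists in the broker but has no node yet.
  def get_proxy
    server_identity.nil? ? nil : FakeProxy.new
  end

  # Unguarded call: raises NoMethodError on a half-created gear.
  def state_job_unguarded
    get_proxy.get_show_state_job(self)
  end

  # Guarded call: reports 'unknown' instead of raising.
  def state_or_unknown
    proxy = get_proxy
    return 'unknown' if proxy.nil?
    proxy.get_show_state_job(self)
  end
end

half_created = SketchGear.new('273704157913335979835392')
placed = SketchGear.new('da5c6128d8d711e2978122000a8a200e', 'node1.example.com')

begin
  half_created.state_job_unguarded
rescue NoMethodError => e
  puts "unguarded: #{e.class}"
end
puts "guarded: #{half_created.state_or_unknown}"
puts "placed: #{placed.state_or_unknown}"
```

The guard corresponds to the first option from comment 3: the half-created gear is still listed, but with a placeholder state rather than an exception.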

Comment 5 Jianwei Hou 2013-06-19 03:29:03 UTC
I was able to reproduce this problem on devenv_3382

Steps:
1. Create an app using https://github.com/matthicksj/scaling-demo template
2. Auto scale up this application with ab
ab -n 10000000 -c 50 http://YOUR_APP/rest/add
3. On client side, show application status
rhc app-show scaleapp --gears
Saw the error when the app scaled to the 3rd gear

$ rhc app show scaleapp --gears
ID                               State   Cartridges               Size  SSH URL
-------------------------------- ------- ------------------------ ----- ----------------------------------------------------------------------
51c11d4c22d09623f7000001         started mongodb-2.2              small 51c11d4c22d09623f7000001.rhcloud.com
8037eaecd88b11e2875222000a8a2291 started jbosseap-6.0 haproxy-1.4 small 8037eaecd88b11e2875222000a8a2291.rhcloud.com
51c1214722d09623f7000012         started jbosseap-6.0 haproxy-1.4 small 51c1214722d09623f7000012.rhcloud.com
602700920594833994678272         started jbosseap-6.0 haproxy-1.4 small 602700920594833994678272.rhcloud.com

$ rhc app show scaleapp --gears
Failed to get gear groups for application scaleapp due to: undefined method `get_show_state_job' for nil:NilClass

Waiting for the fix (https://github.com/rajatchopra/origin-server/commit/712594a0d4c93233bb8646829acd0f38e5cf326b) to merge before verifying.

Comment 6 Jianwei Hou 2013-06-19 13:15:19 UTC
Verified on devenv_3384, using the same steps as in comment 5.

The gear was first shown as 'unknown', then 'new', and finally 'started'.

$ rhc app-show scaleapp --gears
ID                            State   Cartridges               Size  SSH URL
----------------------------- ------- ------------------------ ----- ------------------------------------------------------------------------------
da9dd4aad8d711e2978122000a8a200e started mongodb-2.2              small da9dd4aad8d711e2978122000a8a200e.rhcloud.com
da5c6128d8d711e2978122000a8a200e started jbosseap-6.0 haproxy-1.4 small da5c6128d8d711e2978122000a8a200e.rhcloud.com
273704157913335979835392      unknown jbosseap-6.0 haproxy-1.4 small 273704157913335979835392.rhcloud.com

.................................
hjw@hjw-sixmachine migrate$ rhc app-show scaleapp --gears
ID                            State   Cartridges               Size  SSH URL
----------------------------- ------- ------------------------ ----- ------------------------------------------------------------------------------
da9dd4aad8d711e2978122000a8a200e started mongodb-2.2              small da9dd4aad8d711e2978122000a8a200e.rhcloud.com
da5c6128d8d711e2978122000a8a200e started jbosseap-6.0 haproxy-1.4 small da5c6128d8d711e2978122000a8a200e.rhcloud.com
273704157913335979835392      started jbosseap-6.0 haproxy-1.4 small 273704157913335979835392.rhcloud.com
841564996773701578915840      started jbosseap-6.0 haproxy-1.4 small 841564996773701578915840.rhcloud.com
bdcadb3ad8e011e28e9122000a8a200e new     jbosseap-6.0 haproxy-1.4 small bdcadb3ad8e011e28e9122000a8a200e.rhcloud.com