Bug 1084035

Summary: Scalable app is scaled down automatically when scaling up with a large MIN setting

Product: OpenShift Container Platform
Component: Containers
Version: 2.1.0
Reporter: Nan Wei <nwei>
Assignee: Brenton Leanhardt <bleanhar>
QA Contact: libra bugs <libra-bugs>
CC: erich, gpei, jialiu, libra-onpremise-devel, lmeyer, nwei, xtian
Status: CLOSED WORKSFORME
Severity: high
Priority: medium
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Last Closed: 2016-04-18 11:16:19 UTC

Description Nan Wei 2014-04-03 13:21:24 UTC
Description of problem:
When scaling up a scalable app to a large number of gears and checking the app's gears while the scale-up is in progress, the newly created gears are deleted automatically until the gear count drops back to the minimum it had before the scale-up.
     
OpenShift Enterprise puddle: 2.1/2014-04-02.2

How reproducible:
100%

Steps to Reproduce:
1. Create a scalable app
#rhc app create pps php-5.3 -s --no-git
2. Scale up the app
#rhc cartridge scale php-5.3 -a pps --min 15
3. The above command takes a long time; meanwhile, open another terminal and use the following command to check the gears continuously (a small polling sketch is included after these steps).
#rhc app show -a pps --gears 
The gear list keeps changing: at the beginning the gear count increases, then after about 2 minutes it starts decreasing until only 1 gear is left.
4. Check haproxy_ctld.log during the scale-up: before the above command finishes, haproxy_ctld has already started removing gears.
5. After the above command finishes, check the gears' status:
# rhc cartridge scale php-5.3 -a pps --min 15
Using php-5.3 (PHP 5.3) for 'php'
This operation will run until the application is at the minimum scale and may take several minutes.
Setting scale range for php-5.3 ... 
An error occurred while communicating with the server. This problem may only be temporary. Check that you have correctly specified your OpenShift server
'https://localhost/broker/rest/application/533d5c4ffdae85d8a70008c2/cartridge/php-5.3'.

# rhc app show pps --gears
ID                       State   Cartridges          Size  SSH URL
------------------------ ------- ------------------- ----- -----------------------------------------------------------
533d46d5fdae85d8a70006f9 started haproxy-1.4 php-5.3 small 533d46d5fdae85d8a70006f9.com.cn

# rhc app show pps
pps @ http://ps-nweidomain.ose21-manual.com.cn/ (uuid: 533d46d5fdae85d8a70006f9)
---------------------------------------------------------------------------------
  Domain:     nweidomain
  Created:    4:32 AM
  Gears:      1 (defaults to small)
  Git URL:    ssh://533d46d5fdae85d8a70006f9.com.cn/~/git/pps.git/
  SSH:        533d46d5fdae85d8a70006f9.com.cn
  Deployment: auto (on git push)

  haproxy-1.4 (Web Load Balancer)
  -------------------------------
    Gears: Located with php-5.3

  php-5.3 (PHP 5.3)
  -----------------
    Scaling: x1 (minimum: 1, maximum: available) on small gears
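
For step 3, a minimal polling sketch (assumptions: the rhc client is already configured, the app is named pps as above, and the gear listing keeps the table layout shown in step 5, with each gear row starting with a 24-character hex ID):

#!/bin/bash
# Log the gear count of the app every 10 seconds while the scale-up runs,
# so the scale-up followed by the automatic scale-down can be observed over time.
APP=pps
while true; do
    COUNT=$(rhc app show -a "$APP" --gears | grep -cE '^[0-9a-f]{24}\b')
    echo "$(date '+%H:%M:%S') gears=$COUNT"
    sleep 10
done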


Actual results:
After the scale-up, the newly created gears are deleted automatically.

Expected results:
The app scales up successfully.

Additional info:
When the min value is set to a low number (e.g. min=3 or 6), this issue does NOT happen.
This issue does NOT happen against the Online stage environment.

Comment 2 Luke Meyer 2014-04-03 16:03:47 UTC
If I had to guess, it might be that the update to the app's MIN value isn't committed until after the scale-up, and in the meantime haproxy sees no traffic and starts un-scaling itself?

I would expect that there should be an app lock to block that from happening, though. This will require some digging...
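
One way to check that guess while the scale-up is running is to follow haproxy_ctld's log on the head gear. This is only a sketch: it uses the SSH URL from the gear listing in the description, and since the report doesn't give the log's path, it locates the file at run time rather than assuming one:

# Follow the haproxy_ctld log on the head gear during the scale-up,
# to see whether it is the component issuing the gear removals.
ssh 533d46d5fdae85d8a70006f9.com.cn \
    'tail -f "$(find ~ -name haproxy_ctld.log 2>/dev/null | head -1)"'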

Comment 3 Johnny Liu 2014-04-04 02:17:05 UTC
(In reply to Luke Meyer from comment #2)
> If I had to guess, it might be that the update to the app's MIN value isn't
> committed until after the scale-up, and in the meantime haproxy sees no
> traffic and starts un-scaling itself?
+1

Comment 4 Luke Meyer 2014-04-04 15:53:42 UTC
Still happening as described, and not on an online devenv. Hard to think of why they would be different...

On the 2.1 devenv I'm seeing this in the gear haproxy log, and don't seem to see it on online:
[WARNING] 093/111220 (8135) : Server express/gear-533eca73037f758e96000029-demo is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 14 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 093/111220 (8135) : Server express/gear-533eca73037f758e9600002a-demo is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 13 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[... etc]

Maybe some config is keeping the scaled gears from being reachable?
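
A quick reachability check from the head gear could confirm that. This is only a sketch: it assumes the gear's haproxy.cfg lists backends as "server <name> <host>:<port> ..." lines (matching the gear names in the warnings above) and locates the config file at run time:

# On the head gear: extract the backend addresses from haproxy.cfg and try a
# plain TCP connection to each, to see whether the Layer4 "Connection refused"
# failures are reproducible outside of haproxy's own health checks.
CFG=$(find ~ -name haproxy.cfg 2>/dev/null | head -1)
grep -E '^[[:space:]]*server[[:space:]]' "$CFG" | awk '{print $3}' | while read addr; do
    host=${addr%:*}; port=${addr##*:}
    if timeout 3 bash -c "echo > /dev/tcp/$host/$port" 2>/dev/null; then
        echo "$addr reachable"
    else
        echo "$addr NOT reachable"
    fi
done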

One thing to note also is that the de-scaling only seems to happen after the http request by rhc times out (error 502).

Also interesting that this is the state in the 2.1 gear:
> less app-root/data/scale_limits.txt
scale_min=1
scale_max=-1
(online has scale_min=15, correctly)
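
It would also be worth comparing that gear-side file with the broker's view of the cartridge's scaling range. A sketch using the REST URL pattern from the error message in the description; the credentials and the scales_from/scales_to field names are assumptions, not taken from this report:

# Gear-side view (run on the head gear):
cat app-root/data/scale_limits.txt

# Broker-side view (run on the broker host); adjust user/password as needed.
curl -sk -u admin:password \
  'https://localhost/broker/rest/application/533d5c4ffdae85d8a70008c2/cartridge/php-5.3' \
  | python -mjson.tool | grep -E '"(scales_from|scales_to|current_scale)"'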

The openshift-watchman service wasn't started, but with it turned on, no difference was observed.

Maybe because online is docker-ized? Long shot...

Can't immediately find anything to indicate what is going wrong.

Comment 11 Johnny Liu 2016-04-18 11:16:19 UTC
Retested this bug with the 2.2/2016-03-29.1 puddle, and the issue no longer occurs, so closing this bug.