Bug 1084035 - Scalable app is scaled down automatically when scaling up with a large MIN setting
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 2.1.0
Hardware: Unspecified
OS: Unspecified
Importance: medium high
Target Milestone: ---
Assignee: Brenton Leanhardt
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-04-03 13:21 UTC by Nan Wei
Modified: 2018-12-06 16:14 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-18 11:16:19 UTC
Target Upstream Version:
Embargoed:



Description Nan Wei 2014-04-03 13:21:24 UTC
Description of problem:
When scaling up a scalable app to a large number of gears and checking the app's gears: before the scale-up finishes, the newly created gears are deleted automatically until the gear count is back at the minimum that was in effect before the scale-up.
     
OpenShift Enterprise puddle: 2.1/2014-04-02.2

How reproducible:
100%

Steps to Reproduce:
1. Create a scalable app
#rhc app create pps php-5.3 -s --no-git
2. Scale up the app
#rhc cartridge scale php-5.3 -a pps --min 15
3. The above command will take a long time. Meanwhile, open another terminal and use the following command to check the gears continuously (a small polling sketch follows the step 5 output below).
#rhc app show -a pps --gears 
The gear list keeps changing: at the beginning the gear count increases, but after about 2 minutes it starts decreasing until only 1 gear remains.
4. Check haproxy_ctld.log during the scale-up. Before the above command finishes, haproxy_ctld has already started removing gears.
5. After the above command finishes, check the gears' status:
# rhc cartridge scale php-5.3 -a pps --min 15
Using php-5.3 (PHP 5.3) for 'php'
This operation will run until the application is at the minimum scale and may take several minutes.
Setting scale range for php-5.3 ... 
An error occurred while communicating with the server. This problem may only be temporary. Check that you have correctly specified your OpenShift server
'https://localhost/broker/rest/application/533d5c4ffdae85d8a70008c2/cartridge/php-5.3'.

# rhc app show pps --gears
ID                       State   Cartridges          Size  SSH URL
------------------------ ------- ------------------- ----- -----------------------------------------------------------
533d46d5fdae85d8a70006f9 started haproxy-1.4 php-5.3 small 533d46d5fdae85d8a70006f9.com.cn

# rhc app show pps
pps @ http://ps-nweidomain.ose21-manual.com.cn/ (uuid: 533d46d5fdae85d8a70006f9)
---------------------------------------------------------------------------------
  Domain:     nweidomain
  Created:    4:32 AM
  Gears:      1 (defaults to small)
  Git URL:    ssh://533d46d5fdae85d8a70006f9.com.cn/~/git/pps.git/
  SSH:        533d46d5fdae85d8a70006f9.com.cn
  Deployment: auto (on git push)

  haproxy-1.4 (Web Load Balancer)
  -------------------------------
    Gears: Located with php-5.3

  php-5.3 (PHP 5.3)
  -----------------
    Scaling: x1 (minimum: 1, maximum: available) on small gears
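
A minimal polling sketch for step 3 (assumes the app name pps from the steps above; the 30-second interval and the 24-hex-character gear ID pattern are just illustrative). It prints a timestamped gear count so the rise and fall of the gear number is easy to see:

while true; do
  count=$(rhc app show -a pps --gears | grep -c '^[0-9a-f]\{24\} ')
  echo "$(date -u +%H:%M:%S) gears: $count"
  sleep 30
done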


Actual results:
After the scale-up, the newly created gears are deleted automatically.

Expected results:
The app scales up successfully.

Additional info:
When the min value is set to a low number (e.g. min=3 or 6), this issue does NOT happen.
This issue does NOT happen against the Online stage environment.

Comment 2 Luke Meyer 2014-04-03 16:03:47 UTC
If I had to guess, it might be that the update to the app's MIN value isn't committed until after the scale-up, and in the meantime haproxy sees no traffic and starts un-scaling itself?

I would expect that there should be an app lock to block that from happening, though. This will require some digging...
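
A rough bash sketch of that suspected failure mode (not the actual haproxy_ctld code; the scale_limits.txt path and the numbers below are illustrative assumptions):

# if the daemon still reads the old minimum while the broker is mid-scale-up,
# zero traffic makes every gear above that stale minimum look removable
min=$(awk -F= '/^scale_min/ {print $2}' app-root/data/scale_limits.txt)   # would read 1, not 15
current_gears=15        # gears the broker has already created
active_sessions=0       # no traffic during the test

if [ "$active_sessions" -eq 0 ] && [ "$current_gears" -gt "$min" ]; then
  echo "idle and above minimum ($min): auto-scaler would start removing gears"
fi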

Comment 3 Johnny Liu 2014-04-04 02:17:05 UTC
(In reply to Luke Meyer from comment #2)
> If I had to guess, it might be that the update to the app's MIN value isn't
> committed until after the scale-up, and in the meantime haproxy sees no
> traffic and start un-scaling itself?
+1

Comment 4 Luke Meyer 2014-04-04 15:53:42 UTC
Still happening as described, and not on an online devenv. Hard to think of why they would be different...

On the 2.1 devenv I'm seeing this in the gear haproxy log, which I don't seem to see on Online:
[WARNING] 093/111220 (8135) : Server express/gear-533eca73037f758e96000029-demo is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 14 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 093/111220 (8135) : Server express/gear-533eca73037f758e9600002a-demo is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 13 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[... etc]

Maybe some config is keeping the scaled gears from being reachable?
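
One hedged way to check that from the head gear is to probe the backend endpoints directly. The address:port pairs have to come from the server lines in the gear's haproxy.cfg; the values below are placeholders, not real endpoints:

ENDPOINTS="ADDR1:PORT1 ADDR2:PORT2"   # fill in from the haproxy.cfg server lines
for endpoint in $ENDPOINTS; do
  host=${endpoint%:*}; port=${endpoint#*:}
  if nc -z -w 3 "$host" "$port"; then
    echo "$endpoint reachable"
  else
    echo "$endpoint refused/timeout (matches the Layer4 warnings above)"
  fi
done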

One thing to note also is that the de-scaling only seems to happen after the http request by rhc times out (error 502).
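
A hedged way to line those up: pull the timestamps of removal activity out of haproxy_ctld.log and compare them with the time rhc reported the 502 (the log path below is an assumption; adjust it to wherever the haproxy cartridge writes haproxy_ctld.log on this gear):

# timestamps of gear-removal activity, newest last (path assumed)
grep -iE 'remov|scal' haproxy/logs/haproxy_ctld.log | tail -n 20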

Also interesting that this is the state in the 2.1 gear:
> less app-root/data/scale_limits.txt
scale_min=1
scale_max=-1
(online has scale_min=15, correctly)
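
A hedged check while the scale-up runs, to see whether scale_min ever changes from 1 to the requested 15 (HEAD_GEAR_SSH is a placeholder for the head gear's SSH URL from "rhc app show pps"; the 15-second interval is arbitrary):

HEAD_GEAR_SSH="UUID@HOST"   # placeholder for the head gear's SSH URL
while true; do
  ssh "$HEAD_GEAR_SSH" 'date -u +%H:%M:%S; cat app-root/data/scale_limits.txt'
  sleep 15
done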

The openshift-watchman service wasn't started, but with it turned on, no difference was observed.

Maybe because online is docker-ized? Long shot...

Can't immediately find anything to indicate what is going wrong.

Comment 11 Johnny Liu 2016-04-18 11:16:19 UTC
Retested this bug with the 2.2/2016-03-29.1 puddle, and the issue no longer occurs, so closing this bug.

