Bug 1257757

Summary: Scaled application takes 4+mins to unidle
Product: OpenShift Container Platform
Reporter: Ryan Howe <rhowe>
Component: Containers
Assignee: Timothy Williams <tiwillia>
Status: CLOSED ERRATA
QA Contact: Anping Li <anli>
Severity: high
Docs Contact:
Priority: high
Version: 2.2.0
CC: adellape, anli, aos-bugs, erich, jokerman, mmccomas, nicholas_schuetz, tiwillia
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openshift-origin-cartridge-haproxy-1.30.1.1-1.el6op
Doc Type: Bug Fix
Doc Text:
When a scaled application is unidled, HAProxy is started first. Previously, HAProxy then made a blocking `curl` request to every gear in its configuration to unidle it, and only after HAProxy finished did the remaining gears receive a 'start' from the broker. The result was a loop: HAProxy attempted to unidle all gears while the broker was already handling the unidling process, which started another unidling process for each gear and could cause delays and timeouts. This bug fix removes the HAProxy logic that attempted to unidle every gear in the application, because the broker already handles this operation. As a result, HAProxy defers unidling to the broker, and unidling a scaled application takes much less time.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-09-30 16:38:40 UTC
Type: Bug

Description Ryan Howe 2015-08-27 21:41:44 UTC
Description of problem:
A scaled application takes 4+ minutes to unidle after being stopped.

Version-Release number of selected component (if applicable): v2.2


How reproducible: 100%


Steps to Reproduce:
1. rhc app create nodejs -a idle -s
2. oo-app-info -a idle
3. ssh node
4. oo-admin-ctl-gears idlegear 55de3ba95a00089d70000641
5. oo-admin-ctl-gears unidlegear 55de3ba95a00089d70000641

Actual results:

The gear takes 4-5 minutes to move from stopped to running.

Expected results:


Comments:
The unidle takes exactly 4 minutes in tests.

- Added echo statements to haproxy/bin/control; there is a 4-minute gap when haproxy/bin/control start is called:
https://github.com/openshift/origin-server/blob/master/cartridges/openshift-origin-cartridge-haproxy/bin/control#L29-L34


function ping_server_gears() {
    #  Ping the server gears and wake 'em up on startup.
    echo "($(date)) - ping server gears" | tee -a $support_logs
    for geardns in $(web_gears | cut -f 3 -d ','); do
        echo "($(date)) - function ping_server_gears" | tee -a $support_logs
        [ -z "$geardns" ]  ||  curl "http://$geardns/" > /dev/null 2>&1  ||  :
        echo "($(date)) - pinging gears done" | tee -a $support_logs
    done
}
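
In the loop above, each curl call runs to completion before the next one starts, and no timeout is set on the request. The shipped fix removed this logic from HAProxy altogether. Purely to illustrate the blocking behaviour, here is a minimal sketch of a bounded, non-blocking variant. The web_gears stub, the ping_gear helper, the example hostnames, and the 5-second limit are all assumptions made for illustration and are not part of the cartridge.

```shell
#!/bin/bash
# Hypothetical stand-in for the cartridge's web_gears helper. The
# comma-separated layout with the DNS name in field 3 is inferred from the
# `cut -f 3 -d ','` call in the original function.
web_gears() {
    printf '%s\n' 'gear1,uuid1,gear1-dom.example.com' \
                  'gear2,uuid2,gear2-dom.example.com'
}

# Hypothetical helper: one bounded request per gear.
# --max-time caps the whole transfer so a slow gear cannot stall the caller.
ping_gear() {
    curl --silent --output /dev/null --max-time 5 "http://$1/"
}

# Sketch only: start every ping in the background, then wait for all of them,
# so the total delay is roughly one timeout instead of one per gear.
ping_server_gears_bounded() {
    local geardns
    for geardns in $(web_gears | cut -f 3 -d ','); do
        [ -z "$geardns" ] && continue
        ping_gear "$geardns" &
    done
    wait
}
```

Calling ping_server_gears_bounded would issue the requests concurrently. This is not what the errata does; it only shows why an unbounded serial loop can hold up the start action.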

Comment 14 Anping Li 2015-09-18 07:05:16 UTC
The fix wasn't included in this puddle. 

The bug can be reproduced as follows:
[root@broker ~]# time oo-admin-ctl-gears unidlegear anlidom-idle-1
Unidling gear anlidom-idle-1 ... [ OK ]

real	3m58.683s
user	0m0.749s
sys	0m0.196s

Comment 16 Anping Li 2015-09-22 01:09:14 UTC
Verified and passed. unidlegear now takes much less time.

[root@node2 ~]# time oo-admin-ctl-gears unidlegear  anlidom-sphp-1
Unidling gear anlidom-sphp-1 ... [ OK ]

real	0m2.918s
user	0m0.732s
sys	0m0.190s
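
To compare the "real" figures from the two runs, the values can be converted to plain seconds. The following is a small sketch; the real_to_seconds helper is a hypothetical name, and it assumes the NmS.SSSs form that bash's time keyword prints by default. Note that the two measurements in this report were taken on different gears and hosts, so the comparison is only approximate.

```shell
#!/bin/bash
# Hypothetical helper: convert a bash `time` value such as 3m58.683s
# into seconds with millisecond precision.
real_to_seconds() {
    # Split on 'm' and 's': field 1 is minutes, field 2 is seconds.
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

real_to_seconds 3m58.683s   # prints 238.683 (before the fix, comment 14)
real_to_seconds 0m2.918s    # prints 2.918   (after the fix, comment 16)
```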

Comment 17 Timothy Williams 2015-09-23 21:15:53 UTC
*** Bug 1170040 has been marked as a duplicate of this bug. ***

Comment 20 errata-xmlrpc 2015-09-30 16:38:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1844.html