Bug 1006085 - haproxy_ctld keeps sending scale-up events in error conditions
Summary: haproxy_ctld keeps sending scale-up events in error conditions
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Dan McPherson
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-09-10 01:37 UTC by Rajat Chopra
Modified: 2015-05-14 23:28 UTC (History)
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-19 16:50:19 UTC
Target Upstream Version:
Embargoed:



Description Rajat Chopra 2013-09-10 01:37:10 UTC
Description of problem:
The auto-scaler daemon in the haproxy cartridge is designed to send scale-up events under heavy traffic. But when the broker returns an error, the daemon does not back off from sending further scale-up requests.
Not sure what the solution should be, but if the broker is stuck on the application's pending_op queue, we end up sending requests to the broker indefinitely at 10-second intervals.

Version-Release number of selected component (if applicable):


How reproducible:
Always, on a broken app with a lot of traffic.

Steps to Reproduce:
1. Create a scalable app and create a pending op that is broken (e.g. missing its type). Alternatively, cap the application's gears by lowering the user's max_gear limit.
2. Send traffic to the application.
3. Check scale_events.log; scale-up requests are sent relentlessly even though they keep failing.

Actual results:
haproxy_ctld keeps sending scale-up events every 10 seconds.

Expected results:
haproxy_ctld should detect when the broker is returning errors on scale-up requests and stop flooding the already beleaguered broker.


Additional info:

Comment 1 Dan McPherson 2013-09-10 22:49:58 UTC
https://github.com/openshift/origin-server/pull/3611

It now waits 10 minutes before scaling up or down if the last scale-up or scale-down request failed.
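A minimal sketch of that back-off behavior (this is an illustration, not the actual haproxy_ctld code from the pull request; the class and method names here are hypothetical):

```ruby
# Suppress further scale attempts for a cooldown window after a
# failed add-gear/remove-gear request, instead of retrying every
# 10-second check interval.
class ScaleThrottle
  FAIL_COOLDOWN = 600 # seconds; the fix described above uses 10 minutes

  def initialize
    @last_failure = nil
  end

  # Record that the last scale request failed.
  def record_failure(now = Time.now)
    @last_failure = now
  end

  # Record a success, clearing the hold-off.
  def record_success
    @last_failure = nil
  end

  # May we send another scale event right now?
  def allow_scale?(now = Time.now)
    @last_failure.nil? || (now - @last_failure) >= FAIL_COOLDOWN
  end
end
```

The daemon's periodic capacity check would then consult allow_scale? before firing a GEAR_UP event, which yields the roughly 10-minute retry spacing seen in comment 6.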

Comment 3 Meng Bo 2013-09-11 12:23:18 UTC
Checked on devenv_3771; it still sends the GEAR_UP event about every minute.

I, [2013-09-11T08:17:42.471038 #5516]  INFO -- : GEAR_UP - capacity: 109.375% gear_count: 2 sessions: 35 up_thresh: 90.0%
I, [2013-09-11T08:18:17.831972 #5516]  INFO -- : GEAR_UP - add-gear: exit: 0  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-11T08:18:22.918896 #5516]  INFO -- : GEAR_UP - capacity: 100.0% gear_count: 2 sessions: 32 up_thresh: 90.0%
I, [2013-09-11T08:18:38.093305 #5516]  INFO -- : GEAR_UP - add-gear: exit: 0  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-11T08:19:09.461168 #5516]  INFO -- : GEAR_UP - capacity: 106.25% gear_count: 2 sessions: 34 up_thresh: 90.0%
I, [2013-09-11T08:19:27.447006 #5516]  INFO -- : GEAR_UP - add-gear: exit: 0  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-11T08:20:09.131737 #5516]  INFO -- : GEAR_UP - capacity: 128.125% gear_count: 2 sessions: 41 up_thresh: 90.0%
I, [2013-09-11T08:20:26.942704 #5516]  INFO -- : GEAR_UP - add-gear: exit: 0  stdout: Already at the maximum number of gears allowed for either the app or your account.


Not sure why the exit code is 0 when add-gear failed; maybe that is the cause?
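The "exit: 0" on a failed add-gear in the log above suggests the child's exit status was not being surfaced. A minimal Ruby sketch (a hypothetical helper, not the cartridge's actual code) of capturing both stdout and the real exit status when shelling out:

```ruby
# Run a command, returning both its output and its exit status.
# If a caller only logs stdout and never checks $?.exitstatus
# (or the child itself exits 0 on error), a failure shows up in
# the log as "exit: 0" even though the request did not succeed.
def run_and_report(cmd)
  out = `#{cmd}`
  { stdout: out, exit: $?.exitstatus }
end
```

The fix verified in comment 6 shows the non-zero status ("exit: 1") propagating, which is what lets the back-off logic recognize the failure.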

Comment 4 Dan McPherson 2013-09-11 15:49:14 UTC
https://github.com/openshift/origin-server/pull/3618

Comment 6 Meng Bo 2013-09-12 07:57:15 UTC
Tested again on devenv_3776; the issue has been fixed.
The GEAR_UP event is now re-triggered only every 10 minutes.

I, [2013-09-12T02:58:45.285264 #15431]  INFO -- : Starting haproxy_ctld
I, [2013-09-12T02:58:45.327543 #15431]  INFO -- : GEAR_UP - capacity: 93.75% gear_count: 2 sessions: 30 up_thresh: 90.0%
I, [2013-09-12T02:59:01.987247 #15627]  INFO -- : Starting haproxy_ctld
I, [2013-09-12T03:00:10.709793 #15627]  INFO -- : GEAR_UP - capacity: 109.375% gear_count: 2 sessions: 35 up_thresh: 90.0%
I, [2013-09-12T03:00:27.532465 #15627]  INFO -- : GEAR_UP - add-gear: exit: 1  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-12T03:10:40.333767 #15627]  INFO -- : GEAR_UP - capacity: 93.75% gear_count: 2 sessions: 30 up_thresh: 90.0%
I, [2013-09-12T03:11:21.386182 #15627]  INFO -- : GEAR_UP - add-gear: exit: 1  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-12T03:21:25.224495 #15627]  INFO -- : GEAR_UP - capacity: 93.75% gear_count: 2 sessions: 30 up_thresh: 90.0%
I, [2013-09-12T03:21:40.343013 #15627]  INFO -- : GEAR_UP - add-gear: exit: 1  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-12T03:31:51.258314 #15627]  INFO -- : GEAR_UP - capacity: 106.25% gear_count: 2 sessions: 34 up_thresh: 90.0%
I, [2013-09-12T03:32:17.398757 #15627]  INFO -- : GEAR_UP - add-gear: exit: 1  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-12T03:42:21.763609 #15627]  INFO -- : GEAR_UP - capacity: 153.125% gear_count: 2 sessions: 49 up_thresh: 90.0%
I, [2013-09-12T03:42:36.486227 #15627]  INFO -- : GEAR_UP - add-gear: exit: 1  stdout: Already at the maximum number of gears allowed for either the app or your account.

