Bug 1006085 - haproxy_ctld keeps sending scale-up events in error conditions
Status: CLOSED CURRENTRELEASE
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Dan McPherson
QA Contact: libra bugs
Depends On:
Blocks:
Reported: 2013-09-09 21:37 EDT by Rajat Chopra
Modified: 2015-05-14 19:28 EDT (History)
2 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-09-19 12:50:19 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Rajat Chopra 2013-09-09 21:37:10 EDT
Description of problem:
The auto-scaler daemon in the haproxy cartridge is designed to send scale-up events when traffic rises. But if the broker sends back an error, the daemon does not hold back on sending further scale-up requests.
Not sure what the solution should be, but if the broker is stuck on the application's pending_op queue, we end up with an endless stream of requests sent to the broker at 10-second intervals.

Version-Release number of selected component (if applicable):


How reproducible:
Always, in a broken app with a lot of traffic.

Steps to Reproduce:
1. Create a scalable app and create a pending op that is broken (e.g. with a missing type). Alternatively, limit the application's gears by putting a cap on the user's max_gear limit.
2. Send traffic to the application
3. Check scale_events.log; the scale-up requests continue relentlessly even though they are failing.

Actual results:
haproxy_ctld keeps sending scale-up events every 10 seconds.

Expected results:
haproxy_ctld should be smart when the broker is sending back errors on the scale-up requests, and stop flooding the already beleaguered broker.


Additional info:
Comment 1 Dan McPherson 2013-09-10 18:49:58 EDT
https://github.com/openshift/origin-server/pull/3611

It now waits 10 minutes before trying to scale up or down again if the last scale-up or scale-down request failed.
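The backoff described above can be sketched roughly as follows (hypothetical class and method names for illustration; this is not the actual code from the pull request): remember when the last scale request failed, and skip further requests until a cooldown has elapsed.

```ruby
# Minimal sketch of a failure-based cooldown for scale requests.
# ScaleThrottle, allow?, and record_failure are hypothetical names.
class ScaleThrottle
  COOLDOWN = 600 # seconds; the 10-minute hold-off described above

  def initialize
    @last_failure = nil
  end

  # True if a scale request may be sent now.
  def allow?(now = Time.now)
    @last_failure.nil? || (now - @last_failure) >= COOLDOWN
  end

  # Call when a scale-up/scale-down request fails.
  def record_failure(now = Time.now)
    @last_failure = now
  end
end
```

With this in the daemon's polling loop, a failed add-gear stops the 10-second retry storm: the next request is only attempted once the cooldown expires.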
Comment 3 Meng Bo 2013-09-11 08:23:18 EDT
Checked on devenv_3771; it still sends a GEAR_UP event roughly every minute.

I, [2013-09-11T08:17:42.471038 #5516]  INFO -- : GEAR_UP - capacity: 109.375% gear_count: 2 sessions: 35 up_thresh: 90.0%
I, [2013-09-11T08:18:17.831972 #5516]  INFO -- : GEAR_UP - add-gear: exit: 0  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-11T08:18:22.918896 #5516]  INFO -- : GEAR_UP - capacity: 100.0% gear_count: 2 sessions: 32 up_thresh: 90.0%
I, [2013-09-11T08:18:38.093305 #5516]  INFO -- : GEAR_UP - add-gear: exit: 0  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-11T08:19:09.461168 #5516]  INFO -- : GEAR_UP - capacity: 106.25% gear_count: 2 sessions: 34 up_thresh: 90.0%
I, [2013-09-11T08:19:27.447006 #5516]  INFO -- : GEAR_UP - add-gear: exit: 0  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-11T08:20:09.131737 #5516]  INFO -- : GEAR_UP - capacity: 128.125% gear_count: 2 sessions: 41 up_thresh: 90.0%
I, [2013-09-11T08:20:26.942704 #5516]  INFO -- : GEAR_UP - add-gear: exit: 0  stdout: Already at the maximum number of gears allowed for either the app or your account.


Not sure why the failed add-gear exits with code 0. Maybe that is what causes this?
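One way the exit code can end up as 0 even though the command failed: if add-gear is invoked through a wrapper that does not propagate the child's status, the caller only ever sees the wrapper's (successful) exit. A small illustration, assuming nothing about how haproxy_ctld actually invokes add-gear, using Ruby's Open3 to capture the child's real status:

```ruby
require 'open3'

# Capture a command's stdout and its real exit code.
def run_and_check(cmd)
  stdout, status = Open3.capture2(cmd)
  # status.exitstatus is the child's actual exit code; a wrapper that
  # swallows it (e.g. `sh -c 'cmd; true'`) would always report 0.
  [stdout, status.exitstatus]
end
```

If the backoff logic keys on the reported exit code, a swallowed non-zero status would explain why the hold-off never kicks in.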
Comment 4 Dan McPherson 2013-09-11 11:49:14 EDT
https://github.com/openshift/origin-server/pull/3618
Comment 6 Meng Bo 2013-09-12 03:57:15 EDT
Tested again on devenv_3776; the issue has been fixed.
It now re-triggers the GEAR_UP event every 10 minutes.

I, [2013-09-12T02:58:45.285264 #15431]  INFO -- : Starting haproxy_ctld
I, [2013-09-12T02:58:45.327543 #15431]  INFO -- : GEAR_UP - capacity: 93.75% gear_count: 2 sessions: 30 up_thresh: 90.0%
I, [2013-09-12T02:59:01.987247 #15627]  INFO -- : Starting haproxy_ctld
I, [2013-09-12T03:00:10.709793 #15627]  INFO -- : GEAR_UP - capacity: 109.375% gear_count: 2 sessions: 35 up_thresh: 90.0%
I, [2013-09-12T03:00:27.532465 #15627]  INFO -- : GEAR_UP - add-gear: exit: 1  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-12T03:10:40.333767 #15627]  INFO -- : GEAR_UP - capacity: 93.75% gear_count: 2 sessions: 30 up_thresh: 90.0%
I, [2013-09-12T03:11:21.386182 #15627]  INFO -- : GEAR_UP - add-gear: exit: 1  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-12T03:21:25.224495 #15627]  INFO -- : GEAR_UP - capacity: 93.75% gear_count: 2 sessions: 30 up_thresh: 90.0%
I, [2013-09-12T03:21:40.343013 #15627]  INFO -- : GEAR_UP - add-gear: exit: 1  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-12T03:31:51.258314 #15627]  INFO -- : GEAR_UP - capacity: 106.25% gear_count: 2 sessions: 34 up_thresh: 90.0%
I, [2013-09-12T03:32:17.398757 #15627]  INFO -- : GEAR_UP - add-gear: exit: 1  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-12T03:42:21.763609 #15627]  INFO -- : GEAR_UP - capacity: 153.125% gear_count: 2 sessions: 49 up_thresh: 90.0%
I, [2013-09-12T03:42:36.486227 #15627]  INFO -- : GEAR_UP - add-gear: exit: 1  stdout: Already at the maximum number of gears allowed for either the app or your account.
