Bug 1006085 - haproxy_ctld keeps sending scale-up events in error conditions
Summary: haproxy_ctld keeps sending scale-up events in error conditions
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Dan McPherson
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-09-10 01:37 UTC by Rajat Chopra
Modified: 2015-05-14 23:28 UTC (History)
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-19 16:50:19 UTC
Target Upstream Version:
Embargoed:



Description Rajat Chopra 2013-09-10 01:37:10 UTC
Description of problem:
The auto-scaler daemon in the haproxy cartridge is designed to send scale-up events under heavy traffic. But when the broker returns an error, the daemon does not back off from sending further scale-up requests.
Not sure what the solution should be, but if the broker is stuck on the application's pending_op queue, we end up sending requests to the broker indefinitely at 10-second intervals.

Version-Release number of selected component (if applicable):


How reproducible:
Always, on a broken app with a lot of traffic.

Steps to Reproduce:
1. Create a scalable app and create a pending op that is broken (e.g. missing its type). Alternatively, cap the application's gears by lowering the user's max_gear limit.
2. Send traffic to the application.
3. Check scale_events.log; scale-up requests are sent relentlessly even though they keep failing.

Actual results:
haproxy_ctld keeps sending scale-up events every 10 seconds.

Expected results:
haproxy_ctld should detect when the broker is returning errors on scale-up requests and stop flooding the already beleaguered broker.


Additional info:

Comment 1 Dan McPherson 2013-09-10 22:49:58 UTC
https://github.com/openshift/origin-server/pull/3611

It now waits 10 minutes before scaling up or down if the last scale-up or scale-down request failed.
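A minimal sketch of that back-off behavior (this is an illustration, not the actual haproxy_ctld code from the pull request; the class and method names here are hypothetical):

```ruby
# Suppress further scale attempts for a cooldown window after a
# failed add-gear/remove-gear request, instead of retrying every
# 10-second check interval.
class ScaleThrottle
  FAIL_COOLDOWN = 600 # seconds; the fix described above uses 10 minutes

  def initialize
    @last_failure = nil
  end

  # Record that the last scale request failed.
  def record_failure(now = Time.now)
    @last_failure = now
  end

  # Record a success, clearing the hold-off.
  def record_success
    @last_failure = nil
  end

  # May we send another scale event right now?
  def allow_scale?(now = Time.now)
    @last_failure.nil? || (now - @last_failure) >= FAIL_COOLDOWN
  end
end
```

The daemon's periodic capacity check would then consult allow_scale? before firing a GEAR_UP event, which yields the roughly 10-minute retry spacing seen in comment 6.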

Comment 3 Meng Bo 2013-09-11 12:23:18 UTC
Checked on devenv_3771; it still sends the GEAR_UP event about every minute.

I, [2013-09-11T08:17:42.471038 #5516]  INFO -- : GEAR_UP - capacity: 109.375% gear_count: 2 sessions: 35 up_thresh: 90.0%
I, [2013-09-11T08:18:17.831972 #5516]  INFO -- : GEAR_UP - add-gear: exit: 0  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-11T08:18:22.918896 #5516]  INFO -- : GEAR_UP - capacity: 100.0% gear_count: 2 sessions: 32 up_thresh: 90.0%
I, [2013-09-11T08:18:38.093305 #5516]  INFO -- : GEAR_UP - add-gear: exit: 0  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-11T08:19:09.461168 #5516]  INFO -- : GEAR_UP - capacity: 106.25% gear_count: 2 sessions: 34 up_thresh: 90.0%
I, [2013-09-11T08:19:27.447006 #5516]  INFO -- : GEAR_UP - add-gear: exit: 0  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-11T08:20:09.131737 #5516]  INFO -- : GEAR_UP - capacity: 128.125% gear_count: 2 sessions: 41 up_thresh: 90.0%
I, [2013-09-11T08:20:26.942704 #5516]  INFO -- : GEAR_UP - add-gear: exit: 0  stdout: Already at the maximum number of gears allowed for either the app or your account.


Not sure why the exit code is 0 when add-gear failed; maybe that is the cause?
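The "exit: 0" on a failed add-gear in the log above suggests the child's exit status was not being surfaced. A minimal Ruby sketch (a hypothetical helper, not the cartridge's actual code) of capturing both stdout and the real exit status when shelling out:

```ruby
# Run a command, returning both its output and its exit status.
# If a caller only logs stdout and never checks $?.exitstatus
# (or the child itself exits 0 on error), a failure shows up in
# the log as "exit: 0" even though the request did not succeed.
def run_and_report(cmd)
  out = `#{cmd}`
  { stdout: out, exit: $?.exitstatus }
end
```

The fix verified in comment 6 shows the non-zero status ("exit: 1") propagating, which is what lets the back-off logic recognize the failure.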

Comment 4 Dan McPherson 2013-09-11 15:49:14 UTC
https://github.com/openshift/origin-server/pull/3618

Comment 6 Meng Bo 2013-09-12 07:57:15 UTC
Tested again on devenv_3776; the issue has been fixed.
The GEAR_UP event is now re-triggered only every 10 minutes.

I, [2013-09-12T02:58:45.285264 #15431]  INFO -- : Starting haproxy_ctld
I, [2013-09-12T02:58:45.327543 #15431]  INFO -- : GEAR_UP - capacity: 93.75% gear_count: 2 sessions: 30 up_thresh: 90.0%
I, [2013-09-12T02:59:01.987247 #15627]  INFO -- : Starting haproxy_ctld
I, [2013-09-12T03:00:10.709793 #15627]  INFO -- : GEAR_UP - capacity: 109.375% gear_count: 2 sessions: 35 up_thresh: 90.0%
I, [2013-09-12T03:00:27.532465 #15627]  INFO -- : GEAR_UP - add-gear: exit: 1  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-12T03:10:40.333767 #15627]  INFO -- : GEAR_UP - capacity: 93.75% gear_count: 2 sessions: 30 up_thresh: 90.0%
I, [2013-09-12T03:11:21.386182 #15627]  INFO -- : GEAR_UP - add-gear: exit: 1  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-12T03:21:25.224495 #15627]  INFO -- : GEAR_UP - capacity: 93.75% gear_count: 2 sessions: 30 up_thresh: 90.0%
I, [2013-09-12T03:21:40.343013 #15627]  INFO -- : GEAR_UP - add-gear: exit: 1  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-12T03:31:51.258314 #15627]  INFO -- : GEAR_UP - capacity: 106.25% gear_count: 2 sessions: 34 up_thresh: 90.0%
I, [2013-09-12T03:32:17.398757 #15627]  INFO -- : GEAR_UP - add-gear: exit: 1  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-12T03:42:21.763609 #15627]  INFO -- : GEAR_UP - capacity: 153.125% gear_count: 2 sessions: 49 up_thresh: 90.0%
I, [2013-09-12T03:42:36.486227 #15627]  INFO -- : GEAR_UP - add-gear: exit: 1  stdout: Already at the maximum number of gears allowed for either the app or your account.

