Bug 1006085

Summary: haproxy_ctld keeps sending scale-up events in error conditions
Product: OpenShift Online
Component: Containers
Version: 2.x
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Reporter: Rajat Chopra <rchopra>
Assignee: Dan McPherson <dmcphers>
QA Contact: libra bugs <libra-bugs>
CC: bmeng, yadu
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-09-19 16:50:19 UTC

Description Rajat Chopra 2013-09-10 01:37:10 UTC
Description of problem:
The auto-scaler daemon in the haproxy cartridge is designed to send scale-up events when traffic rises. However, if the broker returns an error, the daemon does not back off; it keeps sending further scale-up requests.
It is not clear what the solution should be, but when the broker is stuck on the application's pending_op queue, the result is an endless stream of requests to the broker at 10-second intervals.
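For context, here is a rough Ruby sketch of the kind of polling loop described above. The names are illustrative only, not the actual haproxy_ctld source; 'add-gear' stands in for however the cartridge asks the broker to add a gear.

# Rough sketch of the flooding behaviour described above (hypothetical names).
UP_THRESHOLD = 90.0   # percent, matching up_thresh in the logs below

def current_capacity
  # Placeholder for reading the current session capacity from haproxy stats.
  rand(80.0..130.0)
end

loop do
  if current_capacity > UP_THRESHOLD
    system('add-gear')   # the result is ignored, so broker errors never slow it down
  end
  sleep 10               # the 10-second interval mentioned above
end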

Version-Release number of selected component (if applicable):


How reproducible:
Always, in a broken app with a lot of traffic.

Steps to Reproduce:
1. Create a scalable app and create a pending op that is broken (e.g. missing its type). Alternatively, limit the application's gears by putting a cap on the user's max_gear limit.
2. Send traffic to the application.
3. Check scale_events.log; the scale-up requests keep coming relentlessly even though they are failing.

Actual results:
haproxy_ctld keeps sending scale-up events every 10 seconds.

Expected results:
haproxy_ctld should recognize when the broker is returning errors on scale-up requests and stop flooding the already beleaguered broker.


Additional info:

Comment 1 Dan McPherson 2013-09-10 22:49:58 UTC
https://github.com/openshift/origin-server/pull/3611

It now waits 10 minutes before retrying a scale-up or scale-down if the last such request failed.
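Roughly, that amounts to remembering when the last scale attempt failed and suppressing retries for 10 minutes. A hedged sketch with hypothetical names (not the code from the pull request):

FAILURE_COOLDOWN = 600  # seconds; the 10-minute window mentioned above

def cooled_down?(last_failure_time, window = FAILURE_COOLDOWN)
  last_failure_time.nil? || (Time.now - last_failure_time) > window
end

last_failure = { up: nil, down: nil }

if cooled_down?(last_failure[:up])
  ok = system('add-gear')                 # hypothetical scale-up call
  last_failure[:up] = Time.now unless ok  # start the cool-down on failure
end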

Comment 3 Meng Bo 2013-09-11 12:23:18 UTC
Checked on devenv_3771; it still sends a GEAR_UP event roughly every minute.

I, [2013-09-11T08:17:42.471038 #5516]  INFO -- : GEAR_UP - capacity: 109.375% gear_count: 2 sessions: 35 up_thresh: 90.0%
I, [2013-09-11T08:18:17.831972 #5516]  INFO -- : GEAR_UP - add-gear: exit: 0  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-11T08:18:22.918896 #5516]  INFO -- : GEAR_UP - capacity: 100.0% gear_count: 2 sessions: 32 up_thresh: 90.0%
I, [2013-09-11T08:18:38.093305 #5516]  INFO -- : GEAR_UP - add-gear: exit: 0  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-11T08:19:09.461168 #5516]  INFO -- : GEAR_UP - capacity: 106.25% gear_count: 2 sessions: 34 up_thresh: 90.0%
I, [2013-09-11T08:19:27.447006 #5516]  INFO -- : GEAR_UP - add-gear: exit: 0  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-11T08:20:09.131737 #5516]  INFO -- : GEAR_UP - capacity: 128.125% gear_count: 2 sessions: 41 up_thresh: 90.0%
I, [2013-09-11T08:20:26.942704 #5516]  INFO -- : GEAR_UP - add-gear: exit: 0  stdout: Already at the maximum number of gears allowed for either the app or your account.


Not sure why the failed add-gear reports exit code 0. Maybe that is what is causing this?
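If the cartridge shells out for add-gear, the usual way to surface the real exit status in Ruby is Open3. A minimal sketch under that assumption (not the actual fix in PR 3618; 'add-gear' stands in for however the scale-up is actually invoked):

require 'open3'
require 'logger'

log = Logger.new(STDOUT)

stdout, _stderr, status = Open3.capture3('add-gear')
log.info "GEAR_UP - add-gear: exit: #{status.exitstatus}  stdout: #{stdout}"
scale_up_failed = !status.success?   # only exit status 0 counts as success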

Comment 4 Dan McPherson 2013-09-11 15:49:14 UTC
https://github.com/openshift/origin-server/pull/3618

Comment 6 Meng Bo 2013-09-12 07:57:15 UTC
Tested again on devenv_3776; the issue has been fixed.
It now re-triggers the GEAR_UP event only every 10 minutes.

I, [2013-09-12T02:58:45.285264 #15431]  INFO -- : Starting haproxy_ctld
I, [2013-09-12T02:58:45.327543 #15431]  INFO -- : GEAR_UP - capacity: 93.75% gear_count: 2 sessions: 30 up_thresh: 90.0%
I, [2013-09-12T02:59:01.987247 #15627]  INFO -- : Starting haproxy_ctld
I, [2013-09-12T03:00:10.709793 #15627]  INFO -- : GEAR_UP - capacity: 109.375% gear_count: 2 sessions: 35 up_thresh: 90.0%
I, [2013-09-12T03:00:27.532465 #15627]  INFO -- : GEAR_UP - add-gear: exit: 1  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-12T03:10:40.333767 #15627]  INFO -- : GEAR_UP - capacity: 93.75% gear_count: 2 sessions: 30 up_thresh: 90.0%
I, [2013-09-12T03:11:21.386182 #15627]  INFO -- : GEAR_UP - add-gear: exit: 1  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-12T03:21:25.224495 #15627]  INFO -- : GEAR_UP - capacity: 93.75% gear_count: 2 sessions: 30 up_thresh: 90.0%
I, [2013-09-12T03:21:40.343013 #15627]  INFO -- : GEAR_UP - add-gear: exit: 1  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-12T03:31:51.258314 #15627]  INFO -- : GEAR_UP - capacity: 106.25% gear_count: 2 sessions: 34 up_thresh: 90.0%
I, [2013-09-12T03:32:17.398757 #15627]  INFO -- : GEAR_UP - add-gear: exit: 1  stdout: Already at the maximum number of gears allowed for either the app or your account.

I, [2013-09-12T03:42:21.763609 #15627]  INFO -- : GEAR_UP - capacity: 153.125% gear_count: 2 sessions: 49 up_thresh: 90.0%
I, [2013-09-12T03:42:36.486227 #15627]  INFO -- : GEAR_UP - add-gear: exit: 1  stdout: Already at the maximum number of gears allowed for either the app or your account.