Bug 1102775 - EAP cartridge should scale based on active session count
Summary: EAP cartridge should scale based on active session count
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RFE
Version: 2.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Mike Barrett
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-05-29 14:19 UTC by Ron Šmeral
Modified: 2016-11-01 01:37 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
OSE at console.itos.redhat.com EAP6 cartridge
Last Closed: 2016-01-27 19:10:38 UTC
Target Upstream Version:
Embargoed:



Description Ron Šmeral 2014-05-29 14:19:08 UTC
Description of problem:
The current scaling strategy of the EAP cartridge doesn't seem to take the number of active sessions into account, which can easily make the service unavailable. The problem grows in proportion to the application's session timeout.

Version-Release number of selected component (if applicable):
Version of OSE currently at console.itos.redhat.com

How reproducible:
An example situation:
* EAP's default max active session count is 1024
* OpenShift's max concurrent requests per gear (scaling threshold) is 16
* let's say session timeout = 5 minutes = 300 seconds
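In this case, just 4 requests per second, assuming each one opens a new session, would fill the 1024-session limit in 256 seconds (1024 / 4). EAP would then return HTTP 50x errors in under 5 minutes and keep doing so for as long as the load is sustained. The problem is that OpenShift would not scale until the load is 4 times higher.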
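
The arithmetic above can be checked with a short sketch. It assumes every request opens a new session that lives for the full timeout; the function name is hypothetical and is not part of EAP or OpenShift.

```python
def seconds_until_session_limit(requests_per_sec, max_sessions, session_timeout_sec):
    """Return the seconds until the session limit is hit, or None if it never is.

    Assumption: each request opens a new session that stays alive
    for the whole session timeout.
    """
    # Session count the load would settle at if there were no limit
    # (arrival rate multiplied by session lifetime).
    steady_state = requests_per_sec * session_timeout_sec
    if steady_state <= max_sessions:
        return None  # sessions expire fast enough; the limit is never reached
    return max_sessions / requests_per_sec


# Values from the example: 1024 max sessions, 300 s timeout.
print(seconds_until_session_limit(4, 1024, 300))  # 256.0
print(seconds_until_session_limit(3, 1024, 300))  # None
```

With these numbers the break-even rate is 1024 / 300, roughly 3.4 new sessions per second, so any sustained rate above that eventually exhausts the limit.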

Actual results:
Service unavailable, scaling not triggered

Expected results:
Scale up event triggered based on active sessions

Additional info:
Scaling behaviour is indeed configurable using haproxy_ctld.rb, but we should still try to provide _reasonable_ defaults.

Comment 1 Ron Šmeral 2014-05-29 14:26:30 UTC
This situation is only made worse by HAProxy keeping many useless sessions alive for no good reason: #1077727.

Comment 3 Luke Meyer 2014-05-29 15:34:03 UTC
HAProxy does not have some special ability to reach into the internals of the cartridge and determine how many sessions it has created. Does EAP advertise this somehow? I think you'd need mod_cluster and an extra exposed port to access this info. Or maybe the session replication port provides this information. Do you actually know?

You can customize haproxy's scaling in a supported way beginning with OSE 2.1, but it can only scale based on information that is available to it (running on a separate gear).

It would probably make sense to ship haproxy with multiple default scaling strategies and have it select one based on the cartridge being scaled. That seems like a good feature request.

BTW, scaling EAP wouldn't reduce session load proportionally, since sessions are shared between multiple instances. Also, sessions vary greatly in size: one app's sessions may hold a few strings while another's may hold MB or GB of data. It seems to me that what you really want is for each platform to have a "load check" (similar to a "health check") whereby it can indicate in its own way (customizable by the app) how close it is to needing scaling.
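
A "load check" along these lines might look like the sketch below. Everything in it is an assumption made for illustration: the 0.0 to 1.0 scale, the `needs_scaling` helper, the gear names, and the 0.9 trigger are hypothetical and do not describe any existing OpenShift or HAProxy interface.

```python
from typing import Callable, Dict

# Hypothetical contract: a load check returns a value between
# 0.0 (idle) and 1.0 (needs scaling now). How it is computed is
# left to the cartridge or the app.
LoadCheck = Callable[[], float]


def needs_scaling(load_checks: Dict[str, LoadCheck], trigger: float = 0.9) -> bool:
    """Return True when the average reported load across gears reaches the trigger."""
    if not load_checks:
        return False
    loads = [check() for check in load_checks.values()]
    return sum(loads) / len(loads) >= trigger


# Two gears reporting fixed loads, purely for demonstration.
gears = {"gear-1": lambda: 0.95, "gear-2": lambda: 0.90}
print(needs_scaling(gears))  # True
```

The point of the design is that the proxy only sees a normalized number, so it does not need to know whether the cartridge derived it from sessions, memory, or anything else.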

Comment 4 Ron Šmeral 2014-05-30 13:29:11 UTC
Thanks for looking into this.

HAProxy isn't and shouldn't be aware of cartridge internals. However, that's where haproxy_ctld comes in, IIUC. It's surely possible to get the current number of active sessions from EAP's web subsystem through JMX. There's a short example in #1077727.

Multiple scaling strategies to choose from would be a great improvement over the current scaling controller, which is based on a single parameter (concurrent requests).
A robust scaling controller should, exactly for the reasons you mention, take into account multiple parameters. E.g., concurrent requests _and_ memory, active session count, cpu load, etc., and trigger a scale up when the threshold is hit on any of these parameters.
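
A minimal sketch of such an "any threshold" rule follows. It is not how haproxy_ctld works today; the metric names and the `active_sessions` limit of 900 are hypothetical, and only the limit of 16 concurrent requests comes from this report.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    name: str     # metric name (hypothetical)
    limit: float  # scale up when the metric reaches this value


def breached_metrics(metrics, thresholds):
    """Return the names of all metrics that are at or above their threshold."""
    return [t.name for t in thresholds if metrics.get(t.name, 0.0) >= t.limit]


def should_scale_up(metrics, thresholds):
    """Trigger a scale-up as soon as any single threshold is hit."""
    return bool(breached_metrics(metrics, thresholds))


thresholds = [
    Threshold("concurrent_requests", 16),  # per-gear threshold from this report
    Threshold("active_sessions", 900),     # hypothetical limit below EAP's 1024
]
metrics = {"concurrent_requests": 4, "active_sessions": 950}

print(breached_metrics(metrics, thresholds))  # ['active_sessions']
print(should_scale_up(metrics, thresholds))   # True
```

In this example the request count alone would not trigger scaling, but the session count does, which is the case the report describes.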

A per-app, user-defined load check sounds too complicated to me. Maybe it would make more sense to ship a scaling controller adapted to each web platform cartridge. E.g., the controller for the EAP cartridge would know how to query the session count, etc.

