Bug 1102775

Summary: EAP cartridge should scale based on active session count
Product: OpenShift Container Platform
Reporter: Ron Šmeral <rsmeral>
Component: RFE
Assignee: Mike Barrett <mbarrett>
Status: CLOSED WONTFIX
QA Contact: libra bugs <libra-bugs>
Severity: unspecified
Priority: unspecified
Version: 2.1.0
CC: amelicha, bleanhar, erich, jokerman, lmeyer, maschmid, mmccomas, xtian
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Story Points: ---
Environment: OSE at console.itos.redhat.com EAP6 cartridge
Last Closed: 2016-01-27 19:10:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---

Description Ron Šmeral 2014-05-29 14:19:08 UTC
Description of problem:
The current scaling strategy of the EAP cartridge doesn't seem to take into account the number of active sessions, which can easily make the service unavailable. The problem grows in proportion to an application's session timeout.

Version-Release number of selected component (if applicable):
Version of OSE currently at console.itos.redhat.com

How reproducible:
An example situation:
* EAP's default max active session count is 1024
* OpenShift's max concurrent requests per gear (scaling threshold) is 16
* let's say session timeout = 5 minutes = 300 seconds
In this case, just 4 requests per second would result in an HTTP 50x error from EAP in under 5 minutes (4 × 300 s = 1200 sessions, above the 1024 limit, assuming each request opens a new session), lasting as long as the load is sustained. The problem is that OpenShift would not scale until the load is 4 times higher.
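
The arithmetic behind this example can be sketched as follows. This is only an illustration under the same assumptions as above: every request opens a new session, and sessions end only by timeout. The function name and parameters are made up for the sketch and are not part of OpenShift or EAP.

```python
import math


def seconds_until_session_limit(new_sessions_per_sec: float,
                                session_timeout_sec: float,
                                max_active_sessions: int) -> float:
    """Seconds until the active-session cap is reached, or math.inf if never.

    Assumption: every request creates a new session, and sessions are
    released only when they time out.
    """
    # Once the first sessions start expiring, the active count levels off
    # at arrival rate * session lifetime.
    steady_state = new_sessions_per_sec * session_timeout_sec
    if steady_state <= max_active_sessions:
        return math.inf
    # Before any session expires, the count grows linearly with time.
    return max_active_sessions / new_sessions_per_sec


# Numbers from the report: 4 requests/s, 300 s timeout, 1024 session limit.
print(4 * 300)                                   # steady-state demand
print(seconds_until_session_limit(4, 300, 1024)) # time to hit the limit
```

With the numbers from the report, the limit is reached after 256 seconds, which is where "under 5 minutes" comes from.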

Actual results:
Service unavailable, scaling not triggered

Expected results:
Scale up event triggered based on active sessions

Additional info:
Scaling behaviour is indeed configurable using haproxy_ctld.rb, but we should still try to provide _reasonable_ defaults.

Comment 1 Ron Šmeral 2014-05-29 14:26:30 UTC
And this situation is only made worse by haproxy sustaining many useless sessions for no good reason: #1077727.

Comment 3 Luke Meyer 2014-05-29 15:34:03 UTC
HAProxy does not have any special ability to reach into the internals of the cartridge and determine how many sessions it has created. Does EAP advertise this somehow? I think you'd need mod_cluster and an extra exposed port to access this info. Or maybe the session replication port provides this information - do you actually know?

You can customize haproxy's scaling in a supported way beginning with OSE 2.1, but it can only scale based on information that is available to it (running on a separate gear).

It would probably make sense to ship haproxy with multiple default scaling strategies and have it select one based on the cartridge being scaled. That seems like a good feature request.

BTW, scaling EAP wouldn't reduce session load proportionally, since sessions are shared between multiple instances. Also, sessions vary greatly in size: one app's sessions may hold a few strings, while another's may hold MB or GB of data. Seems to me what you really want is for each platform to have a "load check" (similar to "health check") whereby it can indicate in its own way (customizable by the app) how close it is to needing scaling.
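
To make the "load check" idea concrete, here is a minimal sketch of what such an interface might look like. Nothing here exists in OpenShift or haproxy_ctld; LoadCheck, SessionLoadCheck, needs_scale_up and the 0.9 threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


class LoadCheck(Protocol):
    """Hypothetical per-platform hook: report load as a fraction of capacity."""

    def load(self) -> float:
        ...


@dataclass
class SessionLoadCheck:
    """Example check for a platform whose limiting resource is session count."""

    active_sessions: int
    max_active_sessions: int = 1024  # EAP default quoted in this report

    def load(self) -> float:
        # Clamp to 1.0 so callers can treat the value as "fraction of capacity".
        return min(1.0, self.active_sessions / self.max_active_sessions)


def needs_scale_up(check: LoadCheck, threshold: float = 0.9) -> bool:
    """Scale up once the platform reports load at or above the threshold."""
    return check.load() >= threshold


print(needs_scale_up(SessionLoadCheck(active_sessions=512)))
print(needs_scale_up(SessionLoadCheck(active_sessions=1000)))
```

The point of the design is that the scaling side only sees a number between 0 and 1, while each platform (or app) decides for itself what that number is derived from.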

Comment 4 Ron Šmeral 2014-05-30 13:29:11 UTC
Thanks for looking into this.

HAProxy doesn't and shouldn't be aware of cartridge internals. However, that's where the haproxy_ctld comes in, IIUC. It's surely possible to get the information about current number of active sessions from EAP's web subsystem through JMX. There's a short example in #1077727.

Multiple scaling strategies to choose from would be a great improvement over the current scaling controller, which is based on a single parameter (concurrent requests).
A robust scaling controller should, exactly for the reasons you mention, take into account multiple parameters. E.g., concurrent requests _and_ memory, active session count, cpu load, etc., and trigger a scale up when the threshold is hit on any of these parameters.
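
A rough sketch of such an "any threshold hit" rule is shown below. It is not how haproxy_ctld works today; the metric names and the session threshold of 900 are invented for the example, and only the concurrent-request threshold of 16 comes from this report.

```python
def breached_thresholds(metrics: dict[str, float],
                        thresholds: dict[str, float]) -> list[str]:
    """Return the names of all metrics that are at or above their threshold.

    Metrics missing from the sample are treated as 0 (an assumption made
    to keep the sketch short).
    """
    return [name for name, limit in thresholds.items()
            if metrics.get(name, 0.0) >= limit]


def should_scale_up(metrics: dict[str, float],
                    thresholds: dict[str, float]) -> bool:
    """Trigger a scale-up as soon as any single threshold is hit."""
    return bool(breached_thresholds(metrics, thresholds))


# Hypothetical thresholds; only concurrent_requests = 16 is from the report.
thresholds = {"concurrent_requests": 16, "active_sessions": 900}

# Low request concurrency, but the session count is close to the EAP limit.
sample = {"concurrent_requests": 4, "active_sessions": 1000}

print(breached_thresholds(sample, thresholds))
print(should_scale_up(sample, thresholds))
```

In this sample the concurrent-request metric alone would not trigger scaling, but the session metric does, which is the situation described in the original report.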

A per-app, user-defined load check sounds too complicated to me. Maybe it would make more sense to ship a scaling controller adapted for each web platform cartridge. E.g. the controller for the EAP cartridge would know how to query session count, etc.