Description of problem:
The current scaling strategy of the EAP cartridge doesn't seem to take into account the number of active sessions, which can easily lead to unavailability of service. The problem grows proportionally with the session timeout of an application.

Version-Release number of selected component (if applicable):
Version of OSE currently at console.itos.redhat.com

How reproducible:
An example situation:
* EAP's default max active session count is 1024
* OpenShift's max concurrent requests per gear (scaling threshold) is 16
* let's say session timeout = 5 minutes = 300 seconds

In this case, just 4 requests per second would result in an HTTP 50x error from EAP in under 5 minutes (assuming each request opens a new session, the 1024-session limit is reached after 1024 / 4 = 256 seconds), and the errors last as long as the load is sustained. The problem is that OpenShift would not scale up until the load is 4 times higher.

Actual results:
Service unavailable, scaling not triggered

Expected results:
Scale up event triggered based on active sessions

Additional info:
Scaling behaviour is indeed configurable using haproxy_ctld.rb, but we should still try to provide _reasonable_ defaults.
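The arithmetic in the example can be checked with a short sketch. It assumes a steady arrival rate and that every request opens a new session (no session reuse), which the report does not state explicitly; the function name is hypothetical and is not part of OpenShift or EAP.

```python
def seconds_until_session_limit(max_sessions, new_sessions_per_second, session_timeout_s):
    """Return how long it takes to hit the session limit, or None if it is never exceeded.

    Assumption: each request creates one new session, and a session expires
    exactly session_timeout_s seconds after it was created.
    """
    # Steady-state number of live sessions (arrival rate times lifetime).
    steady_state = new_sessions_per_second * session_timeout_s
    if steady_state <= max_sessions:
        # Sessions expire fast enough that the limit is never exceeded.
        return None
    # Before any session expires, the live count grows linearly with time.
    return max_sessions / new_sessions_per_second


# Values from the report: 1024 sessions, 4 requests/s, 300 s timeout.
print(seconds_until_session_limit(1024, 4, 300))
# At 3 requests/s the steady state is 900 sessions, below the limit.
print(seconds_until_session_limit(1024, 3, 300))
```

Under these assumptions the limit is hit after 256 seconds, which is why a longer session timeout makes the problem worse: the steady-state session count is the arrival rate multiplied by the timeout.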
This situation is made worse by HAProxy keeping many unnecessary sessions alive: see #1077727.
HAProxy does not have any special ability to reach into the internals of the cartridge and determine how many sessions it has created. Does EAP advertise this somehow? I think you'd need mod_cluster and an extra exposed port to access this info. Or maybe the session replication port provides this information. Do you actually know?

You can customize HAProxy's scaling in a supported way beginning with OSE 2.1, but it can only scale based on information that is available to it (it runs on a separate gear). It would probably make sense to ship HAProxy with multiple default scaling strategies and have it select one based on the cartridge being scaled. That seems like a good feature request.

BTW, scaling EAP wouldn't reduce session load proportionally, since sessions are shared between multiple instances. Sessions also vary greatly in size: one app's session may hold a few strings while another's may hold MB or GB of data. It seems to me that what you really want is for each platform to have a "load check" (similar to a "health check") whereby it can indicate in its own way (customizable by the app) how close it is to needing scaling.
Thanks for looking into this. HAProxy doesn't and shouldn't need to be aware of cartridge internals. However, that's where haproxy_ctld comes in, IIUC. It's certainly possible to get the current number of active sessions from EAP's web subsystem through JMX. There's a short example in #1077727.

Multiple scaling strategies to choose from would be a great improvement over the current scaling controller, which is based on a single parameter (concurrent requests). A robust scaling controller should, exactly for the reasons you mention, take multiple parameters into account, e.g. concurrent requests _and_ memory, active session count, CPU load, etc., and trigger a scale up when the threshold is hit on any of these parameters.

A per-app, user-defined load check sounds too complicated to me. Maybe it would make more sense to ship a scaling controller adapted for each web platform cartridge. E.g., the controller for the EAP cartridge would know how to query the session count, etc.
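A minimal sketch of the "scale up when any threshold is hit" rule described above. This is not how haproxy_ctld.rb is implemented; the metric names, the `Threshold` type, and the 80% session limit are all assumptions chosen for illustration, and collecting the samples (e.g. the EAP session count via JMX) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str   # hypothetical metric name, e.g. "active_sessions"
    limit: float  # scale up once the sampled value reaches this limit


def metrics_over_threshold(samples, thresholds):
    """Return the names of all metrics whose sample has reached its limit."""
    # A metric with no sample is treated as 0.0, i.e. it never triggers scaling.
    return [t.metric for t in thresholds if samples.get(t.metric, 0.0) >= t.limit]


def should_scale_up(samples, thresholds):
    """Scale up as soon as any single metric has reached its limit."""
    return bool(metrics_over_threshold(samples, thresholds))


# Hypothetical thresholds for an EAP gear: the existing concurrent-request
# limit of 16, plus 80% of EAP's default 1024 sessions (the 80% is an assumption).
eap_thresholds = [
    Threshold("concurrent_requests", 16),
    Threshold("active_sessions", 0.8 * 1024),
]

# Low request concurrency but many live sessions, as in the reported scenario.
samples = {"concurrent_requests": 4, "active_sessions": 900}
print(should_scale_up(samples, eap_thresholds))
print(metrics_over_threshold(samples, eap_thresholds))
```

With the sample values above, the concurrent-request threshold alone would not trigger scaling, while the session threshold does. A per-cartridge controller would then differ mainly in which thresholds it ships and how it collects the samples.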