Description of problem:
Create a scalable jbossas-7 app and use "ab" to generate traffic to trigger auto scaling up. Sometimes auto scaling up does not work correctly.

Version-Release number of selected component (if applicable):
devenv_1869

How reproducible:
Sometimes

Steps to Reproduce:
1. Create a scalable jbossas-7 app
$ rhc app create -a app1 -t jbossas-7 -s -d
2. Create "index.jsp" under <git_repo>/src/main/webapp/ and remove "index.html".
$ rm <git_repo>/src/main/webapp/index.html
==> <git_repo>/src/main/webapp/index.jsp <==
<%@ page contentType="text/html" language="java" import="java.sql.*" %>
<%@ page import="javax.naming.*" %>
<%@ page import="javax.sql.*" %>
<%@ page import="java.util.*" %>
<%
Map map = System.getenv();
out.print(map.get("OPENSHIFT_GEAR_DNS"));
%>
3. Use "ab" to generate traffic
$ ab -c 200 -n 100000 <app_url>
4. Tail "scale_events.log".
$ rhc app tail -a app1 -f app1/logs/scale_events.log -o '-n +1'
Use the OpenShift shell to check the app's processes.
$ ps -ef
Check <app_url>/haproxy-status/.

Actual results:
Sometimes gear up is retried several times.
==> scale_events.log <==
I, [2012-07-02T06:32:03.437359 #14154]  INFO -- : GEAR_UP - capacity: 630.0% gear_count: 1 sessions: 63 up_thresh: 90.0%
D, [2012-07-02T06:34:16.238530 #14154] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 1 sessions: 0 up/remove_thresh: 90.0%/49.9% sec_left_til_remove: 0 gear_remove_thresh: 0/20
I, [2012-07-02T06:34:22.271034 #14154]  INFO -- : GEAR_UP - capacity: 1670.0% gear_count: 1 sessions: 167 up_thresh: 90.0%
D, [2012-07-02T06:36:27.662974 #14154] DEBUG -- : GEAR_INFO - capacity: 1390.0% gear_count: 1 sessions: 139 up/remove_thresh: 90.0%/49.9% sec_left_til_remove: 0 gear_remove_thresh: 0/20
I, [2012-07-02T06:36:30.686516 #14154]  INFO -- : GEAR_UP - capacity: 1020.0% gear_count: 1 sessions: 102 up_thresh: 90.0%
I, [2012-07-02T06:37:53.988507 #3249]  INFO -- : Starting haproxy_ctld
D, [2012-07-02T06:37:54.009961 #3249] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 2 sessions: 0 up/remove_thresh: 90.0%/49.9% sec_left_til_remove: 120 gear_remove_thresh: 0/20

On the haproxy-status page, the default gear went UP and DOWN several times during the gear up operation. After the gear up finished, both gears went UP and DOWN. In the output of ps -ef, /usr/libexec/stickshift/cartridges/embedded/haproxy-1.4/info/bin/add-gear sometimes appears and disappears again and again, and sometimes 3 "/usr/libexec/stickshift/cartridges/jbossas-7/info/bin/sync_gears.sh" processes exist at the same time.

<----------output------------->
500  1087  1072  0 06:36 ?  00:00:00 /bin/bash /usr/libexec/stickshift/cartridges/jbossas-7/info/bin/sync_gears.sh e9853accdc1b4220bf6b0475d594723a.65.181
500  1095  1087  0 06:36 ?  00:00:00 /bin/bash /usr/libexec/stickshift/cartridges/jbossas-7/info/bin/sync_gears.sh e9853accdc1b4220bf6b0475d594723a.65.181
500  1096  1095  0 06:36 ?  00:00:00 /bin/bash /usr/libexec/stickshift/cartridges/jbossas-7/info/bin/sync_gears.sh e9853accdc1b4220bf6b0475d594723a.65.181
500  1365  1096  0 06:36 ?  00:00:00 /bin/bash /usr/libexec/stickshift/cartridges/embedded/haproxy-1.4/info/bin/ssh e9853accdc1b4220bf6b0475d594723a.65.18
500  1366  1365  0 06:36 ?  00:00:00 /usr/bin/ssh -q -o BatchMode=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/var/lib/stickshift/21774ba7ead94cab881d
500 14148     1  1 06:26 ?  00:00:07 /usr/sbin/haproxy -f /var/lib/stickshift/21774ba7ead94cab881da54569e9b324//haproxy-1.4/conf/haproxy.cfg
500 14154     1  0 06:26 ?  00:00:00 haproxy_ctld
500 19628 19609  0 06:28 ?  00:00:00 sshd: 21774ba7ead94cab881da54569e9b324@pts/1
500 19629 19628  0 06:28 ?  00:00:00 /bin/bash --init-file /usr/bin/rhcsh -i
500 20017 19993  0 06:29 ?  00:00:00 sshd: 21774ba7ead94cab881da54569e9b324@pts/2
500 20018 20017  0 06:29 ?  00:00:00 /bin/bash --init-file /usr/bin/rhcsh -i
500 29528 29515  0 06:34 ?  00:00:00 /bin/bash /usr/libexec/stickshift/cartridges/jbossas-7/info/bin/sync_gears.sh e52bea3355594643b801eecaa251212b.65.181
500 31353 29528  0 06:35 ?  00:00:00 cat /var/lib/stickshift/21774ba7ead94cab881da54569e9b324/app1//tmp/sync_gears.output.sbiR2YbbxMHHIV7d
500 32407 14154  2 06:36 ?  00:00:01 /usr/bin/ruby /usr/libexec/stickshift/cartridges/embedded/haproxy-1.4/info/bin/add-gear -n jizhao4 -a app1 -u 21774ba7ead94
<---------------------------->

Expected results:
The app should be auto scaled up without error.

Additional info:
I've also tried the php-5.3 cartridge and it works fine.
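For context, here is a minimal Ruby sketch of the scale-up/scale-down decision the scale_events.log fields above imply (capacity, gear_count, sessions, up_thresh, remove_thresh). The 10-sessions-per-gear divisor is an assumption inferred purely from the log output (63 sessions on 1 gear is reported as 630.0% capacity); the real haproxy_ctld implementation may compute capacity differently.

```ruby
# Sketch of the scaling thresholds shown in scale_events.log.
# SESSIONS_PER_GEAR = 10 is an assumption inferred from the log
# (63 sessions on 1 gear reported as 630.0% capacity), not a value
# taken from the haproxy_ctld source.
SESSIONS_PER_GEAR = 10
UP_THRESH     = 90.0 # "up_thresh" in the log
REMOVE_THRESH = 49.9 # "remove_thresh" in the log

# Current sessions as a percentage of what the current gears can serve.
def capacity(sessions, gear_count)
  sessions * 100.0 / (gear_count * SESSIONS_PER_GEAR)
end

# Decision implied by the GEAR_UP / GEAR_DOWN log events.
def action(sessions, gear_count)
  c = capacity(sessions, gear_count)
  if c > UP_THRESH
    :gear_up
  elsif c < REMOVE_THRESH
    :gear_down_candidate # still subject to the sec_left_til_remove countdown
  else
    :steady
  end
end

puts action(63, 1) # prints "gear_up", matching the first GEAR_UP entry above
```

Under this reading, the repeated GEAR_UP entries occur because capacity stays far above up_thresh while the earlier add-gear attempts have not yet completed.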
The only way I could reproduce this bug was to reduce the scale-up timeout, so I have increased the previous timeout.
Created attachment 597323 [details]
haproxy status screenshot

During GEAR_UP, both gears go up and down.
Still getting the same error on devenv_1882. I've attached the haproxy status screenshot. Changing the bug status back to ASSIGNED.
Sometimes, I can't destroy the app after testing. Here is the development.log on the server: http://pastebin.test.redhat.com/96308
Hi Jiangtao,

Hoping you can provide me with some additional info on this bug. I was unable to recreate the original bug (only 1 JBoss instance/session after a SCALE_UP event) until I manually decreased the scale-up timeout. I increased that timeout in case it was the cause of the problem you were seeing. Are you still seeing that problem, or is there another problem?

Are you hitting the backup stats page after a scale-up? I saw the haproxy stats attachment in the bug. I haven't been able to recreate any problem: after a scale-up, I am able to successfully hit ROOT.war and never see the haproxy stats page that would indicate a load-balancing failure.

Thanks,
-Bill
I notice that AS7 is throwing the error below when running the load test. I've asked the jbossweb team whether there's a way to increase the max active sessions, but I don't see anything in the jbossweb schema. In the meantime, could you:

1) Attach the server.log file for the 1st JBoss instance when you start the load test. I would like to see if you are hitting the same error; it may be causing the failure.
2) Add the following line to the top of your JSP. This will stop Tomcat from creating a new session every time the JSP is hit:

<%@ page session="false" %>

2012/07/10 11:53:39,394 FATAL [org.apache.jasper.runtime.JspFactoryImpl] (http--127.0.250.129-8080-47) Exception initializing page context: java.lang.IllegalStateException: createSession: Too many active sessions
	at org.apache.catalina.session.StandardManager.createSession(StandardManager.java:293) [jbossweb-7.0.10.Final.jar:]
	at org.apache.catalina.connector.Request.doGetSession(Request.java:2665) [jbossweb-7.0.10.Final.jar:]
	at org.apache.catalina.connector.Request.getSession(Request.java:2375) [jbossweb-7.0.10.Final.jar:]
	at org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:841) [jbossweb-7.0.10.Final.jar:]
	at org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:852) [jbossweb-7.0.10.Final.jar:]
	at org.apache.jasper.runtime.PageContextImpl._initialize(PageContextImpl.java:146) [jbossweb-7.0.10.Final.jar:]
	at org.apache.jasper.runtime.PageContextImpl.initialize(PageContextImpl.java:124) [jbossweb-7.0.10.Final.jar:]
	at org.apache.jasper.runtime.JspFactoryImpl.internalGetPageContext(JspFactoryImpl.java:106) [jbossweb-7.0.10.Final.jar:]
	at org.apache.jasper.runtime.JspFactoryImpl.getPageContext(JspFactoryImpl.java:62) [jbossweb-7.0.10.Final.jar:]
	at org.apache.jsp.index_jsp._jspService(index_jsp.java:49)
	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) [jbossweb-7.0.10.Final.jar:]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:847) [jboss-servlet-api_3.0_spec-1.0.0.Final.jar:1.0.0.Final]
	at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:369) [jbossweb-7.0.10.Final.jar:]
	at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:326) [jbossweb-7.0.10.Final.jar:]
	at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:253) [jbossweb-7.0.10.Final.jar:]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:847) [jboss-servlet-api_3.0_spec-1.0.0.Final.jar:1.0.0.Final]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:329) [jbossweb-7.0.10.Final.jar:]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248) [jbossweb-7.0.10.Final.jar:]
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275) [jbossweb-7.0.10.Final.jar:]
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161) [jbossweb-7.0.10.Final.jar:]
	at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:154) [jboss-as-web-7.1.0.Final.jar:7.1.0.Final]
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155) [jbossweb-7.0.10.Final.jar:]
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) [jbossweb-7.0.10.Final.jar:]
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) [jbossweb-7.0.10.Final.jar:]
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:368) [jbossweb-7.0.10.Final.jar:]
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877) [jbossweb-7.0.10.Final.jar:]
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:671) [jbossweb-7.0.10.Final.jar:]
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:930) [jbossweb-7.0.10.Final.jar:]
	at java.lang.Thread.run(Thread.java:679) [rt.jar:1.6.0_24]
Because of the LOW_MEMORY setting, the number of sessions is limited to 1024 by default. To override this, add the following below </extensions> in standalone.xml:

<system-properties>
  <property name="org.apache.catalina.session.StandardManager.MAX_ACTIVE_SESSIONS" value="-1"/>
</system-properties>

This should resolve the active session limitation.
Not able to recreate any scale-up problems, but there are consistent problems with scale-down on the latest devenv:

I, [2012-07-10T16:12:17.469959 #16426]  INFO -- : GEAR_UP - capacity: 700.0% gear_count: 1 sessions: 70 up_thresh: 90.0%
I, [2012-07-10T16:13:01.933128 #23223]  INFO -- : Starting haproxy_ctld
D, [2012-07-10T16:13:01.962894 #23223] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 2 sessions: 1 up/remove_thresh: 90.0%/49.9% sec_left_til_remove: 120 gear_remove_thresh: 0/20
I, [2012-07-10T16:13:23.214079 #23223]  INFO -- : GEAR_UP - capacity: 110.0% gear_count: 2 sessions: 23 up_thresh: 90.0%
D, [2012-07-10T16:13:28.067165 #23223] DEBUG -- : GEAR_UP - add-gear: exit: 0 stdout: Already at the maximum number of gears allowed for either the app or your account.
D, [2012-07-10T16:13:28.216436 #23223] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 2 sessions: 0 up/remove_thresh: 90.0%/49.9% sec_left_til_remove: 115 gear_remove_thresh: 0/20
I, [2012-07-10T16:15:25.458342 #23223]  INFO -- : GEAR_DOWN - capacity: 0.0% gear_count: 2 sessions: 0 remove_thresh: 49.9%
D, [2012-07-10T16:16:38.017702 #23223] DEBUG -- : GEAR_DOWN - remove-gear: exit: 256 stdout:
D, [2012-07-10T16:16:38.030316 #23223] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 2 sessions: 0 up/remove_thresh: 90.0%/49.9% sec_left_til_remove: 0 gear_remove_thresh: 20/20
I, [2012-07-10T16:17:38.222018 #23223]  INFO -- : GEAR_DOWN - capacity: 0.0% gear_count: 2 sessions: 0 remove_thresh: 49.9%
Another data point: if there is no traffic and I scale up manually, the scale-down works fine. It looks like the scale-down only fails under load.
The scale-down issue only occurs when JBoss is configured to allow unlimited sessions. When the JSP has <%@ page session="false" %>, it behaves correctly. I am moving this to QA to confirm the original issues and will continue to look into the session/scale-down problem.
Created attachment 597505 [details]
development.log during auto scaling up

I've attached the development.log captured during auto scaling up. Besides, I also get some error messages in the app's scale_events.log:

D, [2012-07-11T04:36:22.949483 #5076] DEBUG -- : GEAR_INFO - capacity: 10.0% gear_count: 1 sessions: 1 up/remove_thresh: 90.0%/49.9% sec_left_til_remove: 84 gear_remove_thresh: 0/20
I, [2012-07-11T04:36:26.002934 #5076]  INFO -- : GEAR_UP - capacity: 380.0% gear_count: 1 sessions: 38 up_thresh: 90.0%
/usr/libexec/stickshift/cartridges/embedded/haproxy-1.4/info/bin/add-gear:61:in `initialize': undefined method `http' for #<RestClient::InternalServerError:0x7fd369f72be8> (NoMethodError)
	from /usr/libexec/stickshift/cartridges/embedded/haproxy-1.4/info/bin/add-gear:211:in `new'
	from /usr/libexec/stickshift/cartridges/embedded/haproxy-1.4/info/bin/add-gear:211
D, [2012-07-11T04:38:35.470239 #5076] DEBUG -- : GEAR_UP - add-gear: exit: 256 stdout:
D, [2012-07-11T04:38:35.503193 #5076] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 1 sessions: 0 up/remove_thresh: 90.0%/49.9% sec_left_til_remove: 0 gear_remove_thresh: 0/20
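The NoMethodError in that traceback means add-gear's error-handling path called a method (`http`) that the rescued exception object does not define, so the error handler itself crashed. A minimal Ruby illustration of that failure mode follows; StubRestClientError is a hypothetical stand-in class, not the real rest-client API.

```ruby
# Hypothetical stand-in for the exception raised on an HTTP 500.
# It exposes the server reply via #response; there is no #http method,
# so calling e.http would raise NoMethodError, as in the add-gear trace.
class StubRestClientError < StandardError
  def response
    "500 Internal Server Error"
  end
end

begin
  raise StubRestClientError
rescue StubRestClientError => e
  puts e.respond_to?(:http) # prints "false" -- e.http raises NoMethodError
  puts e.response           # prints "500 Internal Server Error"
end
```

The practical consequence is that the underlying server-side failure (the HTTP 500 from the broker) is masked by the crash in the handler, which is why scale_events.log only shows "add-gear: exit: 256" with empty stdout.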
I haven't been able to reproduce any error, but I'll keep trying. The error above is one reason why we should not be using Ruby.
Can you please provide the exact steps you used to recreate this problem? Were the session errors corrected and if so, how?
What environment/ami are you using to create this error?
Created attachment 597578 [details]
screenshot of haproxy-status
The error I see now is a little different from the original one. Instead of two gears going UP and DOWN, the default gear goes DOWN.

Version-Release number of selected component (if applicable):
devenv_1884

How reproducible:
Always

Steps to Reproduce:
1. Create a scalable jbossas-7 app
$ rhc app create -a app1 -t jbossas-7 -s -d
2. Create "index.jsp" under <git_repo>/src/main/webapp/ and remove "index.html".
$ rm <git_repo>/src/main/webapp/index.html
==> <git_repo>/src/main/webapp/index.jsp <==
<%@ page session="false" %>
<%@ page contentType="text/html" language="java" %>
<%@ page import="javax.naming.*" %>
<%@ page import="java.util.*" %>
<%
Map map = System.getenv();
out.print(map.get("OPENSHIFT_GEAR_DNS"));
%>
3. Add the following to <git_repo>/.openshift/config/standalone.xml:
<system-properties>
  <property name="org.apache.catalina.session.StandardManager.MAX_ACTIVE_SESSIONS" value="-1"/>
</system-properties>
4. Use the OpenShift shell to log in to the app, stop haproxy_ctld, and restart it with --debug so we get a detailed haproxy log.
$ ssh <app_uuid>@<app_host>
$ haproxy_ctld_daemon stop
$ haproxy_ctld --debug &
5. Use "ab" to generate traffic
$ ab -c 200 -n 100000 <app_url>
6. Tail "scale_events.log". Check <app_url>/haproxy-status/.
$ rhc app tail -a app1 -f app1/logs/scale_events.log -o '-n +1'

Actual results:
On my side, in <app_url>/haproxy-status/, after the total session number grows to 26251, the default gear goes DOWN. If I visit <app_url>/, I get the haproxy-status page. Here is the screenshot: https://bugzilla.redhat.com/attachment.cgi?id=597578

In scale_events.log, I get:
....
D, [2012-07-11T09:33:51.146001 #30386] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 1 sessions: 0 up/remove_thresh: 90.0%/49.9% sec_left_til_remove: 24 gear_remove_thresh: 0/20
I, [2012-07-11T09:33:54.160020 #30386]  INFO -- : GEAR_UP - capacity: 200.0% gear_count: 1 sessions: 20 up_thresh: 90.0%
D, [2012-07-11T09:36:15.816112 #30386] DEBUG -- : GEAR_UP - add-gear: exit: 256 stdout:
D, [2012-07-11T09:36:15.824645 #30386] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 1 sessions: 0 up/remove_thresh: 90.0%/49.9% sec_left_til_remove: 0 gear_remove_thresh: 0/20
....

Here's the development.log: http://pastebin.test.redhat.com/96516
Still unable to reproduce (latest AMI tested is ami-1b852a72). The development log shows a failure in parsing an mcollective response.
Have recreated on a medium instance ... stay tuned
The mcollective timeout causes the original JBoss instance to be deleted leaving only an haproxy gear with 0 backing JBoss instances.
http://pastebin.com/J53AsZyN
D, [2012-07-11T17:41:55.197647 #9381] DEBUG -- : GEAR_INFO - capacity: 10.0% gear_count: 1 sessions: 1 up/remove_thresh: 90.0%/49.9% sec_left_til_remove: 0 gear_remove_thresh: 0/20
I, [2012-07-11T17:41:58.237579 #9381]  INFO -- : GEAR_UP - capacity: 230.0% gear_count: 1 sessions: 23 up_thresh: 90.0%
D, [2012-07-11T17:44:17.593840 #9381] DEBUG -- : GEAR_UP - add-gear: exit: 256 stdout:
D, [2012-07-11T17:44:17.623614 #9381] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 1 sessions: 0 up/remove_thresh: 90.0%/49.9% sec_left_til_remove: 0 gear_remove_thresh: 0/20
Post-fix, you will still see exit 256 failures for scale-up at times on a medium instance. The app should keep on rolling, however.
Fixed with rev #06250abd651f4b4be3fe28cd09834ce4b18ba30e in the crankcase repo.
Verified on devenv-stage_223.