Created attachment 1219525
javacore from Tomcat while "hung"

Description of problem:

In an attempt to increase the number of systems able to provision from Satellite at any one time, we increased the value of StartServers in /etc/httpd/conf.d/zz-spacewalk-server.conf

From:

    <IfModule prefork.c>
    StartServers 8

To a modest:

    <IfModule prefork.c>
    StartServers 20

When kickstarting more than ~5 systems in parallel, the first several HTTP requests return, but requests then start to fail. From that point on, subsequent HTTP requests fail and the WebUI becomes unavailable. Tomcat appears to be "hung" (or at least waiting for something to happen), and restarting the Satellite services is the only way to restore service immediately. Restarting Apache alone also seems to restore service, but only after several minutes. During this time both memory and CPU utilization are at nominal values.

Changing the AJP connector maxThreads value and ProxyTimeout has no effect. Restoring the default StartServers value appears to work much better.

Version-Release number of selected component (if applicable):

Satellite 5.7
spacewalk-schema-2.3.2-27.el6sat.noarch
satellite-schema-5.7.0.24-1.el6sat.noarch

How reproducible:

Easily, on a fresh install of Satellite 5.7.
Steps to Reproduce:

Change the number of StartServers in /etc/httpd/conf.d/zz-spacewalk-server.conf:

    <IfModule prefork.c>
    StartServers 20

Restart Satellite:

    # rhn-satellite restart

Simulate kickstart traffic:

    [root@sat57 conf.d]# ab -n 1000 -c 20 http://<SATELLITE_FQDN>/ks/dist/org/1/file/does/not/exist
    This is ApacheBench, Version 2.3 <$Revision: 655654 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/

    Benchmarking sat57 (be patient)
    Completed 100 requests
    Completed 200 requests
    Completed 300 requests
    Completed 400 requests
    Completed 500 requests
    Completed 600 requests
    Completed 700 requests
    Completed 800 requests
    Completed 900 requests
    apr_poll: The timeout specified has expired (70007)
    Total of 994 requests completed

Observe that the WebUI is unavailable and that subsequent HTTP requests fail until the Satellite services are restarted.

Actual results:

The WebUI is unavailable and Tomcat appears "hung".

Expected results:

The WebUI is available and Tomcat does not hang.

Additional info:

Kickstarts use the following rewrite rule:

    RewriteRule ^/ks/dist(.*)$ /rhn/common/DownloadFile.do?url=/ks/dist$1

In my testing, at least, there appears to be a relationship between the number of ESTABLISHED AJP connections and the point at which Tomcat "hangs": somewhere around 20 connections, though it varies.

Will attach a javacore taken while Tomcat is "hung".
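For anyone trying to correlate the hang with the ESTABLISHED AJP connection count mentioned above, here is a rough sketch of one way to count those connections from `netstat -tan` output. It is not part of the original report. It assumes the AJP connector listens on Tomcat's default port 8009 (not confirmed for this install), and the function name `count_established` and the sample lines are made up for illustration, not captured from the affected system.

```python
AJP_PORT = 8009  # assumption: Tomcat's default AJP connector port

# Made-up sample in `netstat -tan` layout, for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address          Foreign Address         State
tcp   0      0      127.0.0.1:8009         0.0.0.0:*               LISTEN
tcp   0      0      127.0.0.1:8009         127.0.0.1:51234         ESTABLISHED
tcp   0      0      127.0.0.1:51234        127.0.0.1:8009          ESTABLISHED
tcp   0      0      ::ffff:127.0.0.1:8009  ::ffff:127.0.0.1:51240  ESTABLISHED
"""


def count_established(netstat_output, port=AJP_PORT):
    """Count ESTABLISHED TCP sockets whose local port equals `port`.

    Only the local (server) side is matched, so a loopback connection
    that shows up twice in the listing is counted once.
    """
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, state = fields[3], fields[5]
        if state == "ESTABLISHED" and local.rsplit(":", 1)[-1] == str(port):
            count += 1
    return count


if __name__ == "__main__":
    print(count_established(SAMPLE))
```

On a live system the same function could be fed the real output of `netstat -tan`, sampled every few seconds while `ab` is running, to see whether the count levels off near the point where requests stop returning.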