Bug 1394012 - Tomcat hangs when number of StartServers increased
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Satellite 5
Classification: Red Hat
Component: Server
Version: 570
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Tomáš Kašpárek
QA Contact: Red Hat Satellite QA List
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-11-10 20:09 UTC by Neal Kim
Modified: 2021-03-11 14:48 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-10 21:01:09 UTC
Target Upstream Version:


Attachments
javacore from Tomcat while "hung" (978.24 KB, text/plain)
2016-11-10 20:09 UTC, Neal Kim

Description Neal Kim 2016-11-10 20:09:20 UTC
Created attachment 1219525
javacore from Tomcat while "hung"

Description of problem:

In an attempt to increase the number of systems able to provision from Satellite at any one time, we increased the value of StartServers in:

/etc/httpd/conf.d/zz-spacewalk-server.conf

From:

<IfModule prefork.c>
  StartServers         8
  
To a modest:

<IfModule prefork.c>
  StartServers         20
  
When attempting to kickstart more than ~5 systems in parallel, the first several HTTP requests return, but later requests eventually fail. From that point on, subsequent HTTP requests fail and the WebUI becomes unavailable.

Tomcat appears to be "hung" (or at least waiting for something to happen), and restarting the Satellite services is the only way to restore service immediately. Restarting Apache alone also seems to restore service, but only after several minutes.

During this time both memory and CPU utilization are at nominal values.

Modifying the AJP connector's maxThreads value and the ProxyTimeout setting has no effect.

Restoring the default number of StartServers apparently works much better.


Version-Release number of selected component (if applicable):

Satellite 5.7
spacewalk-schema-2.3.2-27.el6sat.noarch
satellite-schema-5.7.0.24-1.el6sat.noarch


How reproducible:

Easily, on a fresh install of Satellite 5.7.


Steps to Reproduce:

Change the number of StartServers in /etc/httpd/conf.d/zz-spacewalk-server.conf:

<IfModule prefork.c>
  StartServers         20
  
Restart Satellite:

# rhn-satellite restart

Simulate kickstart traffic:

[root@sat57 conf.d]# ab -n 1000 -c 20 http://<SATELLITE_FQDN>/ks/dist/org/1/file/does/not/exist
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking sat57 (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
apr_poll: The timeout specified has expired (70007)
Total of 994 requests completed

Observe that the WebUI is unavailable and subsequent HTTP requests fail until Satellite services are restarted.


Actual results:

WebUI is unavailable and Tomcat appears "hung".


Expected results:

WebUI remains available and Tomcat does not hang.


Additional info:

Kickstarts use the following rewrite rule:

RewriteRule ^/ks/dist(.*)$ /rhn/common/DownloadFile.do?url=/ks/dist$1
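
To illustrate why the ab request in the reproduction steps ends up at Tomcat, the following shell sketch imitates the substitution with sed. It only mimics the regular expression; it is not how mod_rewrite is implemented, and the function name rewrite_ks_path is hypothetical.

```shell
#!/bin/sh
# Imitate the RewriteRule above with sed (illustration only, not mod_rewrite itself).
# rewrite_ks_path is a made-up helper name; it takes a URL path as its only argument.
rewrite_ks_path() {
    # Paths starting with /ks/dist are mapped to DownloadFile.do with the
    # original path passed in the url= parameter; other paths pass through unchanged.
    printf '%s\n' "$1" | sed -E 's#^/ks/dist(.*)$#/rhn/common/DownloadFile.do?url=/ks/dist\1#'
}

# Example: the path requested by ab in the reproduction steps.
rewrite_ks_path /ks/dist/org/1/file/does/not/exist
```

Every such request is therefore handled by DownloadFile.do under /rhn, even when the requested file does not exist.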

At least in my testing, there appears to be some relation between the number of ESTABLISHED AJP connections and the point at which Tomcat "hangs". The hang occurs at roughly 20 connections, though the exact number varies.
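
One possible way to watch the ESTABLISHED AJP connection count while reproducing the problem is sketched below. It assumes Tomcat's AJP connector listens on port 8009 (the conventional default, not confirmed in this report) and that `netstat -ant` output is piped in; the function name count_established_ajp is hypothetical.

```shell
#!/bin/sh
# Count ESTABLISHED TCP connections whose local port is the AJP port,
# reading `netstat -ant` style output from stdin.
# Assumption: AJP port 8009 unless another port is given as the first argument.
count_established_ajp() {
    port="${1:-8009}"
    awk -v port="$port" '
        # Columns: proto recv-q send-q local-address foreign-address state
        $1 ~ /^tcp/ && $6 == "ESTABLISHED" {
            n = split($4, local, ":")      # last field after ":" is the port
            if (local[n] == port) count++  # count the Tomcat side only, so a
                                           # loopback connection is not counted twice
        }
        END { print count + 0 }
    '
}

# Usage on the Satellite host, for example once per second during the ab run:
#   netstat -ant | count_established_ajp 8009
```

Only the listening side of each connection is counted because Apache and Tomcat share a host here, so each loopback connection would otherwise show up twice.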

Will attach javacore while Tomcat is "hung".

