A denial of service flaw has been made public that can affect Tomcat 5.5 versions prior to 5.5.12 when serving directories containing large numbers of files. The details of this flaw are sketchy, as the reporter did not give much information in the public report:

http://secunia.com/advisories/17416/

However, there is a message on the tomcat-user list that gives a little more background on how to trigger the flaw:

http://marc.theaimsgroup.com/?l=tomcat-user&m=112980869008084&w=2

There don't seem to be any other details, and the list of affected versions should not be trusted without verification. One write-up of this flaw mentions it may affect Windows only, but I'm not sure how they came to that conclusion. We'll rate this as moderate for now, although it could be low or not an issue at all depending on your analysis of how this affects RHAPS2.
I've confirmed that this issue is reproducible on Linux with Tomcat 5.5.11. Running "ab -c 100 -n 10000 http://*server*/test" against a Tomcat instance, where test is a directory containing 100 files, will cause the Tomcat instance to keel over fairly quickly. With my test case there is no significant impact to the machine other than permanently hosing the Tomcat instance. The same test runs fine against Tomcat 5.5.12.
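For anyone who doesn't have ab handy, hammering the listing URL from a thread pool should produce the same effect. The following is only a rough sketch of that idea (the URL, concurrency, and request count are placeholders mirroring the ab invocation above, not anything official):

    # Rough stand-in for "ab -c 100 -n 10000 http://<server>/test".
    # The URL is a placeholder; point it at any Tomcat directory listing.
    import concurrent.futures
    import urllib.request

    URL = "http://localhost:8080/test/"   # directory containing ~100 files
    CONCURRENCY = 100
    TOTAL_REQUESTS = 10000

    def fetch(_):
        try:
            with urllib.request.urlopen(URL, timeout=30) as resp:
                resp.read()
            return True
        except Exception:
            return False

    with concurrent.futures.ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        results = list(pool.map(fetch, range(TOTAL_REQUESTS)))
    print("succeeded:", sum(results), "failed:", results.count(False))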
I've now successfully reproduced the problem on 5.5.9, 5.5.11, and 5.5.12. I believe the problem is related to the strategy used by the default servlet to cache filesystem access. This strategy is for the most part unchanged between 5.5.9 and 5.5.12, so I believe the problem is present in all intermediate versions as well.

The best way I've found to demonstrate the problem is to run a series of benchmarks against a directory listing with increasing concurrency levels but a constant number of total requests:

  c  |       total        |     mean
-----+--------------------+---------------
   1 | 12.250582 seconds  |   122.506 ms
  10 |  8.476202 seconds  |   847.620 ms
  20 |  7.917447 seconds  |  1583.489 ms
  30 |  7.818699 seconds  |  2345.610 ms
  40 |  9.72974 seconds   |  3629.190 ms
  50 |  9.952440 seconds  |  4976.220 ms
  60 | 14.325583 seconds  |  8595.350 ms
  70 | 58.615614 seconds  | 41030.931 ms
  80 | --                 | --
  90 | --                 | --
 100 | --                 | --

  c     = number of concurrent requests (see the -c flag for ab)
  total = total time for the benchmark to complete
  mean  = average time until each request completes
  --    = benchmark timed out

In all cases the total number of requests is 100 and the test directory being listed contains 1000 empty files. As you can see, the total time for the benchmark to complete increases rapidly as the number of concurrent requests goes from 50 to 70.

My first guess was that this is somehow related to the cache used for filesystem access, since the drop-off in performance happens right around the point where the time to service a request exceeds the default 5 second timeout for the cache. To test this theory I reran the same set of benchmarks after adjusting the default timeout to 10 minutes, which yielded the following results:

  c  |       total        |     mean
-----+--------------------+---------------
   1 | 13.869870 seconds  |   138.699 ms
  10 |  8.692605 seconds  |   869.261 ms
  20 |  8.722117 seconds  |  1744.423 ms
  30 |  8.912768 seconds  |  2673.831 ms
  40 |  8.511701 seconds  |  3404.680 ms
  50 |  9.987676 seconds  |  4993.838 ms
  60 | 11.397467 seconds  |  6838.480 ms
  70 | 10.919906 seconds  |  7643.934 ms
  80 | 19.230226 seconds  | 15384.181 ms
  90 | 25.684044 seconds  | 23115.639 ms
 100 | 26.872654 seconds  | 26872.654 ms

While this does seem to confirm that the problem is related to the caching strategy, I am unsatisfied with it as a complete explanation: a simple microbenchmark in Python or Java (see the sketch below) shows that performing the given number of file accesses without *any* caching, either in serial or in parallel, takes significantly less time than the server requires to respond, or to time out and recover, during the above benchmarks.

Also, although it seems to happen rarely under my test conditions, the server does occasionally become permanently unresponsive. This seems to occur when, for whatever reason, the server runs out of memory and can no longer service incoming requests.

Given these results, I suspect that in addition to a difficult-to-trigger memory leak there are likely some significant concurrency bottlenecks either in the DefaultServlet implementation or in Tomcat itself. Immediate workarounds would be to change the default Tomcat install to disable directory listings and/or to reduce the number of threads used to service requests. Properly fixing the problem will require detailed profiling of both Tomcat and the default servlet implementation.
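For reference, the kind of microbenchmark I mean looks roughly like the sketch below. The directory path and the number of passes are placeholders (70 chosen only to mirror the concurrency level where the server starts to fall over); it simply lists the test directory and stats every entry, with no caching at all, first serially and then from a thread pool:

    # Time raw filesystem access with no caching, serially and in parallel.
    # TEST_DIR is a placeholder for the 1000-empty-file test directory.
    import os
    import time
    import concurrent.futures

    TEST_DIR = "/path/to/test"   # placeholder
    PASSES = 70                  # roughly the ab concurrency where things degrade

    def stat_all(directory):
        # One pass: list the directory and stat every entry, which is
        # roughly the filesystem work needed to build a directory listing.
        for name in os.listdir(directory):
            os.stat(os.path.join(directory, name))

    def timed(label, fn):
        start = time.time()
        fn()
        print("%s: %.3f seconds" % (label, time.time() - start))

    # Serial: PASSES full passes over the directory, one after another.
    timed("serial", lambda: [stat_all(TEST_DIR) for _ in range(PASSES)])

    # Parallel: the same number of passes issued concurrently.
    def parallel():
        with concurrent.futures.ThreadPoolExecutor(max_workers=PASSES) as pool:
            list(pool.map(stat_all, [TEST_DIR] * PASSES))
    timed("parallel", parallel)

Either way, the total comes out far below the per-request times in the tables above, which is what makes me think the bottleneck is not the raw filesystem work itself.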
An advisory has been issued which should address the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0161.html