From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031007

Description of problem:
When HttpHTMLProvider.fetchHTML() is called for a URL which returns 0 bytes (an empty JSP, for example), the ReadConnection thread hangs for 30 seconds before returning an empty string. This has caused us some headaches when a customer ran a load test with empty templates and we had to find out why P2FS took 30s for each content item.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Put some log messages into fetchHTML() before and after rc.start() and rc.join().
2. Activate P2FS, define an empty template, and publish an Article.
3. Take a look at the log.

Additional info:
A fix for this can be found in changelist 39168.
on dev @40354
The patch uses getContentLength(), which looks at the Content-Length header. This header doesn't appear to be set for stock CMS (it is always returning zero). Carsten: any idea why?
In general, dynamically generated web pages do not have the Content-Length header set. Because the web server has no way of knowing how much content a script/app will produce, outputting a Content-Length header would require buffering all output until the response was complete, which rules out any streaming of content. The approach Apache & Resin both take is to instead use 'chunked encoding', whereby the response is split into chunks of some size. Anyway, the upshot is that nothing should rely on Content-Length headers being present.

Reading the comments with patch 40354, I'm not entirely clear why reading data from a 0 byte response would hang in the first place. The reader should be getting 'end of file'. Looking at the 'ReadConnection' class, I think we should remove the calls to 'readLine' and instead read the raw bytes. So instead of

    while ((line = input.readLine()) != null) {
        buffer.append(line).append('\n');
    }

we should do

    byte[] buf = new byte[4096];
    int ret;
    while ((ret = src.read(buf)) != -1) {
        // append only the ret bytes actually read (platform default charset)
        buffer.append(new String(buf, 0, ret));
    }

As an added benefit, this change will preserve line endings in the format in which they were originally generated, rather than hardcoding UNIX line ending semantics, which are obviously inappropriate if CMS is on Windows / Mac.
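For illustration, here is a minimal self-contained sketch of the raw-byte approach suggested above. It is not the actual ReadConnection code: the class name RawBodyReader, the method readFully, and the explicit charset parameter are all assumptions made for this example. It collects the bytes first and decodes them once at the end, so a multi-byte character that happens to straddle two reads is not corrupted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the CMS code base.
public class RawBodyReader {

    /**
     * Reads src until end of stream and decodes the collected bytes once.
     * An empty stream returns -1 on the first read, so the loop body never
     * runs and the result is the empty string. Content-Length is not consulted.
     */
    public static String readFully(InputStream src, String charsetName) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int ret;
        while ((ret = src.read(buf)) != -1) {
            // Copy only the bytes that were actually read in this pass.
            out.write(buf, 0, ret);
        }
        return out.toString(charsetName);
    }

    public static void main(String[] args) throws IOException {
        // A 0 byte body, standing in for the empty JSP response.
        String empty = readFully(new ByteArrayInputStream(new byte[0]), "UTF-8");
        System.out.println(empty.length()); // prints 0

        // Windows line endings pass through unchanged.
        byte[] crlf = "a\r\nb".getBytes("UTF-8");
        String body = readFully(new ByteArrayInputStream(crlf), "UTF-8");
        System.out.println(body.equals("a\r\nb")); // prints true
    }
}
```

Whether this avoids the 30 second hang depends on the underlying stream actually reporting end of file for a 0 byte response. The sketch only shows that the read loop itself does not need a Content-Length header or line-based reading.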
The ticket is about fetching a templated content item, and at least with Resin 2.1.4, I get a Content-Length header when a template has been defined. So Resin seems to gather all bytes before sending anything to the client when it serves a JSP. But you are right that we should not rely on the optional Content-Length header. The cause of readLine hanging for the timeout period on 0 byte responses is a bug in some JDK implementations, I think in URLConnection. Unfortunately, I can no longer find the Usenet postings which mention this.