113038 – HttpHTMLProvider.fetchHTML() and 0 Byte Content

Summary: HttpHTMLProvider.fetchHTML() and 0 Byte Content

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise CMS
Classification:	Retired
Component:	other
Sub Component:
Version:	6.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	low
Target Milestone:	---
Assignee:	Richard Li
QA Contact:	Jon Orris
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	108447

Reported:	2004-01-07 17:53 UTC by Carsten Clasohm
Modified:	2007-04-18 17:01 UTC (History)
CC List:	0 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2005-05-09 14:40:24 UTC
Embargoed:

Description Carsten Clasohm 2004-01-07 17:53:30 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031007

Description of problem:
When HttpHTMLProvider.fetchHTML() is called for a URL which returns 0
bytes (an empty JSP, for example), the ReadConnection thread hangs for
30 seconds before returning an empty string. This has caused us some
headaches when a customer ran a load test with empty templates and we
had to find out why P2FS took 30s for each content item.


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Put some log messages into fetchHTML() before and after rc.start()
and rc.join().
2. Activate P2FS, define an empty template, and publish an Article.
3. Take a look at the log.


Additional info:

A fix for this can be found in changelist 39168.

Comment 1 Richard Li 2004-02-13 14:24:18 UTC

on dev @40354

Comment 2 Richard Li 2004-02-23 21:27:58 UTC

The patch uses getContentLength(), which looks at the Content-Length
header. This header doesn't appear to be set for stock CMS (the call
always returns zero). Carsten: any idea why?

Comment 3 Daniel Berrangé 2004-02-24 10:07:54 UTC

In general, dynamically generated web pages do not have the
Content-Length header set. Because the web server has no way of knowing
how much content a script/app will produce, outputting a Content-Length
header would require buffering all output until the response was
complete, which would prevent any streaming of content. The approach
Apache & Resin both take is to use 'chunked encoding' instead, whereby
the response is split into chunks of some size.

Anyway, the upshot is that nothing should rely on Content-Length
headers being present.

Reading the comments with patch 40354 - I'm not entirely clear why the
reading of data from a 0 byte response would hang in the first place.
The reader should be getting 'end of file'. Looking at the
'ReadConnection' class, I think we should remove the calls to
'readLine' and instead read the raw bytes. So instead of

                while ((line = input.readLine()) != null) {
                    buffer.append(line).append('\n');
                }


We should do

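        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int ret;

        // Copy only the ret bytes actually read on each pass.
        while ((ret = src.read(buf)) != -1) {
            out.write(buf, 0, ret);
        }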

As an added benefit, this change will preserve line endings in the
format in which they were originally generated, rather than hardcoding
UNIX line ending semantics, which are obviously inappropriate if CMS
is running on Windows or Mac.
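
For illustration, the raw-byte approach could be sketched as a
standalone helper like the one below. The class name RawStreamReader,
the method readAll() and the explicit charset parameter are assumptions
made for this example and are not taken from the CMS code base. The
sketch only shows that an empty stream yields an empty string and that
line endings pass through unchanged; it does not show whether the
30-second delay goes away, since that depends on what the underlying
connection does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the CMS code base.
final class RawStreamReader {

    // Reads src until end of stream and decodes the bytes once at the end,
    // so multi-byte characters split across two reads are decoded correctly.
    static String readAll(InputStream src, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int ret;
        // read() returns -1 at end of stream; an empty stream skips the loop.
        while ((ret = src.read(buf)) != -1) {
            out.write(buf, 0, ret); // copy only the bytes actually read
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        Charset utf8 = StandardCharsets.UTF_8;
        String empty = readAll(new ByteArrayInputStream(new byte[0]), utf8);
        String crlf = readAll(new ByteArrayInputStream("a\r\nb".getBytes(utf8)), utf8);
        System.out.println("empty body gives empty string: " + empty.isEmpty());
        System.out.println("CRLF preserved: " + crlf.equals("a\r\nb"));
    }
}
```

The charset is passed in explicitly because the platform default may
differ from the encoding of the fetched page.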

Comment 4 Carsten Clasohm 2004-02-24 14:55:11 UTC

The ticket is about fetching a templated content item, and at least
with Resin 2.1.4, I get a Content-Length header when a template has
been defined. So Resin seems to gather all bytes before sending
anything to the client when it serves a JSP. But you are right in that
we should not rely on the optional Content-Length header.

I think the cause of readLine hanging for the timeout period on 0
byte responses is a bug in some JDK implementations, probably in
URLConnection. Unfortunately, I can no longer find the Usenet postings
that mention this.
