Bug 517495

Summary:	"could not get next bucket brigade" while a client is doing a PUT results in data loss
Product:	Red Hat Enterprise Linux 4	Reporter:	Stefan Walter <walteste>
Component:	httpd	Assignee:	Joe Orton <jorton>
Status:	CLOSED ERRATA	QA Contact:	BaseOS QE <qe-baseos-auto>
Severity:	urgent	Docs Contact:
Priority:	urgent
Version:	4.8	CC:	fnadge, ndevos, tao
Target Milestone:	rc	Keywords:	ZStream
Target Release:	---
Hardware:	All
OS:	Linux
URL:	https://issues.apache.org/bugzilla/show_bug.cgi?id=33098
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause Consequence: Fix: Result:	Story Points:	---
Clone Of:
Clones:	572910 572911		Environment:
Last Closed:	2011-02-16 13:58:23 UTC	Type:	---
Bug Blocks:	572910, 572932

Description Stefan Walter 2009-08-14 11:38:59 UTC

Description of problem:

We have experienced the bug described in the following apache bugzilla
report and lost an iCal calendar file this way (file was deleted
and 'Could not get next bucket brigade [500, #0]' was logged):

  https://issues.apache.org/bugzilla/show_bug.cgi?id=33098

A fix for this bug has only been found recently.  We request that it be
backported to the RHEL4 httpd.

Version-Release number of selected component (if applicable):

The latest httpd-2.0.52-41.ent.4 was built before a fix was known.

Comment 1 Joe Orton 2009-08-14 11:45:36 UTC

Thanks for the request.

Comment 6 Joe Orton 2009-10-07 10:12:04 UTC

Having looked at this in more detail:

1) the upstream change made to fix PR 33098 is simple and low-risk to backport to RHEL4.

2) I'm not sure that is going to solve the problem you're suffering from.

Could you explain exactly what problem you're seeing?  Is it that a file gets deleted after a PUT request fails on the client side (the case where the "could not get next bucket brigade" error is logged)?

The upstream change for PR 33098 is not going to fix that problem.  We can do one simple thing:

- change mod_dav such that it does not delete the file if there is an error reading the request body, for an existing file

but this is still going to leave cases where a partial PUT will leave a corrupted file on the DAV server, e.g. where the client does:

a) send PUT request with Content-Length: 20000
b) send 10000 bytes of request body, then disconnect

in that case, it is expected behaviour that you get a file on the server with 10000 bytes.  Clients which do not want that behaviour should be using a PUT then a MOVE - just as a Unix command will use open-temp-file/write/close/rename to achieve atomic updates.
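
As an illustration of the PUT-then-MOVE pattern from the client side, here is a minimal sketch. It is not taken from any particular calendar client. The function name, the ".upload" suffix and the host name are hypothetical, and `conn` is assumed to behave like Python's http.client.HTTPConnection:

```python
def put_then_move(conn, host, target_path, data, tmp_suffix=".upload"):
    """Upload to a temporary URL, then MOVE it over the real one.

    conn is assumed to offer request() and getresponse() like
    http.client.HTTPConnection; tmp_suffix is a made-up naming scheme.
    """
    tmp_path = target_path + tmp_suffix

    # Step 1: PUT the full body to the temporary resource.
    conn.request("PUT", tmp_path, body=data)
    resp = conn.getresponse()
    resp.read()
    if resp.status not in (200, 201, 204):
        raise RuntimeError("PUT to %s failed: %d" % (tmp_path, resp.status))

    # Step 2: only after a successful PUT, MOVE onto the real name.
    conn.request("MOVE", tmp_path, headers={
        "Destination": "http://%s%s" % (host, target_path),
        "Overwrite": "T",
    })
    resp = conn.getresponse()
    resp.read()
    if resp.status not in (201, 204):
        raise RuntimeError("MOVE to %s failed: %d" % (target_path, resp.status))
    return resp.status
```

If the connection drops during step 1, only the temporary resource is affected and the existing calendar file is never touched.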

Comment 7 Stefan Walter 2009-10-20 07:03:33 UTC

I agree with you that using PUT and MOVE (actually LOCK, PUT and MOVE) would
be the right way to do it. The problem is that iCal clients do not seem to do
that. I only find PUT requests logged on our server, no MOVEs. Our people use
different calendar tools though. It seems common to use the simplistic approach
and use PUTs to upload calendar files.

When I google for problems with corrupted iCal files I do not find much.
People do not seem to have problems. I would have expected that problems
were common because of mobile devices which are prone to communication problems.

Unfortunately the users could not tell us what they were doing when they
triggered the bug in PR 33098. We cannot tell if, after applying the fix for
PR 33098, the calendar file would be corrupted instead of deleted or not.

The HTTP and WebDAV specifications AFAIK do not say anything about what has to
happen on the back-end file system of a web server when a PUT is done. For iCal
files I personally think that mod_dav should have a configuration option to
implicitly do an open-temp-file/write/close/rename with the rename only happening
if all data is transferred. That would be a feature request for the mod_dav
developers though. :)

Comment 10 Joe Orton 2009-10-20 16:08:29 UTC

I've built a test package with one change to mod_dav:

- if a PUT fails for an existing file, the file will be left in place rather than being deleted.

which you can use for further testing if required - I've asked that this is passed on to you via support.
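
To make the effect of that change concrete, here is a small model in Python. It is not mod_dav code: `simulate_put` and its flag are hypothetical, and the model assumes the request body is written directly into the target file as it arrives.

```python
import os

def simulate_put(path, chunks, delete_existing_on_error=False):
    """Model of a non-atomic PUT: body chunks are written straight to path.

    chunks is any iterable of bytes; it may raise ConnectionError part way
    through to model a client that disconnects. Returns an HTTP status code.
    """
    existed = os.path.exists(path)
    try:
        with open(path, "wb") as f:       # truncates an existing file
            for chunk in chunks:
                f.write(chunk)
    except ConnectionError:
        # Old behaviour (flag set): the file is removed in every case.
        # Changed behaviour: only a newly created file is removed.
        if not existed or delete_existing_on_error:
            os.unlink(path)
        return 500
    return 204 if existed else 201
```

In this model the changed behaviour keeps an existing file after a failed PUT, but the file may hold only the bytes received so far, which is the partial-PUT case described in comment 6.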

A configuration option to support an "atomic" PUT might be possible, though it would have to be agreed upstream first.  I would take the view that mod_dav with the filesystem-backed repository should operate in the simplest fashion possible; the protocol does already allow PUT+MOVE to provide an "atomic" replacement, so it's not necessary to duplicate this in mod_dav.

It might be possible to work around the issue by using another, less simplistic, DAV repository backend (Subversion/mod_dav_svn with auto-versioning might work, though I'm not sure how well that works in the version of mod_dav supported in RHEL).

Another option might be to periodically validate+backup the calendar files, and if validation fails, restore from backup; though you might argue this is a workaround for broken software.
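
A rough sketch of what such a validate+backup job could look like, for example run from cron. The function names and the backup path are hypothetical, and the validity check is deliberately minimal: it only looks for the BEGIN:VCALENDAR and END:VCALENDAR lines that delimit an iCalendar object, so it catches truncated uploads but not other corruption.

```python
import os
import shutil

def looks_like_complete_ical(path):
    """Minimal check: first and last non-empty lines delimit a VCALENDAR."""
    try:
        with open(path, "rb") as f:
            lines = [ln.strip() for ln in f.read().splitlines() if ln.strip()]
    except FileNotFoundError:
        return False
    return (len(lines) >= 2
            and lines[0].upper() == b"BEGIN:VCALENDAR"
            and lines[-1].upper() == b"END:VCALENDAR")

def validate_or_restore(path, backup_path):
    """Back up a valid calendar file; restore the backup if it is broken."""
    if looks_like_complete_ical(path):
        shutil.copy2(path, backup_path)   # refresh the known-good copy
        return "backed-up"
    if os.path.exists(backup_path):
        shutil.copy2(backup_path, path)   # file missing or truncated
        return "restored"
    return "invalid-no-backup"
```

Anything changed since the last good backup is lost on restore, so the interval between runs bounds how much calendar data can go missing.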

A final option would be to use a simple CGI script as a PUT handler rather than mod_dav: http://httpd.apache.org/docs/2.2/mod/mod_actions.html though this answer may also be just as unsatisfactory to you!
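
For illustration, a CGI PUT handler along those lines might look like the sketch below. It assumes the script is registered with the mod_actions "Script" directive (for example "Script PUT /cgi-bin/put.py", where the script path is hypothetical) and that the target file path arrives in the standard PATH_TRANSLATED variable. The `handle_put` name and the ".put-" prefix are made up, and the sketch does no authentication or locking.

```python
#!/usr/bin/env python3
import os
import sys
import tempfile

def handle_put(environ, body_stream):
    """Write the body to a temp file; rename over the target only if the
    number of bytes received matches Content-Length. Returns a status line."""
    target = environ["PATH_TRANSLATED"]
    try:
        expected = int(environ.get("CONTENT_LENGTH", ""))
    except ValueError:
        return "411 Length Required"
    existed = os.path.exists(target)
    # The temp file lives in the target directory so the rename stays on one
    # file system. Note that mkstemp creates it with mode 0600.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(target) or ".",
                                    prefix=".put-")
    try:
        remaining = expected
        with os.fdopen(fd, "wb") as tmp:
            while remaining > 0:
                chunk = body_stream.read(min(remaining, 65536))
                if not chunk:            # client stopped sending early
                    break
                tmp.write(chunk)
                remaining -= len(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())
        if remaining:
            return "400 Bad Request"     # old file is left untouched
        os.replace(tmp_path, target)     # atomic replacement
        tmp_path = None
    finally:
        if tmp_path is not None:
            os.unlink(tmp_path)          # discard the incomplete upload
    return "200 OK" if existed else "201 Created"

if __name__ == "__main__" and os.environ.get("REQUEST_METHOD") == "PUT":
    status = handle_put(os.environ, sys.stdin.buffer)
    sys.stdout.write("Status: %s\r\nContent-Type: text/plain\r\n\r\n%s\n"
                     % (status, status))
```

With this approach a PUT that stops after 10000 of 20000 announced bytes leaves the previous file in place instead of a truncated one.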

Comment 18 Florian Nadge 2011-01-13 13:25:07 UTC

Please be so kind as to add a few keywords to the technical note of this
bugzilla entry using the following structure:

Cause:

Consequence:

Fix:

Result:

For more details on CCFR texts, see:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Comment 19 Florian Nadge 2011-01-13 13:25:07 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause

Consequence:

Fix:

Result:

Comment 20 errata-xmlrpc 2011-02-16 13:58:23 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0237.html