Bug 137194 - NFS short writes cause file corruption with NFS O_DIRECT
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Brian Brock
URL: http://client.linux-nfs.org/
Whiteboard:
Depends On:
Blocks:
Reported: 2004-10-26 16:53 UTC by Chuck Lever
Modified: 2007-11-30 22:07 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-01-19 18:30:21 UTC
Target Upstream Version:
Embargoed:


Attachments
potential fix for this problem (diff against 2.4.21-20.EL) (396 bytes, patch)
2004-10-26 16:58 UTC, Chuck Lever
updated patch (312 bytes, patch)
2005-06-10 18:18 UTC, Steve Dickson

Description Chuck Lever 2004-10-26 16:53:22 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2)
Gecko/20040803

Description of problem:
suppose an application wants to write 96KB directly.  the client turns
this into 3 32KB on-the-wire writes, A, B, and C.  NFS servers do not
have to return an error if they write only some of the requested
bytes: this is called a "short write."

Case 1:  Normal NFS direct writes

|----A----||----B----||----C----|

Case 2:  today: NFS server returns a short write

|----A----||--B--||----C----|hole

Case 3:  possible: NFS server returns a short write

|----A----||--B--|hole|----C----|

Case 4:  preferred: NFS server returns a short write

|----A----||--B--||----hole-----|

the NFS cached path has some recovery (or at least reporting)
capability for short writes, the direct path does not.

see below for analysis.

Version-Release number of selected component (if applicable):
kernel-2.4.21-20.EL

How reproducible:
Didn't try

Steps to Reproduce:
you'd have to rig a server to return an occasional short write, and
then run an application on your client that performed direct writes
and verified the contents of the file afterwards (OraSim, for example).
    

Actual Results:  today's client behaves like case 2.  this can result
in data being written to the wrong offset in the file.  

Expected Results:  case 3 is a possible fix, but i argue that case 4
gives the most flexibility to the application for detection of and
recovery from a short write, without rewriting the NFS direct write
path to retry a short write.
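
The four cases can be illustrated with a small toy model. This is only a sketch, not the kernel code: the function names, the 20KB size of the short write B, and the assumption that case 2 arises because each write is placed where the previous one actually ended are all hypothetical choices made for illustration.

```python
KB = 1024


def place_writes(chunk, counts, policy):
    """Return (file_offset, intended_offset, length) for each wire write issued.

    chunk  -- size of each on-the-wire write the client sends
    counts -- bytes the (simulated) server reports writing for each request
    policy -- "case2", "case3" or "case4", matching the diagrams above
    """
    if policy not in ("case2", "case3", "case4"):
        raise ValueError(policy)
    extents = []
    next_off = 0
    for i, count in enumerate(counts):
        intended = i * chunk
        # Case 2 (assumed model of the bug): the next write starts where the
        # previous one actually ended, not at its intended offset.
        file_off = next_off if policy == "case2" else intended
        extents.append((file_off, intended, count))
        next_off = file_off + count
        # Case 4: stop issuing writes after the first short one.
        if policy == "case4" and count < chunk:
            break
    return extents


def misplaced(extents):
    """Extents whose data landed at an offset other than the intended one."""
    return [e for e in extents if e[0] != e[1]]


if __name__ == "__main__":
    counts = [32 * KB, 20 * KB, 32 * KB]  # B is short (20KB is an assumption)
    for policy in ("case2", "case3", "case4"):
        extents = place_writes(32 * KB, counts, policy)
        print(policy, extents, "misplaced:", misplaced(extents))
```

In this model only case 2 puts data at the wrong offset. Case 3 leaves a hole in the middle of the file, and case 4 leaves everything written so far contiguous and correctly placed, so the application can be told exactly how many bytes landed.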

Additional info:

a similar patch is destined for 2.6 (http://client.linux-nfs.org/).
i will attach a patch to fix this in the RHEL 3.0 update 3 NFS client.

Comment 1 Chuck Lever 2004-10-26 16:58:04 UTC
Created attachment 105801
potential fix for this problem (diff against 2.4.21-20.EL)

Comment 5 Steve Dickson 2005-06-09 11:49:01 UTC
Hey Chuck,

Has this type of corruption been reported by any customers?

Comment 6 Chuck Lever 2005-06-09 15:07:11 UTC
no customer reports, the bug was found by code inspection.

Comment 7 Steve Dickson 2005-06-10 18:18:09 UTC
Created attachment 115302
updated patch

Chuck,

The original patch did not compile in a current RHEL3 kernel, so
I wanted to run this by you to ensure it's correct. In my testing
the patch does not seem to cause any regressions but, unfortunately,
I was not able to reproduce the corruption either.

Comment 9 Chuck Lever 2005-06-10 20:25:34 UTC
i'm not sure why there is an "args.request" in the patch i attached.  the
2.4.21-20.EL source i have here uses "args.count" just as your new patch does.

looks good.

Comment 10 Peter Staubach 2005-06-14 12:37:15 UTC
This patch does not look right to me.  It is valid for NFS servers to write
less data than was requested.  There is no error implied when an NFS server
does so because it may have done so for its own reasons.

Of course, the NFS server may have written less data than requested because
it did encounter some sort of out of space or exceeded quota limit.  The client
can discover this by generating another request to write the remainder of the
data.  If a real error existed which prevented the server from writing the
full data the first time, then an error will be returned on this additional
request.
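
The retry behaviour described above can be sketched as follows. This is a toy simulation under stated assumptions, not NFS client code: `ToyServer`, its `capacity` and `max_per_call` parameters, and `write_fully` are hypothetical names invented for the example.

```python
import errno


class ToyServer:
    """Hypothetical stand-in for an NFS server with a fixed amount of space."""

    def __init__(self, capacity, max_per_call=None):
        self.capacity = capacity
        self.max_per_call = max_per_call  # lets the server write short at will
        self.data = bytearray(capacity)

    def write(self, offset, payload):
        room = self.capacity - offset
        if room <= 0:
            # A real error: nothing at all can be stored at this offset.
            raise OSError(errno.ENOSPC, "No space left on device")
        count = min(len(payload), room)
        if self.max_per_call is not None:
            count = min(count, self.max_per_call)
        self.data[offset:offset + count] = payload[:count]
        return count  # may be less than len(payload): a short write


def write_fully(server, offset, payload):
    """Keep sending the remainder until it is all stored or the server errors."""
    done = 0
    while done < len(payload):
        count = server.write(offset + done, payload[done:])
        if count == 0:
            raise OSError(errno.EIO, "server made no progress")
        done += count
    return done
```

With a server that is merely writing short for its own reasons, `write_fully` completes after a few requests. With a server that has run out of space, the first request returns short and the follow-up request for the remainder raises `ENOSPC`, which is the error the client can then report.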

An NFS server is responsible for either storing the data that it has indicated
that it has or returning an error to indicate why it could not.  The client is
responsible for storing all of the data requested by an application or
returning an error indicating why it could not.  A short return to the
write(2) system call is generally interpreted by applications as an error
having occurred.  In this case, if the NFS client returns short, when no
error has actually occurred, then the application may misbehave needlessly.

The NFS client should implement proper support to handle short write returns.
It should not matter whether the WRITE requests are being generated from the
data cache or from an O_DIRECT request.

Comment 11 Chuck Lever 2005-06-14 16:40:41 UTC
hi peter-

i agree that a server is allowed to return a short write, and that it is usually
not an error.  given the constraints on resources and ABI compatibility,
however, the patch i have provided is only damage control for RHEL 3, and
nothing more.  if Red Hat has the resources to implement complete and
ABI-compatible support for handling short reads and writes in both the cached
and direct I/O paths in RHEL 3, then by all means, have at it.

as 2.6 kernels are evolving, the NFS client in those kernels will eventually
have complete support for handling short reads and writes in both the cached and
direct I/O paths.

i do not agree, however, that a short return from write(2) will cause "needless
application misbehavior".  if an app can't handle a short write, then it is
poorly written and should be fixed.  short writes will happen no matter what,
and applications must be able to recover properly.
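
For illustration, an application-level recovery loop might look like the sketch below. It assumes an ordinary file descriptor opened without O_DIRECT (direct I/O would add buffer and offset alignment requirements that are not handled here), and `write_all` is a hypothetical helper name.

```python
import os


def write_all(fd, data):
    """Write all of data to fd, resuming after any short write.

    os.write() returns the number of bytes actually written, which may be
    fewer than requested. A real error is raised as OSError.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        written = os.write(fd, view[total:])
        total += written
    return total
```

An application that cannot retry can at least compare the return value of a single write call against the requested length and treat any difference as something to report.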

Comment 22 Steve Dickson 2006-01-19 18:30:21 UTC
Due to the fact that there has not been a single reported
problem of this nature, and that the proposed patch does
introduce a functionality change (i.e. short writes),
I am very concerned that a fix of this type (or any type,
for that matter) has a high potential of introducing
a regression. Therefore, I'm closing this bug as WONTFIX.

