Red Hat Bugzilla – Bug 137194
NFS short writes cause file corruption with NFS O_DIRECT
Last modified: 2007-11-30 17:07:04 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2)
Description of problem:
suppose an application wants to write 96KB directly. the client turns
this into 3 32KB on-the-wire writes, A, B, and C. NFS servers do not
have to return an error if they write only some of the requested
bytes: this is called a "short write."
Case 1: Normal NFS direct writes
Case 2: today: NFS server returns a short write
Case 3: possible: NFS server returns a short write
Case 4: preferred: NFS server returns a short write
the NFS cached path has some recovery (or at least reporting)
capability for short writes, the direct path does not.
see below for analysis.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
you'd have to rig a server to return an occasional short write, and
then run an application on your client that performed direct writes
and verified the contents of the file afterwards (OraSim, for example).
Actual Results: today's client behaves like case 2. this can result
in data being written to the wrong offset in the file.
Expected Results: case 3 is a possible fix, but i argue that case 4
gives the most flexibility to the application for detection of and
recovery from a short write, without rewriting the NFS direct write
path to retry a short write.
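The behavior argued for above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the attached patch: `ShortWriteServer` and `direct_write` are hypothetical names, the server is an in-memory stand-in, and "case 4" is assumed to mean that the client stops at the first short reply and returns the short byte count to the caller.

```python
CHUNK = 32 * 1024  # on-the-wire write size used in the description


class ShortWriteServer:
    """Hypothetical in-memory server that shortens one chosen request."""

    def __init__(self, short_on_call, limit):
        self.data = bytearray()
        self.calls = 0
        self.short_on_call = short_on_call  # which request to shorten (1-based)
        self.limit = limit                  # bytes accepted for that request

    def write(self, offset, payload):
        self.calls += 1
        if self.calls == self.short_on_call:
            payload = payload[:self.limit]
        end = offset + len(payload)
        if len(self.data) < end:
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[offset:end] = payload
        return len(payload)  # may be less than requested, with no error


def direct_write(server, offset, buf):
    """Send buf in CHUNK-sized requests and stop at the first short reply.

    Returns the number of contiguous bytes the server accepted, so the
    caller knows exactly where to resume.
    """
    done = 0
    while done < len(buf):
        chunk = buf[done:done + CHUNK]
        n = server.write(offset + done, chunk)
        done += n
        if n < len(chunk):
            break  # short write: report it rather than send later chunks
    return done
```

With a 96KB buffer and a server that accepts only 4KB of request B, `direct_write` returns 36864 and request C is never sent. The application can then resubmit the remainder starting at that offset.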
a similar patch is destined for 2.6. (http://client.linux-nfs.org/).
i will attach a patch to fix this in the RHEL 3.0 update 3 NFS client.
Created attachment 105801
potential fix for this problem (diff against 2.4.21-20.EL)
Has this type of corruption been reported by any customers?
no customer reports, the bug was found by code inspection.
Created attachment 115302
The original patch did not compile against a current RHEL3 kernel, so
I wanted to run this by you to ensure it's correct. In my testing
the patch does not seem to cause any regressions but, unfortunately,
I was not able to reproduce the corruption either.
i'm not sure why there is an "args.request" in the patch i attached. the
2.4.21-20.EL source i have here uses "args.count" just as your new patch does.
This patch does not look right to me. It is valid for NFS servers to write
less data than was requested. There is no error implied when an NFS server
does so because it may have done so for its own reasons.
Of course, the NFS server may have written less data than requested because
it did encounter an out-of-space condition or an exceeded quota limit. The client
can discover this by generating another request to write the remainder of the
data. If a real error existed which prevented the server from writing the
full data the first time, then an error will be returned on this additional
request.
An NFS server is responsible for either storing the data that it has indicated
that it has or returning an error to indicate why it could not. The client is
responsible for storing all of the data requested by an application or
returning an error indicating why it could not. A short return to the
write(2) system call is generally interpreted by applications as an error
having occurred. In this case, if the NFS client returns short, when no
error has actually occurred, then the application may misbehave needlessly.
The NFS client should implement proper support to handle short write returns.
It should not matter whether the WRITE requests are being generated from the
data cache or from an O_DIRECT request.
i agree that a server is allowed to return a short write, and that it is usually
not an error. given the constraints on resources and ABI compatibility,
however, the patch i have provided is only damage control for RHEL 3, and
nothing more. if Red Hat has the resources to implement complete and
ABI-compatible support for handling short reads and writes in both the cached
and direct I/O paths in RHEL 3, then by all means, have at it.
as 2.6 kernels are evolving, the NFS client in those kernels will eventually
have complete support for handling short reads and writes in both the cached and
direct I/O paths.
i do not agree, however, that a short return from write(2) will cause "needless
application misbehavior". if an app can't handle a short write, then it is
poorly written and should be fixed. short writes will happen no matter what,
and applications must be able to recover properly.
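For reference, the application-side recovery mentioned here is usually a loop around write(2). The sketch below uses Python's `os.write`, which returns the number of bytes actually written. `write_all` is a hypothetical helper name, and the zero-progress check is an added safeguard rather than something the thread prescribes.

```python
import errno
import os


def write_all(fd, data):
    """Call os.write until every byte of data has been written.

    Returns the total number of bytes written. Errors reported by the
    kernel propagate as OSError.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        written = os.write(fd, view[total:])  # may be a short write
        if written == 0:
            raise OSError(errno.EIO, "write made no progress")
        total += written
    return total
```

An application that uses such a loop treats a short return as "resume from here". An error only appears if a later call actually fails.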
There has not been one reported problem of this nature, and the
proposed patch does introduce a functionality change (i.e. short
writes), so I am very concerned that a fix of this type (or any type,
for that matter) has a high potential for introducing a regression.
Therefore, I'm closing this bug as WONTFIX.