Bug 570061 - nfs client cached invalidate data.
Summary: nfs client cached invalidate data.
Keywords:
Status: CLOSED DUPLICATE of bug 511170
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jeff Layton
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-03-03 06:19 UTC by xiaowei.hu
Modified: 2014-06-18 07:39 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-23 10:29:12 UTC
Target Upstream Version:
Embargoed:


Attachments
make nfs to invalidate page cache after the asynchronous directIO write. (4.97 KB, patch)
2010-03-03 06:19 UTC, xiaowei.hu

Description xiaowei.hu 2010-03-03 06:19:29 UTC
Created attachment 397478
make nfs to invalidate page cache after the asynchronous directIO write.

Description of problem:
We found this problem while testing dm-nfs, which is not included in EL5. According to code analysis, EL5 has this problem too, but we don't have a test case to reproduce it.

Here is an explanation and a scenario that exposes this problem (as Chuck said):

The current nfs client code in EL5 works fine for synchronous direct I/O, since the cache invalidation is always done after the write has completed.

For asynchronous I/O, the write system call can return before the writes have been sent to the server.  This leaves a window in which the client can cache data after the cache invalidation, but before the direct writes are complete.

So, this is a problem, albeit a pretty subtle one, for existing applications that use asynchronous O_DIRECT writes concurrently with other accessors that use cached reads.
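
To illustrate the window described above, here is a toy Python model. It is only a sketch of the ordering issue: it is not kernel code and not the attached patch, and the names ToyClient, invalidate_at_submit, invalidate_at_completion and run are made up for this illustration. It assumes a single file value on the server and a single cached copy on the client.

```python
class ToyClient:
    """Toy model of one NFS client: server data plus a local page cache."""

    def __init__(self, initial):
        self.server = initial     # data as stored on the NFS server
        self.page_cache = None    # None means nothing is cached

    def cached_read(self):
        # Buffered read: fill the cache from the server on a miss,
        # then serve the data from the cache.
        if self.page_cache is None:
            self.page_cache = self.server
        return self.page_cache

    def invalidate(self):
        # Drop the cached copy so the next buffered read goes to the server.
        self.page_cache = None

    def direct_write_complete(self, data):
        # The direct write reaches the server, bypassing the page cache.
        self.server = data


def invalidate_at_submit(client, data, concurrent_read):
    """Problematic ordering: invalidate first, the write completes later."""
    client.invalidate()
    concurrent_read()                    # cached reader runs inside the window
    client.direct_write_complete(data)


def invalidate_at_completion(client, data, concurrent_read):
    """Ordering modeled for the fix: invalidate after the write completes."""
    concurrent_read()                    # same concurrent cached reader
    client.direct_write_complete(data)
    client.invalidate()


def run(ordering):
    # Start with "old" on the server, write "new" with the given ordering,
    # then report what a later buffered read on the same client sees.
    client = ToyClient("old")
    ordering(client, "new", client.cached_read)
    return client.cached_read()


if __name__ == "__main__":
    print("invalidate at submit:    ", run(invalidate_at_submit))
    print("invalidate at completion:", run(invalidate_at_completion))
```

In this model, the first ordering leaves the pre-write data in the cache for later buffered reads, while the second ordering makes the next buffered read fetch the newly written data from the server.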

Here is a patch, backported from upstream by Chuck Lever, that avoids this problem.



Comment 1 Chuck Lever 2010-03-03 20:48:32 UTC
I should add: the symptom of this problem is that when applications update an NFS file with asynchronous direct I/O, the page cache on the same client can sometimes retain stale data.  This would be a problem, for example, if a database accessed its data files via asynchronous direct I/O, but a backup tool on the same client used an unadorned read(2) to read the files.

The stale cache is rather fleeting.  The next O_DIRECT write to a file should purge any stale data (and naturally could introduce other stale data in the same way).

And to clarify the test case situation: we do have a reproducer, but it requires code that we add to the OEL5 kernel.  We don't have a stand-alone test case that is easy to give you.

The additional code introduces an alternate forward path in the NFS direct I/O engine, but both this new path and the existing path (via write(O_DIRECT)) share the same NFS direct write completion routines, where this bug is fixed.

I asked Xiaowei to open this bugzilla to document the defect in RHEL 5.

Comment 2 Chuck Lever 2010-03-19 20:31:51 UTC
It appears that the upstream commit in the attached patch may already be applied in update 5, as part of a fix for RH 511170.

Comment 3 Jeff Layton 2010-03-22 11:52:39 UTC
Agreed, it looks like that patch is in place in the latest RHEL5 kernels. Is there anything more we need to do for this bug?

Comment 4 Chuck Lever 2010-03-22 17:01:31 UTC
It would be reasonable for Xiaowei to test the latest kernel (with the RH version of the patch), before closing this bug.

Comment 5 xiaowei.hu 2010-03-23 01:20:54 UTC
I will check and test the 5.5 kernel and reply later.

Comment 6 xiaowei.hu 2010-03-23 02:08:19 UTC
Verified: the patch included in 5.5 is exactly the same as this one. Though I don't have a standalone test case on EL5, it works OK on a modified kernel with this patch. This bug could be closed.
Thanks, all.

Comment 7 Jeff Layton 2010-03-23 10:29:12 UTC
Thanks for testing it, Xiaowei.

*** This bug has been marked as a duplicate of bug 511170 ***

