Bug 1208065
Summary: O_TRUNC ignored on NFS file with invalid cache entry

| Field | Value | Field | Value |
|---|---|---|---|
| Product | Red Hat Enterprise Linux 6 | Reporter | Brano Zarnovican <zarnovican> |
| Component | kernel | Assignee | Benjamin Coddington <bcodding> |
| Kernel sub component | NFS | QA Contact | JianHong Yin <jiyin> |
| Status | CLOSED ERRATA | Severity | medium |
| Priority | unspecified | CC | bcodding, bfields, dhoward, eguan, tlavigne |
| Version | 6.6 | Target Milestone | rc |
| Hardware | Unspecified | OS | Linux |
| Fixed In Version | kernel-2.6.32-569.el6 | Doc Type | Bug Fix |
| Last Closed | 2015-07-22 08:46:34 UTC | Type | Bug |
Created attachment 1009604 [details]
Node2 NFS calls in GOOD case
More info on the problem:

* I'm able to reproduce it on a Linux NFS server => it's a client-specific problem.
* I'm able to reproduce it even if the client mounts the volume with "sync,noac,lookupcache=none". I was convinced that attribute caching contributes to the problem. What is weird is that the problem is reproducible even if you leave a 5-minute delay between steps 2) and 3).
* You can work around the problem by explicitly calling ftruncate() between open() and write() in step 3).

This issue was created as a Private Bug by mistake. If someone reading it has permission to make it public, please do so. Apparently I cannot.

Regards,

Brano Zarnovican

---

That doesn't look like expected behavior to me. I'm surprised we haven't run across this before, but Bugzilla searches aren't turning up a relevant bug. We're not setting the size attribute in nfs_open_create() because it's expected to be done in nfs_atomic_lookup(). But in this case we do ->lookup without an open intent, which creates the dentry; then lookup is skipped to create the file, so the size attribute is not set. This was fixed upstream a long time ago by moving to atomic_open(). Probably what needs to be done is to check the intent and set attributes appropriately in nfs_open_create(), just as in nfs_atomic_lookup().

---

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.

---

Well, lookup is not skipped; rather, setting the size attribute, which would normally happen in ->lookup if we had an open intent, is skipped in nfs_open_revalidate(), since nfs_neg_need_reval() is optimizing away revalidation of negative dentries on create. I'd prefer to fix this in nfs_open_create() rather than nfs_neg_need_reval().
Probably all that's needed here is something small and targeted to RHEL6, such as:

```diff
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index a7592b4..fb39e53 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1717,6 +1717,11 @@ static int nfs_open_create(struct inode *dir, struct dentry *dentry, int mode,
 	if (IS_ERR(ctx))
 		goto out_err_drop;
 
+	if (open_flags & O_TRUNC) {
+		attr.ia_valid |= ATTR_SIZE;
+		attr.ia_size = 0;
+	}
+
 	error = NFS_PROTO(dir)->create(dir, dentry, &attr, open_flags, ctx);
 	if (error != 0)
 		goto out_put_ctx;
```

---

Patch(es) available on kernel-2.6.32-569.el6

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1272.html
Created attachment 1009603 [details]
Node2 NFS calls

Description of problem:

If you open an existing file for write and truncate (O_WRONLY|O_CREAT|O_TRUNC), and the file has an invalid negative cache entry on this host, the file is NOT truncated; instead the new data simply overwrites the beginning of the existing content. The problem is only reproducible if you pre-cache the non-existence of the file first. Without this first step, the file is truncated as expected.

Version-Release number of selected component (if applicable):
kernel 2.6.32-504.8.1.el6.x86_64
nfs-utils-1.2.3-54.el6.x86_64

How reproducible: consistently

Steps to Reproduce:
1. Node2: do stat() on the non-existing file "test_file1" on NFS:
   ls -l test_file1
2. Node1: create and populate "test_file1" with "AAAAAAAA", then close the file:
   echo -n AAAAAAAA > test_file1
3. Node2: open("test_file1", O_WRONLY|O_CREAT|O_TRUNC, ..), write new content "BBBB", and close the file:
   echo -n BBBB > test_file1
4. Node1/2: view the file's content:
   cat test_file1

Actual results: the file's content is "BBBBAAAA"

Expected results: the file's content is "BBBB"

Additional info:

Note that the content created by Node1 is written and committed to the server before Node2 calls open(). This is not a case of concurrent writers. I was able to reproduce the problem on NFSv3 and NFSv4, and on both the 2.6.32 and 3.12.33-1.el6.x86_64 kernels. I have tested it against a NetApp NFS server. Looking at the tcpdump, the problem seems to be on the client, so it should be NFS-server independent. I'm attaching a tcpdump for the BAD case, as well as the GOOD case where the first step was skipped. The tcpdump is from a C program that uses only the minimal number of syscalls needed to reproduce it.

If this is expected behavior, I apologize for your wasted time ;)

Brano Zarnovican