Bug 601800
Summary: | NFS-over-GFS out-of-order GETATTR Reply causes corruption | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Abhijith Das <adas> |
Component: | kernel | Assignee: | Jeff Layton <jlayton> |
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 5.6 | CC: | adas, bfields, bmarzins, djansa, jlayton, nstraz, rpeterso, rwheeler, sardella, steved, swhiteho, tao |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2011-01-13 21:36:22 UTC | Type: | --- |
Description
Abhijith Das
2010-06-08 16:28:16 UTC
Created attachment 422268 [details]
relevant portion of tcpdump on the server when the crash occurs
When viewed with Wireshark, this tcpdump shows that an out-of-order GETATTR Reply packet (for inode 79) was sent after a WRITE operation. The WRITE wrote 1024 bytes at offset 694092, making isize=695116, but the out-of-order GETATTR reply reports isize as 694092. As a result, the client is unable to read to the actual end of the file.
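The arithmetic above can be checked with a small sketch. This is only an illustrative model, not client code: the function names `size_after_write` and `readable_bytes` are hypothetical, and it assumes the file size before the WRITE was 694092 (the value the stale GETATTR reply carries).

```python
# Illustrative model only; names are hypothetical, not kernel or NFS APIs.
WRITE_OFFSET = 694092   # offset of the WRITE seen in the capture
WRITE_COUNT = 1024      # bytes written

def size_after_write(old_size, offset, count):
    """A write extends the file only if it ends past the current size."""
    return max(old_size, offset + count)

def readable_bytes(cached_size, offset, count):
    """Bytes a reader can get if it trusts cached_size as end of file."""
    if offset >= cached_size:
        return 0
    return min(count, cached_size - offset)

# Assumption: the pre-write size equals the stale value in the GETATTR reply.
stale_size = 694092
actual_size = size_after_write(stale_size, WRITE_OFFSET, WRITE_COUNT)

print(actual_size)                                            # size after the WRITE
print(readable_bytes(actual_size, WRITE_OFFSET, WRITE_COUNT)) # with the correct size
print(readable_bytes(stale_size, WRITE_OFFSET, WRITE_COUNT))  # with the stale size
```

With the stale size cached, a reader in this model sees none of the newly written 1024 bytes, which matches the symptom described above.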
Created attachment 422269 [details]
nfs log snippet on client when crash occurred
It looks like 47c287183705bdbe3c10bf1d57636589d1f336b3 applies cleanly to RHEL5. I think it's the right fix for this. I'm adding this to my test kernels and will plan to do some regression testing with it. I think it's reasonable that we can make 5.6 with this too (PM willing), so I'll go ahead and propose it for that.
Created attachment 422625 [details]
patchset1 -- 4 patches from upstream that should correct problem
The only problem with this patch is that it might mean that we occasionally lose attribute updates that are valid. There are a few other patches that went in after this one that help recover some of that. This set reflects the backport of those. I think we'll want all 4.
A nice bonus to this is that it seems to reduce some of the false cache invalidations we see without this patch. Whenever these calls get reordered like this, we not only get the metadata wrong, but we have to invalidate the cache for the inode. By discarding stale updates like this, we minimize this effect to a large degree. That helps performance.
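The idea of discarding stale updates can be sketched with a toy model. This is not the patchset itself: it assumes a generation-counter scheme in which every request is stamped when issued and the cached inode is re-stamped whenever it accepts an update. The names `CachedInode`, `Reply`, `issue_request`, and `apply` are hypothetical, and counting a shrinking size as a cache invalidation is a simplification.

```python
import itertools
from dataclasses import dataclass

# Assumption: one global, monotonically increasing generation counter.
_generation = itertools.count(1)

@dataclass
class Reply:
    size: int       # file size carried in the reply's attributes
    gencount: int   # stamp taken when the request was issued

class CachedInode:
    def __init__(self, size):
        self.size = size
        self.gencount = 0        # stamp of the last accepted update
        self.invalidations = 0   # how often cached data had to be dropped

    def issue_request(self):
        """Stamp a request at the moment it is sent."""
        return next(_generation)

    def apply(self, reply):
        """Accept the reply's attributes unless they predate the cache."""
        if reply.gencount < self.gencount:
            return False         # stale: discard, keep cache intact
        if reply.size < self.size:
            self.invalidations += 1  # simplification: shrink => invalidate
        self.size = reply.size
        self.gencount = next(_generation)
        return True

inode = CachedInode(size=694092)
getattr_stamp = inode.issue_request()   # GETATTR issued first
write_stamp = inode.issue_request()     # WRITE issued second

# Replies are processed out of order: WRITE first, then the older GETATTR.
applied_write = inode.apply(Reply(size=695116, gencount=write_stamp))
applied_getattr = inode.apply(Reply(size=694092, gencount=getattr_stamp))

print(applied_write, applied_getattr, inode.size, inode.invalidations)
```

In this model the late GETATTR reply is dropped, so the cached size stays correct and no invalidation is triggered. The trade-off mentioned above also shows up here: a reply that is older by stamp but still carries valid attributes would be discarded as well.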
*** Bug 438676 has been marked as a duplicate of this bug. ***
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
*** Bug 245024 has been marked as a duplicate of this bug. ***
in kernel-2.6.18-208.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2011-0017.html